I have pushed our code from today to the class GitHub repo (see the file demo_autodiff.py).
Btw, I may have completely forgotten to mention the name of the awesome algorithm used to systematically back-compute the partial derivatives of the loss function with respect to all the model parameters. It is called autodiff (short for automatic differentiation; the flavor we used, which works backward from the loss, is reverse-mode autodiff). In a humorous twist, the people at Meta who developed PyTorch apparently misheard the name and thought it was “autograd” (which makes sense, actually, since the gradient is precisely the vector containing all those partial derivatives), so you will see references to “autograd” throughout the PyTorch docs. I prefer to use the original name.
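If you want a tiny standalone illustration of what autograd/autodiff is doing (this is not the class demo, just a minimal sketch with made-up numbers), here is a one-line "model" whose loss depends on two parameters. PyTorch records the forward computation, and backward() back-computes the partial derivatives of the loss with respect to each parameter:

```python
import torch

# Toy example: loss = (w*x + b - y)^2, with w and b as the trainable parameters.
x = torch.tensor(2.0)                        # input (made-up value)
y = torch.tensor(7.0)                        # target (made-up value)
w = torch.tensor(1.5, requires_grad=True)    # leaf tensors tracked by autograd
b = torch.tensor(0.5, requires_grad=True)

loss = (w * x + b - y) ** 2   # forward pass builds the computation graph
loss.backward()               # reverse-mode autodiff: fills in d(loss)/dw and d(loss)/db

print(w.grad)   # analytically: 2*(w*x + b - y)*x = 2*(-3.5)*2 = -14.0
print(b.grad)   # analytically: 2*(w*x + b - y)   = -7.0
```

The vector (w.grad, b.grad) is exactly the gradient of the loss, which is why PyTorch's name for the machinery is "autograd."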

