Quiz #3 posted!
Quiz #3 has been posted, and is due Monday at midnight! We’ll have covered all the material for this quiz by Thursday the 23rd.
-
It’s the Jacobian, not the Hessian
I misspoke today in response to Garrett’s question about a vector-valued loss function (instead of a scalar loss function). If your loss (or any other) function returns a vector of values, then the matrix containing the partial derivative of each of those output values with respect to each of the function’s inputs is called the Jacobian matrix. It’s normally denoted \( J_{\!f}(x) \), and its entries are \( [J_{\!f}(x)]_{ij} = \frac{\partial f_i}{\partial x_j} \).
The Hessian matrix, \( H_{\!f}(x) \), is similar but is defined for a scalar-valued function and holds its second-order partial derivatives. (In other words, its entries are \( [H_{\!f}(x)]_{ij} = \frac{\partial^2 f}{\partial x_i\,\partial x_j} \).)
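If you want to poke at these yourself, here’s a minimal sketch (not from our class code; just the standard PyTorch helpers torch.autograd.functional.jacobian and torch.autograd.functional.hessian applied to toy functions I made up):

import torch

# A toy vector-valued function f: R^3 -> R^2
def f(x):
    return torch.stack([x[0] * x[1], x[1] * x[2] ** 2])

# A toy scalar-valued function g: R^3 -> R
def g(x):
    return (x ** 2).sum()

x = torch.tensor([1.0, 2.0, 3.0])

# Jacobian of f at x: a 2x3 matrix of first-order partials df_i/dx_j
J = torch.autograd.functional.jacobian(f, x)

# Hessian of g at x: a 3x3 matrix of second-order partials d^2 g / (dx_i dx_j)
H = torch.autograd.functional.hessian(g, x)

print(J)  # [[2., 1., 0.], [0., 9., 12.]]
print(H)  # 2 * identity matrix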
-
Today’s code posted
I have pushed our code from today to the class GitHub repo (see the file demo_autodiff.py).
Btw, I may have completely forgotten to mention the name of the awesome algorithm used to systematically back-compute the partial derivatives of the loss function with respect to all the model parameters. It is called autodiff (short for automatic differentiation). In a humorous twist, the people at Meta who developed PyTorch apparently misheard the name and thought it was “autograd” (which makes sense, actually, since the gradient is precisely the vector containing all those partial derivatives), and so you will see references throughout the PyTorch docs to “autograd.” I prefer to use the original name.
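For a flavor of what that looks like in PyTorch (a generic sketch, not the contents of demo_autodiff.py):

import torch

# Two "model parameters" we want gradients for
w = torch.tensor(3.0, requires_grad=True)
b = torch.tensor(1.0, requires_grad=True)

# A tiny model and a squared-error loss on one training example
x, y_true = 2.0, 5.0
y_pred = w * x + b
loss = (y_pred - y_true) ** 2

# Autodiff/autograd walks the computation graph backwards and fills in .grad
loss.backward()

print(w.grad)  # d(loss)/dw = 2 * (y_pred - y_true) * x = 8.0
print(b.grad)  # d(loss)/db = 2 * (y_pred - y_true) = 4.0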
-
XP cards
I forgot to mention today that I would accept people’s XP cash cards as a mid-semester cash-out. So how about this: if you want to cash in your cards mid-semester, you can do so next Thursday (right after fall break).
(Note that if you don’t turn them in mid-semester, there is no grade disadvantage: it just means the points won’t appear on the scoreboard until December, and it means that you have to keep track of your cards for that much longer.)
-
Quiz #2 posted!
Quiz #2 has been posted, and is open-Python and timed at 60 minutes.
So as not to rush anybody, I made it due on Oct. 15th instead of Oct. 10th. But we’ve already covered everything needed for the quiz.
-
logreg.py (and logreg_distilled.py) posted
In the class git repo.
-
Office hours time change — 10/7
On Tuesday the 7th, my office hours will be 1:30-3:30pm instead of the normal 12-2pm.
-
Homework #3 errors
A student has just pointed out a couple of errors in pytorch_practice.py, which are now fixed. If you’ve already git pulled (or copied the contents of that file some other way), then git pull again (or re-copy).
-
Homework #3 posted!
As promised, Homework #3 has been posted, and is due on October 17th at midnight. It is a play in two acts. Send questions!
Also, I offer +5XP to anyone who finds a legit bug in my co-occurrence code or supporting programs and reports it! Only the first person who reports any specific bug gets the reward for finding that bug, though.
And yes, these rewards can be stacked! (just like tensors!)
-
Visualizing embeddings
I’ve pushed several files to the class repo, including two programs to help you visualize the embeddings in your corpus: interact_cooccur.py, which we played with in class on Tuesday, and visualize_cooccur.py, which can produce 2-d (and even 3-d) plots like this showing the embeddings in a reduced-dimensional space:

The next homework assignment (coming soon) will have you running and configuring these programs to help you analyze your own corpus’s embeddings. Stay tuned for that.
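In case you’re curious how that kind of plot gets made in general (this is just a generic sketch, not the actual visualize_cooccur.py, and the words and embeddings here are made up), the usual recipe is to project the embedding matrix down to 2 dimensions with something like PCA and scatter-plot the result:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Pretend data: a (vocab_size x embedding_dim) matrix and its words.
# In your homework these will come from your own corpus.
words = ["cat", "dog", "fish", "car", "truck"]
embeddings = np.random.rand(len(words), 50)

# Project the 50-dimensional embeddings down to 2 dimensions
coords = PCA(n_components=2).fit_transform(embeddings)

plt.scatter(coords[:, 0], coords[:, 1])
for word, (px, py) in zip(words, coords):
    plt.annotate(word, (px, py))
plt.title("Embeddings projected to 2-D")
plt.show()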
Also, I have posted the code we used to play around with standard pre-trained embedding collections (like word2vec and GloVe): you’ll need to first run the download_embeddings.py file (while connected to a good network) and then run either sim_emb_play.py or closest_emb_play.py to find the similarity of pairs of words, or the top-10 closest embeddings to a given word, respectively.
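As a rough idea of what those two scripts do under the hood, here’s a sketch using the gensim library (my actual scripts may be written differently, so treat this as illustrative only):

import gensim.downloader

# Download (on first run) and load a small pre-trained GloVe collection
glove = gensim.downloader.load("glove-wiki-gigaword-50")

# Similarity of a pair of words (roughly what sim_emb_play.py reports)
print(glove.similarity("coffee", "tea"))

# Top-10 closest embeddings to a given word (roughly what closest_emb_play.py reports)
print(glove.most_similar("coffee", topn=10))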

