LLMs and the environment
Here’s an interesting recent post about ChatGPT’s impact on the environment, which makes for good food for thought. (Spoiler: the article claims it’s essentially negligible.)
-
Stephen’s neural-nets-in-torch primer posted
Here’s my primer on using the basic neural net functionality in PyTorch. Enjoy!
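If you’d like a taste before diving in, here’s a minimal sketch of the sort of thing the primer walks through: a tiny feed-forward network and a single training step. (The layer sizes, data, and learning rate below are made up purely for illustration; they don’t come from the primer itself.)

```python
import torch
import torch.nn as nn

# A tiny feed-forward net: 10 input features -> 8 hidden units -> 2 output classes.
model = nn.Sequential(
    nn.Linear(10, 8),
    nn.ReLU(),
    nn.Linear(8, 2),
)

loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# One training step on a fake batch of 4 examples.
x = torch.randn(4, 10)           # made-up inputs
y = torch.tensor([0, 1, 1, 0])   # made-up labels
optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()
```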
-
A penny (+5XP, actually) for your thoughts
You can earn up to +5XP (or maybe even +10XP if what you write is exceptionally good) by reading and responding to this blog post. You don’t need to do anything to “turn it in” per se; I’ll automatically notice the comment coming in via the comment form. Do use some semblance of either your real name, or your class screen name, though.
Quality beats quantity: I’m not looking for a novel, but rather for thoughtful, meaty responses. The first reply on there is from my daughter Lizzy; if you want a benchmark, I would call her comments “exceptionally good” and would award her the higher total if she were in the class.
This is due one week from today (Thu Nov 13 at midnight).
-
Accuracy metrics: the binary vs multiclass case
One thing I didn’t make sufficiently clear (and which our in-class multiclass XP example unfortunately probably didn’t help) is how metrics are treated differently for binary classification vs. multiclass classification.
Here’s the deal. Whenever you perform a classification task, you have one of the following two scenarios:
- Binary. You have only one “thing” you’re trying to detect. Example: you’re detecting “politically polarized texts.” (Everything else is a “not-politically-polarized text.”)
- Multiclass. You have multiple “things” you’re trying to detect. Example: you’re detecting whether a Federalist Paper was authored by Hamilton, Madison, or Jay.
In the binary case, one normally designates one of the two options as the “primary option” (for instance, “politically-polarized”) and computes precision, recall, and F1-score based on only that primary option. One does not normally compute precision/recall/F1-score for “politically-polarized” and also precision/recall/F1-score for “not politically polarized” and then use micro- or macro-averaging.
The only time you need to (and should) use micro/macro-averaging is in the multiclass case, when you have more than two labels to classify things into. Then the only real way to take into account “how well do I do at identifying Hamilton? Madison? Jay?” is to compute three separate precision/recall/F1-scores and average them.
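If it helps to see the distinction in code, here’s a quick sketch using scikit-learn (just one way to do this, not something the class requires). The labels and predictions are made up; notice that the only thing that changes between the two cases is the keyword arguments.

```python
from sklearn.metrics import precision_recall_fscore_support

# Binary case: score only the "primary" class (here 1 = politically polarized).
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]
p, r, f1, _ = precision_recall_fscore_support(y_true, y_pred,
                                              pos_label=1, average="binary")

# Multiclass case: compute per-class scores, then average them.
y_true3 = ["Hamilton", "Madison", "Jay", "Hamilton", "Madison"]
y_pred3 = ["Hamilton", "Hamilton", "Jay", "Hamilton", "Madison"]
macro = precision_recall_fscore_support(y_true3, y_pred3, average="macro")
micro = precision_recall_fscore_support(y_true3, y_pred3, average="micro")
# "macro" = unweighted mean of the three per-class scores;
# "micro" = pool all the counts first, then compute the scores once.
```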
It’s quite possible that I didn’t make this sufficiently clear, and that the fact that we did a multiclass example in lecture reinforced the idea that you always needed to compute separate metrics and average them, even in the binary case.
All this to say: if on Quiz #3, which had a binary classification example (“passive-aggressive” or not), you used the multiclass technique of computing scores for “passive-aggressive” and “non-passive-aggressive” separately and then averaging them, I will forgive this venial sin and give you your points back. If this applies to you, please send me an email with the number of XP you missed for that reason, and I’ll post the correction on the scoreboard.
-
Book version!
Gah! Somebody just pointed out to me that Dan and Jim, on their website, have a new version of the book available. This is great but the page numbers DO NOT match the ones I’m using!
So please keep up with the specified reading pages in the version I have posted at the link above, not in the updated version!
-
Go team!
Quiz #3 has now been graded and recorded on the scoreboard. You’ll be delighted to know that we had a perfect 21 out of 21 correct Honor Pledges, so everyone has earned the XP I promised. Keep it up!!
(Btw, if you still think some part of your quiz was graded in error, please let me know and explain your reasoning.)
-
Don’t freak out about the quiz answers
There’s more than one legitimate way to compute some of the things on Quiz #3, and I will be awarding credit for any of those legitimate ways. So don’t freak out if your initial quiz score computed by Canvas is lower than you think it should be. I’ll be doing systematic, surgical regrades after everyone submits.
-
Homework #4 (team assignment) posted!
Homework #4 has finally been posted, and is due a week from Wednesday. (Hint: DO! NOT! PROCRASTINATE! on this one! There are a lot of moving parts.)
As announced in class, this is a team assignment of sorts. If you wish, your whole team of 3 or 4 can turn in just one submission. Or, if you prefer, you can simply share corpora with each other but each write your own software. It’s up to you.
Finally, if you do not already have a team but want one and don’t know who to ask, email me and I will assign you to one.
Good luck!
-
Performance metrics code posted
The code that computes classification performance metrics like accuracy, precision, recall, and F-measure (Fβ), along with the macro/micro-averaged versions of these, has been posted to the class GitHub repo in the file measures.py.
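In case it’s useful to see the definitions written out, here’s a tiny sketch of the standard formulas in terms of true-positive, false-positive, and false-negative counts. (This is not necessarily how measures.py organizes things internally; it’s just the textbook definitions.)

```python
def precision_recall_fbeta(tp, fp, fn, beta=1.0):
    """Standard definitions from raw counts; beta=1 gives the usual F1 score."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    b2 = beta * beta
    fbeta = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, fbeta

# Example: 8 true positives, 2 false positives, 4 false negatives
# -> precision 0.8, recall ~0.667, F1 ~0.727
print(precision_recall_fbeta(8, 2, 4))
```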
-
TF-IDF binary classifier code posted
I have posted to the class repo the IMDB movie review classification code from the last few days in class:
- wordcount_encoder.py
- train_movies.py
- eval_movies.py
- interact_movies.py
These four files work together to accomplish all the wonderful things we wrote and demo’d together in class. They are also a key starting point for your upcoming homework #4, should you choose to use them.
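If your team decides to build its own pipeline for homework #4 instead of starting from those four files, the overall shape is roughly the sketch below. To be clear, this is not the code from class: it’s a generic scikit-learn TF-IDF + logistic-regression sketch, and the tiny “corpus” in it is made up so the example runs.

```python
# Generic TF-IDF binary classifier sketch (NOT the class's wordcount_encoder /
# train_movies code); the toy corpus below is made up so the example runs.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_recall_fscore_support

texts = [
    "what a wonderful film", "loved every minute of it", "a delightful surprise",
    "great acting and a great script", "utterly dreadful", "a boring mess",
    "terrible from start to finish", "i want my two hours back",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]   # 1 = positive review, 0 = negative

train_txt, test_txt, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=0)

vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_txt)   # learn the vocabulary from training data only
X_test = vectorizer.transform(test_txt)         # reuse that vocabulary at eval time

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
p, r, f1, _ = precision_recall_fscore_support(
    y_test, clf.predict(X_test), pos_label=1, average="binary")
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```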

