If you’d like another practice problem to get ready for your perfect 10 on tomorrow’s quiz, go to ChatGPT (or another generative AI) and paste in this prompt:
here's a dataset my class used to help us learn how to perform decision
tree induction:
caffeine_cups,free_hours_sat,dorm,attends_movie
0,6,Eagle,Yes
4,5,Jefferson,Yes
2,4,Randolph,No
1,7,Eagle,Yes
2,3,Jefferson,No
3,2,Randolph,No
2,1,Randolph,Yes
5,8,Eagle,Yes
0,1,Jefferson,No
3,4,Randolph,No
1,6,Jefferson,Yes
we were allowed to split each numeric attribute only once (splitting it into
a low group and a high group), considering all possible split points. the
target label is the last column (attends_movie). the decision tree algorithm
we used is greedy, and uses "# of examples correct if we stop branching
there" as the metric for determining what should go at a node (not entropy
or information gain).
for the example above, it turned out that free_hours_sat, split between
1-4 and 5-8, was the best feature to put at the root of the tree, since
that split got 10 out of 11 examples correct with just one feature.
can you please make up another dataset for me so I can practice this kind
of decision tree induction for an upcoming quiz? also tell me what "the
right answer" is (i.e., what node should be at the root, at each of the
branches, etc., when using the greedy decision tree induction algorithm).
Then, see how well you do on whatever data it gives you.
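If you'd like to double-check the AI's "right answer" (and your own work), the root-selection step is short enough to script. Here is a minimal sketch in Python of the greedy metric from the prompt: for every candidate split, predict the majority label in each branch and count how many examples that gets right. The high/low handling of numeric attributes follows the prompt; treating the categorical dorm attribute as one branch per value is an assumption about how the class handled it.

from collections import Counter

# the class's dataset, transcribed from the prompt above
data = [
    (0, 6, "Eagle", "Yes"),
    (4, 5, "Jefferson", "Yes"),
    (2, 4, "Randolph", "No"),
    (1, 7, "Eagle", "Yes"),
    (2, 3, "Jefferson", "No"),
    (3, 2, "Randolph", "No"),
    (2, 1, "Randolph", "Yes"),
    (5, 8, "Eagle", "Yes"),
    (0, 1, "Jefferson", "No"),
    (3, 4, "Randolph", "No"),
    (1, 6, "Jefferson", "Yes"),
]
features = ["caffeine_cups", "free_hours_sat", "dorm"]

def correct_if_stop(groups):
    # "# of examples correct if we stop branching there":
    # each branch predicts its majority label
    return sum(max(Counter(labels).values()) for labels in groups if labels)

best = None
for i, name in enumerate(features):
    values = sorted({row[i] for row in data})
    if isinstance(values[0], str):
        # categorical attribute: one branch per value (an assumption)
        groups = [[row[-1] for row in data if row[i] == v] for v in values]
        candidates = [(correct_if_stop(groups), name)]
    else:
        # numeric attribute: try every split point between adjacent values
        candidates = []
        for lo, hi in zip(values, values[1:]):
            cut = (lo + hi) / 2
            low = [row[-1] for row in data if row[i] <= cut]
            high = [row[-1] for row in data if row[i] > cut]
            candidates.append((correct_if_stop([low, high]), f"{name} <= {cut}"))
    for score, description in candidates:
        if best is None or score > best[0]:
            best = (score, description)

print(f"best root: {best[1]} ({best[0]}/{len(data)} correct)")

On the class's data this prints "best root: free_hours_sat <= 4.5 (10/11 correct)", matching the prompt's claim that splitting between 1-4 and 5-8 is best. Swap in whatever dataset the AI generates and you can grade both its "right answer" for the root and your own.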