DATA 101

  • Anticlimax

    Well, kind of a downer to have the last day of class canceled, but it is what it is. (I was born and raised in Colorado, and the amount of snow we got today honestly wouldn’t even be considered “snow.” But alas, Virginians seem to get in a tizzy over a dusting like this…)

    We’ll just cancel the last quiz (leaving 11 for the semester). I’ll send you an email later today with all your quiz scores, just so you can confirm I didn’t miss anything in my data collection.

    I’ll see you for the final exam on Monday at 3:30pm!

    Dec 5
  • scikit-learn Decision Tree code posted

    From class this week, the Python code that uses the sklearn package to induce decision trees on the election data set.

    Dec 3
  • Office hours switch

    Tomorrow (Tuesday Dec 2nd) I will hold office hours on Zoom instead of in person, at this revised time: 1-3pm.

    If you’d like to “attend” office hours, just send me an email during that time period and I’ll reply with a link to the Zoom chat.

    Dec 1
  • No class Monday 11/17

    As announced in class on Friday, we will not have class on Monday the 17th, due to an unbreakable engagement Stephen has committed to. See you on Wednesday!

    Nov 14
  • Generating a practice problem

    If you’d like another practice problem to get ready for your perfect 10 on tomorrow’s quiz, go to ChatGPT (or other generative AI) and paste in this prompt:

    here's a dataset my class used to help us learn how to perform decision
    tree induction: 
    
    caffeine_cups,free_hours_sat,dorm,attends_movie
    0,6,Eagle,Yes
    4,5,Jefferson,Yes
    2,4,Randolph,No
    1,7,Eagle,Yes
    2,3,Jefferson,No
    3,2,Randolph,No
    2,1,Randolph,Yes
    5,8,Eagle,Yes
    0,1,Jefferson,No
    3,4,Randolph,No
    1,6,Jefferson,Yes
    
    we were to split each numeric attribute only once (splitting into high and 
    low groups) considering all possible split points. the target label is the 
    last one (attends_movie). the decision tree algorithm we used is greedy, and 
    uses "# of examples correct if we stop branching there" as the metric for 
    determining what should go at a node (not entropy or information gain).
    
    for the example above, it turned out that splitting free_hours_sat between 
    1-4 and 5-8 was the best feature to put at the root of the tree, since that 
    got 10 out of 11 examples correct just with one feature.
    
    can you please make up another data set for me so I can practice this 
    decision tree induction for an upcoming quiz? also tell me what "the right 
    answer" is (i.e., what node should be at the root, at each of the branches,
    etc., when using the greedy decision tree induction algorithm.)
    
    

    Then, see how well you do on whatever data it gives you.
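
    If you'd like to check your hand-worked root against a program, here is a minimal Python sketch of the greedy "number of examples correct" metric described in the prompt, run on the example dataset above. It is not the dt.py code from class; the function names (load_rows, num_correct, best_root, and so on) are made up for illustration, and it assumes each numeric attribute is split once at the midpoint between two adjacent values, with ties going to the first candidate found.

```python
import csv
import io
from collections import Counter

# The example dataset from the prompt above.
DATA = """caffeine_cups,free_hours_sat,dorm,attends_movie
0,6,Eagle,Yes
4,5,Jefferson,Yes
2,4,Randolph,No
1,7,Eagle,Yes
2,3,Jefferson,No
3,2,Randolph,No
2,1,Randolph,Yes
5,8,Eagle,Yes
0,1,Jefferson,No
3,4,Randolph,No
1,6,Jefferson,Yes
"""

TARGET = "attends_movie"
NUMERIC = {"caffeine_cups", "free_hours_sat"}  # assumption: all other columns are categorical


def load_rows(text):
    """Parse the CSV text into a list of dicts, converting numeric columns to int."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        for col in NUMERIC:
            row[col] = int(row[col])
    return rows


def num_correct(groups):
    """How many examples we get right if each group predicts its majority label."""
    return sum(max(Counter(labels).values()) for labels in groups if labels)


def score_categorical(rows, col):
    """Score a categorical attribute: one branch per distinct value."""
    groups = {}
    for row in rows:
        groups.setdefault(row[col], []).append(row[TARGET])
    return num_correct(groups.values())


def best_numeric_split(rows, col):
    """Try every low/high split point; return (best score, threshold)."""
    values = sorted({row[col] for row in rows})
    best = (0, None)
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        low = [r[TARGET] for r in rows if r[col] < threshold]
        high = [r[TARGET] for r in rows if r[col] > threshold]
        score = num_correct([low, high])
        if score > best[0]:  # strict ">" keeps the first split on ties
            best = (score, threshold)
    return best


def best_root(rows):
    """Return the winning (score, attribute, threshold) plus all candidates."""
    candidates = []
    for col in rows[0]:
        if col == TARGET:
            continue
        if col in NUMERIC:
            score, threshold = best_numeric_split(rows, col)
            candidates.append((score, col, threshold))
        else:
            candidates.append((score_categorical(rows, col), col, None))
    return max(candidates, key=lambda c: c[0]), candidates


rows = load_rows(DATA)
best, candidates = best_root(rows)
print(best)  # (10, 'free_hours_sat', 4.5)
```

    The winner matches the prompt: splitting free_hours_sat between 4 and 5 gets 10 of the 11 examples right. To build the rest of the tree, you could call best_root again on just the rows that fall into each branch.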

    Nov 13
  • Extra credit opportunities

    To claim one-half-of-one-quiz’s worth of extra credit, complete either of the following two items:

    1. Sign up (by emailing me) for one of the experiment times listed here. Scroll to the second page to see the days/times. This experiment will take place in Farmer 022 (just down the hall from class) and will take no more than one hour of your time. (Be sure to show up on time!)
    2. Write a one-page, single-spaced, well-researched and documented essay on the current state of political polarization on social media. You should cite at least two reputable academic sources on the topic, and your essay should synthesize their findings and present an overview of how this important subset of online communication has developed in recent years. This essay is due to me in hardcopy on the last day of class.

    Nov 12
  • Auto-decision-tree induction code posted

    Forgot to post our code from the other day:

    • dt.py — the code that automatically searches for the best root to split on, and the best branches at level two of the decision tree
    • election.csv — the toy dataset we ran this on

    Nov 12
  • Lab #6 posted!

    Lab #6 has been posted, and is ready for your perusal!


    https://colab.research.google.com/github/divilian/data101_lab6/blob/main/lab6.ipynb

    Nov 7
  • Lab aide hours canceled

    FYI: lab aide hours this Sunday, Nov 2, and Tuesday, Nov 4, have been canceled.

    Oct 29
  • Lab #5 partners

    If you don’t have a lab 5 partner yet, and would like one, please email me so I can pair you up!

    Oct 29
stephendavies.org