{"id":458,"date":"2025-11-04T10:05:08","date_gmt":"2025-11-04T15:05:08","guid":{"rendered":"http:\/\/stephendavies.org\/nlp\/?p=458"},"modified":"2025-11-04T10:06:24","modified_gmt":"2025-11-04T15:06:24","slug":"accuracy-metrics-the-binary-vs-multiclass-case","status":"publish","type":"post","link":"http:\/\/stephendavies.org\/nlp\/index.php\/2025\/11\/04\/accuracy-metrics-the-binary-vs-multiclass-case\/","title":{"rendered":"Accuracy metrics: the binary vs multiclass case"},"content":{"rendered":"<p>One thing I didn&#8217;t make sufficiently clear (and which our in-class multiclass XP example unfortunately probably didn&#8217;t help) is how metrics are treated differently for binary classification vs. multiclass classification.<\/p>\n<p>Here&#8217;s the deal. Whenever you perform a classification task, you have one of the following two scenarios:<\/p>\n<ol>\n<li>Binary. You have only one &#8220;thing&#8221; you&#8217;re trying to detect. Example: you&#8217;re detecting &#8220;politically polarized texts.&#8221; (Everything else is a &#8220;not-politically-polarized text.&#8221;)<\/li>\n<li>Multiclass. You have multiple &#8220;things&#8221; you&#8217;re trying to detect. Example: you&#8217;re detecting whether a Federalist Paper was authored by Hamilton, Madison, or Jay.<\/li>\n<\/ol>\n<p>In the binary case, one normally designates one of the two options as the &#8220;primary option&#8221; (for instance, &#8220;politically-polarized&#8221;) and computes precision, recall, and F1-score based on only that primary option. One does <i><b>not<\/b><\/i> normally compute precision\/recall\/F1-score for &#8220;politically-polarized&#8221; and also precision\/recall\/F1-score for &#8220;not politically polarized&#8221; and then use micro- or macro-averaging.<\/p>\n<p>The only time you need to (and should) use micro\/macro-averaging is in the multiclass case, when you have more than two labels you&#8217;re classifying everything into. 
Then, the only real way to take into account &#8220;how well do I do in identifying Hamilton? Madison? Jay?&#8221; is to compute three separate precision\/recall\/F1-scores and average them.<\/p>\n<p>It&#8217;s quite possible that I didn&#8217;t make this sufficiently clear, and that the fact that we did a multiclass example in lecture reinforced the idea that you always need to compute separate metrics and average them, even in the binary case.<\/p>\n<p>All this to say: if on Quiz #3 &mdash; which had a binary classification example (&#8220;passive-aggressive&#8221; or not) &mdash; you used the multiclass technique of computing scores for &#8220;passive-aggressive&#8221; and &#8220;non-passive-aggressive&#8221; separately and then averaging them, I will forgive this venial sin and give you your points back for that. If this is the case, please send me an email with the number of XP you missed <i>for that reason<\/i> and I&#8217;ll post them on the scoreboard.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>One thing I didn&#8217;t make sufficiently clear (and which our in-class multiclass XP example unfortunately probably didn&#8217;t help) is how metrics are treated differently for binary classification vs. multiclass classification. Here&#8217;s the deal. Whenever you perform a classification task, you have one of the following two scenarios: Binary. 
You have only one &#8220;thing&#8221; you&#8217;re trying [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":"","_links_to":"","_links_to_target":""},"categories":[1],"tags":[],"class_list":["post-458","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"_links":{"self":[{"href":"http:\/\/stephendavies.org\/nlp\/index.php\/wp-json\/wp\/v2\/posts\/458","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/stephendavies.org\/nlp\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/stephendavies.org\/nlp\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/stephendavies.org\/nlp\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/stephendavies.org\/nlp\/index.php\/wp-json\/wp\/v2\/comments?post=458"}],"version-history":[{"count":6,"href":"http:\/\/stephendavies.org\/nlp\/index.php\/wp-json\/wp\/v2\/posts\/458\/revisions"}],"predecessor-version":[{"id":464,"href":"http:\/\/stephendavies.org\/nlp\/index.php\/wp-json\/wp\/v2\/posts\/458\/revisions\/464"}],"wp:attachment":[{"href":"http:\/\/stephendavies.org\/nlp\/index.php\/wp-json\/wp\/v2\/media?parent=458"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/stephendavies.org\/nlp\/index.php\/wp-json\/wp\/v2\/categories?post=458"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/stephendavies.org\/nlp\/index.php\/wp-json\/wp\/v2\/tags?post=458"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}