{"id":13255,"date":"2024-06-24T15:37:06","date_gmt":"2024-06-24T14:37:06","guid":{"rendered":"https:\/\/emilkirkegaard.dk\/en\/?p=13255"},"modified":"2024-06-24T15:37:06","modified_gmt":"2024-06-24T14:37:06","slug":"wikipedias-political-bias-demonstrated-by-sentiment-analysis","status":"publish","type":"post","link":"https:\/\/emilkirkegaard.dk\/en\/2024\/06\/wikipedias-political-bias-demonstrated-by-sentiment-analysis\/","title":{"rendered":"Wikipedia&#8217;s political bias demonstrated by sentiment analysis"},"content":{"rendered":"<p>The great David Rozado has a new study out on the politics of Wikipedia: <a href=\"https:\/\/manhattan.institute\/article\/is-wikipedia-politically-biased#notes\">Is Wikipedia Politically Biased?<\/a>.<\/p>\n<p>He downloaded the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Wikipedia:Database_download\">English Wikipedia database<\/a>, and analyzed whether the articles covering different politically aligned actors or organizations were systematically more positive or negative in tone (<a href=\"https:\/\/en.wikipedia.org\/wiki\/Sentiment_analysis\">sentiment analysis<\/a>). Tone was assessed using machine learning approaches based on modern LLMs or older methods. 
The results were very clear for US politicians:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-13256\" src=\"https:\/\/emilkirkegaard.dk\/en\/wp-content\/uploads\/figure-3-Average-Sentiment-with-Which-Names-of-U.S.-Politicians-Are-Used-in-Wikipedia-Articles.png\" alt=\"\" width=\"1518\" height=\"2882\" srcset=\"https:\/\/emilkirkegaard.dk\/en\/wp-content\/uploads\/figure-3-Average-Sentiment-with-Which-Names-of-U.S.-Politicians-Are-Used-in-Wikipedia-Articles.png 1518w, https:\/\/emilkirkegaard.dk\/en\/wp-content\/uploads\/figure-3-Average-Sentiment-with-Which-Names-of-U.S.-Politicians-Are-Used-in-Wikipedia-Articles-158x300.png 158w, https:\/\/emilkirkegaard.dk\/en\/wp-content\/uploads\/figure-3-Average-Sentiment-with-Which-Names-of-U.S.-Politicians-Are-Used-in-Wikipedia-Articles-539x1024.png 539w, https:\/\/emilkirkegaard.dk\/en\/wp-content\/uploads\/figure-3-Average-Sentiment-with-Which-Names-of-U.S.-Politicians-Are-Used-in-Wikipedia-Articles-768x1458.png 768w, https:\/\/emilkirkegaard.dk\/en\/wp-content\/uploads\/figure-3-Average-Sentiment-with-Which-Names-of-U.S.-Politicians-Are-Used-in-Wikipedia-Articles-809x1536.png 809w, https:\/\/emilkirkegaard.dk\/en\/wp-content\/uploads\/figure-3-Average-Sentiment-with-Which-Names-of-U.S.-Politicians-Are-Used-in-Wikipedia-Articles-1079x2048.png 1079w\" sizes=\"auto, (max-width: 1518px) 100vw, 1518px\" \/><\/p>\n<p>As well as journalists:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-13257\" src=\"https:\/\/emilkirkegaard.dk\/en\/wp-content\/uploads\/figure-4-Average-Sentiment-with-which-Prominent-Public-Figures-Are-Used-in-Wikipedia-Articles.png\" alt=\"\" width=\"1437\" height=\"1972\" srcset=\"https:\/\/emilkirkegaard.dk\/en\/wp-content\/uploads\/figure-4-Average-Sentiment-with-which-Prominent-Public-Figures-Are-Used-in-Wikipedia-Articles.png 1437w, 
https:\/\/emilkirkegaard.dk\/en\/wp-content\/uploads\/figure-4-Average-Sentiment-with-which-Prominent-Public-Figures-Are-Used-in-Wikipedia-Articles-219x300.png 219w, https:\/\/emilkirkegaard.dk\/en\/wp-content\/uploads\/figure-4-Average-Sentiment-with-which-Prominent-Public-Figures-Are-Used-in-Wikipedia-Articles-746x1024.png 746w, https:\/\/emilkirkegaard.dk\/en\/wp-content\/uploads\/figure-4-Average-Sentiment-with-which-Prominent-Public-Figures-Are-Used-in-Wikipedia-Articles-768x1054.png 768w, https:\/\/emilkirkegaard.dk\/en\/wp-content\/uploads\/figure-4-Average-Sentiment-with-which-Prominent-Public-Figures-Are-Used-in-Wikipedia-Articles-1119x1536.png 1119w\" sizes=\"auto, (max-width: 1437px) 100vw, 1437px\" \/><\/p>\n<p>Overall, one can compute the difference in positive vs. negative sentiment across various categories, and the results are quite clear:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-13258\" src=\"https:\/\/emilkirkegaard.dk\/en\/wp-content\/uploads\/figure-7-Average-Sentiment-with-Which-Terms-With-Ideological-Connotations-Are-Used-in-Wikipedia-Articles.png\" alt=\"\" width=\"1470\" height=\"1682\" srcset=\"https:\/\/emilkirkegaard.dk\/en\/wp-content\/uploads\/figure-7-Average-Sentiment-with-Which-Terms-With-Ideological-Connotations-Are-Used-in-Wikipedia-Articles.png 1470w, https:\/\/emilkirkegaard.dk\/en\/wp-content\/uploads\/figure-7-Average-Sentiment-with-Which-Terms-With-Ideological-Connotations-Are-Used-in-Wikipedia-Articles-262x300.png 262w, https:\/\/emilkirkegaard.dk\/en\/wp-content\/uploads\/figure-7-Average-Sentiment-with-Which-Terms-With-Ideological-Connotations-Are-Used-in-Wikipedia-Articles-895x1024.png 895w, https:\/\/emilkirkegaard.dk\/en\/wp-content\/uploads\/figure-7-Average-Sentiment-with-Which-Terms-With-Ideological-Connotations-Are-Used-in-Wikipedia-Articles-768x879.png 768w, 
https:\/\/emilkirkegaard.dk\/en\/wp-content\/uploads\/figure-7-Average-Sentiment-with-Which-Terms-With-Ideological-Connotations-Are-Used-in-Wikipedia-Articles-1342x1536.png 1342w\" sizes=\"auto, (max-width: 1470px) 100vw, 1470px\" \/><\/p>\n<p>The average effect size is about 1 standard deviation (Cohen&#8217;s d), though there was some difference across categories. UK parliament members had the smallest difference, perhaps because the mostly US-based Wikipedia writers don&#8217;t care too much about them, so they won&#8217;t spend a lot of time lobbying for adding negative material. It would be informative to see if this holds for non-English speaking areas, say, German or Spanish politicians. One could run the sentiment analysis on Wikipedias of various languages to see how they differ, e.g. English vs. German Wikipedia coverage of Anglo vs. German politicians.<\/p>\n<p>This bias inherent in Wikipedia is important because Wikipedia is one of the most visited websites in the world (<a href=\"https:\/\/en.wikipedia.org\/wiki\/List_of_most-visited_websites\">#6 perhaps<\/a>), and has a veneer of neutrality. It is in fact mostly neutral because the overwhelming proportion of pages concern topics not relevant to current political debates (e.g., this or that mineral, this or that park). However, anything related to the Current Thing in politics is certainly left-biased. Many readers don&#8217;t know that, so this bias works its way into people&#8217;s minds. Even if you know it&#8217;s biased, you don&#8217;t necessarily know what it is leaving out, or giving improper attention to.<\/p>\n<p>Another way that Wikipedia&#8217;s bias is important is that <del>our future overlords<\/del> LLMs are trained based, in part, on the Wikipedia corpus. Any bias from this corpus is thus potentially introduced into the AIs directly from training. 
Rozado gives some evidence of this as well.<\/p>\n<p>Wikipedia itself probably does not originate all of this bias, since the sources it draws upon, mainstream media and academic writings, are themselves strongly left-biased. But in fact, left-wing activist Wikipedia editors have lobbied to get various non-left sources banned, so they cannot even be used to counterbalance the bias. You can see which sources are banned or listed as questionable on <a href=\"https:\/\/en.wikipedia.org\/wiki\/Wikipedia:Deprecated_sources\">these<\/a> <a href=\"https:\/\/en.wikipedia.org\/wiki\/Wikipedia:Reliable_sources\/Perennial_sources\">pages<\/a>. There are some curious pairs of reliable vs. unreliable sources:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-13259\" src=\"https:\/\/emilkirkegaard.dk\/en\/wp-content\/uploads\/amnesty-vs.-american-conservative.png\" alt=\"\" width=\"2492\" height=\"1054\" srcset=\"https:\/\/emilkirkegaard.dk\/en\/wp-content\/uploads\/amnesty-vs.-american-conservative.png 2492w, https:\/\/emilkirkegaard.dk\/en\/wp-content\/uploads\/amnesty-vs.-american-conservative-300x127.png 300w, https:\/\/emilkirkegaard.dk\/en\/wp-content\/uploads\/amnesty-vs.-american-conservative-1024x433.png 1024w, https:\/\/emilkirkegaard.dk\/en\/wp-content\/uploads\/amnesty-vs.-american-conservative-768x325.png 768w, https:\/\/emilkirkegaard.dk\/en\/wp-content\/uploads\/amnesty-vs.-american-conservative-1536x650.png 1536w, https:\/\/emilkirkegaard.dk\/en\/wp-content\/uploads\/amnesty-vs.-american-conservative-2048x866.png 2048w\" sizes=\"auto, (max-width: 2492px) 100vw, 2492px\" \/><\/p>\n<p>Amnesty is a fairly radical left-wing advocacy organization, but it is somehow listed as reliable, unlike The American Conservative, which is obviously a conservative magazine. A proper classification would list both as having political bias. 
Curiously, the ADL, a Jewish hate (defamation) group, is listed as kinda reliable, but depending on the topic:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-13260\" src=\"https:\/\/emilkirkegaard.dk\/en\/wp-content\/uploads\/ADL.png\" alt=\"\" width=\"2476\" height=\"2170\" srcset=\"https:\/\/emilkirkegaard.dk\/en\/wp-content\/uploads\/ADL.png 2476w, https:\/\/emilkirkegaard.dk\/en\/wp-content\/uploads\/ADL-300x263.png 300w, https:\/\/emilkirkegaard.dk\/en\/wp-content\/uploads\/ADL-1024x897.png 1024w, https:\/\/emilkirkegaard.dk\/en\/wp-content\/uploads\/ADL-768x673.png 768w, https:\/\/emilkirkegaard.dk\/en\/wp-content\/uploads\/ADL-1536x1346.png 1536w, https:\/\/emilkirkegaard.dk\/en\/wp-content\/uploads\/ADL-2048x1795.png 2048w\" sizes=\"auto, (max-width: 2476px) 100vw, 2476px\" \/><\/p>\n<p>In fact, even rather tame center-right magazines like Quillette are completely banned:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-13261\" src=\"https:\/\/emilkirkegaard.dk\/en\/wp-content\/uploads\/wikipedia-source-quillette.png\" alt=\"\" width=\"2478\" height=\"530\" srcset=\"https:\/\/emilkirkegaard.dk\/en\/wp-content\/uploads\/wikipedia-source-quillette.png 2478w, https:\/\/emilkirkegaard.dk\/en\/wp-content\/uploads\/wikipedia-source-quillette-300x64.png 300w, https:\/\/emilkirkegaard.dk\/en\/wp-content\/uploads\/wikipedia-source-quillette-1024x219.png 1024w, https:\/\/emilkirkegaard.dk\/en\/wp-content\/uploads\/wikipedia-source-quillette-768x164.png 768w, https:\/\/emilkirkegaard.dk\/en\/wp-content\/uploads\/wikipedia-source-quillette-1536x329.png 1536w, https:\/\/emilkirkegaard.dk\/en\/wp-content\/uploads\/wikipedia-source-quillette-2048x438.png 2048w\" sizes=\"auto, (max-width: 2478px) 100vw, 2478px\" \/><\/p>\n<p>It would be informative to get a left-right score for each of these outlets and organizations and see how much they relate to their current source credibility standing on this 
list. The list is quite important because it controls which sources can be used on Wikipedia, and basically anything written based on banned or questionable sources can be immediately removed by an opposing editor with sound grounds in Wikipedia policy. As such, one tactic that left-wing activist editors have used heavily is lobbying to get their opponents&#8217; favorite sources banned. Sneaky but effective. You can find the discussions by following the links on the page.<\/p>\n<p>What can be done? I have a positive take. It possibly won&#8217;t matter. AIs will take over the role of Wikipedia. People will not be consulting Wikipedia to manually search for the information they want. They will just be asking their favorite AI to give them the answers. Wikipedia will fade as the go-to resource that humans use. AIs will still use it, but the <a href=\"https:\/\/davidrozado.substack.com\/p\/the-political-preferences-of-llms\">same David Rozado has also previously shown that current LLMs can seemingly take into account the bias inherent in the source material<\/a>, the same way historians do when reading ancient history. The bias seems to come in during the fine-tuning. It is therefore possible that someone like Elon Musk could successfully make an unbiased or right-biased AI (Grok), which will become very popular and won&#8217;t suffer from this Wikipedia editor political bias. Based on this take, the key information battle to come is over who gets to decide what bias goes into the LLMs. Will they be 2+2=4 or 2+2=5 AIs? We will find out in the next few years.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The great David Rozado has a new study out on the politics of Wikipedia: Is Wikipedia Politically Biased?. He downloaded the English Wikipedia database, and analyzed whether the articles covering different politically aligned actors or organizations were systematically more positive or negative in tone (sentiment analysis). 
Tone was assessed using machine learning approaches based on [&hellip;]<\/p>\n","protected":false},"author":17,"featured_media":13256,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3524,2856,2448],"tags":[3525,2676,1400],"class_list":["post-13255","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-artificial-intelligence","category-media-studies","category-wikipedia","tag-david-rozado","tag-political-bias","tag-wikipedia","entry","has-media"],"_links":{"self":[{"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/posts\/13255","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/users\/17"}],"replies":[{"embeddable":true,"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/comments?post=13255"}],"version-history":[{"count":2,"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/posts\/13255\/revisions"}],"predecessor-version":[{"id":13263,"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/posts\/13255\/revisions\/13263"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/media\/13256"}],"wp:attachment":[{"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/media?parent=13255"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/categories?post=13255"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/emilkirkegaard.dk\/en\/wp-json\/wp\/v2\/tags?post=13255"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}