Wikipedia editors and politics

I’ve written a few times about the political bias (defined as deviation from the population average) of journalists and academics. Earlier this year, I wrote about how one can study the sentiment of Wikipedia pages to determine their political slant, by looking at whether left- vs. right-wing politicians and other people are covered equally fairly in tone (negative vs. positive). The result was a strong left-wing tilt, especially for American politicians. How does this bias originate? There are some options:

  1. Editors themselves lean left.
  2. Admins lean left and police the editors.
  3. Paid staff leans left and polices the admins.
  4. The available source material leans left, coloring the content (these being media articles and academic articles).

These options are not mutually exclusive, but it still makes sense to look into each of them to see what one can find. In our new study, we looked at the first two options: the editors themselves and the admins.

We scraped user pages of 7,739 Wikipedia editors. Of these, 224 users positioned themselves politically using the semi-standardized “userboxes”. Based on this sample, Wikipedia editors’ views had a strong tilt towards the left. The results are congruent with the political leanings of related occupations, such as journalists and academics. Keywords: Wikipedia, Bias, Ideology

Since Wikipedia is mostly edited by volunteers, who can be pseudonymous or edit as an IP address, it can be difficult to know what their politics are exactly. However, many users have signals in their profiles, either in the form of text, or in semi-standardized boxes that show their various personal characteristics and beliefs. Most user pages look like this:

Which is to say, they have nothing of interest, or may be empty, or not even created. Others look like this:

This user has both text and userboxes (on the right), though they don’t seem to have anything about politics.

This one is more aesthetic but still manages to confirm the first law of bisexuals (joke: How do you know if a girl is bisexual? Don’t worry about it, she will tell you.). And finally, we get the extreme cases:

This user has 100s of userboxes, of which only the political ones are shown above.

So our study was relatively simple. We downloaded the user pages for all the most active Wikipedia users, including all the admins. Then we looked for the most commonly used userboxes relating to political ideology and classified them (left or right by American standards). Then we counted how many of each occur for a given user and scored that user on that basis. That’s it. The result one gets is this:

This image excludes users without any data, otherwise the green bar would be very large and uninformative. Right now the green group consists of users who had one left-wing and one right-wing userbox.
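The counting step above can be sketched in a few lines. This is a hypothetical illustration: the userbox names and their left/right classification here are made up for the example, not the actual lists from the paper's appendix.

```python
# Illustrative sketch of scoring a user by their political userboxes.
# The userbox names and classifications below are invented examples.
LEFT = {"socialist", "social-democrat", "pro-choice"}
RIGHT = {"conservative", "libertarian-right", "pro-life"}

def score_user(userboxes):
    """Net score: positive = right-leaning, negative = left-leaning,
    0 = mixed, None = no political userboxes (no data)."""
    left = sum(1 for b in userboxes if b in LEFT)
    right = sum(1 for b in userboxes if b in RIGHT)
    if left == 0 and right == 0:
        return None
    return right - left

print(score_user(["socialist", "pro-choice"]))    # -2, left-leaning
print(score_user(["conservative"]))               # 1, right-leaning
print(score_user(["socialist", "conservative"]))  # 0, the mixed "green" group
print(score_user(["cat-lover"]))                  # None, excluded from the plot
```

Users scoring None correspond to the excluded no-data group; a score of exactly 0 corresponds to the mixed green group mentioned above.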

We used political ideology based on multiple areas:

Which also means one can do the plot for each of them:

One can of course argue with the specifics. For instance, there seem to be relatively many gender-critical or right-wing views on LGBTQ+. Really? Well, sort of. There were only two userboxes that were common for this position, which are:

[1] man-kind Regarding gender, this user will use the vernacular, not what is politically correct.
[2] neutrality This user could not care less about the use of gender-neutral language.

Which are very mild indeed. On the other hand, there were 33 common userboxes expressing the left-wing position. I won’t list them all, but you get the idea:

[1] +This editor expects recognition as gender neutral.
[2] This user is agender.
[3] SOC This user is an expert in Sociology, specializing in gender and criminology.
[4] This user identifies as genderqueer.
[5] This user tries to monitor the LGBT Watchall list.

The paper’s appendix lists all the userboxes we used.

Overall, the data were too sparse to analyze further, for instance, using item response theory for the scoring, or comparing the admins to users (too few of them had any political userboxes). There was simply too much missing data because most users don’t explicitly tell others about their politics. Why would they? Other users can use it against them in the endless lawfare that characterizes Wikipedia editing.

Mysterious results from a big survey

When we were writing our study, we found there was a recent huge survey of Wikipedians that included questions about politics. There have been many surveys of Wikipedians over the years, mostly about getting new users to stay, and about getting more precious minorities to edit. Curiously, then, they never previously asked about politics, even though conservatives or right-wingers presumably would be a small minority. Anyway, the new study is described thus:

 The dataset focuses on Wikipedia users and contains information about demographic and socioeconomic characteristics of the respondents and their activity on Wikipedia. The data was collected using a questionnaire available online between June and July 2023. The link to the questionnaire was distributed via a banner published in 8 languages on the Wikipedia page. Filling out the questionnaire was voluntary and not incentivised in any way. The survey includes 200 questions about: what people were doing on Wikipedia before clicking the link to the questionnaire; how they use Wikipedia as readers (“professional” and “personal” uses); their opinion on the quality, the thematic coverage, the importance of the encyclopaedia; the making of Wikipedia (how they think it is made, if they have ever contributed and how); their social, sport, artistic and cultural activities, both online and offline; their socio-economic characteristics including political beliefs, and trust propensities. More than 200 000 people opened the questionnaire, 100 332 started to answer, and constitute our dataset, and 10 576 finished it. Among other themes identified by future researchers, the dataset can be useful for advancing the research regarding the features of readers vs contributors of online commons, the relationship between trust, information, sources, and the use made of this information.

Actually, a proper analysis of the study was never published by the authors, but the data were. One person looked at them and made some simple plots for Wikipedia’s research news:

The distribution of politics looks unnatural. Thousands of far-right responses, but very few regular conservatives? It seems fake. So we downloaded the data, looking for signs of fraud. However, we weren’t able to find anything wrong with it. Most of the variables correlate as one would expect. The data distributions for the far-right respondents aren’t much different from those of the other users. But the result remains unbelievable, even though it still shows a big bias towards left-wing views. Maybe the question was incorrectly translated? No, the results are reasonably consistent across languages and Wikipedias. Was it reversed by mistake at some point? Not as far as I can tell. Maybe one of the readers of this post can analyze it further and find out what’s wrong.
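One of the checks described, comparing the share of extreme responses across language versions, can be sketched as follows. This is a toy example with made-up data; the column names (`language`, `politics`) and the 1–9 scale are assumptions, not the survey's actual coding.

```python
# Sketch of a cross-language sanity check on the political self-placement
# item. The data frame below is invented toy data, not the real survey.
import pandas as pd

df = pd.DataFrame({
    "language": ["en", "en", "en", "fr", "fr", "fr"],
    "politics": [1, 9, 9, 9, 2, 9],  # assumed scale: 1 = far left ... 9 = far right
})

# Share of respondents at the far-right extreme, per language version.
# If one translation had a reversed scale, its share would stand out.
far_right_share = (
    df.assign(far_right=df["politics"].eq(9))
      .groupby("language")["far_right"]
      .mean()
)
print(far_right_share)
```

A reversed or mistranslated item in one language would show up as an outlying share for that language; roughly similar shares across versions are what the post reports finding.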