Statisticians Investigate Political Bias On Wikipedia
Hugh Pickens writes "The Global Economic Intersection reports on a project to statistically measure political bias on Wikipedia. The team first identified 1,000 political phrases based on the number of times these phrases appeared in the text of the 2005 Congressional Record and applied statistical methods to identify the phrases that separated Democratic representatives from Republican representatives, under the model that each group speaks to its respective constituents with a distinct set of coded language. Then the team identified 111,000 Wikipedia articles that include 'republican' or 'democrat' as keywords, and analyzed them to determine whether a given Wikipedia article used phrases favored more by Republican members or by Democratic members of Congress. The results may surprise you. 'The average old political article in Wikipedia leans Democratic' but gradually, Wikipedia's articles have lost the disproportionate use of Democratic phrases and moved to nearly equivalent use of words from both parties (PDF), akin to an NPOV [neutral point of view] on average. Interestingly, some articles have the expected political slant (civil rights tends Democrat; trade tends Republican), but at the same time many seemingly controversial topics, such as foreign policy, war and peace, and abortion have no net slant. 'Most articles arrive with a slant, and most articles change only mildly from their initial slant. The overall slant changes due to the entry of articles with opposite slants, leading toward neutrality for many topics, not necessarily within specific articles.'"
Re:Hope they don't do just word frequency analysis (Score:5, Informative)
Would it kill you to read the paper?
We obtain a list of 111,216 articles. We then eliminate these articles that cover countries other than the United States.
[...]
For each of these articles, we construct a slant index by applying the methods and estimates developed by Gentzkow and Shapiro (2010), hereafter G&S. G&S select 1,000 phrases based on the number of times these phrases appear in the text of the 2005 Congressional Record, applying statistical methods to identify phrases that separate Democratic representatives from Republican representatives, under the model that each group speaks to its respective constituents with a distinct set of coded language. In brief, we ask whether a given Wikipedia article uses phrases favored more by Republican members or by Democratic members of Congress.
And the corresponding footnote:
The words “republican” and “democrat” do not appear exclusively in entries about United States politics. If a country name shows up in the title or category names, we then check whether the phrase “United States” or “America” shows up in the title or category names. If yes, we keep this article. Otherwise, we search the text for “United States” or “America.” If these phrases do not show up more than 3 times in the text, this article is dropped. This process keeps articles such as “Iraq War” but drop articles related to political parties in foreign countries.
Researchers do think of this stuff, you know.
Re:Hope they don't do just word frequency analysis (Score:2, Informative)
Did you even read their metric?
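Let each input line consist of the article title followed by all of its category names (tab-separated or whatever). The countrynames regex matches any country name. The following AWK script approximates their algorithm (yes, I know egrep -c counts matching lines and so misses multiple matches on one line, but you get the idea):

    BEGIN { FS = "\t" }    # fields: title first, then the category names
    {
        if ($1 ~ countrynames) {
            # Country name in the title: keep only if title or categories mention the US.
            if ($0 ~ /United States|America/) print $1;
        } else {
            title = $1;
            count = 0;
            # Count the lines of the article text that mention the US.
            cmd = "egrep -c 'United States|America' 'wiki/" title "'";
            cmd | getline count;
            close(cmd);
            # The footnote keeps articles with more than 3 mentions.
            if (count + 0 > 3) print title;
        }
    }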
Since "Irish Republicanism" contains "Irish", a form of a country name, it would go through the first branch, and would require "United States" or "America" to appear in its title or category names.
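For comparison, the footnote quoted above can also be read with "Otherwise" attached to the inner check, so that an article with a country name but no US mention in its title or categories falls through to the text search instead of being dropped outright. Here is a minimal Python sketch of that reading. It is an assumption about what the authors meant, not their code: COUNTRY_PATTERN is a hypothetical stand-in for a real country-name list, and keep_article and min_mentions are names made up for illustration.

```python
import re

US_PATTERN = re.compile(r"United States|America")

# Hypothetical stand-in for a full country-name list (assumption, not from the paper).
COUNTRY_PATTERN = re.compile(r"Iraq|Ireland|Irish|France|French")


def keep_article(title, categories, text, min_mentions=3):
    """Return True if the article survives this reading of the footnote's filter."""
    header = " ".join([title, *categories])
    # No country name in the title or categories: the filter does not apply.
    if not COUNTRY_PATTERN.search(header):
        return True
    # Country name present, but the title or categories also name the US: keep.
    if US_PATTERN.search(header):
        return True
    # Otherwise search the body text; keep only with more than min_mentions hits.
    return len(US_PATTERN.findall(text)) > min_mentions
```

Under this reading, "Irish Republicanism" is not rejected on its title alone; it is dropped only if the article body mentions "United States" or "America" three times or fewer. Note that the simple pattern also matches "American" and "South America", the same looseness the AWK version has.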