
The Curious Case of Increasing Misspelling Rates On Wikipedia

An anonymous reader writes "The crowd-sourced nature of Wikipedia might imply that its content should be more 'correct' than other sources. As the saying goes, the more eyes the better. One particular student who was curious about this conducted rudimentary text mining on a sampling of the Wikipedia corpus to discover how misspelling rates on Wikipedia change over time. The results appear to indicate an increasing rate of misspellings over time. The author proposes that this consistent increase is the result of Wikipedia contributors using more complex language, which the test is unable to cope with. How do the results of this test compare to your own observations on the accuracy of detail in massively crowd-sourced applications?"

  • by icebike ( 68054 ) * on Friday December 23, 2011 @07:44PM (#38477428)

    Every web browser has auto spell-check capabilities these days. Most of them correct as you type.
    So why should there be any misspellings on something that is managed strictly from a web interface?

    Is it part of the arrogance of those electing themselves to write and edit articles on the wiki that they refuse to use a spell checker, or
    is it that the words are simply unknown to the normal spell-check dictionaries?

    I find occasional misspellings in mainstream news articles as well (and I am by no means a natural born speller).

    But most maddening to me is the "they're their there" errors, and similar wrong word usage.
    Spell checkers offer little help in catching these, but a 6th grade education usually suffices.

    Maybe the same people who wont waist there time checking they're spelling also cant be bothered to use the write word. ;-)

  • by Anonymous Coward on Friday December 23, 2011 @07:51PM (#38477480)

    Whether it's open source software or online collaborative projects, the smart people always get driven away over the long term. Smarter people are usually more interested in creating high-quality content, whereas stupider people end up putting out crap purely for political reasons. Eventually these stupider people start trying to modify the work of the smarter people, but do a poor job at it. When they're called out on their shitty work by the smart people, the fools make a huge stink. This soon devolves into a political mess where the smarter contributor is severely inhibited from contributing by the constant moaning and bitching of the idiots. Not wanting to waste time with such shenanigans, the smarter person leaves for some other endeavor. After a while, many of the smarter people are driven away, and the end result is that the stupider people make up the bulk of the project's contributions.

    We've seen this happen with many open source software projects, and I don't think that other kinds of online collaborative projects are any different.

  • by hedwards ( 940851 ) on Friday December 23, 2011 @07:58PM (#38477546)

    No, it's our language when it comes to international communication. We don't own the varieties spoken in Australia, Guyana, India and whatever other regions use English, but if you want to be understood you really ought to be sticking fairly close to either British English or American English.

  • by timholman ( 71886 ) on Friday December 23, 2011 @08:00PM (#38477568)

    I can offer my own opinion of this phenomenon: the bad is driving out the good. Fewer competent writers are bothering to edit Wikipedia articles nowadays. Not only do contributions get reverted / deleted by editors who think they "own" the article, but good writers simply get tired of fixing the semi-literate ramblings of people who cannot write a coherent sentence.

    It's the old axiom that incompetent people cannot recognize their own incompetence, and so do not realize that their "contributions" are not improving the article, but instead are making it worse. Eventually the good contributors get tired of sweeping back the ocean with a broom, and just walk away from Wikipedia.

  • Worth Posting. (Score:5, Insightful)

    by DarwinSurvivor ( 1752106 ) on Friday December 23, 2011 @08:04PM (#38477594)
    So Slashdot has just posted an article about a test where even the test's AUTHOR believes the results are due to shortcomings in the test itself. This has to be the most pointless article I've read in a while...
  • by FridayBob ( 619244 ) on Friday December 23, 2011 @08:12PM (#38477680)
    ... and the growth in size of many articles, combined with the limited number of Wikipedia editors, is one possible reason why spelling errors may be on the increase. Also, one form of vandalism is the intentional introduction of spelling errors.
  • by somersault ( 912633 ) on Friday December 23, 2011 @09:06PM (#38478120) Homepage Journal

    It might also be that there are specialist words being used on Wikipedia that aren't in the dictionary, unless this test is explicitly looking for common misspellings.

  • by FridayBob ( 619244 ) on Saturday December 24, 2011 @12:11AM (#38479278)

    Totally agree! I spent the best part of *three years* working on a relatively obscure corner of WP's biology department involving some 500 articles and over 20,000 edits before finally throwing in the towel. I learned a lot during my time there, but eventually the idea of putting more effort into it just didn't make any more sense. One of their main problems is that the only thing preventing good articles from deteriorating is constant policing by knowledgeable editors, preferably by the people who are responsible for all the important contributions. I like to think that my contributions to WP have not been a complete waste, but if enough time goes by before anyone fills my shoes, I fear they will be. After all, what good is an article that's now only 99% accurate? 98%, 97%, 96%...

  • by WhatAreYouDoingHere ( 2458602 ) on Saturday December 24, 2011 @12:34AM (#38479384)

    The rule on places like Slashdot and other Internet forums is that so long as the text can be understood, variations in spelling and grammar are acceptable, should not be corrected, and usually should not even be mentioned.

    Are you new here? :)

  • by Wolfling1 ( 1808594 ) on Saturday December 24, 2011 @01:43AM (#38479662) Journal
    Variations in the use of the letter 'u' in certain words are largely irrelevant to people's understanding of the written word. Even misspellings of there, their and they're (or 'you're' and 'your') are not an issue. The human mind can easily make sense of the sentence.

    The bigger problem is the differences in short date formats. dd/mm/yyyy vs mm/dd/yyyy can easily generate significant errors in calculation. Anyone who's integrated more than one Microsoft product together in both countries will have encountered the challenge.

    Personally, I think our (AU) Reverse Polish date notation is ridiculous, but at least it's not inside-out notation (US).

    Can we just settle on yyyy/mm/dd and be done with it? Please?
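To make the ambiguity concrete, here is a minimal sketch using Python's standard `datetime` module. It parses the same short-date string under both conventions and renders each result in ISO 8601 order. Note one substitution: the comment asks for yyyy/mm/dd with slashes, while ISO 8601 (what `date.isoformat()` produces) uses hyphens, yyyy-mm-dd. The helper names `interpretations` and `to_iso` are hypothetical, chosen for illustration.

```python
from datetime import date, datetime


def interpretations(text: str) -> dict[str, date]:
    """Parse one short-date string under both common conventions.

    Returns a mapping from convention label to the parsed date; a convention
    is omitted when the string is not a valid date under it (e.g. month 23).
    """
    results = {}
    for label, fmt in (("dd/mm/yyyy", "%d/%m/%Y"), ("mm/dd/yyyy", "%m/%d/%Y")):
        try:
            results[label] = datetime.strptime(text, fmt).date()
        except ValueError:
            # Not a valid date under this convention, so skip it.
            pass
    return results


def to_iso(d: date) -> str:
    """Render a date as ISO 8601 (yyyy-mm-dd): largest unit first, sorts as text."""
    return d.isoformat()


if __name__ == "__main__":
    # "03/04/2011" is valid both ways (3 April vs 4 March); "23/12/2011" only one way.
    for s in ("03/04/2011", "23/12/2011"):
        for label, d in interpretations(s).items():
            print(f"{s} read as {label}: {to_iso(d)}")
```

The dangerous case is the first one: both parses succeed silently, so nothing flags the error. A year-first format avoids this because no convention in common use puts the year first and then the day before the month.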
  • by Idarubicin ( 579475 ) on Saturday December 24, 2011 @02:39PM (#38483508) Journal

    Is it part of the arrogance of those electing themselves to write and edit articles on the wiki that they refuse to use a spell checker, or is it that the words are simply unknown to the normal spell-check dictionaries?

    You might know the answer to this if you had read the linked article instead of immediately jumping in to editorialize (and no, I'm not new here).

    While there are a number of serious methodological concerns I've discussed in another post [slashdot.org], the author's Table 4 ought to raise a screaming red flag. The algorithm the author used flagged about 5% of articles as having more than 25% of their words misspelled, and the author didn't discuss any sort of manual follow-up on those articles to determine where the problem lay. I'm sorry, but misspelling one word in four just isn't a plausible result.

    I suspect that the parser is failing to properly handle tables of data, scientific terminology, some unusual formatting and template markup, and foreign words. All of these categories will have been expanded greatly since Wikipedia's early days, and their presence is a sign that the encyclopedia is increasing in quality and coverage, not being degraded.
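The following is a hypothetical sketch of how tables, template markup and scientific terms can inflate a naive misspelling rate. It assumes a plain dictionary-lookup approach; the author's actual algorithm and word list are not described in this thread. `DICTIONARY`, `SAMPLE`, `misspelling_rate` and `strip_simple_markup` are all made up for illustration, and the markup stripping is deliberately crude.

```python
import re

# Hypothetical tiny word list standing in for a real spell-check dictionary.
DICTIONARY = {"the", "enzyme", "is", "found", "in", "cells", "of", "and", "a", "protein"}

# Treat any run of lowercase letters as a "word" (digits and punctuation split tokens).
WORD_RE = re.compile(r"[a-z]+")

# Hypothetical article fragment: one prose sentence, one template, one wiki table.
SAMPLE = """The enzyme is found in cells of Saccharomyces cerevisiae.
{{Infobox protein | symbol = ADH1 | uniprot = P00330}}
{| class="wikitable"
| kcat || Km || pH
|}"""


def misspelling_rate(text: str, dictionary: set[str]) -> float:
    """Fraction of alphabetic tokens that are not found in the dictionary."""
    words = WORD_RE.findall(text.lower())
    if not words:
        return 0.0
    unknown = [w for w in words if w not in dictionary]
    return len(unknown) / len(words)


def strip_simple_markup(text: str) -> str:
    """Crudely drop non-nested {{templates}} and lines that are wiki-table syntax."""
    text = re.sub(r"\{\{[^{}]*\}\}", " ", text)
    kept = [ln for ln in text.splitlines() if not ln.lstrip().startswith(("{|", "|"))]
    return "\n".join(kept)


if __name__ == "__main__":
    print("raw:", misspelling_rate(SAMPLE, DICTIONARY))
    print("stripped:", misspelling_rate(strip_simple_markup(SAMPLE), DICTIONARY))
```

On this fragment the raw rate is 12 unknown tokens out of 20 (0.6). After the markup is stripped, it falls to 2 out of 9, and the only remaining "misspellings" are the species name. Nothing in this prose is actually misspelled, which matches the point above: markup and specialist vocabulary, not careless writing, can dominate such a count.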
