
Developing a Vandalism Detector For Wikipedia

Posted by kdawson
from the false-positives-would-hurt dept.
marpot writes "In an effort to assist Wikipedia's editors in their struggle to keep articles clean, we are conducting a public lab on vandalism detection. The goal is the development of a practical vandalism detector that is capable of telling apart ill-intentioned edits from well-intentioned edits. Such a tool, which will work somewhat like a spam detector, will release the crowd's workforce currently occupied with manual and semi-automatic edit filtering. The performance of submitted detectors will be evaluated based on a large collection of human-annotated edits, which has been crowdsourced using Amazon's Mechanical Turk. Everyone is welcome to participate."

  • Existing (Score:3, Insightful)

    by ShakaUVM (157947) on Sunday February 28, 2010 @04:48PM (#31308660)

    Apparently, how their vandalism detector works right now is by automatically reverting any edits done by anonymous editors.

    (And yeah, that's a bit sarcastic, but somewhat true.)

  • Re:Existing (Score:5, Insightful)

    by broken_chaos (1188549) on Sunday February 28, 2010 @04:55PM (#31308726)

    I'm assuming it's also known to revert good edits in under 30 seconds?

    Just thinking out loud here, but is raw speed of reversion really what should be bragged about, as opposed to accuracy?

  • by Anonymous Coward on Sunday February 28, 2010 @05:02PM (#31308772)

    Wikipedia, the encyclopedia that anyone can edit - in my ass.

    Harry Potter:
    "The novels revolve around [[Harry Potter (character)|Harry Potter]], an orphan who discovers at the age of eleven that he is a wizard.{{cite web|url=|title=Review: Gladly drinking from Rowling's 'Goblet of Fire'|date=14 July 2000|publisher=CNN|accessdate=28 September 2008}} Wizard ability is inborn, but children are sent to wizarding school to learn the magical skills necessary to succeed in the [[wizarding world]]. Harry is invited to attend the boarding school called [[Hogwarts|Hogwarts School of Witchcraft and Wizardry]]. Each book chronicles one year in Harry's life, and most of the events take place at Hogwarts.{{cite news|url=|title=Harry Potter, Hogwarts and Home|last=Frauenfelder|first=David|date=17 July 2007|publisher=The News & Observer Publishing Company |accessdate=29 September 2008}} As he struggles through adolescence, Harry learns to overcome many magical, social and emotional hurdles.{{cite web|url=,0,6711375.story|title=Plot summaries for the first five Potter books|last=Hajela|first=Deepti|date=14 July 2005||accessdate=29 September 2008}}"

    "=== Supplementary works ===
    {{see also|J. K. Rowling#Philanthropy|l1=J. K. Rowling: Philanthropy}}

    Rowling has expanded the [[Harry Potter universe]] with several short books produced for various charities.{{cite web|url=|title=How Rowling conjured up millions|publisher=BBC|accessdate=7 September 2008 | date=19 July 2007}}{{cite web|url=|title=Comic Relief : Quidditch through the ages|publisher=Albris|accessdate=7 September 2008}} In 2001, she released ''[[Fantastic Beasts and Where to Find Them]]'' (a purported Hogwarts textbook) and ''[[Quidditch Through the Ages]]'' (a book Harry read for fun). Proceeds from the sale of these two books benefitted the charity [[Comic Relief]].{{cite web|url=|title=The Money|publisher=Comic Relief|accessdate=25 October 2007}} In 2007, Rowling composed seven handwritten copies of ''[[The Tales of Beedle the Bard]]'', a collection of fairy tales that is featured in the final novel, one of which was auctioned to raise money for the Children's High Level Group, a fund for mentally disabled children in poor countries. The book was published internationally on 4 December 2008.{{cite web|title=
    JK Rowling Fairy Tales To Go On Sale For Charity|work=ANI|year=2008|url=
    |accessdate=2 August 2008}}{{cite news|url=|title=JK Rowling book fetches £2m|date= 13 December 2007|publisher=BBC|accessdate=13 December 2007}}{{cite web|url=|title=Amazon purchase book| Inc|accessdate=14 December 2007}} Rowling also wrote an 800-word [[Harry Potter prequel|prequel]] in 2008 as part of a fundraiser organised by the bookseller [[Waterstones]].{{cite web|title=Rowling pens Potter prequel for charities|author=Williams, Rachel |year=2008|publisher=''[[The Guardian]]''|url=}} Retrieved on 31 May 2008.

    == Structure and genre ==
    {{see also|Harry Potter influences and analogues}}

    The ''Harry Potter'' novels fall within the genre of [[fantasy literature]]; however, in many respects they are also [[bildungsroman]]s, or [[coming of age]] novels.{{cite web|url=|title=Wizards and wainscots: generic structures and genre themes in the Harry Potter series|last=Anne Le Lievre|first=Kerrie|ye

  • by Anonymous Coward on Sunday February 28, 2010 @05:04PM (#31308800)

    I've had many more problems with admin abuse than vandalism. Vandalism is quick and easy to deal with. Admins are the biggest problem in Wikipedia editing; they have no accountability and abuse their power.

    How about a log of each admin's activities, including reversions, bans, etc, and a way for non-admins to challenge actions (without spending countless hours in an appeal process worthy of a federal court).

  • Step One (Score:5, Insightful)

    by owlnation (858981) on Sunday February 28, 2010 @05:16PM (#31308884)
    Before any more detectors are rolled out, how about they come up with a workable definition of vandalism? And actually use it fairly, ethically and logically.

    There's a great deal of evidence to suggest the current definition of "vandalism" is whatever a wikiadmin decides he just doesn't like, disagrees with, or finds in some way interferes with his power trip.
  • by MillionthMonkey (240664) on Sunday February 28, 2010 @05:39PM (#31309050)
    On the other hand, do gibberish pages like this need much more editing, or is Harry Potter's Wikipedia entry basically finished as far as anyone cares?
  • Re:Existing (Score:5, Insightful)

    by beakerMeep (716990) on Sunday February 28, 2010 @06:41PM (#31309506)
    The problem is not so simple though. You can't quantify something as subjective as vandalism. You can't reduce it to a mathematical formula, no matter how statistically fancy your 6-page PDF is.

    I had a particularly nasty run-in with ClueBot, where I removed large portions of spam from an article, only to have ClueBot revert the edit and put the spam back in. When I removed the spam again, some other editor strolled by and put it back in once more, because he trusted the bot more than humans and didn't read the talk page, where many had requested the removal of this spam. Finally, after a rather rude conversation, the human realized he had no business reverting it. This person was a long-time editor and contributor too, but it serves as an example that any criteria used to determine spam are based upon assumptions: assumptions that they will hold true in other cases, and assumptions that others will agree with the classification.

    The whole point of Wikipedia is that it is a community-edited encyclopedia. I have no interest in a computer-edited encyclopedia. If people want to program bots to review an editor's work, perhaps we should program bots to write the work? Perhaps you can call it Botopedia. Furthermore, many of the bots ask you to report false positives to their personal pages off of Wikipedia's website, on some other .com or .edu domain. They ask you to be accountable to them, but who are they accountable to? What's to stop spammers from programming bots to annoy editors as a phishing exercise?

    Now don't get me wrong though: if someone wants to use a bot to aid in finding vandalism, that would help. But if the system is so frail that Wikipedia can't exist without computer program editors, it may be time to revisit the system. As others have stated, pushing edits into a queue would be much more sane than direct-to-live edits.

    Editing bots are wrong for Wikipedia, and if they allow it they are letting go of their vision of community participation in favor of the visions (or delusions) of grand technological solutions.
  • Re:Step One (Score:2, Insightful)

    by Anonymous Coward on Sunday February 28, 2010 @06:48PM (#31309544)
    I completely agree. The worst vandalism on Wikipedia is done by self-righteous page owners and admins on power trips who hate to be corrected. I used to help out on a number of pages (areas where I am a genuine expert, not just someone with an opinion), but having my updates constantly deleted just got too frustrating. Now I just make sure people in my field know not to use Wikipedia.
  • Re:Been done? (Score:3, Insightful)

    by LifesABeach (234436) on Sunday February 28, 2010 @06:59PM (#31309636)
    One of the quirks I've noticed is when a business makes or invents something, then uses the Wiki to advertise it. I can't help but wonder if this could also be considered a form of vandalism.
  • by Neoprofin (871029) <neoprofin@hotm a i> on Sunday February 28, 2010 @07:03PM (#31309662)
    Seems to me like one user is trying to add a highly biased account of a single incident in her life that is many times longer than the rest of the article, and throwing a screaming fit when multiple admins tell him that it would be in violation of multiple measures of quality. Further mention of an "edit war" implies to me that the user tried to force his section in after repeated warnings, and when told to file an RfC he just continued to argue with just about anyone he could find.

    Those power abusing fuck wads.
  • by Tango42 (662363) on Sunday February 28, 2010 @07:13PM (#31309768)

    Officially, vandalism is defined as edits made in bad faith. If you are trying to improve the article but are an idiot (which includes people that don't realise their own bias), that isn't vandalism, it's just idiocy. It is only if you are editing with the intention of making the article worse that you are vandalising.

  • Re:Existing (Score:3, Insightful)

    by Tango42 (662363) on Sunday February 28, 2010 @07:30PM (#31309914)

    In that paper, you say you think high recall (i.e. few false negatives) should be preferred to high precision (few false positives), since it reduces the chance of a reader seeing a vandalised version. I disagree. You underestimate the harm caused by losing editors who get annoyed when their legitimate edits are reverted by a bot. The upcoming feature, Flagged Revisions, will provide a much better way of preventing readers from seeing vandalised versions while not costing us useful editors.
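
To make the precision/recall trade-off in the comment above concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the function name `precision_recall`, the eight hand-made labels, and the two hypothetical detectors ("aggressive" and "cautious") come from neither the paper nor any real Wikipedia bot.

```python
def precision_recall(predicted, actual):
    """Compute (precision, recall) for boolean labels, where True = vandalism.

    predicted: what a (hypothetical) detector flagged
    actual:    what human annotators decided
    """
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)        # vandalism correctly flagged
    fp = sum(1 for p, a in pairs if p and not a)    # good edits wrongly flagged
    fn = sum(1 for p, a in pairs if a and not p)    # vandalism that slipped through
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up annotations for eight edits (True = vandalism).
actual = [True, True, False, False, False, False, True, False]

# Hypothetical detector tuned for recall: catches everything, reverts good edits too.
aggressive = [True, True, True, True, False, False, True, False]

# Hypothetical detector tuned for precision: never hits a good edit, misses one vandal.
cautious = [True, False, False, False, False, False, True, False]

print("aggressive:", precision_recall(aggressive, actual))
print("cautious:  ", precision_recall(cautious, actual))
```

On this toy data the aggressive detector misses no vandalism but two of its five reverts hit legitimate edits, which is the cost to editors the comment is worried about. The cautious one never reverts a good edit but lets one vandalised revision through.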

  • by Jedi Alec (258881) on Monday March 01, 2010 @07:57AM (#31314132)

    Hey, this is Slashdot. We're qualified to discuss any subject we damn well please based on our own prejudices and assumptions, while pretending that our high IQs and common sense qualify us as experts on whatever the discussion may or may not be about. What right do wiki admins have to assault our ivory towers when we sprinkle our droplets of distilled wisdom on their pages as well?
