Wikipedia News

Competition Produces Vandalism Detection For Wikis 62

marpot writes "Recently, the 1st International Competition on Wikipedia Vandalism Detection (PDF) finished: 9 groups (5 from the USA, 1 affiliated with Google) tried to detect all vandalism cases in a large-scale evaluation corpus. The winning approach (PDF) detects 20% of all vandalism cases without misclassifying any regular edits; moreover, it can be adjusted to detect 95% of the vandalism edits while misclassifying only 30% of all regular edits. Thus, by applying both settings, manual double-checking would only be required on 34% of all edits. It is not yet known whether the rule-based bots on Wikipedia can compete with this machine learning-based strategy. In any case, there is still a lot of potential for improvement, since the top 2 detectors use entirely different detection paradigms: the first analyzes an edit's content, whereas the second (PDF) analyzes an edit's context using WikiTrust."

  • First Vandal (Score:1, Interesting)

    by Anonymous Coward on Sunday September 26, 2010 @10:49AM (#33703216)

    Here we are knocking at the friendly gates of Wikipedia and now they want to throw us out.
    Is there no hospitality in this world?

    ad: Searching for experienced pillagers and fortress busters. Exp 5 years min. Trebuchet skills a plus. Bring your own broadsword.

  • by structural_biologist ( 1122693 ) on Sunday September 26, 2010 @11:36AM (#33703500)
    I don't know where that 34% figure for the manual double-checking comes from. The test set contains about 60% vandalism and 40% real edits, so I'll assume this represents the rate of vandalism on Wikipedia. Now, consider a set of 1000 edits: 600 would be vandalism while 400 would be real edits. The second filter would catch 570 instances of real vandalism (95% of 600) along with 120 false positives (30% of 400). Even if you used the first filter to automatically remove the 120 instances of vandalism it finds (20% of 600), you would still be left with a set of 450 instances of vandalism + 120 false positives to check. This means that you would have to sort through 570 edits, about 57% of the original set, in order to identify the 120 false positives.
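
The arithmetic above can be checked with a short sketch. It assumes the simple model the comment uses: everything the strict setting catches is removed automatically, and everything else the broad setting flags goes to a human. The function names and the `vandalism_rate` parameter are illustrative, not taken from the competition papers.

```python
def manual_review_fraction(vandalism_rate,
                           auto_recall=0.20,
                           broad_recall=0.95,
                           false_positive_rate=0.30):
    """Fraction of all edits left for manual checking (assumed model)."""
    # Vandalism flagged by the broad setting but not auto-removed by the strict one.
    vandalism_to_check = (broad_recall - auto_recall) * vandalism_rate
    # Regular edits wrongly flagged by the broad setting.
    regular_to_check = false_positive_rate * (1.0 - vandalism_rate)
    return vandalism_to_check + regular_to_check


def vandalism_rate_for(target_fraction,
                       auto_recall=0.20,
                       broad_recall=0.95,
                       false_positive_rate=0.30):
    """Vandalism rate at which the manual workload equals target_fraction."""
    # Solve (broad - auto) * p + fpr * (1 - p) = target for p.
    slope = (broad_recall - auto_recall) - false_positive_rate
    return (target_fraction - false_positive_rate) / slope


if __name__ == "__main__":
    print(round(manual_review_fraction(0.60), 2))  # the comment's 60% assumption
    print(round(vandalism_rate_for(0.34), 3))      # rate implied by a 34% workload
```

Under this model the 60% assumption gives the 57% workload from the comment, while a 34% workload would correspond to a vandalism rate of roughly 9%. The manual-checking figure therefore depends heavily on the share of vandalism that is assumed.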
  • by pieterh ( 196118 ) on Sunday September 26, 2010 @11:45AM (#33703546) Homepage

    This comes from personally maintaining some 200+ wikis on

    There are two kinds of vandals: those in the community of contributors, and those outside it. The first class of vandals cannot easily be detected automatically but when a wiki is actively built, the community will easily and happily fix damage done by these. The second class are usually spammers and come along when the wiki is stale. They are easily detected by the fact that a long static page is suddenly edited by an unknown person. It's very rare to find a real edit happening late after a wiki has solidified. We handle the second type of vandalism trivially by getting email notifications on any edits.

    Trick is, wikis (maybe not Wikipedia but then certainly individual pages) don't have random life cycles but go through growth and stasis.
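
The second kind of vandalism described above (a long-static page suddenly edited by an unknown person) can be sketched as a simple rule. This is only an illustration of the heuristic, not the poster's actual setup, which relies on email notifications; the 90-day threshold, the function name, and the `known_contributors` set are assumptions.

```python
from datetime import datetime, timedelta


def looks_like_stale_page_spam(last_edit, new_edit, editor,
                               known_contributors,
                               stale_after=timedelta(days=90)):
    """Flag an edit to a long-static page made by an unknown editor."""
    # The page counts as "stale" if nothing changed for at least stale_after.
    page_was_static = (new_edit - last_edit) >= stale_after
    # Editors outside the wiki's contributor community are treated as unknown.
    editor_is_unknown = editor not in known_contributors
    return page_was_static and editor_is_unknown


if __name__ == "__main__":
    contributors = {"alice", "bob"}
    flagged = looks_like_stale_page_spam(
        last_edit=datetime(2010, 1, 5),
        new_edit=datetime(2010, 9, 26),
        editor="mallory",
        known_contributors=contributors,
    )
    print(flagged)
```

A rule like this only covers edits from outside the community; damage done by known contributors on an active wiki is left to the community to fix, as the comment describes.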
