
Wikipedia News

Competition Produces Vandalism Detection For Wikis 62

Posted by timothy
from the citation-needed dept.
marpot writes "Recently, the 1st International Competition on Wikipedia Vandalism Detection (PDF) finished: 9 groups (5 from the USA, 1 affiliated with Google) tried their best to detect all vandalism cases in a large-scale evaluation corpus. The winning approach (PDF) detects 20% of all vandalism cases without misclassifying any regular edits; moreover, it can be adjusted to detect 95% of the vandalism edits while misclassifying only 30% of all regular edits. Thus, by applying both settings, manual double-checking would be required on only 34% of all edits. It is not yet known whether the rule-based bots on Wikipedia can compete with this machine-learning-based strategy. Anyway, there is still a lot of potential for improvement, since the top 2 detectors use entirely different detection paradigms: the first analyzes an edit's content, whereas the second (PDF) analyzes an edit's context using WikiTrust."
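The "34% of all edits" figure follows from simple arithmetic once you assume a vandalism rate for the corpus. As a back-of-the-envelope sketch (the summary does not state the corpus's vandalism rate; roughly 7% is assumed here for illustration):

```python
# Back-of-the-envelope check of the "34% of all edits" figure.
# Assumption (not given in the summary): vandalism makes up roughly
# 7% of all edits in the evaluation corpus.
VANDALISM_RATE = 0.07

# High-precision setting: 20% of vandalism caught, no regular edits flagged,
# so these edits are confirmed as vandalism without human review.
auto_confirmed = 0.20 * VANDALISM_RATE

# High-recall setting: 95% of vandalism flagged, plus 30% of regular
# edits misclassified as vandalism.
flagged = 0.95 * VANDALISM_RATE + 0.30 * (1 - VANDALISM_RATE)

# Manual double-checking is needed for edits flagged by the high-recall
# setting that were not already confirmed by the high-precision setting.
needs_review = flagged - auto_confirmed

print(f"flagged: {flagged:.3f} of all edits")
print(f"needs manual review: {needs_review:.3f} of all edits")
```

With a 7% vandalism rate, the high-recall setting flags about 34.6% of all edits, and about 33.1% still need a human look, which is in the ballpark of the summary's 34% claim; the exact figure depends on the corpus's true vandalism rate.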

Comments Filter:
  • by Dan East (318230) on Sunday September 26, 2010 @11:19AM (#33703402) Homepage Journal

    If the algorithm can detect 20% with perfect precision, then that must constitute extremely low-hanging fruit. That type of vandalism is merely an annoyance: it is so obvious that the end user readily recognizes it as such and can skip over it or revert the edit.

    The real issue is disinformation, which is vastly more subtle. The only defense is fact-checking or seeking out references. If the algorithm were capable of recognizing that kind of vandalism, the developers should have the software writing all the articles in the first place, because it would have to be pretty spectacular to manage that.

  • by Anonymous Coward on Sunday September 26, 2010 @11:20AM (#33703410)

    What about the people who "own" a page with the assistance of powerful insiders and revert any changes to their "pet" pages, even spelling fixes or simple corrections to bad information?

    Will edits by *those* insiders, who are ruining Wikipedia for the rest of us, be flagged by the algorithm as vandalism?

  • top 2 (Score:3, Insightful)

    by trb (8509) on Sunday September 26, 2010 @12:08PM (#33703672)

    Anyway, there is still a lot of potential for improvement, since the top 2 detectors use entirely different detection paradigms

    This implies that the lower-scoring detectors are less valuable as sources of improvement. That's not true, and it wasn't stated in the paper's "Conclusions" section. If the lowest-scoring detector finds 5% of the bad data, and it's a different slice from what the other detectors find, then that's quite valuable.

  • by Grimbleton (1034446) on Sunday September 26, 2010 @12:23PM (#33703756)

    They already compete to be the first to revert edits they disagree with.

  • by bunratty (545641) on Sunday September 26, 2010 @12:29PM (#33703782)
    Care to show us even one article where 99% of good edits are reverted? Remember, that would mean that over 99% of all edits are reverted.
  • Hah, bout time. (Score:3, Insightful)

    by OnePumpChump (1560417) on Sunday September 26, 2010 @02:15PM (#33704384)
    4chan and Somethingawful have been having Wikipedia vandalizing competitions for years. (Usually, whoever's edit or fake article stays the longest wins.)
