Competition Produces Vandalism Detection For Wikis 62
marpot writes "Recently, the 1st International Competition on Wikipedia Vandalism Detection (PDF) finished: 9 groups (5 from the USA, 1 affiliated with Google) tried their best in detecting all vandalism cases from a large-scale evaluation corpus. The winning approach (PDF) detects 20% of all vandalism cases without misclassifying regular edits; moreover, it can be adjusted to detect 95% of the vandalism edits while misclassifying only 30% of all regular edits. Thus, by applying both settings, manual double-checking would only be required on 34% of all edits. Nothing is known, yet, whether the rule-based bots on Wikipedia can compete with this machine learning-based strategy. Anyway, there is still a lot potential for improvements since the top 2 detectors use entirely different detection paradigms: the first analyzes an edit's content, whereas the second (PDF) analyzes an edit's context using WikiTrust."
First Vandal (Score:1, Interesting)
here we are knocking at the friendly gates of wikipedia and now they want to throw us out.
Is there no hospitality in this world ?
ad: Searching for experienced pillagers and fortress busters. Exp 5 years min. Trebuchet skills a plus. Bring your own broadsword.
Manual double checking? (Score:2, Interesting)
There is a pretty simple heuristic (Score:3, Interesting)
This comes from personally maintaining some 200+ wikis on Wikidot.com.
There are two kinds of vandals: those in the community of contributors, and those outside it. The first class of vandals cannot easily be detected automatically but when a wiki is actively built, the community will easily and happily fix damage done by these. The second class are usually spammers and come along when the wiki is stale. They are easily detected by the fact that a long static page is suddenly edited by an unknown person. It's very rare to find a real edit happening late after a wiki has solidified. We handle the second type of vandalism trivially by getting email notifications on any edits.
Trick is, wikis (maybe not Wikipedia but then certainly individual pages) don't have random life cycles but go through growth and stasis.