
Developing a Vandalism Detector For Wikipedia

Posted by kdawson on Sunday February 28, 2010 @04:45PM from the false-positives-would-hurt dept.

marpot writes "In an effort to assist Wikipedia's editors in their struggle to keep articles clean, we are conducting a public lab on vandalism detection. The goal is the development of a practical vandalism detector that is capable of telling apart ill-intentioned edits from well-intentioned edits. Such a tool, which will work somewhat like a spam detector, will release the crowd's workforce currently occupied with manual and semi-automatic edit filtering. The performance of submitted detectors will be evaluated based on a large collection of human-annotated edits, which has been crowdsourced using Amazon's Mechanical Turk. Everyone is welcome to participate."

This discussion has been archived. No new comments can be posted.

Existing (Score:3, Insightful)

by ShakaUVM ( 157947 ) writes: on Sunday February 28, 2010 @04:48PM (#31308660) Homepage Journal

Apparently, how their vandalism detector works right now is by automatically reverting any edits done by anonymous editors.
(And yeah, that's a bit sarcastic, but somewhat true.)

- Re:Existing (Score:4, Interesting)
  
  by Rik Sweeney ( 471717 ) writes: on Sunday February 28, 2010 @04:53PM (#31308704) Homepage
  
  It's called Clue Bot. It's been known to revert vandalism in under 30 seconds :)
  
  - Re:Existing (Score:5, Insightful)
    
    by broken_chaos ( 1188549 ) writes: on Sunday February 28, 2010 @04:55PM (#31308726)
    
    I'm assuming it's also known to revert good edits in under 30 seconds?
    Just thinking out loud here, but is raw speed of reversion really what should be bragged about, as opposed to accuracy?
    
    - Re:Existing (Score:5, Informative)
      
      by Rik Sweeney ( 471717 ) writes: on Sunday February 28, 2010 @04:59PM (#31308750) Homepage
      
      Further Reading
      http://en.wikipedia.org/wiki/User:ClueBot [wikipedia.org]
      
      - Re:Existing (Score:4, Informative)
        
        by broken_chaos ( 1188549 ) writes: on Sunday February 28, 2010 @05:09PM (#31308836)
        
        Oh yes, it definitely hits a large number of false positives, presumably also 'fixed' within 30 seconds. For every one that gets reported [wikipedia.org] (including the hundreds or thousands of archived reports), there must be many that go unreported, by 'non-Wikipedians' who edited a page with an error, and then went on their way. Or by people who didn't stick around to 'watch' that their edit doesn't get 'fixed' by an automated process...
        
        Re:Existing (Score:4, Informative)
        
        by Ignorant Aardvark ( 632408 ) writes: <cydeweys AT gmail DOT com> on Sunday February 28, 2010 @05:14PM (#31308872) Homepage Journal
        
        The false positive rate on the anti-vandalism bots is a lot lower than you would think. The bots are written quite conservatively, take a lot of factors into account, and only pull the revert trigger when they are quite sure.
        It's the type II error rate that's pretty high. Unfortunately, that's not solvable without strong AI.
        
        Re:Existing (Score:5, Informative)
        
        by marpot ( 1311479 ) writes: on Sunday February 28, 2010 @05:38PM (#31309048) Homepage
        
        We have studied the accuracy of ClueBot, and found that (on a small corpus) it has very good precision (low false positive rate), but very low recall (low true positive rate). (See: http://www.uni-weimar.de/medien/webis/publications/downloads/papers/stein_2008c.pdf [uni-weimar.de].) But the picture might look quite different on a large scale.
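
To make the terms in this comment concrete, here is a minimal Python sketch of how precision, recall, and false positive rate follow from a detector's confusion counts. The `Confusion` class and all counts are invented for illustration; they are not figures from the linked paper. Precision and false positive rate are related but distinct quantities, so the sketch computes both.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # vandalism edits the detector reverted
    fp: int  # good edits the detector reverted by mistake
    fn: int  # vandalism edits the detector missed
    tn: int  # good edits the detector left alone

    def precision(self) -> float:
        # Of everything reverted, how much really was vandalism?
        flagged = self.tp + self.fp
        return self.tp / flagged if flagged else 0.0

    def recall(self) -> float:
        # Of all vandalism, how much was caught? (true positive rate)
        actual = self.tp + self.fn
        return self.tp / actual if actual else 0.0

    def false_positive_rate(self) -> float:
        # Of all good edits, how many were wrongly reverted?
        good = self.fp + self.tn
        return self.fp / good if good else 0.0


# Made-up counts for a conservative bot: rarely wrong, but misses a lot.
conservative = Confusion(tp=30, fp=1, fn=70, tn=899)
print(round(conservative.precision(), 3))
print(round(conservative.recall(), 3))
print(round(conservative.false_positive_rate(), 4))
```

With these made-up counts the bot is right about 97% of the time when it reverts, yet it catches only 30% of the vandalism, which is the pattern described in the comment above.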
        
        Re:Existing (Score:5, Insightful)
        
        by beakerMeep ( 716990 ) writes: on Sunday February 28, 2010 @06:41PM (#31309506)
        
        The problem is not so simple though. You can't quantify something as subjective as vandalism. You can't reduce it to your mathematical formula no matter how statistically fancy your 6-page PDF is.
        
        I had a particularly nasty run-in with ClueBot where I removed large portions of spam from an article, only to have ClueBot revert it and put the spam back in. When I again removed the spam, some other editor strolled by and again put the spam back in because he trusted the bot more than humans and he didn't read the talk page where many had requested the removal of this spam. Finally, after a rather rude conversation with the human he realized he had no business reverting it. This person was a long-time editor and contributor too, but it just serves as an example that any criteria used to determine spam are based upon assumptions. Assumptions that it will be true in other cases and assumptions that others will agree with the classification.
        
        The whole point of Wikipedia is that it is a community-edited encyclopedia. I have no interest in a computer-edited encyclopedia. If people want to program bots to review an editor's work, perhaps we should program bots to write the work? Perhaps you can call it Botopedia. Furthermore, many of the bots ask you to report false positives to their personal pages off of Wikipedia's website on some other .com or .edu domain. They ask you to be accountable to them, but who are they accountable to? What's to stop spammers from programming bots to annoy editors as a phishing exercise?
        
        Now don't get me wrong though, if someone wants to use a bot to aid in finding vandalism, that would help. But if the system is so frail that Wikipedia can't exist without computer program editors, it may be time to revisit the system. As others have stated, pushing edits into a queue would be much more sane than direct-to-live edits.
        
        Editing bots are wrong for Wikipedia, and if they allow it they are letting go of their vision of community participation in favor of the visions (or delusions) of grand technological solutions.
        
        Re: (Score:1)
        
        by marpot ( 1311479 ) writes:
        
        I cannot agree more with what you say, but I'd like to give it a twist: I want computers to assist me, and I want them to to it good, reliable, and robust. If I happen to be a Wikipedia editor that doesn't change a thing, I still want the computer to assist me with what I'm doing. Now, currently there is no such thing, and the only thing I'd like to foster research in doing so.
        Now, some always go ten steps further, when someone talks about a new "solution" based on computers. They directly envision a worl
        
        Re: (Score:2, Funny)
        
        by The Wild Norseman ( 1404891 ) writes:
        
        I cannot agree more with what you say, but I'd like to give it a twist: I want computers to assist me, and I want them to DO it WELL, RELIABLY, and ROBUSTLY.
        -Slashbot Editor 0.95 beta
        
        DWIM, PDCH (Score:2, Funny)
        
        by symbolset ( 646467 ) writes:
        
        You're looking for a DWIM (Do What I Meant) interpreter with PDCH (Predictive Digital Concierge Heuristics). While the technology is available it's currently quite costly. Bugs, errata, and maintenance can deliver less than an optimal experience. Might I instead offer you this mail order bride? We have imported personal assistants in stock from less privileged nations - and if you have the means we can outsource minute-to-minute management of them to our Bangalore VPDT (Virtual Presence Discipline Team).
        
        Re: (Score:1, Troll)
        
        by Moryath ( 553296 ) writes:
        
        Finally, after a rather rude conversation with the human he realized he had no business reverting it.
        And if the editor had been one of wikipedia's "admins", he would have simply gone "ban. lock talkpage." And he'd have gone right on his merry way to abuse someone else.
        Now don't get me wrong though, if someone wants to use a bot to aid in finding vandalism, that would help. But if the system is so frail that Wikipedia can't exist without computer program editors, it may be time to revisit the system. As oth
        
        Re: (Score:2)
        
        by Eivind ( 15695 ) writes:
        
        I dunno. I find a bot okay, but it should be extremely conservative, because it's so bad if it reverts edits that are in fact made in good faith (even if the edits themselves are bad). It's possible Cluebot isn't conservative ENOUGH, but if you have a look at say the last 100 edits it's made, it's really hard to argue that they're not 99%+ bad-faith vandalism.
        
        Re: (Score:3, Insightful)
        
        by Tango42 ( 662363 ) writes:
        
        In that paper, you say you think high recall (i.e. few false negatives) should be preferred to high precision (few false positives) since it reduces the chance of a reader seeing a vandalised version. I disagree. You underestimate the harm caused by losing editors that get annoyed when their legitimate edits are reverted by a bot. The upcoming feature, Flagged Revisions ( http://en.wikipedia.org/wiki/Wikipedia:Flagged_revisions [wikipedia.org] ), will provide a much better way of preventing readers from seeing vandalised ve
        
        Re: (Score:1)
        
        by LQ ( 188043 ) writes:
        
        I'm an occasional "recent changes patroller" and I don't really care how many false positives cluebot gets in anonymous edits. It's too busy weeding out the thousands of "Bob is gay" and "I like pie" edits. Why they still allow anonymous "editors", I really don't know.
    - Re: (Score:3, Informative)
      
      by Ignorant Aardvark ( 632408 ) writes:
      
      I'm not sure why he bragged about reversion speed. All that's really dependent on is your network connection. For one, your network connection has to be good enough to download, in real time, the diffs of all edits to Wikipedia. Most aren't.
      Anyway, a decision as to whether a given diff is vandalism or not needs to be made in a small fraction of a second, as there are dozens of edits coming in every second, and if you continuously fall farther and farther behind, you lose. Given an ideal network connecti
      - Re: (Score:2, Interesting)
        
        by marpot ( 1311479 ) writes:
        
        This is greatly overestimated. Depending on how elaborate your edit model is, you can analyse edits live on a laptop.
        
        Re: (Score:2, Offtopic)
        
        by Ignorant Aardvark ( 632408 ) writes:
        
        Which part is over-estimated? All I can speak on from experience is AntiVandalBot. I ran that on an Athlon XP 2500+ (which wasn't particularly amazing at the time). It wasn't the computation that was hard, it was the network usage of downloading the diff of every edit by a non-trusted user from the RC feed. I would not have been able to run it on any home Internet connection. Thankfully I was able to place my server on an unthrottled 100 Mbps dorm connection at the University of Maryland.
        I will grant y
        
        Re: (Score:1)
        
        by marpot ( 1311479 ) writes:
        
        Me too, experience that is. We took the high-throughput features from our research, and implemented a live edit analysis for the English portion of Wikipedia. It listens on the IRC channel, downloads the wikitext of the old and new revisions of each edit, and then does its magic. And it did so once on an old laptop. The computer was connected at max 1 GBit/s.
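
The analysis step of the pipeline described above (take the old and new wikitext of an edit, then examine the change) might look roughly like the following. This is a sketch under assumptions: the function name `edit_features` and the chosen features are hypothetical, not the features from the research mentioned, and the IRC listening and download steps are left out entirely.

```python
import difflib


def edit_features(old_text: str, new_text: str) -> dict:
    """Compute a few cheap, illustrative features of one edit."""
    old_words = old_text.split()
    new_words = new_text.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)

    inserted, removed = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            removed.extend(old_words[i1:i2])
        if tag in ("replace", "insert"):
            inserted.extend(new_words[j1:j2])

    # Share of upper-case letters among the inserted text (shouting).
    letters = [c for word in inserted for c in word if c.isalpha()]
    upper_ratio = sum(c.isupper() for c in letters) / len(letters) if letters else 0.0

    return {
        "inserted_words": len(inserted),
        "removed_words": len(removed),
        "size_change": len(new_text) - len(old_text),
        "upper_ratio": upper_ratio,
    }


old = "The LEON is a 32-bit processor core."
new = "The LEON is a 32-bit processor core. BOB WAS HERE"
features = edit_features(old, new)
print(features)
```

Word-level diffing with the standard library is cheap enough to run on every edit, which is consistent with the claim that the computation itself is not the bottleneck.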
  - Re: (Score:3, Funny)
    
    by Yvan256 ( 722131 ) writes:
    
    A Clue Bot, eh? I wonder what happens if you register the username "Colonel Mustard".
    - Re: (Score:2)
      
      by v1 ( 525388 ) writes:
      
      A Clue Bot, eh?
      Every time I see that in this thread, my eyes substitute "Clue Bat" and it totally changes meaning of the post while still making some degree of sense, making it hard to filter out.
      The base problem here is going overlooked. There isn't one kind of edit they're trying to combat, there's several. And each requires a different approach because they are incompatible.
      1- Spam (Monty Python kind): OK, that's easy for the bots to get rid of. Even very loose definitions are easy to code with a good
- Re:Existing (Score:4, Interesting)
  
  by DamonHD ( 794830 ) writes: <d@hd.org> on Sunday February 28, 2010 @05:18PM (#31308902) Homepage
  
  Amazingly my small sample is to the contrary.
  I fix small errors of syntax/grammar/fact when I run across them, have never created an account, and almost all of my edits seem to stick.
  Rgds
  Damon
  
  - Re: (Score:1)
    
    by jonadab ( 583620 ) writes:
    
    Yeah, my edits generally stick too, and almost all of them are anon/IP, not because I haven't created an account, but because Wikipedia's session-timeout policy is so short that logging in seldom does any good. You can log in, but by the time you're ready to commit an edit, you're typically not logged in any more. I can't imagine what possessed them to make it so short. I've got better things to do than log in *again* each and every time I want to make an edit. So I usually don't bother. And it doesn't
- Re: (Score:2, Interesting)
  
  by Big Jojo ( 50231 ) writes:
  
  Apparently, how their vandalism detector works right now is by automatically reverting any edits done by anonymous editors.
  I've seen signs of that too. Not always ... but often enough to have acquired a rather negative understanding of the role of some folk with admin privileges at WP. It's clear when they haven't even bothered to read (much less understand!) the edits they revert. Or that they just revert anything that offends an ideology they want WP to present on any particular topics. They think NPV
Been done? (Score:2, Informative)

by Ignorant Aardvark ( 632408 ) writes:

Whoever posted this clearly isn't aware of the actual work being done in the field. For instance, I was running an anti-vandalism bot [wikipedia.org] in 2006, and it wasn't new at the time. They've gotten much more sophisticated since then.
Why are they so intent on reinventing the wheel? Do they not even realize that the wheel exists already? Why not just improve on it instead?
- Re:Been done? (Score:5, Funny)
  
  by Kratisto ( 1080113 ) writes: on Sunday February 28, 2010 @05:04PM (#31308794)
  
  The article on anti-vandalism bots had been recently vandalized when they were doing their preliminary research.
  
- Re: (Score:2, Informative)
  
  by marpot ( 1311479 ) writes:
  
  We are very aware of the existing tools (Huggle, Twinkle, and so on). See the links in the above post, and see the links in the resources section of the competition Web page. An accurate vandalism detector will take a lot of research and development, just like spam detectors did... Why did you stop developing your tool, anyway?
  - Re: (Score:2)
    
    by Tango42 ( 662363 ) writes:
    
    Huggle and Twinkle are tools to help humans deal with vandalism. AntiVandalBot and ClueBot, etc., are bots that deal with (the most obvious) vandalism themselves. They are very different things.
    - Re: (Score:1)
      
      by marpot ( 1311479 ) writes:
      
      Exactly, but both kinds of tools need to solve the same underlying problem: given an edit, is it vandalism? The better those tools answer this question, the more time of Wikipedia editors is saved.
      - Re: (Score:2)
        
        by Tango42 ( 662363 ) writes:
        
        No, they solve very different problems. Something like Huggle needs to work out if a given edit can be almost guaranteed *not* to be vandalism (usually because the editor is on a whitelist), everything else gets shown to a human. The important thing for something like Huggle is making it easy for humans to review edits, not judging the edits automatically in any way. Something like ClueBot needs to work out if it can almost guarantee that a given edit *is* vandalism. They are very different.
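
The two decision rules contrasted in this comment can be put side by side in a small sketch. It assumes, hypothetically, that some classifier already yields a `vandalism_score` between 0 and 1; the `triage` function, the `Action` names, and the 0.99 threshold are illustrative and do not describe how Huggle or ClueBot are actually implemented.

```python
from enum import Enum


class Action(Enum):
    SKIP = "skip"                  # trusted editor, nothing to review
    REVERT = "revert"              # bot is nearly certain it is vandalism
    HUMAN_REVIEW = "human_review"  # everything else goes to a person


def triage(editor: str, vandalism_score: float, whitelist: set,
           revert_threshold: float = 0.99) -> Action:
    # Huggle-style rule: edits by whitelisted editors are assumed fine.
    if editor in whitelist:
        return Action.SKIP
    # ClueBot-style rule: act automatically only when almost certain.
    if vandalism_score >= revert_threshold:
        return Action.REVERT
    # The uncertain middle is shown to a human.
    return Action.HUMAN_REVIEW


trusted = {"LongTimeEditor"}
print(triage("LongTimeEditor", 0.5, trusted))
print(triage("203.0.113.7", 0.995, trusted))
print(triage("203.0.113.7", 0.6, trusted))
```

The asymmetry is the point: one rule looks for near-certainty that an edit is fine, the other for near-certainty that it is not, and the band in between is left to human reviewers.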
- Re: (Score:2)
  
  by pipatron ( 966506 ) writes:
  
  Why are they so intent on reinventing the wheel? Do they not even realize that the wheel exists already? Why not just improve on it instead?
  Sometimes it's more practical to start from scratch. You might want to change the design from the ground up, and to do that with an already working bot would not be as constructive. The current bots are probably very well tweaked and polished, for their given design and methods of spam detection.
  - Re: (Score:3, Insightful)
    
    by LifesABeach ( 234436 ) writes:
    
    One of the quirks I've noticed is when a business makes, or invents something, then uses the Wiki to advertise. I can't help but wonder if this could also be considered a form of vandalism?
    - Re:Been done? (Score:4, Interesting)
      
      by pipatron ( 966506 ) writes: <pipatron@gmail.com> on Sunday February 28, 2010 @08:13PM (#31310328) Homepage
      
      I edit wikipedia occasionally, and one thing I remove is unmotivated links to companies, or unnecessary mentioning of specific products. So yes, I consider it a case of vandalism. Since my edits are usually (always?) kept, I think most people agree. There is probably some policy about it, but I act on common sense there.
      
      - Re: (Score:2)
        
        by Marcika ( 1003625 ) writes:
        
        I edit wikipedia occasionally, and one thing I remove is unmotivated links to companies, or unnecessary mentioning of specific products. So yes, I consider it a case of vandalism. Since my edits are usually (always?) kept, I think most people agree. There is probably some policy about it, but I act on common sense there.
        The policy is on the page Wikipedia:Spam [wikipedia.org], quite logically. It's probably one of the oldest official policies, given that it was already needed back in 2003...
    - Re: (Score:2)
      
      by cerberusss ( 660701 ) writes:
      
      One of the quirks I've noticed is when a business makes, or invents something, then uses the Wiki to advertise. I can't help but wonder if this could also be considered a form of vandalism?
      It's advertising if they use advertisement text. Recently, I edited the article on the LEON processor. It originally had texts like:
      "It offers all basic functions of a pipelined in-order processor, making it a good experimentation vehicle."
      "Making it a good experimentation vehicle" for who? What type of experiments? What is good? How is it measured?
      It would be very interesting if the bot could tell the difference between such texts.
- Nice template (Score:5, Funny)
  
  by MillionthMonkey ( 240664 ) writes: on Sunday February 28, 2010 @06:09PM (#31309298)
  
  Whoever posted this clearly isn't aware of the actual work being done in the field. For instance, I was running a ___[thing]___ in _[year]_, and it wasn't new at the time. They've gotten much more sophisticated since then. Why are they so intent on reinventing the wheel? Do they not even realize that the wheel exists already? Why not just improve on it instead?
  * * *
  This looks like a useful template for the standard "why reinvent the wheel" Slashdot post; I hope you don't mind if I reuse it.
  
Wikipedia needs a Flash editor (Score:1, Insightful)

by Anonymous Coward writes:

Wikipedia, the encyclopedia that anyone can edit - in my ass.
Harry Potter:
"The novels revolve around [[Harry Potter (character)|Harry Potter]], an orphan who discovers at the age of eleven that he is a wizard.{{cite web|url=http://edition.cnn.com/2000/books/reviews/07/14/review.potter.goblet/|title=Review: Gladly drinking from Rowling's 'Goblet of Fire'|date=14 July 2000|publisher=CNN|accessdate=28 September 2008}} Wizard ability is inborn, but children are sent to wizarding school to learn the magical skil
- Re: (Score:3, Insightful)
  
  by MillionthMonkey ( 240664 ) writes:
  
  On the other hand, do gibberish pages like this need much more editing, or is Harry Potter's Wikipedia entry basically finished as far as anyone cares?
- Re: (Score:2)
  
  by bertok ( 226922 ) writes:
  
  Wikipedia, the encyclopedia that anyone can edit - in my ass.
  Actually, it's possible to make a wysiwyg editor for Wiki markup in HTML with a little Javascript, there's no need for Flash!
  It's not even hard, I did one for a corporate project in about a week, and I'm by no means an expert at Javascript.
  It's even possible to do a split-screen view where it shows you the markup AND the preview, and the user can edit either.
  The trick is that doing this has a prerequisite: the wiki syntax has to have a nice unambiguous grammar, and you need a parser generator that can emit
- Re: (Score:2)
  
  by jgrahn ( 181062 ) writes:
  
  Wikipedia, the encyclopedia that anyone can edit - in my ass. [---] I am convinced that the current state of affairs is a conscious choice. The way to maximise 'insider power' and minimize 'outsider power' is to make editing as hard as possible, and the rules and traditions needed not to be revoked as many as possible.
  Yes. That was also the main driving force behind RUNOFF, troff, TeX, LaTeX, HTML and all other non-WYSIWYG systems back into the 1960s. It's a conspiracy.
  Seriously: no. It's just that it's
How about an Admin Abuse Detector? (Score:3, Insightful)

by Anonymous Coward writes: on Sunday February 28, 2010 @05:04PM (#31308800)

I've had many more problems with admin abuse than vandalism. Vandalism is quick and easy to deal with. Admins are the biggest problem in Wikipedia editing; they have no accountability and abuse their power.
How about a log of each admin's activities, including reversions, bans, etc, and a way for non-admins to challenge actions (without spending countless hours in an appeal process worthy of a federal court).

- - Re: (Score:2)
    
    by MillionthMonkey ( 240664 ) writes:
    
    In 1993, Wells published Mary Ann's Gilligan's Island Cookbook with co-writers Ken Beck & Jim Clark, including a foreword by Bob Denver, to whom she had sent an envelope of marijuana through the mail years earlier.
    
    There, that wasn't so hard. (BTW, to all you cookbook writers out there: I can write a good foreword.)
  - Re: (Score:2, Insightful)
    
    by Neoprofin ( 871029 ) writes:
    
    Seems to me like one user is trying to add a highly biased account of a single incident in her life that is many times longer than the rest of the article and throwing a screaming fit when multiple Admins tell him that it would be in violation of multiple measures of quality. Further mention of an "edit war" implies to me that the user tried to force his section in after repeated warnings and when told to file an RfC he just continues to argue first with just about anyone he can find.
    
    Those power abusing fu
- Re: (Score:1, Informative)
  
  by Anonymous Coward writes:
  
  I've had many more problems with admin abuse than vandalism. Vandalism is quick and easy to deal with. Admins are the biggest problem in Wikipedia editing; they have no accountability and abuse their power.
  How about a log of each admin's activities, including reversions, bans, etc, and a way for non-admins to challenge actions (without spending countless hours in an appeal process worthy of a federal court).
  What are you talking about? All users have logs that track their actions:
  http://en.wikipedia.org/wiki/Special:Contributions/Jimbo_Wales
  http://en.wikipedia.org/w/index.php?title=Special%3ALog&type=&user=Jimbo+Wales&page=&year=&month=-1&tagfilter=
  Actions can be challenged at any point on the talk page or the administrator boards.
- Re:How about an Admin Abuse Detector? (Score:5, Informative)
  
  by OverlordQ ( 264228 ) writes: on Sunday February 28, 2010 @07:59PM (#31310226) Journal
  
  How about a log of each admin's activities, including reversions, bans, etc, and a way for non-admins to challenge actions (without spending countless hours in an appeal process worthy of a federal court).
  Reversions: http://en.wikipedia.org/wiki/Special:Contributions [wikipedia.org]
  Bans: http://en.wikipedia.org/wiki/Special:Log/block [wikipedia.org]
  Deletes: http://en.wikipedia.org/wiki/Special:Log/delete [wikipedia.org]
  Anything else you're too lazy to find yourself?
  
  - In Wikipedia, everything is transparent (Score:5, Informative)
    
    by saibot834 ( 1061528 ) writes: on Sunday February 28, 2010 @09:17PM (#31310760)
    
    If I had mod points, I'd mod the parent up and the grandparent down. Seriously, almost everything in Wikipedia is transparent. Search the revision history and logs and look for the information you need. RTFM.
    A lot of people on /. seem to derive very general opinions about admins from a personal disappointing encounter. They do not include diffs of their edits or their username. From my experience in most cases the guy who got reverted by an admin broke some kind of rule (and often enough they just got reverted by a regular non-admin, but they assume it was an admin). Instead of RTFM those people post as AC complaining generally about admins without providing any traceable cases of admin abuse. I know my opinion isn't very popular, but unless you give concrete examples your allegations are just FUD.
    
    - Re: (Score:2, Insightful)
      
      by Jedi Alec ( 258881 ) writes:
      
      Hey, this is Slashdot. We're qualified to discuss any subject we damn well please based on our own prejudices and assumptions, while pretending that our high IQ's and common sense qualify us to pretend we're experts on whatever the discussion may or may not be about. What right do wiki admins have to assault our ivory towers when we sprinkle our droplets of distilled wisdom on their pages as well?
  - Re: (Score:1, Interesting)
    
    by Anonymous Coward writes:
    
    I'm the OP.
    Anything else you're too lazy to find yourself?
    I recognize that voice anywhere; you must be a Wikipedia Admin. I've been editing Wikipedia for years, but didn't know about the second two lists (the first isn't really a list of reversions, but perhaps there's a way to make it work). If I don't, then I suspect many others don't.
    Which brings us back to my point: Those lists need to be part of a system -- an easily accessible, understandable system -- "for non-admins to challenge actions (without spe
- Re: (Score:2)
  
  by Hurricane78 ( 562437 ) writes:
  
  If you think about it, it’s not much different from a country with total censorship. This small establishment’s view always overrides everybody else’s. And they massively make use of that power.
  As I said: As long as it is even possible for a subset of humanity, to control what’s going onto Wikipedia, it can by definition not be the encyclopedia for all of humanity.
  It’s obvious that to solve this, central servers and admins are out of the question... resulting in a P2P system of ca
- Re: (Score:3, Informative)
  
  by Ignorant Aardvark ( 632408 ) writes:
  
  In response to whether those two examples are vandalism, the answer is no, they are not.
  You'd need a strong AI to be able to make those determinations, and if such a thing existed, it'd make more sense just to have the strong AI write the encyclopedia.
  What we're talking about here is obvious vandalism (blanking, insertion of curse words, etc.) of the type that can be detected by an algorithmic/heuristic program.
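
A heuristic check for the two obvious cases named above, blanking and inserted curse words, could be sketched as follows. Everything here is an assumption made for illustration: the function name, the 200-character and 10% blanking cut-offs, and the placeholder word list are hypothetical and are not taken from any real bot.

```python
import re

# Hypothetical placeholder list; a real list would be far longer.
BAD_WORDS = {"stupid", "sucks", "poop"}


def obvious_vandalism_reasons(old_text: str, new_text: str,
                              bad_words=BAD_WORDS,
                              blank_ratio: float = 0.1) -> list:
    """Return a list of reasons; an empty list means nothing obvious."""
    reasons = []

    # Blanking: a substantial page shrinks to a small fraction of itself.
    if len(old_text) >= 200 and len(new_text) < blank_ratio * len(old_text):
        reasons.append("blanking")

    # Curse words: only count words the edit newly introduced.
    old_tokens = set(re.findall(r"[a-z']+", old_text.lower()))
    new_tokens = set(re.findall(r"[a-z']+", new_text.lower()))
    added = (new_tokens - old_tokens) & set(bad_words)
    if added:
        reasons.append("bad words: " + ", ".join(sorted(added)))

    return reasons


page = "Pie is a baked dish. " * 20
print(obvious_vandalism_reasons(page, ""))
print(obvious_vandalism_reasons("Pie is a baked dish.", "Pie is a baked dish. Pie sucks."))
print(obvious_vandalism_reasons("Pie is a baked dish.", "Pie is a baked dish, sweet or savoury."))
```

Rules like these catch only the crudest edits. A subtle factual change passes straight through, which matches the point that anything beyond obvious vandalism would need far more than heuristics.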
- The Art and Science of Wikipedia Vandalism (Score:5, Interesting)
  
  by MillionthMonkey ( 240664 ) writes: on Sunday February 28, 2010 @05:51PM (#31309146)
  
  There is an art to Wikipedia abuse. If someone cites a Wikipedia article in some argument they're making, you can always just go to Wikipedia and edit the page so that they're wrong. But that's what a novice Wikipedia vandal does.
  
  A pro knows to edit the article in a very subtle way, so that it looks like the person has poor reading comprehension. Let's say the person cites a Wikipedia article with a sentence like this, in order to support the argument that Colbert is a Democrat.
  
  Although by his own account he was not particularly political before joining the cast of The Daily Show, Colbert is a self-described Democrat.[12][13]
  
  This bears the mark of authority, because of the footnote superscripts that are already on it. (We can skip the step where we maliciously relocate them here.)
  
  A novice might change it to this (correctly preserving the authoritative footnote superscripts):
  
  Although by his own account he was not particularly political before joining the cast of The Daily Show, Colbert is a self-described Republican.[12][13]
  
  It makes the person appear to be wrong- and the vandalism is obvious- like swapping Eurasia for Eastasia. There's no way he could have misread that.
  
  But change it to this
  
  Although by his own account he was not particularly political before joining the cast of The Daily Show, Colbert has even been described as a Democrat.[12][13]
  
  and the person looks not only wrong, but plausibly wrong because it looks like he can't read. That's what makes successful Wikipedia vandalism an art.
  
Should work. Bogofilter for autotagging emails (Score:2)

by Colin Smith ( 2679 ) writes:

Bayesian statistics are an interesting thing. Mwhwhwhwhaaaa. Who thought they would say that about stats?
Anyway, you can tell spam with a remarkably high degree of accuracy... Guess what. You can tell "Important" and "friends" emails with a similar degree of accuracy (you define what's important or who are friends). No offence to most vandals (of any type), but usually they are complete fuckwits. I suspect they and what they write are probably even more predictable than spammers.
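
For readers who have not seen the Bayesian approach mentioned here, this is a minimal naive Bayes text classifier with add-one smoothing, written from scratch in Python. It is a sketch of the general technique, not Bogofilter's actual algorithm; the `NaiveBayes` class, the labels, and the tiny training set are invented for illustration.

```python
import math
from collections import Counter


class NaiveBayes:
    def __init__(self):
        self.word_counts = {}        # label -> Counter of word frequencies
        self.doc_counts = Counter()  # label -> number of training texts
        self.vocab = set()

    def train(self, text: str, label: str) -> None:
        words = text.lower().split()
        self.word_counts.setdefault(label, Counter()).update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def classify(self, text: str):
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Log prior plus sum of log likelihoods (add-one smoothing).
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for word in words:
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


clf = NaiveBayes()
clf.train("i like pie lol", "vandalism")
clf.train("bob was here lol", "vandalism")
clf.train("added citation for population figure", "good")
clf.train("fixed typo in history section", "good")

print(clf.classify("pie lol"))
print(clf.classify("fixed citation"))
```

The same structure works for any set of labels, which is why it can sort mail into "important" or "friends" as easily as into spam, given labelled examples to train on.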
- Re: (Score:2)
  
  by EdIII ( 1114411 ) * writes:
  
  The goal is the development of a practical fuckwit detector that is capable of telling apart ill-intentioned posts from well-intentioned posts.
  You gave me a good idea....
Step One (Score:5, Insightful)

by owlnation ( 858981 ) writes: on Sunday February 28, 2010 @05:16PM (#31308884)

Before any more detectors are rolled out, how about they come up with a workable definition of vandalism? And actually use it fairly, ethically and logically.

There's a great deal of evidence to suggest the current definition of "vandalism," is something a wikiadmin decides he just doesn't like, or disagrees with, or in some way interferes with his power-trip.

- Re: (Score:2)
  
  by gbjbaanb ( 229885 ) writes:
  
  It should be inaccurate revisions, however who is to say that a revision is inaccurate or not. We could have a panel of experts for each given topic, but that'd only work if you divided WP up into sections and had an admin sitting like a judge on each section.
  As a result: ""vandalism," is something a wikiadmin decides he just doesn't like, or disagrees with, or in some way interferes with his power-trip."
  - - Re: (Score:1, Informative)
      
      by Anonymous Coward writes:
      
      mod parent up
- Re: (Score:2, Insightful)
  
  by Anonymous Coward writes:
  
  I completely agree. The worst vandalism on wikipedia is done by self righteous page owners and admins on power trips that hate to be corrected. I used to help out on a number of pages (areas where I am a genuine expert not just someone with an opinion) but having my updates constantly deleted just got too frustrating, now I just make sure people in my field know not to use wikipedia.
- Re: (Score:2)
  
  by Homburg ( 213427 ) writes:
  
  There's a great deal of evidence to suggest...
  And yet you don't include any reference to this supposed evidence.
  - Re: (Score:1)
    
    by Nihiltres ( 1161891 ) writes:
    
    Or, in other words, [citation needed]. (also, is [citation needed] a meme when discussing Wikipedia? ) There's a wide variety of material that will result in reverts or blocks that isn't really vandalism, though. Behaviour that's disruptive, trolling, a breaching experiment, etc. will elicit roughly the same response as vandalism, and that needs to be taken into account both for automatic vandalism-repair systems (should this process treat it as vandalism?) and for making the statement that vandalism is i
The problem is the edits going live... (Score:2, Interesting)

by Anonymous Coward writes:

Right now, you can think of wikipedia as having two columns per article - first is the working article column, with the second being the discussion column.
What we really need is a third column, one for the currently published version of the article.
While this may not be popular, it would go a long way to getting rid of the spam, and might even solve some of the other issues facing wikipedia.
With such a system, you could even assign articles to a subject matter expert as the editor, who could approve change
- Re:The problem is the edits going live... (Score:4, Informative)
  
  by Shoe Puppet ( 1557239 ) writes: on Sunday February 28, 2010 @06:04PM (#31309242)
  
  A system like this has been implemented for the German Wikipedia. Almost everybody who has an account can verify articles to be vandalism-free; unless you are logged in, you see the last verified version by default.
  
  - Re: (Score:1)
    
    by s1lverl0rd ( 1382241 ) writes:
    
    And does it work?
    - Re: (Score:2)
      
      by Vintermann ( 400722 ) writes:
      
      I don't know, but I'll say this for German Wikipedia: It's a much better piece of work in my opinion. You can find huge articles with lots of great information on obscure topics, but which are written by "true fans" in a slightly unorthodox style - stuff that would be deleted in a heartbeat on English wikipedia. I don't know what they are doing, but they appear to be much more successful at accepting casual contributions.
      - Re: (Score:1)
        
        by Shoe Puppet ( 1557239 ) writes:
        
        My experience is very different: When looking for obscure topics, I usually head straight to the English one since the German one is often the only one that does not consider the topic to be "notable" enough for an article.
quite a bit of work on this (Score:3, Interesting)

by Trepidity ( 597 ) writes: <delirium-slashdo ... minus physicist> on Sunday February 28, 2010 @05:47PM (#31309110)

Since the problem is tantalizingly easy to frame as a standard data-mining or machine-learning problem, albeit with some quirks, there's quite a lot of work from a lot of research groups that seems to be looking at it. Some examples: one [upenn.edu], two [google.com], three [google.com], four [ucsc.edu], five [arxiv.org], six [google.com], seven [acm.org].

- Re: (Score:1)
  
  by marpot ( 1311479 ) writes:
  
  You're right, it's machine learning, data mining, NLP, and information retrieval. But the fun thing is turning a research prototype into a tool that can be left alone most of the time. That hasn't happened yet. Also, research on this problem only started in 2008, while rule-based tools developed by Wikipedians have been around since 2006. The works you listed are actually all there is! That's not much to work with, is it?
A good step forward (Score:1)

by allo ( 1728082 ) writes:

If it stops Deletionists from deleting well-intended edits. Better a short article than no article.
The Answer has existed for years (Score:1)

by jhary-a-conel ( 1101325 ) writes:

It was just too visionary for its time http://www.everytopicintheuniverseexceptchickens.com/ [everytopic...ickens.com]
An arms race? (Score:2, Interesting)

by fysdt ( 1597143 ) writes:

I believe that vandalism on Wikipedia can be limited. But would it really be possible to detect all kinds of vandalism?

FTA:
"Yahoo! Research will award a cash prize of 500 Euros to the winner of the plagiarism detection task. "

500 euros doesn't sound like much for detecting plagiarism on a site like Wikipedia...
- Re: (Score:2)
  
  by LtGordon ( 1421725 ) writes:
  
  I believe that vandalism on Wikipedia can be limited. But would it really be possible to detect all kinds of vandalism?
  Without strong AI, the system can only really look for statistical and language patterns for clues on vandalism.
  If I replace an entire body section on the Fox News page with "GLENN BECK BLOWS GOATS", I would hope that a vandalism detector would flag this. If, however, I randomly insert the sentence "Glenn Beck has also been accused of inappropriate relations with barnyard animals" into a large section, then automated detection comes down to statistics or one hell of a clever context algorithm.
What counts as vandalism on Wikipedia? (Score:2)

by cptnapalm ( 120276 ) writes:

I ask because I don't know. I can see turning a page into a screed as vandalism, but that doesn't differ greatly from many of the wikipedia articles that I've read; quite a few of them are overwhelmingly dedicated to hostility to the topic or advocates of the topic. Earlier today, when I was reading the news, there was a link to the Wikipedia article on the Tea Party movement: well over half of the article was dedicated to quotes from anti-Tea Party people (MSNBC, NYT, LAT, etc.) spouting off hostility to
- Re:What counts as vandalism on Wikipedia? (Score:4, Insightful)
  
  by Tango42 ( 662363 ) writes: on Sunday February 28, 2010 @07:13PM (#31309768)
  
  Officially, vandalism is defined as edits made in bad faith. If you are trying to improve the article but are an idiot (which includes people that don't realise their own bias), that isn't vandalism, it's just idiocy. It is only if you are editing with the intention of making the article worse that you are vandalising.
  
  - qualifying your adversary (Score:2)
    
    by epine ( 68316 ) writes:
    
    Officially, vandalism is defined as edits made in bad faith.
    In other words, the scope of the problem does not include discovering the cure for human stupidity, however laudable that might be.
    Furthermore, people here are failing to apply the 80-20 rule: if you can clean up 80% of the vandalism at 20% of the human effort currently expended, the attention available to deal with the difficult twenty percent would more than triple. I've seen entire pages replaced with the word "penis" or a crass four word comment about some pimple twit schoolmate. There's a lot of low
- Re: (Score:2)
  
  by WolfWithoutAClause ( 162946 ) writes:
  
  Wikipedia is trying to fairly reflect the multiple positions of the reliable sources, so including 'spouting off' is not necessarily vandalism if the neutral point of view of the reliable sources is that there is some hostility to the Tea Party.
nuances and language mechanisms galore (Score:2)

by icepick72 ( 834363 ) writes:

As soon as you start trusting a vandalism detector over manual monitoring, a lot of stuff will start to slip through, it will get into the news, and then the detector won't be trusted any longer. It will have a short life but will be interesting to watch.
Sew m@ny things that can bee done to bypass mechanisms. Even simple euphemisms like cleaning the old rifle http://images.clipartof.com/small/5039-Man-Cleaning-Inside-The-Barrel-Of-His-Unloaded-Rifle-Gun-Clipart.jpg [clipartof.com] ...are sure to slip through. There are so many lang
Yeah but what about when it's not vandalism ... (Score:3, Funny)

by PaganRitual ( 551879 ) writes: <splaga@@@internode...on...net> on Sunday February 28, 2010 @07:15PM (#31309778)

... but the truth?
http://en.wikipedia.org/w/index.php?title=Nick_Xenophon&oldid=326486984#As_federal_senator [wikipedia.org]

Well-intentioned edits? (Score:2)

by MSTCrow5429 ( 642744 ) writes:

There are well-intentioned edits on Wikipedia? Even if there were, how could you tell...
Vandalism Detector Unnecessary? (Score:1)

by sixknowspring ( 1740030 ) writes:

From my experience with contributing to Wikipedia, and from reading some of the talkback (is that what they're called?) discussions, I don't think there's much need for such a tool; there seems to be an elite class of Wiki users that delete anything that they deem unworthy while giving the most bizarre reasons for doing so.
Solution: color coding for edits (Score:2)

by nephridium ( 928664 ) writes:

I still think the best solution would be a color coding overlay over the text that would show the reader immediately 1.) how trustworthy the author has been and 2.) how long before the edit has been done (without being reverted). That way it would be easy to see the sections written by reputable authors who have always added useful info and distinguish it from "amendments" that have been entered just a few minutes ago by an anonymous coward.

And for those who do not want to log in to edit, that would be f
Vandalize the Vandals?? (Score:1)

by draco_00 ( 198022 ) writes:

Who cares? 90% of the info on those sites is bogus anyway; it's like trying to fix the perpetually broken!!
Total waste of time (Score:1)

by BradMajors ( 995624 ) writes:

Wikipedia's administrators don't seem to have a clue about the effects of vandalism.
The time wasted by humans whose job is solely to revert vandalism is irrelevant. There are more than enough people who are willing to do this work, and if they weren't doing this work they would not be contributing useful content to Wikipedia.
The negative effects are concentrated on the knowledgeable editors who are adding useful new content. There may be 5 to 10 persons actively adding content to an article. Each time a
Vandalism, as defined by Wikipedia, (Score:2)

by Hurricane78 ( 562437 ) writes:

is everything that the admin establishment doesn’t agree with. Just like in a state with total censorship.
And on top of that, the admins often don’t know shit about anything.
Which is not surprising, considering that they most likely sit in underpants in their basement all day long. Why else would they have so much time to troll around Wikipedia on a deletion spree? Which is obviously not a very mentally healthy thing to do either.
It’s simple: As long as Wikipedia can at all be controlled b
it's good at detecting OBVIOUS vandalism (Score:2)

by capoccia ( 312092 ) writes:

there is a subset of vandalism that a bot can be very good at detecting. this bot can never handle every kind of vandalism. for example, adding some subtly false statement to a biographical article, but spelling everything correctly, using correct grammar and adding something that looks like it could be a legitimate source is difficult for even human editors to recognize as vandalism.
adding 1s everywhere or deleting the entire article is very easy to detect.
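A rough sketch of what checks for the "very easy" cases might look like: near-total blanking, long runs of one repeated character, and shouting in all caps. The function names and every threshold here are assumptions picked for illustration, not taken from ClueBot or any real tool.

```python
import re


def obvious_vandalism_signals(old_text, new_text):
    """Compute a few crude signals comparing an article before and after an edit."""
    old_len, new_len = len(old_text), len(new_text)
    # Fraction of the article that disappeared (negative if the article grew).
    shrink = 1.0 - (new_len / old_len) if old_len else 0.0
    # Longest run of a single repeated non-whitespace character, e.g. "1111111".
    longest_run = max(
        (len(m.group(0)) for m in re.finditer(r"(\S)\1*", new_text)),
        default=0,
    )
    # Share of letters that are upper case.
    letters = [c for c in new_text if c.isalpha()]
    upper_ratio = sum(c.isupper() for c in letters) / len(letters) if letters else 0.0
    return {
        "shrink": shrink,
        "longest_run": longest_run,
        "upper_ratio": upper_ratio,
        "letters": len(letters),
    }


def looks_like_obvious_vandalism(old_text, new_text,
                                 max_shrink=0.9, max_run=10, max_upper=0.8):
    """Flag an edit if any signal crosses its (arbitrary, illustrative) threshold."""
    s = obvious_vandalism_signals(old_text, new_text)
    return (
        s["shrink"] >= max_shrink
        or s["longest_run"] >= max_run
        or (s["letters"] >= 20 and s["upper_ratio"] >= max_upper)
    )
```

None of this touches the subtle case described above: a well-spelled, plausibly sourced false statement passes all three checks.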
- Re: (Score:1)
  
  by s1lverl0rd ( 1382241 ) writes:
  
  Luckily, there is a lot more obvious vandalism than there is vandalism of the sneaky kind.
Come on... (Score:1)

by GofG ( 1288820 ) writes:

Rogue admins abusing their power? An "in" club? If you have a problem with an admin, provide evidence (a diff of the admin abusing his power) here [wikipedia.org]. Follow the case, argue it out, and the admin will be dealt with. Every admin is elected in, guys. If you think Wikipedia is important enough that all the scary "rogue admins" are actually doing harm, go become a part of the election process. Anyone can vote, and your opinion matters regardless of how many edits you have, or how many articles you've worked on.
Automated vs waiting for a human (Score:1)

by tawker ( 860711 ) writes:

As owner of one of the first vandalism-reverting bots out there (although, pattern-wise, tawkerbot2 didn't do nearly as much as CB), the first take was that if you remove the perceived vandalism almost immediately, people don't get any fun out of vandalizing and stop doing it. There was massive opposition at the outset, but then, as volumes increased, people began to freak when the bot was non-operational. Yes, it had false positives which needed to be dealt with, but if I recall correctly, statistically
- {{uw-vandalism1}} (Score:5, Funny)
  
  by Anonymous Coward writes: on Sunday February 28, 2010 @04:57PM (#31308738)
  
  Welcome to Slashdot. Although everyone is welcome to contribute to Slashdot, at least one of your recent posts did not appear to be constructive and has been modded down. Please use TrollTalk [slashdot.org] for any test edits you would like to make, and read the welcome page to learn more about contributing constructively to this web site. Thank you.
  
  - take your POV somewhere else (Score:1, Troll)
    
    by H4x0r Jim Duggan ( 757476 ) writes:
    
    Just because the tag exists, doesn't mean you can slap it everywhere you see an edit that doesn't support your world view! You deletionists are ruining Wikipedia for the rest of us. Assume Good Faith!
    - Re: (Score:1, Funny)
      
      by Anonymous Coward writes:
      
      Assume Good Faith!
      I have candy. Get in the van.
      - Re: (Score:1)
        
        by Kitkoan ( 1719118 ) writes:
        
        Assume Good Faith!
        I have candy. Get in the van.
        Ehhhhhh.... in the words of Ogden Nash 'Candy is dandy, but liquor is quicker'.
  - How do you define *Vandalism* ? (Score:5, Interesting)
    
    by Taco Cowboy ( 5327 ) writes: on Monday March 01, 2010 @03:42AM (#31312904) Journal
    
    Case in point --- There is an article in Wikipedia about a certain country.
    In that article, they blame their previous British colonial master for everything.
    I tried to make some corrections to that article to make it more "neutral", and they changed it back within 10 minutes.
    I tried again, and again they changed it back.
    For the third time, I was warned by someone from Wikipedia (dunno if it's a volunteer or something) that I have no right to make any correction to that particular article anymore.
    The "THEY" in question is the government of that country. They have a "cyber-patrol" group in charge of "online propaganda" and that Wikipedia article is one of their many lies, aka propaganda, they have put online.
    Now, how do you define vandalism in this case?
    
    - Re: (Score:1)
      
      by svick ( 1158077 ) writes:
      
      This is not vandalism, but a violation of neutral point of view. You should try to talk with them first, not start an edit war. If talking fails, you should ask others to help you resolve the dispute; in this case the Neutral point of view noticeboard [wikipedia.org] is probably the best place.
- Re: (Score:2)
  
  by Tango42 ( 662363 ) writes:
  
  If the world doesn't want Wikipedia, they are more than welcome to stop reading it. In truth, however, it seems the world very much wants Wikipedia, since it is the 5th most popular website in the world (by unique visitors per month, if memory serves).
  - Re: (Score:1)
    
    by jonathansamuel2 ( 1756290 ) writes:
    
    My hope would be that whether they read Wikipedia or not, people would not support projects like this one which place more power in the hands of Wikipedia admins. Such projects by definition place less power in the hands of ordinary Wikipedia users.
    Hopefully companies like Google will also question whether Wikipedia is deserving of $2M contributions, especially when in terms of democratic process Wikipedia is getting worse instead of better, as admins go off on their power trips with more and more powerful
    - Re: (Score:1)
      
      by Jedi Alec ( 258881 ) writes:
      
      Read the Wikipedia talk page for the Martin Heidegger article and you can see that parts of Wikipedia are infested with Neo-Nazi sympathizers who have the protection of a particular Wikipedia admin.
      Really? Because I actually read through the damn thing, and all I see is a debate about the difference between being a Nazi or being a National Socialist. Add a number of people acting like pompous twats, and you get an edit war, not the coming of the third reich.
      - Re: (Score:1)
        
        by jonathansamuel2 ( 1756290 ) writes:
        
        Those who opposed any use of the term "Nazi" in the Heidegger article argued that it was pejorative, and that Heidegger was not a Nazi, he was a National Socialist.
        A later commentator said that whether intentional or not, those who posted this drivel were attempting to rehabilitate the Nazis by arguing that they weren't Nazis at all, they were National Socialists. He mentioned Lithuania, where this process is farther along than it is here.
