
Developing a Vandalism Detector For Wikipedia

Posted by kdawson
from the false-positives-would-hurt dept.
marpot writes "In an effort to assist Wikipedia's editors in their struggle to keep articles clean, we are conducting a public lab on vandalism detection. The goal is the development of a practical vandalism detector that is capable of telling ill-intentioned edits apart from well-intentioned edits. Such a tool, which will work somewhat like a spam detector, will free up the crowd's workforce currently occupied with manual and semi-automatic edit filtering. The performance of submitted detectors will be evaluated based on a large collection of human-annotated edits, which has been crowdsourced using Amazon's Mechanical Turk. Everyone is welcome to participate."

  • Existing (Score:3, Insightful)

    by ShakaUVM (157947) on Sunday February 28, 2010 @04:48PM (#31308660) Homepage Journal

    Apparently, how their vandalism detector works right now is by automatically reverting any edits done by anonymous editors.

    (And yeah, that's a bit sarcastic, but somewhat true.)

    • Re:Existing (Score:4, Interesting)

      by Rik Sweeney (471717) on Sunday February 28, 2010 @04:53PM (#31308704) Homepage

      It's called Clue Bot. It's been known to revert vandalism in under 30 seconds :)

      • Re:Existing (Score:5, Insightful)

        by broken_chaos (1188549) on Sunday February 28, 2010 @04:55PM (#31308726)

        I'm assuming it's also known to revert good edits in under 30 seconds?

        Just thinking out loud here, but is raw speed of reversion really what should be bragged about, as opposed to accuracy?

        • Re:Existing (Score:5, Informative)

          by Rik Sweeney (471717) on Sunday February 28, 2010 @04:59PM (#31308750) Homepage

          Further Reading

          http://en.wikipedia.org/wiki/User:ClueBot [wikipedia.org]

          • Re:Existing (Score:4, Informative)

            by broken_chaos (1188549) on Sunday February 28, 2010 @05:09PM (#31308836)

            Oh yes, it definitely hits a large number of false positives, presumably also 'fixed' within 30 seconds. For every one that gets reported [wikipedia.org] (including the hundreds or thousands of archived reports), there must be many that go unreported, by 'non-Wikipedians' who edited a page with an error and then went on their way. Or by people who didn't stick around to 'watch' whether their edit got 'fixed' by an automated process...

            • Re:Existing (Score:4, Informative)

              by Ignorant Aardvark (632408) <cydeweys.gmail@com> on Sunday February 28, 2010 @05:14PM (#31308872) Homepage Journal

              The false positive rate on the anti-vandalism bots is a lot lower than you would think. The bots are written quite conservatively, take a lot of factors into account, and only pull the revert trigger when they are quite sure.

              It's the type II error rate that's pretty high. Unfortunately, that's not solvable without strong AI.

              • Re:Existing (Score:5, Informative)

                by marpot (1311479) on Sunday February 28, 2010 @05:38PM (#31309048) Homepage
                We have studied the accuracy of ClueBot and found that (on a small corpus) it has very good precision (low false positive rate) but very low recall (low true positive rate). (See: http://www.uni-weimar.de/medien/webis/publications/downloads/papers/stein_2008c.pdf [uni-weimar.de]) But the picture might look quite different on a large scale.
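                For readers unfamiliar with the terms: precision is the fraction of flagged edits that really were vandalism, and recall is the fraction of all vandalism that gets flagged. A minimal Python sketch; the counts below are made up to illustrate a conservative bot and are not figures from the paper:

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of edits flagged as vandalism that really were vandalism."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp: int, fn: int) -> float:
    """Fraction of all vandalism edits that the detector actually flagged."""
    return tp / (tp + fn) if tp + fn else 0.0


# Hypothetical counts: the bot flags few edits and is almost always right,
# but most vandalism slips past it.
tp, fp, fn = 90, 2, 360
print(f"precision = {precision(tp, fp):.2f}")  # 0.98
print(f"recall    = {recall(tp, fn):.2f}")     # 0.20
```

                High precision with low recall is exactly the "conservative bot" profile described above: few good edits reverted, much vandalism left for humans.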
                • Re:Existing (Score:5, Insightful)

                  by beakerMeep (716990) on Sunday February 28, 2010 @06:41PM (#31309506)
                  The problem is not so simple, though. You can't quantify something as subjective as vandalism. You can't reduce it to your mathematical formula, no matter how statistically fancy your 6-page PDF is.

                  I had a particularly nasty run-in with ClueBot, where I removed large portions of spam from an article, only to have ClueBot revert my edit and put the spam back in. When I removed the spam again, some other editor strolled by and put the spam back in yet again, because he trusted the bot more than humans and didn't read the talk page, where many had requested the removal of this spam. Finally, after a rather rude conversation with the human he realized he had no business reverting it. This person was a long-time editor and contributor too, but it just serves as an example that any criteria used to determine spam are based upon assumptions: assumptions that they will hold in other cases, and assumptions that others will agree with the classification.

                  The whole point of Wikipedia is that it is a community-edited encyclopedia. I have no interest in a computer-edited encyclopedia. If people want to program bots to review an editor's work, perhaps we should program bots to write the work? Perhaps you can call it Botopedia. Furthermore, many of the bots ask you to report false positives to their personal pages off of Wikipedia's website, on some other .com or .edu domain. They ask you to be accountable to them, but who are they accountable to? What's to stop spammers from programming bots to annoy editors as a phishing exercise?

                  Now don't get me wrong though: if someone wants to use a bot to aid in finding vandalism, that would help. But if the system is so frail that Wikipedia can't exist without computer program editors, it may be time to revisit the system. As others have stated, pushing edits into a queue would be much saner than sending them directly live.

                  Editing bots are wrong for Wikipedia, and if they allow it they are letting go of their vision of community participation in favor of the visions (or delusions) of grand technological solutions.
                  • by marpot (1311479)

                    I cannot agree more with what you say, but I'd like to give it a twist: I want computers to assist me, and I want them to to it good, reliable, and robust. If I happen to be a Wikipedia editor, that doesn't change a thing; I still want the computer to assist me with what I'm doing. Currently there is no such thing, and all I'd like to do is foster research toward it.

                    Now, some always go ten steps further, when someone talks about a new "solution" based on computers. They directly envision a worl

                    • Re: (Score:2, Funny)

                      I cannot agree more with what you say, but I'd like to give it a twist: I want computers to assist me, and I want them to DO it WELL, RELIABLY, and ROBUSTLY.

                      -Slashbot Editor 0.95 beta

                    • DWIM, PDCH (Score:2, Funny)

                      by symbolset (646467)

                      You're looking for a DWIM (Do What I Meant) interpreter with PDCH (Predictive Digital Concierge Heuristics). While the technology is available it's currently quite costly. Bugs, errata, and maintenance can deliver less than an optimal experience. Might I instead offer you this mail order bride? We have imported personal assistants in stock from less privileged nations - and if you have the means we can outsource minute-to-minute management of them to our Bangalore VPDT (Virtual Presence Discipline Team).

                  • Re: (Score:1, Troll)

                    by Moryath (553296)

                    Finally, after a rather rude conversation with the human he realized he had no business reverting it.

                    And if the editor had been one of wikipedia's "admins", he would have simply gone "ban. lock talkpage." And he'd have gone right on his merry way to abuse someone else.

                    Now don't get me wrong though, if someone wants to use a bot to aid in finding vandalism, that would help. But if the system is so frail that Wikipedia can't exist without computer program editors, it may be time to revisit the system. As oth

                  • by Eivind (15695)
                    I dunno. I find a bot okay, but it should be extremely conservative, because it's so bad if it reverts edits that are in fact made in good faith (even if the edits themselves are bad). It's possible Cluebot isn't conservative ENOUGH, but if you have a look at say the last 100 edits it's made, it's really hard to argue that they're not 99%+ bad-faith vandalism.
                • Re: (Score:3, Insightful)

                  by Tango42 (662363)

                  In that paper, you say you think high recall (i.e. few false negatives) should be preferred to high precision (few false positives), since it reduces the chance of a reader seeing a vandalised version. I disagree. You underestimate the harm caused by losing editors who get annoyed when their legitimate edits are reverted by a bot. The upcoming feature, Flagged Revisions ( http://en.wikipedia.org/wiki/Wikipedia:Flagged_revisions [wikipedia.org] ), will provide a much better way of preventing readers from seeing vandalised ve
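                  The disagreement comes down to where a detector puts its decision threshold. A toy Python sketch with invented scores and labels (not real data): raising the threshold makes the bot more conservative, which here raises precision and lowers recall.

```python
# Hypothetical detector scores (higher = more likely vandalism) paired with
# the true label (True = vandalism). Invented for illustration only.
SCORED_EDITS = [
    (0.95, True), (0.90, True), (0.80, False), (0.75, True),
    (0.60, True), (0.55, False), (0.40, True), (0.30, False),
    (0.20, False), (0.10, False),
]


def precision_recall_at(threshold, scored):
    """Flag every edit scoring >= threshold; return (precision, recall).

    Precision is taken as 1.0 when nothing is flagged (no false positives).
    """
    tp = sum(1 for s, bad in scored if s >= threshold and bad)
    fp = sum(1 for s, bad in scored if s >= threshold and not bad)
    fn = sum(1 for s, bad in scored if s < threshold and bad)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


for t in (0.3, 0.5, 0.85):
    p, r = precision_recall_at(t, SCORED_EDITS)
    print(f"threshold {t:.2f}: precision {p:.2f}, recall {r:.2f}")
```

                  A bot that reverts on its own sits at the high-threshold end; a tool that only queues edits for human review can afford a lower one.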

              • by LQ (188043)

                I'm an occasional "recent changes patroller" and I don't really care how many false positives cluebot gets in anonymous edits. It's too busy weeding out the thousands of "Bob is gay" and "I like pie" edits. Why they still allow anonymous "editors", I really don't know.

        • Re: (Score:3, Informative)

          I'm not sure why he bragged about reversion speed. All that's really dependent on is your network connection. For one, your network connection has to be good enough to download, in real time, the diffs of all edits to Wikipedia. Most aren't.

          Anyway, a decision as to whether a given diff is vandalism or not needs to be made in a small fraction of a second, as there are dozens of edits coming in every second, and if you continuously fall farther and farther behind, you lose. Given an ideal network connecti
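          The "falling behind" argument is plain queueing arithmetic: if edits arrive faster than the bot can classify them, the backlog grows without bound. A rough Python sketch; the arrival rate and per-edit costs below are assumed round numbers, not measurements of any real bot.

```python
def backlog_after(seconds: float, arrivals_per_sec: float, secs_per_edit: float) -> float:
    """Edits still waiting after `seconds`, for a single sequential worker.

    Capacity is 1 / secs_per_edit edits per second; any excess arrival rate
    accumulates as backlog (a steady-state approximation that ignores bursts).
    """
    capacity = 1.0 / secs_per_edit
    return max(0.0, (arrivals_per_sec - capacity) * seconds)


RATE = 30.0  # assumed stand-in for "dozens of edits coming in every second"
print(f"time budget per edit: {1000 / RATE:.1f} ms")
for cost in (0.020, 0.050):  # assumed per-edit processing times, in seconds
    print(f"{cost * 1000:.0f} ms/edit -> backlog after 1 hour: "
          f"{backlog_after(3600, RATE, cost):.0f} edits")
```

          Under these assumptions, a classifier that needs 50 ms per edit ends up tens of thousands of edits behind within an hour, while one at 20 ms keeps up.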

          • Re: (Score:2, Interesting)

            by marpot (1311479)
            This is greatly overestimated. Depending on how elaborate your edit model is, you can analyse edits live on a laptop.
            • Re: (Score:2, Offtopic)

              Which part is over-estimated? All I can speak on from experience is AntiVandalBot. I ran that on an Athlon XP 2500+ (which wasn't particularly amazing at the time). It wasn't the computation that was hard, it was the network usage of downloading the diff of every edit by a non-trusted user from the RC feed. I would not have been able to run it on any home Internet connection. Thankfully I was able to place my server on an unthrottled 100 Mbps dorm connection at the University of Maryland.

              I will grant y

              • by marpot (1311479)
                Me too, experience that is. We took the high-throughput features from our research and implemented a live edit analysis for the English portion of Wikipedia. It listens on the IRC channel, downloads the wikitexts of the old and new revisions, and then does its magic. And it did so once on an old laptop. The computer was connected at max 1 Gbit/s.
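                The step after "downloads the wikitexts of the old and new revisions" is usually to reduce the pair to what was inserted and removed, since most features are computed on that. A minimal Python sketch using the standard library's difflib on word tokens; the IRC listener and the downloads are left out, and the function name diff_words is hypothetical, not part of the tool described above.

```python
import difflib


def diff_words(old_text: str, new_text: str) -> tuple[list[str], list[str]]:
    """Return (inserted_words, removed_words) between two revisions."""
    old_words, new_words = old_text.split(), new_text.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    inserted, removed = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            removed.extend(old_words[i1:i2])
        if tag in ("replace", "insert"):
            inserted.extend(new_words[j1:j2])
    return inserted, removed


old = "The LEON processor is a 32-bit CPU."
new = "The LEON processor is a 32-bit CPU. I like pie"
print(diff_words(old, new))  # (['I', 'like', 'pie'], [])
```

                Word-level diffing keeps the per-edit cost low, which matters for the throughput question discussed above.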
      • Re: (Score:3, Funny)

        by Yvan256 (722131)

        A Clue Bot, eh? I wonder what happens if you register the username "Colonel Mustard".

        • by v1 (525388)

          A Clue Bot, eh?

          Every time I see that in this thread, my eyes substitute "Clue Bat" and it totally changes meaning of the post while still making some degree of sense, making it hard to filter out.

          The base problem here is going overlooked. There isn't one kind of edit they're trying to combat, there's several. And each requires a different approach because they are incompatible.

          1- Spam (Monty Python kind): OK, that's easy for the bots to get rid of. Even very loose definitions are easy to code with a good

    • Re:Existing (Score:4, Interesting)

      by DamonHD (794830) <d@hd.org> on Sunday February 28, 2010 @05:18PM (#31308902) Homepage

      Amazingly my small sample is to the contrary.

      I fix small errors of syntax/grammar/fact when I run across them, have never created an account, and almost all of my edits seem to stick.

      Rgds

      Damon

      • by jonadab (583620)
        Yeah, my edits generally stick too, and almost all of them are anon/IP, not because I haven't created an account, but because Wikipedia's session-timeout policy is so short that logging in seldom does any good. You can log in, but by the time you're ready to commit an edit, you're typically not logged in any more. I can't imagine what possessed them to make it so short. I've got better things to do than log in *again* each and every time I want to make an edit. So I usually don't bother. And it doesn't
    • Re: (Score:2, Interesting)

      by Big Jojo (50231)

      Apparently, how their vandalism detector works right now is by automatically reverting any edits done by anonymous editors.

      I've seen signs of that too. Not always ... but often enough to have acquired a rather negative understanding of the role of some folk with admin privileges at WP. It's clear when they haven't even bothered to read (much less understand!) the edits they revert. Or that they just revert anything that offends an ideology they want WP to present on any particular topics. They think NPV

  • Been done? (Score:2, Informative)

    Whoever posted this clearly isn't aware of the actual work being done in the field. For instance, I was running an anti-vandalism bot [wikipedia.org] in 2006, and it wasn't new at the time. They've gotten much more sophisticated since then.

    Why are they so intent on reinventing the wheel? Do they not even realize that the wheel exists already? Why not just improve on it instead?

    • by Kratisto (1080113) on Sunday February 28, 2010 @05:04PM (#31308794)
      The article on anti-vandalism bots had been recently vandalized when they were doing their preliminary research.
    • Re: (Score:2, Informative)

      by marpot (1311479)
      We are very aware of the existing tools (Huggle, Twinkle, and so on). See the links in the above post, and see the links in the resources section of the competition Web page. An accurate vandalism detector will take a lot of research and development, just like spam detectors did... Why did you stop developing your tool, anyway?
      • by Tango42 (662363)

        Huggle and Twinkle are tools to help humans deal with vandalism. AntiVandalBot and ClueBot, etc., are bots that deal with (the most obvious) vandalism themselves. They are very different things.

        • by marpot (1311479)
          Exactly, but both kinds of tools need to solve the same underlying problem: given an edit, is it vandalism? The better those tools answer this question, the more time of Wikipedia editors is saved.
          • by Tango42 (662363)

            No, they solve very different problems. Something like Huggle needs to work out if a given edit can be almost guaranteed *not* to be vandalism (usually because the editor is on a whitelist), everything else gets shown to a human. The important thing for something like Huggle is making it easy for humans to review edits, not judging the edits automatically in any way. Something like ClueBot needs to work out if it can almost guarantee that a given edit *is* vandalism. They are very different.
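            The two kinds of decision described here can be written as one routing function that applies both rules to the same edit. A hypothetical Python sketch: the whitelist, the score scale, and the threshold are assumptions for illustration, not how Huggle or ClueBot actually decide.

```python
from enum import Enum


class Action(Enum):
    SKIP = "skip"            # almost certainly fine; nobody needs to look
    HUMAN_REVIEW = "review"  # show to a human, Huggle-style
    AUTO_REVERT = "revert"   # almost certainly vandalism, ClueBot-style


def triage(editor: str, score: float, whitelist: set[str],
           revert_threshold: float = 0.95) -> Action:
    """Route an edit given a hypothetical vandalism score in [0, 1]."""
    if editor in whitelist:
        return Action.SKIP         # Huggle-style: trust known-good editors
    if score >= revert_threshold:
        return Action.AUTO_REVERT  # bot acts alone only when very sure
    return Action.HUMAN_REVIEW     # everything else goes to a person


trusted = {"LongTimeEditor"}
print(triage("LongTimeEditor", 0.99, trusted))  # Action.SKIP
print(triage("203.0.113.7", 0.99, trusted))     # Action.AUTO_REVERT
print(triage("203.0.113.7", 0.40, trusted))     # Action.HUMAN_REVIEW
```

            The Huggle-like branch only needs to be sure an edit is *not* vandalism; the ClueBot-like branch only needs to be sure it *is*. The middle band is left to people.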

    • by pipatron (966506)

      Why are they so intent on reinventing the wheel? Do they not even realize that the wheel exists already? Why not just improve on it instead?

      Sometimes it's more practical to start from scratch. You might want to change the design from the ground up, and to do that with an already working bot would not be as constructive. The current bots are probably very well tweaked and polished, for their given design and methods of spam detection.

      • Re: (Score:3, Insightful)

        by LifesABeach (234436)
        One of the quirks I've noticed is when a business makes or invents something, then uses the Wiki to advertise. I can't help but wonder if this could also be considered a form of vandalism?
        • Re:Been done? (Score:4, Interesting)

          by pipatron (966506) <pipatron@gmail.com> on Sunday February 28, 2010 @08:13PM (#31310328) Homepage

          I edit wikipedia occasionally, and one thing I remove is unmotivated links to companies, or unnecessary mentioning of specific products. So yes, I consider it a case of vandalism. Since my edits are usually (always?) kept, I think most people agree. There is probably some policy about it, but I act on common sense there.

          • by Marcika (1003625)

            I edit wikipedia occasionally, and one thing I remove is unmotivated links to companies, or unnecessary mentioning of specific products. So yes, I consider it a case of vandalism. Since my edits are usually (always?) kept, I think most people agree. There is probably some policy about it, but I act on common sense there.

            The policy is on the page Wikipedia:Spam [wikipedia.org], quite logically. It's probably one of the oldest official policies, given that it was already needed back in 2003...

        • One of the quirks I've noticed is when a business makes or invents something, then uses the Wiki to advertise. I can't help but wonder if this could also be considered a form of vandalism?

          It's advertising if they use advertisement text. Recently, I edited the article on the LEON processor. It originally had texts like:
          "It offers all basic functions of a pipelined in-order processor, making it a good experimentation vehicle."

          "Making it a good experimentation vehicle" for who? What type of experiments? What is good? How is it measured?

          It would be very interesting if the bot could tell such texts apart.

    • by MillionthMonkey (240664) on Sunday February 28, 2010 @06:09PM (#31309298)
      Whoever posted this clearly isn't aware of the actual work being done in the field. For instance, I was running a ___[thing]___ in _[year]_, and it wasn't new at the time. They've gotten much more sophisticated since then. Why are they so intent on reinventing the wheel? Do they not even realize that the wheel exists already? Why not just improve on it instead?
      * * *
      This looks like a useful template for the standard "why reinvent the wheel" Slashdot post; I hope you don't mind if I reuse it.
  • by Anonymous Coward

    Wikipedia, the encyclopedia that anyone can edit - in my ass.

    Harry Potter:
    "The novels revolve around [[Harry Potter (character)|Harry Potter]], an orphan who discovers at the age of eleven that he is a wizard.{{cite web|url=http://edition.cnn.com/2000/books/reviews/07/14/review.potter.goblet/|title=Review: Gladly drinking from Rowling's 'Goblet of Fire'|date=14 July 2000|publisher=CNN|accessdate=28 September 2008}} Wizard ability is inborn, but children are sent to wizarding school to learn the magical skil

    • Re: (Score:3, Insightful)

      On the other hand, do gibberish pages like this need much more editing, or is Harry Potter's Wikipedia entry basically finished as far as anyone cares?
    • by bertok (226922)

      Wikipedia, the encyclopedia that anyone can edit - in my ass.

      Actually, it's possible to make a wysiwyg editor for Wiki markup in HTML with a little Javascript, there's no need for Flash!

      It's not even hard, I did one for a corporate project in about a week, and I'm by no means an expert at Javascript.

      It's even possible to do a split-screen view where it shows you the markup AND the preview, and the user can edit either.

      The trick is that doing this has a prerequisite: the wiki syntax has to have a nice unambiguous grammar, and you need a parser generator that can emit

    • by jgrahn (181062)

      Wikipedia, the encyclopedia that anyone can edit - in my ass. [---] I am convinced that the current state of affairs is a conscious choice. The way to maximise 'insider power' and minimize 'outsider power' is to make editing as hard as possible, and the rules and traditions needed not to be revoked as many as possible.

      Yes. That was also the main driving force behind RUNOFF, troff, TeX, LaTeX, HTML and all other non-WYSIWYG systems back into the 1960s. It's a conspiracy.

      Seriously: no. It's just that it's

  • by Anonymous Coward on Sunday February 28, 2010 @05:04PM (#31308800)

    I've had many more problems with admin abuse than vandalism. Vandalism is quick and easy to deal with. Admins are the biggest problem in Wikipedia editing; they have no accountability and abuse their power.

    How about a log of each admin's activities, including reversions, bans, etc, and a way for non-admins to challenge actions (without spending countless hours in an appeal process worthy of a federal court).

    • Re: (Score:1, Informative)

      by Anonymous Coward

      I've had many more problems with admin abuse than vandalism. Vandalism is quick and easy to deal with. Admins are the biggest problem in Wikipedia editing; they have no accountability and abuse their power.

      How about a log of each admin's activities, including reversions, bans, etc, and a way for non-admins to challenge actions (without spending countless hours in an appeal process worthy of a federal court).

      What are you talking about? All users have logs that track their actions:

      http://en.wikipedia.org/wiki/Special:Contributions/Jimbo_Wales
      http://en.wikipedia.org/w/index.php?title=Special%3ALog&type=&user=Jimbo+Wales&page=&year=&month=-1&tagfilter=

      Actions can be challenged at any point on the talk page or the administrator boards.

    • by OverlordQ (264228) on Sunday February 28, 2010 @07:59PM (#31310226) Journal

      How about a log of each admin's activities, including reversions, bans, etc, and a way for non-admins to challenge actions (without spending countless hours in an appeal process worthy of a federal court).

      Reversions: http://en.wikipedia.org/wiki/Special:Contributions [wikipedia.org]
      Bans: http://en.wikipedia.org/wiki/Special:Log/block [wikipedia.org]
      Deletes: http://en.wikipedia.org/wiki/Special:Log/delete [wikipedia.org]

      Anything else you're too lazy to find yourself?

      • by saibot834 (1061528) on Sunday February 28, 2010 @09:17PM (#31310760) Homepage

        If I had mod points, I'd mod the parent up and the grandparent down. Seriously, almost everything in Wikipedia is transparent. Search the revision history and logs and look for the information you need. RTFM.

        A lot of people on /. seem to derive very general opinions about admins from a personal disappointing encounter. They do not include diffs of their edits or their username. From my experience in most cases the guy who got reverted by an admin broke some kind of rule (and often enough they just got reverted by a regular non-admin, but they assume it was an admin). Instead of RTFM those people post as AC complaining generally about admins without providing any traceable cases of admin abuse. I know my opinion isn't very popular, but unless you give concrete examples your allegations are just FUD.

        • Re: (Score:2, Insightful)

          by Jedi Alec (258881)

          Hey, this is Slashdot. We're qualified to discuss any subject we damn well please based on our own prejudices and assumptions, while pretending that our high IQs and common sense qualify us to pretend we're experts on whatever the discussion may or may not be about. What right do wiki admins have to assault our ivory towers when we sprinkle our droplets of distilled wisdom on their pages as well?

      • Re: (Score:1, Interesting)

        by Anonymous Coward

        I'm the OP.

        Anything else you're too lazy to find yourself?

        I recognize that voice anywhere; you must be a Wikipedia Admin. I've been editing Wikipedia for years, but didn't know about the second two lists (the first isn't really a list of reversions, but perhaps there's a way to make it work). If I don't, then I suspect many others don't.

        Which brings us back to my point: Those lists need to be part of a system -- an easily accessible, understandable system -- "for non-admins to challenge actions (without spe

    • If you think about it, it’s not much different from a country with total censorship. This small establishment’s view always overrides everybody else’s. And they make massive use of that power.

      As I said: as long as it is even possible for a subset of humanity to control what goes onto Wikipedia, it cannot, by definition, be the encyclopedia for all of humanity.
      It’s obvious that to solve this, central servers and admins are out of the question... resulting in a P2P system of ca

  • Bayesian statistics are an interesting thing. Mwhwhwhwhaaaa. Who'd have thought anyone would say that about stats?

    Anyway, you can tell spam with a remarkably high degree of accuracy... Guess what? You can tell "important" and "friends" emails with a similar degree of accuracy (you define what's important and who your friends are). No offence to most vandals (of any type), but usually they are complete fuckwits. I suspect they and what they write are probably even more predictable than spammers.
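    The spam-filter idea can be sketched as a tiny multinomial naive Bayes classifier over the words an edit inserts. A toy Python sketch; the training examples are invented (loosely echoing edits quoted elsewhere in this thread), and a real detector would need far more data and richer features than raw words.

```python
import math
from collections import Counter


class NaiveBayes:
    """Toy multinomial naive Bayes over words, with add-one (Laplace) smoothing."""

    LABELS = ("vandal", "ok")

    def __init__(self) -> None:
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = Counter()

    def train(self, text: str, label: str) -> None:
        self.doc_counts[label] += 1
        self.word_counts[label].update(text.lower().split())

    def log_score(self, text: str, label: str) -> float:
        """log P(label) plus the sum of log P(word | label) over the words in text."""
        vocab = set().union(*self.word_counts.values())
        total_words = sum(self.word_counts[label].values())
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for word in text.lower().split():
            count = self.word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        return score

    def classify(self, text: str) -> str:
        return max(self.LABELS, key=lambda label: self.log_score(text, label))


# Invented training examples; both labels must be trained before classifying.
model = NaiveBayes()
for text in ("bob is gay", "i like pie", "lol penis", "this sucks lol"):
    model.train(text, "vandal")
for text in ("born in 1952 in ohio", "the album was released in 1998",
             "fixed typo in citation", "the river flows into the lake"):
    model.train(text, "ok")

print(model.classify("lol i like bob"))                  # vandal
print(model.classify("the album was released in ohio"))  # ok
```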
     

    • by EdIII (1114411) *

      The goal is the development of a practical fuckwit detector that is capable of telling apart ill-intentioned posts from well-intentioned posts.

      You gave me a good idea....

  • Step One (Score:5, Insightful)

    by owlnation (858981) on Sunday February 28, 2010 @05:16PM (#31308884)
    Before any more detectors are rolled out, how about they come up with a workable definition of vandalism? And actually use it fairly, ethically and logically.

    There's a great deal of evidence to suggest the current definition of "vandalism," is something a wikiadmin decides he just doesn't like, or disagrees with, or in some way interferes with his power-trip.
    • by gbjbaanb (229885)

      It should be inaccurate revisions; however, who is to say whether a revision is inaccurate or not? We could have a panel of experts for each given topic, but that would only work if you divided WP up into sections and had an admin sitting like a judge on each section.

      As a result: ""vandalism," is something a wikiadmin decides he just doesn't like, or disagrees with, or in some way interferes with his power-trip."

    • Re: (Score:2, Insightful)

      by Anonymous Coward
      I completely agree. The worst vandalism on Wikipedia is done by self-righteous page owners and admins on power trips who hate to be corrected. I used to help out on a number of pages (areas where I am a genuine expert, not just someone with an opinion), but having my updates constantly deleted just got too frustrating; now I just make sure people in my field know not to use Wikipedia.
    • by Homburg (213427)

      There's a great deal of evidence to suggest...

      And yet you don't include any reference to this supposed evidence.

      • Or, in other words, [citation needed]. (Also, is [citation needed] a meme when discussing Wikipedia?) There's a wide variety of material that will result in reverts or blocks that isn't really vandalism, though. Behaviour that's disruptive, trolling, a breaching experiment, etc. will elicit roughly the same response as vandalism, and that needs to be taken into account both for automatic vandalism-repair systems (should this process treat it as vandalism?) and for making the statement that vandalism is i
  • by Anonymous Coward

    Right now, you can think of wikipedia as having two columns per article - first is the working article column, with the second being the discussion column.

    What we really need is a third column, one for the currently published version of the article.

    While this may not be popular, it would go a long way to getting rid of the spam, and might even solve some of the other issues facing wikipedia.

    With such a system, you could even assign articles to a subject matter expert as the editor, who could approve change

    • by Shoe Puppet (1557239) on Sunday February 28, 2010 @06:04PM (#31309242)

      A system like this has been implemented for the German Wikipedia. Almost everybody who has an account can mark articles as verified vandalism-free; unless you are logged in, you see the last verified version by default.
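      The "show anonymous readers the last verified version" rule boils down to a simple selection over the revision history. A simplified Python sketch; the Revision fields and the fallback for never-verified pages are assumptions about the general idea, not the actual MediaWiki extension's code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Revision:
    rev_id: int
    text: str
    verified: bool  # hypothetical field: a trusted editor has checked this revision


def revision_to_show(history: list[Revision], logged_in: bool) -> Optional[Revision]:
    """Choose the revision a reader sees; `history` is ordered oldest to newest."""
    if not history:
        return None
    if logged_in:
        return history[-1]  # logged-in users see the latest edit
    for rev in reversed(history):
        if rev.verified:
            return rev      # anonymous readers get the newest verified revision
    return history[-1]      # assumption: never-verified pages fall back to the latest


history = [
    Revision(1, "Berlin is the capital of Germany.", verified=True),
    Revision(2, "Berlin is the capital of Germany. BOB WAS HERE", verified=False),
]
print(revision_to_show(history, logged_in=False).rev_id)  # 1
print(revision_to_show(history, logged_in=True).rev_id)   # 2
```

      The vandalised revision still exists and can be reviewed, but a casual reader never sees it unless someone verifies it.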

      • And does it work?

        • I don't know, but I'll say this for German Wikipedia: It's a much better piece of work in my opinion. You can find huge articles with lots of great information on obscure topics, but which are written by "true fans" in a slightly unorthodox style - stuff that would be deleted in a heartbeat on English wikipedia. I don't know what they are doing, but they appear to be much more successful at accepting casual contributions.

          • My experience is very different: When looking for obscure topics, I usually head straight to the English one since the German one is often the only one that does not consider the topic to be "notable" enough for an article.

  • by Trepidity (597) <.delirium-slashdot. .at. .hackish.org.> on Sunday February 28, 2010 @05:47PM (#31309110)

    Since the problem is tantalizingly easy to frame as a standard data-mining or machine-learning problem, albeit with some quirks, there's quite a lot of work from a lot of research groups that seems to be looking at it. Some examples: one [upenn.edu], two [google.com], three [google.com], four [ucsc.edu], five [arxiv.org], six [google.com], seven [acm.org].

    • by marpot (1311479)
      You're right: it's machine learning, data mining, NLP, and information retrieval. But the fun part is turning a research prototype into a tool that can be left alone most of the time. That hasn't happened yet. Also, research on this problem only started in 2008, while rule-based tools developed by Wikipedians have been around since 2006. The works you listed are actually all there is! That's not much to work with, is it?
  • If it stops Deletionists from deleting well-intended edits. Better a short article than no article.
  • It was just too visionary for its time http://www.everytopicintheuniverseexceptchickens.com/ [everytopic...ickens.com]
  • An arms race? (Score:2, Interesting)

    by fysdt (1597143)
    I believe that vandalism on Wikipedia can be limited. But would it really be possible to detect all kinds of vandalism?

    FTA:
    "Yahoo! Research will award a cash prize of 500 Euros to the winner of the plagiarism detection task. "

    500 Euros doesn't sound like much for detecting plagiarism on a site like Wikipedia...
    • by LtGordon (1421725)

      I believe that vandalism on Wikipedia can be limited. But would it really be possible to detect all kinds of vandalism?

      Without strong AI, the system can only really look for statistical and language patterns for clues on vandalism.

      If I replace an entire body section on the Fox News page with "GLENN BECK BLOWS GOATS", I would hope that a vandalism detector would flag this. If, however, I randomly insert the sentence "Glenn Beck has also been accused of inappropriate relations with barnyard animals" into a large section, then automated detection comes down to statistics or one hell of a clever context algorithm.
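      The two examples differ in exactly the kind of surface statistics a detector can measure without understanding the text. A minimal Python sketch of two such features; the feature names, the stand-in section text, and the idea of thresholding them are assumptions for illustration. As the poster says, the subtle edit looks unremarkable on both.

```python
def upper_case_ratio(text: str) -> float:
    """Share of the letters in `text` that are upper case (0.0 if no letters)."""
    letters = [c for c in text if c.isalpha()]
    return sum(c.isupper() for c in letters) / len(letters) if letters else 0.0


def size_change_ratio(old_text: str, new_text: str) -> float:
    """Relative change in length; a value near -1.0 means the text was blanked."""
    return (len(new_text) - len(old_text)) / max(len(old_text), 1)


# Hypothetical article section, standing in for the real page text.
section = "Fox News Channel is an American cable news channel. " * 20
blunt = "GLENN BECK BLOWS GOATS"
subtle = ("Glenn Beck has also been accused of inappropriate "
          "relations with barnyard animals. ")

# Blunt edit: the whole section is replaced. Subtle edit: one sentence is appended.
for name, inserted, new_text in (("blunt", blunt, blunt),
                                 ("subtle", subtle, section + subtle)):
    print(f"{name}: upper-case ratio {upper_case_ratio(inserted):.2f}, "
          f"size change {size_change_ratio(section, new_text):+.2f}")
```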

  • I ask because I don't know. I can see turning a page into a screed as vandalism, but that doesn't differ greatly from many of the wikipedia articles that I've read; quite a few of them are overwhelmingly dedicated to hostility to the topic or advocates of the topic. Earlier today, when I was reading the news, there was a link to the Wikipedia article on the Tea Party movement: well over half of the article was dedicated to quotes from anti-Tea Party people (MSNBC, NYT, LAT, etc.) spouting off hostility to

    • by Tango42 (662363) on Sunday February 28, 2010 @07:13PM (#31309768)

      Officially, vandalism is defined as edits made in bad faith. If you are trying to improve the article but are an idiot (which includes people that don't realise their own bias), that isn't vandalism, it's just idiocy. It is only if you are editing with the intention of making the article worse that you are vandalising.

      • Officially, vandalism is defined as edits made in bad faith.

        In other words, the scope of the problem does not include discovering the cure for human stupidity, however laudable that might be.

        Furthermore, people here are failing to apply the 80-20 rule: if you can clean up 80% of the vandalism at 20% of the human effort currently expended, the attention available to deal with the difficult twenty percent would more than triple. I've seen entire pages replaced with the word "penis" or a crass four-word comment about some pimply twit schoolmate. There's a lot of low
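The "more than triple" figure depends on how effort is split today. A minimal worked version, assuming (hypothetically) that human effort is currently spread in proportion to vandalism volume, comes out at a factor of four:

```python
def hard_case_attention_gain(auto_vandalism_share: float = 0.8, auto_effort_share: float = 0.2) -> float:
    """Growth factor of human attention on the hard cases once a tool handles the easy ones.

    Assumes (hypothetically) that today's effort is spread in proportion to vandalism volume.
    """
    hard_share = 1.0 - auto_vandalism_share        # vandalism the tool can't handle: 0.2
    effort_on_hard_now = hard_share                 # proportional-effort assumption: 0.2 of total effort
    effort_on_hard_after = 1.0 - auto_effort_share  # all effort not spent supervising the tool: 0.8
    return effort_on_hard_after / effort_on_hard_now

gain = hard_case_attention_gain()  # 0.8 / 0.2, i.e. about 4
```

If the hard cases already soak up more than their proportional share of effort, the gain shrinks; it stays above three as long as they currently get less than about 27% of the total.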

    • Wikipedia tries to fairly reflect the multiple positions found in reliable sources, so including 'spouting off' is not necessarily vandalism if, from a neutral point of view, the reliable sources show that there is some hostility to the Tea Party.

  • As soon as you start trusting a vandalism detector over manual monitoring, a lot of stuff will start to slip through and make the news, and then the detector won't be trusted any longer. It will have a short life but will be interesting to watch.

    Sew m@ny things that can bee done to bypass mechanisms. Even simple euphemisms like cleaning the old rifle http://images.clipartof.com/small/5039-Man-Cleaning-Inside-The-Barrel-Of-His-Unloaded-Rifle-Gun-Clipart.jpg [clipartof.com] ...are sure to slip through. There are so many lang
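A detector can undo simple character substitutions before matching words, but not homophones or euphemisms. A minimal sketch, assuming a small hypothetical substitution table (a real one would need far more variants, and mapping digits to letters will also mangle genuine numbers):

```python
import re

# Hypothetical substitution table for common "leetspeak" evasions.
LEET_MAP = str.maketrans({"@": "a", "4": "a", "3": "e", "1": "i", "0": "o", "$": "s", "5": "s", "7": "t"})

def normalize(text: str) -> str:
    """Lower-case, undo common character substitutions, and shrink runs of 3+ repeated characters to 2."""
    text = text.lower().translate(LEET_MAP)
    return re.sub(r"(.)\1{2,}", r"\1\1", text)
```

For instance, "m@ny" becomes "many", but "Sew" and "bee" come through unchanged because they are real words, which is the commenter's point about how easily such mechanisms are bypassed.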

  • There are well-intentioned edits on Wikipedia? Even if there were, how could you tell...
  • From my experience with contributing to Wikipedia, and from reading some of the talkback (is that what they're called?) discussions, I don't think there's much need for such a tool; there seems to be an elite class of Wiki users that delete anything that they deem unworthy while giving the most bizarre reasons for doing so.
  • I still think the best solution would be a color-coding overlay on the text that would show the reader immediately 1.) how trustworthy the author has been and 2.) how long ago the edit was made (without being reverted). That way it would be easy to see the sections written by reputable authors who have always added useful info and distinguish them from "amendments" entered just a few minutes ago by an anonymous coward.

    And for those who do not want to log in to edit, that would be f
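The overlay idea combines two signals per span of text. A minimal sketch, assuming a hypothetical author-reputation score in [0, 1] and an equal-weight blend with survival time; the weights, the 24-hour ramp, and the color cutoffs are all illustrative choices:

```python
from dataclasses import dataclass

@dataclass
class Span:
    text: str
    author_reputation: float  # hypothetical score in [0, 1], e.g. share of the author's past edits that survived
    hours_unreverted: float   # how long the span has stood without being reverted

def trust(span: Span, ramp_hours: float = 24.0) -> float:
    """Blend author reputation with survival time; survival counts half its maximum after ramp_hours."""
    age_factor = span.hours_unreverted / (span.hours_unreverted + ramp_hours)
    return max(0.0, min(1.0, 0.5 * span.author_reputation + 0.5 * age_factor))

def color(score: float) -> str:
    """Map a trust score to a background color for the overlay (illustrative cutoffs)."""
    if score >= 0.7:
        return "white"
    if score >= 0.4:
        return "yellow"
    return "orange"
```

Under these assumptions, a month-old sentence from a reputable author renders white, while an anonymous edit made six minutes ago renders orange.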
  • Who cares? 90% of the info on those sites is bogus anyway; it's like trying to fix the perpetually broken!!

  • Wikipedia administrators don't seem to have a clue about the effects of vandalism.

    The time wasted by humans whose job is solely to revert vandalism is irrelevant. There are more than enough people who are willing to do this work, and if they weren't doing it, they would not be contributing useful content to Wikipedia.

    The negative effects are concentrated on the knowledgeable editors who are adding useful new content. There may be 5 to 10 people actively adding content to an article. Each time a

  • is everything that the admin establishment doesn’t agree with. Just like in a state with total censorship.
    And on top of that, the admins often don’t know shit about anything.
    Which is not surprising, considering that they most likely sit in underpants in their basement all day long. Why else would they have so much time to troll around Wikipedia on a deletion spree? Which is obviously not a very mentally healthy thing to do either.

    It’s simple: As long as Wikipedia can at all be controlled b

  • There is a subset of vandalism that a bot can be very good at detecting, but a bot can never handle every kind of vandalism. For example, adding a subtly false statement to a biographical article, with everything spelled correctly, correct grammar, and something that looks like it could be a legitimate source, is difficult even for human editors to recognize as vandalism.

    Adding 1s everywhere or deleting the entire article is very easy to detect.
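The two easy cases can be caught with simple rules. A minimal sketch, assuming whitespace tokenization and hypothetical thresholds (a "1" glued onto a word, like "1Paris", would slip past this tokenizer):

```python
from collections import Counter

def is_blanking(old: str, new: str, keep_ratio: float = 0.05) -> bool:
    """Flag edits that leave less than keep_ratio of the original length."""
    return len(old) > 0 and len(new.strip()) < keep_ratio * len(old)

def inserted_junk(old: str, new: str, min_count: int = 5) -> bool:
    """Flag edits whose added tokens are dominated by one repeated token, e.g. '1' pasted everywhere."""
    added = Counter(new.split()) - Counter(old.split())  # keeps only tokens whose count went up
    if not added:
        return False
    _, count = added.most_common(1)[0]
    return count >= min_count and count / sum(added.values()) >= 0.5
```

Both checks are cheap and have few false positives on cases like these, which is why bots handle them well; neither does anything about the well-written, falsely sourced sentence.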

  • Rogue admins abusing their power? An "in" club? If you have a problem with an admin, provide evidence (a diff of the admin abusing his power) here [wikipedia.org]. Follow the case, argue it out, and the admin will be dealt with. Every admin is elected in, guys. If you think Wikipedia is important enough that all the scary "rogue admins" are actually doing harm, go become a part of the election process. Anyone can vote, and your opinion matters regardless of how many edits you have, or how many articles you've worked on.
  • As the owner of one of the first vandalism-reverting bots out there (although, pattern-wise, tawkerbot2 didn't do nearly as much as CB), the initial thinking there was that if you remove the perceived vandalism almost immediately, people don't get any fun out of vandalizing and stop doing it. There was massive opposition at the outset, but then, as volumes increased, people began to freak out when the bot was non-operational. Yes, it had false positives that needed to be dealt with, but if I recall correctly, statistically
