
reCAPTCHA Hard At Work, Rescuing Fading Texts

sciencehabit writes "Computer scientists have developed a program, called reCAPTCHA, which is being used in lieu of CAPTCHA by several sites, to help digitize old books and newspapers. The reCAPTCHA takes entries from old and faded texts that optical scanners and digital-text readers have trouble with. So every time you solve that string of crooked letters, you may actually be helping historians digitally reconstruct a page from the 1908 New York Times." The Science Now story links to the longer and more informative article at Ars Technica. (We last mentioned this program last year — and now it's good to get some sense of how well it's working.)
This discussion has been archived. No new comments can be posted.

  • Not new (Score:4, Informative)

    by JazzyMusicMan ( 1012801 ) on Thursday August 14, 2008 @09:07PM (#24609239)
    Ticketmaster and other sites have already been doing this for a while. Go to Ticketmaster and search for tickets, and you'll see two words. One is known and the other is unknown. If you don't believe me, try to guess which one they know and misspell the other one on purpose (or don't, this is for historic posterity =) )
  • Re:Not new (Score:3, Informative)

    by Dachannien ( 617929 ) on Thursday August 14, 2008 @09:16PM (#24609357)

    So is the US Patent and Trademark Office, as part of the process of using PAIR [uspto.gov], the Patent Application Information Retrieval system, which lets the public look at information about patent applications that have been published.

  • Image Captchas (Score:4, Informative)

    by pembo13 ( 770295 ) on Thursday August 14, 2008 @09:48PM (#24609613) Homepage
    I've found that implementing a simple "please choose the name of the item seen below" check eliminates a large amount of spam (all?), but it has the problem of not being viable for blind people.
  • Re:Not new (Score:3, Informative)

    by Firehed ( 942385 ) on Thursday August 14, 2008 @10:02PM (#24609743) Homepage

    Do they really? From what I was able to tell, it's not specified as reCAPTCHA anywhere in the window; having looked at the reCAPTCHA site from a development side I could swear that I read that you needed to give credit if developing a custom style for it. Either I'm remembering wrong, they've got a deal, or FB is undergoing one of the stupidest TOS violations ever.

  • Re:Not new (Score:4, Informative)

    by erbmjw ( 903229 ) on Thursday August 14, 2008 @10:21PM (#24609911)
    From the reCAPTCHA FAQ [recaptcha.net]:

    When showing reCAPTCHA to the user, is it possible not to show the reCAPTCHA logo?

    We allow you to customize the theme of reCAPTCHA with our Client API. You are still required to have text on your website which states that you are using reCAPTCHA, however with our theming API, you are free to do this in a way that blends in to your site.

  • by corbettw ( 214229 ) on Thursday August 14, 2008 @10:35PM (#24610039) Journal

    There are multiple libraries for reCAPTCHA already published, all under the MIT License. Just see http://code.google.com/p/recaptcha/ [google.com] for a list of them.

  • Re:Not new (Score:4, Informative)

    by Your Pal Dave ( 33229 ) on Thursday August 14, 2008 @11:29PM (#24610433)

    Quoting from the NPR story [npr.org] which aired earlier today:

    more than 40,000 Web sites -- including popular ones such as Ticketmaster, Facebook and Craigslist -- are using a new kind of security program called reCAPTCHA.

  • by Robotech_Master ( 14247 ) on Thursday August 14, 2008 @11:35PM (#24610477) Homepage Journal

    I've seen one ReCAPTCHA string that was just a distorted entirely illegible blob of ink.

    Just do what I did: click the "refresh" button to the right for a new word pair and enter that one.

  • by Irish_Samurai ( 224931 ) on Thursday August 14, 2008 @11:42PM (#24610525)

    The point is to see what the populace thinks the relation is.

    If you think Google is the end-all, be-all of absolute information, then you already fail.

  • It turns out... (Score:3, Informative)

    by symbolset ( 646467 ) on Friday August 15, 2008 @12:02AM (#24610689) Journal

    That slashdot's Goatse troll server guy proves useful.

    Note: This is not a troll. One of the guys that offers open web services to slashdot trolls is also responsible for considerable development of CAPTCHA breakage and is an eminent Debian developer. This is why I've said that we should respect his efforts despite the unpleasant side effects. The truly brilliant we should grant exceptions from social behavior because they discover things more proper folk would not.

  • Re:Not new (Score:3, Informative)

    by sangreal66 ( 740295 ) on Friday August 15, 2008 @12:55AM (#24611029)

    Do they really? From what I was able to tell, it's not specified as reCAPTCHA anywhere in the window; having looked at the reCAPTCHA site from a development side I could swear that I read that you needed to give credit if developing a custom style for it. Either I'm remembering wrong, they've got a deal, or FB is undergoing one of the stupidest TOS violations ever.

    They do give attribution to reCAPTCHA. You have to click on "What's this?"

    This is a standard security test that we use to prevent spammers from creating fake accounts and spamming users. Our captchas are provided by ReCaptcha

  • Re:RTFA (Score:3, Informative)

    by Psychotria ( 953670 ) on Friday August 15, 2008 @02:23AM (#24611503)

    The authors also tested software designed to crack CAPTCHAs against images created using reCAPTCHA, and found that they failed completely. The authors ascribe this to the fact that the letters in scanned images contain distortions that are not the result of a clean mathematical transformation. User response times were also measured, but there were no significant differences between the time it took users to handle traditional systems and that required to use reCAPTCHA.

  • by RJFerret ( 1279530 ) on Friday August 15, 2008 @02:58AM (#24611685)

    You can also use reCaptcha for your own email address, and be more willing to provide it "publicly" since they'd have to answer the reCaptcha to get to the mailto... reCaptcha mailhide [recaptcha.net]

  • Re:Not new (Score:2, Informative)

    by Anonymous Coward on Friday August 15, 2008 @06:20AM (#24612557)

    That's scary. The way reCAPTCHA works allows the reCAPTCHA server to collect the IPs of reCAPTCHA users (along with the reCAPTCHA-enabled website they are using). If many websites are using reCAPTCHA, it allows users to be tracked as they move through the web, from one reCAPTCHA-enabled website to the next.

    Only if you actually use the JavaScript API. If you want to protect the privacy of your site's users, you are free to use the server side API of your choice. This gives them (at most) a count of how many recaptchas your users have solved. By the way, the recaptcha site provides - amongst others - ready-made server side bindings for PHP, Java, Ruby, Python and Perl.
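
For anyone curious how the two-word scheme described at the top of the thread (one known word, one unknown word) might hang together, here is a rough Python sketch. It is not reCAPTCHA's actual code or API: the function names, the case-insensitive comparison, and the `min_votes=3` agreement threshold are all assumptions made up for illustration. The idea is that only the known (control) word decides pass/fail, and guesses for the unknown word are tallied until enough people agree.

```python
from collections import Counter


def normalize(word):
    """Compare answers case-insensitively, ignoring surrounding whitespace (an assumption)."""
    return word.strip().lower()


def handle_submission(control_answer, control_truth, unknown_guess, votes):
    """Hypothetical handler for one solved word pair.

    The user passes only if the control word matches. The guess for the
    unknown word is recorded as a vote only when the control word was right,
    since that is the only evidence the user is a careful human.
    """
    passed = normalize(control_answer) == normalize(control_truth)
    if passed:
        votes[normalize(unknown_guess)] += 1
    return passed


def consensus(votes, min_votes=3):
    """Return the leading guess once it has at least min_votes, else None.

    min_votes is an illustrative threshold, not a documented value.
    """
    if not votes:
        return None
    word, count = votes.most_common(1)[0]
    return word if count >= min_votes else None


if __name__ == "__main__":
    votes = Counter()
    for guess in ["upon", "Upon", "upan", "upon"]:
        handle_submission("morning", "morning", guess, votes)
    print(consensus(votes))
```

This also shows why misspelling the unknown word on purpose, as suggested above, still gets you through: the unknown word never affects the pass/fail decision, it only adds one dissenting vote to the tally.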
