
 



Google News

Google Buys reCAPTCHA For Better Book Scanning 138

TimmyC writes "This story may interest the Slashdot folk, many of whom use the reCAPTCHA anti-spam service. Well, reCAPTCHA is now owned by Google. Apparently, what attracted Google to reCAPTCHA is that the company has linked its core authentication service with efforts to digitize print books and periodicals. The search giant has a massive (and controversial) effort underway in that area for its Google Books and Google News Archive services. Every time people solve a CAPTCHA from the company, they are also, as a byproduct, helping to turn scanned words into plain text that can be indexed and made searchable by search engines. Interesting times indeed."


  • Well... (Score:4, Interesting)

    by vikhyat ( 1593841 ) on Thursday September 17, 2009 @10:10AM (#29453083)
    This should improve Google's indecipherable CAPTCHA.
  • Re:WTF Summary (Score:1, Interesting)

    by Anonymous Coward on Thursday September 17, 2009 @10:16AM (#29453141)

    As a control, the system sends out one word that it knows the answer to. You don't know which of the two is the unknown word beforehand. Also, I think that the same unknown word is kept in rotation for a couple of iterations just to double-check that it was entered correctly.

    At least, that's how I'd implement it.

  • Good idea, but how? (Score:1, Interesting)

    by Nesa2 ( 1142511 ) on Thursday September 17, 2009 @10:16AM (#29453145)
    ReCAPTCHA is a free service that usually integrates into forums, blogs, and other such anonymous comment-posting services to help eliminate bot spamming. I think they will not use it on Google search pages, but exploit ReCAPTCHA users of all of those sites that do use it already. Sounds to me like a really good idea...

    I'm interested though how they are going to know what a correct entry by a user would be for a scanned word in order to validate it if they only have a scan...
  • by Kokuyo ( 549451 ) on Thursday September 17, 2009 @10:19AM (#29453181) Journal

    Just wait until some soccer mom needs to protect her genius of a brat from all the bad things there are. Latest crusade? A 'bad' word in a CAPTCHA. Just you wait, it will happen.

  • by natehoy ( 1608657 ) on Thursday September 17, 2009 @10:35AM (#29453347) Journal

    Google is doing this in order to prevent spam and to improve OCR. But once OCR is improved to the point where it can read poorer scans, won't spammers be able to use that new technology to eventually defeat CAPTCHA?

    Don't get me wrong, I think this is a marvelous idea, potentially using volunteer labor of humans as OCR to interpret a book one poorly-scanned word at a time. But it does seem to have the side effect of eventually destroying the original purpose of what they bought. Maybe CAPTCHA is worth more as a "crowdsourced OCR solution" than it ever was as spam prevention anyway...

  • by Rik Sweeney ( 471717 ) on Thursday September 17, 2009 @10:54AM (#29453521) Homepage

    Funny you should say that

    http://mailhide.recaptcha.net/ [recaptcha.net]

  • by Anonymous Coward on Thursday September 17, 2009 @10:59AM (#29453591)

    If spammers figure out how to defeat reCAPTCHA, Google will probably hire them to automatically digitise books; that probably pays a lot better than spamming. You can think of it as trying to set all the ingenuity of the world's spammers working at the same problem...

  • Re:Mod up (Score:5, Interesting)

    by mrcaseyj ( 902945 ) on Thursday September 17, 2009 @11:06AM (#29453667)

    I agree that the idea is ingenious. But on the only one I ran into, the word was completely indecipherable. I don't mean that it was really hard, I mean that it was a word so thoroughly mangled that it was clearly impossible to read by anyone, especially without context. The lack of context is one of the big weaknesses of the system. When a word is unclear, it's the words around it that give critical clues to what it is.

  • Re:WTF Summary (Score:3, Interesting)

    by slyborg ( 524607 ) on Thursday September 17, 2009 @11:08AM (#29453687)

    I still don't get it. How do you know that the person correctly identified the second word? I don't see how a priori decoding the first word means that the second was correct. I would expect that the individual bad data rate from this technique would be substantial.

    I do enjoy the fact that Google, a ridiculously profitable company by virtue of its near-monopoly on Internet search advertising, is using the public who pays it via these ad impressions to do its work for free, and using the technique invented and used by spammers to crowd-source solve CAPTCHAs to get into Gmail and the like!

  • Re:WTF Summary (Score:3, Interesting)

    by Anonymous Coward on Thursday September 17, 2009 @12:50PM (#29455137)

    Interesting you should say that.

    Unfortunately, it won't work - 4chan already ruined it for everyone.

    http://musicmachinery.com/2009/04/27/moot-wins-time-inc-loses/
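
Several comments above ask how an answer for a scanned word can be validated when nobody knows the right answer yet. The control-word idea from the first "Re:WTF Summary" comment can be sketched roughly as follows. This is only an illustration of that commenter's description, not reCAPTCHA's actual implementation: the class name `TwoWordCaptcha`, the `votes_needed` threshold, and the case-insensitive matching are all assumptions made for the example.

```python
from collections import Counter, defaultdict


class TwoWordCaptcha:
    """Hypothetical sketch of a control-word plus unknown-word CAPTCHA."""

    def __init__(self, votes_needed=3):
        # Assumed threshold: how many matching answers settle an unknown word.
        self.votes_needed = votes_needed
        self.votes = defaultdict(Counter)  # unknown word id -> guess counts
        self.resolved = {}                 # unknown word id -> accepted text

    def submit(self, control_answer, control_guess, unknown_id, unknown_guess):
        """Return True if the user passes; record a vote for the unknown word."""
        # The user passes or fails on the control word alone.
        if control_guess.strip().lower() != control_answer.strip().lower():
            return False  # failed the known word, so the other guess is ignored

        # The guess for the unknown word is only a vote, never a pass/fail test.
        if unknown_id not in self.resolved:
            guess = unknown_guess.strip().lower()
            self.votes[unknown_id][guess] += 1
            if self.votes[unknown_id][guess] >= self.votes_needed:
                self.resolved[unknown_id] = guess
        return True
```

The point of the design is that a single user's reading of the unknown word is never trusted on its own. It only counts once the same user has proven themselves on the control word, and it is accepted only after enough independent users agree, which is how a scheme like this could keep the bad-data rate down.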
