Google Buys reCAPTCHA For Better Book Scanning - Slashdot


Google Buys reCAPTCHA For Better Book Scanning

Posted by CmdrTaco on Thursday September 17, 2009 @10:06AM from the when-spammers-give-you-lemons dept.

TimmyC writes "This story may interest the Slashdot folk, many of whom use the reCAPTCHA anti-spam service. Well, reCAPTCHA is now owned by Google. Apparently, what attracted Google to reCAPTCHA is that the company has linked its core authentication service with efforts to digitize print books and periodicals. The search giant has a massive (and controversial) effort underway in that area for its Google Books and Google News Archive services. Every time people solve a CAPTCHA from the company, they are also, as a byproduct, helping to turn scanned words into plain text that can be indexed and made searchable by search engines. Interesting times indeed."


Well... (Score:4, Interesting)

by vikhyat ( 1593841 ) writes: on Thursday September 17, 2009 @10:10AM (#29453083)

This should improve Google's indecipherable CAPTCHA.

Re:WTF Summary (Score:1, Interesting)

by Anonymous Coward writes: on Thursday September 17, 2009 @10:16AM (#29453141)

As a control, the system sends out one word that it knows the answer to. You don't know which of the two is the unknown word beforehand. Also, I think that the same unknown word is kept in rotation for a couple of iterations just to double-check that it was entered correctly.
At least, that's how I'd implement it.
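
A minimal Python sketch of the scheme described above, assuming a simple vote threshold. The class name, the `votes_needed` parameter, and the agreement rule are all hypothetical choices made for illustration; nothing here is reCAPTCHA's actual implementation.

```python
from collections import Counter, defaultdict


class TwoWordChallenge:
    """Toy model of a control-word CAPTCHA. All names are hypothetical."""

    def __init__(self, votes_needed=3):
        # Assumption: an unknown word is accepted once this many
        # control-passing users have typed the same text for it.
        self.votes_needed = votes_needed
        self.votes = defaultdict(Counter)  # unknown image id -> guess counts
        self.resolved = {}                 # unknown image id -> agreed text

    def submit(self, control_answer, control_guess, unknown_id, unknown_guess):
        """Return True if the user passes; record their unknown-word guess."""
        # Only the control word (whose answer is known) decides pass/fail.
        if control_guess.strip().lower() != control_answer.strip().lower():
            return False
        # The user cannot tell which word is the control, so a correct
        # control answer earns their other answer one vote.
        guess = unknown_guess.strip().lower()
        self.votes[unknown_id][guess] += 1
        if self.votes[unknown_id][guess] >= self.votes_needed:
            self.resolved[unknown_id] = guess
        return True
```

Keeping the same unknown word in rotation until enough independent users agree is what would filter out individual typos or deliberate junk, at the cost of showing each scan several times.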

Good idea, but how? (Score:1, Interesting)

by Nesa2 ( 1142511 ) writes: on Thursday September 17, 2009 @10:16AM (#29453145)

reCAPTCHA is a free service that usually integrates into forums, blogs, and other such anonymous comment-posting services to help eliminate bot spamming. I don't think they will use it on Google search pages; instead they will exploit the users of all the sites that already use reCAPTCHA. Sounds to me like a really good idea...

I'm curious, though, how they are going to know what the correct entry for a scanned word is, in order to validate a user's answer, if all they have is a scan...

I'm real giddy about this (Score:2, Interesting)

by Kokuyo ( 549451 ) writes: on Thursday September 17, 2009 @10:19AM (#29453181) Journal

Just wait until some soccer mom needs to protect her genius of a brat from all the bad things there are. Latest crusade? A 'bad' word in a CAPTCHA. Just you wait, it will happen.

Won't this eventually defeat the purpose? (Score:4, Interesting)

by natehoy ( 1608657 ) writes: on Thursday September 17, 2009 @10:35AM (#29453347) Journal

Google is doing this in order to prevent spam and to improve OCR. But once OCR is improved to the point where it can read poorer scans, won't spammers be able to use that new technology to eventually defeat CAPTCHA?
Don't get me wrong, I think this is a marvelous idea, potentially using volunteer labor of humans as OCR to interpret a book one poorly-scanned word at a time. But it does seem to have the side effect of eventually destroying the original purpose of what they bought. Maybe CAPTCHA is worth more as a "crowdsourced OCR solution" than it ever was as spam prevention anyway...

Re:maybe they should use CAPTCHAs... (Score:4, Interesting)

by Rik Sweeney ( 471717 ) writes: on Thursday September 17, 2009 @10:54AM (#29453521) Homepage

Funny you should say that:
http://mailhide.recaptcha.net/

Re:Won't this eventually defeat the purpose? (Score:1, Interesting)

by Anonymous Coward writes: on Thursday September 17, 2009 @10:59AM (#29453591)

If spammers figure out how to defeat reCAPTCHA, Google will probably hire them to automatically digitise books; that probably pays a lot better than spamming. You can think of it as trying to set all the ingenuity of the world's spammers working at the same problem...

Re:Mod up (Score:5, Interesting)

by mrcaseyj ( 902945 ) writes: on Thursday September 17, 2009 @11:06AM (#29453667)

I agree that the idea is ingenious. But on the only one I ran into, the word was completely indecipherable. I don't mean that it was really hard, I mean that it was a word so thoroughly mangled that it was clearly impossible to read by anyone, especially without context. The lack of context is one of the big weaknesses of the system. When a word is unclear, it's the words around it that give critical clues to what it is.

Re:WTF Summary (Score:3, Interesting)

by slyborg ( 524607 ) writes: on Thursday September 17, 2009 @11:08AM (#29453687)

I still don't get it. How do you know that the person correctly identified the second word? I don't see how a priori decoding the first word means that the second was correct. I would expect that the individual bad data rate from this technique would be substantial.
I do enjoy the fact that Google, a ridiculously profitable company by virtue of its near-monopoly on Internet search advertising, is using the public who pays it via these ad impressions to do its work for free, and using the technique invented and used by spammers to crowd-source solve CAPTCHAs to get into Gmail and the like!

Re:WTF Summary (Score:3, Interesting)

by Anonymous Coward writes: on Thursday September 17, 2009 @12:50PM (#29455137)

Interesting you should say that.
Unfortunately, it won't work - 4chan already ruined it for everyone.
http://musicmachinery.com/2009/04/27/moot-wins-time-inc-loses/
