Fill Out CAPTCHAs, Digitize Books At The Same Time
alphadogg wrote with a link to a Networld article about a noble endeavor: putting CAPTCHAs to work for the good of humanity. A scientist at Carnegie Mellon is looking to create a new type of security check that will assist in a project meant to digitize and make searchable text from books and printed materials. Above and beyond that, the offering would probably be more secure than most current systems. "Instead of requiring visitors to retype random numbers and letters, they would retype text that otherwise is difficult for the optical character recognition systems to decipher when being used to digitize books and other printed materials. The translated text would then go toward the digitization of the printed material on behalf of the Internet Archive project."
Verification? (Score:5, Insightful)
CAPTCHAs work because the computers sending them already know what the text says; they start with it in text form and change it into a hard-to-read image. In the system discussed in the article, how will the computer verify that the user response actually matches the text? Sure, it could compare the response to its best guess, but if a program trying to guess the text was as sophisticated as the guessing computer, the guess would match.
I imagine the computer sending the picture of the image of hard-to-read text will further obfuscate the image in a way that makes it even more difficult for the computer on the receiving end to decipher, but the article doesn't acknowledge that this is one of the first logical questions in conceiving of / implementing this system in a functional way. The article really should cover this...
Re:Verification? (Score:5, Informative)
Official reCAPTCHA site (Score:5, Informative)
I originally missed the link to the official site - D'oh. The article also doesn't mention that the system is already in use! http://recaptcha.net/ [recaptcha.net]
Re:Official reCAPTCHA site (Score:5, Informative)
Re: (Score:3, Interesting)
As others mentioned, this system gives you a known then an unknown, though I think it's stupid that it makes things even more difficult by putting a slash through it and making it wavy. Helloo, if your system had a hard time recognizing it, why do you want to make it harder to recognize? I saw several in the examples in which the word was nonen
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
There's an interesting solution to this problem -- the "scientist at Carnegie Mellon" is Luis von Ahn [cmu.edu] who was recently awarded a MacArthur genius award. In optical recognition tasks like this where the "true" answer is not known, how do you verify that a human agent correctly did the recognition? Just see if a bunch of other users type the same thing. It's a clever twist on consensus voting, and was recently snatched up by Google as "Google image labeler" here [google.com].
It was also previously available as The ESP Game [espgame.org], from...(wait for it)...Carnegie Mellon
Re: (Score:1)
Still a dumb solution for a CAPTCHA (Score:2)
It's just about the most idiotic idea I've ever heard for a _CAPTCHA_. Here's why:
1. What about the first person that sees any given word? Do you let them get in regardless of what they type (remember, there is no consensus yet about that word)? Or will I have to wait another 2 weeks to see if my post is
Re: (Score:2)
As you probably noticed... (Score:2)
I dunno... it seems to me that, au contraire, you just described a way to make it easier for bots to pass. Magna cum laude.
You even have the exact way to tune it for maximum effect: the guys with the same OCR software are more likely to pass. Even if you don't exactly
Re: (Score:2)
Still a great solution for digitizing books (Score:1)
Maybe you should actually read the comments you replied to...which quote the reCAPTCHA website:
Re: (Score:1)
sue them!
j/k
Re: (Score:2)
I'm not going to type in a captcha and just wait around on the page for an hour until X other people try to answer it... This system of yours gives priority to the answers of the first few people that see it, which may well be the OCR system of some spammers.
Even more, once you've got the first few answers, then it's just a typical captcha, as you already have had it entered, and know
Re: (Score:1)
I hate you internet.
(I'm sorry I didn't mean it we'll never fight again)
Re: (Score:2, Informative)
Considering all the other people who asked that question, they really needed to make that clear in their press releases.
So if you want to screw with it, all you have to do is intentionally get exactly one word wrong each time. Yeah, it will often take two tries to get it right, but it's not like CAPTCHAs usually work fine on one try anyways... And hey, if you just try for only one word (and leave the other blank), you will end up on average typing the same amount.
The article makes comparisons to SETI@H
Re:Verification? (Score:5, Insightful)
Well... sort of. Multiple agreements are required before the system will accept that it knows the spelling of a previously unknown word. So you're not going to singlehandedly subvert the system; at the very least you need a cabal of friends. But with millions of words available in the system, the chance that you and a bunch of friends will all get the same word and write in the same bogus data is pretty close to zero. I'm not saying this system is impossible to game, but I think it'd be a heck of a lot easier (and more rewarding, if it's the sort of thing that floats your boat) to vandalize Wikipedia instead.
Re: (Score:2)
It doesn't need to be planned. For instance if the given text is very close to something dirty, a lot of people will get the same idea and will put in the same text. And if you doubt the power pranksters like this can have, look back at the Google bombing episodes.
The Wikipedia is a bit different as you have to make an effort here. People are not required to write Wikipedia articles to sign up for an email account or post on a message board. If they were, the resulting information would be even less c
Re:Verification? (Score:5, Funny)
e.g.,
12345
l1il1
The captcha software knows the "12345"
but it doesn't know the "l1il1". A human could figure out both.
But spammer captcha deciphering can figure out 12345, and is allowed to incorrectly guess 11ii1 for the 2nd part. End result is
Re:Verification? (Score:5, Insightful)
Re:Verification? (Score:4, Informative)
Yeah, but it's not like you're only allowed to present a given unknown word once. Present it many times, and use the word with the most hits.
--Rob
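The "present it many times and use the word with the most hits" idea can be sketched in a few lines of Python. This is only an illustration: the function name `consensus` and the `min_votes` / `min_share` thresholds are made up here, and nothing in the article says what numbers the real system uses.

```python
from collections import Counter
from typing import List, Optional


def consensus(votes: List[str], min_votes: int = 3, min_share: float = 0.6) -> Optional[str]:
    """Return the most common answer once enough people agree, else None."""
    if len(votes) < min_votes:
        return None  # too few answers so far; keep showing the word
    word, count = Counter(votes).most_common(1)[0]
    # Require a clear majority so one prankster cannot decide the spelling.
    return word if count / len(votes) >= min_share else None


print(consensus(["upon", "upon"]))          # not enough votes yet
print(consensus(["upon", "upon", "open"]))  # two of three agree
```

Until `consensus` returns a word, the scan would simply stay in rotation as an unknown.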
Re: (Score:2)
True. But captchas generally require prompt feedback; you want to know right away whether or not the user has passed the Turing test, not leave it unknown for a couple hours until a sufficient number of other users have submitted their answers to establish a consensus.
Re: (Score:1)
Re: (Score:3, Interesting)
Re: (Score:2)
Re: (Score:2, Insightful)
Re: (Score:2)
KUKUKUK (or some other random permutation of K & U in the desired length)
This way, you can
a) check all the K's for validity, if so, then ACCEPT
b) Break up words so that they aren't as easily recognizable
c) Still allows you to compare different people's answers for U's, as you aren't using them for validity
d) I would think that this method would reduce the number of "Jackasses" because you ne
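A rough Python sketch of the K/U interleaving idea above, where K marks a character the server already knows and U marks one it does not. The helper names `make_pattern` and `grade` are invented for this example, and it assumes answers are compared character by character.

```python
import random


def make_pattern(length: int, n_unknown: int, rng: random.Random) -> str:
    """Return a shuffled string of 'K' and 'U' marks, e.g. 'KUKKU'."""
    marks = ["U"] * n_unknown + ["K"] * (length - n_unknown)
    rng.shuffle(marks)
    return "".join(marks)


def grade(pattern: str, known_chars: str, typed: str):
    """Accept only if every K position matches; return (accepted, U guesses by position)."""
    if len(typed) != len(pattern):
        return False, {}
    known = iter(known_chars)  # known characters, in the order their K marks appear
    guesses = {}
    for i, (mark, ch) in enumerate(zip(pattern, typed)):
        if mark == "K":
            if ch != next(known):
                return False, {}  # failed a known character: reject, discard guesses
        else:
            guesses[i] = ch  # unknown character: keep as a vote, never graded
    return True, guesses


print(make_pattern(7, 3, random.Random()))
print(grade("KUKUK", "abc", "axbyc"))
```

The guesses returned for the U positions could then be compared across users, as in point (c).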
Re: (Score:2)
You do realize that "a number of other people" here could refer to even several dozen or several hundred?
So clearly it's not going to be just one person that determines the answer for unknown captchas.
Re: (Score:2)
Re: (Score:1)
Re: (Score:1, Redundant)
That was my thought. I suppose you could let the first five people through automatically, then use their answers to check everyone else; but what's the point of a CAPTCHA that lets a certain minimum portion through?
Turning people away when they actually got it right is worse, though; that way you potentially lose customers in trying to fight spam.
Seems like an interesting idea, but I don't see how it can work...
Re: (Score:1)
And poster above has explained nicely how it works. Thanks. They could have put that in the article... (or summary!)
Re: (Score:1)
Re: (Score:2)
Re: (Score:2, Informative)
Re: (Score:2)
Re: (Score:2)
"I think it's a brilliant idea -- using the Internet to correct OCR mistakes,"
Suggesting that the words have been OCR'd, and that the user is correcting the mistakes. This goes on to suggest that there is a margin of error that takes into account OCR mistakes but will allow the corrected text.
With a little imagination, it's easy to think of many permutations to this, along with the idea of just asking for a new captcha if the first one doesn't work.
The article also states there's a speakabl
Exactly what I was wondering (Score:2)
Am I missing something fundamental here?
Re:Exactly what I was wondering (Score:4, Informative)
Re:Exactly what I was wondering (Score:4, Insightful)
Also it wouldn't take much to add some grammar to pad the guessing. While we see two words, the system sees them in at least two contexts.
Obviously it has the actual dictionary to help it basically spell check the words we submit to it. If the words we give it are complete garbage, it's unlikely to go for it. Which is where knowing that "niis" needs a correction comes in.
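For what it's worth, a dictionary sanity check like the one guessed at above might look something like this in Python. It is pure speculation about the real system; the tiny `WORDS` list and the `plausible` helper are stand-ins, and `difflib.get_close_matches` from the standard library does the fuzzy lookup.

```python
import difflib

# Stand-in word list; a real checker would load a full dictionary.
WORDS = ["receive", "morning", "upon"]


def plausible(answer: str, cutoff: float = 0.6) -> list:
    """Return dictionary words close to the typed answer (empty list = looks like garbage)."""
    return difflib.get_close_matches(answer.lower(), WORDS, n=3, cutoff=cutoff)


print(plausible("recieve"))  # a near miss still maps to a real word
print(plausible("xqzt"))     # nothing close, so treat it as suspect
```

A submission with no close dictionary match would not have to be rejected outright; it could just count for less when answers are compared.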
Re: (Score:2)
Computer: [unreadable scribble]
User: Bartholomew
Computer: Please try again.
[Captcha image]
User:Red49
Computer: Access Granted!
Computer to OCR Central: [unreadable scribble]="Bartholomew"
Re: (Score:1)
"But if a computer can't read such a CAPTCHA, how does the system know the correct answer to the puzzle? Here's how: Each new word that cannot be read correctly by OCR is given to a user in conjunction with another word for which the answer is already known. The user is then asked to read both words. If they solve the one for which the answer is known, the system assumes their answer is correct for the new one. The system then gives the new image to a
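The known-word/unknown-word check described in that quote can be sketched roughly as follows. This is not the project's actual code; `check_pair` and `normalize` are hypothetical names, and the sketch assumes answers are compared case-insensitively.

```python
def normalize(text: str) -> str:
    """Compare answers case-insensitively and ignore surrounding whitespace."""
    return text.strip().lower()


def check_pair(known_answer: str, typed_known: str, typed_unknown: str, votes: list) -> bool:
    """Grade only the control word; record the other word as a vote if the user passed."""
    passed = normalize(typed_known) == normalize(known_answer)
    if passed and typed_unknown.strip():
        votes.append(normalize(typed_unknown))
    return passed


votes_for_scan = []
print(check_pair("morning", "Morning", "upon", votes_for_scan))   # True: control word matches
print(check_pair("morning", "mourning", "open", votes_for_scan))  # False: vote is ignored
print(votes_for_scan)                                             # ['upon']
```

The user gets an immediate pass or fail from the control word alone, so nobody waits for other people's answers; the unknown word just accumulates votes in the background.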
Re: (Score:1)
Look up the human computation google talk (Score:2)
Re: (Score:1)
Re: (Score:2, Informative)
Working as intended (Score:1)
In a hole in the ground there lived a penis. Not a nasty, dirty, wet hole, filled with the ends of worms and an oozy smell, nor yet a dry, bare, sandy hole with nothing in it to sit down on or to eat: it was a hobbit-hole, and that means comfort.
(yeah, I'd trust the internet community to digitize my books. why don't we just cut out the middle-man, and create a wiki-gutenberg project?)
Better links (Score:5, Informative)
Official reCAPTCHA site [recaptcha.net]
Hide your email address with reCAPTCHA [recaptcha.net] (super easy!)
A more detailed blog post about how the system works [blogspot.com]
Disclaimer: I work with Luis von Ahn [cmu.edu], who's the professor running the reCAPTCHA project.
Re:Better links (Score:5, Interesting)
Re: (Score:1)
Re: (Score:2)
Brings to mind a dystopian (and fictional) future where robots lord it over us but still need us to process large amounts of data for them. Like the Matrix, but without violating the laws of Thermodynamics. That'd make a cool SF novel, I think...
idea (Score:1)
Re: (Score:1)
More than just digitizing text (Score:3, Informative)
http://yro.slashdot.org/article.pl?sid=07/04/03/2211258 [slashdot.org]
I believe amazon.com has filed a patent for a solution to this problem which attributes every annotation input to a unique user ID. They then claim to use the average accuracy over the history of that user to whittle away the worst answers (2 out of 10, I think the patent says).
I'm sure some form of quality control/check will be needed, and I wonder if such a solution would infringe on this patent.
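To make the idea concrete, here is a small Python sketch of trimming answers by each user's track record before taking a vote. It follows only the description above, not the patent text; `trimmed_consensus`, the accuracy table, and the default of dropping 2 answers are all illustrative assumptions.

```python
from collections import Counter


def trimmed_consensus(answers, accuracy, drop=2):
    """answers: list of (user_id, text); accuracy: user_id -> past accuracy in [0, 1].

    Discard the `drop` answers from the least reliable users, then majority-vote.
    """
    # Users with no history are treated as accuracy 0.0 (an assumption).
    ranked = sorted(answers, key=lambda a: accuracy.get(a[0], 0.0), reverse=True)
    kept = ranked[:max(len(ranked) - drop, 0)]
    if not kept:
        return None
    return Counter(text for _, text in kept).most_common(1)[0][0]


answers = [("a", "upon"), ("b", "upon"), ("c", "spam"), ("d", "spam"), ("f", "spam")]
accuracy = {"a": 0.9, "b": 0.95, "c": 0.2, "d": 0.1, "f": 0.5}
print(trimmed_consensus(answers, accuracy))  # upon
```

In this made-up data a plain majority vote would pick "spam" (3 votes to 2), while dropping the two least reliable users flips the result to "upon".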
Re: (Score:1)
This would also be a great approach to solving captchas on other sites. Wanna buy tickets with a bot but have a captcha in your way? Set up a third-party captcha server to have humans solve your captchas for you!
Booger (Score:2, Insightful)
Re: (Score:1)
Re: (Score:1)
But then a bot may exploit it by passing on until it finds one it can process.
Re: (Score:1)
How it could work (Score:3, Insightful)
Another method might be to separate out the un-OCRable letters from words and sprinkle them with known letters, though this might be less effective since people can often recognize words far better than individual letters. If one or two letters in a word cannot be interpreted, a person can often still read the entire word.
A spam tactic? (Score:1)
Re: (Score:2)
Re: (Score:2)
Really? How do you know this? Can you give an example of a porn site that asks for captchas? If not, it's an urban legend.
I've seen this suggested as an attack on captchas, but never heard of any site that put it into practice. Probably it is simpler to pay some third-world computer sweatshop worker to solve hundreds of them per hour for a few dollars a day. But that's equally a conjecture.
Dodgy free porn s
A better scheme (Score:1)
Re: (Score:2, Informative)
So... If you thought that CAPTCHAs were hard... (Score:2)
Mushed text with letters that slide into each other, bad lighting and every other kind of bad scanning you can imagine. Hell, you'd be lucky if you can recognize letters at all.
Question is, if the machine couldn't figure out what the word is, how will it verify your answer? Is it going to be something along the lines of "by the popular vote"?
Something is very not right in all this.
A pain for users (Score:2, Insightful)
Re: (Score:1)
Re: (Score:1)
Re: (Score:2)
Here's an early test phrase... (Score:2)
This explains (Score:1, Funny)
Type: Miserable Failure
Thankyou, click here to proceed.
Amazon's Mechanical Turk (Score:1)
http://www.mturk.com/ [mturk.com]
How stupid (Score:1)
That's the dumbest most retarded (traditional sense of the word) thing that I've ever heard.
Missed opportunity (Score:1)
If someone can write a program to solve the distorted images of OCR-unreadable words, don't you just hire that guy to do your OCR and get out of the CAPTCHA business?
Image spam (Score:5, Interesting)
CAPTCHA+CAPTCHA (Score:1, Redundant)
Hmmm, That Looks Like A... (Score:3, Funny)
You're all missing the point (Score:1)
CAPTCHAs are bad design (Score:2)
This sounds like it won't work (Score:1)
And if they already know what it says, then why would they need someone else to type it for the first time?
the extent of how academics can be so out of touch with reality.
Source Material (Score:2)
World's Best CAPTCHA (Score:2)
www.hotcaptcha.com [hotcaptcha.com]
A captcha doesn't have to function as a password (Score:1)
However the Iron Internet Law of "lolz > human decency" applies ... and we can look forward to books being translated as "chucknorrischucknorrischucknorrischurknorris..."
Great CAPTCHA solution to solve people not RTFA! (Score:5, Interesting)
We should put a CAPTCHA system on slashdot:
When you want to post, you get to type in a CAPTCHA. The image for this is generated in this way:
- The links to the article/s actually link to a page with a javascript wrapper that loads the article text, but replaces certain words with the graphical representation of that word, in the form of a CAPTCHA.
- These words form a phrase that the user must type in if he wants to post. There are different combinations of phrases selected from the article, and each poster gets one randomly.
This technology should be called CAPSSAA (for Completely Automated Public Stupidity test to tell Slashdotters and Assholes Apart)
Mod parent up (Score:1)
Re:Great CAPTCHA solution to solve people not RTFA (Score:2)
"Page 13, Line 4, Word 5, Letter 2", after ending the first level...
Nothing that a Hex editor operation in the
Re: (Score:2)
Thank you for the good memories
Re:Great CAPTCHA solution to solve people not RTFA (Score:2)
I believe it is doomed to fail.
Re:Great CAPTCHA solution to solve people not RTFA (Score:1)
Re: (Score:1)
BUSH IS AN IDIOT
then you can leave off the Obama part.
Oh, come on, somebody mod this funny - it's even on-topic. Puhleeez?
Re: (Score:3, Funny)
Oh! You mean the "E. Plebnista?"
Re: (Score:2)