Researchers Build An AI That's Better At Reading Lips Than Humans (bbc.com) 62
An anonymous reader quotes the BBC:
Scientists at Oxford say they've invented an artificial intelligence system that can lip-read better than humans. The system, which has been trained on thousands of hours of BBC News programs, has been developed in collaboration with Google's DeepMind AI division. "Watch, Attend and Spell", as the system has been called, can now watch silent speech and get about 50% of the words correct. That may not sound too impressive - but when the researchers supplied the same clips to professional lip-readers, they got only 12% of words right...
The system now recognizes 17,500 words, and one of the researchers says, "As it keeps watching TV, it will learn."
Re: (Score:3)
Well, there is "Bad Lip Reading" - their videos are usually pretty funny.
Re: (Score:3)
"the same clips to professional lip-readers"
ok, who else didn't know that there are "professional" lip readers?
The police use them from time to time (on surveillance videos). I imagine there are other uses as well.
17 years too late (Score:5, Insightful)
I'm sorry Dave, I'm afraid I can't do that.
Re: (Score:3)
Great way to get flushed down the airlock! [n/t] (Score:2)
N/T
Re: Maybe /. needs an AI ... (Score:1)
with all the AI job obsolescence going on the universal income one is pretty much relevant
perfect opportunity (Score:3)
Seeing as there's so much closed-captioning going on, they've got an enormous volume of material to train their neural network on.
I've done this sort of thing before, and often finding a large set of quality training material is a significant challenge.
Getting half the words correct, then feeding that into a grammar / context engine should yield very close to 100% accuracy. That's what deaf (and hearing impaired) lip readers have to do since the stated 12% initial recognition is about right. They have to stay very focused on the speaker and make heavy use of context to work out what's being said. And that's a perfect job for a computer.
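To make the "grammar / context engine" idea concrete, here is a minimal sketch, not what the Oxford/DeepMind system actually does. It assumes the lip reader hands over a word list with `None` wherever it could not recognize a word, and it fills each gap with the word that most often followed the previous word in some training text. The names `train_bigrams` and `fill_gaps` and the two-sentence corpus are made up for illustration.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count, for every word, which words follow it in the training text."""
    following = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            following[prev][nxt] += 1
    return following


def fill_gaps(words, following):
    """Replace each None with the most common follower of the previous word.

    A gap is left as None when there is no previous word or when the
    previous word never appeared in the training text.
    """
    result = []
    for word in words:
        if word is None and result and result[-1] in following:
            word = following[result[-1]].most_common(1)[0][0]
        result.append(word)
    return result


# Hypothetical training text and a half-recognized sentence.
corpus = ["the prime minister spoke today", "the prime minister left early"]
model = train_bigrams(corpus)
guess = fill_gaps(["the", "prime", None, "spoke", None], model)
print(guess)
```

A real context engine would look at more than one preceding word and would weigh its guess against how confident the lip reader was, so treat this as an illustration of the principle only.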
Re: (Score:2)
Re: (Score:2)
The closed-captioning does speech-to-text, not lip reading.
Sure, but if it did both, the error rate would go way down.
Re: (Score:2)
Re: (Score:2)
Also consider how frequently the captions differ from the actual spoken words.
Re: (Score:2)
Re: (Score:2)
Yes, I understand. But the fact that the captions and the spoken words often differ limits the effectiveness of combining captions and lip reading to reduce the error in machine translations. It doesn't matter much why the captions and the spoken words differ.
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Closed Captioning is the transmission of text of what is being said along with the video and audio stream. It's up to the receiver to do text to speech.
The benefit of CC here is that you have the "problem" (the video of the speaker) AND the "answer" (the text that they spoke) to work with, and this is precisely what you require to train a neural network. A large volume of problems and correct solutions. "When you get THIS input, you are suppose
Re: (Score:2)
Re: (Score:2)
Would each closed-captioned syllable or word need to be manually synchronized with the video first? Or can the training be done without it?
Getting half the words correct, then feeding that into a grammar / context engine should yield very close to 100% accuracy.
But this AI is already using context to some degree. The article gives the example of "Prime Minister" for instance, where the AI knows that if the word "Prime" is read on their lips, that the word "Minister" will probably follow. Also, the AI has been trained in one context alone, which means that the context is already taken into account. For instance, if the same anch
straight from the related links (Score:1)
https://tech.slashdot.org/story/16/11/25/1146258/googles-deepmind-made-an-ai-watch-close-to-5000-videos-so-that-it-surpasses-humans-in-lip-reading?sdsrc=rel
But the wild card walks in (Score:1)
Sees the computer AI progressing in its research and decides to replace the movies being watched with the complete collection of Gojira monster films that were dubbed in English with hardly any syncing at all, circa the 1960s, followed by Chinese martial arts movies full of lines like "Yaaaaa!", "Huh?" and "Prepare to die!"
The icing on the cake is when he throws in an Inspector Clouseau film
The surveillance state (Score:4, Insightful)
The surveillance state is coming in its pants thinking about all the additional conversations they'll be able to monitor now.
Time to break out the bandannas and cough-masks....soon it'll be fashionable to wear them in public!
Re: (Score:3, Insightful)
soon it'll be fashionable to wear them in public!
And illegal
That cry of dismay ... (Score:2)
That cry of dismay was the sound of thousands of blind gynecologists realizing they will be out of a job reading lips. :-)
Of course the reality is grim - even more surveillance by marketers and the state - especially with TVs and webcams and (if you believe Trump) microwaves watching everything you say and do.
Re: (Score:2)
if you believe Trump
You should! He never lies, and he's always right
Except when his lips move ....
Re: (Score:2)
Professional lip readers are bunk. (Score:3)
Go compare this to a deaf person that reads lips. I know of literally thousands that never miss a single spoken word as long as they're looking at the speaker's mouth.
Source: Camfrog, where there are fucktons of deaf people communicating with those with hearing. We speak after getting their attention with a hand signal, they read our lips and reply with zero issues.
Re: (Score:2)
This is true. I once had a conversation with someone and was very surprised to later learn that the person was completely deaf. I had no clue.
Based on "2001", I thought it would be better (Score:2)
Or was Frank Poole killed because HAL thought they were going to unplug the "Mammary Circus" and that was basically the only DVD the three of them could agree on watching?
need good info to train the AI (Score:2)
Round peg, meet round hole (Score:4, Interesting)
Re: (Score:2)
Because Berkeley lied when they said that they had to provide transcripts or remove the material. Section 107 of the copyright act 1976 [copyright.gov] allows for fair use for teaching materials, and this allows 3rd parties to make available all such materials in more accessible forms, and for Berkeley to use the results of such work.
They weren't interested in doing this. It's about monetization and artificial scarcity, pure and simple. This was just a smokescreen to remove the material.
The blind will be using TTS screen
Re: (Score:2)
Re: (Score:2)
Why don't they offer to run this against the thousands of hours of course videos that Berkeley just pulled due to ADA? Google gets massive training material, Berkeley gets free transcripts, and the material stays online. Everyone wins...
Good idea, but unfortunately it won't work in this case. Many of UCBerkeley's lecture videos only show the slides and you hear the lecturer talk. See, for example, https://www.youtube.com/watch?... [youtube.com].
Learning through TV (Score:2)
When TV was first being introduced as a consumer product, one of the selling points of the idea was that people would be able to learn by watching it. If this works out as well as that, then the system will only be able to recognize when someone is uttering lines from commercials.
Re: (Score:2)
Spying Concerns (Score:1)
At least I know it won't be able to read my lips. You see, I speak American, not English.
Not surprising (Score:2)
Humans are very difficult to read.
Try this line, Mr. AI lipreader (Score:2)
Did he just say "No new taxes," or did he say "No Newt[Gingrich] Axes" ?
Heck you were even told, prior to that line, "read my lips," so you got no excuses.
Duplicate, and old (Score:2)
https://tech.slashdot.org/story/16/11/25/1146258/googles-deepmind-made-an-ai-watch-close-to-5000-videos-so-that-it-surpasses-humans-in-lip-reading
Quiet Man (Score:1)
Maybe we'll finally find out what John Ford told Maureen O'Hara to say to John Wayne...a secret all three took to their graves...