YouTube Makes Captioning Available To All

adeelarshad82 writes "Google's YouTube announced that it has moved its automatic speech-recognition and closed-captioning technology out of beta and has now made it available to the YouTube community at large. Most, if not all, YouTube videos now include a 'CC' button that, if pressed, will automatically generate the closed-captioning technology. The technology processes the audio feed using the speech-recognition technology used in the core voice search feature that has also been built into the Android voice search feature, the GOOG-411 phone search, and other products."

  • by bennomatic ( 691188 ) on Saturday March 06, 2010 @01:57AM (#31379076) Homepage
    Or you'll end up with captions like this:

    Hey glum, Jen tonight. It's apologize for it, interrupting our conversation in early as this afternoon, yes, so I wanted to returning your call and you know check in with you further. Alright, hope you, I hope you're doing well done. Sounded like you, works but alright. Well I'll call me later. I'll talk to you soon. Bye.

    • Some of my funny ones here:

      "Hey bottoms ASAP. But on the religious anyways, call me back and I'm at my just call me back. Thank you."

      "Hey what's going on man, this is in 2 mother and anything any cool commands. We have some. Please let me stop you have a cable is nothing important. Bye. "

      "Hey Todd, on a bit. The Negro, then I put in an active on plan and payPal okay called up and have a couple (630) 440-6809. Okay bye. "
      • by Mr2001 ( 90979 ) on Saturday March 06, 2010 @02:21AM (#31379158) Homepage Journal

        My funniest one:

        "Hello voice subscriber what. Hey if you few questions for you. They can feel me 6 like a year like 2 years ago to like forever. Go you came over and I was locked out of the password didnt know the password so much and we wanted. Anybody passed it. I don't know how you guys have a good i just took it out for the first time in years and it says your class is expired. I must be changed and I go to that the windows X P professional you went and dollar dishing whatever it is really old addition, windows 85,001 yet and it's give me a change. Faster screen and says, administrative, which is still around. Funny has got hold us for new password. I confirm you got through. I've any idea what the password again, 30, or if you're more than the who knows no idea what it would've been so if you tell me but sister for you know the next week, otherwise, I was gonna go out to confirm for some a long time, so if you should come pick the and a case."

        • by aCC ( 10513 ) *

          How about you leave a voice message just reading that text? What's the result? Maybe it's some kind of "encryption" like ROT-13 but for voice messages. ;-)

        • Re: (Score:3, Funny)

          by John Hasler ( 414242 )

          I know people for whom the examples in this thread would be accurate transcriptions...

    • Re: (Score:3, Informative)

      by The MAZZTer ( 911996 )
      Phone audio quality is generally much poorer than online videos, in my experience.
      • Re: (Score:3, Insightful)

        by bennomatic ( 691188 )
        True. I think that Google should put an app on their Android phones that recognizes when someone is connected to a GVoice vmail box, and does the recording and processing locally. I figure that'd make a much more accurate translator.
        • Pretty brilliant idea. It'd be a bit annoying to implement and would only work Android-to-Android on phones with data plans. But it seems feasible. I wonder how much of an improvement it would yield.
        • I doubt it'd make any difference.

          Speech recognition technology is really still in its infancy... it's possible to get good results but only under the most controlled of circumstances... high quality microphone, no background noise, clear diction, recognition engine trained for the speaker, etc. Even then it may depend on what you're actually saying, since in the case of any ambiguity a smart recognition engine will fall back to grammatical analysis and word frequency counts etc to try to guess right.

          The rea

      • by crossmr ( 957846 )

        It doesn't matter. I just checked out a couple of high-quality videos with a normal person speaking English without background noise, and the result was a jumbled mess of garbage. Another fine Google production.

    • by TheJokeExplainer ( 1760894 ) on Saturday March 06, 2010 @02:44AM (#31379226)
      Parent is referring to Google Voice's [] less-than-perfect voicemail transcription technology which often leads to odd or hilarious transcriptions [].
    • Re: (Score:3, Interesting)

      by uncqual ( 836337 )
      My most interesting one:

      Hey Hello hello, hi bye hello hello. Bye bye hey hello, test, Hello bye hello. Bye hi hello. Bye, hello hey hey hello hello hello. Bye bye hello. Call hey bye hello hello hello hello hello, hey bye bye bye hello. Bye hello. Bye hello hello. Bye. Hello S hello. Bye bye. Hello. Hello. Yeah, hello. Bye hello hello hello hello, hey, hey, yeah.

      Some of the words hello and bye were dark, the rest were mostly light gray.


      • by zill ( 1690130 )
        I have a severe developmental speech disorder, you insensitive clod!

        I'm never inviting you to my parties again.
      • Re: (Score:1, Interesting)

        by Anonymous Coward

        Isn’t that essentially what modem negotiation actually is? The two modems talking to each other, saying “hello” at length?

        My goodness. It’s alive, and it can understand V.34...

      • It would really be funny if the developers planted a message when listening to standard fax negotiation tones:

        Hey, how are you?
        Not much going on. This new "Exchange Server" is such an asshole, I wish he'd die!
        Yeah I know what you're sayin'.. I think they're gonna throw me away soon :(
        Oh, here's the fax anyway. Hope to hear from you soon..Bye!
        bip-bip bip bip bip bip-bip....
        bib bip bip-bip...
    • I wonder what accounts for the difference. I'd say in general most people who call me come out 99% perfect on the transcripts. Except one friend, with a Texan accent, who usually is closer to 50% accurate.
      • > I wonder what accounts for the difference.

        Some people sound that way on my answering machine (and others come across that way in person).

      • Except one friend, with a Texan accent, who usually is closer to 50% accurate.

        Of course if you live in Texas and get called by mostly people with Texan accents you get 50% accuracy.

      • I wouldn't say the transcripts have been 99% accurate word for word for me, but I can almost always get the meaning. The one exception being a friend with a speech impediment.
        The YouTube transcripts are pretty much useless from what I can tell.

    • Google Voice Voicemail Transcriptions! Now with Mad Gab [] embedded puzzles!

  • Huzzah! Now if we can just get subtitling/captioning on Netflix streams, the net will be accessible to the Deaf again.

    • by aussie_a ( 778472 ) on Saturday March 06, 2010 @02:53AM (#31379254) Journal

      I almost never turn on my speakers and yet I find the internet quite accessible.

      I'm not saying this isn't a great development. But to try to portray the internet as inaccessible to the deaf before now is ridiculous.

      • Actually, the internet of old used to be extremely accessible to deaf and hard-of-hearing people. However, the advent of YouTube, podcasts, and other multimedia services has made an exciting new part of the internet inaccessible to them. This technology, if it works, will help bring the internet back to deaf and hard-of-hearing people.
    • This is why Google rocks ...and M$'s tarnished SilverDimGlow does not.

      Strongly wish Netflix would realign themselves to use a YouTube-like setup instead, but I strongly suspect M$ either threw them 'an offer they could not refuse', or this will become yet another mutual lock-in, like Intel_M$.

      (Really irritated that I cannot, yet, watch Netflix from my Debian machines.)

  • Not only that (Score:1, Interesting)

    by Anonymous Coward

    They also changed the way videos are sent to the browser, many flash video players are failing because of that.

  • by Mr2001 ( 90979 ) on Saturday March 06, 2010 @02:13AM (#31379118) Homepage Journal

    Talk about advanced! Back in my day, we had to pay engineers to generate technology for us!

    • Re: (Score:3, Funny)

      by nebaz ( 453974 )

      Feeling feeling = Feeling.getFeeling(Feeling.LAUGHTER);;

      • Sounds like you're suffering from stuttering semantics -- Either that or you're an egregiously emotional eccentric.
      • And here I thought sprinkling 'self.' throughout my Python classes made me egotistical...

    • I can sell you a UML modeller which will do that. Just $100k per license. Believe me, it's cheap at the price. Let me demonstrate how you refactor the code. Just drag this little icon from here to here and the other little icons reorganise themselves around it. Buy this and you will never have to hire an engineer again!

    • I pay technology to generate engineers, you insensitive clod!

  • by Coopjust ( 872796 ) on Saturday March 06, 2010 @02:14AM (#31379124)
    The results are still very funny, especially for non-English speakers.

    However, it's a technology that is still relatively young. One hopes that applying it to Youtube will help Google improve the accuracy.

    However, except for spoken videos with a native English speaker with absolutely no background noise, it's nothing more than a novelty at this point. Trying this on several videos not only yielded hilarious results, but delays of several seconds in some cases.
    • by Idiomatick ( 976696 ) on Saturday March 06, 2010 @05:23AM (#31379634)
      "One hopes that applying it to Youtube will help Google improve the accuracy."

      This. If they allow corrections, it could be an incredibly huge source of data for Google. They'd end up with people spending millions of man-hours teaching Google how to do voice recognition. And highly accurate voice recognition would be a boon for society generally.
    • by oztiks ( 921504 )

      I'm the first to agree, but then I saw Microsoft's attempt at voice recognition, and it's just as poor.

      There need to be significant improvements across the board before this stuff works properly; sadly, I think it still has a long way to go.

      Accents play a big part, as do the rate at which people speak and the way they join words together. You can tell YouTube's voice recognition is good, but it doesn't keep up in those areas at all.

  • I'm trying to understand the difference between an interactive transcript, as seen at, and a caption. Why did Google go the embedded captioning route? Isn't the goal to create searchable content? If so, captions don't seem to be the solution.
    • I can imagine Google would cache intermediate results, possibly improve those results from time to time, and create a good coupling to its own search engine. Other search engines might have to 'distill' searchable text from the video (=difficult?), so that Google can search YouTube video content better than other search engines? Just a guess, FWIW.

    • Google has no problem searching it, they have the data. The problem will be for other bots searching youtube, and I can imagine reasons why Google would not want to make it easy for others to search their site.
  • CC this... (Score:5, Funny)

    by flogger ( 524072 ) <non@nonegiven> on Saturday March 06, 2010 @02:21AM (#31379156) Journal
    I looked, but I can't find Google's CC button for this video: []
  • Search? (Score:2, Insightful)

    by Spy Hunter ( 317220 )

    I haven't seen any mention of search, which seems odd. Google is adding captions to every YouTube video, and nobody is interested in whether you'll be able to search the captions or not? Seems to me like it could be quite useful to search the captions of every video on YouTube.

  • Just imagine when they hook this up to Google translation and text2speech. You can choose your language for youtube audio.
  • Wish this technology would be used by TV stations to provide 'sort of' subtitling for programs that don't have any. This could be helpful for deaf/hearing impaired viewers.

    Where I live (Netherlands), there are a few public TV channels. Most programs on them are subtitled using a dedicated teletext page (888). For the bulk of commercial channels, there are also subtitles for things like prime-time movies and specific (popular) TV shows. But a lot of it is not, like average day time shows / late night docume

    • by crossmr ( 957846 )

      Proper subtitling needs humans, but come on, be honest. How much manpower does it actually require to subtitle something?
      If it's your native language, it's a matter of timing, little else. If you're paying someone to be on the clock, depending on the length of the program it might take anywhere from 30 minutes to a day for a long program. How much is a day's wages, even for the lowest-budget infomercial?

      If you're translating, you're probably not translating something new, and that means there are likely alre

      • If you're paying someone to be on the clock depending on the length of the program it might take anywhere from 30 minutes to a day for a long program.

        If the captioning takes longer than the program, you have to do it in advance. This rules out captioning news, sports, entertainment awards, and other live programs.

        • by crossmr ( 957846 )

          Not really. Most live things are actually shown on a tape delay. CC already exists for the news, but live programming is usually less concerned with exact timing, and it's often a constant stream of words, like with the news. I'm talking more about subbing a 2-hour movie and spending time making sure the captions line up perfectly with the dialog. It can be a tedious process. With a live program you just need someone who can type fast and accurately, with a slight tape delay to check for any crazy mistakes.

        • by AK Marc ( 707885 )
          I've seen lots of live news in the US captioned, so having a human do a "suitable" job can be done in the time of a program. Granted, there was a noticeable error rate. But it was good enough that people would be quite happy with it.
  • I'm sure they will improve it dramatically in the coming months and years, but I have not laughed so hard in a while at some of the stuff it comes up with. It's as funny as using a translator to translate a word into Korean and back again.
  • Which is to say, pretty darned feeble. Clever work, but basically rubbish when compared to user expectation.

    One of my favourite videos is this one ( []), dating from the '30s, about how differential gears work. The voice-over is that beautifully clear, precise American newsreader accent of the period, and there isn't any background music to confuse things. If anything should be a perfect candidate for a computer to analyse, it's this.

    But the captions are worse than I'd

    • by gr8dude ( 832945 )

      I think the solution is to let people submit corrections for the automatically generated subtitles.

      This way we'll get a starting point, so the problem becomes simpler.

      I am now trying to write the subtitles for one of my lectures, and I find it very very tiring and difficult. The greatest problem for me is in synchronizing audio with text - I have to manually indicate in which time period a particular text needs to be shown.

      In other words, the bottleneck is not in figuring out what the words are, it is i

      • letting people submit corrections will work great until /b/ discovers it. Then every other caption will be "jews did 911" and "never gonna give you up". Remember Bucket the chatbot?
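The timing problem gr8dude describes is mostly bookkeeping once each piece of text has a start and end time. Below is a minimal Python sketch, assuming the target is the common SRT subtitle format (the kind of text-editable file mentioned further down the thread). The `Cue` type and the helper names are hypothetical, and the naive pass that spreads words evenly across a time range is only a crude first draft to correct by ear, not how YouTube aligns its captions.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(cues: list[Cue]) -> str:
    """Numbered SRT blocks separated by blank lines."""
    blocks = []
    for number, cue in enumerate(cues, start=1):
        if cue.end <= cue.start:
            raise ValueError(f"cue {number} does not end after it starts")
        blocks.append(f"{number}\n{srt_timestamp(cue.start)} --> {srt_timestamp(cue.end)}\n{cue.text}\n")
    return "\n".join(blocks)


def naive_cues(text: str, start: float, end: float, words_per_cue: int = 7) -> list[Cue]:
    """Hypothetical helper: chunk the text and give each chunk a share of
    [start, end] proportional to its word count. Real speech is not evenly
    paced, so these timings still need correcting by hand."""
    words = text.split()
    if not words:
        return []
    seconds_per_word = (end - start) / len(words)
    cues = []
    for i in range(0, len(words), words_per_cue):
        chunk = words[i:i + words_per_cue]
        cue_start = start + i * seconds_per_word
        cues.append(Cue(cue_start, cue_start + len(chunk) * seconds_per_word, " ".join(chunk)))
    return cues


print(to_srt(naive_cues("one two three four", 0.0, 4.0, words_per_cue=2)))
```

Starting from evenly spaced cues means the human only nudges boundaries instead of typing every timestamp, which is roughly the "starting point" idea above applied to timing rather than to the words.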

      • I tried the video mentioned here, but it just tells me "Captions are not available". Strange.

      Is it because I'm in Europe?
      Because I use Firefox on Linux?

      The video mentioned a few posts before that is even weirder: it seems to have captions, I can turn them on, but no captions are displayed.

  • Could you combine this with the lip-reading technology that was introduced to allow "voiceless" cell phone calls? [] Wouldn't that improve the accuracy for those scenes where the speaker's mouth is visible?

    Or how about using the subtitle tracks that are in a different language and reverse translating them to provide additional clues as to what the speaker might have been saying? It might help a little.

  • Fish sticks or Fish dicks?
    • If the recognizer isn't sure, perhaps it could use the fact that there are six times as many "fish sticks" as "fish dicks" in Google's web index. I'd bet it already does; there's a reason for the "Markov" in hidden Markov modeling.
    • That's simple - just google it and use the phrase which returns the most hits.
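The "use the counts" idea above can be sketched as noisy-channel rescoring: pick the phrase that maximizes log P(audio | phrase) + log P(phrase), where the second term comes from how often each phrase occurs in text. In this minimal Python sketch the acoustic scores are made up, and the 6:1 counts only mirror the "six times as many" remark; a real recognizer applies an n-gram language model word by word, usually with a tuning weight, rather than a single table lookup.

```python
import math


def rescore(acoustic: dict[str, float], phrase_counts: dict[str, int]) -> str:
    """Return the candidate maximizing log P(audio | phrase) + log P(phrase).

    acoustic: hypothetical recognizer likelihoods P(audio | phrase) per candidate.
    phrase_counts: how often each phrase occurs in some text corpus.
    """
    vocabulary = set(acoustic) | set(phrase_counts)
    total = sum(phrase_counts.values())

    def log_prior(phrase: str) -> float:
        # Add-one smoothing: a phrase never seen in the corpus is unlikely, not impossible.
        return math.log((phrase_counts.get(phrase, 0) + 1) / (total + len(vocabulary)))

    return max(acoustic, key=lambda phrase: math.log(acoustic[phrase]) + log_prior(phrase))


# Made-up numbers: on acoustics alone the recognizer slightly prefers the wrong phrase.
acoustic = {"fish sticks": 0.45, "fish dicks": 0.55}
counts = {"fish sticks": 6, "fish dicks": 1}  # illustrative 6:1 ratio, not real index data
print(max(acoustic, key=acoustic.get))  # fish dicks
print(rescore(acoustic, counts))        # fish sticks
```

The prior only tips the balance when the acoustic evidence is close; a confidently heard phrase still wins even if it is rare, which is the behavior you want for unusual but correctly spoken words.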
  • Just what we all needed: something dumber than user comments to read on YouTube.
  • by santax ( 1541065 ) on Saturday March 06, 2010 @06:18AM (#31379772)
    reads the caption and then produces the video?
    • by HoppQ ( 29469 )

      reads the caption and then produces the video?

      Actually, a rather obvious extension to this technology would be to feed the captions to a machine translator and a text-to-speech synthesizer to produce e.g. Russian voice for a video for those Russians who don't comprehend spoken or written English.

  • Most, if not all, YouTube videos now include a 'CC' button that, if pressed, will automatically generate the closed-captioning technology.

    The first 10 videos I've been to don't include it. Including suggested and front page vids.

    Is this a metric most?

    • by crossmr ( 957846 )

      oh wait.. just found one.
      What a train wreck. Cheers, Google, on yet another amazing product.
      Here is what is actually said:
      Hey Everyone So a lot of you may know that the Vancouver 2010 Winter Olympics are coming up

      and here is the transcribed audio:
      Everyone felt like a man of the I think every time he's had a winter olympics are coming

      This is certainly front page worthy.
      I'm going to roll out a different product.
      Basically the system will try to guess (not very accurately) how many words are
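The usual way to put a number on comparisons like the one above is word error rate (WER): the word-level edit distance (substitutions, deletions, insertions) between the reference and the recognizer's output, divided by the number of reference words. A minimal Python sketch, assuming plain whitespace tokenization and lowercasing, with no punctuation or number normalization:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev is the previous DP row: prev[j] is the distance between the
    # reference words processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # delete a reference word
                          curr[j - 1] + 1,     # insert a hypothesis word
                          prev[j - 1] + cost)  # substitute (or match when cost is 0)
        prev = curr
    return prev[-1] / len(ref)


actual = "Hey Everyone So a lot of you may know that the Vancouver 2010 Winter Olympics are coming up"
transcribed = "Everyone felt like a man of the I think every time he's had a winter olympics are coming"
print(f"WER: {word_error_rate(actual, transcribed):.0%}")
```

Both strings in the quoted example have 18 words, so the result lands between 0 and 1; in general WER can exceed 1 when the recognizer inserts many extra words.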

  • This is excellent timing; I clicked on the link to a video on the previous /. story but my sound was not working. I thought, "man, I wish more videos were closed-captioned," not just for lazy people like me but also for the hearing impaired.

    Finally it'll be easier for me to share these videos with my deaf and hard-of-hearing friends!

    - RG

  • I like the "CC" feature... it makes it very simple to do those Hitler Downfall parodies... but I was surprised that I was the first to actually make one using the feature []. My video features closed captions for both the original German-to-English translation, and a Lost parody script. I also provide a handy download to a text-editable SRT file so others can make their own (does that make me a bad person?).

    The nice thing is that you can add as many subtitle files as you like... and give each of them separate

    • On a side note, I see that YouTube has not gotten to any of my videos with this "automagic" speech recognition-generated closed captions. I was hoping they would try and make one for this video of mine [], just to see what it generated.

  • An interesting upside to all this might be that, if Google keeps the dialog from youtube content in their searchable database, people may soon be able to search for videos via content.

    Right now, I believe keywords need to be added manually, but auto-captioning could remove that barrier, perhaps.

    "Here's looking at you, kid."

  • Does this mean they can now enjoy the 2 Girls and 1 Cup Reaction videos? []

  • This is good news. I've been looking at speech-to-text and audiomining for a while. My goal was not captioning but search: in a long video or large set of videos, a user can quickly find snippets of video mentioning a word or phrase, and replay the found snippets. I found a bunch of options, but budget was always an issue. Google Audio (Gaudi) was free (cool!) but seemed like a dead-end project after the 2008 elections. Blinx- spinoff from BBN focused on media companies. $$$$$$. Autonomy- enterpri
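Finding snippets like this needs little more than an inverted index from words to timestamped caption cues. A minimal Python sketch, assuming the captions are already available as (start, end, text) triples; the `Caption` type and function names are hypothetical, and a query is treated as a plain AND of words within a single cue rather than a true phrase match:

```python
import re
from collections import defaultdict
from typing import NamedTuple


class Caption(NamedTuple):
    start: float  # seconds from the beginning of the video
    end: float
    text: str


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(captions: list[Caption]) -> dict[str, set[int]]:
    """Inverted index: word -> positions of the captions that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for position, caption in enumerate(captions):
        for word in tokenize(caption.text):
            index[word].add(position)
    return index


def search(captions: list[Caption], index: dict[str, set[int]], query: str) -> list[tuple[float, float]]:
    """(start, end) spans of captions that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return [(captions[p].start, captions[p].end) for p in sorted(hits)]


captions = [
    Caption(0.0, 3.0, "Welcome to the Vancouver games"),
    Caption(3.0, 6.0, "The Winter Olympics are coming"),
    Caption(6.0, 9.0, "Vancouver hosts the Winter Olympics"),
]
index = build_index(captions)
print(search(captions, index, "winter olympics"))  # [(3.0, 6.0), (6.0, 9.0)]
```

Each returned span is a seek point for replay. With machine-generated captions, misrecognized words simply never make it into the index, so recall is bounded by the recognizer's accuracy.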
  • Soon (now?) they can generate captions of everything heard (or sung) in a video immediately after upload and match the captions against lyrics and transcriptions of copyrighted works or even just search them for specific keywords. Then they can flag those videos as possible copyright violations or even prevent them from being displayed until after being reviewed by someone.

    I'm not saying captioning isn't a good idea, only that it can be used for more than just assisting the hard of hearing.
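One simple way such matching could work is to break both texts into overlapping word n-grams ("shingles") and measure how much of the known work's text shows up in the captions. This Python sketch is purely illustrative: the titles, the 3-word shingle size, and the 0.5 threshold are arbitrary assumptions, and it is not a claim about how YouTube actually screens uploads.

```python
import re


def shingles(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Overlapping word n-grams of a lowercased text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def containment(caption_text: str, known_text: str, n: int = 3) -> float:
    """Fraction of the known work's shingles that also appear in the captions."""
    known = shingles(known_text, n)
    if not known:
        return 0.0
    return len(known & shingles(caption_text, n)) / len(known)


def flag_for_review(caption_text: str, known_works: dict[str, str],
                    threshold: float = 0.5, n: int = 3) -> list[str]:
    """Titles whose text is substantially contained in the captions (hypothetical policy)."""
    return [title for title, text in known_works.items()
            if containment(caption_text, text, n) >= threshold]


caption = "so anyway the quick brown fox jumps over the lazy dog and then we left"
works = {
    "Hypothetical Song A": "The quick brown fox jumps over the lazy dog",
    "Hypothetical Song B": "Rain on the window all night long",
}
print(flag_for_review(caption, works))
```

Containment is used instead of plain Jaccard similarity because a video's captions usually hold far more text than the matched work, which would drag a symmetric score down even when the whole lyric is present.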
