Youtube AI

Forget Subtitles. YouTube Now Dubs (Some) Videos with AI-Generated Voices (restofworld.org) 50

An anonymous reader shared this report from the international tech news site Rest of World: In an open letter earlier this year, Neal Mohan, the recently appointed head of YouTube, made a pledge to creators that better translation tools were coming. Now, YouTube is delivering on that promise with Aloud — a free tool that automatically dubs videos using synthetic voices, raising creators' hopes and putting new pressure on dubbing firms that already cater to YouTubers.

At the VidCon convention in late June, YouTube announced a pilot for Aloud. The tool first generates a transcription of a video's audio, which a creator can edit before selecting their preferred language and style of synthetic voice. The dub can take just minutes to generate.

The pilot currently includes the option to dub videos into English, Spanish, and Portuguese. The company has said more languages are coming — likely including Bahasa Indonesia and Hindi, which are already advertised on the Aloud website. Hundreds of creators have already signed up to test the tool. "Our long-term goal is to be able to dub between any two languages, and as part of that goal we will continue to pilot and learn from dubbing content in different regions," Buddhika Kottahachchi, co-founder of Aloud and the recently appointed head of product for YouTube Dubbing, told Rest of World. "Helping a creator expand beyond their primary language can help them reach new audiences..."

In the lead up to the pilot announcement, YouTube also released a new product feature that allows viewers to select between multiple dubbing tracks on a single video, similar to the current option for subtitles.

Here's a video of YouTube's announcement, with five "audio tracks" (in different languages) available if you click the "gear" icon. While YouTube's top stars hire dubbing services, many smaller creators can't afford them, the article points out. "By offering Aloud for free, YouTube is setting up a new swath of creators to access dubs for the first time...

"YouTube's new push into automated dubbing is a serious challenge for existing dubbing companies, which are now forced to compete with a free competitor built into the platform."
This discussion has been archived. No new comments can be posted.

  • Seriously (Score:5, Insightful)

    by test321 ( 8891681 ) on Monday July 31, 2023 @03:05AM (#63727150)

    We won't "forget subtitles", it's not for the same use case as dubbing. Dubbing is for when you want to watch a movie in your native language; subtitles is when you want to have the original text but the video has poor audio, or you want to watch silently, or you prefer reading (I prefer reading, in the rare occasions I have to watch a video it's at 2x with subtitles).

    • Re:Seriously (Score:5, Insightful)

      by SurfMan ( 969573 ) on Monday July 31, 2023 @03:11AM (#63727158)

      Subtitles are useful for people like me that are hearing impaired. There is no other option. If you're (almost) deaf, you can have 1000 dubbed languages available, but it's still useless. Fuck me, the tech future is bleak...

    • by Anonymous Coward
      Seems like the most obvious feature for subtitles is missing: some people are deaf and have been relying on closed captions, a subset of subtitles, even for material with perfectly fine audio.
    • Re: (Score:3, Funny)

      They can pry my subtitles off my cold, dead eyeballs!
    • by mjwx ( 966435 )

      We won't "forget subtitles", it's not for the same use case as dubbing. Dubbing is for when you want to watch a movie in your native language; subtitles is when you want to have the original text but the video has poor audio, or you want to watch silently, or you prefer reading (I prefer reading, in the rare occasions I have to watch a video it's at 2x with subtitles).

      Yep, I find myself needing to turn the subs on just to understand what's being said with some movies, especially on airplanes (long haul flights seem to account for most of my movie watching).

      This kind of thing is going to be more use with translating English films into foreign languages for people who don't speak English... If it even works, which I have my doubts as every example I've heard can barely get English right, let alone something like Hindi or Tagalog.

    • Dubbing is for when you are illiterate and can't read the subtitles. Subtitles is when you want to see the video unaltered, as it was intended: with the original text and the original voice actors.

      • Yeah, there's a reason some actors pull in very high pay packets for their work. It's mostly their voices that bring the characters & stories to life. I'd much rather have subtitles than dubbing.
    • by e3m4n ( 947977 )
      Exactly. Subtitles are perfect for office waiting rooms and bars/restaurants with more than 1 TV tuned into different channels. Not to mention the hearing impaired. Actually, the AI generated dubbing could be interesting. My son watches a ton of anime on Crunchyroll. I keep picking out the same voices in almost all of the cartoons; like when he watches My Hero Academia I notice that it's the same whiny bitch voice from Attack on Titan. I'm like, Jesus, dude, you whine more than Luke Skywalker in ep4.
    • by awarre ( 1321439 )
      it's at 2x with subtitles. Thought I was the only one --was surprised when I figured out to do this.
  • by Errol backfiring ( 1280012 ) on Monday July 31, 2023 @03:11AM (#63727162) Journal

    There is nothing worse than dubbing. You miss the puns in the original language, and often (listening to you, German television), a perfectly understandable English speaking person is dubbed over with hardly understandable German. After half a sentence.

    Even if the original is in a language I do not understand, subtitles are way better.

    • As far as I could tell, you can revert to original audio manually, each time. Which is crap UX, so it's par for the course on youtube.
    • by isj ( 453011 )

      The video titles are sometimes translated if the creator has dumped translated titles (and sometimes subtitles). And there is NO option to deselect that because Youtube knows best. The problem is that the translations are sometimes hilariously wrong or misleading.
      For one Italian/Tuscan cooking channel I want the original audio and then whatever subtitles they have. If Youtube doesn't make the dubbing feature user-selectable I have no way to verify what is actually being said, and I'll end up with weird stuf

      • by Vintermann ( 400722 ) on Monday July 31, 2023 @03:56AM (#63727250) Homepage

        Yeah, if I see an English language channel with a title translated into borked Norwegian, I set "never recommend this channel".

        It's a basic principle of machine translation that you let the user do it, if they need it. You do not do it for them. Quite often we have to back-translate from bork to English, to understand what the hell they were trying to say in the first place. It's the opposite of helpful, it creates more and harder translation work for the user.

        • By any chance, is your use of "bork" from Sesame Street's Swedish Chef character?

          Of course, I'm more familiar with "Engrish" from China and such...

          As I understand it, there was a push to label everything in China with English at one point, and they did a lot of it by handing somebody a Chinese-English phrasebook and letting them put the "translations" up. There's image galleries of the resulting hilarity out there.

          Also, I read manga online at times, and some of those translations... I wonder if I should vo

    • Both depend on the quality of the translation and VA work. I've seen plenty of subs that are very stilted (excluding machine translated). They get the point across but would be strange with real VA's reading them. Quality dubs tend to have a more refined translation because someone is actually speaking the lines (and can give additional feedback).

    • It is more hilarious when they dub a Bavarian or coast line dialect speaker.
      Perfectly understandable, but with half a sentence offset the dub comes, and both voices are so loud, you can not follow either one.

    • You, sir, have never witnessed the sheer unmitigated glory that was an early 90s TurboGrafx-16 CD-based game dubbed into English. I can't recommend the game "Last Alert" enough.

      Oh, and the PS2 game "Chaos Wars". I have never heard anyone sound more bored while fighting evil demons in my life. It's like the producer grabbed his daughter and her teenage friends and said "you can't go to the mall until you finish dubbing this videogame". Magnificent. Simply magnificent.
    • I agree. I hate when I visit a site and am presented with a machine translation (usually with bad wording at best) of an English text with no easy way to get the original (like in the reference pages for Windows), as if everyone in the world were monolingual, so that there is nothing better than a translation, no matter how bad it is.

      It will be useful for people with vision impairment, who can see enough of the images, but not enough to read. Everyone else should be allowed to choose.

    • There is nothing worse than dubbing. You miss the puns in the original language

      This. It's the main reason I only watch Anime in the original language (or any production for that matter, be it in Hindi, French, Russian, Mandarin or whatever.)

  • by WereCatf ( 1263464 ) on Monday July 31, 2023 @03:23AM (#63727184)
    I haven't seen literally a single YouTube-video where the automatically generated subs didn't constantly have huge mistakes in them -- possibly because I watch technical videos instead of some reality TV or whatever shit other people get up to -- and those mistakes are the kind where, when you read them, you can typically guess what was actually being said, but when you take those mistakes and say them out loud, they become completely incomprehensible gibberish. It's really quite a different thing seeing the letters and numbers in a row in close proximity and hearing the gibber being said.
  • by Cley Faye ( 1123605 ) on Monday July 31, 2023 @03:30AM (#63727200) Homepage
    As it is now, real dubbing services have nothing to fear; for now it's so disturbingly bad you immediately scramble to click that gear icon and revert to original audio. Spending the time to properly review and annotate the script to give all intonation and references needed to generate a proper voice, then generating a proper voiceover, would become their new job, as it would be very time consuming in itself.
  • by bradley13 ( 1118935 ) on Monday July 31, 2023 @03:36AM (#63727218) Homepage

    I've watched a few tech videos where they added subtitles. Theoretically, the presenter was speaking English, but with such a thick Indian accent that they were difficult to understand. The generated subtitles were laughably ridiculous, worse than no help. These were videos from big organizations, for example, Google. If the dubs are as bad (and they will be), the videos will be entirely worthless.

    It's great to be inclusive and all, but presenters on videos (from organizations, anyway) should have neutral, easily understood accents. For English, that means mid-Atlantic. For German, it means Hannover. Etc.

    • I recorded a technical presentation during the pandemic where I got an automatic translation, and it was a mess, despite the fact that I'm a native (American) English speaker: "hi i'm tiffany crime and i'm going to talk about the empire there, like a model, I thought it ruins person."
  • by Snard ( 61584 ) <mike@shawaluk.gmail@com> on Monday July 31, 2023 @04:01AM (#63727270) Homepage
    I am an English language speaker, and I watch a lot of foreign language Netflix TV series and movies, and I always choose the original language soundtrack with English subtitles. I feel that the voice chosen for an audio dub often does not match the actor, and the differences between the audio and mouth movements are distracting to me. Also, many languages have much different "word counts" to say the same expression (sometimes more than English, sometimes less). Another thing that you miss with a dubbed soundtrack are the vocal inflections that are inherent in certain languages, like Korean (such as the pitch shifts at the end of sentences), which to me are part of the culture. Finally, I enjoy hearing the original language and seeing the translation on screen, because after a while I learn a few words of the new language, or I get to learn about which English words they have borrowed. It's interesting to see how many other languages use the word "OK".
    • I'd argue that there are good dubs out there, but they're the 1-10% realm. I'll agree that most are shit because the translators either cheap out on the voice acting, or have agendas where they try to cover up cultural quirks.

      I mean, one translation I watched of Tenchi Muyo, where they tried to cover up that, yes, the Emperor is married to two wives, at the same time, and they know about each other. I mean, not something that should be at all surprising, out there, or anything if you have any grasp of his

    • I like to listen to German shows with German subtitles to help keep my skills up. Netflix is pretty good about letting me keep the original language, audio and subtitles. Some other platform (I think it was Hoopla or Kanopy) forced English.
    • by vanyel ( 28049 )

      Exactly - normal dubbing sucks because it's almost always flat and misses all the "audio acting"; auto generated dubbing can only be worse.

  • by blugalf ( 7063499 ) on Monday July 31, 2023 @04:19AM (#63727304)

    If this is simply speech generation for the auto-generated subtitles, auto-translated into the target language, then I'm not very hopeful.

    Good dubbing (done by humans) is incredibly resource-intensive. It's the kind of task that has to be fairly perfect to be perceived as natural.

    Dubbing also has something akin to the 'uncanny valley'.

    A fairly effortless voice-over, with no attempt at lip-sync and the original voice still coming through in the background, is OK (for short stretches, not for entire features!).

    Perfect dubbing, with lots of effort put into translations, as much lip sync as possible and done by professional voice actors, is also OK.

    Sloppy dubbing, as seen on many Netflix productions, with half-assed speakers and no discernible attempt at proper lip sync is absolutely awful.

  • by mkwan ( 2589113 ) on Monday July 31, 2023 @04:30AM (#63727318)

    I wonder if the synthesized voice is generic, chosen from limited options (male, female), or whether it clones the original speaker's voice.

    And since they presumably need to mute the original voice, I wonder if we'll lose all sound or if the muting will be selective? Otherwise it's not much use for things like guitar tutorials ...

  • Nothing worse than dubbed voices, it grates on the mind with lips out of sync and the voice just not sounding like it is part of the movie. Give me subtitles any day.
  • Use software to morph faces to sync with the dubbed voices.

  • by Casandro ( 751346 ) on Monday July 31, 2023 @05:47AM (#63727434)

    ... it should be the source of hours of strange humor.

  • Wake me up when AI alters the video as well to make the lips and facial movements match the dubbed words.

  • by Torp ( 199297 ) on Monday July 31, 2023 @06:57AM (#63727550)

    ... to promote illiteracy.

    After all, people who can't use the fastest way to absorb information - since no youtube influencer can beat the density of well written text - will be less educated and more likely to click on ads.

    • by PPH ( 736903 )

      ... to promote illiteracy.

      You were modded +Funny, but I find this to be so true. In all of the parts of the world I've visited, the ratio of dubbing to subtitles seems to vary proportionally with the resistance to multi-cultural literacy. Not illiteracy itself*, but the resistance to learning alternate cultures and languages.

      *I've met quite a few people who have learned native languages by turning on the TV set, selecting subtitles plus the local language and absorbing it. The people who select the dubbed language tend to be less

  • Deaf people can't hear either the original or a dubbed voice. Subtitles are required.
  • I'm getting tired of youtube videos that use a synthetic voice. While they have improved, they are still very unnatural sounding. They frequently use inappropriate inflections, and seriously lack in the normal subtle variations of human speech. I'm also starting to suspect some of the folks offering voice services are actually using a synth voice.
    • Search for any science or technology topic and 9/10 results are AI-generated shit these days. A surefire way to check if you're listening to an AI voice (and thus almost certainly a ChatGPT-written script) is to go to the channel and try two other videos. If they both have wildly different voices you're obviously not dealing with a real creator. The channels also tend to be quite young, tons of recent videos, and more popular than any organic growth.

      In other words, AI-assisted YouTube SEO

    • AI #1) They can use AI recognition to generate the text with timing info, this is pretty decent already from youtube; it even figures context to choose the right acronyms in my use.

      AI #2) Rephrasing of text transcripts given time constraints; including indicating a slowing of video as a last resort.

      #3) Video/Audio re-timing

      AI #3) Voice profiling of audio in sync with the text transcript and using the translation text from #2 and retiming info to speak in the source's voice with the source's intonation but i
