Forget Subtitles. YouTube Now Dubs (Some) Videos with AI-Generated Voices (restofworld.org)
An anonymous reader shared this report from the international tech news site Rest of World:
In an open letter earlier this year, Neal Mohan, the recently appointed head of YouTube, made a pledge to creators that better translation tools were coming. Now, YouTube is delivering on that promise with Aloud — a free tool that automatically dubs videos using synthetic voices, raising creators' hopes and putting new pressure on dubbing firms that already cater to YouTubers.
At the VidCon convention in late June, YouTube announced a pilot for Aloud. The tool first generates a transcription of a video's audio, which a creator can edit before selecting their preferred language and style of synthetic voice. The dub can take just minutes to generate.
The pilot currently includes the option to dub videos into English, Spanish, and Portuguese. The company has said more languages are coming — likely including Bahasa Indonesia and Hindi, which are already advertised on the Aloud website. Hundreds of creators have already signed up to test the tool. "Our long-term goal is to be able to dub between any two languages, and as part of that goal we will continue to pilot and learn from dubbing content in different regions," Buddhika Kottahachchi, co-founder of Aloud and the recently appointed head of product for YouTube Dubbing, told Rest of World. "Helping a creator expand beyond their primary language can help them reach new audiences..."
In the lead up to the pilot announcement, YouTube also released a new product feature that allows viewers to select between multiple dubbing tracks on a single video, similar to the current option for subtitles.
Here's a video of YouTube's announcement, with five "audio tracks" (in different languages) available if you click the "gear" icon. While YouTube's top stars hire dubbing services, many smaller creators can't afford them, the article points out. "By offering Aloud for free, YouTube is setting up a new swath of creators to access dubs for the first time...
"YouTube's new push into automated dubbing is a serious challenge for existing dubbing companies, which are now forced to compete with a free competitor built into the platform."
Seriously (Score:5, Insightful)
We won't "forget subtitles"; they don't serve the same use case as dubbing. Dubbing is for when you want to watch a movie in your native language; subtitles are for when you want the original text but the video has poor audio, or you want to watch silently, or you prefer reading (I prefer reading; on the rare occasions I have to watch a video, it's at 2x with subtitles).
Re:Seriously (Score:5, Insightful)
Subtitles are useful for people like me that are hearing impaired. There is no other option. If you're (almost) deaf, you can have 1000 dubbed languages available, but it's still useless. Fuck me, the tech future is bleak...
Re: (Score:2)
Indeed, thanks.
Re: (Score:2)
We won't "forget subtitles"; they don't serve the same use case as dubbing. Dubbing is for when you want to watch a movie in your native language; subtitles are for when you want the original text but the video has poor audio, or you want to watch silently, or you prefer reading (I prefer reading; on the rare occasions I have to watch a video, it's at 2x with subtitles).
Yep, I find myself needing to turn the subs on just to understand what's being said in some movies, especially on airplanes (long-haul flights seem to account for most of my movie watching).
This kind of thing is going to be of more use for translating English films into foreign languages for people who don't speak English... If it even works, which I doubt, as every example I've heard can barely get English right, let alone something like Hindi or Tagalog.
Re: (Score:1)
Dubbing is for when you are illiterate and can't read the subtitles. Subtitles are for when you want to see the video unaltered, as it was intended: with the original text and the original voice actors.
Can it be switched off? (Score:4, Interesting)
There is nothing worse than dubbing. You miss the puns in the original language, and often (listening to you, German television), a perfectly understandable English-speaking person is dubbed over with hardly understandable German. After half a sentence.
Even if the original is in a language I do not understand, subtitles are way better.
Re: (Score:2)
The video titles are sometimes translated if the creator has dumped translated titles (and sometimes subtitles). And there is NO option to deselect that because Youtube knows best. The problem is that the translations are sometimes hilariously wrong or misleading.
For one Italian/Tuscan cooking channel I want the original audio and then whatever subtitles they have. If Youtube doesn't make the dubbing feature user-selectable I have no way to verify what is actually being said, and I'll end up with weird stuf
Re: Can it be switched off? (Score:4, Interesting)
Yeah, if I see an English language channel with a title translated into borked Norwegian, I set "never recommend this channel".
It's a basic principle of machine translation that you let the user do it, if they need it. You do not do it for them. Quite often we have to back-translate from bork to English to understand what the hell they were trying to say in the first place. It's the opposite of helpful: it creates more and harder translation work for the user.
Re: (Score:2)
By any chance, is your use of "bork" from Sesame Street's Swedish Chef character?
Of course, I'm more familiar with "Engrish" from China and such...
As I understand it, there was a push to label everything in China with English at one point, and they did a lot of it by handing somebody a Chinese-English phrasebook and letting them put the "translations" up. There are image galleries of the resulting hilarity out there.
Also, I read manga online at times, and some of those translations... I wonder if I should vo
Re: (Score:2)
Both depend on the quality of the translation and VA work. I've seen plenty of subs that are very stilted (excluding machine translated). They get the point across but would be strange with real VA's reading them. Quality dubs tend to have a more refined translation because someone is actually speaking the lines (and can give additional feedback).
Re: (Score:1)
It is more hilarious when they dub a Bavarian or coast line dialect speaker.
Perfectly understandable, but with half a sentence offset the dub comes, and both voices are so loud, you can not follow either one.
Re: (Score:2)
Oh, and the PS2 game "Chaos Wars". I have never heard anyone sound more bored while fighting evil demons in my life. It's like the producer grabbed his daughter and her teenage friends and said "you can't go to the mall until you finish dubbing this videogame". Magnificent. Simply magnificent.
Re: (Score:2)
I agree. I hate when I visit a site and I am presented with a machine translation (usually with bad wording at best) of an English text with no easy way to get the original (like in the reference pages for Windows), as if everyone in the world were monolingual, so that there is nothing better than a translation, no matter how bad it is.
It will be useful for people with vision impairment, who can see enough of the images, but not enough to read. Everyone else should be allowed to choose.
Re: (Score:2)
There is nothing worse than dubbing. You miss the puns in the original language
This. It's the main reason I only watch Anime in the original language (or any production for that matter, be it in Hindi, French, Russian, Mandarin or whatever.)
Super mangled dubs from mangled subs? (Score:5, Insightful)
What Could Possibly Go Wrong ? (Score:2)
It is like using Excel for any database, decision-making, or numerical data analysis task.
What Could Possibly Go Wrong ?
Re:Super mangled dubs from mangled subs? (Score:5, Interesting)
Another thing that annoys me about automatic subtitling is that it does not edit out the small quirks that people have, to a greater or lesser degree, in their speech.
"Ahem, hmm, yeah, like ", stuttering and repetition. If the person was annoying to listen to, the subtitles get ten times as annoying to read.
Not a replacement for actual dubs (Score:3)
Accents... (Score:3)
I've watched a few tech videos where they added subtitles. Theoretically, the presenter was speaking English, but with such a thick Indian accent that they were difficult to understand. The generated subtitles were laughably ridiculous, worse than no help. These were videos from big organizations, for example, Google. If the dubs are as bad (and they will be), the videos will be entirely worthless.
It's great to be inclusive and all, but presenters on videos (from organizations, anyway) should have neutral, easily understood accents. For English, that means mid-Atlantic. For German, it means Hannover. Etc.
I always listen to the original soundtrack (Score:4, Interesting)
Re: (Score:2)
I'd argue that there are good dubs out there, but they're the 1-10% realm. I'll agree that most are shit because the translators either cheap out on the voice acting, or have agendas where they try to cover up cultural quirks.
I mean, one translation I watched of Tenchi Muyo, where they tried to cover up that, yes, the Emperor is married to two wives, at the same time, and they know about each other. I mean, not something that should be at all surprising, out there, or anything if you have any grasp of his
Re: (Score:2)
Exactly - normal dubbing sucks because it's almost always flat and misses all the "audio acting"; auto generated dubbing can only be worse.
Could be useful, but (Score:3)
If this is simply speech generation for the auto-generated subtitles, auto-translated into the target language, then I'm not very hopeful.
Good dubbing (done by humans) is incredibly resource-intensive. It's the kind of task that has to be fairly perfect to be perceived as natural.
Dubbing also has something akin to the 'uncanny valley'.
A fairly effortless voice-over, with no attempt at lip-sync and the original voice still coming through in the background, is OK (for short stretches, not for entire features!).
Perfect dubbing, with lots of effort put into translations, as much lip sync as possible and done by professional voice actors, is also OK.
Sloppy dubbing, as seen on many Netflix productions, with half-assed speakers and no discernible attempt at proper lip sync is absolutely awful.
Voice cloning + dubbing would be useful (Score:3)
I wonder if the synthesized voice is generic, chosen from limited options (male, female), or whether it clones the original speaker's voice.
And since they presumably need to mute the original voice, I wonder if we'll lose all sound or if the muting will be selective? Otherwise it's not much use for things like guitar tutorials ...
Will take subtitles thanks (Score:2)
Dub the faces too (Score:2)
Use software to morph faces to sync with the dubbed voices.
If that works as well as subtitles do... (Score:3)
... it should be the source of hours of strange humor.
Meh (Score:2)
Wake me up when AI alters the video as well to make the lips and facial movements match the dubbed words.
And Google is doing their part... (Score:5, Funny)
... to promote illiteracy.
After all, people who can't use the fastest way to absorb information - since no YouTube influencer can beat the density of well-written text - will be less educated and more likely to click on ads.
Re: (Score:2)
... to promote illiteracy.
You were modded +Funny, but I find this to be so true. In all of the parts of the world I've visited, the ratio of dubbing to subtitles seems to vary proportionally with the resistance to multi-cultural literacy. Not illiteracy itself*, but the resistance to learning alternate cultures and languages.
*I've met quite a few people who have learned native languages by turning on the TV set, selecting subtitles plus the local language and absorbing it. The people who select the dubbed language tend to be less
Shortsighted action (Score:2)
unnatural synth voices...yuck (Score:2)
Re: (Score:2)
Search for any science or technology topic and 9/10 results are AI-generated shit these days. A surefire way to check if you're listening to an AI voice (and thus almost certainly a ChatGPT-written script) is to go to the channel and try two other videos. If they both have wildly different voices you're obviously not dealing with a real creator. The channels also tend to be quite young, tons of recent videos, and more popular than any organic growth.
In other words, AI-assisted YouTube SEO
Perfect translations likely soon (Score:2)
AI #1) They can use AI recognition to generate the text with timing info, this is pretty decent already from youtube; it even figures context to choose the right acronyms in my use.
AI #2) Rephrasing of text transcripts given time constraints; including indicating a slowing of video as a last resort.
#3) Video/Audio re-timing
AI #3) Voice profiling of audio in sync with the text transcript and using the translation text from #2 and retiming info to speak in the source's voice with the source's intonation but i