FFmpeg 8 Can Now Subtitle Your Videos on the Fly (theregister.com) 32

Posted by msmash on Thursday August 28, 2025 @07:20PM from the pushing-the-limits dept.

FFmpeg 8.0 brings GPU-accelerated video encoding via Vulkan -- and can now subtitle your videos automatically using integrated speech recognition. From a report: At the start of the week, the FFmpeg project released its eighth major version. It's codenamed "Huffman" after the Huffman code algorithm, which was invented in 1952, making it one of the oldest lossless compression algorithms.

[...] The changelog lists 30 significant changes, of which the top new feature is integrating Whisper. This means whisper.cpp, which is Georgi Gerganov's entirely local and offline version of OpenAI's Whisper automatic speech recognition model. The bottom line is that FFmpeg can now automatically subtitle videos for you.
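
As a rough illustration of what "subtitle videos for you" could look like in practice, the sketch below builds an `ffmpeg` command line that runs the new `whisper` audio filter and writes an SRT file. It assumes an FFmpeg 8 build configured with Whisper support and a whisper.cpp model file already downloaded; the file names (`talk.mp4`, `ggml-base.en.bin`, `talk.srt`) and the helper `build_whisper_command` are hypothetical, and the filter options shown (`model`, `language`, `queue`, `destination`, `format`) should be checked against the filter documentation for your build.

```python
import shlex


def build_whisper_command(input_path, model_path, srt_path,
                          language="en", queue_seconds=3):
    """Build an ffmpeg argv list that transcribes a file's audio to SRT.

    Hypothetical helper: it only assembles arguments, it does not run ffmpeg.
    Paths containing ':' or ',' would need escaping for the filter syntax.
    """
    filter_spec = (
        f"whisper=model={model_path}"
        f":language={language}"
        f":queue={queue_seconds}"      # seconds of audio buffered per pass
        f":destination={srt_path}"
        f":format=srt"
    )
    return [
        "ffmpeg",
        "-i", input_path,
        "-vn",                # ignore the video stream
        "-af", filter_spec,   # run the audio through the whisper filter
        "-f", "null", "-",    # discard the decoded audio itself
    ]


if __name__ == "__main__":
    cmd = build_whisper_command("talk.mp4", "ggml-base.en.bin", "talk.srt")
    print(shlex.join(cmd))
```

Passing the returned list to `subprocess.run` would execute it; building the list separately keeps the example testable on machines without FFmpeg installed.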

Shit (Score:1)

by ArchieBunker ( 132337 ) writes:

Looks like ffmpeg is the latest enshittification victim.
- Re:Shit (Score:5, Interesting)
  
  by drinkypoo ( 153816 ) writes: <drink@hyperlogos.org> on Thursday August 28, 2025 @07:42PM (#65622986) Homepage Journal
  
  At least it's local and offline.
  
- Re:Shit (Score:5, Interesting)
  
  by Kisai ( 213879 ) writes: on Thursday August 28, 2025 @08:09PM (#65623018)
  
  Not entirely.
  Whisper actually works rather well in several specific use cases, and fails spectacularly in others. You need to know this in advance:
  - Whisper is roughly 90% accurate at transcription and translation
  - Whisper absolutely does not know what to do with silence and will randomly inject "subtitled by (fansub group, netflix, etc)" into silence
  - Whisper does not really understand singing well
  - Whisper does not understand code-switching (e.g., switching between English and Japanese in the same context window)
  - Whisper understands zero onomatopoeia, just like all ASR systems.
  With that said, it is not useful or reliable for:
  1. Fansubbing, especially anything adult. It can only understand words, not onomatopoeia. So when it stumbles into a scene where someone goes "ah!" it has zero context for it. The result is actually pretty silly, and often turns sex scenes in R-rated and unrated media into a series of random gibberish words that begin with the same sound. Likewise, children playing and women giggling often turn into a series of nonsense, sometimes sexually charged, words.
  2. Transcription of podcasts. Sorry bub, your average podcaster has a shitty microphone, and Whisper cannot subtitle when multiple people are speaking over each other, especially when people use Zoom or Discord for a multi-party call. If you want to use it to transcribe a podcast, record each participant separately and merge the results.
  3. ASR technology is often built on a corpus of bad data that elevates profanity when it tries to guess words it cannot understand. So it's more likely to use racist language: "trigger" becomes the same word with an n, which isn't even in the audio. So your input source must be professional grade, or its word error rate will be higher and it will favor profanity or racist language over less common but more obvious words.
  I doubt most people will use this in practice, as Whisper.cpp is insanely slow unless it is run on a 16GB Nvidia GPU anyway.
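
The "record each participant separately and merge the result" advice in point 2 can be sketched as a timestamp merge. This is a minimal illustration, assuming each speaker's track has already been transcribed into segments with start and end times; the `Segment` type and `merge_transcripts` function are hypothetical names, not part of FFmpeg or whisper.cpp.

```python
import heapq
from typing import NamedTuple


class Segment(NamedTuple):
    start: float   # seconds from the start of the recording
    end: float
    speaker: str
    text: str


def merge_transcripts(*tracks):
    """Merge per-speaker segment lists into one list ordered by start time."""
    # Sort each track defensively, then do a k-way merge on the start time.
    ordered = [sorted(track, key=lambda s: s.start) for track in tracks]
    return list(heapq.merge(*ordered, key=lambda s: s.start))


if __name__ == "__main__":
    alice = [Segment(0.0, 2.0, "alice", "Hi"), Segment(5.0, 6.0, "alice", "Right")]
    bob = [Segment(2.5, 4.0, "bob", "Hello")]
    for seg in merge_transcripts(alice, bob):
        print(f"[{seg.start:6.2f}] {seg.speaker}: {seg.text}")
```

Overlapping speech stays intact here because each speaker was transcribed from a clean, separate track; the merge only interleaves the results.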
  
  - Re: (Score:2, Interesting)
    
    by djgl ( 6202552 ) writes:
    
    The points you mention sound like drawbacks of the available language models, not of the whisper library being used.
  - Re: Shit (Score:1)
    
    by bjoast ( 1310293 ) writes:
    
    I was confused about your comment until I figured out that you don't seem to understand what an onomatopoeia is.
    - Re: Shit (Score:2)
      
      by EldoranDark ( 10182303 ) writes:
      
      Really common misunderstanding. I've spent years trying to correct people. I'm giving up. It's pervasive in the subtitling/translation/localisation industry.
    - Re: (Score:2)
      
      by allo ( 1728082 ) writes:
      
      I am still confused. I can look up the real meaning, but what is the common misconception meaning?
  - Re: (Score:2)
    
    by allo ( 1728082 ) writes:
    
    "- Whisper absolutely does not know what to do with silence and will randomly inject "subtitled by (fansub group, netflix, etc)" into silence"
    While this may be a design problem with Whisper, it should be easy to avoid in ffmpeg. If silence is detected, do not generate subtitles. Not the scientific solution, but a working one.
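
The "if silence is detected, do not generate subtitles" idea can be sketched as a simple energy gate that runs before transcription. This is an illustrative assumption about how such a gate might work, not how FFmpeg or whisper.cpp implements it; `speech_regions`, the 30 ms frame size, and the RMS threshold are all hypothetical choices, and samples are assumed to be floats in the range -1.0 to 1.0.

```python
import math


def speech_regions(samples, sample_rate, frame_ms=30, rms_threshold=0.01):
    """Return (start_sec, end_sec) spans whose RMS energy meets the threshold."""
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    regions = []
    region_start = None  # sample offset where the current loud span began
    for offset in range(0, len(samples), frame_len):
        frame = samples[offset:offset + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        if rms >= rms_threshold:
            if region_start is None:
                region_start = offset
        elif region_start is not None:
            # A quiet frame closes the current loud span.
            regions.append((region_start / sample_rate, offset / sample_rate))
            region_start = None
    if region_start is not None:
        regions.append((region_start / sample_rate, len(samples) / sample_rate))
    return regions


if __name__ == "__main__":
    rate = 1000
    audio = [0.0] * rate + [0.5] * rate + [0.0] * rate
    print(speech_regions(audio, rate, frame_ms=100))
```

Only the returned spans would be handed to the recognizer, so stretches of silence never reach the model. A real voice activity detector is more robust than a fixed energy threshold, which is the trade-off the comment already concedes.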
- Re:Shit (Score:5, Informative)
  
  by Lproven ( 6030 ) writes: on Friday August 29, 2025 @04:08AM (#65623680) Homepage Journal
  
  I wrote this article.
  I don't think so, no. It's a local feature, not online, entirely optional, and you are perfectly free to ignore it, not turn it on, and use FFmpeg as before.
  The size of the binary of FFmpeg is a rounding error compared to the many gigabytes of the video files it takes as input and emits. If you do not enable the Whisper model I am not even sure it'll take any additional memory at runtime.
  
- Re: (Score:3)
  
  by John Allsup ( 987 ) writes:
  
  If people want to develop free and open source AI, it's better than just leaving it to self-interested corporations. Provided it's not forced on people.
If it is as good as Youtube, I'll pass (Score:5, Insightful)

by thesjaakspoiler ( 4782965 ) writes: on Thursday August 28, 2025 @07:46PM (#65622994)

Youtube's automatic subtitling is a piece of junk.

- Re: (Score:2)
  
  by Zontar_Thing_From_Ve ( 949321 ) writes:
  
  Youtube's automatic subtitling is a piece of junk.
  Yeah, it really is. I'd have modded you up, but no points.
- Re: (Score:2)
  
  by vbdasc ( 146051 ) writes:
  
  It makes many mistakes, perhaps too many, which can't be denied, but it can still be a lifesaver for people with poor hearing or a poor command of spoken English.
  - Re: (Score:3)
    
    by Valgrus Thunderaxe ( 8769977 ) writes:
    
    Try turning off your audio and relying only on their generated CCs. It's just a bunch of incomprehensible nonsense.
    - Re: If it is as good as Youtube, I'll pass (Score:3)
      
      by Fons_de_spons ( 1311177 ) writes:
      
      I do this often. My experience is completely the opposite. It is not perfect, but works great. I use that here if someone is watching TV. Don't want YouTube blaring through that.
      - Re: (Score:2)
        
        by nanoflower ( 1077145 ) writes:
        
        Agreed. While it may not be perfect, in most cases the subtitles are perfectly usable. I do see some cases where it gets most of the words right but some are clearly not even close to being correct.
    - Re: (Score:2)
      
      by allo ( 1728082 ) writes:
      
      Depends on the speaker.
Not on Wayland (Score:1)

by kurt_cordial ( 6208254 ) writes:

The last fs I trusted was Windows 10. We had airport sysadmin delete the regex hotfix. Not worth recommending! I don't know why people insist on stupid commentary.
It's gotta be better than live sports subtitles (Score:2)

by Tony Isaac ( 1301187 ) writes:

I wish they'd go ahead and switch to some kind of automated subtitles already. The human subtitlers do an amazing job for a human, but they often get several sentences behind what's actually going on. If AI can subtitle live events, keeping the words on the screen in sync with what's being said, I'd welcome that even if it got a few more words wrong (which I doubt would happen, the humans get a lot of words wrong, and miss a lot as well).
- Re: It's gotta be better than live sports subtitle (Score:4, Informative)
  
  by Fons_de_spons ( 1311177 ) writes: on Friday August 29, 2025 @02:07AM (#65623544)
  
  I live in Belgium. A lot is subtitled. Unless they are doing it live, it is pretty spot on. AI sure can learn a few things from subtitlers here.
  Well, unless it is Netflix. They clearly wanted it done cheap. Spelling mistakes, sloppy translations... It is rushed work.
  
  - Re: (Score:2)
    
    by Tony Isaac ( 1301187 ) writes:
    
    Indeed. As my subject line noted, I'm referring to live events. In the US, subtitles are also very good for prerecorded content.
    - Re: (Score:2)
      
      by Fons_de_spons ( 1311177 ) writes:
      
      Oh, missed that. Sorry.
Start by making the actual ones work (Score:2)

by dargaud ( 518470 ) writes:

I have an Android TV projector and use it to play movies off a local DLNA server with VLC. Half the time the local srt files are completely ignored. I can find no reasons for it and I've tried everything. It's annoying as fuck as I (and family) watch movies in multiple languages. I find all the other media player systems (Plex, Jellyfin, etc...) highly cumbersome because they reorganize everything.
Yesterday I even tried merging avi and srt into an mkv and even that wouldn't display the subtitle. WTF ?!?
memory? (Score:2)

by groobly ( 6155920 ) writes:

How big is this new AI monstrosity? A few terabytes maybe?
- Re: (Score:2)
  
  by allo ( 1728082 ) writes:
  
  What did you do to find out? I mean you could for example try Google, Perplexity, maybe even ChatGPT could be asked about the size of the Whisper model.
