FFmpeg 8 Can Now Subtitle Your Videos on the Fly (theregister.com)

FFmpeg 8.0 brings GPU-accelerated video encoding via Vulkan -- and can now subtitle your videos automatically using integrated speech recognition. From a report: At the start of the week, the FFmpeg project released its eighth major version. It's codenamed "Huffman" after the Huffman code algorithm, which was invented in 1952, making it one of the oldest lossless compression algorithms.

[...] The changelog lists 30 significant changes, of which the top new feature is integrating Whisper. This means whisper.cpp, which is Georgi Gerganov's entirely local and offline version of OpenAI's Whisper automatic speech recognition model. The bottom line is that FFmpeg can now automatically subtitle videos for you.
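
A minimal sketch of how the new feature might be driven from a script, assuming the transcription is exposed as an audio filter named `whisper` with `model`, `language`, `destination` and `format` options (as the changelog suggests) and that the build was configured with Whisper support; check `ffmpeg -h filter=whisper` for the actual option names. The file and model paths are illustrative.

```python
# Sketch: ask FFmpeg 8's Whisper integration to write an SRT file for a video.
# Assumes a "whisper" audio filter with model/language/destination/format
# options; verify against `ffmpeg -h filter=whisper` on your build.
import subprocess

def transcribe(video: str, model: str, srt_out: str) -> None:
    """Run the video's audio through the whisper filter and save subtitles."""
    subprocess.run(
        [
            "ffmpeg", "-i", video,
            "-vn",  # ignore the video stream; we only need the audio
            "-af", f"whisper=model={model}:language=en:destination={srt_out}:format=srt",
            "-f", "null", "-",  # discard the filtered audio, keep only the subtitle output
        ],
        check=True,
    )

if __name__ == "__main__":
    # ggml-base.en.bin is an example whisper.cpp model file, downloaded separately.
    transcribe("talk.mp4", "models/ggml-base.en.bin", "talk.srt")
```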

Comments Filter:
  • Looks like ffmpeg is the latest enshittification victim.

    • Re:Shit (Score:5, Interesting)

      by drinkypoo ( 153816 ) <drink@hyperlogos.org> on Thursday August 28, 2025 @07:42PM (#65622986) Homepage Journal

      At least it's local and offline.

    • Re:Shit (Score:5, Interesting)

      by Kisai ( 213879 ) on Thursday August 28, 2025 @08:09PM (#65623018)

      Not entirely.

      Whisper actually works rather well in several specific use cases, and fails spectacularly in others. You need to know this in advance:
      - Whisper is roughly 90% accurate at transcription and translation
      - Whisper absolutely does not know what to do with silence and will randomly inject "subtitled by (fansub group, netflix, etc)" into silence
      - Whisper does not really understand singing well
      - Whisper does not understand code-switching (e.g. switching between English and Japanese in the same context window)
      - Whisper understands zero onomatopoeia, just like all ASR systems.

      With that said, it is not useful or reliable for:
      1. Fansubbing, especially anything adult. It can only understand words, not onomatopoeia, so when it stumbles into a scene where someone goes "ah!" it has zero context for it. The result is actually pretty silly, and it often turns sex scenes in R-rated and unrated media into a series of random gibberish words that begin with the same sound. Likewise, children playing or women giggling often turns into a series of nonsense, sometimes sexually charged, words.
      2. Transcription of podcasts. Sorry bub, your average podcaster has a shitty microphone, and Whisper cannot produce usable subtitles when multiple people are speaking over each other, especially when they use Zoom or Discord for a multi-party call. If you want to use it to transcribe a podcast, record each participant separately and merge the results (see the sketch after this comment).
      3. Anything that is not professional-grade audio. ASR technology is often built on a corpus of bad data that elevates profanity when it tries to guess words it cannot understand, so it is more likely to output racist language: "trigger" can become the same word with an n, even though that word isn't in the audio. Your input source must be professional grade, or the word error rate will be higher and will favor profanity or racist language over less common but more obvious words.

      I doubt most people will use this in practice anyway, as whisper.cpp is insanely slow unless it is run on something like a 16 GB Nvidia GPU.
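
To make the "record each participant separately and merge" advice concrete, here is a small sketch that merges per-speaker SRT files (each one transcribed separately, e.g. with whisper.cpp) into a single track sorted by start time. File names and speaker labels are hypothetical.

```python
# Sketch: merge per-speaker SRT files (one per podcast participant, each
# transcribed separately) into a single subtitle track, sorted by start time
# and prefixed with the speaker's name.
import re
from pathlib import Path

CUE = re.compile(
    r"(\d+)\s*\n(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})\s*\n(.*?)(?:\n\n|\Z)",
    re.S,
)

def load(path: str, speaker: str):
    """Return (start, end, 'Speaker: text') tuples from one SRT file."""
    text = Path(path).read_text(encoding="utf-8")
    return [(m[2], m[3], f"{speaker}: {m[4].strip()}") for m in CUE.finditer(text)]

def merge(tracks: dict[str, str], out: str) -> None:
    """Combine several per-speaker SRT files into one, renumbering the cues."""
    cues = []
    for speaker, path in tracks.items():
        cues.extend(load(path, speaker))
    cues.sort(key=lambda c: c[0])  # fixed-width SRT timestamps sort lexicographically
    blocks = [f"{i}\n{s} --> {e}\n{t}\n" for i, (s, e, t) in enumerate(cues, 1)]
    Path(out).write_text("\n".join(blocks), encoding="utf-8")

if __name__ == "__main__":
    merge({"Alice": "alice.srt", "Bob": "bob.srt"}, "podcast.srt")
```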

      • Re: (Score:2, Interesting)

        by djgl ( 6202552 )

        The points you mention sound like drawbacks of the available language models, not of the whisper library itself.

      • I was confused about your comment until I figured out that you don't seem to understand what an onomatopoeia is.
        • Really common misunderstanding. I've spent years trying to correct people. I'm giving up. It's pervasive in the subtitling/translation/localisation industry.
        • by allo ( 1728082 )

          I am still confused. I can look up the real meaning, but what is the commonly misconceived meaning?

      • by allo ( 1728082 )

        "- Whisper absolutely does not know what to do with silence and will randomly inject "subtitled by (fansub group, netflix, etc)" into silence"

        While this may be a design problem with Whisper, it should be easy to avoid in FFmpeg: if silence is detected, do not generate subtitles. Not a scientific solution, but a working one.
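
That workaround can be approximated today as a post-processing step using FFmpeg's existing silencedetect filter: find the silent spans first, then drop any generated cue that falls entirely inside one. This is a sketch of the idea, not a description of what FFmpeg does internally; the thresholds are arbitrary examples.

```python
# Sketch of the parent's workaround: detect silent spans with FFmpeg's
# silencedetect filter, then drop subtitle cues that lie entirely inside
# a silent span. Thresholds (-35dB, 1.0s) are arbitrary examples.
import re
import subprocess

def silent_spans(media: str, noise: str = "-35dB", min_dur: float = 1.0):
    """Return [(start, end), ...] silent regions in seconds, parsed from ffmpeg's log."""
    log = subprocess.run(
        ["ffmpeg", "-i", media, "-af",
         f"silencedetect=noise={noise}:d={min_dur}", "-f", "null", "-"],
        capture_output=True, text=True,
    ).stderr
    starts = [float(x) for x in re.findall(r"silence_start: ([\d.]+)", log)]
    ends = [float(x) for x in re.findall(r"silence_end: ([\d.]+)", log)]
    return list(zip(starts, ends))

def to_seconds(ts: str) -> float:
    """Convert an SRT timestamp like 00:01:02,500 to seconds."""
    h, m, s = ts.replace(",", ".").split(":")
    return int(h) * 3600 + int(m) * 60 + float(s)

def keep_cue(start_ts: str, end_ts: str, spans) -> bool:
    """True unless the cue falls completely within a detected silent span."""
    s, e = to_seconds(start_ts), to_seconds(end_ts)
    return not any(a <= s and e <= b for a, b in spans)
```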

    • Re:Shit (Score:5, Informative)

      by Lproven ( 6030 ) on Friday August 29, 2025 @04:08AM (#65623680) Homepage Journal

      I wrote this article.

      I don't think so, no. It's a local feature, not online, entirely optional, and you are perfectly free to ignore it, not turn it on, and use FFmpeg as before.

      The size of the FFmpeg binary is a rounding error compared to the many gigabytes of video files it takes as input and emits. If you do not enable the Whisper model, I am not even sure it will take any additional memory at runtime.

    • If people want to develop free and open source AI, that is better than leaving it to self-interested corporations, provided it's not forced on people.

  • by thesjaakspoiler ( 4782965 ) on Thursday August 28, 2025 @07:46PM (#65622994)

    Youtube's automatic subtitling is a piece of junk.

  • The last fs I trusted was Windows 10. We had airport sysadmin delete the regex hotfix. Not worth recommending! I don't know why people insist on stupid commentary.
  • I wish they'd go ahead and switch to some kind of automated subtitles already. The human subtitlers do an amazing job for a human, but they often get several sentences behind what's actually going on. If AI can subtitle live events, keeping the words on the screen in sync with what's being said, I'd welcome that even if it got a few more words wrong (which I doubt would happen; the humans get a lot of words wrong and miss a lot as well).

  • I have an Android TV projector and use it to play movies off a local DLNA server with VLC. Half the time the local srt files are completely ignored. I can find no reason for it, and I've tried everything. It's annoying as fuck, as I (and my family) watch movies in multiple languages. I find all the other media player systems (Plex, Jellyfin, etc.) highly cumbersome because they reorganize everything.
    Yesterday I even tried merging the avi and srt into an mkv, and even that wouldn't display the subtitles. WTF?!
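
For what it's worth, one way to mux an external .srt into an MKV without re-encoding is sketched below. File names and the language tag are illustrative; some players only auto-select a subtitle stream if its language metadata is set.

```python
# Sketch: remux an AVI plus an external .srt into one MKV, copying the
# audio/video streams and tagging the subtitle language so players can
# auto-select it. File names are illustrative.
import subprocess

subprocess.run(
    [
        "ffmpeg",
        "-i", "movie.avi",              # original video/audio
        "-i", "movie.srt",              # external subtitles
        "-map", "0", "-map", "1",       # keep every stream from both inputs
        "-c", "copy",                   # no re-encoding of audio/video
        "-c:s", "srt",                  # store the subtitles as SubRip inside the MKV
        "-metadata:s:s:0", "language=eng",
        "movie.mkv",
    ],
    check=True,
)
```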
  • How big is this new AI monstrosity? A few terabytes maybe?

    • by allo ( 1728082 )

      What did you do to find out? I mean, you could try Google or Perplexity, for example; maybe even ChatGPT could be asked about the size of the Whisper model.
