Opus 1.5 Gets a Serious Machine Learning Upgrade

Longtime Slashdot reader jmv writes: After more than two years of work, Opus 1.5 is out. It brings many new features that can improve quality and the general audio experience through machine learning, while maintaining full compatibility with previous releases. See this release page demonstrating all the new features, including (a rough usage sketch follows the list):
  • Significant improvement to packet loss robustness using Deep Redundancy (DRED)
  • Improved packet loss concealment through Deep PLC
  • Low-bitrate speech quality enhancement down to 6 kb/s wideband
  • Improved x86 (AVX2) and Arm (Neon) optimizations
  • Support for 4th and 5th order ambisonics
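For the curious, here is a minimal sketch (mine, not from the announcement) of how an application might opt in to the loss-robustness features through the standard libopus CTL interface. OPUS_SET_PACKET_LOSS_PERC and OPUS_SET_INBAND_FEC are long-standing CTLs; the OPUS_SET_DRED_DURATION name and its argument units are assumptions based on the release notes, hence the #ifdef guard:

    #include <opus.h>

    int main(void) {
        int err;
        OpusEncoder *enc = opus_encoder_create(48000, 1, OPUS_APPLICATION_VOIP, &err);
        if (err != OPUS_OK) return 1;

        opus_encoder_ctl(enc, OPUS_SET_BITRATE(24000));       /* 24 kb/s voice */
        opus_encoder_ctl(enc, OPUS_SET_PACKET_LOSS_PERC(20)); /* expect ~20% loss */
        opus_encoder_ctl(enc, OPUS_SET_INBAND_FEC(1));        /* classic in-band FEC */
    #ifdef OPUS_SET_DRED_DURATION
        /* DRED, new in 1.5: CTL name and units assumed from the release notes */
        opus_encoder_ctl(enc, OPUS_SET_DRED_DURATION(50));
    #endif

        opus_int16 pcm[960] = {0};  /* 20 ms of 48 kHz mono audio (silence here) */
        unsigned char packet[1500];
        int nbytes = opus_encode(enc, pcm, 960, packet, sizeof packet);
        if (nbytes < 0) return 1;   /* negative return values are Opus error codes */

        opus_encoder_destroy(enc);
        return 0;
    }

Per the release notes quoted further down, the new ML paths are disabled by default, so nothing changes unless the application explicitly asks for them.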
  • For those who don't know what Opus is (like me), Opus seems to be an open-source audio format for lossy audio coding.

    https://opus-codec.org/demo/op... [opus-codec.org]

    • To be more specific, it's a multipurpose codec, good for both lossy offline storage and real-time audio. It's the usual audio companion to VP9 and AV1 video (e.g. in WebM).

      Now, the AI stuff this time is extremely useful for VoIP over low-bitrate, flaky connections. Think CB radio, or cell service in bad spots.

      The 6 kb/s mode is very impressive.
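      If you want to poke at that yourself, a minimal sketch (my assumptions, not the project's sample code): the encoder just targets 6 kb/s, while, per the release page, the new speech enhancement runs on the decoder side. Tying it to the decoder's complexity setting, and the value 7, are assumptions here.

        #include <opus.h>

        /* Hypothetical helper: set up a 6 kb/s wideband speech pair. */
        void setup_6kbps(OpusEncoder **enc, OpusDecoder **dec) {
            int err;
            *enc = opus_encoder_create(16000, 1, OPUS_APPLICATION_VOIP, &err);
            opus_encoder_ctl(*enc, OPUS_SET_BITRATE(6000));  /* 6 kb/s target */

            *dec = opus_decoder_create(16000, 1, &err);
            /* Assumed: higher decoder complexity opts in to the ML enhancement */
            opus_decoder_ctl(*dec, OPUS_SET_COMPLEXITY(7));
        }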

    • Opus is considered the best codec in almost every case and is significantly better than AAC.
      But can it now distinguish between cymbals and noise, the highest of all audio codecs so far?

      • Bane *
        Thanks Swype.

        • What word is this supposed to replace?
          * Can Bane [some guy] distinguish... ?
          * Can it distinguish between Bane and noise?
          * Can it distinguish between cymbals and Bane?
          * The highest of all audio Banes so far?

        • Hello fellow Swype user.

          I'm still sticking with it as long as I can. It hasn't been updated in forever and is definitely getting buggy on newer Android versions. What's kind of pathetic is how none of the free swipe keyboards are as good as one that hasn't been touched in years.

          The swipe to paste is amazing. The ability to intentionally add words to the dictionary properly is amazing too.

          Gboard is a piece of crap by comparison.

      • by jensend ( 71114 ) on Tuesday March 05, 2024 @02:33AM (#64290568)

        Percussive sounds have always been a strong point for Opus relative to other codecs, avoiding problems with pre-echo, etc., while sparse pure tones, as in e.g. glockenspiel solos, were something Xiph had to work at doing better with.

        Part of that is simply due to the nature of the short-time Fourier transform [wikipedia.org]. Since it was designed first as a VoIP codec, prioritizing low latency, Opus uses short transform windows, while most other music-capable codecs use long ones. This results in Opus having naturally better temporal resolution, while other codecs have naturally better frequency resolution. That's a Gabor limit / Uncertainty Principle [wikipedia.org] type of deal. Opus includes extra tricks to improve its performance on tonal content, some of which involve boosting the VBR bitrate; other codecs take corresponding measures to try to improve their performance on transients.
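        To put rough numbers on that tradeoff (back-of-envelope figures of mine, not from any spec), the time-frequency uncertainty bound and the usual resolution heuristic are

          \sigma_t \, \sigma_f \;\ge\; \frac{1}{4\pi}, \qquad \Delta f \;\approx\; \frac{1}{T}

        so Opus's shortest transforms (T on the order of 2.5 ms) space frequency bins hundreds of Hz apart, while a long window in a music codec (T on the order of tens of ms) gets bins tens of Hz apart but smears a transient like a cymbal hit across the whole window.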

        So while it may be worth encoding whatever cymbal-heavy tracks you have in mind and doing a blind listening test [hydrogenaud.io], I think it's likely the cymbals have been encoded pretty well even by pre-1.0 versions of Opus, which are now over twelve years old.

      • I thought AI was going to code circles around us and improve every algorithm with its superpowers. Why, then, are we just seeing incremental, human-made improvements in things like Opus? Since "everything is about to be automated, especially coding," according to the Nvidia CEO, shouldn't all this stuff have near-magically started improving, like, overnight yesterday?
    • by Dwedit ( 232252 )

      Whenever you use any online videoconferencing system (Google Chat, Skype, Zoom, LINE, whatever), you are most likely using Opus as your codec. It was made Mandatory To Implement for WebRTC.
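      For reference, that's why Opus shows up in the SDP of essentially every WebRTC session, along the lines of the RFC 7587 examples (the payload type 111 below is just a commonly seen dynamic value, not a fixed one):

        m=audio 9 UDP/TLS/RTP/SAVPF 111
        a=rtpmap:111 opus/48000/2
        a=fmtp:111 minptime=10;useinbandfec=1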

  • by DrunkenTerror ( 561616 ) on Monday March 04, 2024 @06:46PM (#64289924) Homepage Journal

    > most users should not notice the extra [CPU] cost, but people using older (5+ years) phones or microcontrollers might. For that reason, all new ML-based features are *disabled by default* in Opus 1.5.

    - from the release notes, emphasis mine

    • Looks like it would only be an issue for encoding.

      • > Looks like it would only be an issue for encoding.

        Opus's single biggest use case is end user encoding. It is the codec of choice for all voice communications.

        • And results that don't use ML in the encoding process aren't getting any worse, though perhaps not better either. But with ML acceleration becoming more and more common, using ML is a sensible option for many end users, provided they don't abandon the non-ML options. And I don't think they will. Quite possibly things can't be improved much except by using ML. Eventually ML acceleration will be as ubiquitous as FP arithmetic.

        • Rather than decoding?

    • Thanks for the highlight. They're right not to change the default behaviour of their tool, at a minimum for reproducibility reasons. But the default that matters is the one in the user-facing software; like everything else, it will end up as a configuration flag in audio/video encoding/transcoding software.
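      That's already how the non-ML knobs work. For instance, FFmpeg's libopus wrapper exposes the loss-robustness options as flags today; whether and how the new ML paths surface there will be up to its maintainers:

        ffmpeg -i in.wav -c:a libopus -b:a 24k -application voip \
               -packet_loss 20 -fec 1 out.opus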
