Opus 1.5 Gets a Serious Machine Learning Upgrade

Longtime Slashdot reader jmv writes: After more than two years of work, Opus 1.5 is out. It brings many new features that can improve quality and the general audio experience through machine learning, while maintaining full compatibility with previous releases. See this release page demonstrating all the new features, including the following (a rough encoder-setup sketch follows the list):
  • Significant improvement to packet loss robustness using Deep Redundancy (DRED)
  • Improved packet loss concealment through Deep PLC
  • Low-bitrate speech quality enhancement down to 6 kb/s wideband
  • Improved x86 (AVX2) and Arm (Neon) optimizations
  • Support for 4th and 5th order ambisonics
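As a point of reference, here is a minimal sketch of what the low-bitrate, lossy-network scenario above looks like in application code, using only long-standing libopus calls. The DRED and Deep PLC controls added in 1.5 are not shown, since the summary doesn't spell out their API:

```c
/* Sketch only: classic libopus setup for the "6 kb/s wideband over a
 * lossy link" case. All calls below predate 1.5; the new ML-specific
 * knobs are deliberately omitted. */
#include <opus.h>
#include <stdio.h>

int main(void) {
    int err;
    /* 16 kHz mono = wideband, matching the "6 kb/s wideband" bullet. */
    OpusEncoder *enc = opus_encoder_create(16000, 1, OPUS_APPLICATION_VOIP, &err);
    if (err != OPUS_OK) {
        fprintf(stderr, "opus_encoder_create: %s\n", opus_strerror(err));
        return 1;
    }

    opus_encoder_ctl(enc, OPUS_SET_BITRATE(6000));        /* 6 kb/s target */
    opus_encoder_ctl(enc, OPUS_SET_INBAND_FEC(1));        /* classic in-band redundancy */
    opus_encoder_ctl(enc, OPUS_SET_PACKET_LOSS_PERC(20)); /* expect ~20% packet loss */

    /* ... feed 20 ms frames (320 samples at 16 kHz) to opus_encode() ... */

    opus_encoder_destroy(enc);
    return 0;
}
```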
Comments:
  • For those who don't know what Opus is (like me), Opus appears to be an open-source audio format for lossy audio coding.

    https://opus-codec.org/demo/op... [opus-codec.org]

    • To be more specific, it's a multipurpose codec, good for both lossy offline storage and real-time audio. It's commonly paired with VP9 and AV1 video (e.g., in WebM).

      Now, the AI stuff this time around is extremely useful for VoIP over low-bitrate, flaky connections. Think CB radio, or cell service in bad spots.

      The 6 kb/s mode is very impressive.

    • Opus is considered the best codec in almost every case and is significantly better than AAC.
      But can it now distinguish between cymbals and noise, the highest of all audio codecs so far?

      • Bane *
        Thanks Swype.

        • What word is this supposed to replace?
          * Can Bane [some guy] distinguish... ?
          * Can it distinguish between Bane and noise?
          * Can it distinguish between cymbals and Bane?
          * The highest of all audio Banes so far?

        • Hello fellow Swype user.

          I'm still sticking with it as long as I can. It hasn't been updated in forever and is definitely getting buggy on newer Android systems. What's kind of pathetic is that none of the free swipe keyboards are as good as one that hasn't been touched in years.

          The swipe to paste is amazing. The ability to intentionally add words to the dictionary properly is amazing too.

          Gboard is a piece of crap by comparison.

      • by jensend ( 71114 ) on Tuesday March 05, 2024 @03:33AM (#64290568)

        Percussive sounds have always been a strong point for Opus relative to other codecs, avoiding problems like pre-echo, while sparse pure tones, e.g. glockenspiel solos, were something Xiph had to work harder at doing well with.

        Part of that is simply due to the nature of the short-time Fourier transform [wikipedia.org]. Since it was designed first as a VoIP codec, prioritizing low latency, Opus uses short transform windows, while most other music-capable codecs use long ones. This gives Opus naturally better temporal resolution, while other codecs get naturally better frequency resolution. That's a Gabor limit / Uncertainty Principle [wikipedia.org] type of deal. Opus includes extra tricks to improve its performance on tonal content, some of which involve boosting the VBR bitrate; other codecs take corresponding measures to try to improve their performance on transients.

        So while it may be worth encoding whatever cymbal-heavy tracks you have in mind and doing a blind listening test [hydrogenaud.io], I think it's likely the cymbals have been encoded pretty well even by pre-1.0 versions of Opus, which are now over twelve years old.
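To put rough numbers on the parent's time/frequency tradeoff, here is a back-of-the-envelope sketch using the usual rule of thumb that a transform window of length T gives about 1/T frequency resolution. The window lengths are approximate and purely illustrative:

```c
/* Back-of-the-envelope time/frequency resolution for a few window sizes.
 * Rule of thumb: a window of length T ms resolves time to ~T ms and
 * frequency to ~1000/T Hz. */
#include <stdio.h>

int main(void) {
    const struct { const char *codec; double window_ms; } w[] = {
        { "Opus (CELT, shortest frame)",                  2.5 },
        { "Opus (CELT, longest frame)",                  20.0 },
        { "AAC-style long window (2048 samples @48kHz)", 42.7 },
    };
    for (int i = 0; i < 3; i++) {
        printf("%-46s time res ~%5.1f ms, freq res ~%6.1f Hz\n",
               w[i].codec, w[i].window_ms, 1000.0 / w[i].window_ms);
    }
    return 0;
}
```

Shorter windows smear a transient like a cymbal hit over less time (hence less pre-echo), at the cost of coarser frequency resolution, which is exactly why Opus needs the extra tonal tricks the parent mentions.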

      • I thought AI was going to code circles around us and improve every algorithm with its superpowers. Why, then, are we only seeing incremental human-made improvements in things like Opus? Since "everything is about to be automated, especially coding" according to the Nvidia CEO, shouldn't all this stuff have started near-magically improving overnight, like, yesterday?
    • by Dwedit ( 232252 )

      Whenever you use any online videoconferencing system (Google Chat, Skype, Zoom, LINE, whatever), you will most likely be using Opus as your codec. It was made Mandatory To Implement for WebRTC.
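For reference: RFC 7874 makes Opus (along with G.711) mandatory to implement for WebRTC audio, and a browser's SDP offer typically advertises it along these lines. The payload type (111 here) is dynamic; it just happens to be the number Chrome habitually picks:

```
m=audio 9 UDP/TLS/RTP/SAVPF 111 ...
a=rtpmap:111 opus/48000/2
a=fmtp:111 minptime=10;useinbandfec=1
```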

  • by DrunkenTerror ( 561616 ) on Monday March 04, 2024 @07:46PM (#64289924) Homepage Journal

    > most users should not notice the extra [CPU] cost, but people using older (5+ years) phones or microcontrollers might. For that reason, all new ML-based features are disabled by default in Opus 1.5.

    - from the release notes, emphasis mine

    • Looks like it would only be an issue for encoding.

      • > Looks like it would only be an issue for encoding.

        Opus's single biggest use case is end-user encoding. It is the codec of choice for real-time voice communication.

        • And encoding without ML isn't getting any worse results, though perhaps not better either. With ML acceleration becoming more and more common, using ML is a sensible option for many end users, provided the non-ML options aren't abandoned. And I don't think they will be. Quite possibly things can't be improved much further except by using ML. Eventually ML acceleration will be as ubiquitous as FP arithmetic.

        • Not decoding?

    • Thanks for the highlight. They're right not to change the default behaviour of their tool, at a minimum for reproducibility reasons. But the default that matters is the one in the user-facing software. Like everything else, it will end up as a configuration flag in audio/video encoding/transcoding software.
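For what it's worth, the long-standing runtime CPU-vs-quality knob inside libopus itself is the complexity setting, sketched below. Per the release-note quote upthread, the 1.5 ML features are additionally disabled unless enabled when the library is built; the story doesn't name the exact build switches, so none are shown here:

```c
/* Sketch: libopus's existing runtime CPU/quality trade-off. Per the
 * release notes quoted upthread, 1.5's ML paths are additionally
 * compiled out by default (exact build options not given in the story). */
#include <opus.h>

void set_max_quality(OpusEncoder *enc) {
    opus_encoder_ctl(enc, OPUS_SET_COMPLEXITY(10)); /* 0 = cheapest, 10 = best */
}
```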
