Opus 1.5 Gets a Serious Machine Learning Upgrade
Longtime Slashdot reader jmv writes: After more than two years of work, Opus 1.5 is out. It brings many new features that can improve quality and the general audio experience through machine learning, while maintaining full compatibility with previous releases. See this release page demonstrating all the new features, including:
- Significant improvement to packet loss robustness using Deep Redundancy (DRED)
- Improved packet loss concealment through Deep PLC
- Low-bitrate speech quality enhancement down to 6 kb/s wideband
- Improved x86 (AVX2) and Arm (Neon) optimizations
- Support for 4th and 5th order ambisonics
For those that don't know (Score:2)
For those that don't know what Opus is (like me), Opus seems to be an open-source audio format for lossy audio coding.
https://opus-codec.org/demo/op... [opus-codec.org]
Re: (Score:3)
To be more specific, it is a multipurpose codec, good for both lossy offline storage and real-time audio. It is commonly paired with VP9 and AV1 video, for example in WebM files.
Now, the AI stuff this time is extremely useful for VoIP over low-bitrate, flaky connections. Think CB radio and cell service in bad spots.
The 6 kb/s is very impressive.
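To put that figure in perspective, here is a back-of-the-envelope sketch. It assumes 20 ms frames and compares against uncompressed 16-bit mono PCM at 16 kHz (wideband). The function names are made up for illustration and are not part of any Opus API.

```python
def bytes_per_frame(bitrate_bps, frame_ms=20):
    """Payload bytes available for one frame at a constant bitrate."""
    return bitrate_bps * frame_ms / 1000 / 8


def pcm_bitrate_bps(sample_rate_hz=16_000, bits_per_sample=16, channels=1):
    """Bitrate of uncompressed PCM audio (assumed format: 16 kHz, 16-bit, mono)."""
    return sample_rate_hz * bits_per_sample * channels


opus_bps = 6_000  # the 6 kb/s figure from the summary

print("bytes per 20 ms frame:", bytes_per_frame(opus_bps))
print("uncompressed PCM bitrate (b/s):", pcm_bitrate_bps())
print("reduction factor vs. PCM:", round(pcm_bitrate_bps() / opus_bps, 1))
```

Under those assumptions, each 20 ms of speech has to fit in roughly 15 bytes.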
Re: For those that don't know (Score:3)
Opus is considered the best codec in almost every case and is significantly better than AAC.
But can it now distinguish between cymbals and noise, the highest of all audio codecs so far?
Re: For those that don't know (Score:2)
Bane *
Thanks Swype.
Re: (Score:2)
What word is this supposed to replace?
* Can Bane [some guy] distinguish... ?
* Can it distinguish between Bane and noise?
* Can it distinguish between cymbals and Bane?
* The highest of all audio Banes so far?
Re: For those that don't know (Score:4, Informative)
I'm pretty sure "highest" was the word that got autocowrecked:
But can it now distinguish between cymbals and noise, the bane of all audio codecs so far?
Re: (Score:2)
Hello fellow Swype user.
I'm still sticking to it as long as I can. It's not been updated in forever and is definitely getting buggy in relation to newer Android systems. What's kind of pathetic is how none of the free swipe keyboards are as good as one that's not been touched in years.
The swipe to paste is amazing. The ability to intentionally add words to the dictionary properly is amazing too.
Gboard is a piece of crap by comparison.
Re: For those that don't know (Score:5, Informative)
Percussive sounds have always been a strong point for Opus relative to other codecs, avoiding problems with pre-echo etc., while sparse pure tones, as in e.g. glockenspiel solos, were something Xiph had to work at doing better with.
Part of that is simply due to the nature of the short-time Fourier transform [wikipedia.org]. Since it was designed first as a VoIP codec, prioritizing low latency, Opus uses short transform windows, while most other music-capable codecs use long ones. This results in Opus having naturally better temporal resolution, while other codecs have naturally better frequency resolution. That's a Gabor limit / uncertainty principle [wikipedia.org] type of deal. Opus includes extra tricks to improve its performance on tonal content, and some of that includes boosting VBR bitrate; other codecs take corresponding measures to try to improve their performance on transients.
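The trade-off described above can be put in rough numbers. This is a minimal sketch, assuming a 48 kHz sample rate and a plain DFT of the given length. The two window lengths (120 and 2048 samples) are picked for illustration only and are not claimed to be the exact windows any particular codec uses.

```python
def window_resolution(num_samples, sample_rate_hz=48_000):
    """Return (time span in ms, frequency bin spacing in Hz) for one window.

    A window of N samples covers N / fs seconds and, for a plain DFT of
    length N, separates frequency bins by fs / N hertz.
    """
    time_span_ms = 1000.0 * num_samples / sample_rate_hz
    bin_spacing_hz = sample_rate_hz / num_samples
    return time_span_ms, bin_spacing_hz


# Illustrative window lengths: one short, one long (assumed values).
for n in (120, 2048):
    t_ms, df_hz = window_resolution(n)
    print(f"{n:5d} samples: {t_ms:6.2f} ms per window, {df_hz:7.2f} Hz between bins")
```

The time span (in seconds) multiplied by the bin spacing (in Hz) is always 1, so shortening the window to localize a cymbal hit in time necessarily coarsens the frequency grid, and vice versa.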
So while it may be worth encoding whatever cymbal-heavy tracks you have in mind and doing a blind listening test [hydrogenaud.io], I think it's likely the cymbals have been encoded pretty well even by pre-1.0 versions of Opus, which are now over twelve years old.
Where is the super-coding AI algo-wizard fix? (Score:1)
Re: (Score:3)
Whenever you use any online videoconferencing system (Google Chat, Skype, Zoom, LINE, whatever), you will most likely be using Opus as your codec. It was made Mandatory To Implement for WebRTC.
Re: (Score:2)
thankfully opt-in (for now) (Score:3)
> most users should not notice the extra [CPU] cost, but people using older (5+ years) phones or microcontrollers might. For that reason, all new ML-based features are disabled by default in Opus 1.5.
- from the release notes, emphasis mine
Re: thankfully opt-in (for now) (Score:2)
Looks like it would only be an issue for encoding.
Re: (Score:2)
> Looks like it would only be an issue for encoding.
Opus's single biggest use case is end user encoding. It is the codec of choice for all voice communications.
Re: (Score:2)
And results that don't use ML for the encoding process aren't getting any worse so far, though perhaps not better either. But with ML acceleration becoming more and more common, using ML is a sensible option for many end users, provided they don't abandon non-ML options. And I don't think they will. Quite possibly things can't be improved much except by using ML. Eventually ML acceleration will be as ubiquitous as FP arithmetic.
Re: thankfully opt-in (for now) (Score:2)
Rather than decoding?
Re: (Score:2)
Thanks for the highlight. They're correct in not changing the default behaviour of their tool, at a minimum for reproducibility reasons. But the default that matters is the one in the user software. Like everything else it will be a configuration flag in audio/video encoding/transcoding software.