Google Open-Sources Lyra In Beta To Reduce Voice Call Bandwidth Usage (venturebeat.com)

Google today open-sourced Lyra in beta, an audio codec that uses machine learning to produce high-quality voice calls. VentureBeat reports: The code and demo, which are available on GitHub, compress raw audio down to 3 kilobits per second for "quality that compares favorably to other codecs," Google says. Lyra's architecture is separated into two pieces, an encoder and decoder. When someone talks into their phone, the encoder captures distinctive attributes, called features, from their speech. Lyra extracts these features in 40-millisecond chunks and then compresses and sends them over the network. It's the decoder's job to convert the features back into an audio waveform that can be played out over the listener's phone.

According to Google, Lyra's architecture is similar to traditional audio codecs, which form the backbone of internet communication. But while these traditional codecs are based on digital signal processing techniques, the key advantage for Lyra comes from the ability of its decoder to reconstruct a high-quality signal. Google believes there are a number of applications Lyra might be uniquely suited to, from archiving large amounts of speech and saving battery to alleviating network congestion in emergency situations.
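To put the numbers from the summary in perspective: at 3 kbps with 40 ms feature chunks, each chunk gets a 120-bit budget. A quick back-of-envelope comparison against raw PCM (assuming 16 kHz/16-bit wideband speech, which is not stated in the article):

```python
SAMPLE_RATE = 16000   # Hz, typical wideband speech (assumption)
FRAME_MS = 40         # Lyra extracts features in 40 ms chunks
BITRATE = 3000        # bits per second

samples_per_frame = SAMPLE_RATE * FRAME_MS // 1000   # 640 samples per chunk
bits_per_frame = BITRATE * FRAME_MS // 1000          # 120-bit budget per chunk

# Raw 16-bit PCM for the same 40 ms frame:
raw_bits = samples_per_frame * 16                    # 10240 bits
print(f"compression ratio: {raw_bits / bits_per_frame:.0f}x")  # ~85x
```

That roughly 85x reduction is why the decoder has to regenerate the waveform rather than decompress it in the traditional sense.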

  • by xack ( 5304745 ) on Tuesday April 06, 2021 @06:17PM (#61244608)
    • by NateFromMich ( 6359610 ) on Tuesday April 06, 2021 @07:40PM (#61244954)

      "Human speech can be modeled easier than generic audio."
      "This means that otherwise good quality generic audio codecs perform poorly with speech signal even at quite high bit rates"

      Who wrote this contradictory crap?

      • by Anonymous Coward

        192.100.116.143 at 2:20 PM, on October 18th 2004 [wikipedia.org]. Seemingly a singular person in a single edit, which was opposite my expectation. It's from the oldest revision of the page. The first named editor was Rick Block [wikipedia.org]. Blame the first person to fix it, I say. Must've been them what broke it.

        • PROTIP: Wikipedia is written by multiple people, most of whom do not read the rest of what they are adding to. :P

          Think of it as a million monkeys with a single typewriter... and each with a bone-headed ignorant ideology that "is the absolute neutral truth of a reliable source for everyone in all of the known and unknown universes and parallel dimensions and everythingeverythingeverything!".

  • not an audio codec (Score:5, Interesting)

    by the_other_chewey ( 1119125 ) on Tuesday April 06, 2021 @06:19PM (#61244622)
    To clarify: Lyra is not a general-purpose audio codec,
    but a domain-specialized speech codec.

    While it certainly is an impressive achievement, and will no doubt be
    highly useful for low-bandwidth voice communication (the examples are
    quite impressive), it won't make music streaming or any other non-speech
    audio more bandwidth-efficient.
    • Also: What speech?

      30-year-old female American $someCoast TV English?
      Or will a highly tonal tribal language from both a huge deep-bass guy and his little granddaughter work?

    • Thanks for that. Indeed, this is great for speech: phone calls and video conference calls. For music, Opus is the go-to codec, as it is nearly CD quality at an average of 128 kbps and beats both AAC and Ogg Vorbis even at 64-96 kbps, based on independent listening tests.
  • by nneul ( 8033 ) * <nneul@neulinger.org> on Tuesday April 06, 2021 @06:43PM (#61244706) Homepage

    With audio and video codecs, having open source code is not usually the most important part; it's having an actual license and patent grant to use the algorithm without getting hassled over it or exposed to IP risk.

    Additionally, calling it open source when it can't even be built without the closed source math kernel seems pretty questionable. See below from the github project readme:

        Please note that there is a closed-source kernel used for math operations that is linked via a shared object called libsparse_inference.so. We provide the libsparse_inference.so library to be linked, but are unable to provide source for it. This is the reason that a specific toolchain/compiler is required.

  • by Areyoukiddingme ( 1289470 ) on Tuesday April 06, 2021 @06:53PM (#61244756)

    3000 bits per second is nice and all, but if it's network congestion you want to fight, use Codec 2 [wikipedia.org], which is an open source [github.com] speech codec that produces intelligible output at 450 bits per second. Codec 2 starts at 3200 bits per second, and if you're willing to be that spendthrift with bandwidth, it also takes the minimum latency down to 20 ms, half of Lyra's latency.

    As I recall, Codec 2 was specifically developed to be patent free. It also has the advantage over Lyra of being so small it can run in a microcontroller. A classic microcontroller, with no FPU. Unlike Lyra, which requires a big fat blob of machine learned data to operate.

    Nice effort from Google, but human cleverness still beats a generative adversarial network.
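    The bitrates tossed around in this thread are easier to compare as payload-only data per minute (header overhead excluded; the Opus figure is for music streaming, the rest for speech):

```python
# Rough per-minute payload data at the rates mentioned in this thread
rates_bps = {
    "Codec 2 (minimum)": 450,
    "Codec 2 (standard)": 3200,
    "Lyra": 3000,
    "Opus (music)": 128000,
}
for name, bps in rates_bps.items():
    kb_per_min = bps * 60 / 8 / 1000
    print(f"{name}: {kb_per_min:.1f} kB/min")
# Codec 2 at 450 bps works out to about 3.4 kB/min, Lyra to 22.5 kB/min
```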

  • by account_deleted ( 4530225 ) on Tuesday April 06, 2021 @07:23PM (#61244890)
    Comment removed based on user account deletion
  • by AlexHilbertRyan ( 7255798 ) on Tuesday April 06, 2021 @07:24PM (#61244892)
    or is it really transcribing the speech into a simpler form and attempting to recreate them on the other end ?
    • or is it really transcribing the speech into a simpler form and attempting to recreate them on the other end ?

      If that's the case, you may as well just to speech to text and text to speech again.
      Written language is extremely dense compared to audio.
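      The density claim checks out on the back of an envelope (assuming roughly 150 words per minute of conversational speech and 6 ASCII characters per word including the space; both figures are assumptions, not from the article):

```python
# Back-of-envelope bitrate of a plain-text transcript of speech
words_per_min = 150            # conversational pace (assumption)
chars_per_word = 6             # incl. trailing space (assumption)
bits_per_sec = words_per_min * chars_per_word * 8 / 60
print(f"text transcript: ~{bits_per_sec:.0f} bits/s vs Lyra's 3000 bits/s")  # ~120 bits/s
```

      Of course a transcript throws away the speaker's voice, prosody, and timing, which is exactly what Lyra's features try to keep.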

    • by mattr ( 78516 )

      It says that features are picked up every 50ms and a feature is a spectrogram showing energy levels in a number of bands to which the human ear is sensitive. So it is not text to speech, it is a waveform, therefore audio. Did not dig into the code to find the ML part of how it picks a waveform, presumably it has learned common (for English) sequences of features. The demo was not available. And it requires a specific toolchain with closed-source library so not even going to touch it. If they need bandwidth

    • What's the difference?

      PCM is just a transcription in a very fast, purely tonal language.

    • by AmiMoJo ( 196126 )

      It seems to be a bit like FLAC. Approximate the signal, then encode the difference. The better the approximation is the less the difference and the less data you need to encode it.
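      The predict-then-encode-the-residual idea the parent describes can be sketched in a few lines. This toy first-order predictor is nothing like FLAC's actual LPC or Lyra's model; it just shows why a good prediction shrinks what you have to transmit:

```python
def encode_residual(samples):
    """Toy 1st-order predictor: predict each sample as the previous one,
    transmit only the (usually small) differences."""
    residual, prev = [], 0
    for s in samples:
        residual.append(s - prev)
        prev = s
    return residual

def decode_residual(residual):
    """Rebuild the signal by accumulating the differences."""
    out, prev = [], 0
    for r in residual:
        prev += r
        out.append(prev)
    return out

sig = [100, 102, 101, 105, 104]
assert decode_residual(encode_residual(sig)) == sig
# residuals [100, 2, -1, 4, -1] take fewer bits to entropy-code than the raw samples
```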

  • Sounds like the perfect tool to reproduce fake speech and simulate another person for honourable purposes like spamming.
  • comparison (Score:4, Insightful)

    by johnjones ( 14274 ) on Tuesday April 06, 2021 @07:28PM (#61244908) Homepage Journal

    To be honest, I think they jacked up the levels in the comparison, so it's not really all that much better. I wish everyone would migrate to Opus.

  • by wakeboarder ( 2695839 ) on Tuesday April 06, 2021 @08:28PM (#61245064)

    Phone calls are already really bad, even when you have good cell reception. More compression makes the call quality more susceptible to errors.

    • Of course, you could always use some of the freed up bandwidth for better error-detection/correction. As is, more and more modern communications are encrypted, so errors have to be detectable/correctable anyway if they're going to use good encryption. Why not reduce the bandwidth while they're at it?
  • Sounds very useful for Amateur Radio use, which is always about cramming as much into as little bandwidth as possible. If only we could get radio manufacturers to ever agree and standardize on a half decent digital mode of modulation...

  • ... in version 2.0, "with the 100 times bigger and just MOAR AI *cue ads and soon-to-be-abandoned projects*".

  • by Casandro ( 751346 ) on Wednesday April 07, 2021 @10:57AM (#61246938)

    Sure, low-bandwidth voice codecs have their uses, for example in voice broadcasting. For the far larger market of telephony, they are of only limited use.

    The problem is that today we use data networks to transport voice. VoIP and similar schemes add a fixed-size header in front of your voice data, so you need to break up your voice into packets. If you make these packets too small, you will end up sending mostly headers. If you make them too big, you will add delay to your connection, and you cannot make that delay arbitrarily large, as it starts to get annoying from about 250 ms. So typically, going below 8-16 kilobits per second isn't of any use for that scenario.

    So essentially this is something like Codec2 and all the other low bandwidth codecs that have specialist uses, but aren't useful for general purpose telephony or teleconferencing.

    So unless you want to start a "Push to Talk"-service via some low bitrate channel, this isn't particularly exciting. If this hadn't been publicised by Google it probably wouldn't have made the news.
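    The header-overhead point above is easy to quantify. Assuming a typical IPv4 + UDP + RTP stack (20 + 8 + 12 = 40 bytes per packet) and one 40 ms Lyra frame per packet:

```python
HEADER_BITS = (20 + 8 + 12) * 8   # IPv4 + UDP + RTP headers = 40 bytes = 320 bits
CODEC_BPS = 3000                  # Lyra payload bitrate
FRAME_MS = 40                     # one codec frame per packet (assumption)

payload_bits = CODEC_BPS * FRAME_MS // 1000               # 120 bits per packet
total_bps = (payload_bits + HEADER_BITS) * 1000 // FRAME_MS
print(f"payload: {payload_bits} bits, headers: {HEADER_BITS} bits per packet")
print(f"effective rate on the wire: {total_bps} bps")     # 11000 bps
```

    So even at 3 kbps of codec payload, the wire rate lands right in that 8-16 kbps range, dominated by headers.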

  • Why didn't they call it PiedPiper?
