Google Open-Sources Lyra In Beta To Reduce Voice Call Bandwidth Usage (venturebeat.com)

Google today open-sourced Lyra in beta, an audio codec that uses machine learning to produce high-quality voice calls. VentureBeat reports: The code and demo, which are available on GitHub, compress raw audio down to 3 kilobits per second for "quality that compares favorably to other codecs," Google says. Lyra's architecture is separated into two pieces, an encoder and decoder. When someone talks into their phone, the encoder captures distinctive attributes, called features, from their speech. Lyra extracts these features in 40-millisecond chunks and then compresses and sends them over the network. It's the decoder's job to convert the features back into an audio waveform that can be played out over the listener's phone.

According to Google, Lyra's architecture is similar to traditional audio codecs, which form the backbone of internet communication. But while these traditional codecs are based on digital signal processing techniques, the key advantage for Lyra comes from the ability of its decoder to reconstruct a high-quality signal. Google believes there are a number of applications Lyra might be uniquely suited to, from archiving large amounts of speech and saving battery to alleviating network congestion in emergency situations.
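To put the numbers from the summary in perspective: at 3 kbps with 40 ms feature chunks, each chunk gets a 120-bit budget. A quick back-of-envelope comparison against raw PCM (assuming 16 kHz/16-bit wideband speech, which is not stated in the article):

```python
SAMPLE_RATE = 16000   # Hz, typical wideband speech (assumption)
FRAME_MS = 40         # Lyra extracts features in 40 ms chunks
BITRATE = 3000        # bits per second

samples_per_frame = SAMPLE_RATE * FRAME_MS // 1000   # 640 samples per chunk
bits_per_frame = BITRATE * FRAME_MS // 1000          # 120-bit budget per chunk

# Raw 16-bit PCM for the same 40 ms frame:
raw_bits = samples_per_frame * 16                    # 10240 bits
print(f"compression ratio: {raw_bits / bits_per_frame:.0f}x")  # ~85x
```

That roughly 85x reduction is why the decoder has to regenerate the waveform rather than decompress it in the traditional sense.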

  • by xack ( 5304745 ) on Tuesday April 06, 2021 @06:17PM (#61244608)
    • by NateFromMich ( 6359610 ) on Tuesday April 06, 2021 @07:40PM (#61244954)

      "Human speech can be modeled easier than generic audio."
      "This means that otherwise good quality generic audio codecs perform poorly with speech signal even at quite high bit rates"

      Who wrote this contradictory crap?

      • by Anonymous Coward

        192.100.116.143 at 2:20 PM, on October 18th 2004 [wikipedia.org]. Seemingly a singular person in a single edit, which was opposite my expectation. It's from the oldest revision of the page. The first named editor was Rick Block [wikipedia.org]. Blame the first person to fix it, I say. Must've been them what broke it.

        • PROTIP: Wikipedia is written by multiple people, most of whom do not read the rest of what they are adding to. :P

          Think of it as a million monkeys with a single typewriter... and each with a bone-headed ignorant ideology that "is the absolute neutral truth of a reliable source for everyone in all of the known and unknown universes and parallel dimensions and everythingeverythingeverything!".

  • not an audio codec (Score:5, Interesting)

    by the_other_chewey ( 1119125 ) on Tuesday April 06, 2021 @06:19PM (#61244622)
    To clarify: Lyra is not a general-purpose audio codec,
    but a domain-specialized speech codec.

    While it certainly is an impressive achievement, and will no doubt be
    highly useful for low-bandwidth voice communication (the examples are
    quite impressive), it won't make music streaming or any other non-speech
    audio more bandwidth-efficient.
    • Also: What speech?

      30-year-old female American $someCoast TV English?
      Or will a highly tonal tribal language from both a huge deep-bass guy and his little granddaughter work?

    • Thanks for that. Indeed, this is great for speech: phone calls and video conference calls. For music, Opus is the go-to codec, as it is nearly CD quality at an average of 128 kbps and beats both AAC and Ogg Vorbis even at 64-96 kbps, based on independent listening tests.
  • by nneul ( 8033 ) * <nneul@neulinger.org> on Tuesday April 06, 2021 @06:43PM (#61244706) Homepage

    With audio and video codecs, having open source code is not usually the most important part; it's having an actual license and patent grant to use the algorithm without getting hassled over it or exposed to IP risk.

    Additionally, calling it open source when it can't even be built without the closed source math kernel seems pretty questionable. See below from the github project readme:

        Please note that there is a closed-source kernel used for math operations that is linked via a shared object called libsparse_inference.so. We provide the libsparse_inference.so library to be linked, but are unable to provide source for it. This is the reason that a specific toolchain/compiler is required.

  • by Areyoukiddingme ( 1289470 ) on Tuesday April 06, 2021 @06:53PM (#61244756)

    3000 bits per second is nice and all, but if it's network congestion you want to fight, use Codec 2 [wikipedia.org], which is an open source [github.com] speech codec that produces intelligible output at 450 bits per second. Codec 2 starts at 3200 bits per second, and if you're willing to be that spendthrift with bandwidth, it also takes the minimum latency down to 20 ms, half of Lyra's latency.

    As I recall, Codec 2 was specifically developed to be patent free. It also has the advantage over Lyra of being so small it can run in a microcontroller. A classic microcontroller, with no FPU. Unlike Lyra, which requires a big fat blob of machine learned data to operate.

    Nice effort from Google, but human cleverness still beats a generative adversarial network.
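    The bitrates tossed around in this thread are easier to compare as payload-only data per minute (header overhead excluded; the Opus figure is for music streaming, the rest for speech):

```python
# Rough per-minute payload data at the rates mentioned in this thread
rates_bps = {
    "Codec 2 (minimum)": 450,
    "Codec 2 (standard)": 3200,
    "Lyra": 3000,
    "Opus (music)": 128000,
}
for name, bps in rates_bps.items():
    kb_per_min = bps * 60 / 8 / 1000
    print(f"{name}: {kb_per_min:.1f} kB/min")
# Codec 2 at 450 bps works out to about 3.4 kB/min, Lyra to 22.5 kB/min
```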

  • by account_deleted ( 4530225 ) on Tuesday April 06, 2021 @07:23PM (#61244890)
    Comment removed based on user account deletion
  • by AlexHilbertRyan ( 7255798 ) on Tuesday April 06, 2021 @07:24PM (#61244892)
    or is it really transcribing the speech into a simpler form and attempting to recreate them on the other end ?
    • or is it really transcribing the speech into a simpler form and attempting to recreate them on the other end ?

      If that's the case, you may as well just to speech to text and text to speech again.
      Written language is extremely dense compared to audio.
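      The density claim checks out on the back of an envelope (assuming roughly 150 words per minute of conversational speech and 6 ASCII characters per word including the space; both figures are assumptions, not from the article):

```python
# Back-of-envelope bitrate of a plain-text transcript of speech
words_per_min = 150            # conversational pace (assumption)
chars_per_word = 6             # incl. trailing space (assumption)
bits_per_sec = words_per_min * chars_per_word * 8 / 60
print(f"text transcript: ~{bits_per_sec:.0f} bits/s vs Lyra's 3000 bits/s")  # ~120 bits/s
```

      Of course a transcript throws away the speaker's voice, prosody, and timing, which is exactly what Lyra's features try to keep.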

    • by mattr ( 78516 )

      It says that features are picked up every 50ms and a feature is a spectrogram showing energy levels in a number of bands to which the human ear is sensitive. So it is not text to speech, it is a waveform, therefore audio. Did not dig into the code to find the ML part of how it picks a waveform, presumably it has learned common (for English) sequences of features. The demo was not available. And it requires a specific toolchain with closed-source library so not even going to touch it. If they need bandwidth

    • What's the difference?

      PCM is just a transcription in a very fast, purely tonal language.

    • by AmiMoJo ( 196126 )

      It seems to be a bit like FLAC. Approximate the signal, then encode the difference. The better the approximation is the less the difference and the less data you need to encode it.
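      The predict-then-encode-the-residual idea the parent describes can be sketched in a few lines. This toy first-order predictor is nothing like FLAC's actual LPC or Lyra's model; it just shows why a good prediction shrinks what you have to transmit:

```python
def encode_residual(samples):
    """Toy 1st-order predictor: predict each sample as the previous one,
    transmit only the (usually small) differences."""
    residual, prev = [], 0
    for s in samples:
        residual.append(s - prev)
        prev = s
    return residual

def decode_residual(residual):
    """Rebuild the signal by accumulating the differences."""
    out, prev = [], 0
    for r in residual:
        prev += r
        out.append(prev)
    return out

sig = [100, 102, 101, 105, 104]
assert decode_residual(encode_residual(sig)) == sig
# residuals [100, 2, -1, 4, -1] take fewer bits to entropy-code than the raw samples
```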

  • Sounds like the perfect tool to reproduce fake speech and simulate another person for honourable purposes like spamming.
  • comparison (Score:4, Insightful)

    by johnjones ( 14274 ) on Tuesday April 06, 2021 @07:28PM (#61244908) Homepage Journal

    To be honest, I think they jacked up the levels in the comparison, so it's not really all that much better. I wish everyone would migrate to Opus.

  • by wakeboarder ( 2695839 ) on Tuesday April 06, 2021 @08:28PM (#61245064)

    Phone calls are already really bad, even when you have good cell reception. More compression makes the call quality more susceptible to errors.

    • Of course, you could always use some of the freed up bandwidth for better error-detection/correction. As is, more and more modern communications are encrypted, so errors have to be detectable/correctable anyway if they're going to use good encryption. Why not reduce the bandwidth while they're at it?
  • Sounds very useful for Amateur Radio use, which is always about cramming as much into as little bandwidth as possible. If only we could get radio manufacturers to ever agree and standardize on a half decent digital mode of modulation...

  • ... in version 2.0, "with the 100 times bigger and just MOAR AI *cue ads and soon-to-be-abandoned projects*".

  • by Casandro ( 751346 ) on Wednesday April 07, 2021 @10:57AM (#61246938)

    Sure, low-bandwidth voice codecs have their uses, for example in voice broadcasting. For the far larger market of telephony, they are of only limited use.

    The problem is that today we use data networks to transport voice. VoIP and similar schemes add a fixed-size header in front of your voice data, so you need to break up your voice into packets. If you make these packets too small, you will end up sending mostly headers. If you make them too big, you will add delay to your connection, and you cannot make that delay arbitrarily large, as it starts to get annoying from about 250 ms. So typically, going below 8-16 kilobits per second isn't of any use for that scenario.

    So essentially this is something like Codec2 and all the other low bandwidth codecs that have specialist uses, but aren't useful for general purpose telephony or teleconferencing.

    So unless you want to start a "Push to Talk"-service via some low bitrate channel, this isn't particularly exciting. If this hadn't been publicised by Google it probably wouldn't have made the news.
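    The header-overhead point above is easy to quantify. Assuming a typical IPv4 + UDP + RTP stack (20 + 8 + 12 = 40 bytes per packet) and one 40 ms Lyra frame per packet:

```python
HEADER_BITS = (20 + 8 + 12) * 8   # IPv4 + UDP + RTP headers = 40 bytes = 320 bits
CODEC_BPS = 3000                  # Lyra payload bitrate
FRAME_MS = 40                     # one codec frame per packet (assumption)

payload_bits = CODEC_BPS * FRAME_MS // 1000               # 120 bits per packet
total_bps = (payload_bits + HEADER_BITS) * 1000 // FRAME_MS
print(f"payload: {payload_bits} bits, headers: {HEADER_BITS} bits per packet")
print(f"effective rate on the wire: {total_bps} bps")     # 11000 bps
```

    So even at 3 kbps of codec payload, the wire rate lands right in that 8-16 kbps range, dominated by headers.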

  • Why didn't they call it PiedPiper?
