Codec2 — an Open Source, Low-Bandwidth Voice Codec 179
Bruce Perens writes "Codec2 is an Open Source digital voice codec for low-bandwidth applications, in its first Alpha release. Currently it can encode 3.75 seconds of clear speech in 1050 bytes, and there are opportunities to code in additional compression that will further reduce its bandwidth. The main developer is David Rowe, who also worked on Speex. Originally designed for Amateur Radio, both via sound-card software modems on HF radio and as an alternative to the proprietary voice codec presently used in D-STAR, the codec is probably also useful for telephony at a fraction of current bandwidths. The algorithm is based on papers from the 1980s, and is intended to be unencumbered by valid unexpired patent claims. The license is LGPL2. The project is seeking developers for testing in applications, algorithmic improvement, conversion to fixed-point, and coding to be more suitable for embedded systems."
Presentation this week. (Score:5, Informative)
Re: (Score:2, Funny)
But will you be presenting IN Codec2?
That would be very impressive.
Re:Presentation this week. (Score:5, Informative)
Re: (Score:2)
>>>But will you be presenting IN Codec2?
But staticy. By my quick calculation it's only 2.4 kbps encoding. Like listening to voice over a 2400 baud modem.
How does it handle background noise? (Score:3, Interesting)
Bruce, have you guys done any testing of performance in the presence of background noise? I know that in the PMR area, there are a lot of firemen who are very unhappy with what happens to AMBE when background noise (e.g. saws, Personal Alert Safety System, fire) gets into the mike. While AMBE does OK at encoding just speech, throw the noise of a saw in the background and all you get is garbage.
While the initial application of CODEC2 is hams in their shacks with their noise-canceling mikes, It Woul
Re: (Score:2)
I don't know how codec2 actually does, but noise is a fundamental problem for all low-bitrate codecs. One thing that can sometimes help is applying some (conservative) noise reduction on the input to reduce the effect of noise on the codec.
Original Rationale (Score:5, Informative)
Re:Original Rationale (Score:5, Informative)
Re:Original Rationale (Score:5, Informative)
Re: (Score:3, Informative)
(Stating the obvious for those with sufficiently low UIDs and/or those who remember VAXen, or similar, or at least those with a proper beard...)
that is basically it. Speex is built (as I understand it) for lossless transmission methods with l
Re:Original Rationale (Score:5, Informative)
For UDP, in-packet FEC data is useless: to be useful, your error-correction scheme needs to be prepared to deal with losing a whole packet's worth of data. For voice this is going to introduce too much latency, so instead a typical codec might just try to interpolate the lost data. With radio, on the other hand, there is value in in-packet error-correction bits within the stream, and in the event of an error you are going to have more data with which to guess what the audio should be like, especially if you know which bits are errored (or possibly errored).
Re: (Score:2)
With radio, on the other hand, there is value in in-packet error-correction bits within the stream, and in the event of an error you are going to have more data with which to guess what the audio should be like, especially if you know which bits are errored (or possibly errored)
But wouldn't the underlying link just automatically FEC the packets at a lower layer, even if only to get the packet drop rate down?
Re: (Score:3, Informative)
This works on 51-bit frames.
Re:Original Rationale (Score:5, Informative)
The fundamental difference is not so much lossless vs. lossy transmission as the actual bit-rate. I designed Speex with a "sweet spot" around 16 kb/s, whereas David designed codec2 for a sweet spot around 2.4 kb/s. Speex does have a 2.4 kb/s mode, but the quality isn't even close to what David was able to achieve with codec2.
Re:Original Rationale (Score:4, Informative)
If you've ever heard AMBE in the presence of bit errors, it doesn't do so well either. It isn't the vocoder's job to deal with bit errors; it is the protocol's job. Over half the bits in an APCO-25 voice frame are forward error correction for the voice payload: Golay encoding, Reed-Solomon, bit order scrambling (interleaving), you name it.
Putting resistance to bit errors in the codec is the wrong place to do it.
Now, making the codec use fewer bits, so the protocol layer has more bits for FEC, makes sense.
Re: (Score:3, Interesting)
Looks really cool. I haven't messed around with D-STAR since I don't like the idea of being tied into a specific system (seems to contravene the point of amateur radio). I'll definitely be keeping an eye on this to see where it heads.
I had a really awesome idea just now for transmitting this at 1200bps using AFSK Bell 202 (like APRS) and hacking up live voice using entirely existing equipment (TNCs, etc). But the given example of 1050 bytes/3.75s works out by my math to 2240bps. I guess you could run it ove
Re:Original Rationale (Score:5, Informative)
Re:Original Rationale (Score:4, Informative)
You could just about squeeze it into 2400bps. It would probably be possible to get that out of existing AFSK modems without needing to go down the route of discriminator taps and such. Using a hardware GMSK modem like the FX589 chip would give you 9600 baud with the option of interoperating with existing D-Star modems, and interfacing an FX589 is going to be easier to implement than a G3RUH modem.
Re: (Score:3, Interesting)
By the way, look over his web site rowetel.com for the other work he's done: two really nice Open Hardware projects - a PBX and a mesh telephony device, an Open Source echo canceler for digital telephony, used in Asterisk and elsewhere, and his own electric car conversion.
I've got one of his little ip01 telephony boxes, and it is quite fantastic - a tiny, cheap, fanless, (embedded) Linux computer with plenty of memory and CPU grunt, and of course telephony hardware on board. It also has a package manager, with quite a few pieces of software available, and regular firmware updates. It's much more powerful than the various Linux-based consumer routers that are available - it's a great option if you're looking for a small Linux server to run Asterisk, a little web site, DNS s
Re: (Score:2)
Re:Original Rationale (Score:5, Informative)
So far, we've really only compared it to g.729, and it does OK against that. CELT starts at 32 kilobits per second and we're at 2 kilobits, so it's not really for the same application. But I noticed that the Alpha, all-floating-point implementation with some known low-performance code encoded the 3.75 seconds in 0.06 seconds, and decoded them in 0.04, on my 2.4 GHz processor. I would think that a polished implementation could achieve low delay on a DSP chip or some flavors of embedded CPU.
Re: (Score:2)
Re: (Score:2)
Being able to use the reference implementation certainly is convenient and timesaving; bu
Re: (Score:3, Insightful)
Somebody goes to the trouble of designing a novel, patent unencumbered(ie. if you don't like the software licence, you are perfectly free to write your own implementation), codec that fits an otherwise rather underserved niche. They have the temerity to release it under a license requiring you to release your modifications to their code if you distr
Re: (Score:2)
Software licenses can often be negotiated: authors may be willing to relicense (or add another license) if given a well presented and compelling argument for the change. This has happened before. Picking the right license can be difficult, and it's possible the original authors did not think of all scenarios.
Speaking of "a well presented and compelling argument": you, sir, did not make one.
Re: (Score:2)
Why don't you just use defines in a header as a language shim so you can link against the unmodified library? Or work with the rights holders to contribute that shim or symbol aliases to the project?
Or you could just cough up the bux for a proprietary library if the license on this one offends you so.
Err Speex (Score:2, Informative)
Speex developers are involved (Score:4, Informative)
Re:Err Speex (Score:4, Informative)
Speex isn't great in this application, because at low bitrates there is a significant delay through the codec and the output stream requires far too much bandwidth to be useful. Consider that digital speech systems like Mototrbo, TETRA, P25 and Iridium typically have less than 6kbps throughput once you've taken FEC into account.
Re: (Score:2)
Sure, Speex does 2 kbps, but if you compare that to codec2, there's a hell of a difference. The 2 kbps Speex mode is something I put together quickly -- mainly to encode comfort noise at low rate. On the other hand, David put a lot of effort into codec2 and it actually sounds decent for voice at that rate (IMO better than Speex sounds at 4 kb/s).
what about LATENCY? (Score:4, Interesting)
i assume it's acceptable... but it angers me that someone thought it was relevant to give the exact number of bytes for a seemingly arbitrary 3.75 seconds of audio, but failed to say how long it takes to encode that 3.75 seconds of audio, or what average latency can be expected after buffer conditions are met.
Re:what about LATENCY? (Score:5, Informative)
Re: (Score:3, Informative)
I think he probably means it in a 'how many samples does the codec need before it can send a packet' type of latency.
Re:what about LATENCY? (Score:5, Informative)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2)
Re: (Score:2, Insightful)
Well, the source is right there on the webpage. Why don't you download & compile it, and see for yourself? It's an alpha release so I'll guess that it's slower than it could be.
Re: (Score:2, Interesting)
also, considering the advantages of using lower bitrate voice codecs, the ability to implement the encoder and decoder algorithms directly in very low transistor count custom hardware would appeal to the same crowd... so not just latency in terms of x86 instructions per second, but the ability to implement those instru
Re: (Score:2)
Re: (Score:2, Interesting)
why not refine the DSP chip architecture until it works well with the original codec? i know masks are expensive... but why not do it all the way?
Re: (Score:3, Informative)
If, for some reason latency is an issue when it's first shoehorned into a DSP chip, Codec2 will be refined until it works well on a DSP chip, in real real time.
I think you are not using the definition of latency that most in the field would use.
Latency is how long it takes to process the data. It's a computer science type of thing. If you understand Knuth and his tape drive sorting examples, this is pretty obvious...
For example, here's a nice, simple, hopelessly useless codec that has almost exactly 100 ms of latency:
1) Get yerself a buffer that holds 1000 samples.
2) Run an A/D converter at 10 ksamples/sec until the buffer is full.
3) Run "gzip" on the 1000 sample bu
Really early latency figures (Score:5, Informative)
It encoded those 3.75 seconds in 0.06 seconds and decoded in 0.04 seconds on my AMD Phenom 9750 2.4 GHz, one core only, compiled with GCC and the -O3 switch. That's all of the overhead of the program starting and exiting, too. It's using floating, not fixed point.
This, it seems, bodes well for low latency of the final implementation on a DSP chip.
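For anyone who wants those timings as a ratio, here is a rough sketch in Python. The timings are the ones quoted above and include program start-up, so they are an upper bound on the actual work. The function name `realtime_factor` and the 20 ms frame length (taken from what others in this thread report from the Codec2 web page) are assumptions for illustration only. Note that this measures processing throughput, not how much audio the codec has to buffer before it can emit a frame.

```python
# Timings quoted in the comment above (single core, floating point, alpha code).
AUDIO_S = 3.75    # seconds of speech in the test sample
ENCODE_S = 0.06   # wall-clock time to encode it, including program start-up
DECODE_S = 0.04   # wall-clock time to decode it, including program start-up
FRAME_S = 0.020   # assumed frame length (20 ms), not taken from the source code


def realtime_factor(audio_s: float, processing_s: float) -> float:
    """How many seconds of audio are processed per second of CPU time."""
    return audio_s / processing_s


frames = AUDIO_S / FRAME_S
print(f"encode: {realtime_factor(AUDIO_S, ENCODE_S):.1f}x real time")
print(f"decode: {realtime_factor(AUDIO_S, DECODE_S):.1f}x real time")
print(f"encode cost per frame: {ENCODE_S / frames * 1000:.2f} ms")
```

On these numbers the encoder spends well under a millisecond of CPU time per 20 ms frame on that workstation, which is why processing time looks like a small part of the end-to-end delay there. An embedded DSP would of course give different figures.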
Re: (Score:2)
Thanks for promoting this, it's a fascinating project.
Re: (Score:2)
Bruce, you've replied to this question several times, but you are not understanding the question. Almost every encoder buffers some data then compresses it. Generally, the larger the buffer, the better the compression, but the greater the delay between starting to put audio into the encoder and starting to get audio out. The same thing happens at the decoder end. The question is how much (in terms of milliseconds of audio) does the encoder need to buffer before it starts compressing and how much does the
Re: (Score:2)
I don't know why you people keep badgering Bruce about this, when I could figure out the answers to all that within minutes of looking at the linked site. How about going and reading for yourself?
Re: (Score:2)
I don't know why you people keep badgering Bruce about this, when I could figure out the answers to all that within minutes of looking at the linked site. How about going and reading for yourself?
Because almost all codecs have a certain inherent fixed latency. And it's by far the most important figure of merit in the real world. And no one wants to discuss it, therefore it must be horrifically bad.
Number one priority for a codec designer is always: will it fit in the available B/W goal? This is a simple T/F Y/N 1/0: either it fits or it doesn't.
Number two priority is minimum inherent codec latency. Humans don't talk so well above 100 ms or so (debatable). That doesn't mean you get 100 ms to blow in
Re: (Score:2)
Codec2 Web Page Says 20ms samples (Score:2)
20ms samples, 51 bits, 2550 bits/sec. Are you sure about the 40 frames/sec vs. 50? Maybe he's doing that to get it under 2400 bps?
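The 40 vs. 50 frames/sec question is easy to check with a few lines of Python. This is only a sketch of the arithmetic: the 51-bit frame is the figure quoted above, the 25 ms alternative is just what 40 frames/sec would imply, and `frame_bitrate` is a made-up helper name.

```python
BITS_PER_FRAME = 51  # frame size as quoted from the Codec2 web page


def frame_bitrate(bits_per_frame: int, frame_ms: float) -> float:
    """Bit rate in bps when one frame is emitted every frame_ms milliseconds."""
    return bits_per_frame * 1000 / frame_ms


for frame_ms in (20, 25):
    frames_per_s = 1000 / frame_ms
    rate = frame_bitrate(BITS_PER_FRAME, frame_ms)
    print(f"{frame_ms} ms frames: {frames_per_s:.0f} frames/s, {rate:.0f} bps")
```

So 50 frames/sec gives 2550 bps, slightly over 2400, while 40 frames/sec gives 2040 bps, which would fit.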
Re: (Score:2)
And no one wants to discuss it, therefore it must be horrifically bad.
It is discussed on the damn site which you could just click on, is what I am saying.
Re: (Score:2)
Re:Really early latency figures (Score:4, Informative)
Re: (Score:2)
Re: (Score:2)
Keep in mind, this is alpha code that hasn't yet been converted to fixed point. The final performance is just a guess at this point. The intrinsic latency will be 25 milliseconds due to the frame size.
To put that 25 milliseconds into perspective, I've found that most people won't even perceive it if I drop 25 milliseconds out of an audio stream.
People who do have a use for it would probably be much better at judging what level of performance is acceptable. People with no use for it have no feel for the trad
Re: (Score:2)
Yes, that is the main tradeoff, but the threshold moves a LOT depending on the conditions and requirements.
There are, however, more tradeoffs than you have thought of which depend on how you define "quality". For example, in emergency communication fidelity is practically unimportant but intelligibility is essential. For a lot of music where nobody can understand the singer anyway, intelligibility doesn't matter as much as fidelity. For basic communication, everything that is not the voice is "noise" and we'
Re: (Score:2)
Don't worry. The frame size is 20 ms and there's probably (haven't looked at that detail) around 10 ms of look-ahead, so latency shouldn't be an issue. I'd actually argue that it could be increased *if* there's a way to reduce the bit-rate by doing that.
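To put a number on that, here is a minimal sketch of the buffering (algorithmic) delay, assuming the 20 ms frame and the roughly 10 ms look-ahead guessed at above. The look-ahead has not been checked against the code, the 8 kHz sample rate is the one used for the test samples mentioned elsewhere in this thread, and `algorithmic_delay_ms` is a hypothetical name.

```python
SAMPLE_RATE_HZ = 8000  # sample rate of the raw test files mentioned in this thread


def algorithmic_delay_ms(frame_ms: float, lookahead_ms: float) -> float:
    """Audio the encoder must collect before it can emit its first frame."""
    return frame_ms + lookahead_ms


delay_ms = algorithmic_delay_ms(frame_ms=20, lookahead_ms=10)  # look-ahead is a guess
delay_samples = int(delay_ms * SAMPLE_RATE_HZ / 1000)
print(f"{delay_ms:.0f} ms of buffering, {delay_samples} samples at 8 kHz")
```

That comes to about 30 ms before any processing, modem, or network delay is added.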
Serendipity. (Score:3, Interesting)
Re:Serendipity. (Score:5, Informative)
Congratulations on the license, OM. We haven't yet explored how to wedge this into D-STAR, but sending it as data rather than voice would be one way. All of the D-STAR radios except the latest one, the IC-92AD, use a plug-in daughter board to hold the AMBE chip, and it might be that somebody could make a dual-chip version of this board sometime. Since AMBE is proprietary we are stuck using their chip if we want to be compatible, unless the repeater does the conversion for us using a DV-Dongle. They sell TI DSP chips with their program burned in, and don't give out the algorithm.
It may be that on D-STAR the AMBE chip also does the modulation for a data transmission, just doesn't run the codec. But the modulation is known and there is a sound-card software implementation of D-STAR that interoperates with it. I don't have any D-STAR equipment to test. The folks on dstar_development@yahoogroups.com know a lot more about D-STAR.
73
K6BP
Re: (Score:2)
Why does a repeater need to understand the encoding? Can't it just rebroadcast the data, or even the analogue signal?
Re:Serendipity. (Score:5, Informative)
The repeater can rebroadcast the data, but that data would be AMBE encoded, and AMBE is both trade-secret in its implementation and patented in some of its algorithms. There may be an AMBE chip in the repeater, I've not played with one. The usual way one converts to and from AMBE on a PC is with a device called the DV-Dongle, which contains the AMBE chip. This costs lots of money and is not nearly so powerful as the CPU of the computer it's plugged into, which is one reason to be fed up with proprietary codecs.
So, if you had some newer, Codec2-based radios, and some older D-STAR radios, linking repeaters might be a good way to get them to talk to each other.
This is hand-waving about a lot of issues, like we've not designed the next generation of data radio to put Codec2 into. One might guess that such a thing could use IPv6, and better modulation than just FM, and FEC, etc.
Re:Serendipity. (Score:5, Insightful)
Congratulations on your new license!
The proprietary AMBE codec bothers me, too. I think that a closed, license-encumbered, proprietary codec is entirely inappropriate for ham radio use.
Great news (Score:2, Informative)
>3.75 seconds of clear speech in 1050 bytes
That's 2240 bps, 2.19 kbps, quite impressive. Maybe one day they can beat MELP (down to 600 bps) and remain open.
Excellent work.
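For anyone checking the math, a quick sketch in Python using only the two figures from the summary (1050 bytes, 3.75 seconds). The helper name `bitrate_bps` is made up for this example. The 2.19 figure comes from dividing by 1024; with the more usual 1000 bps per kbps it is 2.24.

```python
# Figures quoted in the summary: 1050 bytes for 3.75 seconds of speech.
ENCODED_BYTES = 1050
DURATION_S = 3.75


def bitrate_bps(num_bytes: int, seconds: float) -> float:
    """Average bit rate in bits per second."""
    return num_bytes * 8 / seconds


rate = bitrate_bps(ENCODED_BYTES, DURATION_S)
print(f"{rate:.0f} bps")
print(f"{rate / 1000:.2f} kbps (1 kbps = 1000 bps)")
print(f"{rate / 1024:.2f} kbps (1 kbps = 1024 bps)")
```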
Re:Great news (Score:5, Interesting)
Re: (Score:2)
Re: (Score:2)
Hi Bruce,
Just a minor correction: the frame size is 20 milliseconds, not 20 microseconds :-). As for VQ, the concept is not that hard really. Of course, as for many things, the devil's in the details, many of which I got wrong in the Speex LSP VQ anyway.
Re: (Score:2)
Thankyou! (Score:2)
Re:Thankyou! (Score:4, Informative)
Re: (Score:2)
Re: (Score:2)
Awesome (Score:2)
Packet loss? (Score:4, Interesting)
I didn't see it mentioned when quickly scanning TFA, but how does this codec handle packet loss?
It is all nice and well to develop a codec to cram as much speech as possible in as few bits as possible, but in this case, one lost packet could mean a gap of several seconds. The success of a low-bandwidth codec, at least when it comes to IP telephony, also depends on how well it can handle lost packets. Low-bandwidth codecs are usually used in low-bandwidth networks, such as the internet, and there the packet loss is the highest.
Same goes for delay and jitter, by the way. If a stream of packets is delayed, and more voice is crammed in fewer bits, then the delays in the voice stream will get longer too.
Re:Packet loss? (Score:5, Informative)
We don't know yet, but I don't see how it could be worse than AMBE in D-STAR, which makes various eructions when faced with large packet loss. I did various sorts of bit-error injection inadvertently while debugging yesterday, and right now you still get comprehensible voice with significant corruption of the LSP data. This, IMO, indicates an opportunity for more compression. Handling the problems of the radio link is more a problem for forward error correction, etc.
Re: (Score:2)
Re: (Score:2)
English only ? (Score:5, Interesting)
Re:English only ? (Score:5, Interesting)
Re: (Score:2)
Re: (Score:2)
The languages you mentioned don't really use much different sounds. If you want a real test try the clicking sounds in Zulu, Xhosa etc.
Re: (Score:2)
Actually, this is not low enough for language to really have an effect other than tonal vs non-tonal languages. As long as you "train" quantizers with multiple languages you're fine. I would not expect language-dependencies to actually kick in until you hit something like 100 bps or below (i.e. when you need to do speech-to-text in the "encoder" and text-to-speech in the decoder).
Re: (Score:2)
Spanish is spoken so quickly, compressing it is like trying to make an MP3 smaller by zipping it--it just won't work. French, though, with all its mushy pronunciation, compresses very well, like how a blurry image responds well to JPG encoding.
Merry "Kurisumasu" (Score:2)
the vocal tract is more or less the same for every human
Different languages use different parts of the vocal tract. If a language distorts clicks, it won't pass Zulu or the Bushmen languages. Languages also make different distinctions on the parts of the vocal tract they do use. If a codec distorts pitches, it won't pass intelligible Cantonese, Yoruba, or Mandarin.
There were differences in the range of sounds used by every language, but today that is not the case, thanks to communications advances. E.g Japanese people had incorporated external language sounds and it is not alien anymore.
The only foreign sounds that have been fully assimilated into the phonology of Japanese are the 'y' compounds (e.g. "kyo", "hya", "chu" (phonemically "tyu")), borrowed a long time ago from a Chinese la
Mumble integration ? (Score:4, Interesting)
One of the fastest ways to ensure its testing and distribution is to use it in Mumble - the low latency voice chat software (with an iPhone client as well).
Mumble is typically used by gaming clans for their chat rooms, and Codec2 would be tested in real-life conditions.
Re: (Score:2)
Re:Mumble integration ? (Score:5, Interesting)
Is there an existing Mumble developer whom we could get interested in this? It might be that we should take some of the Alpha-isms out of the code first.
Re:Mumble integration ? (Score:5, Insightful)
These are the stories I used to enjoy. I don't really understand them, but they make a good read.
Impressive. (Score:2)
Re: (Score:2)
Really? It sounds quite pronounced to me. It's still very impressive, but it's not magic.
Re:Impressive. (Score:4, Informative)
The DSP Innovations [dspini.com] codec manages decent speech quality at 600bps, god knows how (proprietary closed source). I think this is the state of the art in low bitrate codecs just now.
Re: (Score:2)
Jump on quick - this could be the next twitter! (Score:3, Funny)
Who wants to be the first to make a web service based on this codec and 3.75-second messages? :-)
Re: (Score:2)
At 4.5 letters per word, a text can hold about 29 non-abbreviated words. You'd have to speak at 464 words per minute to do that in 3.75 seconds. The world record is 595 wpm. Normal reading comprehension is in the range of 200-300 wpm.
However, let's look at this from a different perspective.
A non-abbreviated text message is about 4.8 bytes per word (Bpw?). At, say, 200 wpm speaking, this codec comes out to about 84 Bpw.
Honestly, a 17x difference to go to audio is remarkable. Text is probably the most co
Its a great start but not usable yet! (Score:3, Informative)
Re:Interactive communication? (Score:4, Informative)
It is a real-time codec on my workstation and is intended to be a real-time codec on embedded DSP. It's currently all floating point and does things it should not, like malloc of multiple buffers per sample.
Download the code and build it. It's "just type make" on Linux. The raw (uncompressed) sample format we've used for testing is 16-bit samples at 8 kHz and there are some tools to play those, and some pre-recorded samples. Not too much trouble to figure out.
Slashdot Summary Problem, not Codec Problem (Score:2)
I had the same reaction, given the slashdot summary, but if you read the actual web page it's 20ms samples. You still have the problem of how to wrap it in IP packets, if you're going to do that, which gets much more annoying on low bit rate codecs. Take your 51-bit sample, pad it to 7 bytes, add 20 bytes of UDP RTP headers, 20 bytes of IP headers, maybe some IPSEC for fun, etc., maybe some Ethernet headers.... Obviously if you're actually trying to run over a slow transmission system, you're more lik
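A rough sketch of the header-overhead arithmetic above, in Python. It assumes the figures from this comment (51-bit frames padded to 7 bytes, 20 bytes of UDP plus RTP headers, 20 bytes of IPv4 header) and ignores IPsec and link-layer framing. `wire_bitrate` and the idea of bundling several frames per packet are illustrative assumptions, not anything Codec2 itself defines.

```python
import math

FRAME_BITS = 51            # codec frame size quoted above
FRAME_MS = 20              # one frame every 20 ms
UDP_RTP_HEADER_BYTES = 20  # 8 bytes UDP + 12 bytes RTP
IPV4_HEADER_BYTES = 20     # IPv4 header without options


def wire_bitrate(frames_per_packet: int) -> float:
    """On-the-wire bit rate in bps when bundling frames_per_packet frames."""
    payload_bytes = frames_per_packet * math.ceil(FRAME_BITS / 8)  # 7 bytes per frame
    packet_bytes = payload_bytes + UDP_RTP_HEADER_BYTES + IPV4_HEADER_BYTES
    packets_per_s = 1000 / (FRAME_MS * frames_per_packet)
    return packet_bytes * 8 * packets_per_s


for n in (1, 5):
    extra_delay_ms = (n - 1) * FRAME_MS  # time spent waiting to fill the packet
    print(f"{n} frame(s)/packet: {wire_bitrate(n):.0f} bps on the wire, "
          f"+{extra_delay_ms} ms bundling delay")
```

With one frame per packet the headers dominate: 18800 bps on the wire for 2550 bps of speech. Bundling five frames brings that down to 6000 bps at the cost of 80 ms of extra delay.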
Re: (Score:2)
Re: (Score:2)
Re: (Score:3, Interesting)
But there it's called AMBE, not Codec. Codec2 is a bad name for a codec for the same reasons that Variable2 is a bad name for a variable. If this is supposed to supplant AMBE, why not AMBE2 or S(uper)AMBE?
Re: (Score:2)