Codec2: an Open Source, Low-Bandwidth Voice Codec
Bruce Perens writes "Codec2 is an Open Source digital voice codec for low-bandwidth applications, in its first Alpha release. Currently it can encode 3.75 seconds of clear speech in 1050 bytes, and there are opportunities to code in additional compression that will further reduce its bandwidth. The main developer is David Rowe, who also worked on Speex. Originally designed for Amateur Radio, both via sound-card software modems on HF radio and as an alternative to the proprietary voice codec presently used in D-STAR, the codec is probably also useful for telephony at a fraction of current bandwidths. The algorithm is based on papers from the 1980s, and is intended to be unencumbered by valid unexpired patent claims. The license is LGPL2. The project is seeking developers for testing in applications, algorithmic improvement, conversion to fixed-point, and coding to be more suitable for embedded systems."
Presentation this week. (Score:5, Informative)
Original Rationale (Score:5, Informative)
Err Speex (Score:2, Informative)
Re:Interactive communication? (Score:4, Informative)
It is a real-time codec on my workstation and is intended to be a real-time codec on embedded DSP. It's currently all floating point and does things it should not, like malloc of multiple buffers per sample.
Download the code and build it. It's "just type make" on Linux. The raw (uncompressed) sample format we've used for testing is 16-bit samples at 8 kHz, and there are some tools to play those, and some pre-recorded samples. Not too much trouble to figure out.
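If you don't have a player for headerless files handy, wrapping the raw samples in a WAV header is one way to listen to them. This is only a sketch: the 16-bit, 8 kHz format comes from the comment above, but mono, little-endian, signed samples are assumptions on my part, and the file names in the usage note are made up.

```python
import wave

SAMPLE_RATE_HZ = 8000    # from the comment above: 8 kHz
SAMPLE_WIDTH_BYTES = 2   # from the comment above: 16-bit samples
CHANNELS = 1             # assumption: mono speech


def raw_to_wav(raw_path, wav_path):
    """Copy headerless PCM from raw_path into a WAV file at wav_path.

    Assumes the raw data is little-endian signed 16-bit, which is what
    a WAV container expects. Returns the clip duration in seconds.
    """
    with open(raw_path, "rb") as f:
        pcm = f.read()
    # Drop a trailing odd byte, if any, so every sample is complete.
    pcm = pcm[: len(pcm) - (len(pcm) % SAMPLE_WIDTH_BYTES)]
    with wave.open(wav_path, "wb") as w:
        w.setnchannels(CHANNELS)
        w.setsampwidth(SAMPLE_WIDTH_BYTES)
        w.setframerate(SAMPLE_RATE_HZ)
        w.writeframes(pcm)
    return len(pcm) / (SAMPLE_WIDTH_BYTES * CHANNELS * SAMPLE_RATE_HZ)
```

Something like `raw_to_wav("sample.raw", "sample.wav")` (hypothetical file names) should then give a file any ordinary audio player can open.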
Re:what about LATENCY? (Score:5, Informative)
Re:Presentation this week. (Score:5, Informative)
Speex developers are involved (Score:4, Informative)
Re:Err Speex (Score:4, Informative)
Speex isn't great in this application, because at low bitrates there is a significant delay through the codec and the output stream requires far too much bandwidth to be useful. Consider that digital speech systems like MOTOTRBO, TETRA, P25 and Iridium typically have less than 6 kbps throughput once you've taken FEC into account.
Great news (Score:2, Informative)
>3.75 seconds of clear speech in 1050 bytes
That's 2240 bps, or 2.24 kbps, quite impressive. Maybe one day they can beat MELP (down to 600 bps) and remain open.
Excellent work.
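For anyone who wants to check the arithmetic, here is a quick sketch. The only inputs are the two figures from the summary (1050 bytes, 3.75 seconds); the function name is mine.

```python
def bitrate_bps(num_bytes, seconds):
    """Average bit rate of num_bytes of payload spread over seconds of audio."""
    return num_bytes * 8 / seconds


rate = bitrate_bps(1050, 3.75)
print(rate)         # 2240.0 bits per second
print(rate / 1000)  # 2.24 kbps, using the usual 1 kbps = 1000 bps
# Dividing by 1024 instead gives 2.1875, which is not how kbps is normally quoted.
```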
Re:Serindipidy. (Score:5, Informative)
Congratulations on the license, OM. We haven't yet explored how to wedge this into D-STAR, but sending it as data rather than voice would be one way. All of the D-STAR radios except the latest one, the IC-92AD, use a plug-in daughter board to hold the AMBE chip, and it might be that somebody could make a dual-chip version of this board sometime. Since AMBE is proprietary we are stuck using their chip if we want to be compatible, unless the repeater does the conversion for us using a DV-Dongle. They sell TI DSP chips with their program burned in, and don't give out the algorithm.
It may be that on D-STAR the AMBE chip also does the modulation for a data transmission, just doesn't run the codec. But the modulation is known and there is a sound-card software implementation of D-STAR that interoperates with it. I don't have any D-STAR equipment to test. The folks on dstar_development@yahoogroups.com know a lot more about D-STAR.
73
K6BP
Re:Original Rationale (Score:5, Informative)
Re:Original Rationale (Score:5, Informative)
Re:Serindipidy. (Score:5, Informative)
The repeater can rebroadcast the data, but that data would be AMBE encoded, and AMBE is both trade-secret in its implementation and patented in some of its algorithms. There may be an AMBE chip in the repeater, I've not played with one. The usual way one converts to and from AMBE on a PC is with a device called the DV-Dongle, which contains the AMBE chip. This costs lots of money and is not nearly so powerful as the CPU of the computer it's plugged into, which is one reason to be fed up with proprietary codecs.
So, if you had some newer, Codec2-based radios, and some older D-STAR radios, linking repeaters might be a good way to get them to talk to each other.
This is hand-waving about a lot of issues; for instance, we've not designed the next generation of data radio to put Codec2 into. One might guess that such a thing could use IPv6, better modulation than just FM, FEC, etc.
Re:Thankyou! (Score:4, Informative)
Really early latency figures (Score:5, Informative)
It encoded those 3.75 seconds in 0.06 seconds and decoded in 0.04 seconds on my AMD Phenom 9750 2.4 GHz, one core only, compiled with GCC and the -O3 switch. That's all of the overhead of the program starting and exiting, too. It's using floating point, not fixed point.
This, it seems, bodes well for low latency of the final implementation on a DSP chip.
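Put as a ratio, those timings work out as below. This is just a sketch of the arithmetic using the three numbers quoted above. Note that it measures processing throughput only and says nothing about how much audio the codec has to buffer before it can emit a frame.

```python
CLIP_SECONDS = 3.75    # length of the test clip
ENCODE_SECONDS = 0.06  # measured encode time, including program start-up
DECODE_SECONDS = 0.04  # measured decode time, including program start-up


def realtime_factor(audio_seconds, processing_seconds):
    """How many seconds of audio are processed per second of CPU time."""
    return audio_seconds / processing_seconds


print(round(realtime_factor(CLIP_SECONDS, ENCODE_SECONDS), 2))  # 62.5
print(round(realtime_factor(CLIP_SECONDS, DECODE_SECONDS), 2))  # 93.75
```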
Re:Original Rationale (Score:5, Informative)
Re:what about LATENCY? (Score:3, Informative)
I think he probably means it in a 'how many samples does the codec need before it can send a packet' type of latency.
Re:Packet loss? (Score:5, Informative)
We don't know yet, but I don't see how it could be worse than AMBE in D-STAR, which makes various eructions when faced with large packet loss. I did various sorts of bit-error injection inadvertently while debugging yesterday, and right now you still get comprehensible voice with significant corruption of the LSP data. This, IMO, indicates an opportunity for more compression. Handling the problems of the radio link is more a problem for forward error correction, etc.
Re:what about LATENCY? (Score:5, Informative)
Re:Original Rationale (Score:3, Informative)
(Stating the obvious for those with sufficiently low UIDs and/or those who remember VAXen, or similar, or at least those with a proper beard...)
That is basically it. Speex is built (as I understand it) for lossless transmission methods with little or no error correction needed. UDP [wikipedia.org], by its very nature, is a very lossy medium, so something with better error tolerance is needed. Hence, Codec2 provides a nice route.
(There. Extrapolated that for you. Doubly-so [wikipedia.org], perhaps.)
Re:Original Rationale (Score:5, Informative)
So far, we've really only compared it to G.729, and it does OK against that. CELT starts at 32 kilobits per second and we're at 2 kilobits, so it's not really for the same application. But I noticed that the Alpha, all-floating-point implementation with some known low-performance code encoded the 3.75 seconds in 0.06 seconds, and decoded them in 0.04, on my 2.4 GHz processor. I would think that a polished implementation could achieve low delay on a DSP chip or some flavors of embedded CPU.
Re:Original Rationale (Score:4, Informative)
You could just about squeeze it into 2400 bps. It would probably be possible to get that out of existing AFSK modems without needing to go down the route of discriminator taps and such. Using a hardware GMSK modem like the FX589 chip would give you 9600 baud with the option of interoperating with existing D-STAR modems, and interfacing an FX589 is going to be easier to implement than a G3RUH modem.
Re:Original Rationale (Score:5, Informative)
For UDP, in-packet FEC data is useless: a packet either arrives intact or not at all, so your error correction scheme needs to be prepared to deal with losing a whole packet's worth of data to be useful. For voice this is going to introduce too much latency, so instead a typical codec might just try to interpolate the lost data. With radio, on the other hand, there is value in in-packet error correction bits within the stream, and in the event of an error you are going to have more data with which to guess what the audio should be like, especially if you know which bits are errored (or possibly errored).
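To make "error correction bits within the stream" concrete, here is a minimal sketch using a Hamming(7,4) code, which adds three parity bits to every four data bits and can repair any single flipped bit in the resulting seven. I picked it only because it is small enough to read in one go; nothing in this thread says Codec2, D-STAR or any other system mentioned here uses it.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits as a 7-bit codeword [p1, p2, d1, p4, d2, d3, d4]."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # covers codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers codeword positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(codeword):
    """Correct up to one flipped bit and return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    # The syndrome, read as a binary number, is the 1-based position of the error.
    error_position = s1 + 2 * s2 + 4 * s4
    if error_position:
        c[error_position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


sent = hamming74_encode([1, 0, 1, 1])
received = list(sent)
received[4] ^= 1                   # one bit error on the radio link
print(hamming74_decode(received))  # [1, 0, 1, 1]
```

On a UDP link the same parity bits buy nothing, because the failure mode is the whole codeword going missing rather than one bit of it being wrong.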
Re:Really early latency figures (Score:4, Informative)
Re:Original Rationale (Score:5, Informative)
The fundamental difference is not so much lossless vs. lossy transmission as the actual bit-rate. I designed Speex with a "sweet spot" around 16 kb/s, whereas David designed Codec2 for a sweet spot around 2.4 kb/s. Speex does have a 2.4 kb/s mode, but the quality isn't even close to what David was able to achieve with Codec2.
Re:Original Rationale (Score:4, Informative)
If you've ever heard AMBE in the presence of bit errors, it doesn't do so well either. It isn't the vocoder's job to deal with bit errors; it is the protocol's job. Over half the bits in an APCO-25 voice frame are forward error correction for the voice payload: Golay encoding, Reed-Solomon, bit order scrambling (interleaving), you name it.
The codec is the wrong place to put resistance to bit errors.
Now, making the codec use fewer bits, so the protocol layer has more bits for FEC, makes sense.
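As an illustration of why the protocol layer bothers with interleaving, here is a sketch of a plain row/column block interleaver. The 4 x 7 dimensions and the function names are made up for the example and are not taken from APCO-25 or any other standard. The point is that a burst of consecutive errors on the air ends up as at most one error per codeword after de-interleaving, which a modest error-correcting code can then repair.

```python
def interleave(bits, rows, cols):
    """Write bits row by row into a rows x cols grid, read them out column by column."""
    if len(bits) != rows * cols:
        raise ValueError("need exactly rows * cols bits")
    return [bits[r * cols + c] for c in range(cols) for r in range(rows)]


def deinterleave(bits, rows, cols):
    """Inverse of interleave(): restore the original row-by-row order."""
    if len(bits) != rows * cols:
        raise ValueError("need exactly rows * cols bits")
    return [bits[c * rows + r] for r in range(rows) for c in range(cols)]


ROWS, COLS = 4, 7               # illustrative: 4 codewords of 7 bits each
stream = [0] * (ROWS * COLS)    # all-zero payload, so any 1 seen later is an error
on_air = interleave(stream, ROWS, COLS)
for i in range(8, 12):          # a burst wipes out 4 consecutive bits
    on_air[i] ^= 1
received = deinterleave(on_air, ROWS, COLS)
errors_per_codeword = [sum(received[r * COLS:(r + 1) * COLS]) for r in range(ROWS)]
print(errors_per_codeword)      # [1, 1, 1, 1]
```

Without the interleaver, the same burst would put all four errors into one or two adjacent codewords.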
Re:Impressive. (Score:4, Informative)
The DSP Innovations [dspini.com] codec manages decent speech quality at 600 bps, god knows how (proprietary closed source). I think this is the state of the art in low-bitrate codecs just now.
Re:Original Rationale (Score:3, Informative)
This works on 51-byte frames.
Re:what about LATENCY? (Score:3, Informative)
If, for some reason, latency is an issue when it's first shoehorned into a DSP chip, Codec2 will be refined until it works well on a DSP chip, in real real time.
I think you are not using the definition of latency that most in the field would use.
Latency is how long it takes to process the data. It's a computer science type of thing. If you understand Knuth and his tape drive sorting examples, this is pretty obvious...
For example, here's a nice, simple, hopelessly useless codec that has almost exactly 100 ms of latency:
1) Get yerself a buffer that holds 1000 samples.
2) Run an A/D converter at 10Ksamples/sec until the buffer is full.
3) Run "gzip" on the 1000 sample buffer, squishing it down to maybe 500 bytes. Optimistically.
4) send the 500 byte chunk to the other side (radio, internet, whatever)
5) Run "gunzip" on the hopefully unerrored compressed 500 bytes, expanding it back to 1000 raw samples.
6) Squirt yonder 1000 raw sample values out the D/A converter at 10Ksamples/sec
7) Pray ye get another packet of compressed voice data before the well runs dry. Or maybe listen to a bit of silence. Or play interpolation games.
Your argument is that once steps 3 and 5 are quick enough, the codec latency will be zero. A fine bit of analysis, however, shows that the first sample to enter the buffer in step 2 cannot possibly be decompressed and played back until the buffer fills, which takes... 100 ms, aka 1/10 of a second. This is the latency we're talking about in voice codecs. Most are somewhat faster than 1/10 of a second.
It is quite possible to make a very efficient codec where a fireman would hit the PTT button, yack into the radio for fifteen minutes, and then the entire message would be compressed down to maybe 1000 bits/sec theoretical average, then sent and played back. This would, of course, be completely useless for tactical public safety comms.
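The toy codec in the numbered steps above can be sketched in a few lines. Assumptions, since the steps don't pin them down: samples are signed 16-bit, and Python's zlib (the same DEFLATE compression gzip uses) stands in for gzip and gunzip. The function names are mine. However fast the compress and decompress calls run, the buffering delay is fixed by the block size and the sample rate.

```python
import struct
import zlib

SAMPLE_RATE_HZ = 10_000  # step 2: 10 Ksamples/sec
BUFFER_SAMPLES = 1000    # step 1: buffer of 1000 samples


def buffering_latency_ms(buffer_samples, sample_rate_hz):
    """Time to fill the buffer before the first sample can even be encoded."""
    return 1000.0 * buffer_samples / sample_rate_hz


def encode_block(samples):
    """Step 3: pack the samples as little-endian int16 and compress them."""
    raw = struct.pack("<%dh" % len(samples), *samples)
    return zlib.compress(raw)


def decode_block(payload):
    """Step 5: decompress and unpack back into a list of samples."""
    raw = zlib.decompress(payload)
    return list(struct.unpack("<%dh" % (len(raw) // 2), raw))


block = [0] * BUFFER_SAMPLES           # a buffer of silence as stand-in audio
assert decode_block(encode_block(block)) == block
print(buffering_latency_ms(BUFFER_SAMPLES, SAMPLE_RATE_HZ))  # 100.0
```

Shrinking the buffer is the only way to bring that number down, which is why voice codecs work on short frames rather than on whole transmissions.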
Its a great start but not usable yet! (Score:3, Informative)