Music Media

Audio Compression Primer 236

Hack Jandy writes "For those of you with a little extra time this afternoon, check out Sudhian's primer to all things concerning audio compression. The article details everything from DRM to CRC matrixes (with a healthy dosage of Ogg)."
This discussion has been archived. No new comments can be posted.

  • by tsanth ( 619234 ) on Thursday January 13, 2005 @03:43PM (#11351616)
    Given the topics in the audio section [sudhian.com] (it has an audio section!), the site seems to lean more towards audiophiles.

    I don't agree with the dismissal of lossy algorithms either, but I think it makes sense given the context.
  • Re:Is FLAC worth it? (Score:2, Informative)

    by jasoncc ( 754385 ) on Thursday January 13, 2005 @03:49PM (#11351681)
    I use FLAC because converting from a lossy format to another lossy format can produce crappy results. If I choose a lossy format for all my audio and then I need the audio to be in some other lossy format, I might be screwed.

    You might choose Ogg for your audio, and then sometime in the future a new lossy format sweeps the industry. Your Ogg files might not convert well to the new format.

    and besides...Disk is Cheap!
  • by demonbug ( 309515 ) on Thursday January 13, 2005 @03:49PM (#11351686) Journal
    Trying to transmit audio data with uncompressed audio or video is not the easiest task. After all, even an audio CD contains data that transmits at 1400kb/s



    Shouldn't that be 1200 kb/s? 150 KB/s * 8 = 1200 kb/s, right? Or is the 150 KB/s figure I'm using incorrect (I could have sworn that was the 1x CD speed)?

  • AAC (Score:4, Informative)

    by sometwo ( 53041 ) on Thursday January 13, 2005 @03:49PM (#11351688)
    So what about AAC used by Apple in their music store?

    I did a little googling and found this (http://www.teamcombooks.com/mp3handbook/13.htm [teamcombooks.com]):
    AAC (Advanced Audio Coding) is not a MPEG layer, although it is based on a psycho-acoustic model. Sometimes referred to as MP4, AAC provides significantly better quality at lower bit-rates than MP3. AAC was developed under MPEG-2 and also exists under MPEG-4.


    AAC supports a wider range of sampling rates (from 8 kHz to 96 kHz) and up to 48 audio channels, plus up to 15 auxiliary low frequency enhancement channels and up to 15 embedded data streams. AAC works at bit rates from 8 kbps for mono speech and up to in excess of 320 kbps for high-quality audio. Three profiles of AAC provide varying levels of complexity and scalability.

    AAC software is much more expensive to license than MP3 because the companies that hold related patents decided to keep a tighter rein on it. Most AAC software is geared towards professional applications and secure music distribution systems, so it may be a while before you see AAC in consumer-oriented products.
  • by parvenu74 ( 310712 ) on Thursday January 13, 2005 @03:50PM (#11351696)
    Because the code is open source, FLAC will be around forever and available on whatever OS/Platform you want to use it on if you feel like compiling the software.

    Another reason it's going to be around and much more prevalent as time goes on is that the compression is so good and the speed/resource usage figures are so attractive. When I rip CDs to FLAC I am limited to 40x by my burner (CPU utilization is around 20-25%). When I rip the same CD to Ogg, I top out under 30x because the processor has reached 100% utilization.

    Fast. Free. Efficient. Frugal with the CPU. What else do you need?
  • by stratjakt ( 596332 ) on Thursday January 13, 2005 @03:53PM (#11351728) Journal
    44100 Hz * 16 bits * 2 channels = 1411200 bits per second, or roughly 1400 kb/s.

    The 150 KB/s number is for CD-ROM data storage. The gap between the two data rates is for the extra error detection and correction.
  • by wfberg ( 24378 ) on Thursday January 13, 2005 @03:54PM (#11351739)
    FM radio is far from CD quality, hence there isn't really a need to use very high bitrate MP3s or whatever

    Or consider this: since FM radio has a limited range of frequencies that come across well, songs that are intended to be widely played on FM radio (e.g. Britney Spears's latest "hit" song) are actually engineered to sound best in those frequencies. The end result is that when you hear Britney Spears on the radio, the track sounds just like it does on the CD.

    Meanwhile, quality music, lovingly mixed onto CD by people who actually give a damn, sounds like crap on the radio..

    In other words; if you can't hear the difference between 128kbps and higher, it might just be that you're listening to mass produced music.

    As for musicians preferring 128kbps? Well, sound engineers usually don't sit on stage with zillion Watt speakers right next to their fragile precious ears for a reason..

    Me, I have crap taste in music AND I'm tonedeaf, so whatever, 128kbps all the way! ;-)

    (MPEG artifacts in video drive me nuts, though)
  • by stratjakt ( 596332 ) on Thursday January 13, 2005 @03:55PM (#11351751) Journal
    Err, that would be error codes and positional information.

    There's even a little more room, in the subcode channels where one can hide the data for CD+G (karaoke) or CD-TEXT.
  • more algorithms (Score:5, Informative)

    by barik ( 160226 ) on Thursday January 13, 2005 @03:59PM (#11351802) Homepage
    While the article is a primer, I was a little disappointed in the algorithmic treatment given in the article itself. Right now I know of two excellent free publications: Introduction to Sound Processing [mondo-estremo.com] and The Sounding Object [mondo-estremo.com], which both treat the theoretical, DSP side of things. Any other resources that Slashdot readers can recommend for those who are interested in the subject of audio compression and representation?
  • by stratjakt ( 596332 ) on Thursday January 13, 2005 @04:04PM (#11351867) Journal
    If it's lossless, you should be able to take digital file A, compress it into compressed file B, and then if you uncompress B to get A', then A' = A.

    That is, the checksums for A and A' should match, etc.

    That's how I define mathematically lossless.

    Whatever this asshat is on about double blind and testing and all that, has more to do with the ability of his FLAC playing equipment to sound the same as his CD player, which is a whole 'nother ball of wax altogether.
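The round-trip definition above (compress A into B, uncompress B into A', check that A' = A) can be sketched in a few lines. This is only an illustration: it uses Python's zlib as a stand-in lossless codec rather than FLAC, and the one-second sine-wave "recording" is made up for the example.

```python
import hashlib
import math
import struct
import zlib

# A: one second of fake 16-bit mono PCM (a 441 Hz sine at 44.1 kHz).
# The signal is invented purely for this illustration.
samples = [int(32767 * math.sin(2 * math.pi * 441 * n / 44100)) for n in range(44100)]
original = struct.pack("<%dh" % len(samples), *samples)

# B: the compressed file. zlib stands in for a lossless audio codec here.
compressed = zlib.compress(original, 9)

# A': the result of uncompressing B.
restored = zlib.decompress(compressed)

# "Mathematically lossless" means the bytes, and therefore the checksums, match.
checksum_a = hashlib.sha256(original).hexdigest()
checksum_a_prime = hashlib.sha256(restored).hexdigest()

print("original bytes:  ", len(original))
print("compressed bytes:", len(compressed))
print("checksums match: ", checksum_a == checksum_a_prime)
```

The same check should apply to any codec that claims to be lossless: decode, then compare the bytes or a checksum against the source.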

  • by Piquan ( 49943 ) on Thursday January 13, 2005 @04:09PM (#11351939)

    Shouldn't that be 1200 kb/s? 150 KB/s * 8 = 1200 kb/s, right? Or is the 150 KB/s figure I'm using incorrect (I could have sworn that was the 1x CD speed)?

    Data CDs are 150 KB/s at 1x, but you're missing an important difference between data and audio CDs.

    CD sectors are 2352 bytes (I'm ignoring subchannels here). Data CDs carry 2048 data bytes per sector, plus 304 bytes of sync, header, and extra error-detection and error-correction data, so every bit comes off perfectly. Audio CDs have no such extra layer, so they use all 2352 bytes for audio data (on the assumption that a few missed bits won't hurt). That means that audio data is moved 14.8% faster (in b/s) than ISO 9660 data: 1200 * 1.148 = 1378.

    Another calculation you can use instead: 44100 samples/sec * 2 channels * 16 bits/sample = 1411200 bits/sec, or about 1378 Kb/s (taking K = 1024).
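Both calculations in this post can be checked with a little arithmetic. The sketch below assumes the standard 1x read speed of 75 sectors per second, which is not stated in the thread, and takes K = 1024 as the 150 KB/s figure implies. The constant names are made up for the example.

```python
SAMPLE_RATE_HZ = 44_100
CHANNELS = 2
BITS_PER_SAMPLE = 16

SECTORS_PER_SECOND = 75          # assumption: standard 1x CD speed
AUDIO_BYTES_PER_SECTOR = 2352    # whole sector used for audio
DATA_BYTES_PER_SECTOR = 2048     # user data left after the extra 304 bytes

# Route 1: straight from the PCM parameters.
audio_bits_per_second = SAMPLE_RATE_HZ * CHANNELS * BITS_PER_SAMPLE

# Route 2: from the sector size and the 1x sector rate.
audio_bits_from_sectors = SECTORS_PER_SECOND * AUDIO_BYTES_PER_SECTOR * 8

# Data CD rate at 1x, in bytes and bits per second.
data_bytes_per_second = SECTORS_PER_SECOND * DATA_BYTES_PER_SECTOR
data_bits_per_second = data_bytes_per_second * 8

print(audio_bits_per_second)            # bits/s of CD audio
print(audio_bits_per_second / 1024)     # the "1378" figure, in units of 1024 bits/s
print(data_bytes_per_second / 1024)     # the "150 KB/s" figure
print(AUDIO_BYTES_PER_SECTOR / DATA_BYTES_PER_SECTOR)  # the "14.8% faster" ratio
```

Both routes arrive at the same audio bit rate, which is why the 1400 kb/s and 1200 kb/s figures are both right for their respective disc types.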

  • by Sebastopol ( 189276 ) on Thursday January 13, 2005 @04:14PM (#11352018) Homepage
    Yes, I noticed the article is 3 PAGES LONG! It makes only passing reference to other codecs. Not much of a primer, and it didn't take the entire afternoon to read; it took 5 minutes.

    Did I miss a crucial link or something?
  • Re:One sad bit.. (Score:1, Informative)

    by Anonymous Coward on Thursday January 13, 2005 @04:16PM (#11352041)
    The Vorbis decoder is finished, and has been for a long time. Like other codecs, tweaks can always be made to the encoder to produce better results by using different psychoacoustic models, etc. As long as the output still follows the spec, the decoder will still decode it just fine. This is why your crappy MP3s from 1997 still play today, and fancy MP3s from today will still play on those old players from 1997. As long as the encoder follows the spec, the decoder will always be able to decode its output properly.
  • by pthisis ( 27352 ) on Thursday January 13, 2005 @04:21PM (#11352119) Homepage Journal
    especially when listening to music on hi-quality speakers a la Bose

    Bose doesn't make high-quality speakers; they make expensive speakers that don't perform nearly as well as the alternatives (for instance, the Acoustimass satellites use crappy paper cones that perform poorly in the upper frequencies). A $300 pair of B&W DM302s will soundly thrash anything Bose makes for sound quality. Also investigate Hale, Thiel, or Paradigm. If you really want to spend thousands, spend it on Magnepan (Magneplanar 1.6Q) or Vandersteen (2ce signature) or the higher-end speakers from the companies I already mentioned. But those DM302s are good enough to be highly rated by places like Stereophile magazine and they're an incredible deal.

    If you really want a bunch of little satellite speakers, Energy makes a much better sounding (and somewhat cheaper) system like that. I hear from people I trust that Tannoy makes an incredible one as well, but I haven't heard it.
  • by cogito ergo blog ( 830437 ) on Thursday January 13, 2005 @04:27PM (#11352182)
    (Mod to -3, nitpicking)

    The MDCT in itself is actually lossless. Any distortion you notice is most likely introduced by the quantization applied post MDCT during compression.
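The point above can be checked numerically. The following is a minimal sketch, not any codec's actual implementation: a direct (slow) MDCT with a sine window and overlap-add, where the block size, test signal, and quantizer step are arbitrary choices made for the illustration. Without quantization the round trip reproduces the input up to floating-point rounding; with coefficients rounded to a coarse step, it does not.

```python
import math

N = 8  # half the block length; each MDCT block spans 2*N samples

# Sine window: satisfies w[n]^2 + w[n+N]^2 == 1 (Princen-Bradley condition).
WINDOW = [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]


def _basis(n, k):
    """MDCT cosine basis value for time index n and frequency index k."""
    return math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))


def mdct(block):
    """Transform 2*N windowed samples into N coefficients."""
    return [sum(block[n] * _basis(n, k) for n in range(2 * N)) for k in range(N)]


def imdct(coeffs):
    """Transform N coefficients back into 2*N time-aliased samples."""
    return [2.0 / N * sum(coeffs[k] * _basis(n, k) for k in range(N))
            for n in range(2 * N)]


def round_trip(signal, step=None):
    """MDCT, optional uniform quantization, IMDCT, then overlap-add.

    len(signal) must be a multiple of N. step=None means no quantization.
    """
    padded = [0.0] * N + list(signal) + [0.0] * N
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * N + 1, N):
        block = [padded[start + n] * WINDOW[n] for n in range(2 * N)]
        coeffs = mdct(block)
        if step is not None:
            # The lossy part: round each coefficient to a multiple of step.
            coeffs = [step * round(c / step) for c in coeffs]
        for n, y in enumerate(imdct(coeffs)):
            out[start + n] += y * WINDOW[n]  # synthesis window + overlap-add
    return out[N:N + len(signal)]


def max_error(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


signal = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(4 * N)]

exact_error = max_error(signal, round_trip(signal))
quantized_error = max_error(signal, round_trip(signal, step=0.5))

print("max error, no quantization:  ", exact_error)
print("max error, with quantization:", quantized_error)
```

Each block on its own contains time-domain aliasing; it cancels only when neighbouring blocks are overlapped and added, which is why the sketch pads the signal by N samples on each side.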
  • by wowbagger ( 69688 ) on Thursday January 13, 2005 @04:36PM (#11352338) Homepage Journal
    According to the "Nyquist Theorem," you need to have twice as many digital samples as the frequency of the analog signal you are trying to represent to have enough data to accurately build it.


    WRONG!

    Nyquist's criterion is "You must sample at a rate of at least twice the BANDWIDTH of the signal in order to correctly reconstruct it."

    You can take a 10.7 MHz signal, and sample it at 10000 samples per second, and correctly reconstruct it, so long as the signal is guaranteed to be bandwidth limited to 10.7 MHz +/- 2.5 kHz. This is often done in software defined radio to acquire the signal from the intermediate frequency (IF) of the analog front end.

    You also have to have an appropriate reconstruction filter at the output of the system in order to correctly recover the signal - if you don't have the right reconstruction filter, you will NOT reconstruct the signal correctly.

    You also have to take into account the effects of any signal modulation - take a 20 kHz sine wave, and burst it for 10 msec, and you widen the bandwidth of the signal by about 100 Hz (depending upon the exact shape of the burst - a perfect square burst will widen the signal as a sinc function and will, in effect, increase the bandwidth to infinity, which is why square bursts are generally Considered Harmful in communications work).

    Also, you don't oversample a signal in time to account for "rounding errors" - you oversample in time because the frequency response of sampling a system in time introduces a sinc response in frequency - by moving the sampling rate up you reduce the impact of this response on the recovered signal's frequency response. You also greatly ease the requirements on the reconstruction filter - the filter can be wider (have fewer poles in the transfer function - thus fewer parts needed).
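To make the 10.7 MHz example concrete, here is a small sketch of where a tone near the IF ends up after sampling at 10000 samples per second, plus the rough bandwidth of a 10 msec burst. It assumes ideal, real-valued sampling; the function name is made up for the illustration.

```python
def alias_frequency(f_hz: int, fs_hz: int) -> int:
    """Apparent frequency (between 0 and fs/2) of a real tone at f_hz
    after ideal sampling at fs_hz samples per second."""
    r = f_hz % fs_hz
    return min(r, fs_hz - r)


FS = 10_000  # samples per second, as in the post

# Tones 1 kHz above and 1 kHz below the 10.7 MHz centre frequency.
above = alias_frequency(10_701_000, FS)
below = alias_frequency(10_699_000, FS)
centre = alias_frequency(10_700_000, FS)
print(above, below, centre)

# Rough spectral widening from gating a tone on for T seconds: about 1/T Hz.
burst_seconds = 0.010
print(1 / burst_seconds)
```

Note that with plain real-valued samples the tones above and below the centre land on the same frequency. One common way around this is quadrature (I/Q) sampling, or a sample rate chosen so the band does not fold onto itself.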
  • by me at werk ( 836328 ) on Thursday January 13, 2005 @05:41PM (#11352776) Homepage Journal
    From Apple - iPod - Technical Specifications [apple.com]:
    • Audio formats supported: AAC (16 to 320 Kbps), MP3 (32 to 320 Kbps), MP3 VBR, Audible, AIFF, Apple Lossless and WAV
    • Upgradable firmware enables support for future audio formats
    The second bullet leaves the possibility open, but the page lists the iPod as not currently supporting it (which is what matters for iPod users now, popularity, etc.).
  • "VBR" 320kbps (Score:3, Informative)

    by silverfuck ( 743326 ) <dan@farmer.gmail@com> on Thursday January 13, 2005 @05:42PM (#11352785) Homepage
    I know that even large radio stations use 128Kbit sampling frequency.

    Sampling frequency would typically be 44.1 kHz; bitrate would be 128 kbps. Also, FM radio quality (with good reception) compares to about 96 kbps well-encoded MP3, so there's not much point in them recording higher except for archival purposes.

    I have switched from 128K to VBR 320K

    You should be using LAME to encode, and LAME only goes up to 320kbps (blade for instance goes up to 384kbps, but is much lower quality), ergo you can only have 320kbps CBR, not VBR.

    And to everybody else out there who complains about background noise, you should be extracting digitally from the CD!

    FLAC doesn't seem to have come far enough yet for me (500+ albums is a lot of disk space if it's around 300 MB/album), but to my ears on my equipment (Klipsch £250 (pound sterling if that doesn't come out) speakers, cheapo SB Audigy2 soundcard), lame --preset standard (around 200 kbps VBR) sounds damn near perceptually transparent.

  • Re:more algorithms (Score:3, Informative)

    by Hal-9001 ( 43188 ) on Thursday January 13, 2005 @06:28PM (#11353242) Homepage Journal
    Any other resources that Slashdot readers can recommend for those who are interested in the subject of audio compression and representation?
    • An older but good technical survey of digital audio compression, including MP3, is Davis Yen Pan, "Digital Audio Compression," Digital Technical Journal (Spring 1993). (PDF [iocon.com])
    • Some other technical reference material on MP3 is also available on the Digital Audio Systems website. [iocon.com]
    • A more recent survey of perceptual coding of audio, which covers more recent formats like AAC, is Painter and Spanias, "Perceptual Coding of Digital Audio," Proc. IEEE (April 2000). (PDF [asu.edu])
    • Ogg Vorbis is documented on the Xiph.org website, but I found the documentation [xiph.org] to be lacking when read from a signal processing perspective. Christopher Montgomery provides a better description from that perspective in a Slashdot interview from 2000. [slashdot.org] I found another good description in this thread [hydrogenaudio.org] in the hydrogenaudio forums--it hyperlinks a good block diagram [port5.com] of the encoding process.
  • by Kiryat Malachi ( 177258 ) on Thursday January 13, 2005 @08:25PM (#11354489) Journal
    And you've just described "beating". Imagine that, instead of that 10 kHz sine at 20 kHz sampling, you have a 9.99 kHz sine at 20 kHz sampling. The point on the waveform that you're sampling is going to slowly change from cycle to cycle, and you're going to wind up with a 9.99 kHz sine wave amplitude modulating - "beating" - at 0.01 kHz.
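The beating described above falls out of the sample values themselves. As a minimal sketch (ideal sampling, no reconstruction filter, variable names invented for the example): the raw samples of a 9.99 kHz sine taken at 20 kHz are exactly a sign-alternating sequence scaled by a slow 10 Hz (0.01 kHz) sine.

```python
import math

FS = 20_000   # sampling rate in Hz
F = 9_990     # tone frequency in Hz, just under FS / 2
BEAT_HZ = FS / 2 - F  # 10 Hz, i.e. 0.01 kHz


def sample(n):
    """n-th raw sample of the 9.99 kHz sine."""
    return math.sin(2 * math.pi * F * n / FS)


def envelope(n):
    """Slow 10 Hz sine that scales the alternating samples."""
    return math.sin(2 * math.pi * BEAT_HZ * n / FS)


# Identity: sin(pi*n - theta) = -(-1)^n * sin(theta), so each sample is the
# envelope with an alternating sign.
worst = max(abs(sample(n) - (-1) ** (n + 1) * envelope(n)) for n in range(4000))
print("largest mismatch over 0.2 s:", worst)

# The sample magnitudes rise and fall slowly even though the tone's
# amplitude is constant.
for n in (0, 250, 500, 750, 1000):
    print(n, round(abs(sample(n)), 3))
```

This slow pattern is a property of the raw sample values; a proper reconstruction filter, as discussed earlier in the thread, is what turns them back into a constant-amplitude 9.99 kHz sine.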
