
Visual Analysis of MP3 Encoders

Chris Johnson writes: "I've just finished an interesting scientific analysis of several mp3 encoders and have my findings up on the Web. The process involves differencing a 'sonogram' image of an encoded test signal against the image of the original signal, and then producing response curves showing the disparity in direct signal volume and over time. Umm . . . which is just to say this is probably the most rigorous analysis of any encoders anywhere on the web, and very geeky (in a good way). LAME carries the day, but BladeEnc shows that it has a completely distinctive sonic approach- and Fraunhofer proves unacceptable (in the version I tested) for audiophile use, though it's unbeatable at very low bit rates. See why." Truth in advertising -- this is a cool example of how visual information can convey more than you'd expect it to.
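The summary describes the method only loosely, so here is a minimal sketch of what spectrogram differencing can look like in practice, assuming numpy and scipy are available and that the MP3 has already been decoded back to a WAV; the file names, FFT size, and the crude length alignment are placeholders, not the author's actual procedure:

    import numpy as np
    from scipy.io import wavfile
    from scipy.signal import spectrogram

    def sonogram(path, nfft=1024):
        """Return frequencies, times, and a dB-scaled spectrogram of a WAV file."""
        rate, data = wavfile.read(path)
        if data.ndim > 1:                      # fold stereo down to mono
            data = data.mean(axis=1)
        f, t, s = spectrogram(data, fs=rate, nperseg=nfft)
        return f, t, 10 * np.log10(s + 1e-12)

    f, t, orig = sonogram("original.wav")       # placeholder file names
    _, _, enc  = sonogram("decoded_mp3.wav")

    # Real encoders delay the signal, so the columns should be aligned first;
    # this simply truncates both to a common length.
    n = min(orig.shape[1], enc.shape[1])
    diff = np.abs(enc[:, :n] - orig[:, :n])     # grayscale "difference sonogram"

    freq_curve = diff.mean(axis=1)              # average deviation per frequency
    time_curve = diff.mean(axis=0)              # average deviation per time slice

A perfect encoder would leave the difference image uniformly black; anything brighter is energy the encoder added or removed.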
  • How does Ogg Vorbis hold up against these?
  • The human ear is tuned to be more receptive at the frequency ranges of the human voice.

    You can lose more detail at high and low frequencies without it having as noticeable an effect on the sound as perceived by the listener.
  • by jmv ( 93421 ) on Saturday October 28, 2000 @09:24AM (#668926) Homepage
    If you are a big fan of classical you will have an opinion on _which_ parts of the sonic information are expendable

    No, when a certain frequency component is discarded, it's not because the listener won't mind, it's because even if it's there, the listener cannot hear it. If you can't hear a sound, why encode it? Now, there are sometimes problems with classical music, but that's because it's often hard to predict exactly what you can and can't hear.
  • Rather than sitting and listening to all the different encoder/decoder combinations, wouldn't you prefer to view some metric that you can evaluate at a glance?

    Besides, something that might show up in the visual depiction may be audible, but not necessarily obvious the first time you listen. It's kinda like when you're at the eye doctor and he's flipping through lenses: "Is this one better? How about this one? Is the first one better than the second one? First one? Second one?"

    You may not notice a visual problem with one of the lenses at first, but then after wearing them all day, you get a headache.
  • Well, You might not hear the difference, but others might...

    Mikael Jacobson
  • by John Whitley ( 6067 ) on Saturday October 28, 2000 @09:26AM (#668929) Homepage
    This sonogram analysis is quaint, but the author fails to grok the basics of psychoacoustic model based audio compression. The first rule is: you cannot measure the perceptual quality of the compressed audio via a raw distortion metric. Subtracting the original signal's sonogram from the compressed signal's sonogram is a distortion metric.

    That said, it is generally the case that "pre-echo is bad" and "over-ring is bad." Reducing these can be thought of as a good thing. Let's assume that for these encoders, pre-echo and over-ring are universally bad (I'll give an example where this isn't the case, below). Furthermore, this comparison actually says nothing about these encoders other than the pre-echo or over-ring. I.e. what happened to the sound that was the "same" on the sonogram? It is quite possible for an "encoder" to mangle the audio quality yet have a pristine sonogram by this test's standards.

    Just to throw a wrench in the works, more advanced encoders and/or psychoacoustic models can utilize what's called temporal masking. This is the ability of a higher-amplitude signal to mask (make inaudible) a lower-amplitude signal either before or after itself, as far as the human ear is concerned. Pre-echo is the phenomenon whereby a transient signal (i.e. a very 'sudden' attack, like a drum hit) is smeared in time. The audible effect can be most obnoxious. Yet encoders utilizing temporal masking will explicitly allow a certain amount of pre-echo through, as long as it is temporally masked. This leaves the encoder to spend those bits on other parts of the signal that would be more seriously degraded as far as our ear is concerned. In short, a sufficiently savvy encoder could exhibit more pre-echo than another worse-sounding encoder, especially if it uses temporal masking.

    Quantitative analysis for perceptual audio coding is not easy; this has been a grail for researchers in the field for years. I strongly suggest that interested parties dig into various IEEE and AES (Audio Engineering Society) journal papers on the subject, as well as various books, etc.
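    A toy illustration of the temporal-masking point above, assuming both signals are mono sample arrays at the same rate; the 20 ms window and the amplitude threshold are invented numbers, not a real psychoacoustic model, and the sketch only shows why a raw error sum and a masking-aware one can rank encoders differently:

        import numpy as np

        def masked_error(original, decoded, rate, window_ms=20.0, thresh=0.3):
            """Sum of error, ignoring error close in time to loud transients."""
            n = min(len(original), len(decoded))
            err = np.abs(decoded[:n].astype(float) - original[:n].astype(float))
            # Treat samples near a loud peak as masked (a crude stand-in).
            loud = np.abs(original[:n]) > thresh * np.abs(original[:n]).max()
            half = int(rate * window_ms / 1000)
            masked = np.convolve(loud.astype(float), np.ones(2 * half + 1), mode="same") > 0
            err[masked] = 0.0                  # masked error "doesn't count"
            return err.sum()

    Under a metric like this, an encoder that parks its pre-echo inside the masked window scores better than one that spreads the same error into audible regions, which is exactly the distinction a plain sonogram difference cannot make.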


  • Another fun experiment is to do this same thing sonically (makes a little more sense) -- encode to mp3, convert back to wave, and then subtract the original from the encoded one. The resulting wave will have all of the bits which were discarded.

    It's difficult to interpret the results (I agree with those who say that this study is more or less worthless) but it does sound pretty neat. =)
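    A rough sketch of that experiment, assuming the MP3 has already been decoded back to a WAV at the same sample rate; the file names are placeholders, the encoder-delay offset is a guess to adjust by ear or by cross-correlation, and strictly speaking the residual is the difference signal rather than literally "the discarded bits":

        import numpy as np
        from scipy.io import wavfile

        rate, orig = wavfile.read("original.wav")             # placeholder names
        _,    dec  = wavfile.read("decoded_from_mp3.wav")

        offset = 576                                           # rough encoder delay; adjust
        n = min(len(orig), len(dec) - offset)
        residual = orig[:n].astype(np.int32) - dec[offset:offset + n].astype(np.int32)
        residual = np.clip(residual, -32768, 32767).astype(np.int16)
        wavfile.write("residual.wav", rate, residual)          # listen to what was lost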
  • mp3 is not popular because it saves hard drive space; it is popular because it saves internet bandwidth... (all those people using Napster through a modem)

  • While agreeing that for high quality audio one must "fuck mp3", I have to disagree with you that it will lose its appeal.

    Right now, the attitude is "Why be able to store several hundred songs, when I can store several thousand..."

    In a couple of years, the numbers will change but the rationale will be the same. Why store ten thousand PCMs when I can have a hundred thousand?

    I agree at some point things will become meaningless, but there will have to be quite a major revolution first... Perhaps that infinite data storage by quantum methods will do it. Perhaps I'm a bit too hesitant to rely overmuch on Moore's law.

    E
  • There's another dimension in audio that will eat up more hard disk space... As hard drives get larger, will the high end audio people still stick with 44 kHz stereo? I think not. As the capabilities of machines to handle much finer sampling rates increase, so will file size. As it is we've been seeing a lot about DVD quality audio, or the Sony system... Plus as things get better/faster/cheaper, I wonder if quadraphonic sound, or something else of that nature, will give file size another doubling.

    Though speed and storage double easily, I've noticed that audio file sizes do too. There comes a point in the future where we're just not sure anymore, but I think at least for the foreseeable future, audio compression will become more important, not less.

    E
  • Wait a minute... aren't "MP3 encoders" supposed to produce .mp3 files? .ogg != .mp3. I don't know about the rest of you, but my portable mp3 player can't read a .ogg file, so what's the point of making them?
  • this is the problem when one looks at MP3 purely as a technology. if you want to boil it down to pure psycho-acoustics, of course selective discard is the ultimate goal.

    personally, i want to see mp3 music come as close to uncompressed music as possible. i want to encode my songs without tinniness or that annoying "swoosh". to me, an effective method of sound compression has no compression artifacts and has an output exactly the same as the input.

    i think people who really listen to music should go for MP3's that SOUND good, not just look good in a white paper.
  • Give me some of what you are smoking, dude!

    MP3 distortions are very evident, especially at 128 kbps (so-called CD quality). They become less evident the higher the bitrate, but even at 320 kbps the distortions are still easily identified compared to the original CD.
  • and gogo took this even further..

    It took LAME's quality and then was optimized for speed...
  • by geirt ( 55254 ) on Saturday October 28, 2000 @10:29AM (#668938)

    The basic idea of mpeg is that the encoder removes the parts of the music which you (probably) can't hear. The encoder splits the sound into pieces, and rates each piece by how important it is to the total sound image. Then it starts with the most important sound and encodes that, continuing with the less important parts until the available bit rate is reached (e.g. 128 kbit/s). The rest of the sound data is discarded.

    The tricky part is the calculation of the "importance" of each sound, and that is what differentiates the encoders. This calculation is done with an algorithm called a "psychoacoustic model".

    To measure the quality of an mpeg encoder automatically, you need an algorithm which calculates the quality of the encoded signal. By knowing this algorithm it is trivial to create an encoder which will score maximum on this quality measurement, since the quality measurement algo is basically the same as the psychoacoustic model.

    This test is "snake oil". A real test of an mpeg encoder unfortunately involves listening to the music to evaluate the psychoacoustic model of the encoder, not comparing two artificially created psychoacoustic models with each other.
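    A toy version of the allocation loop described above, just to make the idea concrete; the importance scores would come from a psychoacoustic model, and the fixed bits-per-band figure is an invented simplification:

        def allocate_bits(bands, importance, budget_bits, bits_per_band=32):
            """Greedily spend the bit budget on the most 'important' bands."""
            order = sorted(range(len(bands)), key=lambda i: importance[i], reverse=True)
            kept = []
            for i in order:
                if budget_bits < bits_per_band:
                    break                      # budget exhausted; the rest is discarded
                kept.append(i)
                budget_bits -= bits_per_band
            return kept                        # indices of the bands that get encoded

    In a real encoder the "importance" is roughly a band's energy relative to its estimated masking threshold, which is where the psychoacoustic model does its work.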

  • Not really being an audiophile, I beg to differ. I got some tracks from the Lola Rennt film through Napster, remembering that I enjoyed the soundtrack as much as the film. They sounded all right on my Aureal Vortex 2 soundcard and the cheapest model Rotel amplifier. Nonetheless, when I bought the CD, the difference was noticeable. And we are talking about 192 kbps MP3s. The clarity of CDs is far superior to MP3s.
  • I'm surprised that an audiophile would even consider listening to music decoded from MP3, even at high bitrates. Presumably an audiophile's ears are sensitive enough to justify purchasing that $$$$ worth of equipment as opposed to your average high-quality stereo set -- so if you're listening to MP3 - what's the point?

    Besides, as computers and networks become faster and storage cheaper and more compact, we're not too far from the point where non-lossy compression will suffice, as far as downloading/storing music is concerned.

    I want my music in .gz format, not .mp3 !

    --lp

  • I'm not sure what you mean by "boil it down to pure psycho-acoustics". Psycho-acoustics is what MP3 is about. Trying to boil MP3 down to anything else is pointless.

    If you want your MP3 music to come out as close to the original in a sonogram as possible, you have not understood what MP3 is about. I think one would like to get an MP3 file that sounds the closest to the original. Visuals be damned.

  • by Evro ( 18923 )
    Somebody actually got the point I was trying to make.

    __________________________________________________ ___

  • *g* Thankee :)

    Giving this sort of thing to Slashdot is as fun as nude mudwrestling. Gotta love it. :)

  • I have no problem with what the author set out to prove. My problem is with his method. Comparing sonograms is not a good way to evaluate a psycho-acoustic model. The only proper way to evaluate one is to listen to it.

    That one of the sonograms seems to be closer to the original visually says nothing about how it will sound.

  • Um, the real reason is more humble than that.

    On the Mac, I would have to _pay_ to use the Xing encoder. I just got through a serious ramen-and-spaghettios period, and there's just no way I'm going to merrily throw money at people who not only support the mp3 licensing patentholders, but also make an encoder that is considered to be more prone to artifacts and ringiness than even the Fraunhofer high bit rate stuff.

    Beat me, whip me, slashdot me and call me unrigorous, but I'm not paying money for Xing. The lurkers support me in email. So there ;)

  • The mp3s I used to have up on mp3.com were Blade. That's because at the time, I hadn't located any other encoders that could be simply downloaded- it might be overlooked that for the most part these are _free_ _Mac_ encoders. When I used Blade, I was happy with the frequency response, but the pre-echo and weakness of transient attack always bothered me. I had certain music ("Koala", off the "anima" album) in which there were sounds (wood block combined with reggae rhythm guitar) that _severely_ failed to be reproduced by Blade in any sort of acceptable fashion- instead of going "klik!" the sound sorta went "whuf".

    I had to know why- no, scratch that, I knew why. I had to know which encoders did better- what they in turn traded off- and I had to know across a wide range of bit rates in a way I could quickly cross correlate.

    I've written for (IMNSHO) the foremost High End Audio journal. It's not that I'm not interested in listening to encoders! But if they are _all_ quite compromised, why not break 'em down into a series of measurements relative against each other with clearly identifiable characteristics? Shows you what to listen for- and tips you off to particular issues.

  • Remember that movie "The Fly", and the sequel "The Fly II [imdb.com]" that starred Bill Gates and Daphne Zuniga from Melrose Place? He developed a teleporter but a fly teleported with him and their bodies and genes got mixed up.

    This is like that. The original ASCII art [slashdot.org] was mixed with antimatter, in the lacking of " ".

    Leaving the disfigured creature you see before you.

  • by Chris Johnson ( 580 ) on Saturday October 28, 2000 @02:37PM (#668948) Homepage Journal
    I'd be hugely interested in that. I consider it very relevant. I'm doing all this on a Mac, and have tried repeatedly to compile Vorbis in any sort of way- one of the Ogg people did this at MacHack and has not made binaries available. If he had, Vorbis would be represented at every bit rate level. I am simply not coder enough to deal with porting Vorbis, even a cheap hack, and I wish I was. I've begged for Vorbis/Mac repeatedly, and finally I had to go on without it, as there were decisions I needed to make on what mp3 encoder to use for my stuff, and the whole project was to answer for me what was most appropriate for 128K-range and what was best at arbitrarily high bit rates.

    You can add me to that list- and such a comparison (I naturally kept a logbook to be able to reproduce the process later) would indeed be meaningful to me. For instance, if Vorbis was more sophisticated in its control of over-ring and either imposed a flatter characteristic (resisting resonant peaks) or went for an intentionally tailored characteristic (say, suppressing ring around 3-5K like Fraunhofer 32K bit rate) this would have obvious and interesting application to the sound quality. Conversely, if it had big ugly peaks and artifacts, their location in the frequency response would tell a lot about the sonic signature of the encoder.

  • I said Quadrophonic or equivalent. Surround sound is pseudo pentaphonic, because you have 5 speakers... How soon till people will want a different audio stream for each one?

    E
  • Sorry :)

    Doh! For years I've used a purely white background for airwindows.com, with a sort of vintage-cnet layout. I also used to keep a 'graphics' section in which I had some web background gifs I'd done. They were made like this:
    x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x
    Do a diffusion dither between white and the lightest 'web safe' gray- then take all the pixels at x positions and knock 'em out to white too. The result (works with other colors as well) is a texture in which no two colored pixels are ever directly next to each other- it's a paperlike texture but never gets darker than half Netscape grey.

    Which is to say- sorry, I did it that way because I liked it, and I'll keep it. Honest, I have done everything I possibly could to avoid obscuring the text, but it's sort of like a trade-off: in getting rid of additional table clutter that I used to have, I found that I liked the pages when this simpler layout was backed by the softest texture I had, rather than plain white.

    I hope it didn't bother your eyes too much :)

  • Those Xs were supposed to look like this:

    x x x x x x x x x x x x x
    x x x x x x x x x x x x x
    x x x x x x x x x x x x x
    x x x x x x x x x x x x x
    x x x x x x x x x x x x x
    x x x x x x x x x x x x x
    x x x x x x x x x x x x x
    x x x x x x x x x x x x x

    Woops. Or I could have said 'checkerboard' and saved myself the hassle :)

    The idea is from a company named Boxtop Software, which produced a Photoshop plugin that put different web safe colors in checkerboard patterns to produce a much greater range of 'web safe' colors (which look solid). I figured, why not run with that and do textures that way? Maybe the Gimp would benefit from some websafe checkerboard texture generators too :)
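    For the curious, a minimal sketch of the checkerboard trick described above, using Pillow; the two colors are example web-safe values and the output name is arbitrary:

        from PIL import Image

        def checkerboard(size=64, a=(255, 255, 255), b=(204, 204, 204)):
            """Alternate two colors so no two 'b' pixels ever touch."""
            img = Image.new("RGB", (size, size), a)
            px = img.load()
            for y in range(size):
                for x in range(size):
                    if (x + y) % 2:            # every other pixel gets the gray
                        px[x, y] = b
            return img

        checkerboard().save("texture.png")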

  • I'd be very curious to know if this can be done- and I seriously question if the resulting encoder would perform 'very well'. If you think this would cause the resulting wave to score perfectly you're... not correct: it's not possible for that to happen with such lossy compression, perceptual model or no.

    Actually, I think this would be a _very_ good experiment. I'm aware that my questioning some of these concepts is seen as prima facie evidence of being a tottering loony *g* but the whole concept of the psycho-acoustic model is so central to current audio theory... and this theory basically says, 'mp3s can be made to sound indistinguishable from CDs' and they cannot- the same theory on a broader level says 'CD itself is theoretically perfect sound', and it is not- mastering engineers, for instance, have learned that to do their work they need something better than CD audio.

    I'm not certain that the psychoacoustic model must necessarily be that much better than, for instance, trying to diffuse unavoidable error as evenly as possible over the frequency and time domains. You are essentially insisting that concentrating the error in particular areas that are said to be 'masked' is far superior. This assumes the masking is effective, and that there are no side effects- neither assumption is wholly true, as large numbers of people are able to find fault with (say) 128K mp3s, and any filtering is going to impose extraneous characteristics. Finally, you're assuming that an encoder that does not have a psychoacoustic model (I assume this would mean one that diffuses error pretty uniformly) is going to perform 'very well' in the procedure I devised. I'm not sure of that- I'd like to try it experimentally before jumping to that conclusion.

    Finally, I have to admit- I haven't got the faintest idea what the resulting sonogram, and frequency/overring characteristics, would look like. I can say some things about it- with regard to the over-ring, diffusing it over a wider frequency range is not only desirable but markedly preferable. Fraunhofer loses badly to LAME, sonically, over just this issue- and Blade gets away with its severe over-ring by diffusing it over a wider frequency range. If the experimental psychoacoustic-model-less encoder showed significant improvements in diffusing out this over-ring and reducing its duration- there would be legitimate applications for its tonal characteristics, even if the raw frequency response was noticeably compromised. It would be sort of like the 'anti-Blade'.

    I don't suppose anyone will actually _try_ it, much less help me out with measuring it :P but if anyone is genuinely interested in investigating this, drop me a line? It sounds like something that could be attempted. Seriously- the whole point of such a model is 'masked stuff can't be heard'. If people can hear the masked error anyhow, what is the point? And if you assume people who can't hear anyhow and won't notice, what's the difference? Is it so axiomatic that you have to shun diffusing error evenly, and instead concentrate it in areas you think won't be heard?

  • I didn't notice at first but you're proposing a mirror image of an exercise I'd dearly like to try, an idea that emerged from yet another slashdot audio argument :)

    You are talking about applying only the psychoacoustic model of the mp3 encoding, and producing a comparison of that with the original signal. I would indeed be really interested in seeing that- I'd like to know which of the various distortions, over-rings etc. arise from the psych model and which arise from the fractal part.

    In the argument (lower in the thread) I was questioning whether you could skip the psych model entirely (pretend people can hear the difference between 128K mp3 and real life ;P ) and see just what you'd get if you went purely with the fractal encoding- trying to diffuse any and all error in the process as evenly as possible over frequency and time.

    People will swear up and down that this will be drastically worse. I'd like to measure it in comparison with normal mp3 encoders and see exactly what it is, not just run around making theories that it's going to be awful. The one thing I'm willing to guess about it is that the sound will be the opposite of BladeEnc's sound. For some people that'll be bad- but the idea of an 'anti-Blade' might really interest others.

    I don't know if anybody's comfortable enough with hacking on a version of LAME or whatever that they'd be willing to try it- I am going to bounce the idea off Martin Hairer, with whom I worked to perfect the sonogram-plotting program (I needed to request better picture export capacities- he came through like a trouper and fixed everything). I think he is the one who ported LAME to his program, and he might be both able to try such experiments, and interested in seeing what they do.

    At any rate I wanted to say that your idea of isolating the transformations and considering them independently _is_ truly an interesting exercise- and I hope to be able to do such experiments, and learn from them, with a bit of work and patience :)

  • The point is it doesn't have the copyright problems that mp3 does.

    By that line of reasoning, we musta all been fucking morons for making mp3s several years ago, since our walkmen couldn't play them.

    ---
    I'm not ashamed. It's the computer age, nerds are in.
    They're still in, aren't they?
  • Actually, I'm pretty sure that the poster forgot entirely about lossless compression.
  • True dat. I'm a skimming fool. Oh well, I still wasn't too impressed with his analysis. Perhaps I was turned off by the large amount of text against a grey static background to read all of it. Note to self: it's never a good idea to skim an article linked on Slashdot, then post an opinion about it. :)
  • I wasn't trying to make a weak argument. When I go to encode something I am using a piece of music that already sounds good as the source, so I know the quality is of a certain level. If I were to encode it and then check how closely it was encoded using a visual process, I would know exactly where the sound quality deviated from the original non-encoded piece of music. That is what this person did. Hearing is not always the best judge. To create interference patterns and such and see tiny defects as compared to the original source, I find that a lot more precise. I don't care how much someone says they can hear a difference, I doubt that they would (albeit at lower bitrates like 128, I can see a point). Even most DVD audio is encoded at 192 and I can't hear a difference, even though a difference is there.

    Even the samurai
    have teddy bears,
    and even the teddy bears

  • I can hear the difference between different encoders and different decoders. I consider myself a moderate audiophile.

    As a test, try encoding the same song using two different encoders (making sure to use the same bitrate). Using the same decoder, see if you can tell the difference. You can also try downloading the MP3s from the site referenced. A quick listen to them (at the same bit rate) should show an audible difference.

    The only other difference might be in speaker set-up. A crappy computer speaker might not be able to really show the difference between two MP3s.

    I use a Blade-based encoder on my Mac with Cambridge Soundworks Digitals.
  • by xenoweeno ( 246136 ) on Saturday October 28, 2000 @08:46AM (#668959)
    Spectral and waveform analysis and such has all been done before, and LAME has been known to be superior for quite some time. I've been singing the praises of this site [r3mix.net] for at least six months.
  • First off, the *only* way to evaluate the quality of a perceptual encoder is to listen to it, period. Who cares what is rejected (non encoded) if you don't hear it.

    I'll agree that perception is what matters. However, what sounds great on my $48 Labtec speakers at work sounds like crap on my $500 studio headphones at home. The fact of the matter is, most people don't have $25,000 of audio equipment [belgacom.net] nor sufficiently trained ears to tell the difference. I'll readily use LAME encoded stuff from people I trust, but cringe in horror when I listen to the damage that Xing's encoder does to the quality of complex music.

    Think of it this way: most people are arguing which color of crap tastes better. Sites like this one [belgacom.net] and the one in the article are trying to point out that you don't have to eat crap.

    hymie

  • You are talking about applying only the psychoacoustic model of the mp3 encoding, and producing a comparison of that with the original signal. I would indeed be really interested in seeing that- I'd like to know which of the various distortions, over-rings etc. arise from the psych model and which arise from the fractal part.

    yes that's exactly it, i think it would be an interesting exercise, as i don't recall seeing any study of that as of yet. i'm sure much has been done to develop psychoacoustics in the first instance, but as that was way before mp3 actually came about, this info won't be readily available from mp3 sites (though thanks to the anonymous coward's url elsewhere in this thread!)

    i think removing the psychoacoustics and simply applying the fractal transform on its own would result in a lower perceived quality-per-bitrate ratio, not much else. but it's interesting also.

    to do any of these experiments, we'd need access to the source code for an mp3 encoder - are any of these available? LAME for instance? i'm sure fraunhofer's is available from less reputable websites ;)

    fross
  • Kexis is a GPL'd lossless encoder which has proved to be _almost_ as good as shorten for filesize, is _much_ faster to decode and encode than any encoder I have ever used... The fact that the kexis file format may change in the future is largely a petty issue as you can simply losslessly convert from the old format to the new one. Have a look at it at http://kexis.sourceforge.net
  • I am tired of seeing people making MP3 tests that compare the initial signal and the resulting encoded one in terms of how similar they are.
    This is useless. MP3 is perceptual coding, and the only way, for now, to decide what is better is to listen and decide. If you can't hear it, why do you need to encode it? That's the idea of MP3.
    Don't try to see if the encoded signal looks the same as the original in terms of spectral content, try to see if it sounds the same!
  • Blade became popular because it was the first program to be banned by Fraunhofer. In fact, Blade is really a copy of the ISO reference code, optimized for speed. Lame incorporated massive quality improvements, but came too late to catch the wave of publicity offered to Blade. It would be nice to have access to the code which generated these sonograms.
  • r3mix.net is really the definitive site for this sort of thing. Not only does the site show waveform deviation, but the tester actually listens to lots of very diverse music to test for quality. The waveforms are used mainly to explain errors heard during listening (ie. what the hell is that fuzzy warp sound overriding the bassline?). So anyways, read up at r3mix.net -- you'll realize people have already done this much better.
  • by Anonymous Coward on Saturday October 28, 2000 @09:44AM (#668966)
    Audiophiles are interested in the most accurate reproduction of sound...

    Absolutely. CD quality (44.1 kHz 16 bit PCM) is total CRAP to true audiophiles. I won't be satisfied until they invent a format that will store the timing and strength of every single air molecule hitting my eardrum, precise to within the Heisenberg uncertainty principle. Uncompressed.

  • Ever hear of shn files? People use them for trading bootlegs because it cuts the file size 50% and produces no loss.
  • by Malor ( 3658 )
    You need to learn to pull the interesting facts out of the crap. I used that same article to discover the existence of www.r3mix.net, where I learned how to encode mp3s well enough that I can barely, barely tell the difference with good quality ($400) headphones. (with a few of my CDs there are 3D-ish effects that are lost in MP3 encoding, for example.)

    Ask /. articles are often a great way to get info but you have to be willing to do some reading and thinking for yourself. Often the best articles are the shortest ones -- they are just links to outside sources.

    This article is way inferior to www.r3mix.net [r3mix.net]. You should go back to that old Ask /. article and figure out why you didn't pick up on that web site. The fact that you didn't come away with an answer from the first article was entirely your own responsibility. All the info you needed was there.

  • some of my vinyl is way better

    Vinyl sounds "warmer" because...

    ...the scratches and pops remind the listener's subconscious mind of a fireplace. [Read More... [everything2.com]]

  • I disagree. To extend your metaphor, this is like your ophthalmologist playing a song on different stereo systems and prescribing your diopter based on your reactions to the sound. What is being measured here is an absolute difference in the sound, but the value in lossy compression (both in audio and visual realms, and others?!?) is that you can lose data size without losing the important data.

    This test is valueless, as it does not take the human ear into account. The quality of the compression is completely a subjective thing; it will always be so. There will never, ever, be a worthwhile mathematical test for lossy compression.
  • I've been using GoGo, which is another Japanese implementation of Lame, this one with MMX acceleration. It sounds fine at 128k to me, better than Xing, which everyone agrees is crap. Fast crap, but crap.
  • I know that... I don't encode Classical (anything) with AudioCatalyst. Nonetheless, if you are going to include LAME and BladeEnc, which suck ****, too, you might as well throw in Xing, no?


    ---------
  • You can't compress data and also have the output be the same as the input. Think about it, there are 256^(# of bytes in the sample) possible inputs, and since every encoder output can only decompress to one possible input, the only way to get 256^(# of bytes in the sample) possible decompressed results is to have 256^(# of bytes in sample) compressed outputs-- i.e. 1:1 compression ratio.

    That's trivially proven to be incorrect, since gzip and bzip2 compress data and yet have the outputs be the same as the inputs. In an audio context, ten minutes of a pure frequency tone can easily be compressed to a small size. The only information you really need to keep is the length of the tone and the frequency.
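    A quick way to check the pure-tone claim with a general-purpose compressor, using only numpy and the standard library; the tone and noise lengths are arbitrary:

        import zlib
        import numpy as np

        rate, secs = 44100, 10
        t = np.arange(rate * secs) / rate
        tone  = (32767 * np.sin(2 * np.pi * 440 * t)).astype(np.int16)
        noise = np.random.randint(-32768, 32768, rate * secs).astype(np.int16)

        print(len(zlib.compress(tone.tobytes())))    # shrinks dramatically
        print(len(zlib.compress(noise.tobytes())))   # barely shrinks at all

    (A dedicated lossless audio coder would do even better on the tone by predicting each sample from the previous ones; zlib only sees repeating bytes.)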

  • Distortion you can't hear will affect you. It will cause you to feel tired and stressed. This is one reason that people who have to spend all day listening hard to audio (audio engineers) choose reproduction equipment that introduces the least distortion (or introduces distortion in the least displeasing way).

  • I would also be very interested in seeing similar graphs (preferably from the same source) made with Vorbis encoders, to see how they stack up.
  • by Fross ( 83754 ) on Saturday October 28, 2000 @11:05AM (#668976)
    first off, i must say this is a very interesting article, and an original and potentially useful analysis for comparison both between mp3 formats and, to some extent, between mp3 and other audio encoding formats. however, the correlations between visual distortion and loss of audio quality are *NOT* valid or accurate, something the article doesn't place enough emphasis on. :)

    the key point here is that mp3 encoding is in fact a process of two separate transformations (both of which consist of many processes, of course), the first of these is my bone of contention as it seems less well-known than the second, which i will address first.

    the "second transformation" is the one familiar to most people, the iterative fractal encoding procedure, which simply adds information to that audio frame until it either a) hits a "quality threshold" (i.e. is considered good enough), or b) fills up its bitrate allocation. it's similar in many ways to making a "jpeg of sound". you can get a good view of this whole process by following this link [iis.fhg.de] to a graphic of the aac encoding process on fraunhofer's website. It is the stuff inside the box at the lower left that this concerns.

    however the first transformation here is the important one, this is the stuff outside and above the box in the graphic linked above. (i am not sure the graphic is detailed enough, there may be some missing, from what i remember) - this is a series of transformations to limit the amount of data the second transformation has to deal with (and hence get essentially better encoding for the same bitrate), according to the way the human ear works. our ears have "features" like having a dead area in frequencies near loud noises, which means these bits can be cut out, and other bits and pieces that i can't remember and don't have to hand ;) this is of course psychoacoustics, as other people have commented. there is a _very_ basic primer on this at the fraunhofer site here [iis.fhg.de], but it doesn't go into any technical detail.

    as an aside, there used to be some fantastic and informative articles on these subjects at mp3.org back in the day (1997-1998?), may it rest in peace. does anyone have some links for where something as good on this subject is? i haven't been as in touch with the technical side of mpeg encoding as i used to be...

    but anyway back on subject, this first transformation actually distorts the signal *significantly*, but only in a way that makes it easier to process, while still sounding the same (or close) to the human ear. it may be an interesting exercise to isolate this first transformation, apply it and then save without any fractal encoding, and compare that to the original signal. this transformation will cause great "visual degradation", as shown in the article, but imho this is not an accurate criterion for measuring audio quality. still interesting, and a good read, though :)

    fross

  • He's measuring the MP3 encoders, and Ogg Vorbis is not an MP3 encoder

    Wouldn't it be interesting to make one of these tests comparing many different encoding techniques (MP3, Ogg Vorbis, VQF...)? I once saw one that made a comparison between MP3 and VQF (I think it was posted to Slashdot, maybe) and it was pretty interesting.

    I tried the Ogg Vorbis encoder the other day for the sake of trying, encoding a small song (Black Sabbath's Paranoid) with both BladeEnc and the Ogg Vorbis encoder... and I can say that the high frequency response for Ogg Vorbis was much, *much* better. The MP3 sounded noticeably different from the CD, while with the Ogg Vorbis file such a difference was not so trivial to hear. (Ok, I know that it is a well known fact that MP3 sucks at higher frequencies, but, it was an example.)

    Anyway, a deep comparison showing the pros and cons of each encoding technique would be very interesting. This won't change the fact that it will be very very difficult to convince people that there may be better alternatives to MP3, but...


    --
    Marcelo Vanzin
  • Yeah, just like with other things. Take monitors for example. You may not be able to tell (or appreciate) the difference between 70 and 100 hz refresh after a couple of minutes of casual computer work, but after a few straight hours staring into the thing you sure will!

    People whose jobs rely on sound perception or who listen to a lot of sound (audiophiles maybe, hmm?) know that regular mp3s you download from Napster (128kb/s) are of average sound quality. A good encoder and a higher bitrate (160+) will do wonders.

  • This [r3mix.net] site is the whole reason why I started using LAME...
  • I consider it _very_ interesting to know at what frequencies the pre-echo and over-ring are occurring. As you know, the sonic results of this type of distortion are greatly different depending on what frequency they're at. It's going to be a hell of a lot harder to hide a pre-echo or ring at 3K than one at 200hz- or 12K.

    That said- the sonograms are greyscale plots of deviation from the original signal. They are inevitably offset in time by the encoding process- I aligned them using those ugly transients in the center. There are two little charts under each. The second is the pre-echo and over-ring. The _first_ is precisely the opposite- deviation from the sound that was the 'same', with the weighting of the little chart (a RELATIVE measurement) emphasising the content of the wave rather than areas that are supposed to be free of additional frequency content.

    I don't think it's possible for an encoder to mangle the audio quality and have a pristine 'sonogram' as differenced with the source material. A pristine sonogram would be uniformly BLACK when this was done- none of the encoders remotely approached this. Any mangling, no matter what sort, will show up as a lighter-than-black area on the differenced image. I'm very much a high end audio dweeb at heart, but I don't believe there can be mangled audio quality without the Fourier content changing, and thus the sonogram showing big gray or white blobs.

    I wholeheartedly agree that quantitative analysis of perceptual audio coding is not easy! :)
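    A sketch of splitting a difference image into those two kinds of curves, given two dB-scaled spectrograms of equal shape (e.g. from scipy.signal.spectrogram); the -60 dB "quiet" threshold is an arbitrary choice, not the value used on the page:

        import numpy as np

        def split_error(orig_db, enc_db, quiet_db=-60.0):
            """Separate error inside the signal from spurious energy (pre-echo/over-ring)."""
            diff = np.abs(enc_db - orig_db)
            signal_bins = orig_db > quiet_db               # where the source had content
            in_signal = np.where(signal_bins, diff, 0.0).mean(axis=1)
            spurious  = np.where(~signal_bins, diff, 0.0).mean(axis=1)
            return in_signal, spurious                     # two relative curves over frequency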

  • What about your amp and speakers?
  • If you would like to write an encoder that actually encodes audio, and specifically trades everything off to perform terrific on 'EncoderHell' (the test tone noise), please do so! It might have interesting results when used with regular music. You cannot make it perfectly reproduce the sound without effectively hardcoding the exact waveforms for that sound into the encoder, because of the elements that use random bandlimited noise at up to 22K- there aren't enough bits to literally store that information in an mp3.

    The ideal result from the process (totally unaltered waveform information) would be an entirely _black_ 'sonogram' at the end of the process. That's not going to happen. Since there are going to be deviations, it's down to the psychoacoustic model- and the pictures and charts are going to show what the encoder chose to throw away, on a larger scale.

    You can argue that the encoder throws away stuff that can't be heard, therefore measuring _that_ is meaningless. This equates to arguing that the result is indistinguishable from the source audio. I disagree, and feel that all mp3s are audibly degraded from the source audio- which is itself degraded, being typically 16 bit 44.1K digital audio :)

    I'm trying to measure what the encoder's failing to do. The project was meant to answer my own questions, and has done so.

  • Not so much 'better'. DIFFERENT. I think it's plain that Blade makes very different choices from Frau or LAME in discarding information. The results I got would suggest that Blade is only good for classical music but excels at that, that Frau is 'mid fi' sonic spectacular, that LAME strikes a balance between sonic spectacular and being driven into artifacts and coloration. Sure enough, I'm seeing people citing orchestra conductors who would only accept Blade, people getting Really Agitated (Fraunhofer fans?) and people saying 'yeah, I already knew LAME was best so your page has no point' ;)

    Personally, I'm with LAME for my sonic requirements, although the only mp3s of my music out there (so far) are Blade, done many months ago before I did this research. But the point is not that there is a 'winner'- the point is that the differing sonic characteristics of these encoders CAN BE QUANTIFIED. Perhaps not measured outright (my charts etc. are _relative_ to each other), but these encoders take significantly different approaches to discarding information, and that applies directly to your choice of encoder for recording music, and translates to a completely predictable sonic characteristic of the encoder on ANY music, no matter what.

    I put all sorts of music through Blade when I was on mp3.com with only Blade for a free encoder- no matter what I did, the result was always identifiably BladeEnc, with the smooth extended frequency response and absolutely terrible transient impact. For some pieces, this was suitable- for some it was grossly unsuitable. But the sonic characteristics were consistent- and correlate with what I learned about the encoder in this 'torture test'.

  • My old p120 plays most mp3s smoothly, even some discmans do...
    Why don't we shift to a more aggressive compression method for today's systems then?

    Why not grab the wav files and use the high-end "100x" compression methods "you read about, but never see irl"?
    (Those articles always claim: "current systems are too slow for this (fractal method)", but they never mention anything usable.)

    Current top-of-the-bill machines should be capable of playing the raw cd-grabs realtime from a highly compressed file, in the process possibly getting near-DOS system loads (who cares), without losing any detail of the original track and making mp3 sound like the inbred godzilla version of this pure little salamander.
  • Not true... read the front page. He describes pre-echo and overring, both in audiophile terms.
  • by Anonymous Coward
    WTF?

    LAME and BladeEnc produce terrible sound quality (or you have to use ridiculously high bitrates; no wonder Napster's full of huge 160 kbps files) and Fraunhofer is the only one that I'd call even adequate at 128 kbps. It's the sound not some friggin' visual graph of the music that's ultimately important.

    Ogg Vorbis [vorbis.com], on the other hand, is superb. It's not only free of patents but also GPL'd!

  • Ogg Vorbis?
  • Oops, I am using Lame 3.88 Alpha 1
  • Mark Neidengard (or Niedengard, maybe?) from Caltech (was an undergrad... he's now at Cornell) has an analysis on his page... it seems to jump around, but it's worth a look.
    Anyhow, good thought nonetheless.
  • 128kbps mp3s often sound like crap anyway, though, especially when used to encode classical music. A much better comparison would have been with the output of a CD player.

    Even then, you'd need to ensure that the rest of the audio reproduction path was the same: a CD played on crappy speakers will almost always sound worse than a high-quality analog setup with top-notch speakers.

    Finally, keep in mind that these kinds of do-it-yourself experiments are notoriously lax at controlling for confirmation bias [skepdic.com]. This is particularly troublesome when your goal is to measure something as subjective as audio perception.
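    The simplest fix is to blind the comparison. A bare-bones sketch, assuming a command-line audio player is on the PATH (the "afplay" default is just a placeholder):

        import random
        import subprocess

        def blind_trial(file_a, file_b, player="afplay"):
            pair = [("A", file_a), ("B", file_b)]
            random.shuffle(pair)                           # hide which clip is which
            for ordinal, (_, path) in zip(("first", "second"), pair):
                input(f"Press Enter to hear the {ordinal} clip...")
                subprocess.run([player, path])
            guess = input("Which sounded better, first or second? ")
            print("first was", pair[0][0], "/ second was", pair[1][0], "/ you chose", guess)

    Repeat enough trials and you can tell a real preference from coin-flipping.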

  • Hey, check this [arstechnica.com] out, courtesy of Ars [arstechinca.com]. It presents an alternate viewpoint using different means. I remembered reading this not too long ago. Interesting read. . . and to add a spoiler, it definitely recommends the Fraunhofer over LAME and BladeEnc.

    -s

  • It turns out the lossless compression of audio has a practical limit of about 3:1 across a typical sample of 44100Hz 16-bit 2-channel music sources using the best currently known methods. The research effort on lossless audio compression has really lost a lot of momentum with the arrival of good-quality psycho-acoustic techniques offering 12:1 compression. Lossless compression isn't impossible, it just isn't a big enough win.
  • That's nice and all, but if you can't hear a difference, what does it matter? I, too, would prefer to use the "best" one, but if I didn't know which was the best until this test, what do I really care?

    __________________________________________________ ___

  • The reason I'm interested in seeing what the no-psychoacoustics version would do is that, apparently, the psychoacoustic transform is a dynamic but extremely elaborate equalisation curve. There are definite consequences to doing such an elaborate correction- and I think Fraunhofer has illustrated some of them, in particular by pushing for an _extremely_ sharp cutoff at the top end of the frequency range- which results in ringing.

    A lot of the pre-echo that's showing up as resonant peaks could be attributed to this type of equalisation. If that is the case, applying the fractal transform alone would result in noticeably coarser frequency information but, at the same time, a much cleaner time domain with pre-echo and over-ring much more diffuse and harder to hear. It might be perceived as extremely dynamic but somewhat colored sound with a great deal of openness and energy but compromised tonality. It might be terrific for certain types of electronic music, drum machines and such things.

    I'm working on being able to try this experiment. If I can do this, I can also experiment with different types of filtering (realistically, I'd be working with a programmer who would know how to do this but might not have thought to try some of the things I'll suggest).

    If anybody tries this sort of thing, don't test it on stuff that would obviously suck! It would be pointless trying it on classical, or easy listening. On the other hand, gabba house music and really harsh techno, or brutally distorted heavy metal... I know I've got stuff that I'd like to have encoded in a way that pushes impact at all costs and brings out the rawness of the sound at the expense of the detail and clarity. That should be possible, don't know if axing the psychoacoustic model would do it- depends on how much it compromises the original sound. Not all filters produce such obvious artifacts- just the ones with really sharp slopes such as the top lowpass filter in Fraunhofer.
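    The ringing claim is easy to see with a toy experiment: a brick-wall lowpass applied to a single click smears energy out both before and after it. A minimal sketch, with the 16 kHz cutoff chosen arbitrarily:

        import numpy as np

        rate = 44100
        click = np.zeros(4096)
        click[2048] = 1.0                                  # an idealized transient

        spectrum = np.fft.rfft(click)
        freqs = np.fft.rfftfreq(len(click), d=1.0 / rate)
        spectrum[freqs > 16000] = 0.0                      # brick-wall cutoff
        filtered = np.fft.irfft(spectrum, n=len(click))

        # Energy appearing well before the click is "pre-ring"; it was exactly
        # zero there before filtering.
        print("peak pre-ring amplitude:", np.abs(filtered[:2000]).max())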

  • I have been wondering about this kind of thing for a long time. I have used Lame as of late because it is very fast with the optimized compile I have. I wish it were as fast on VBR, but I guess I'll have to settle for CBR.

    I'd really like to see something like this with Ogg Vorbis once it matures. Or now even, because it seems to be a bit better already, though it's hard to tell on my laptop speakers.
  • The fact that the kexis file format may change in the future is largely a petty issue as you
    can simply losslessly convert from the old format to the new one.


    Yeek; that's fine until you have several gigabytes to convert each time the format changes.

    One of the best things the Minidisc inventors and the MP3 inventors did was to keep the decoding algorithm static, while allowing the encoding algorithm to improve as technology improved.
    --
  • I'm not sure how good such an algorithm would be, but there are trade-offs.

    You've got to be a bit careful doing a time-based FFT on your audio, because that means you've got to decode the entire file before you can play any meaningful segment. You can't stream it through: you've got to pay all your computing expenses up front instead of spreading them out through the entire wave file.

    This might be okay for a scheme that doesn't take too much computing power, but if you want to incorporate all the splefty psychoacoustic models and the other stuff that's been flying around this discussion, a piece of software using such a scheme is going to take a hit.

    Are you willing to wait for five seconds after selecting a song to hear it play? How about thirty seconds or a minute?

    -BMagneton
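    To make the contrast concrete: block-based coding can hand each frame to the player as it is produced, which is what lets MP3 stream. A minimal sketch, using an FFT as a stand-in for the real transform and MP3's 1152-sample frame size:

        import numpy as np

        def stream_frames(samples, frame=1152):            # 1152 samples per MP3 frame
            for start in range(0, len(samples) - frame + 1, frame):
                yield np.fft.rfft(samples[start:start + frame])

        # A player can consume coefficients one frame at a time:
        # for coeffs in stream_frames(audio): play(decode(coeffs))

    A whole-file transform, by contrast, needs every sample before any output exists.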

  • I've been using the VBR Lame Encoder 3.99 Alpha for a couple of weeks and I love it. It's fast, and it sounds great. I was using BladeEnc for a while. I have found that Lame sounds better, and using VBR will result in a smaller file than Blade and still sound better.
  • by Deluge ( 94014 )
    I'm so grateful that someone had gone out and done something of this nature. I had, in the past, tried to make a decision on what encoder to use to encode my CD collection, and one of the 1st places I looked was the Ask /. story about what the best encoder was.

    Ugh... Everybody thought that whatever they were using was the best thing under the sun, no research supporting their claims, and in the several hundred comments, not even a hint of some general consensus.

    Finally, something that'll allow me to choose based on fact, something that'll allow me to make an *informed* decision. Thank you.

    ---


  • I'm glad to see I've been using the right encoder (BladeEnc) for all my classical music. I can't remember why I started using it (I don't think I've used anything else), but now I see that it beats the others as far as tonality goes. Classical music is all about the right pitches (I even have perfect pitch), so perhaps that's why all those bad sounding classical mp3's off napster sound so bad (or they were ripped off records...).

  • Ah, but nowhere does this article try to disprove that, does it? The whole point is that certain codecs do a better, more intelligent job of discarding information, and that is what the author set out to prove.
    ------------------------------------------ ---------

  • by joey ( 315 ) <joey@kitenet.net> on Saturday October 28, 2000 @09:59AM (#669009) Homepage
    He's comparing the output of the encoders, once decoded. If he had a vorbis decoder that allowed him to get the information he needs, of course he could do a meaningful comparison. And it's the comparison I and probably many of us are most interested in.
    --
  • Not that I particularly care, but this seems to be a shallow argument. When you're searching the skies, you're trying to FIND something; ignorance is NOT bliss in this case. When you're listening to music, all that matters is what you can hear. Now maybe there is a more scientific method to determine what you can hear, such that you can detect perceptible problems before you run into them, but other than that, who really cares?
  • by Rei ( 128717 ) on Saturday October 28, 2000 @11:56AM (#669015) Homepage
    The author is on drugs, is all I have to say. :)

    I'm taking a course currently on audio and image compression, and his article annoys me greatly. He uses ambiguous terminology and often the wrong terminology (for example, calling things "wavelets" that aren't actually wavelets). He describes things which can't be seen clearly in the graphs and would much better be viewed with a different display format. Etc.

    I'm still wondering if some of my compression ideas will work... I plan to test them out before too long: grouping some of the generally weak high-frequency signals together since the human ear is less sensitive to high frequency pitch variation (we're sensitive to frequency on a logarithmic scale - an octave is a doubling of frequency); and, instead of doing block transforms on the music, generate a 2D image of the signal (graph: frequency vs. time), compress the frequency axis as you normally would, and instead of saving the time axis as a series of blocks of discrete frequencies, actually compress it greatly with an FFT - doing this, you should be able to save space on recurring themes in songs (such as a chorus, a regular beat, etc). Voice may introduce complications, though, and I may end up having to do some kind of combination between the two (such as compressing the difference between the original and final signal as a low quality block transform and saving it with the compressed signal). Two ideas of mine I plan to test when this incredible work load from my senior year stops bearing down on me ;)

    - Rei
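    A sketch of the octave-band grouping idea from the first suggestion, assuming a magnitude spectrum of length n_fft//2 + 1 from np.fft.rfft; the 62.5 Hz starting edge is an arbitrary example:

        import numpy as np

        def octave_band_energies(spectrum_mag, rate, n_fft, f0=62.5):
            """Pool FFT bins into bands that double in width each octave."""
            freqs = np.fft.rfftfreq(n_fft, d=1.0 / rate)
            edges = [f0]
            while edges[-1] * 2 < rate / 2:
                edges.append(edges[-1] * 2)                # one octave per band
            edges.append(rate / 2)
            return [spectrum_mag[(freqs >= lo) & (freqs < hi)].sum()
                    for lo, hi in zip(edges[:-1], edges[1:])]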

  • Audio quality for compression codecs cannot be measured in terms of visual graphs or synthetic benchmarks. (I.e. just comparing the difference between the original signal and the compressed signal does not work.)
    It is quite possible to have a signal that very much resembles the original wave graph, and yet sounds horrible to the ear. It is also equally possible to have a signal whose graph doesn't resemble the original very much, and yet has a much higher 'perceived' quality.

    Just remember: The first rule in every single BEGINNERS guide to sound is to "Trust your Ears," and that is the only way to tell a good codec from a bad one.
    -----
  • Very cool! Quite a lot of people have been saying (in inimitable slashdot fashion ;) ) "j00 sux0rs! r3mix did this before you and is better!" *g*

    In fact I think I have seen this before and r3mix actually affected my approach to my encoder analysis. Definite kudos to r3mix, and I entirely agree with many of this site's decisions and approaches- interestingly they reach precisely the same conclusion as I did, that LAME 256 was the ideal archival encoder and LAME VBR was the best one for smaller file sizes- except that r3mix has added the recommendation that joint stereo be used in the latter case! (this would really hurt the relative comparison with higher bit rate stereo encoders with my mono test signal, but I think I will take the advice and try that for my own mp3s...)

    r3mix also chooses to use _relative_ graphs rather than attempting to give absolute measurements, something I heartily approve of.

    Now, here's the thing- r3mix's results are sometimes a subset and sometimes comparable to mine, just depicted in a different way. The primary measurement of a frequency sweep produces different-colored graphs- if you take the horizontal axis and express the vertical deviation of each graph, from an ideal line of flat reproduction at the top, as a brightness value of a single pixel, you'd get something akin to a single line on one of my 'sonograms'. The test with the 'applaud' signal is an example too- if you subtracted the source from the results you'd end up with distortion levels very similar to my differenced sonograms.

    More interesting to me is the fact that my sonograms show an _intermediate_ step- several r3mix tests are the averaged responses of an encoder over time. That is exactly what my 'charts' are- they are sums of all the deviation and distortion over the entire length of a sonogram, over a range of frequencies.

    I'm almost certain I'd seen r3mix before doing my own analyses- I think it's very likely that this site significantly helped me define the processes I used for my own stuff. I heartily recommend checking it out- this is good work, I totally endorse it, in fact I'm going to put a link to it on my own encoder page right now :) *put* there!

  • What you said.

    I think the guy's hearing with his eyes, or using a totally different set of music than what I listen to.

    If you wanna hear how dog-fuckingly-shit Blade is, encode the first 10 seconds of New Order's "Blue Monday" (a basic drum machine emitting a sound common to much new-wave, dance, and industrial from around 1980 to the present day) at 128/160/192 using Blade, Fraun, and LAME.

    Blade will be unlistenable at 128, shit at 160, and you may hear artifacts at 192 if you know what to listen for. LAME and Fraun sound sweet, even at 128.

    Similar results can be achieved with a heavy guitar track, e.g. Def Leppard or other 80's "hair metal" bands.

    I don't have data on string quartets - but for non-classical music, Blade blows steaming piles of donkey dick.

  • Yeah- this is interesting stuff :) the fact that it's often measuring 128K tends to hurt LAME and Blade. I'd direct your attention particularly to page three, "Testing With Real Music: 'Dirty Blue'": Ars actually learned more than they realise with this test. I quote: "The FhG encoder strives mightily all the way out to 20 kHz, but this results in obvious errors in the power spectra." Absolutely- these are the artifacts I was able to illustrate in sonograms, and the artifacts are in part produced by high frequency ringing of Frau's overly sharp cutoff.

    _Definitely_ an interesting site. Also, referring to the listening tests: "The Fraunhofer encoder produced a surprisingly harsh sounding attack on the guitar; it remained quick and sharp, but was artificially crisp and accentuated." That's precisely what I was trying to say, couldn't have put it better myself. It turns out Ars _likes_ that. I do not. But if you do- clearly, you're going to like Fraunhofer. It's not about picking a winner, it's like picking a musical instrument...

  • by jbridge21 ( 90597 ) on Saturday October 28, 2000 @08:30AM (#669025) Journal
    from the can-a-cue-cat-read-these? dept.

    Well, after calibrating my cat on a couple of Pop-Tarts boxes, I tried several scans on the diagrams on the web page... nothing! I can therefore conclusively answer this question with a big, fat NO.
    -----
  • by Djinh ( 92332 ) on Saturday October 28, 2000 @08:31AM (#669028)
    MP3 is about selectively discarding information from the audiostream. The purpose is not to create an output waveform which is as close as possible to the input. This is what the whole business with the psycho-acoustic model is about.
  • The guy used the example of Fairport Convention with Sandy Denny.

    I don't know about his rigor, but the guy's alright by me.

    Who knows where the time goes?
  • by jmv ( 93421 ) on Saturday October 28, 2000 @08:32AM (#669030) Homepage
    OK, now we see what parts of the spectrum are thrown away at very low bit rates, but why is this supposed to be "probably the most rigorous analysis of any encoders anywhere on the web"? First off, the *only* way to evaluate the quality of a perceptual encoder is to listen to it, period. Who cares what is rejected (not encoded) if you can't hear it?

    Also, while using the 32 kbps bitrate amplifies the effects of perceptual quantization (so they're easy to see), the problem is that not all the encoders were meant to work at this bitrate.

    Think about it: when standards institutes want to evaluate audio/speech codecs, they don't calculate sonograms like this, they run subjective tests. They have a bunch of listeners listen to the output of many encoders on *many* audio files. That's right, you need many files to evaluate a codec. Some will perform better for certain musical instruments, some will perform better with or without background noise, echo, ...

    For all these reasons, I do NOT consider this analysis rigorous at all!
  • by jmv ( 93421 ) on Saturday October 28, 2000 @09:15AM (#669031) Homepage
    with an oscilloscope I can get a more precise answer

    Yes, the guy's sonogram is more *precise* but it is still irrelevant. I could write an encoder that gives a much better result when evaluated with this "precise" sonogram, but yet will sound like crap.

    This is the point of perceptual encoding. The goal is not to produce the best result in terms of signal-to-noise ratio or spectral distortion, but to put the encoder's "errors" where the "non-precise" ear won't hear them. And if you don't hear it, you don't care, even if your oscilloscope or spectral analyser tells you there's an error.

    The most critical part of a perceptual encoder is the "psycho-acoustic model", which tries to model as best it can the sensitivity of the ear at a given frequency, given the rest of the spectrum. This is not an easy task, and you have to make lots of approximations. Given two encoders that produce the same quantitative result (SNR, ...), the better one will be the one with the better psycho-acoustic model, and your $10k oscilloscope or spectral analyser won't see that at all.
  • Is it me, or does this seem like an oxymoron? Not being an audiophile, someone correct me if I'm wrong here... Audiophiles are interested in the most accurate reproduction of sound... Why would they even consider a lossy compression scheme at all? It's just like how serious digital artists shun JPEG for all but web distribution to the masses, and even then we see much work done in gif or tiff. I would say that MP3 audio done by ANY encoder is unacceptable to an audiophile.

    Second, I want to challenge some of the assumptions and declarations this experimenter made. The experiments he ran on these encoders are mostly "torture tests" that one would never encounter in real situations... And on the basis of this series of torture tests he tells people which encoders are best for encoding mp3's. Does anyone else see this reasoning as flawed? He's subjecting encoders to situations that NONE of them were designed for, and proclaiming that this has something to do with reality. I see little correlation... How often do you hear pure sine sweeps in any song?

    I found the previous mp3 performance analysis posted on Slashdot to be much more informative. It tested the encoders on real-world performance and rated them accordingly.

    The guys who wrote the encoders realized that some things just wouldn't happen in normal music, such as these torture tests, so they wrote "shortcuts" that ignored these conditions, and resulted in a higher compression rate! How dare he rate encoders on something that the programmers all deliberately IGNORED.

    My friends, trust no statistics that you did not falsify.
  • Because it's a real pain in the ass to mess with 300 CDs, but it's really easy to select a directory with 300 CDs worth of music and put it on random. You have no idea how useful it is until you put 4000 songs (I'm not kidding) on random. :)
  • I don't think it's possible for an encoder to mangle the audio quality and have a pristine 'sonogram' as differenced with the source material. A pristine sonogram would be uniformly BLACK when this was done- none of the encoders remotely approached this

    I'm sorry, from what you're saying, I just don't think you really understand what perceptual encoders are. First, if you have a 10:1 compression ratio, your sonogram cannot be all black (that would be lossless). Now, writing an encoder for which everything is grey (instead of black and white as the sonograms you found) is very easy to do, but it will sound like sh*t.

    Very simple experiment: take a signal and add white noise so you get a 20 dB SNR. It'll sound _very_ noisy. Now, while preserving the noise energy, shape that noise to look like the signal (of course, still 20 dB lower). The audio you'll hear will sound quite OK (though not perfect) and much better than with the white noise. You have just used a (quite simple) psycho-acoustic model.
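
    A rough numpy sketch of that experiment, for anyone who wants to try it (the 440 Hz tone standing in for "a signal" and the one-shot spectral shaping are only illustrative- this isn't anybody's real psycho-acoustic model):

        import numpy as np

        fs = 44100
        t = np.arange(fs) / fs
        signal = np.sin(2 * np.pi * 440 * t)     # stand-in signal: a 440 Hz tone
        sig_power = np.mean(signal ** 2)

        def at_20db_snr(noise):
            """Scale noise so the signal-to-noise ratio is 20 dB (noise power = 1%)."""
            return noise * np.sqrt(sig_power / 100.0 / np.mean(noise ** 2))

        white = at_20db_snr(np.random.randn(len(signal)))

        # Shape the same noise energy to follow the signal's spectrum, i.e. hide
        # the noise where the signal already is - a crude psycho-acoustic trick.
        shaped = np.fft.irfft(np.fft.rfft(white) * np.abs(np.fft.rfft(signal)))
        shaped = at_20db_snr(shaped)

        noisy_white = signal + white    # obviously hissy
        noisy_shaped = signal + shaped  # same 20 dB SNR, noise hides under the tone

    Play noisy_white and noisy_shaped back to back and the point makes itself: same measured SNR, very different perceived quality.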

  • If you would like to write an encoder that actually encodes audio, and specifically trades everything off to perform terrific on 'EncoderHell' (the test tone noise), please do so

    Quite easy... strip out the psycho-acoustic model from a good MP3 encoder (like LAME) and you get a crappy MP3 encoder that performs very well in your sonogram test.
  • I would hope that anybody reading either what I wrote, or what you've just written, would avoid accepting unsupported claims, consider the facts of the situation, and make up their own minds...
  • by hymie3 ( 187934 ) on Saturday October 28, 2000 @08:36AM (#669047)
    I know that Xing (AudioCatalyst) doesn't have the greatest encoder, but that's no reason to leave it out...

    Well, actually, there is a reason: [belgacom.net] the Xing encoder blows chunks. Sure, it's fast, but the sound quality sucks. If all you're encoding is Teeny Bopper of the Week music, then you're not missing out on anything. If you're encoding stuff that's a lot more complex, you're better off with something that doesn't sacrifice quality for speed.

    hymie

  • by jmv ( 93421 ) on Saturday October 28, 2000 @08:36AM (#669048) Homepage
    Ogg Vorbis?

    He's measuring MP3 encoders, and Ogg Vorbis is not an MP3 encoder but an Ogg Vorbis (duh!) encoder. It doesn't use exactly the same encoding scheme, though it is still a perceptual encoder (based on time-frequency masking).
  • by Jack9 ( 11421 ) on Saturday October 28, 2000 @08:40AM (#669049)
    http://users.belgacom.net/gc247244/analysis.htm#MP3ENC31 This is what I found when searching for mp3 comparison. It compares different implementations of encoding for mp3 as well as output quality. Much more useful and definitive.

    Often wrong but never in doubt.
    I am Jack9.
  • Um- the 'sonogram' we're talking about here is the difference between a source and result sonogram (which is a pretty simple plot of frequency data over time). As such, one that was uniformly grey would represent a wave in which the distortion from the original is totally uniform over the complete frequency range _and_ time range. If I'm not mistaken, this would be essentially the same as taking the exact original wave, unaltered in any way, and perfectly blending pink noise with it. I don't see any other way you'd get a 'sonogram' result like you describe. Keep in mind that even if you took raw pink noise and called that your 'result' you wouldn't have a featureless 'sonogram', because it is relative to the original source- a 'sonogram' of full-spectrum full volume noise is uniform WHITE, and this differenced with the original source's sonogram will just invert it.

    For this reason, writing an encoder for which everything is grey (given the techniques I've been using) is far from easy to do- the sound of the file that would produce this result would have to be 50% perfect uncolored reproduction of the wave, and 50% pink noise. That's basically as tough to do as 100% perfect uncolored reproduction of the wave, and it'll sound bad because of the loud pink noise, but I don't think it would sound like you are imagining it to sound. (There's a rough sketch of the differencing itself at the end of this comment.)

    The point about noise and psychoacoustic model is well taken- I'm not claiming that my testing is illustrating psychoacoustic model suitability. If you think about it you can see that's not testable- it's going to be different for every song, and every listener. Some people can't hear over 12K- nuke it! Some people are acutely sensitive to peaks at around 3K- for instance, someone with tinnitus who's subject to the phenomenon of _recruitment_ will find a resonance there to be painfully unpleasant.

    I can't possibly test for that and am not trying. I can, however, work out where the errors are, where artifacts are being produced in the frequency band, and what types of resonance are present, and that information can be used by any person who knows what their psychoacoustic model will accept. For instance: if you like Xing, you'll probably like Fraunhofer at high bit rate still better. If you run screaming from Xing and hate all mp3 encoders, you might need to go with Blade assuming you listen to smooth music like classical. If you can't stand Blade at all, Fraunhofer might be right up your alley. These are quantifiable observations based on driving all these encoders completely beyond their ability to cope, and watching where they break down.
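
    (For the record, the differencing itself is nothing exotic- it's roughly this, in numpy terms. A sketch only, not the actual tools I used; the frame size, hop, and stand-in audio are arbitrary.)

        import numpy as np

        def sonogram(samples, frame=1024, hop=512):
            """Magnitude spectrogram: rows = frequency bins, columns = time frames."""
            window = np.hanning(frame)
            frames = [samples[i:i + frame] * window
                      for i in range(0, len(samples) - frame, hop)]
            return np.abs(np.fft.rfft(np.array(frames), axis=1)).T

        # 'source' and 'decoded' stand in for the original test signal and the
        # encoded-then-decoded result, loaded as float sample arrays.
        source = np.random.randn(44100)
        decoded = source + 0.01 * np.random.randn(44100)

        # Difference the two sonograms: a lossless pass would be uniformly black
        # (zero everywhere); uniform grey would mean identical distortion at
        # every frequency and at every moment in time.
        diff = np.abs(sonogram(source) - sonogram(decoded))
        image = (255 * diff / diff.max()).astype(np.uint8)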

"All the people are so happy now, their heads are caving in. I'm glad they are a snowman with protective rubber skin" -- They Might Be Giants

Working...