Visual Analysis Of Mp3 Encoders
Chris Johnson writes: "I've just finished an interesting scientific analysis of several mp3 encoders and have my findings up on the Web. The process involves differencing a 'sonogram' image from an encoded test signal with the image of the original signal, and then producing response curves showing the disparity in direct signal volume, and over time. Umm . . . which is just to say this is probably the most rigorous analysis of any encoders anywhere on the web, and very geeky (in a good way). LAME carries the day, but BladeEnc shows that it has a completely distinctive sonic approach- and Fraunhofer proves unacceptable (in the version I tested) for audiophile use, though it's unbeatable at very low bit rates. See why." Truth in advertising -- this is a cool example of how visual information can convey more than you'd expect it to.
And now for the big question... (Score:1)
Human hearing (Score:1)
You can lose more detail at high and low frequencies without it having as noticeable an effect on the sound as perceived by the listener.
Re:Is not! (Score:3)
No, when a certain frequency component is discarded, it's not because the listener won't mind, it's because even if it's there, the listener cannot hear it. If you can't hear a sound, why encode it? Now, there are sometimes problems with classical music, but that's because it's often hard to predict exactly what you can and can't hear.
Re:That's nice (Score:1)
Besides, something that might show up in the visual depiction may be audible, but not necessarily obvious the first time you listen. It's kinda like when you're at the eye doctor and he's flipping through lenses: "Is this one better? How about this one? Is the first one better than the second one? First one? Second one?"
You may not notice a visual problem with one of the lenses at first, but then after wearing them all day, you get a headache.
Re:That's nice (Score:1)
Mikael Jacobson
Quaint, but flawed (Score:5)
That said, it is generally the case that "pre-echo is bad" and "over-ring is bad." Reducing these can be thought of as a good thing. Let's assume that for these encoders, pre-echo and over-ring are universally bad (I'll give an example where this isn't the case, below). Furthermore, this comparison actually says nothing about these encoders other than the pre-echo or over-ring. I.e. what happened to the sound that was the "same" on the sonogram? It is quite possible for an "encoder" to mangle the audio quality yet have a pristine sonogram by this test's standards.
Just to throw a wrench in the works, more advanced encoders and/or psychoacoustic models can utilize what's called temporal masking. This is the ability of a higher-amplitude signal to mask (make inaudible) a lower-amplitude signal either before or after itself, as far as the human ear is concerned. Pre-echo is the phenomenon whereby a transient signal (i.e. a very 'sudden' attack, like a drum hit) is smeared in time. The audible effect can be most obnoxious. Yet encoders utilizing temporal masking will explicitly allow a certain amount of pre-echo through, as long as it is temporally masked. This leaves the encoder to spend those bits on other parts of the signal that would be more seriously degraded as far as our ear is concerned. In short, a sufficiently savvy encoder could exhibit more pre-echo than another worse-sounding encoder, especially if it uses temporal masking.
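To make the temporal-masking idea concrete, here is a toy Python sketch. The window lengths and the 30 dB margin are rough illustrative guesses on my part, not any real encoder's tuned model:

# Toy illustration of temporal masking. The windows and margin below are
# rough, illustrative guesses, not any real encoder's tuned model.
PRE_MASK_MS = 20     # masking extends a few ms *before* a loud transient
POST_MASK_MS = 150   # ...and considerably longer *after* it

def is_masked(error_ms, error_db, transients):
    """An encoding error is 'free' if a much louder transient sits close
    enough in time. transients: list of (time_ms, level_db) tuples."""
    for t_ms, level_db in transients:
        loud_enough = level_db - error_db > 30        # assumed margin
        just_before = 0 <= t_ms - error_ms <= PRE_MASK_MS
        just_after = 0 <= error_ms - t_ms <= POST_MASK_MS
        if loud_enough and (just_before or just_after):
            return True
    return False

# A 60 dB pre-echo 10 ms ahead of a 95 dB drum hit is temporally masked,
# so a savvy encoder can let it through and spend the bits elsewhere:
print(is_masked(490, 60, [(500, 95)]))                # True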
Quantitative analysis for perceptual audio coding is not easy; this has been a grail for researchers in the field for years. I strongly suggest that interested parties dig into various IEEE and AES (Audio Engineering Society) journal papers on the subject, as well as various books, etc.
Another fun experiment (Score:2)
Another fun experiment is to do this same thing sonically (makes a little more sense) -- encode to mp3, convert back to wave, and then subtract the original from the encoded one. The resulting wave will have all of the bits which were discarded.
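Here is a rough Python sketch of that experiment, assuming the lame command-line encoder is installed, numpy is available, and the input is a 16-bit WAV:

import subprocess, wave
import numpy as np

def read_wav(path):
    with wave.open(path, "rb") as w:
        data = np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)
        return data, w.getparams()

subprocess.run(["lame", "-b", "128", "original.wav", "encoded.mp3"], check=True)
subprocess.run(["lame", "--decode", "encoded.mp3", "roundtrip.wav"], check=True)

orig, params = read_wav("original.wav")
back, _ = read_wav("roundtrip.wav")

# MP3 encoding offsets the signal slightly in time; a careful version
# would align the two by cross-correlation. Here we just trim to match.
n = min(len(orig), len(back))
residue = orig[:n].astype(np.int32) - back[:n].astype(np.int32)

# "discarded.wav" is everything the encoder threw away (plus what it added).
with wave.open("discarded.wav", "wb") as w:
    w.setparams(params)
    w.writeframes(np.clip(residue, -32768, 32767).astype(np.int16).tobytes())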
It's difficult to interpret the results (I agree with those who say that this study is more or less worthless) but it does sound pretty neat. =)
Re:In the final analysis (Score:1)
Re:In the final analysis (Score:2)
Right now, the attitude is "Why be able to store several hundred songs, when I can store several thousand..."
In a couple of years, the numbers will change but the rationale will be the same. Why store ten thousand PCMs when I can have a hundred thousand??
I agree at some point things will become meaningless, but there will have to be quite a major revolution first... Perhaps that'll be infinite data storage by quantum methods. Perhaps I'm a bit too hesitant to rely overmuch on Moore's law.
E
Something I forgot to add... (Score:1)
Though speed and storage double easily, I've noticed that audio file sizes do too. There may come a point in the future where we just can't be sure anymore, but I think that at least for the foreseeable future, audio compression will become more important, not less.
E
Back the truck up... (Score:1)
you are missing the point (Score:1)
personally, i want to see mp3 music come as close to uncompressed music as possible. i want to encode my songs without tinniness or that annoying "swoosh". to me, an effective method of sound compression has no compression artifacts and has an output exactly the same as the input.
i think people who really listen to music should go for MP3's that SOUND good, not just look good in a white paper.
Re:MP3 for Audiophiles?? (Score:2)
MP3 distortions are very evident, especially at 128kbps (so-called CD quality). They become less evident the higher the bitrate, but even at 320kbps the distortions are still easily identified compared to the original CD.
Re:Blade is not an encoder (Score:1)
It took LAME's quality and then was optimized for speed...
You can't make an objective test of mpeg encoders (Score:5)
The basic idea of mpeg is that the encoder removes the parts of the music which you (probably) can't hear. The encoder splits the sound into pieces, and rates each piece by how important it is for the total sound image. Then it starts with the most important sound and encodes that, continuing with the less important parts until the available bit rate (e.g. 128kbit/s) is reached. The rest of the sound data is discarded.
The tricky part is the calculation of the "importance" of each sound, and that is what differentiates the encoders. This calculation is done with an algorithm called a "psychoacoustic model".
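A toy Python sketch of that allocation loop; the hard-coded importance scores stand in for a real psychoacoustic model, which is the part that actually differentiates encoders:

# Toy version of the allocation loop described above. The "importance"
# scores are made up; computing them well is the hard part.
def allocate_bits(pieces, bit_budget):
    """pieces: list of (importance, cost_in_bits, data). Encode the most
    perceptually important pieces first until the frame's budget is spent."""
    kept, spent = [], 0
    for importance, cost, data in sorted(pieces, reverse=True):
        if spent + cost > bit_budget:
            continue            # this piece is discarded, never encoded
        kept.append(data)
        spent += cost
    return kept

frame = [(9.1, 400, "bass fundamental"), (6.3, 250, "vocal formant"),
         (2.0, 300, "quiet cymbal shimmer"), (0.4, 150, "masked hiss")]
print(allocate_bits(frame, 700))   # ['bass fundamental', 'vocal formant']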
To measure the quality of an mpeg encoder automatically, you need an algorithm which calculates the quality of the encoded signal. By knowing this algorithm it is trivial to create an encoder which will score maximum on this quality measurement, since the quality measurement algo is basically the same as the psychoacoustic model.
This test is "snake oil"; a real test of an mpeg encoder unfortunately involves listening to the music to evaluate the psychoacoustic model of the encoder, not comparing two artificially created psychoacoustic models with each other.
Re:MP3 for Audiophiles?? (Score:2)
MP3 and audiophiles?? (Score:1)
Besides, as computers and networks become faster and storage cheaper and more compact, we're not too far from the point where non-lossy compression will suffice, as far as downloading/storing music is concerned.
I want my music in .gz format, not .mp3 !
--lp
Re:you are missing the point (Score:1)
If you want your MP3 music to come out as closely to the original in a sonogram as is possible, you have not understood what MP3 is about. I think one would like to get an MP3 file that sounds the closest to the original. Visuals be damned.
Hooray (Score:1)
Re:Conventional Wisdom (Score:2)
Giving this sort of thing to Slashdot is as fun as nude mudwrestling. Gotta love it. :)
Re:Visual analysis of MP3 is nonsense (Score:1)
That one of the sonograms seems to be closer to the original visually says nothing about how it will sound.
The real reason (Score:2)
On the Mac, I would have to _pay_ to use the Xing encoder. I just got through a serious ramen-and-spaghettios period, and there's just no way I'm going to merrily throw money at people who not only support the mp3 licensing patent holders, but also make an encoder that is considered to be more prone to artifacts and ringiness than even the Fraunhofer high bit rate stuff.
Beat me, whip me, slashdot me and call me unrigorous, but I'm not paying money for Xing. The lurkers support me in email. So there ;)
Well, it was 'scratching an itch' really (Score:2)
I had to know why- no, scratch that, I knew why. I had to know which encoders did better- what they in turn traded off- and I had to know across a wide range of bit rates in a way I could quickly cross correlate.
I've written for (IMNSHO) the foremost High End Audio journal. It's not that I'm not interested in listening to encoders! But if they are _all_ quite compromised, why not break 'em down into a series of measurements relative to each other, with clearly identifiable characteristics? Shows you what to listen for- and tips you off to particular issues.
Re:FP (Score:1)
This is like that. The original ASCII art [slashdot.org] was mixed with antimatter, in the form of missing " " characters.
Leaving the disfigured creature you see before you.
Re:What about... (Score:4)
You can add me to that list- and such a comparison (I naturally kept a logbook to be able to reproduce the process later) would indeed be meaningful to me. For instance, if Vorbis was more sophisticated in its control of over-ring and either imposed a flatter characteristic (resisting resonant peaks) or went for an intentionally tailored characteristic (say, suppressing ring around 3-5K like Fraunhofer 32K bit rate) this would have obvious and interesting application to the sound quality. Conversely, if it had big ugly peaks and artifacts, their location in the frequency response would tell a lot about the sonic signature of the encoder.
Re:Something I forgot to add... (Score:1)
E
Re:Web page background. (Score:2)
Doh! For years I've used a purely white background for airwindows.com, with a sort of vintage-cnet layout. I also used to keep a 'graphics' section in which I had some web background gifs I'd done. They were made like this:
x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x
Do a diffusion dither between white and the lightest 'web safe' gray- then take all the pixels at x positions and knock 'em out to white too. The result (works with other colors as well) is a texture in which no two colored pixels are ever directly next to each other- it's a paperlike texture but never gets darker than half Netscape grey.
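Here is a quick Python approximation of the recipe, assuming numpy and Pillow; a random threshold stands in for a true diffusion dither, and the tile size and speckle density are arbitrary:

import numpy as np
from PIL import Image

SIZE, DENSITY = 64, 0.5        # tile size and speckle density (guesses)
rng = np.random.default_rng(1)

# Stand-in for the diffusion dither: threshold noise so that roughly
# DENSITY of the pixels come out gray instead of white.
gray = rng.random((SIZE, SIZE)) < DENSITY

# Knock out every pixel on one color of a checkerboard, so no two gray
# pixels can ever sit directly next to each other.
rows, cols = np.indices((SIZE, SIZE))
gray[(rows + cols) % 2 == 0] = False

# 204 (0xCC) is the lightest web-safe gray; everything else stays white.
pixels = np.where(gray, 204, 255).astype(np.uint8)
Image.fromarray(pixels, mode="L").save("texture.gif")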
Which is to say- sorry, I did it that way because I liked it, and I'll keep it. Honest, I have done everything I possibly could to avoid obscuring the text, but it's sort of like a trade-off: in getting rid of additional table clutter that I used to have, I found that I liked the pages when this simpler layout was backed by the softest texture I had, rather than plain white.
I hope it didn't bother your eyes too much :)
Yikes (Score:2)
x x x x x x x x x x x x x
x x x x x x x x x x x x x
x x x x x x x x x x x x x
x x x x x x x x x x x x x
x x x x x x x x x x x x x
x x x x x x x x x x x x x
x x x x x x x x x x x x x
x x x x x x x x x x x x x
Woops. Or I could have said 'checkerboard' and saved myself the hassle :)
The idea is from a company named Boxtop Software, which produced a Photoshop plugin that put different web-safe colors in checkerboard patterns to produce a much greater range of 'web safe' colors (which look solid). I figured, why not run with that and do textures that way? Maybe the Gimp would benefit from some websafe checkerboard texture generators too :)
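A minimal Python sketch of the checkerboard trick, assuming numpy and Pillow; the color pair is an arbitrary web-safe example:

import numpy as np
from PIL import Image

a = (204, 204, 255)            # two web-safe colors, arbitrary example
b = (255, 255, 204)
rows, cols = np.indices((64, 64))
checker = (rows + cols) % 2 == 0

tile = np.empty((64, 64, 3), dtype=np.uint8)
tile[checker] = a              # the eye averages the two colors into a
tile[~checker] = b             # "solid" that isn't itself web-safe
Image.fromarray(tile, mode="RGB").save("checker.gif")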
Anybody up for trying this out for real? (Score:2)
Actually, I think this would be a _very_ good experiment. I'm aware that my questioning some of these concepts is seen as prima facie evidence of being a tottering loony *g* but the whole concept of the psycho-acoustic model is so central to current audio theory... and this theory basically says, 'mp3s can be made to sound indistinguishable from CDs' and they cannot- the same theory on a broader level says 'CD itself is theoretically perfect sound', and it is not- mastering engineers, for instance, have learned that to do their work they need something better than CD audio.
I'm not certain that the psychoacoustic model must necessarily be that much better than, for instance, trying to diffuse unavoidable error as evenly as possible over the frequency and time domains. You are essentially insisting that concentrating the error in particular areas that are said to be 'masked' is far superior. This assumes the masking is effective, and that there are no side effects- neither assumption is wholly true, as large numbers of people are able to find fault with (say) 128K mp3s, and any filtering is going to impose extraneous characteristics. Finally, you're assuming that an encoder that does not have a psychoacoustic model (I assume this would mean one that diffuses error pretty uniformly) is going to perform 'very well' in the procedure I devised. I'm not sure of that- I'd like to try it experimentally before jumping to that conclusion.
Finally, I have to admit- I haven't got the faintest idea what the resulting sonogram, and frequency/overring characteristics, would look like. I can say some things about it- with regard to the over-ring, diffusing it over a wider frequency range is not only desirable but markedly preferable. Fraunhofer loses badly to LAME, sonically, over just this issue- and Blade gets away with its severe over-ring by diffusing it over a wider frequency range. If the experimental psychoacoustic-model-less encoder showed significant improvements in diffusing out this over-ring and reducing its duration- there would be legitimate applications for its tonal characteristics, even if the raw frequency response was noticeably compromised. It would be sort of like the 'anti-Blade'.
I don't suppose anyone will actually _try_ it, much less help me out with measuring it :P but if anyone is genuinely interested in investigating this, drop me a line? It sounds like something that could be attempted. Seriously- the whole point of such a model is 'masked stuff can't be heard'. If people can hear the masked error anyhow, what is the point? And if you assume people who can't hear anyhow and won't notice, what's the difference? Is it so axiomatic that you have to shun diffusing error evenly, and instead concentrate it in areas you think won't be heard?
I'd like to see results of that exercise (Score:2)
You are talking about applying only the psychoacoustic model of the mp3 encoding, and producing a comparison of that with the original signal. I would indeed be really interested in seeing that- I'd like to know which of the various distortions, over-rings etc. arise from the psych model and which arise from the fractal part.
In the argument (lower in the thread) I was questioning whether you could skip the psych model entirely (pretend people can hear the difference between 128K mp3 and real life ;P ) and see just what you'd get if you went purely with the fractal encoding- trying to diffuse any and all error in the process as evenly as possible over frequency and time.
People will swear up and down that this will be drastically worse. I'd like to measure it in comparison with normal mp3 encoders and see exactly what it is, not just run around making theories that it's going to be awful. The one thing I'm willing to guess about it is that the sound will be the opposite of BladeEnc's sound. For some people that'll be bad- but the idea of an 'anti-Blade' might really interest others.
I don't know if anybody's comfortable enough with hacking on a version of LAME or whatever that they'd be willing to try it- I am going to bounce the idea off Martin Hairer, with whom I worked to perfect the sonogram-plotting program (I needed to request better picture export capacities- he came through like a trouper and fixed everything). I think he is the one who ported LAME to his program, and he might be both able to try such experiments, and interested in seeing what they do.
At any rate I wanted to say that your idea of isolating the transformations and considering them independently _is_ truly an interesting exercise- and I hope to be able to do such experiments, and learn from them, with a bit of work and patience :)
Re:Back the truck up... (Score:1)
By that line of reasoning, we musta all been fucking morons for making mp3s several years ago, since our walkmen couldn't play them.
---
I'm not ashamed. It's the computer age, nerds are in.
They're still in, aren't they?
Re:You are wrong (Score:1)
Re:A much better overview of mp3 encoders (Score:1)
Re:So what? (Score:1)
Even the samurai
have teddy bears,
and even the teddy bears
Re:That's nice (Score:1)
As a test, try encoding the same song using two different encoders (making sure to use the same bitrate). Using the same decoder, see if you can tell the difference. You can also try downloading the MP3s from the site referenced. A quick listen to them (at the same bit rate) should show an audible difference.
The only other difference might be in speaker set-up. A crappy computer speaker might not be able to really show the difference between two MP3s.
I use a Blade-based encoder on my Mac with Cambridge Soundworks Digitals.
Done before (again). (Score:3)
Re:So what? (Score:2)
I'll agree that perception is what matters. However, what sounds great on my $48 Labtec speakers at work sounds like crap on my $500 studio headphones at home. The fact of the matter is, most people don't have $25,000 of audio equipment [belgacom.net] nor sufficiently trained ears to tell the difference. I'll readily use LAME encoded stuff from people I trust, but cringe in horror when I listen to the rapage that Xing's encoder performs on the quality of complex music.
Think of it this way: most people are arguing which color of crap tastes better. Sites like this one [belgacom.net] and the one in the article are trying to point out that you don't have to eat crap.
hymie
Re:I'd like to see results of that exercise (Score:2)
yes that's exactly it, i think it would be an interesting exercise, as i don't recall seeing any study of that as of yet. i'm sure much has been done to develop psychoacoustics in the first instance, but as that was way before mp3 actually came about, this info won't be readily available from mp3 sites (though thanks to the anonymous coward's url elsewhere in this thread!)
i think removing the psychoacoustics and simply applying the fractal transform on its own would result in a lower perceived quality-per-bitrate ratio, not much else. but it's interesting also.
to do any of these experiments, we'd need access to the source code for an mp3 encoder - are any of these available? LAME for instance? i'm sure fraunhofer's is available from less reputable websites
fross
Kexis -- GPL lossless file format. (Score:2)
Again, another useless MP3 test (Score:1)
This is useless. MP3 is perceptual coding, and the only way, for now, to decide which is better is to listen and decide. If you can't hear it, why do you need to encode it? That's the idea of MP3.
Don't try to see if the encoded signal looks the same as the original in terms of spectral content, try to see if it sounds the same!
Blade is not an encoder (Score:2)
A much better overview of mp3 encoders (Score:2)
Re:MP3 for Audiophiles?? (Score:4)
Absolutely. CD quality (44.1 kHz 16 bit PCM) is total CRAP to true audiophiles. I won't be satisfied until they invent a format that will store the timing and strength of every single air molecule hitting my eardrum, precise to within the Heisenberg uncertainty principle. Uncompressed.
SHN (Score:1)
Re:Yes! (Score:1)
Ask /. articles are often a great way to get info but you have to be willing to do some reading and thinking for yourself. Often the best articles are the shortest ones -- they are just links to outside sources.
This article is way inferior to www.r3mix.net [r3mix.net]. You should go back to that old Ask /. article and figure out why you didn't pick up on that web site. The fact that you didn't come away with an answer from the first article was entirely your own responsibility. All the info you needed was there.
(OT)The vinyl vs. CD debate (Score:2)
some of my vinyl is way better
Vinyl sounds "warmer" because...
Re:That's nice (Score:1)
This test is valueless, as it does not take the human ear into account. The quality of the compression is completely a subjective thing, and it will always be so. There will never, ever, be a worthwhile mathematical test for lossy compression.
GoGo vs Lame vs CokaCoda? (Score:1)
Re:What about Xing (AudioCatalyst)? (Score:1)
You are wrong (Score:2)
That's trivially proven to be incorrect, since gzip and bzip2 compress data and yet the decompressed output is identical to the input. In an audio context, ten minutes of a pure tone can easily be compressed to a small size. The only information you really need to keep is the length of the tone and its frequency.
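Both claims are easy to check with the Python standard library. Here 441 Hz is chosen so the tone repeats in a whole number of samples, and ten seconds stands in for ten minutes just to keep the run quick:

import math, struct, zlib

RATE, FREQ, SECONDS = 44100, 441.0, 10    # 441 Hz repeats every 100 samples
samples = (int(10000 * math.sin(2 * math.pi * FREQ * n / RATE))
           for n in range(RATE * SECONDS))
raw = b"".join(struct.pack("<h", s) for s in samples)

packed = zlib.compress(raw, 9)
assert zlib.decompress(packed) == raw     # output identical to input, exactly
print(len(raw), "->", len(packed), "bytes")

# And the truly minimal lossless description of a pure tone:
minimal = struct.pack("<fI", FREQ, SECONDS)
print("or just", len(minimal), "bytes: the frequency and the length")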
Listening to distorted audio is fatiguing (Score:1)
Re:What about... (Score:2)
The portrayal of this is inaccurate. (Score:3)
the key point here is that mp3 encoding is in fact a process of two separate transformations (both of which consist of many processes, of course). the first of these is my bone of contention, as it seems less well-known than the second, which i will address first.
the "second transformation" is the one familiar to most people, the iterative fractal encoding procedure, which simply adds information to that audio frame until it a) either hits a "quality threshold" (ie is consider good enough), or b) fills up its bitrate allocation. it's similar in many ways to making a "jpeg of sound". you can get a good view of this whole process by following this link [iis.fhg.de] to a graphic of the aac encoding process on fraunhofer's website. It is the stuff inside the box at the lower left that this concerns.
however the first transformation here is the important one, this is the stuff outside and above the box in the graphic linked above. (i am not sure the graphic is detailed enough, there may be some missing, from what i remember) - this is a series of transformations to limit the amount of data the second transformation has to deal with (and hence get essentially better encoding for the same bitrate), according to the way the human ear works. our ears have "features" like having a dead area in frequencies near loud noises, which means these bits can be cut out, and other bits and pieces that i can't remember and don't have to hand ;) this is of course psychoacoustics, as other people have commented. there is a _very_ basic primer on this at the fraunhofer site here [iis.fhg.de], but it doesn't go into any technical detail.
as an aside, there used to be some fantastic and informative articles on these subjects at mp3.org back in the day (1997-1998?), may it rest in peace. does anyone have some links for where something as good on this subject is? i haven't been as in touch with the technical side of mpeg encoding as i used to be...
but anyway back on subject, this first transformation actually distorts the signal *significantly*, but only in a way that makes it easier to process, while still sounding the same (or close) to the human ear. it may be an interesting exercise to isolate this first transformation, apply it and then save without any fractal encoding, and compare that to the original signal. this transformation will cause great "visual degradation", as shown in the article, but imho this is not an accurate criterion for measuring audio quality. still interesting, and a good read, though :)
fross
Re:What about... (Score:1)
He's measuring the MP3 encoders, and Ogg Vorbis is not an MP3 encoder
Wouldn't it be interesting to make one of these tests comparing many different encoding techniques (MP3, Ogg Vorbis, VQF...)? I saw one once that made a comparison between MP3 and VQF (I think it was posted to Slashdot, maybe) and it was pretty interesting.
I tried the Ogg Vorbis encoder the other day for the sake of trying, encoding a small song (Black Sabbath's Paranoid) with both BladeEnc and the Ogg Vorbis encoder... and I can say that the high frequency response for Ogg Vorbis was much, *much* better. The MP3 sounded noticeably different from the CD, while with the Ogg Vorbis file such a difference was not so trivial to hear. (Ok, I know that it is a well known fact that MP3 sucks at higher frequencies, but, it was an example.)
Anyway, a deep comparison showing the pros and cons of each encoding technique would be very interesting. This won't change the fact that it will be very, very difficult to convince people that there may be better alternatives to MP3, but...
--
Marcelo Vanzin
Re:Listening to distorted audio is fatiguing (Score:1)
People whose jobs rely on sound perception or who listen to a lot of sound (audiophiles maybe, hmm?) know that regular mp3s you download from Napster (128kb/s) are of average sound quality. A good encoder and a higher bitrate (160+) will do wonders.
Something very similar has already been done (Score:1)
Re:Quaint, but flawed (Score:2)
That said- the sonograms are greyscale plots of deviation from the original signal. They are inevitably offset in time by the encoding process- I aligned them using those ugly transients in the center. There are two little charts under each. The second is the pre-echo and over-ring. The _first_ is precisely the opposite- deviation from the sound that was the 'same', with the weighting of the little chart (a RELATIVE measurement) emphasising the content of the wave rather than areas that are supposed to be free of additional frequency content.
I don't think it's possible for an encoder to mangle the audio quality and have a pristine 'sonogram' as differenced with the source material. A pristine sonogram would be uniformly BLACK when this was done- none of the encoders remotely approached this. Any mangling, no matter what sort, will show up as a lighter-than-black area on the differenced image. I'm very much a high end audio dweeb at heart, but I don't believe there can be mangled audio quality without the Fourier content changing, and thus the sonogram showing big gray or white blobs.
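For anyone wanting to reproduce the differencing step, here is a rough numpy sketch. It assumes two already time-aligned mono signals, and the FFT and hop sizes are arbitrary choices, not necessarily the ones used for the published sonograms:

import numpy as np

def sonogram(signal, nfft=1024, hop=256):
    """Magnitude spectrogram in dB: one row per time slice."""
    win = np.hanning(nfft)
    frames = np.array([signal[i:i + nfft] * win
                       for i in range(0, len(signal) - nfft, hop)])
    return 20 * np.log10(np.abs(np.fft.rfft(frames, axis=1)) + 1e-9)

def difference_image(original, encoded):
    """Black (0) where the encoder changed nothing; lighter gray where the
    spectral content deviates from the source."""
    d = np.abs(sonogram(original) - sonogram(encoded))
    return (d * (255.0 / max(d.max(), 1e-12))).astype(np.uint8)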
I wholeheartedly agree that quantitative analysis of perceptual audio coding is not easy! :)
Re:MP3 for Audiophiles?? (Score:2)
Um, no, you can't. (Score:2)
The ideal result from the process (totally unaltered waveform information) would be an entirely _black_ 'sonogram' at the end of the process. That's not going to happen. Since there are going to be deviations, it's down to the psychoacoustic model- and the pictures and charts are going to show what the encoder chose to throw away, on a larger scale.
You can argue that the encoder throws away stuff that can't be heard, therefore measuring _that_ is meaningless. This equates to arguing that the result is indistinguishable from the source audio. I disagree, and feel that all mp3s are audibly degraded from the source audio- which is itself degraded, being typically 16 bit 44.1K digital audio :)
I'm trying to measure what the encoder's failing to do. The project was meant to answer my own questions, and has done so.
Re:Visual analysis of MP3 is nonsense (Score:2)
Personally, I'm with LAME for my sonic requirements, although the only mp3s of my music out there (so far) are Blade, done many months ago before I did this research. But the point is not that there is a 'winner'- the point is that the differing sonic characteristics of these encoders CAN BE QUANTIFIED. Perhaps not measured outright (my charts etc. are _relative_ to each other), but these encoders take significantly different approaches to discarding information, and that applies directly to your choice of encoder for recording music, and translates to a completely predictable sonic characteristic of the encoder on ANY music, no matter what.
I put all sorts of music through Blade when I was on mp3.com with only Blade for a free encoder- no matter what I did, the result was always identifiably BladeEnc, with the smooth extended frequency response and absolutely terrible transient impact. For some pieces, this was suitable- for some it was grossly unsuitable. But the sonic characteristics were consistent- and correlate with what I learned about the encoder in this 'torture test'.
why mp3 is ancient history. (Score:1)
Why don't we shift to a more aggressive compression method for today's systems, then?
Why not grab the wav files and use the high-end "100x" compression methods "you read about, but never see irl"?
(Those articles always claim "current systems are too slow for this (fractal method)", but they never mention anything useable.)
Current top-of-the-bill machines should be capable of playing the raw cd-grabs realtime from a highly compressed file, in the process possibly getting near-DoS system loads (who cares), without losing any detail of the original track and making mp3 sound like the inbred godzilla version of this pure little salamander.
Re:A much better overview of mp3 encoders (Score:2)
LAME? (Score:1)
LAME and BladeEnc produce terrible sound quality (or you have to use ridiculously high bitrates; no wonder Napster's full of huge 160 kbps files) and Fraunhofer is the only one that I'd call even adequate at 128 kbps. It's the sound not some friggin' visual graph of the music that's ultimately important.
Ogg Vorbis [vorbis.com], on the other hand, is superb. It's not only free of patents but also GPLd!
What about... (Score:2)
Not the first. (Score:2)
Re:No such thing. (Score:1)
afraid someone beat you to it a while back... (Score:1)
Anyhow, good thought nonetheless.
Re:MP3 for Audiophiles?? (Score:1)
Even then, you'd need to ensure that the rest of the audio reproduction path was the same: a CD played on crappy speakers will almost always sound worse than a high-quality analog setup with top-notch speakers.
Finally, keep in mind that these kinds of do-it-yourself experiments are notoriously lax at controlling for confirmation bias [skepdic.com]. This is particularly troublesome when your goal is to measure something as subjective as audio perception.
An alternate analysis (Score:1)
-s
Re:you are missing the point (Score:1)
That's nice (Score:2)
Re:I'd like to see results of that exercise (Score:2)
A lot of the pre-echo that's showing up as resonant peaks could be attributed to this type of equalisation. If that is the case, applying the fractal transform alone would result in noticeably coarser frequency information but, at the same time, a much cleaner time domain, with pre-echo and over-ring much more diffuse and inaudible. It might be perceived as extremely dynamic but somewhat colored sound with a great deal of openness and energy but compromised tonality. It might be terrific for certain types of electronic music, drum machines and such things.
I'm working on being able to try this experiment. If I can do this, I can also experiment with different types of filtering (realistically, I'd be working with a programmer who would know how to do this but might not have thought to try some of the things I'll suggest).
If anybody tries this sort of thing, don't test it on stuff that would obviously suck! It would be pointless trying it on classical, or easy listening. On the other hand, gabba house music and really harsh techno, or brutally distorted heavy metal... I know I've got stuff that I'd like to have encoded in a way that pushes impact at all costs and brings out the rawness of the sound at the expense of the detail and clarity. That should be possible, don't know if axing the psychoacoustic model would do it- depends on how much it compromises the original sound. Not all filters produce such obvious artifacts- just the ones with really sharp slopes such as the top lowpass filter in Fraunhofer.
Finally! (Score:2)
I'd really like to see something like this with Ogg Vorbis once it matures. Or now even, because it seems to be a bit better already, though it's hard to tell on my laptop speakers.
Re:Kexis -- GPL lossless file format. (Score:2)
can simply losslessly convert from the old format to the new one.
Yeek; that's fine until you have several gigabytes to convert each time the format changes.
One of the best things the Minidisc inventors and the MP3 inventors did was to keep the decoding algorithm static, while allowing the encoding algorithm to improve as technology improved.
--
Re:LAME? (Score:1)
I'm not sure how good such an algorithm would be, but there are trade-offs.
You've got to be a bit careful doing a time-based FFT on your audio, because that means you've got to decode the entire file before you can play any meaningful segment. You can't stream it through: you've got to pay all your computing expenses up front instead of spreading them out through the entire wave file.
This might be okay for a scheme that doesn't take too much computing power, but if you want to incorporate all the splefty psychoacoustic models and the other stuff that's been flying around this discussion, a piece of software using such a scheme is going to take a hit.
Are you willing to wait for five seconds after selecting a song to hear it play? How about thirty seconds or a minute?
-BMagneton
Lame encoder (Score:2)
Yes! (Score:1)
Ugh... Everybody thought that whatever they were using was the best thing under the sun, no research supporting their claims, and in the several hundred comments, not even a hint of a general consensus.
Finally, something that'll allow me to choose based on fact, something that'll allow me to make an *informed* decision. Thank you.
Classical Music (Score:1)
I'm glad to see I've been using the right encoder (BladeEnc) for all my classical music. I can't remember why I started using it (I don't think I've used anything else), but now I see that it beats the others as far as tonality goes. Classical music is all about the right pitches (I even have perfect pitch), so perhaps that's why all those bad sounding classical mp3's off napster sound so bad (or they were ripped off records...).
Re:Visual analysis of MP3 is nonsense (Score:2)
Ah, but nowhere does this article try to disprove that, does it? The whole point is that certain codecs do a better, more intelligent job of discarding information, and that is what the author set out to prove.
Re:What about... (Score:3)
Re:So what? (Score:2)
Re:LAME? (Score:3)
I'm taking a course currently on audio and image compression, and his article annoys me greatly. He uses ambiguous terminology and often the wrong terminology (for example, calling things "wavelets" that aren't actually wavelets). He describes things which can't be seen clearly in the graphs and would much better be viewed with a different display format. Etc.
I'm still wondering if some of my compression ideas will work... I plan to test them out before too long: grouping some of the generally weak high-frequency signals together, since the human ear is less sensitive to high frequency pitch variation (we're sensitive to frequency on a logarithmic scale - an octave is a doubling of frequency); and, instead of doing block transforms on the music, generating a 2D image of the signal (graph: frequency vs. time), compressing the frequency axis as you normally would, and instead of saving the time axis as a series of blocks of discrete frequencies, actually compressing it greatly with an FFT - doing this, you should be able to save space on recurring themes in songs (such as a chorus, a regular beat, etc). Voice may introduce complications, though, and I may end up having to do some kind of combination of the two (such as compressing the difference between the original and final signal as a low quality block transform and saving it with the compressed signal). Two ideas of mine I plan to test when this incredible work load from my senior year stops bearing down on me.
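The first idea is easy to prototype. Here is a minimal numpy sketch of pooling FFT bins into log-spaced bands; the band count and lower edge are arbitrary assumptions:

import numpy as np

def log_bands(spectrum, sample_rate=44100, n_bands=32, f_lo=40.0):
    """Average an rfft magnitude spectrum into log-spaced bands: a single
    FFT bin can own a band near 100 Hz, while dozens share one at 15 kHz."""
    nyquist = sample_rate / 2
    edges = np.geomspace(f_lo, nyquist, n_bands + 1)
    freqs = np.linspace(0, nyquist, len(spectrum))
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (freqs >= lo) & (freqs < hi)
        bands.append(spectrum[mask].mean() if mask.any() else 0.0)
    return np.array(bands)    # n_bands numbers instead of len(spectrum)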
- Rei
Those tests are Worthless. (Score:2)
It is quite possible to have a signal that very much resembles the original wave graph, and yet sounds horrible to the ear. It is also equally possible to have a signal whose graph doesn't resemble the original very much, and yet has a much higher 'perceived' quality.
Just remember: The first rule in every single BEGINNERS guide to sound is to "Trust your Ears," and that is the only way to tell a good codec from a bad one.
Very cool! (Score:2)
In fact I think I have seen this before and r3mix actually affected my approach to my encoder analysis. Definite kudos to r3mix, and I entirely agree with many of this site's decisions and approaches- interestingly they reach precisely the same conclusion as I did, that LAME 256 was the ideal archival encoder and LAME VBR was the best one for smaller file sizes- except that r3mix has added the recommendation that joint stereo be used in the latter case! (this would really hurt the relative comparison with higher bit rate stereo encoders with my mono test signal, but I think I will take the advice and try that for my own mp3s...)
r3mix also chooses to use _relative_ graphs rather than attempting to give absolute measurements, something I heartily approve of.
Now, here's the thing- r3mix's results are sometimes a subset and sometimes comparable to mine, just depicted in a different way. The primary measurement of a frequency sweep produces different-colored graphs- if you take the horizontal axis and express the vertical deviation of each graph, from an ideal line of flat reproduction at the top, as a brightness value of a single pixel, you'd get something akin to a single line on one of my 'sonograms'. The test with the 'applaud' signal is an example too- if you subtracted the source from the results you'd end up with distortion levels very similar to my differenced sonograms.
More interesting to me is the fact that my sonograms show an _intermediate_ step- several r3mix tests are the averaged responses of an encoder over time. That is exactly what my 'charts' are- they are sums of all the deviation and distortion over the entire length of a sonogram, over a range of frequencies.
I'm almost certain I'd seen r3mix before doing my own analyses- I think it's very likely that this site significantly helped me define the processes I used for my own stuff. I heartily recommend checking it out- this is good work, I totally endorse it, in fact I'm going to put a link to it on my own encoder page right now :) *put* there!
Re:LAME? (Score:2)
I think the guy's hearing with his eyes, or using a totally different set of music than what I listen to.
If you wanna hear how dog-fuckingly-shit Blade is, encode the first 10 seconds of New Order's "Blue Monday" (a basic drum machine emitting a sound common to much new-wave, dance, and industrial from around 1980 to the present day) at 128/160/192 using Blade, Fraun, and LAME.
Blade will be unlistenable at 128, shit at 160, and you may hear artifacts at 192 if you know what to listen for. LAME and Fraun sound sweet, even at 128.
Similar results can be achieved with a heavy guitar track, e.g. Def Leppard or other 80's "hair metal" bands.
I don't have data on string quartets - but for non-classical music, Blade blows steaming piles of donkey dick.
Re:An alternate analysis (Score:2)
_Definitely_ an interesting site. Also, referring to the listening tests: "The Fraunhofer encoder produced a surprisingly harsh sounding attack on the guitar; it remained quick and sharp, but was artificially crisp and accentuated." That's precisely what I was trying to say, couldn't have put it better myself. It turns out Ars _likes_ that. I do not. But if you do- clearly, you're going to like Fraunhofer. It's not about picking a winner, it's like picking a musical instrument...
cuecat (Score:3)
Well, after calibrating my cat on a couple of Pop-Tarts boxes, I tried several scans on the diagrams on the web page... nothing! I can therefore conclusively answer this question with a big, fat NO.
Visual analysis of MP3 is nonsense (Score:3)
Conventional Wisdom (Score:2)
I don't know about his rigor, but the guy's alright by me.
Who knows where the time goes?
So what? (Score:4)
Also, while using the 32 kbps bitrate amplifies the effects of perceptual quantization, making them easy to see, the problem is that not all the encoders were meant to work at this bitrate.
Think about it: when standards institutes want to evaluate audio/speech codecs, they don't calculate sonograms like this, they run subjective tests. They make a bunch of listeners hear the result of many encoders on *many* audio files. That's right, you need many files to evaluate a codec. Some will perform better for certain musical instruments, some will perform better with or without background noise, echo, etc.
For all these reasons, I do NOT consider this analysis rigorous at all!
Re:So what? (Score:3)
Yes, the guy's sonogram is more *precise* but it is still irrelevant. I could write an encoder that gives a much better result when evaluated with this "precise" sonogram, but yet will sound like crap.
This is the point of perceptual encoding. The goal is not to produce the best result in terms of signal-to-noise ratio or spectral distortion, but to cause the encoder "errors" where the "non-precise" ear won't hear them. And if you don't hear it, you don't care, even if your oscilloscope or spectral analyser tells you there's an error.
The most critical part of a perceptual encoder is the "psycho-acoustic model", which tries to model as best as it can the sensitivity of the ear at a given frequency, given the rest of the spectrum. This is not an easy task, and you have to make lots of approximations. Given two encoders that produce the same quantitative result (SNR,
MP3 for Audiophiles?? (Score:2)
Second, I want to challenge some of the assumptions and declarations that this experimenter made. The experiments placed on these encoders are mostly "torture tests" that one would never encounter in real situations... And by using this series of torture tests he tells people which encoders are best for encoding mp3's. Does anyone see this reasoning as flawed? He's subjecting encoders to situations that NONE of them have been designed for, and proclaiming that this has something to do with reality. I see little correlation... How often do you hear pure sine sweeps in any song?
I found the previous mp3 performance analysis posted on Slashdot to be much more informative. It tested the encoders on real-world performance, and rated them accordingly.
The guys who wrote the encoders realized that some things just wouldn't happen in normal music, such as these torture tests, so they wrote "shortcuts" that ignored these conditions, and resulted in a higher compression rate! How dare he rate encoders on something that the programmers all deliberately IGNORED.
My friends, trust no statistics that you did not falsify.
Re:In the final analysis (Score:2)
Re:Quaint, but flawed (Score:2)
I'm sorry, from what you're saying, I just don't think you really understand what perceptual encoders are. First, if you have a 10:1 compression ratio, your sonogram cannot be all black (that would be lossless). Now, writing an encoder for which everything is grey (instead of black and white as the sonograms you found) is very easy to do, but it will sound like sh*t.
Very simple experiment: take a signal and add white noise so you get a 20 dB SNR. It'll sound _very_ noisy. Now, while preserving the noise energy, shape that noise to look like the signal (of course, still 20 dB lower). The audio you'll hear will sound quite OK (though not perfect) and much better than with the white noise. You have just used a (quite simple) psychoacoustic model.
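Here is a crude numpy sketch of that experiment. A real encoder shapes noise frame by frame, but a single whole-signal envelope is enough to hear the effect:

import numpy as np

def add_noise(signal, snr_db=20.0, shaped=False):
    rng = np.random.default_rng(0)
    noise = rng.standard_normal(len(signal))
    if shaped:
        # Impose the signal's own spectral envelope on the noise, so the
        # noise hides "under" the signal at every frequency.
        spectrum = np.fft.rfft(noise) * np.abs(np.fft.rfft(signal))
        noise = np.fft.irfft(spectrum, n=len(signal))
    # Scale to the requested SNR either way: total noise energy is the
    # same in both cases, only its spectral shape differs.
    noise *= np.sqrt(signal.var() / 10 ** (snr_db / 10)) / noise.std()
    return signal + noise

# add_noise(x) sounds like loud hiss; add_noise(x, shaped=True) carries
# exactly the same noise energy but is far less obnoxious.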
Re:Um, no, you can't. (Score:2)
Quite easy... strip out the psycho-acoustic model from a good MP3 encoder (like LAME) and you get a crappy MP3 encoder that performs very well in your sonogram test.
Re:Caveat Lector (Score:2)
Re:What about Xing (AudioCatalyst)? (Score:3)
Well, actually, there is a reason: [belgacom.net] the Xing encoder blows chunks. Sure, it's fast, but the sound quality sucks. If all you're encoding is Teeny Bopper of the Week music, then you're not missing out on anything. If you're encoding stuff that's a lot more complex, you're better off with something that doesn't sacrifice quality for speed.
hymie
Re:What about... (Score:5)
He's measuring the MP3 encoders, and Ogg Vorbis is not an MP3 encoder, but an Ogg Vorbis (duh!) encoder, it doesn't use exactly the same encoding scheme, though it is still a perceptual encoder (based on time-frequency masking).
A similar, if not better comparison (Score:3)
Often wrong but never in doubt.
I am Jack9.
Re:Quaint, but flawed (Score:2)
For this reason, writing an encoder for which everything is grey (given the techniques I've been using) is far from very easy to do- and the sound of the file that would produce this result would have to be 50% perfect uncolored reproduction of the wave, and 50% pink noise. That's basically as tough to do as 100% perfect uncolored reproduction of the wave, and it'll sound bad because of the loud pink noise, but I don't think it would sound like you are imagining it to sound.
The point about noise and psychoacoustic model is well taken- I'm not claiming that my testing is illustrating psychoacoustic model suitability. If you think about it you can see that's not testable- it's going to be different for every song, and every listener. Some people can't hear over 12K- nuke it! Some people are acutely sensitive to peaks at around 3K- for instance, someone with tinnitus who's subject to the phenomenon of _recruitment_ will find a resonance there to be painfully unpleasant.
I can't possibly test for that and am not trying. I can, however, work out where the errors are, where artifacts are being produced in the frequency band, and what types of resonance are present, and that information can be used by any person who knows what their psychoacoustic model will accept. For instance: if you like Xing, you'll probably like Fraunhofer at high bit rate still better. If you run screaming from Xing and hate all mp3 encoders, you might need to go with Blade assuming you listen to smooth music like classical. If you can't stand Blade at all, Fraunhofer might be right up your alley. These are quantifiable observations based on driving all these encoders completely beyond their ability to cope, and watching where they break down.