To begin at the beginning, MPEG stands for Moving Picture Experts Group. That body establishes standards for digital video and audio. We are concerned here with the standards for the audio layer in the MPEG1 format. MPEG2 is in use today, but is not related to MP3 files; they are MPEG1 Layer III audio. Simplifying the situation, different layers impose different loads on the decoding software - the program which converts the MPx file to uncompressed Pulse Code Modulation (PCM) audio to drive the reproducer. Layer III - MP3 - is within the reach of modern low-cost dedicated packages and of Pentium-class CPUs. Layer II - MP2 - is less demanding, but its performance is not adequate for it to be considered quality reproduction. In theory and usually in practice, a system and software reproducing a given Layer will handle any lower Layer.
Simplifying again, the MPEG standard for a Layer specifies the playback of a file encoded for that Layer and leaves encoding to the developer. MPEG audio employs perceptual encoding and is lossy. That is, it compresses the data stream by throwing away information which the encoding algorithm 'believes' will affect the listener least. The decode side of the codec (code/decode algorithm) is usually pretty simply implemented from the standard; there are some differences which will be discussed below, but in general the playback algorithm is not an issue.
The encoder is a different matter altogether. There are three parameters input to any encoder to control the way it processes the file: channels, sample rate and bitrate. Channels simply means monaural or stereo; in general, the encoder will provide 'stereo' channels (identical content) from a monaural original, combine two original channels into one, or leave the count unchanged. Sample rate is simply the number of samples of the data per second. Bitrate dictates the size of the encoded file. Those factors are interrelated.
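As a rough illustration of the channel options just described, the sketch below duplicates a monaural signal into two identical channels and combines two channels into one. The function names are hypothetical, and averaging the two channels is only one plausible way to combine them; nothing here is taken from a particular encoder.

```python
def mono_to_stereo(samples):
    """Return (left, right) pairs with identical content in both channels."""
    return [(s, s) for s in samples]


def stereo_to_mono(frames):
    """Combine two channels into one.

    Assumption: the channels are averaged, using floor division so the
    result stays an integer like the 16-bit PCM samples it stands for.
    """
    return [(left + right) // 2 for left, right in frames]


mono = [100, -200, 300]
stereo = mono_to_stereo(mono)
print(stereo)
print(stereo_to_mono(stereo))
```

Passing a duplicated monaural signal back through the downmix recovers the original samples, which is why the 'stereo' file made from a monaural source carries no extra information.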
Rates in common use for digital processing include 44100, 48000 and 96000 samples per second. Even though it violates convention, those are usually shortened to 44.1, 48 and 96 Ksps - which will be done here. (The standard for CD encoding is published in a book with a red cover; uncompressed, 44.1 Ksps, 16-bit, stereo signals are conveniently referred to as 'redbook' in its honor.) The more serious problem, representing Ksps as KHz, will not be accepted here. As is well known, the maximum frequency which can be encoded digitally is one half the sample rate, providing an incentive to take more samples to preserve high frequencies. In the octave below that limiting frequency, phase shift can substantially alter waveforms even though amplitude loss may be acceptable. For various reasons, it may be desirable to have the sample rate in an MP3 file different from that of the original. Many MP3 encoders will resample for the user. Some are limited to commensurate rates such as downsampling 96 Ksps to 48. Others will handle incommensurate rates and can accomplish the more difficult task of resampling 48 Ksps to 44.1. The quality of resampling may be significant here. Simply put, samples are created or destroyed in resampling and that may be done with slow, careful algorithms or quick, simple ones. Needless to say, the effects are audible if one is willing to listen.
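The arithmetic behind those two resampling cases can be sketched in a few lines. The function names are hypothetical, and 'commensurate' is taken here to mean that one rate is a whole-number multiple of the other, which is an assumption about the usage above. The sketch computes only the ratios; it does not resample anything.

```python
from fractions import Fraction


def nyquist_limit(sample_rate):
    """Highest frequency (Hz) that can be encoded at the given sample rate."""
    return sample_rate / 2


def resample_ratio(source_rate, target_rate):
    """Exact ratio of output samples to input samples, in lowest terms."""
    return Fraction(target_rate, source_rate)


def is_integer_related(source_rate, target_rate):
    """True when one rate is a whole-number multiple of the other."""
    ratio = resample_ratio(source_rate, target_rate)
    return ratio.numerator == 1 or ratio.denominator == 1


for src, dst in [(96000, 48000), (48000, 44100)]:
    ratio = resample_ratio(src, dst)
    kind = "integer-related" if is_integer_related(src, dst) else "not integer-related"
    print(f"{src} -> {dst}: {ratio.numerator}/{ratio.denominator} ({kind})")
```

Going from 96 Ksps to 48 keeps one sample of every two. Going from 48 Ksps to 44.1 must produce 147 samples for every 160 received, so nearly every output sample has to be interpolated, and that is where the quality of the algorithm shows.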
Bitrate dictates the size of the finished MP3 file per minute of audio. CD-quality audio is defined in the redbook standard and requires about 175 Kilobytes per second (175 KBps). MP3 bitrates are specified in Kilobits per second (Kbps - note the lower-case 'b'). The rate most often used for CD audio is 128 Kbps - which corresponds to 16 KBps or about one eleventh of the redbook rate. Other rates are frequently used, usually 64 or 256 Kbps (the higher the bitrate, the lower the compression), but 128 Kbps is usually assumed. As you would expect, throwing away 80-95% of the original information will impact the sound of the MP3. There are many encoding algorithms on the market. Some, such as BLADE, are optimized for lower compression than usual and will be neglected. Of the others, both subjective and objective tests indicate that the patented algorithm developed by the Fraunhofer Institute is the most pleasing. That is unfortunate, since it is by far the most costly to license - $300 versus $15 or less.
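The figures in that paragraph can be checked with a little arithmetic. The sketch below assumes redbook audio (44.1 Ksps, 16-bit, stereo) and takes 'K' to mean 1000; the function names are hypothetical.

```python
def pcm_bytes_per_second(sample_rate=44100, bits_per_sample=16, channels=2):
    """Data rate of uncompressed PCM audio in bytes per second."""
    return sample_rate * bits_per_sample * channels // 8


def mp3_bytes_per_second(kbps):
    """Data rate of an MP3 stream in bytes per second (assumes K = 1000)."""
    return kbps * 1000 // 8


def fraction_discarded(kbps):
    """Share of the redbook data rate that does not survive encoding."""
    return 1 - mp3_bytes_per_second(kbps) / pcm_bytes_per_second()


print(f"redbook: {pcm_bytes_per_second()} bytes per second")
for kbps in (64, 128, 256):
    ratio = pcm_bytes_per_second() / mp3_bytes_per_second(kbps)
    print(f"{kbps} Kbps: {mp3_bytes_per_second(kbps)} bytes per second, "
          f"redbook/MP3 ratio {ratio:.3f}, "
          f"discarded {fraction_discarded(kbps):.1%}")
```

Redbook works out to 176400 bytes per second, so a 128 Kbps file at 16000 bytes per second is smaller by a factor of just over eleven, and roughly 91% of the original data rate is gone.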
Quality is a lot easier to hear than to describe. Absent an objective measure, there are two parameters of primary interest: preservation of audio spectrum and avoiding artifacts. In practice, they go together and codecs which maintain spectrum best tend to introduce the fewest audio disturbances. The most common form of artifact is a metallic tone which includes some narrow resonances; it's one of those things you'll know when you hear it but it defies description. It appears to originate when signals near the maximum frequency are encoded differently from those near them in pitch. The Fraunhofer codec gives very nearly flat response up to a frequency at which it cuts off abruptly. Another codec may nominally extend the response another half octave, but in the process substantially distort the response curve. Figure 1 shows the spectrum of a 22.05 Ksps monaural original file. Figure 2 shows the same file after encoding with the Fraunhofer high-quality option and decoding back to PCM. Figure 3 is the same with another algorithm. Note that the spectra are quite similar up to about 1.5 octaves before the cutoff at 11.025 KHz. The Fraunhofer remains nearly flat until it falls off a cliff, making no attempt to encode the last half-octave. The other algorithm provides some output in that half-octave, but is 12 dB down at 8 KHz where the Fraunhofer is within about 1 dB up to its cutoff. In short, the Fraunhofer provides good encoding within its frequency range where the alternative generates artifacts amid some signal on significant overtones. Needless to say, the perception of the sound of the samples is very different even to an untrained listener.
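For readers who prefer frequencies to octaves, the intervals mentioned above can be converted with a short calculation. This is only a sketch with hypothetical function names; it relies on nothing more than an octave being a factor of two in frequency and the limit being half the 22.05 Ksps sample rate.

```python
import math


def octaves_below(frequency_hz, octaves):
    """Frequency lying the given number of octaves below frequency_hz."""
    return frequency_hz / 2 ** octaves


def octaves_between(low_hz, high_hz):
    """Size of the interval from low_hz up to high_hz, in octaves."""
    return math.log2(high_hz / low_hz)


limit_hz = 22050 / 2  # half the sample rate of the 22.05 Ksps original

print(f"1.5 octaves below the limit: {octaves_below(limit_hz, 1.5):.0f} Hz")
print(f"0.5 octave below the limit:  {octaves_below(limit_hz, 0.5):.0f} Hz")
print(f"8 KHz to the limit: {octaves_between(8000, limit_hz):.2f} octaves")
```

The spectra therefore agree up to roughly 3.9 KHz, and the last half-octave begins a little below 8 KHz, which is why the 8 KHz comparison point sits right at the edge of the region in dispute.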
If artifacts are audible in the encoded file, the only solution with a given algorithm and bitrate is to limit the frequency response of the original. Needless to say, that limiting must be more severe, perhaps by a full octave, with an inferior encoder. The alternative is to increase the bitrate or, in the case of variable bitrate, to increase the maximum bitrate, but either of those would increase the file size.
There are listeners who find even an inferior codec's 128 Kbps file identical to a redbook source. With any audio acuity at all, a listener should be able to recognize loss in the Fraunhofer at that rate in A/B comparison with the source; she will be significantly disturbed by the artifacts of a poor encoder, though she may be unable to describe the faults in the sound.
One point in all of this is that evaluating an MP3 player as a high-fidelity device is not as simple as evaluating CD players. It is necessary to know not only the sound of the source - often, an audio CD - but also the encoding algorithm that was used. Many of us are familiar with the influence of engineering when transferring a master analogue tape to CD; the same original can sound quite different depending on the equipment used and the engineer's judgement. Fair comparison of two reproducers requires that they be loaded with the same MP3 file, not simply that each be compared against the original using whatever file it happens to have. There are MP3 playback products which 'compensate' for the faults of the encoder they are intended to follow. That is a kind of inversion of the deservedly deprecated Dynagroove process in which the original was distorted to correct for faulty reproduction. As with any high-fidelity product - video, audio or otherwise - the ideal is to maximize quality of each element in the chain, not to correct in one for expected faults in another.
E-mail me at cdrecording@mrichter.com