17 September 2008

Waveform Plots Considered Harmful

(Revision 2.)

One of the most commonly used audio measurement tools today is the waveform plot. This is the graph of the audio signal versus time. It directly relates the amplitude of the signal to the various parts of the musical piece, and seems very easily interpreted. It is also extremely sensitive to changes in mastering, and so is the tool of choice for illustrating "the loudness war". From sound editor screenshots to animations to elaborate, egg-in-a-frying-pan-style "this is your music on drugs!" YouTube videos, it has become something of a mascot for hypercompression.

Unfortunately, it can also be highly misleading. In general, waveform plots cannot be solely trusted as an evaluator of sound quality. Use caution when using them, because they may lie to you.

An Example: Vinyl waveforms can look better than they really sound
The most common point made with a waveform plot is to prove or disprove the existence of clipping, or more generally, hard limiting. Clipping refers to a music signal that is clamped to a fixed magnitude if that magnitude is exceeded - the peaks of the music are "clipped" off. Hard limiting is roughly the same concept - clipping is a form of hard limiting - but allows for more flexibility in how the fixed magnitude is approached. Typically, a hard limiter will allow for a softer transition between the "undisturbed" region of the music and the "limited" region.
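To make the distinction concrete, here is a minimal Python sketch of the two transfer curves. It is a static, sample-by-sample illustration only: real limiters apply time-varying gain with attack and release behaviour, and the function names, the `knee` parameter, and the tanh-shaped curve are assumptions chosen for illustration, not any particular product's algorithm.

```python
import math

def hard_clip(x, ceiling=1.0):
    """Clamp a sample to +/-ceiling: the peaks are simply cut off."""
    return max(-ceiling, min(ceiling, x))

def soft_limit(x, ceiling=1.0, knee=0.7):
    """Pass samples below the knee untouched, then bend smoothly toward the ceiling.

    Hypothetical static curve for illustration (assumes 0 <= knee < ceiling).
    """
    magnitude = abs(x)
    if magnitude <= knee:
        return x  # the "undisturbed" region
    headroom = ceiling - knee
    # tanh rises with slope 1 at the knee and never exceeds the ceiling
    limited = knee + headroom * math.tanh((magnitude - knee) / headroom)
    return math.copysign(limited, x)

# Compare the two curves at a few input levels (full scale = 1.0)
for level in (0.5, 0.9, 1.5):
    print(level, hard_clip(level), round(soft_limit(level), 3))
```

Both functions hold the output at or below the ceiling, so on a zoomed-out plot the results look nearly identical; they differ only in how the ceiling is approached.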
The outer shape of a waveform plot shows the peak level of the music over time. If that peak is at exactly the same level across an entire song, down to a fraction of a dB, it is believed to be good evidence that a limiter was applied to the music; the flatter the peaks are, the more limited the music is, and thus the more distorted the music is. Conversely, the more ragged the peaks are, the less distorted the music is.

Rather than give an elaborate explanation of why this is not true, it is far easier to just find a counterexample. Here's a waveform plot of a song from the album Mirrored, by Battles, in both CD and LP versions, loudness-equalized.

(Battles' "Leyendecker". White: CD version amplified -7.44 dB. Red: Vinyl version. Recorded with Technics SL-1200, AT-OC9MLII, flat transferred with Yamaha GO46 at 44.1 kHz. 2M samples of left channel at +4M samples from start of track. Time axis in seconds.)

"Excellent! The vinyl really is better!" you think. Visually, the CD is quite obviously "hypercompressed", and the vinyl is clearly not - in fact, the vinyl peaks around 6.6 dB higher than the CD does, once the two tracks are loudness-equalized. Based on this sort of plot, 99% of the music listeners who care about dynamic range in their music would believe the mastering on the vinyl release is superior.
But 99% of music listeners are dead wrong.
(Same plot; zoomed in to 27.43-27.44s)

Vinyl is full of linear distortions and noise sources that do not exist on CD, and so the waveforms rarely if ever match at short time scales, even if the mastering is the same. Such is the case here. But it should be quite obvious that the clipping that exists on the CD also exists on the LP. The "flat" regions on the CD very closely match the "flat" regions on the LP. [1]

This is fairly incontrovertible proof that the vinyl release of Mirrored originates from masters utilizing a similar amount of dynamic range compression as exists on the CD. And yet, the first waveform plot appeared to clearly indicate a higher dynamic range. In the case of vinyl specifically, this conclusion can be wrong for a number of reasons:

  • The recorded vinyl waveform is not sample-aligned with the CD waveform. Intersample peaks that are not visible on the CD waveform suddenly become visible on the vinyl waveform. In exceptional cases this can artificially raise the peak level by a few dB.
  • Variations in the frequency response of the vinyl system (both in recording and playback) can artificially raise peak levels. One particularly strong culprit here is low-frequency rumble at the tonearm-cartridge resonance (around 10 Hz) which can easily dwarf all the other musical content on the recording. (In this comparison, the rumble was filtered out.)
  • Tracing and tracking error in the tonearm system can artificially raise peak levels.
A similar situation exists with FM radio, which is also compressed very aggressively, although waveform plots of recordings do not show an excessive amount of limiting. But applying the preemphasis, resulting in a signal analogous to what is actually transmitted, results in a much more obviously limited waveform.
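The sample-alignment point in the list above can be illustrated with a small sketch. It samples the same full-scale sine at two different time offsets and compares the highest sample value in each case. The tone at a quarter of the sampling rate is a deliberately unfavourable case picked for illustration, and `sample_peak` is a hypothetical helper, not part of any tool used for the plots above.

```python
import math

def sample_peak(freq_hz, fs_hz, offset_samples, n=1024):
    """Largest absolute sample of a full-scale sine sampled with a time offset.

    offset_samples is the sampling-grid shift, in fractions of a sample.
    """
    return max(
        abs(math.sin(2 * math.pi * freq_hz * (k + offset_samples) / fs_hz))
        for k in range(n)
    )

fs = 44100.0
tone = fs / 4  # 11025 Hz: only four samples per cycle

# Grid falls at 45, 135, 225, 315 degrees: every sample misses the crest
peak_a = sample_peak(tone, fs, 0.5)
# Grid shifted by half a sample: samples land exactly on the crests
peak_b = sample_peak(tone, fs, 0.0)

difference_db = 20 * math.log10(peak_b / peak_a)
print(round(peak_a, 4), round(peak_b, 4), round(difference_db, 2))
```

The underlying signal is identical in both cases; only the sampling grid moved, yet the plotted peak differs by about 3 dB. A re-recording from vinyl has no reason to land on the same grid as the CD.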

Another Example, with a CD
Metallica's Death Magnetic is being widely criticized as one of the most sonically unpleasant popular records in recent memory. And yet, its waveform plot, while ugly, is not perfectly flat:
"My Apocalypse"; CD; source.

Many albums that visually appear worse than this sound much better.

Why can waveform plots lie? Blame your ears

The human ear may be relatively insensitive to limiting, if it only happens a few times a second - for instance, if only the percussion is affected. Bob Katz has commented in Mastering Audio that with regard to peak limiting, "a rule of thumb is that short duration (a few milliseconds) transients of unprocessed digital sources can be reduced by 4 to 6 dB with little effect on the sound." Those people who would worry about 10-20 sample runs of clipping on CDs should note that 1ms = 44 CD samples; examples of hard limiting become exponentially harder to find as the number of consecutively limited samples increases. Mastering engineers use peak limiting all the time because, in all honesty, the human ear often lets them get away with it! In other words, clipping that may appear blindingly obvious on the waveform plot may not be audible.
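For readers who want to count consecutive limited samples themselves, here is a rough sketch. It assumes samples normalized to the range -1.0 to 1.0; the `threshold` and `min_run` values and both function names are illustrative assumptions, and a run that flips polarity without dropping below the threshold is counted as one run.

```python
def clipped_runs(samples, threshold=0.999, min_run=2):
    """Return (start_index, length) for each run of samples at or above threshold."""
    runs = []
    start = None
    for i, x in enumerate(samples):
        if abs(x) >= threshold:
            if start is None:
                start = i  # a new run begins here
        elif start is not None:
            if i - start >= min_run:
                runs.append((start, i - start))
            start = None
    # Close a run that reaches the end of the data
    if start is not None and len(samples) - start >= min_run:
        runs.append((start, len(samples) - start))
    return runs

def run_duration_ms(length, fs_hz=44100):
    """Convert a run length in samples to milliseconds."""
    return 1000.0 * length / fs_hz

example = [0.1, 1.0, 1.0, 1.0, 0.5, -1.0, -1.0, 0.2, 1.0]
runs = clipped_runs(example)
print(runs)
print(round(run_duration_ms(20), 3))  # a 20-sample run, in ms, at the CD rate
```

At the CD sampling rate a 20-sample run lasts under half a millisecond, which is well inside the "few milliseconds" range of the rule of thumb quoted above.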

On the flip side, when limiting occurs with periodic signals - like those emitted by most non-percussion instruments - it becomes a form of high-order distortion, and may become very, very audible. In the worst case, limiting of very few samples at a time (1-5) may be audible.
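A small sketch shows where the high-order distortion comes from. It clips one period of a sine wave and measures a few harmonics with a direct Fourier sum. The clip level of 0.5 and the helper name `harmonic_amplitude` are assumptions for illustration; real music is of course far more complex than a single sine.

```python
import cmath
import math

def harmonic_amplitude(samples, harmonic):
    """Amplitude of one harmonic, assuming samples holds exactly one period."""
    n = len(samples)
    total = sum(
        x * cmath.exp(-2j * math.pi * harmonic * k / n)
        for k, x in enumerate(samples)
    )
    return 2.0 * abs(total) / n

N = 1024
sine = [math.sin(2 * math.pi * k / N) for k in range(N)]
clipped = [max(-0.5, min(0.5, x)) for x in sine]  # hard clip at half scale

# Print harmonic number, clean amplitude, clipped amplitude
for h in (1, 2, 3, 5, 7):
    print(h, round(harmonic_amplitude(sine, h), 4),
          round(harmonic_amplitude(clipped, h), 4))
```

The clean sine has energy only at the fundamental. The clipped version gains odd harmonics (symmetric clipping produces no even ones), and because the signal is periodic the same distortion products repeat on every cycle instead of appearing only for an instant.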

Furthermore, different forms of hard limiting have different effects on the sound quality. Clipping is merely one of the most dissonant forms of hard limiting. A large market exists for high-quality limiters that remove as much of the peak as possible, while maintaining as much of the sound quality as possible. And yet all of these limiters will appear more or less the same on a zoomed-out waveform plot. How can sound quality be evaluated from such a plot when one can't use it to tell apart different forms of limiting?

When limiting is tastefully done, by a professional mastering engineer, it can negligibly distort the music while making it much more enjoyable for those of us with less expensive or desirable listening environments. In this situation, the effect of more general forms of dynamic range compression may be more objectionable, in that the reduction of dynamic range is worse than the addition of light amounts of limiting distortion.

How can I tell if one master is superior to another if waveform plots are bogus?

Unfortunately, while many alternatives to the waveform plot exist for measuring mastering quality, they all have major drawbacks. None of them are trustworthy.

  • Many people use ReplayGain track gain values as an estimator for the dynamic range of the music. Its most important flaw is that it is easily fooled by global changes in gain - so that the "fake" dynamic range of the Battles recording above would completely fake out ReplayGain. Certainly, ReplayGain values place limits on how much dynamic range may exist in the music, and they can be used to guess a lot about the mastering, but ReplayGain does not, in fact, actually measure dynamic range, nor does it measure distortion. Many albums considered great-sounding from the likes of Radiohead and Gorillaz tend to ReplayGain in the -10dB range, which many people would consider "hypercompressed".
  • Audio engineers use an "RMS" value that is conceptually about the same as ReplayGain, and shares its faults. However, it is an acceptable figure of merit proving that Iggy Pop is a crappy producer.
  • pfpf was more or less explicitly designed for this sort of thing, but only in the context of changes in dynamic range - not in terms of clipping, or any kind of timbral changes. This approach should be fairly good at teasing out changes in dynamic range compression, but quite insensitive to limiting. Also, pfpf is still sort of half-baked, and it's not clear yet how close the long/medium/short term numbers need to be before concluding that two samples are of the same master.
  • The Sparklemeter was designed by somebody else on somewhat similar lines as pfpf, but specifically relating to measuring mastering quality. See that thread for my comments.
  • Some people are content with using the "Peak Level" result from CD audio rippers such as EAC as an indicator of audio quality: the idea being that CDs with a peak level lower than 99% run a much lower risk of hard limiting than those at that peak level or above. This is wrong in so many ways it's hard to count. It is, quite simply, the least accurate method possible of evaluating mastering quality.
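Two of the failure modes in the list above are easy to reproduce with synthetic signals. The sketch below uses plain full-signal peak and RMS figures on a sine wave; it does not implement ReplayGain, pfpf, or any ripper's reporting, and all names in it are illustrative assumptions.

```python
import math

def peak_dbfs(samples):
    """Highest absolute sample, in dB relative to full scale (1.0)."""
    return 20.0 * math.log10(max(abs(x) for x in samples))

def rms_dbfs(samples):
    """Root-mean-square level, in dB relative to full scale."""
    mean_square = sum(x * x for x in samples) / len(samples)
    return 10.0 * math.log10(mean_square)

def crest_factor_db(samples):
    """Peak-to-RMS ratio in dB."""
    return peak_dbfs(samples) - rms_dbfs(samples)

N = 1000
sine = [math.sin(2 * math.pi * k / N) for k in range(N)]
# The same clean sine with the gain halved (about 6 dB quieter)
sine_quiet = [0.5 * x for x in sine]
# Clipped hard at half scale, then rescaled so the peak sits at 90%
clipped_quiet = [(0.9 / 0.5) * max(-0.5, min(0.5, x)) for x in sine]

for name, signal in (("sine", sine), ("sine_quiet", sine_quiet),
                     ("clipped_quiet", clipped_quiet)):
    print(name, round(peak_dbfs(signal), 2), round(rms_dbfs(signal), 2),
          round(crest_factor_db(signal), 2))
```

The quieter sine reads about 6 dB lower in RMS although nothing about its dynamics changed, which is the global-gain problem. The clipped signal peaks below 99% of full scale and would pass the "Peak Level" check despite being heavily clipped. The peak-to-RMS ratio does separate the two in this toy case, but it is still a single crude number and says nothing about how audible the limiting is.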

Perhaps the most accurate method of evaluating mastering quality, though, is the simplest: Asking the mastering engineer. They, of course, would know best.

Another method is to simply use your ears. Know what clipping sounds like, and what other forms of hard limiting sound like, etc., but also evaluate your own music collection's sound quality, and investigate the mastering techniques used with its songs.

What is to be done?

  • Avoid the use of zoomed-out waveform plots to prove points about sound quality. They convey less information than you might think, and they are easy to misinterpret.
  • Do not trust the sound quality of a record simply because it looks good on a waveform plot. The ear is not an oscilloscope. Waveform plots are an informational tool, but they only relate to the perception of hearing in an abstract sense. There are plenty of ways that a good-looking waveform plot can sound terrible!
  • Avoid using any numeric measurement for evaluating audio quality, unless you understand exactly what it means for audio perception. Most numeric tools today are fairly flawed. They can be used to make meaningful statements about mastering quality in specific circumstances, but they can also make lots of meaningless or flawed statements.
  • Do not buy vinyl on mastering merits alone, unless you have information coming from the vinyl mastering engineer attesting to its superiority over other releases. As a rule of thumb, the cost of a vinyl remaster is high enough that those labels that choose to remaster will make it quite clear to potential customers that they did so, and the labels that didn't, won't. But of course, just because a special mastering job was done for a vinyl release doesn't mean that the mastering changes were significant. Caveat emptor.
  • Enjoy music purely on its subjective merits, but pay attention to your perceptions and look for ways to quantify them. There's too much good music out there today to ignore because the mastering is crap. And despite the shrill cries of the hi-fi set, the sound quality of music today is still considerably better than what it was for (most of) the last 60 years. And if all the kids today have no problem listening to the music, who's to say that us old farts can't listen to it too?

    Nevertheless, reduced dynamic range and increased distortion in modern mastering are real issues. Solving them requires subjective and objective evaluations of sound quality, rigorous in their execution, to convince the audio world that this is not mere idle talk of ignorable audiophiles.
For further reading: the HydrogenAudio forums, including some recent topics on the subject, and a wiki entry on vinyl mastering (semi-maintained by me). Related thread.

[1] The regions of clipping on the LP version of "Leyendecker" seem to imply that it is slightly less limited than the CD version. However, the difference is so slight that it is not believed to be audibly superior - and besides, the example still breaks the myth that vinyl must necessarily not be hypercompressed.

17 September 2008: Incorporated feedback. Previous revision. Thanks to David "2BDecided" Robinson, Lyx, krabapple, and others on HA, for their feedback.


bobkatz said...

If the waveform plot of the vinyl actually represents a transfer from a less compressed master which did not go through the heavy processing that created the CD, then it could possibly sound better.

Whenever I make dual masters for CD and for vinyl, I produce the vinyl master prior to any of the peak limiting or any additional loudness makers other than the ones there for esthetic purposes.

The measurement that should work is Peak to Loudness ratio, with the loudness using a standard measure like the new BS.1770, which is close enough to perceived loudness to be useful. Adapt this to a K-System-style meter and you're in like flynt.

Richard Tollerton said...

Thanks as usual for your excellent comments, Bob. It is extremely informative to know that you treat your vinyl masters nicely.

Peak to loudness measurement, eh? I didn't catch that the first time I read it. You mean a time-varying adaptation like 50ms blocking with BS.1770? That actually makes a lot of sense.

Not sure I want to be in like Larry Flynt though. ;)

tung said...

I've been posting a lot of links to this very blog post lately. I think you hit the nail on the head with this one!

Those of us who are fighting the loudness war need as much info as possible, and I think many are misinformed about the vinyl versions of recent rock albums.