17 December 2008

Geek-out time: a CD4 record

CD4 records are sufficiently uncommon (or sufficiently expensive) that I was quite pleased when I found this one for $4. (In fact, this specific pressing is not currently in eBay's auction records, or GEMM's. It's mentioned once on popsike.) Ironically, I don't have a CD4 decoder (yet), I don't own a surround sound speaker system, and I'm not a fan of the Raspberries. I haven't even listened to the music yet. I just wanted to see a used LP with a frequency response that goes out to 45 kHz.

One occasionally still sees the argument that "anything on vinyl above 30 kHz gets wiped out after a few plays". While that may well be true for a large number of turntables in use today, it certainly hasn't happened here - this record very clearly has been played before.

13 November 2008

Think classical music mastering does not employ clipping? Think again

This is a waveform plot from the opening two seconds of the third movement of Béla Bartók's second piano concerto, conducted by Pierre Boulez, recorded in this decade, and released by Deutsche Grammophon in 2005. There is an opening drum hit, shown here, that is fairly unequivocally clipped near 0 dBFS for 121 samples (2.7 ms). This is a long stretch of clipping by most standards.
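The "121 samples (2.7 ms)" figure is the kind of number a simple run counter produces. The following is a minimal sketch, not the tool used for the plot: the function names, the 16-bit full-scale threshold, and the synthetic test signal are all assumptions made for illustration.

```python
def longest_clipped_run(samples, threshold):
    """Length of the longest run of consecutive samples whose absolute
    value is at or above `threshold`."""
    longest = current = 0
    for s in samples:
        if abs(s) >= threshold:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


def run_duration_ms(run_length, sample_rate=44100):
    """Convert a run length in samples to milliseconds (CD rate by default)."""
    return 1000.0 * run_length / sample_rate


# Synthetic 16-bit example (hypothetical data): a quiet lead-in,
# 121 samples pinned at full scale, then a short decay.
signal = [1000] * 50 + [32767] * 121 + [20000, 9000, 3000]
run = longest_clipped_run(signal, threshold=32767)
print(run, round(run_duration_ms(run), 2))
```

A real recording would need a threshold slightly below full scale, since clipped material is often normalized to just under 0 dBFS.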

In many music listeners' minds this would be considered a gross mastering error. But the sound quality is nonetheless spectacular, and the long-term loudness plot shows no signs of hypercompression (or indeed, perhaps any compression): the peaks of the music are merely being sawed off. However, I have no independent proof showing that the recording has not been compressed in other ways.

A very similar style of mastering can also be found on two other recent Boulez discs: one of Varèse, and another with his own ...explosante-fixe... and Notations (the latter played by Aimard). These recordings consistently brickwall-clip at or near 0 dBFS, for between 4 and 100 samples, but otherwise are mastered clean as a whistle. But another recent DG CD of mine (Aimard's performance of Bach's Art of the Fugue) has no clipping whatsoever.

It's important to note that, as I've mentioned in the past, drum transients are perhaps the kind of signal most forgiving of clipping/limiting. As they are already very spread out in the frequency domain, the added smearing from this kind of clipping could be rather effectively masked. Listening to this music, there is no "smoking gun" kind of artifact that lets me say with certainty that this distortion is audible.

Nevertheless, that DG would choose to make this sort of compromise in their modern releases is troubling. I used to consider Boulez's recording of Amériques one of my reference tracks for critical listening; I'm considering searching for alternatives now.

Original thread on HydrogenAudio.

Update 19 May 2009: Steve Hoffman notes that the recent DG CD of Boulez's Mahler 8 is also clipped.

02 October 2008

Some clarifications on "Lars's Paradox"

From what I have heard of the CD master of Death Magnetic versus the GH3 mix, the GH3 mix is quite superior and the CD master is inferior even to other contemporary records. In that I'm pretty much in agreement with most other commentary.

That said, I certainly speak for nobody besides myself when I say that. My musical tastes are, shall we say, immodest. My idea of a song with real dynamic range is the recent Boulez/CSO recording of Varèse's Amériques, or perhaps the Autechre and Hafler Trio collaboration æo³/³hæ. Or Mahler 3. It's to a certain degree ironic that we are debating relatively minuscule differences in dynamic range compared to stuff like that.

While many people apparently agree with my impressions of DM, many people don't. Look at the roadrunner thread below, or the comments for this YouTube video, or any number of dreck threads on metallicabb.com, and you'll find at least a few people who either do not mind the CD or believe it is better than the GH3 mix. It's been pointed out that some of their thinking is rather stupid - "this is what metal is supposed to sound like! you aren't a true metal fan if you don't like the CD mix! Roll over and die if you don't like loud music, pops!" - etc. But spaced between that are people who are saying, well, the CD sounds just fine. And a lot of people seem to have a problem with that - they do not understand why people would prefer music that is of far higher distortion than necessary.

I really can buy the argument that some people - artists and listeners alike - really are looking for this kind of distortion. Listen to some samples of Times New Viking's Rip It Off , and then marvel at the adoration it has received in some circles of the music community (and put the gun away - I know you hate Pitchfork, but hear me out here).
The layer of fuzz works like a security blanket-- a way of creating not just a distinctive sound, but of putting up an awning of safety over them and their listeners. Only the slightest bit of straining brings you to the pop virtues of these songs, on the band's own terms. Sure, it's an affectation, but its just another way of using the studio as an instrument in a way that makes these songs more intimate by design-- for better or worse, you can't sell a Volkswagen with a Times New Viking song. If cleaner production means truckloads of new bands who can summon their influences with little effort, and even less enthusiasm or creativity, then I'll stick with my tinnitus, thanks.
It's like I've said elsewhere: where do you draw the line on artistic intent? Where do you draw the line on unlistenability? Would people seriously believe that Venereology would sound "better" if Merzbow didn't use as much hard limiting? What kind of limits are acceptable on some genres of music when music in other genres would completely obliterate them?

Second-guessing the musician is easy, and occasionally even accurate, but it can also be cheap and unpersuasive. If mainstream production/mastering practices are being called into question, yet the musicians' music itself is not, the musicians will always set the terms of the debate. Examples abound of artists who seemingly add distortion for distortion's sake - Billy Corgan and Iggy Pop are two examples of people who helped produce records from some time ago that were reviled for their lack of dynamics and ear-piercing distortion (Zwan's Mary Star of the Sea, in Corgan's case).

If you're not prepared to put your money where your mouth is, and stop buying music from the bands you love with the mastering you hate, then you will need to empathize with those artists, and those listeners who agree with their decisions, and truly understand what's going on in their heads compared to yours. Only then will you be able to reconcile their musical tastes with your own, and hopefully, support a solution that makes everybody happy. Otherwise, Metallica and their business associates will continue to sleep quite soundly over all of this.

30 September 2008

Lars's Paradox, or, Everything You Know Is Wrong

"Listen, there's nothing up with the audio quality. It's 2008, and that's how we make records... Of course, I've heard that there are a few people complaining. But I've been listening to it the last couple of days in my car, and it sounds fuckin' smokin'."
Look, people. The dude isn't fucking deaf. Rick Rubin is also, contrary to popular opinion, not deaf (he owns a rather nice hifi, in fact). Metallica as a band is not deaf. Vlado Meller is not deaf. Millions of music listeners are not deaf. And now quite a few people are coming out the woodwork and saying that Death Magnetic sounds just fine, thank you very much. They too are not deaf.

To suggest otherwise, or to suggest that something is inherently wrong with the way they are listening, is merely fallacious smearing, and honestly, unintelligent. Continuing to insist that music products like Death Magnetic are not of a sufficiently high quality without further proof - especially in the face of #1 sales - is only going to continue the abject apathy that the rest of the music world seems to treat this whole issue with.

Certainly, Rick Rubin knows exactly what he's doing when he produces records like this, and he is quite certain in his belief that it is towards delivering a superior product, as his interview with Michael Fremer made abundantly clear:
Ultimately, if you listen on a car sound system or in the mainstream place where most people listen to music—cars, boomboxes sound systems you get at (chain stores), and if you “A/B” the less compressed version to the more compressed version, you pick the compressed version... Even in a good car stereo. We do shoot-outs all the time. I master with as many as five different mastering engineers mastering the same album and then we “A/B” them and it’s interesting, Vlado wins nine out of ten times, and he claims it’s not him. He’s got technology in that room that’s a 2 million dollar mastering suite that other people don’t have. All I’ll tell you is that my whole job in life is to A/B things, that’s all I do, and for some reason, I don’t know that what he’s doing is necessarily the best, but I haven’t heard anything to beat it and we try.
That the album distorts needlessly is established beyond a reasonable doubt, thanks to mastering engineer Ted Jensen's comments, and comparisons with the vinyl. I haven't bought the album, but I have listened to the free clips from Metallica's web site, and the YouTube GH3 rips, enough to know that I'd prefer the GH3 versions.

But let's have some perspective here. The truth of the matter is that this is a serious counterexample to the entire narrative of the "loudness war": that, despite diverse objective and subjective evidence that modern hypercompressed mastering styles degrade sound quality and music appreciation, the vast majority of music listeners, at all experience levels, at least continue to buy such purportedly terrible masterings, and may even prefer them to less compressed styles. I am going to call this Lars's Paradox, since Lars Ulrich, belligerent bastard that he is, has managed to wade neck-deep into the middle of this like he always tends to do. But whether due to a similar level of belligerence, or devil's-advocacy, or whatnot, I'm actually going to take his side here for a minute.

I believe any fight against hypercompressed mastering in the "loudness war" will founder until this paradox is resolved. More concretely, and extending to other issues, I am claiming the following:
  • Claims of the hypercompressed style resulting in reduced musical enjoyment are completely unproven except on personal, anecdotal, and therefore meaningless, grounds. Real studies need to be done, in real listening environments, to show that the application of hypercompression is a detriment to popular music and the popular music industry.
  • Objective evidence is inaccurate in arguments regarding mastering. Objective evidence cannot prove statements about enjoyment. Such analyses must be more explicit in their relationship between the music, the dynamic range, and the dissonant distortions if they are to be ultimately taken seriously. Waveform plots, ReplayGain, RMS, and pfpf are all highly deficient in one way or another here.
  • (Lars's Paradox) Evidence suggests that the hypercompressed style is preferred by at least a large portion, and probably most, of the popular music listening population. Both audio professionals and untrained listeners share this preference. For the uncompressed styles to be taken more seriously, it must be shown concretely that this preference is based on faulty measurements, or is otherwise false in meaningful and important ways.
As long as these points stand, the argument against hypercompression will remain fundamentally flawed, and popular music will continue to be released in the hypercompressed style. Regardless of how many petitions get signed. Marginal releases like on vinyl and high-res formats obviously don't follow this logic as much, nor does classical and experimental music, etc. By and large, those are not popular genres or (yet) popular formats, and this discussion revolves largely around popular music. But there's simply no hope for popular CD/iTunes releases to follow any different mastering style as long as these issues exist with this whole argument.

(I have my own ideas, revolving mostly around psychoacoustics, for resolving the paradox, but they are as yet unfinished.)

Update, October 1:
Debate on the new JusticeForAudio.org forums.

Update, October 2.

17 September 2008

Waveform Plots Considered Harmful

(Revision 2.)

One of the most commonly used audio measurement tools today is the waveform plot. This is the graph of the audio signal versus time. It directly relates the amplitude of the signal to the various parts of the musical piece, and seems very easily interpreted. It is also extremely sensitive to changes in mastering, and so is the tool of choice used for illustrating "the loudness war". From sound editor screenshots to animations to elaborate, egg-in-a-frying-pan-style "this is your music on drugs!" YouTube videos, it has become something of a mascot for hypercompression.

Unfortunately, it can also be highly misleading. In general, waveform plots cannot be solely trusted as an evaluator of sound quality. Use caution when using them, because they may lie to you.

An Example: Vinyl waveforms can look better than they really sound
The most common point made with a waveform plot is to prove or disprove the existence of clipping, or more generally, hard limiting. Clipping refers to a music signal that is clamped to a fixed magnitude if that magnitude is exceeded - the peaks of the music are "clipped" off. Hard limiting is roughly the same concept - clipping is a form of hard limiting - but allows for more flexibility in how the fixed magnitude is approached. Typically, a hard limiter will allow for a softer transition between the "undisturbed" region of the music and the "limited" region.
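The difference between plain clipping and a limiter with a softer transition can be illustrated with two static transfer curves. This is only a sketch under simplifying assumptions: real limiters apply time-varying gain (attack, release, often look-ahead), whereas the `soft_knee_limit` function here is a hypothetical memoryless stand-in using a tanh knee. The function names and the `knee` parameter are invented for the example.

```python
import math


def hard_clip(x, ceiling=1.0):
    """Plain clipping: anything beyond the ceiling is clamped to it."""
    return max(-ceiling, min(ceiling, x))


def soft_knee_limit(x, ceiling=1.0, knee=0.7):
    """Leave samples below knee*ceiling untouched; above that, bend the
    curve smoothly toward the ceiling instead of clamping abruptly."""
    k = knee * ceiling
    mag = abs(x)
    if mag <= k:
        return x
    headroom = ceiling - k
    shaped = k + headroom * math.tanh((mag - k) / headroom)
    return math.copysign(shaped, x)


# Compare the two curves for a few input levels.
for level in (0.5, 0.8, 1.0, 1.5):
    print(level, hard_clip(level), round(soft_knee_limit(level), 4))
```

Both curves hold the output at or below the same ceiling, which is why they look nearly identical on a zoomed-out waveform plot even though the transition into the limited region differs.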
The outer shape of a waveform plot shows the peak level of the music over time. If that peak is at exactly the same level across an entire song, down to a fraction of a dB, it is believed to be good evidence that a limiter was applied to the music; the flatter the peaks are, the more limited the music is, and thus the more distorted the music is. Conversely, the more ragged the peaks are, the less distorted the music is.

Rather than describe an elaborate explanation as to why this is not true, it is far easier to just find a counterexample. Here's a waveform plot of a song from the album Mirrored, by Battles, in both CD and LP versions, loudness-equalized.

(Battles' Leyendecker. White: CD version amplified by -7.44 dB. Red: Vinyl version. Recorded with Technics SL-1200, AT-OC9MLII, flat transferred with Yamaha GO46 at 44.1 kHz. 2M samples of left channel at +4M samples from start of track. Time axis in seconds.)

"Excellent! The vinyl really is better!" you think. Visually, the CD is quite obviously "hypercompressed", and the vinyl is clearly not - in fact, the vinyl peaks around 6.6db higher than the CD does, once the two tracks are loudness-equalized. Based on this sort of plot, 99% of the music listeners who care about dynamic range in their music would believe the mastering on the vinyl release is superior.
But 99% of music listeners are dead wrong.
(Same plot; zoomed in to 27.43-27.44s)

Vinyl is full of linear distortions and noise sources that do not exist on CD, and so the waveforms rarely if ever match at short time scales, even if the mastering is the same. Such is the case here. But it should be quite obvious that the clipping that exists on the CD also exists on the LP. The "flat" regions on the CD very closely match the "flat" regions on the LP. [1]

This is fairly incontrovertible proof that the vinyl release of Mirrored originates from masters utilizing a similar amount of dynamic range compression as exists on the CD. And yet, the first waveform plot appeared to clearly indicate a higher dynamic range. In the case of vinyl specifically, this conclusion can be wrong for a number of reasons:

  • The recorded vinyl waveform is not sample-aligned with the CD waveform. Intersample peaks that are not visible on the CD waveform suddenly become visible on the vinyl waveform. In exceptional cases this can artificially raise the peak level by a few dB.
  • Variations in the frequency response of the vinyl system (both in recording and playback) can artificially raise peak levels. One particularly strong culprit here is low-frequency rumble at the tonearm-cartridge resonance (around 10 Hz) which can easily dwarf all the other musical content on the recording. (In this comparison, the rumble was filtered out.)
  • Tracing and tracking error in the tonearm system can artificially raise peak levels.
A similar situation exists with FM radio, which is also compressed very aggressively, although waveform plots of recordings do not show an excessive amount of limiting. But applying the preemphasis, resulting in a signal analogous to what is actually transmitted, results in a much more obviously limited waveform.
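The sample-alignment point in the list above can be shown with a deliberately extreme toy case. This sketch assumes a unit-amplitude sine at one quarter of the sample rate with a 45-degree phase offset; the function names and parameters are made up for illustration, and real program material will usually show a smaller effect.

```python
import math


def sampled_peak(offset, n_samples=64, cycles_per_sample=0.25, phase=math.pi / 4):
    """Peak absolute sample value of a unit-amplitude sine when the
    sampling instants are shifted by `offset` (a fraction of one sample)."""
    return max(
        abs(math.sin(2 * math.pi * cycles_per_sample * (n + offset) + phase))
        for n in range(n_samples)
    )


def to_db(ratio):
    return 20 * math.log10(ratio)


aligned = sampled_peak(0.0)   # sampling instants straddle the true peaks
shifted = sampled_peak(0.5)   # half a sample later, they land on the peaks
print(round(aligned, 4), round(shifted, 4), round(to_db(shifted / aligned), 2))
```

The underlying signal is identical in both cases. Only the sampling instants move, yet the measured peak rises by about 3 dB, which is the kind of shift a vinyl transfer can introduce relative to the CD it was cut from.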

Another Example, with a CD
Metallica's Death Magnetic is being widely criticized as one of the most sonically unpleasant popular records of recent memory. And yet, its waveform plot, while ugly, is not perfectly flat:
"My Apocalypse"; CD; source.

Many albums that visually appear worse than this sound much better.

Why can waveform plots lie? Blame your ears

The human ear may be relatively insensitive to limiting, if it only happens a few times a second - for instance, if only the percussion is affected. Bob Katz has commented in Mastering Audio that in regards to peak limiting, "a rule of thumb is that short duration (a few milliseconds) transients of unprocessed digital sources can be reduced by 4 to 6 dB with little effect on the sound." Those people who would worry about 10-20 sample runs of clipping on CDs should note that 1 ms = 44 CD samples; examples of hard limiting become exponentially harder to find as the number of consecutively limited samples increases. Mastering engineers use peak limiting all the time because, in all honesty, the human ear often lets them get away with it! In other words, clipping that may appear blindingly obvious on the waveform plot may not be audible.

On the flip side, when limiting occurs with periodic signals - like those emitted by most non-percussion instruments - it becomes a form of high-order distortion, and may become very, very audible. In the worst case, limiting of very few samples at a time (1-5) may be audible.
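To see why clipping a periodic signal amounts to high-order distortion, one can measure the harmonic content of a clipped sine directly. The sketch below is an illustration with assumed parameters (a 1024-sample window, 8 cycles, clipping at half amplitude) and a hypothetical helper `bin_magnitude` that evaluates a single DFT bin by brute force.

```python
import cmath
import math


def bin_magnitude(samples, k):
    """Amplitude of DFT bin k, scaled so a unit sine reads as 1.0."""
    n = len(samples)
    acc = sum(s * cmath.exp(-2j * math.pi * k * i / n) for i, s in enumerate(samples))
    return 2 * abs(acc) / n


N, CYCLES = 1024, 8
sine = [math.sin(2 * math.pi * CYCLES * i / N) for i in range(N)]
clipped = [max(-0.5, min(0.5, s)) for s in sine]

# Compare fundamental and odd harmonics before and after clipping.
for harmonic in (1, 3, 5, 7):
    k = harmonic * CYCLES
    print(harmonic, round(bin_magnitude(sine, k), 4), round(bin_magnitude(clipped, k), 4))
```

The clean sine has energy only at the fundamental. Symmetric clipping adds a series of odd harmonics that repeat on every cycle, which is much harder for the ear to ignore than a single flattened drum hit.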

Furthermore, different forms of hard limiting have different effects on the sound quality. Clipping is merely one of the most dissonant forms of hard limiting. A large market exists for high-quality limiters that remove as much of the peak as possible, while maintaining as much of the sound quality as possible. And yet all of these limiters will appear more or less the same on a zoomed-out waveform plot. How can sound quality be evaluated from such a plot when one can't use it to tell apart different forms of limiting?

When limiting is tastefully done, by a professional mastering engineer, it can negligibly distort the music while making it much more enjoyable for those of us with less expensive or desirable listening environments. In this situation, the effect of more general forms of dynamic range compression may be more objectionable, in that the reduction of dynamic range is worse than the addition of light amounts of limiting distortion.

How can I tell if one master is superior to another if waveform plots are bogus?

Unfortunately, while many alternatives to the waveform plot exist for measuring mastering quality, they all have major drawbacks. None of them are trustworthy.

  • Many people use ReplayGain track gain values as an estimator for the dynamic range of the music. Most importantly, it is easily fooled by global changes in gain - so that the "fake" dynamic range of the Battles recording above would completely fake out ReplayGain. Certainly, ReplayGain values place limits on how much dynamic range may exist in the music, and it can be used to guess a lot about the mastering, but it does not, in fact, actually measure dynamic range, nor does it measure distortion. Many albums considered great-sounding from the likes of Radiohead and Gorillaz tend to ReplayGain in the -10 dB range, which many people would consider "hypercompressed".
  • Audio engineers use an "RMS" value that is conceptually about the same as ReplayGain, and shares its faults. However, it is an acceptable figure of merit proving that Iggy Pop is a crappy producer.
  • pfpf was more or less explicitly designed for this sort of thing, but only in the context of changes in dynamic range - not in terms of clipping, or any kind of timbral changes. This approach should be fairly good at teasing out changes in dynamic range compression, but quite insensitive to limiting. Also, pfpf is still sort of half-baked, and it's not clear yet how close the long/medium/short term numbers need to be before concluding that two samples are of the same master.
  • The Sparklemeter was designed by somebody else on somewhat similar lines as pfpf, but specifically relating to measuring mastering quality. See that thread for my comments.
  • Some people are content with using the "Peak Level" result from CD audio rippers such as EAC as an indicator of audio quality: the idea being that CDs with a peak level lower than 99% run a much lower risk of hard limiting than those at that peak level or above. This is wrong in so many ways it's hard to count. It is, quite simply, the least accurate method possible of evaluating mastering quality.
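Two of the points in the list above, that a global gain change fools level-based numbers and that a ripper's peak level says little about limiting, can be checked with a toy signal. This is a sketch only: the test signal is a hypothetical overdriven sine, `rms_db` is a plain RMS level and not the ReplayGain algorithm, and `longest_flat_top` is an invented helper.

```python
import math


def peak(samples):
    return max(abs(s) for s in samples)


def rms_db(samples):
    """Plain RMS level in dB relative to full scale (1.0)."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms)


def longest_flat_top(samples):
    """Longest run of consecutive samples sitting exactly at the
    track's own peak magnitude."""
    top = peak(samples)
    longest = current = 0
    for s in samples:
        current = current + 1 if abs(s) == top else 0
        longest = max(longest, current)
    return longest


# Hypothetical test signal: a sine driven 6 dB past full scale, then clipped.
clipped = [max(-1.0, min(1.0, 2.0 * math.sin(2 * math.pi * i / 100))) for i in range(1000)]
# The same clipped audio, turned down by a global gain of 0.8.
quieter = [0.8 * s for s in clipped]

for name, track in (("clipped", clipped), ("quieter", quieter)):
    print(name, round(peak(track), 2), round(rms_db(track), 2), longest_flat_top(track))
```

The quieter copy reports a peak of 80% and a lower RMS level, so both numbers make it look like the gentler master, yet its flat-topped runs are exactly as long as in the original.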

Perhaps the most accurate method of evaluating mastering quality, though, is the simplest: Asking the mastering engineer. They, of course, would know best.

Another method is to simply use your ears. Know what clipping sounds like, and what other forms of hard limiting sound like, etc., but also evaluate your own music collection's sound quality, and investigate the mastering techniques used with its songs.

What is to be done?

  • Avoid the use of zoomed-out waveform plots to prove points about sound quality. They convey less information than you might think, and they are easy to misinterpret.
  • Do not trust the sound quality of a record simply because it looks good on a waveform plot. The ear is not an oscilloscope. Waveform plots are an informational tool, but they only relate to the perception of hearing in an abstract sense. There are plenty of ways that a good-looking waveform plot can sound terrible!
  • Avoid using any numeric measurement for evaluating audio quality, unless you understand its exact meaning for audio perception. Most numeric tools today are fairly flawed. They can be used to make meaningful statements about mastering quality in specific circumstances, but they can also make lots of meaningless or flawed statements.
  • Do not buy vinyl on mastering merits alone, unless you have information coming from the vinyl mastering engineer attesting to its superiority over other releases. As a rule of thumb, the cost of a vinyl remaster is high enough that those labels that choose to remaster will make it quite clear to potential customers that they did so, and the labels that didn't, won't. But of course, just because a special mastering job was done for a vinyl release doesn't mean that the mastering changes were significant. Caveat emptor.
  • Enjoy music purely on its subjective merits, but pay attention to your perceptions and look for ways to quantify them. There's too much good music out there today to ignore because the mastering is crap. And despite the shrill cries of the hi-fi set, the sound quality of music today is still considerably better than what it was for (most of) the last 60 years. And if all the kids today have no problem listening to the music, who's to say that us old farts can't listen to it too?

    Nevertheless, reduced dynamic range and increased distortion in modern mastering are real issues. Solving them requires subjective and objective evaluations of sound quality, rigorous in their execution, to convince the audio world that this is not mere idle talk of ignorable audiophiles.
For further reading: the HydrogenAudio forums, including some recent topics on the subject, and a wiki entry on vinyl mastering (semi-maintained by me). Related thread.

[1] The regions of clipping on the LP version of "Leyendecker" seem to imply that it is slightly less limited than the CD version. However, the difference is so slight that it is not believed to be audibly superior - and besides, the example still breaks the myth that vinyl must necessarily not be hypercompressed.

17 September 2008: Incorporated feedback. Previous revision. Thanks to David "2BDecided" Robinson, Lyx, krabapple, and others on HA, for their feedback.

12 September 2008

Metallica's "Death Magnetic": Clips on both CD and Vinyl

Much hay has been made about the sound quality of this album, although interestingly enough, none of it has been from mainstream music journalism (or even the blogosphere) yet. The mastering engineer has even disowned the sound quality of the record, passing the buck onto the mixing engineer, and ultimately, the band itself.

The MP3 prerelease leaks have been so commonplace (and so bad sounding) that a lot of people are buying the 2LP and 5LP versions thinking that they are getting superior sound quality. However, judging by the large amount of clipping still extant on the vinyl, they probably won't:

Image credits: hdsemaj on stevehoffman.tv. Original post.

Subjectively, some people are reporting that the warmer sound of the record is dampening the clipping somewhat, but really, that's damning with faint praise.

My recommendation is to spend as little money as possible on the release until they sell a better master. Metallica.com is streaming the album for free right now. Beyond that, buy the MP3s. Personally, I smell a rat.

11 September 2008

Some thoughts on a new technique for clipping detection

I decided to riff on an idea I had for smarter ways to detect clipping. I'm sure it's not a new approach - a quick search on Google pointed out several papers on the same basic concept - but I'm not aware of it being used for audio, or for mastering evaluation specifically.

Clipping clamps the signal to a constant value. It also tends to occur right in the middle of signal content which is of a high power. If the signal derivative is calculated, the DC component of the clipping is effectively eliminated, bringing the values to 0, while the values before/after the clipping are relatively unmodified. (Specifically, low frequencies are attenuated and high frequencies are boosted.) So naively, one could use this derivative as the basis for a clipping detector - compare y' to y, and if y' is zero or very close to 0 while y is of a high power, you may have clipping. This technique would be immune to attenuated clipping - if it occurs at -10db it should work as well as if it is at 0db.
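A naive version of the detector described above might look like the following. It is a sketch under assumptions, not the code behind the plots in this post: the first difference stands in for the derivative, the thresholds are expressed relative to the track's own peak (so attenuated clipping is still caught), and the names `flag_clipping` and `count_clipped` are invented for the example.

```python
import math


def flag_clipping(samples, level_threshold, slope_threshold):
    """Flag sample i when |y[i]| is large while the first difference
    y[i] - y[i-1] is (nearly) zero."""
    flags = [False] * len(samples)
    for i in range(1, len(samples)):
        dy = samples[i] - samples[i - 1]
        if abs(samples[i]) >= level_threshold and abs(dy) <= slope_threshold:
            flags[i] = True
    return flags


def count_clipped(samples, rel_level=0.9, rel_slope=1e-4):
    """Count flagged samples, with thresholds scaled to the track's peak."""
    top = max(abs(s) for s in samples)
    return sum(flag_clipping(samples, rel_level * top, rel_slope * top))


clean = [math.sin(2 * math.pi * i / 100) for i in range(1000)]
clipped = [max(-0.6, min(0.6, s)) for s in clean]
attenuated = [s * 10 ** (-10 / 20) for s in clipped]  # same clipping, 10 dB down

print(count_clipped(clean), count_clipped(clipped), count_clipped(attenuated))
```

The clean sine is not flagged because its slope near the crest, while small, still exceeds the tolerance. With real recordings that margin is much thinner, so the two thresholds would need tuning.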

However, this approach fails when gross frequency response distortions are introduced - like what exists on vinyl. As discussed earlier, vinyl clipping examples exist which are sloped, not flat. The derivative of a sloped line is a constant nonzero value. The workaround for this is simple: take another derivative, the second derivative, so that this constant nonzero value collapses to 0. In theory this could be extended to an arbitrary number of derivatives, but because high frequencies are amplified, background noise tends to dominate the response after the 2nd derivative, so the 3rd and beyond are pretty useless for vinyl analysis.
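The effect of the second derivative on a sloped clip can be seen on a synthetic example. The signal below is a hypothetical stand-in for a vinyl-style clip (a curved rise, a tilted plateau, a curved fall) with no noise added, so it shows only the idealized behavior; the helper names and the tolerance are assumptions.

```python
import math


def diff(samples):
    """First difference, used here as a discrete derivative."""
    return [b - a for a, b in zip(samples, samples[1:])]


def near_zero_count(values, tol):
    return sum(1 for v in values if abs(v) <= tol)


# Curved rise, then a plateau that drifts downward, then a curved fall.
rise = [0.6 * math.sin(math.pi / 2 * i / 20) for i in range(20)]
plateau = [0.6 - 0.004 * i for i in range(30)]
fall = [plateau[-1] * math.cos(math.pi / 2 * i / 20) for i in range(1, 21)]
signal = rise + plateau + fall

d1 = diff(signal)
d2 = diff(d1)
tol = 1e-6
print(near_zero_count(d1, tol), near_zero_count(d2, tol))
```

On the tilted plateau the first difference is a constant nonzero value, so a first-derivative detector sees nothing, while the second difference collapses to zero there. Adding even a small amount of noise to `signal` would quickly blur that distinction, which matches the limitation described above.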

What I'm ultimately hoping for is to have the final output be a histogram, and to run a threshold on that to give an estimate of how many clipped samples exist in the signal. This allows comparisons between signals that are not sample-aligned (as is usually the case with vinyl vs. CD comparisons).

Here's what I have so far. First, some clipped stuff from Leyendecker again, on CD:

Note how the clipped content is neatly crushed to 0 on the second derivative plot - and most importantly, that the histogram plot shows a lot of values on the left, outside the distribution curve with a high signal energy and a low 2nd derivative energy. Those are the clipped samples.

Now for the LP version, different part of the file:

The 2nd derivative crush still occurs, but it's not nearly as prominent as on the CD. And the clip signal is so weak on the 2nd derivative plot (or the noise is so high) that none of it really shows up on the histogram plot; there's no real indication of clipping there.

But it's a start.

10 September 2008

Feedback on "Waveform Plots Considered Harmful"

I received lots of feedback on my HA thread and some in the comments. I intend to update the paper incorporating this feedback, archive the original somewhere, and post it for more general consumption soon.
  • Several people commented that the tone of the article in general was too inflammatory, and given that I used waveform plots to make a few important points, potentially hypocritical. I blame the insanely late hour that I wrote it. It will be edited to be a bit more evenhanded.
  • One person commented that the particular waveform examples I used seemed to imply that the vinyl master is clipping significantly less than the CD master. I'm seeing that too, and I can't really explain it. It's clearly not some sort of analog effect in the playback system, and it seems to be consistent across the entire disc. The fundamental issue is the same - buying vinyl does not always mean you are getting a product without hard limiting - so I think the article still stands up well. But the specific point of that example, that there exists a vinyl master which provably has just as much clipping as the CD master, is compromised significantly.

    Note, also, that other people have observed vinyl waveform plots that don't share Mirrored's subtle difference in clipping levels. The clipping levels really do seem to match in those cases.
  • A clearer distinction needs to be made between periodic clipping of periodic signals - which is extremely audible even in small amounts - and clipping of transients, which is far less audible. Modern mastering practices can occupy either extreme or somewhere in between.
  • I'm equivocating between clipping and hard/brickwall limiting; clipping is only one form of hard limiting. A proprietary hard limiter is capable of doing the same job that clipping does, but with potentially far less audible distortion. It's an open question as to how much more audible clipping is than a good hard limiter. Nevertheless, the damage done to the audio is quite significant for all hard limiters.
  • While we're on the subject of things that should be made more clear, the extensive use of dynamic range compression (of the non-limiting variety) clearly has a far more audible effect on the sound than limiting/clipping.
  • Important tip: Bob Katz commented that he always sends vinyl masters out without the use of the hard limiters used on the CD masters. Yay!

04 August 2008

MT9: Terrible marketing, better potential?

(Reposted from an Audioholics post.)

There was a little hubbub recently over this Korean codec called "MT9" that claims to attempt to unseat MP3. Most sane people rightfully call that bollocks. Even the English-language MT9 site calls it bollocks.

But I wouldn't count it out quite yet. There are a lot of subtle implications to MT9 that I don't think people have fully realized.

All MT9 appears to be is a container format for an unmixed record. That is, instead of taking a multitrack production and downmixing all the instruments to stereo, you encode each instrument to a separate track as .MT9 and let the player do the downmixing. There's no technical innovation involved here. MT9 is probably (well hopefully) just a container around a mainstream codec like MP3 or AAC.
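Player-side downmixing of this kind is conceptually simple. The sketch below is purely illustrative and says nothing about how MT9 itself is implemented: the `downmix` function, the per-stem gain and pan dictionaries, and the linear pan law are all assumptions.

```python
def downmix(stems, gains=None, pans=None):
    """Mix mono stems to stereo at playback time.

    stems: dict of name -> list of mono samples (all the same length)
    gains: dict of name -> linear gain chosen by the listener (default 1.0)
    pans:  dict of name -> pan position in [-1.0 (left), 1.0 (right)]
    """
    gains = gains or {}
    pans = pans or {}
    length = len(next(iter(stems.values())))
    left = [0.0] * length
    right = [0.0] * length
    for name, samples in stems.items():
        gain = gains.get(name, 1.0)
        pan = pans.get(name, 0.0)
        gain_left = gain * (1.0 - pan) / 2.0   # simple linear pan law
        gain_right = gain * (1.0 + pan) / 2.0
        for i, s in enumerate(samples):
            left[i] += gain_left * s
            right[i] += gain_right * s
    return left, right


# Hypothetical two-stem track: mute the vocals, push the drums hard right.
stems = {"vocals": [1.0, 0.5], "drums": [0.2, 0.2]}
left, right = downmix(stems, gains={"vocals": 0.0}, pans={"drums": 1.0})
print(left, right)
```

Because the stereo sum only exists at playback, any bus compression or limiting would have to run in the player as well, after the listener's gain choices.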

Therefore, MT9 does not in any way compete with MP3 or other mono/stereo lossy codecs - although it may be able to use them internally. As is mentioned, it could be used as an alternative means to deliver music, but the odds of it ever catching on in popular music are rather slim. That all press discussion (and MT9's own web site!) has focused on that aspect is quite unfortunate.

From an encoder standpoint, this is still kind of a win - because there's a 1 to 1 correspondence between channel and instrument, you no longer have to worry about weird stereo collapse issues, you only have to tune the encodings for mono, etc. The bitrate would likely still be much higher than MP3 for high quality, simply due to the number of channels involved.

From a playback quality point of view, the MT9 system precludes the use of global dynamic range compression and limiting. That is, because mixing is deferred until playback, mastering must also necessarily be deferred until playback. This, of course, is a partial solution for ending the loudness war and is a huge win. Compression can still be applied to individual tracks, but because the listener has control over the volume of individual tracks, there would be much less impetus for producers to try to make a particular track stand out in the mix. Of course, this also strongly implies that producers would not need to employ mastering engineers in the traditional sense, bringing costs down.

This has virtually no chance of supplanting other formats for commercial music. But the deals with karaoke and possibly cell phones are probably the perfect application for this at the moment: very closed markets, and the music is often custom produced anyway so doing a multichannel production is not a big deal. But as I mention, I suspect I wouldn't mind buying normal music in this format either.

18 July 2008

Foot wrapped in tonearm cables and inserted in mouth - film at 11

I made a recording of a 2-LP set before ever listening to it, and after mucho processing, I listened and discovered that it was corrupted by large amounts of ground noise. Bleh. I had to throw them out and re-record.

After much searching I discovered that the power supply from my Dell laptop was the culprit. Moving the turntable cables away from the PS relieved the noise. Moving them back increased it. Wrapping the cable around the power supply amplified the noise tremendously.

The noise consisted of peaks about 30db above the noise floor at 60hz and 180hz, many peaks in the 20db range from 200-10khz, and a huge mass of +20db harmonics centered at about 13khz and 18khz. Not subtle at all.

This throws a major wrench into my whole way of thinking about balanced turntable connections. I was under the impression that a well-shielded balanced cable is fairly immune to EMI of this nature; clearly I'm mistaken.

Microphone and pro audio connections have the same issues; I'll need to research how sensitive they are supposed to be to this sort of thing before I consider making cabling/equipment changes.

11 July 2008

Recording sample: Einstein on the Beach

Here is a 30-second sample of my current recording process. It's the first 30 seconds of side 3 of the CBS Masterworks pressing of "Einstein On The Beach", encoded with lossyWAV --portable in FLAC.

I bought the boxed set at Amoeba Records in SF for $9, and it's easily some of the best pressed and maintained vinyl I have ever laid eyes upon. (The music is excellent too.)

Note the hiss before the violin begins - that is the 0404 USB's mic preamp noise I was talking about. It is fairly prominent in silent passages, but is difficult to identify while most music is playing, and is also pretty hard to spot with noisy vinyl. So it's fairly important to get this fixed, as far as vinyl upgrades are concerned. Besides that, I think the sound quality is excellent.

27 June 2008

The art of the flat balanced digital phono preamp

This is a brief overview of the work I've done with recording vinyl directly to a computer, with balanced cabling, without the use of RIAA equalization. Sorry, no pictures.

The Problem

Somewhere in my audio pursuits, I wound up desiring access to a flat-eq phono preamp - one that didn't have any RIAA equalization applied. I also wanted balanced inputs and an integrated ADC, so that the signal path would go straight to the computer. (My audio stuff is very close to my computer stuff, so issues like ground hum and EM interference are pretty important to me.) Such things don't seem to exist in the audiophile world. The closest tailor-made solution is EnhancedAudio's flat preamp system, which is not balanced, and perhaps, not audiophile enough.

The Solution

But enter Pure Vinyl. Although it's Mac only, it has full support for flat transfers (and in fact recommends them), and goes into great detail on how to actually accomplish it.

In summary: a phono preamp without RIAA equalization is really just a fancy way of describing a microphone preamp. The gains for a phono preamp are about the same as for a mic preamp, 40-70db. The load impedances for most mic preamps are in a reasonable range (1.5kohm) to load an MC with, although to be honest, I've never cared much for the finer points of MC loading (but that is for a separate post). So about a year ago I bought an E-MU 0404 USB, which is widely praised in some circles for its high quality microphone inputs, and I got started.

  • Mic preamps tend to be much more sanely valued than phono preamps. Even major tweak territory is not going to run you more than a thousand dollars a channel (compared to perhaps 5k/channel for a phono stage).
  • Mic preamps can obviously perform double duty as, well, mic preamps. This may help their resale value.
  • External computer audio interfaces tend to be a) very inexpensively priced, b) ridiculously packed with features, and c) very high quality. There are many extremely good interfaces in the <$500 price range worthy of consideration. Many also have high quality analog/digital outs, headphone outs, etc.
  • When used with balanced connections, a virtually noise-free signal path is guaranteed.
  • If the mic preamp has accurate low-frequency response or is DC coupled, very accurate rumble measurements can be derived. This is extremely useful for several technical measurements.
  • Software support for flat recordings is virtually nonexistent, with the notable exceptions of Pure Vinyl and DC Six. I've pretty much written my own software for this.
  • MC gains may test the upper limits of mic preamp performance.
  • May not be compatible with MM cartridges, depending on configuration.
  • Because of the scarcity of users of this scheme, many unforeseen issues may develop (see below).
  • Per-channel mic gain control prevents completely accurate level matching. (But level matching was never that good on vinyl to begin with.)
  • Phantom power on XLR cabling may damage turntable gear if improperly wired.
  • XLR/TRS turntable wiring doesn't exist. Even balanced wiring in the form of 5-pin DIN is hard to find, and requires special cabling to terminate to XLR/TRS.
  • There are theoretical objections to flat-eq recordings on the basis of reduced dynamic range at some frequencies and a greater risk of overload in others. Rob Robinson of Pure Vinyl has argued convincingly that this should not be an issue.

Software Implementation

Software equalization was accomplished with a naive implementation of an RIAA filter as a prewarped bilinear transformed IIR filter, in LabVIEW. At 44khz the response isn't great (it's several db off at 20khz). I spent some time trying to make a better filter, but was mostly rebuffed. LabVIEW's facilities for optimizing FIR filters were not able to get to within +-0.1db across the audio band without fairly long lengths. Regardless, it sounds pretty good as is.
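
For readers who want to see why a bilinear-transformed RIAA filter droops near 20khz at a 44.1khz sample rate, here is a minimal sketch in Python rather than LabVIEW. It is not the filter described above. The time constants are the standard RIAA playback values (3180, 318 and 75 microseconds); the choice to prewarp at 1khz and all function names are assumptions.

```python
import cmath
import math

T1, T2, T3 = 3180e-6, 318e-6, 75e-6  # standard RIAA time constants (seconds)


def riaa_analog_db(f):
    """Magnitude of the analog playback curve, in dB, at f Hz."""
    s = 2j * math.pi * f
    h = (1 + s * T2) / ((1 + s * T1) * (1 + s * T3))
    return 20 * math.log10(abs(h))


def riaa_bilinear(fs, f_match=1000.0):
    """Biquad (b, a) from the bilinear transform, prewarped at f_match (assumed)."""
    w0 = 2 * math.pi * f_match
    k = w0 / math.tan(w0 / (2 * fs))  # prewarped replacement for 2*fs

    def first_order(t):
        # (1 + t*s) with s = k(1 - z^-1)/(1 + z^-1), times (1 + z^-1)
        return (1 + t * k, 1 - t * k)

    def polymul(p, q):
        return (p[0] * q[0], p[0] * q[1] + p[1] * q[0], p[1] * q[1])

    b = polymul(first_order(T2), (1.0, 1.0))  # spare (1 + z^-1) lands here
    a = polymul(first_order(T1), first_order(T3))
    return [x / a[0] for x in b], [x / a[0] for x in a]


def digital_db(b, a, f, fs):
    """Magnitude of the digital filter, in dB, at f Hz."""
    z1 = cmath.exp(-2j * math.pi * f / fs)
    h = (b[0] + b[1] * z1 + b[2] * z1 * z1) / (a[0] + a[1] * z1 + a[2] * z1 * z1)
    return 20 * math.log10(abs(h))


def error_db(f, fs=44100.0):
    """Digital minus analog response, in dB."""
    b, a = riaa_bilinear(fs)
    return digital_db(b, a, f, fs) - riaa_analog_db(f)


for f in (100.0, 1000.0, 10000.0, 20000.0):
    print(f, round(error_db(f), 2))
```

The filter has one more pole than it has zeros, so the bilinear transform puts a zero at Nyquist, and the response falls away from the analog curve as the frequency approaches fs/2. Running the same sketch at a higher sample rate pushes that error out of the audio band.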

I've been informed of a technique by Robert Orban to use Remez optimization on IIR filters to make an extremely accurate filter. I'll probably do this in the future.


For rumble, I use two Butterworth highpass filters operating on the L+R and L-R signals. L+R is order 6 at 25hz; L-R is order 8 at 35hz. These numbers were evaluated a long time ago observing spectral content of blank grooves on HFNRR - the lateral and vertical content of the rumble is largely different. Of course, the exact nature of the rumble varies from record to record.
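
The rumble scheme above can be sketched as follows. This is an illustration, not the LabVIEW code: it builds each Butterworth highpass as a cascade of standard bilinear-transform biquads, and the 1/2 scaling of the sum and difference signals, along with every function name, is an assumption.

```python
import cmath
import math


def butterworth_highpass_sections(order, fc, fs):
    """Biquads (b0, b1, b2, a1, a2) for an even-order Butterworth highpass."""
    w0 = 2 * math.pi * fc / fs
    cos_w0, sin_w0 = math.cos(w0), math.sin(w0)
    sections = []
    for k in range(order // 2):
        # Q of each pole pair, from the Butterworth pole angles
        q = 1.0 / (2.0 * math.cos(math.pi * (2 * k + 1) / (2 * order)))
        alpha = sin_w0 / (2.0 * q)
        a0 = 1.0 + alpha
        b0 = (1.0 + cos_w0) / 2.0 / a0
        sections.append((b0, -2.0 * b0, b0, -2.0 * cos_w0 / a0, (1.0 - alpha) / a0))
    return sections


def run_sections(sections, samples):
    """Filter samples through each biquad in turn (direct form I)."""
    y = list(samples)
    for b0, b1, b2, a1, a2 in sections:
        x1 = x2 = y1 = y2 = 0.0
        out = []
        for v in y:
            w = b0 * v + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
            x2, x1 = x1, v
            y2, y1 = y1, w
            out.append(w)
        y = out
    return y


def gain_db(sections, f, fs):
    """Magnitude response of the cascade, in dB, at f Hz."""
    z1 = cmath.exp(-2j * math.pi * f / fs)
    h = 1.0
    for b0, b1, b2, a1, a2 in sections:
        h *= (b0 + b1 * z1 + b2 * z1 * z1) / (1 + a1 * z1 + a2 * z1 * z1)
    return 20 * math.log10(abs(h))


def derumble(left, right, fs):
    """Highpass L+R at 25 Hz (order 6) and L-R at 35 Hz (order 8)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]   # lateral content
    side = [(l - r) / 2 for l, r in zip(left, right)]  # vertical content
    mid = run_sections(butterworth_highpass_sections(6, 25.0, fs), mid)
    side = run_sections(butterworth_highpass_sections(8, 35.0, fs), side)
    new_left = [m + s for m, s in zip(mid, side)]
    new_right = [m - s for m, s in zip(mid, side)]
    return new_left, new_right
```

Filtering the sum and difference signals separately is what allows a steeper, higher cutoff on the vertical content without touching mono bass. Note that these are ordinary minimum-phase IIR filters, so unlike a linear-phase design they do shift phase near the cutoff.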

It's worth noting that Pure Vinyl has its own very high quality filters that would moot all this work had I been using a Mac. In particular, the rumble filter is phase-distortion-free.

Hardware Implementation #1

I didn't have a low impedance MC cartridge the first time around, merely an AT440ML, and I didn't want to rewire the turntable I had at the time, so I built an adapter box to convert RCA to XLR and add 47kohm of resistance. (I added capacitance too but there is likely enough capacitance in the entire circuit to make that unnecessary.)

This sorta worked, but the frequency response was completely off. After reading the 0404 USB docs between the lines, I determined that the XLR inputs were constantly loaded at 1.5kohms - completely unsuitable for MM use - but the TRS inputs were at 1Mohm. So I replaced the plugs with TRS and went on my way.

The resulting recordings sounded acceptable, but there were major EM interference issues. 60hz ground harmonics were severe, and oddly, peaked quite strongly at high frequencies (15khz), which wound up being audible. I chalked this up to three issues:

  • The resistance/connector adapter box I built was fairly ramshackle.
  • The turntable wiring itself was not great.
  • The AT440ML (and apparently most MM cartridges) wire one of the signal return pins to its ground. This may cause more issues than it fixes in this particular configuration.

Hardware Implementation #2

So I scored a Technics SL-1200MK2 over Christmas, and an Audio Technica OC9MLII a few months ago. I also procured a balanced audio cable from Blue Jeans cable - about 3 meters of Belden 1800F terminated with Neutrik TRS. I cut the cable in two and replaced the external wiring on the 1200 with the 1800F (causing some pretty severe damage to the 1200's cabling circuit board in the process).

Everything worked great, except for one glaring problem: noise. The SNR was way, way too low. In fact, what had happened was that the 60db of gain on the 0404 counted against its SNR, resulting in roughly 50db of SNR before equalization. This is apparently 40db (or more!) worse than it should be.

The built-in gain on the XLR inputs was a smidge better, and the load impedance is compatible with the cartridge, so I was able to buy a few more db by rewiring back to XLR. But longer term the only surefire solution is to get a better preamp. One of the bigger potential drawbacks to XLR is that phantom power (+48V to the turntable ground!) is only a push button away, but after deliberately pressing this and not seeing any ill effect beyond lots of noise while it's enabled, it should not be a concern as long as the cables are properly wired.

Recording process

Recording is currently done in the version of Cubase LE that came with the 0404. The 0404 USB has major driver issues and consumes 50% of the CPU time, on a 3ghz P4, while playing back or recording audio. Overrun susceptibility is also quite high, and it's virtually unusable on my Dell laptop (although that may be more of an issue with the network drivers than the card). So most of the time no other interrupt-intensive work can be done while the recording is in progress.

After the recording is over I run RIAA eq and derumble in LabVIEW. Then I open the wavs up in Audacity and manually remove the loudest pops, normalize, trim the edges and export to 16/44 lossless. Tagging and transcoding is done in foobar2000.

The 0404 USB (and most 2-channel mic preamps for that matter) have separate gain pots for each channel, resulting in a fairly obvious level imbalance. This can be calibrated away somewhat by using a test record, but my de-rumble utility also estimates L/R energy content in db, which I can then use to amplify one channel over the other in Audacity.
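
As an illustration of the level-matching step, here is a minimal sketch that estimates the L/R energy difference in db and derives the gain to apply to one channel. It is not the de-rumble utility itself, and the function names are assumptions.

```python
import math


def channel_level_db(samples):
    """Mean-square level of one channel, in dB relative to full scale."""
    power = sum(x * x for x in samples) / len(samples)
    return 10.0 * math.log10(power)


def balance_correction_db(left, right):
    """Gain in dB to add to the right channel so its energy matches the left."""
    return channel_level_db(left) - channel_level_db(right)


def apply_gain_db(samples, gain_db):
    """Scale samples by a gain given in dB."""
    g = 10.0 ** (gain_db / 20.0)
    return [g * x for x in samples]


# A right channel recorded at half the amplitude of the left.
left = [math.sin(2 * math.pi * 1000 * n / 8000) for n in range(8000)]
right = [0.5 * x for x in left]
correction = balance_correction_db(left, right)
right_fixed = apply_gain_db(right, correction)
```

On real music the two channels carry different content, so an energy match like this is only a rough guide. A mono test band from a test record gives a more trustworthy number.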

Future Plans
  • The 0404 USB really has to go.
  • I would prefer some more specific tool for recording the audio over ASIO. A LabVIEW interface to PortAudio would arguably be most powerful, and would most easily let me do online monitoring, but short of that, a command line recorder would be great.
  • If a preamp can be obtained that has acceptable gain with Hi-Z inputs, then I could craft in-line load resistance correction for MM cartridges over the XLR pins. That would cleanly solve MM loading issues.
  • Develop an accurate real-time high speed meter capability, to read test band levels off in real time, to aid in adjusting mic gains.
  • Grab a hold of (or write) some A-weighting filter routines to compare SNR figures with.
  • Streamline the recording and processing chain. Pop/tick removal, normalization and trimming can in large degree be made automatic. Those steps take up perhaps 10-30 minutes per record and speeding those up would make large-scale recording much faster.
  • Of course, I could just buy a Mac and Pure Vinyl and forget about this, or (Goddess forbid) buy an MC phono stage, but what's the fun in that?
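
On the A-weighting item in the list above: the magnitude of the A-weighting curve has a standard closed form, which is enough for weighting a measured noise spectrum before summing it. A minimal sketch follows. The constants are the commonly quoted rounded values, and the function name is an assumption.

```python
import math


def a_weighting_db(f):
    """A-weighting gain in dB at frequency f (Hz), normalized to ~0 dB at 1 kHz."""
    f2 = f * f
    num = (12194.0 ** 2) * f2 * f2
    den = (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    return 20.0 * math.log10(num / den) + 2.00


for f in (60.0, 180.0, 1000.0, 13000.0):
    print(f, round(a_weighting_db(f), 1))
```

This only gives the curve's magnitude at a given frequency. Applying it to a time-domain recording would still need an actual filter or an FFT-based weighting, which is left out here.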

Thanks to Rob Robinson at Pure Vinyl for pointing me to this technique and overall guidance.

UPDATE 1: recording sample.

UPDATE 2: "Trouble In Vinyl-dise", Or, "How does Dell Manage To Sell Such Cheap Computers? Film At 11".

Some notes on pfpf

It's been quite a while since the first update, and I didn't mean this to be a one shot deal, so I might as well give a status update on pfpf. I haven't had the opportunity to work on a new version, but plenty of comments have been made so far:

On Missing Files. First, my hosting provider, storing my screenshots and files, disappeared without any contact information. I just got a new one set up, so all the links work again.

On Download Sizes. Lots of people don't like the 90MB LabVIEW Runtime download, or that registration is required on NI's website to download. I can't get rid of the runtime download entirely, but there is a smaller (28MB) runtime that may work for you - download here.

On Magnitudes. Several people commented that the dynamic range figures seemed too low. A well-mastered pop track may show up as having only 3-4db of dynamic range on short/medium time scales, and virtually no range on long time scales. Extremely dynamic symphonic works may only have 16db of dynamic range, where by most "normal" evaluations, there should be more like 40-60db. While the choice of scaling has little effect on comparison of pfpf-derived numbers, it has a great effect on their overall interpretation in relation to other decibel figures.

Much of this stems from the choice of percentiles used in the variance measurement (from the 50th to the 98th). If this range were to be doubled, the numbers would probably fall more into line with what people normally expect. This could be accomplished by simply doubling the 50-97.7 figure.

On Dynamic Range Manipulation. Michael Jamsmith and I independently came up with the idea of running pfpf's histogram plot in reverse to make a Photoshop-like levels adjustment for loudness on a music track. In other words, a reversible 2-pass dynamic range compressor. Certainly something to work on in my copious free time :)

On Resolutions. One person didn't like that a greater than 1024x768 resolution is required. I'll see what I can do to make the resolution requirements nicer, but I can't promise much. I might just require that people use a 1680x1050 or higher display.

Bob Katz's comments.

An Alternative Proposal - the Sparklemeter. Chromatix on HA has recently proposed measuring dynamic range by comparing the ratio between a PPM and a VU meter (suitably modified), preparing a histogram of the result values, and computing a figure of merit based on the mean and variance of that histogram. The resulting functionality is similar to the medium- and short-timescale figures that pfpf outputs, but the use of exponential-decay meters is new, as is using direct percentile values from the histogram rather than ranges between percentiles. Using VU/PPM meters gives the result great intuitive meaning for audio engineers, although I fear the required modifications may compromise that. Watch that thread.

13 January 2008

pfpf: An Experimental Estimator of Dynamic Range in Music

The dynamic range of a selection of music is dependent on both estimating the time-varying loudness of the music and the timescale used for loudness evaluation. I propose a numerical method of estimating dynamic range that satisfies those dependencies using a modified ITU-R 1770 loudness filter and three moving windows to estimate loudness across three different timescales. The goal is to more accurately measure and compare dynamic range between different music genres and different masterings and processing techniques for the same music.
Summary of algorithm:
  1. Apply ITU-R 1770 filters to convert amplitude to instantaneous loudness.
  2. Estimate loudness across three different timescales by computing 10ms ("short term"), 200ms ("medium term") and 3000ms ("long term") windowed RMS power.
  3. Decouple timescales by scaling 10ms loudness by 200ms loudness, and 200ms loudness by 3000ms loudness.
  4. Threshold loudness at each timescale to remove silence (optional)
  5. Compute histogram for each loudness estimate
  6. Dynamic range = range between 50th and 97.7th percentile, for each timescale
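
The steps above can be sketched in a few lines of Python. This is a simplified illustration, not the pfpf source. It skips the ITU-R 1770 filter (step 1) and the silence threshold (step 4), uses non-overlapping blocks instead of moving windows, and treats "scaling" one timescale by another as a subtraction in db. Those simplifications and all function names are assumptions.

```python
import math


def block_powers(samples, block_len):
    """Mean-square power of consecutive, non-overlapping blocks."""
    n_blocks = len(samples) // block_len
    return [
        sum(x * x for x in samples[i * block_len:(i + 1) * block_len]) / block_len
        for i in range(n_blocks)
    ]


def group_mean(values, group):
    """Average every `group` consecutive values to build a longer window."""
    n = len(values) // group
    return [sum(values[i * group:(i + 1) * group]) / group for i in range(n)]


def to_db(power):
    return 10.0 * math.log10(power) if power > 0 else float("nan")


def percentile(values, pct):
    """Percentile with linear interpolation; NaN entries are ignored."""
    data = sorted(v for v in values if not math.isnan(v))
    if not data:
        return float("nan")
    rank = (pct / 100.0) * (len(data) - 1)
    lo = int(math.floor(rank))
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (data[hi] - data[lo]) * (rank - lo)


def dynamic_ranges(samples, fs, lo_pct=50.0, hi_pct=97.7):
    """Return long/medium/short dynamic range estimates in dB."""
    short_p = block_powers(samples, int(round(0.010 * fs)))  # 10 ms
    medium_p = group_mean(short_p, 20)                       # 200 ms
    long_p = group_mean(medium_p, 15)                        # 3000 ms
    short_db = [to_db(p) for p in short_p]
    medium_db = [to_db(p) for p in medium_p]
    long_db = [to_db(p) for p in long_p]
    # Decouple: express each level relative to the enclosing longer block.
    short_rel = [short_db[i] - medium_db[i // 20] for i in range(len(medium_db) * 20)]
    medium_rel = [medium_db[i] - long_db[i // 15] for i in range(len(long_db) * 15)]

    def spread(levels):
        return percentile(levels, hi_pct) - percentile(levels, lo_pct)

    return {
        "long": spread(long_db),
        "medium": spread(medium_rel),
        "short": spread(short_rel),
    }
```

Feeding this a steady tone that jumps in level partway through gives a long-term figure equal to the size of the jump and medium/short figures near zero, which is the kind of separation the decoupling step is meant to provide.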

Using the pfpf application

The algorithm is prototyped in a LabVIEW application built for Windows, downloadable here. Unzip it into a new directory. You also need to download the LabVIEW 8.2.1 runtime.

Basic instructions: Run the program and open the folder icon on top to select a WAV file. Press the "Run analysis" button. The file is scanned for instantaneous loudness (indicated by the progress bar) and then a histogram operation is performed to calculate the dynamic range. The output is displayed at the bottom. Additionally, other tabs display plots of instantaneous loudness and histograms.

Interpreting the results:
  1. Long-term dynamic range - loudness changes across multiple seconds, or across multiple measures of a piece of music. Wide swings in orchestration and sustained loud/quiet passages increase this number. Dynamic range compression, in any form, decreases this number. Typical values range from 16db for extremely dynamic orchestral and experimental music to 1-2db for pop/rock singles.
  2. Medium-term dynamic range - loudness changes across hundreds of milliseconds, or single notes. Aggressive dynamic range compression can reduce this.
  3. Short-term dynamic range - loudness changes across single milliseconds. The use of extremely percussive instruments can increase this. Extremely aggressive dynamic range compression, especially limiting, can decrease it.
  4. ITU-R 1770 loudness - estimate of loudness as per the ITU-R 1770 recommendation.

Example Results: Long/medium/short dynamic range for various tracks:
Musical piece
Autechre - "Sublimit"
Autechre - "Dial"
Shellac - "Genuine Lullabelle" (long term thresh=-50db)
Merzbow - "I Lead You Towards Glorious Times"
John Mayer - "Waiting On The World To Change"
Battles - "Tonto"
Soundgarden - "Black Hole Sun"
Autechre & The Hafler Trio - "æo³"
Harnoncourt, Beethoven Sym. 9, Chamber Orchestra of Europe


Configuration and output tab:

Loudness plot tab:

Histogram tab:

Advanced configuration

These options affect the computation of the dynamic range; when they are modified, the results should always include the new configuration. The "Output" string was created for this purpose.

  • Thresholds: If the instantaneous loudness drops under the threshold associated with a time scale, the loudness at that time scale (and at any shorter time scale) is set to NaN and ignored in later dynamic range calculations. This prevents silence between music (assumed to be below the listening noise floor, and therefore inaudible) from affecting the results. Silence skews the histogram results so as to artificially compress dynamic range across all timescales. Its loudness also varies considerably between different formats (notably vinyl vs CD) and masking it aids in making an accurate comparison of formats.
  • Time scales: Controls the rms window size (in seconds) for each time scale.
  • Percentiles: By default, dynamic range is calculated as the loudness range between the 50th and 97.7th percentiles from histograms at each time scale of loudness. These percentile levels may be adjusted.

Application License

The pfpf application is free for non-commercial use. Do not redistribute it. Source code is available upon request (requires LabVIEW 8.2 or above and the Digital Filter Design toolkit).

Contact Info

Message me (Axon) on HydrogenAudio, or comment below.

Known Issues

  • It is important to take the results with a grain of salt. Transient loudness estimation is a topic of ongoing research, and no truly accurate method has yet been agreed on. pfpf currently uses a moving-window modification to Leq(RLB), but in the future, a more elaborate loudness estimator, like HEIMDAL, might be used.
  • DC removal is applied to each short block (0.01 seconds of signal by default) as it is read, and the short blocks are then combined into the larger medium/long (0.2/3s) blocks. The end result is that the signal effectively receives a 100hz highpass before analysis, removing much of the bass information. This is not expected to be a big deal, because of the relatively small contribution that low-frequency content makes to loudness models.
  • Histogram computation is not factored into the progress bar, so there is a noticeable pause between the completion of the progress bar and the display of results.
  • Beware of falling code. Parameters may not be well tested for failure cases or obviously incorrect inputs.
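
To illustrate the per-block DC removal issue noted in the list above, here is a minimal sketch showing how subtracting the mean from each 10ms block behaves like a crude highpass. It is not the pfpf code. The 8khz sample rate, the test tones and the function names are assumptions chosen to keep the example small.

```python
import math


def remove_block_dc(samples, block_len):
    """Subtract each block's mean from the samples in that block."""
    out = []
    for i in range(0, len(samples), block_len):
        block = samples[i:i + block_len]
        mean = sum(block) / len(block)
        out.extend(x - mean for x in block)
    return out


def power(samples):
    return sum(x * x for x in samples) / len(samples)


def surviving_power_ratio(freq, fs=8000, block_seconds=0.01, seconds=1.0):
    """Fraction of a sine's power left after per-block DC removal."""
    n = int(fs * seconds)
    tone = [math.sin(2 * math.pi * freq * i / fs) for i in range(n)]
    cleaned = remove_block_dc(tone, int(round(fs * block_seconds)))
    return power(cleaned) / power(tone)


for freq in (20, 50, 100, 1000):
    print(freq, round(10 * math.log10(surviving_power_ratio(freq)), 1), "dB")
```

A tone whose period is much longer than the block looks nearly constant inside each block, so most of it is subtracted along with the true DC offset. A tone with many whole cycles per block averages to zero and passes through untouched.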

Document Revision History

13 January 2008: Initial revision.