Audiamorous

RMAA analysis of EVO Shift

2011-06-19T22:57:00.002-05:00

Summary

| Test | Value | Rating |
|---|---|---|
| Frequency response (from 40 Hz to 15 kHz), dB | +0.27, -2.77 | Average |
| Noise level, dB (A) | -91.4 | Very good |
| Dynamic range, dB (A) | 91.2 | Very good |
| THD, % | 0.011 | Good |
| THD + Noise, dB (A) | -68.2 | Average |
| IMD + Noise, % | 0.413 | Poor |
| Stereo crosstalk, dB | -80.6 | Very good |
| IMD at 10 kHz, % | 0.032 | Good |
| General performance | | Good |

See this link for complete results/charts/etc.

Tracking paper available for free

2010-09-14T14:42:00.002-05:00

I'm not dead!

I've uploaded my tracking simulation paper here. It is not the same PDF as what you would download from aes.org; an added footnote to the paper reads:

This is a revised version of the preprint presented at the 127th AES convention in October 2009, and available from the AES. That preprint remains the authoritative version of this text. Here, the copyrighted AES logo and other convention livery are removed, but the contents of the paper are unmodified.

Digital Simulation of Phonograph Tracking Distortion: Download the poster; buy the paper

2009-11-09T11:38:00.003-06:00

The paper and poster

The poster I presented for my paper at the 127th AES convention is available for download here. It provides a good overview of all aspects of the paper, although it leaves out many technical details and explanations.

KlausR has alerted me to the existence of my paper on the AES E-Library, which means I am no longer tempted to share my LaTeX PDFs with anybody. $5 for AES members, $20 for nonmembers. Or you can go all out and buy a CD-ROM of all the convention preprints on the AES store for $150 (if you can find the link). Or you can wait until 6 months after the convention, which appears to be the usual time when authors are allowed to upload their preprints themselves.

The simulator

At some point in the hopefully-near-future I will release the source and executable to the simulator itself. It will be written in LabVIEW 8.2.1 with the Digital Filter Design toolkit. It is more or less the exact same code as I used to build the test samples for the paper: there are a few optimizations I've left out of the code because I first wanted to release what was documented in the paper.

The listening test

One of the points I tried very hard to emphasize in the paper was that, while a listening test was conducted, which provided very interesting results, from the standpoint of comprehensiveness or authority, it wasn't a very good test.

Only two people were involved with the test, who used two different protocols, neither of which was particularly great. Ideally, many more people should be involved, in a wider range of listening environments. And the protocol should be in some sense similar to the listening test you get at an audiologist: As the magnitude of the distortion varies in an automatic fashion, the listener records whether or not the distortion is audible. (This is actually pretty similar to what Klaus did, as he did not have a computer audio setup in his listening room.) That is, the usual test concepts used for eg the absolute threshold of hearing should be employed, rather than ABX test concepts. This should allow a test which loses no statistical meaning, is potentially easier to take without the use of a computer (eg by burning all the test tracks onto CD), and avoids all the controversy around ABX testing.
The audio samples used were chosen on the basis of relatively little evaluation for sensitivity to tracking distortion, and how easy they were to obtain and use. I'd like to perform a more thorough survey of audio samples, potentially employing automated tools such as PEAQ to estimate audibility.
The interaction between tracing distortion and tracking distortion was ignored for this paper. For the purposes of testing, and more importantly of relating the simulation to real-world playback situations, this is problematic. Tracing distortion may distort the signal in such a way as to make any extant tracking distortion more audible than it otherwise would be. For instance, with an elliptical or hyperelliptical stylus, a tracking error will shift the position of the two stylus contact areas along the time axis - ie, a time delay exists between the two channels that does not exist when either tracing or tracking distortion is considered in isolation. For this simulator to be directly applicable to real-world situations, the importance of tracing distortion must be quantified and bounded, either through placing strict limits on the magnitude of its effect on tracking distortion, or by going all-out and writing an accurate tracing simulation.

Once these issues are addressed to my satisfaction, I will announce the specifics on a listening test open to the public.

Digital Simulation of Phonograph Tracking Distortion

2009-08-10T14:00:00.003-05:00

Phonograph tracking distortion results from the misalignment of a playback cartridge with respect to the cutting head. While it has been researched for decades, it remains a source of mystery: it cannot be isolated, it has not been accurately simulated, and its importance remains undecided. Here, a PCM simulation of horizontal and vertical tracking distortion of extremely high quality is presented, operating on the principle of phase modulation, allowing tracking distortion to be evaluated in isolation with real musical content. In this context, tracking distortion is equivalent to digital audio sampling jitter, with the jitter spectrum equal to the signal spectrum. Implications of this connection, as well as simulation accuracy, preliminary listening test results, and potential applications are discussed.

127th AES Convention, New York; Poster Session P14 (Signal Processing). Currently scheduled for Sunday 11 October at 10am. Date/time possibly subject to change.

Getting this paper out the door has been the predominant reason for the recent dearth of activity on this blog. More info will follow here, including a downloadable simulator implementation, the samples used for the existing listening tests, and an open invitation for a new listening test.

Some thoughts on digital wow/flutter demodulation and analysis

2009-02-28T07:17:00.005-06:00

Background

Accurate analysis of the speed stability of turntables requires a wow/flutter spectrum: a frequency analysis of speed deviation. This is performed by treating a sine wave as a carrier for a modulated signal (the speed deviation). Perform a frequency demodulation of the wave and you have your speed deviation; run an amplitude spectrum on that deviation and you have your wow/flutter spectrum, or apply the IEC wow/flutter weighting filter and obtain a peak w/f measurement to compare against rated specs. However, the peak W/F measurements tend to be vanishingly low for virtually all turntables nowadays and the spectrum is generally considered far more useful. Ladegaard's classic app note on vibration analysis contains good explanations of the importance of wow/flutter spectra when analyzing turntable sound fidelity, and it is quite unfortunate that its measurement has remained generally out of the reach of the lay audiophile.

Historically this measurement has required expensive ($10,000 or more!) lab hardware. Nowadays there are a few ways to do this digitally, but the most commonly encountered ones are rather poor. The crudest method is simply to look at a spectrogram of the tone: this gives the rough outline of the modulation, but is generally horribly CPU-intensive and has a time and frequency resolution so low as to be meaningless. A method of better utility is to simply look at the high-resolution spectrum of the wow/flutter tone; this is what Stereophile typically uses on its turntable tests. This is a trivial operation in any audio package, although some care is necessary to set block lengths, window parameters and overlap to maximize dynamic range. Alternatively, the signal may be mixed (multiplied) with a synthesized sine wave with a frequency matching the average frequency of the test tone. This shifts the test tone at F into mirror tones at 0 and 2F; the latter can be lowpassed out and the former can be run through an amplitude spectrum plot to obtain a baseband spectrum. This might yield improved performance by folding sidebands together to reduce noise, but any error in the synthesis frequency will cause the sidebands to not line up exactly, leading to a sort of "double vision" in the spectrum plot, where every peak appears twice.

All of these methods have major issues specific to their implementation, but by far, their biggest weakness is that they do not demodulate the signal. Accurate FM demodulation serves many extremely useful roles that simpler FFT-based schemes cannot:

It yields an accurate time-domain waveform of the instantaneous frequency. This waveform is absolutely necessary for computing IEC weighted wow/flutter figures.
It removes the many sidebands present in FM signals and integrates them into a more accurate result. Sidebands from low frequency wow may overlap the carrier band belonging to higher frequency flutter in an FFT-based scheme, ultimately leading to a loss of precision in the final spectrum.
Usually, it is very insensitive to amplitude modulation. All FFT-based schemes will flatten both the AM and FM signals into a single spectrum, which may be a major inaccuracy. In some cases, the FM demodulation can also output the AM signal along with the FM one.
Because an exact lock is made at the instantaneous frequency, frequency drift does not affect the precision of the result. This is an inevitable problem with FFT-based schemes.

Analog and digital FM demodulation is based on a variety of schemes. I implemented quadrature demodulation: a direct application of the Hilbert transform. This allows computation of the instantaneous frequency on a sample-by-sample basis, at effectively unbounded frequency resolution. What's more, quadrature demodulation separates the amplitude modulation signal away from the FM signal, and once the FM is computed, the AM signal is effectively free. Also, in comparison to PLL schemes, there are very few tweakable parameters to a quadrature demodulator: set up the filter lengths, set up an input conditioning filter, and go to town. It is one of the most computationally intensive demodulation schemes, but with some light optimization it still runs faster than real time. I got the exact frequency demodulation equation used from the comp.dsp archives, which is a corrected version of an equation in Frerking:

I(n)*Q'(n) - Q(n)*I'(n)
----------------------- = omega (F)
I^2(n) + Q^2(n)

Where I(n) is the original signal, Q(n) is the Hilbert transform of I(n), and I'(n) and Q'(n) are their respective bandlimited derivatives. The formula for amplitude demodulation is:

A = sqrt(I(n)^2 + Q(n)^2)

For the case of a pure single-tone signal, these equations are exact. However that does not occur in a noisy environment and so errors can generally creep into the demodulation, most prominently through clicks and pops in the recording.
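As a rough illustration of the two equations above, here is a minimal NumPy sketch. It is not the LabVIEW implementation described in this post: the function names are made up for the example, the Hilbert transform and the bandlimited derivative are both done with a whole-signal FFT (which assumes the input is periodic over the buffer), and no input conditioning filter is applied.

```python
import numpy as np

def analytic_signal(x):
    """Return I + jQ, where Q is the Hilbert transform of the real signal x."""
    n = len(x)
    gain = np.zeros(n)
    gain[0] = 1.0                      # keep DC once
    gain[1:(n + 1) // 2] = 2.0         # double the positive frequencies
    if n % 2 == 0:
        gain[n // 2] = 1.0             # keep the Nyquist bin once
    return np.fft.ifft(np.fft.fft(x) * gain)

def quadrature_demod(x, sample_rate):
    """Return (instantaneous frequency in Hz, amplitude) for each sample."""
    z = analytic_signal(np.asarray(x, dtype=float))
    # Bandlimited derivative: scale each FFT bin by j*omega (radians/sample).
    omega_bins = 2.0 * np.pi * np.fft.fftfreq(len(z))
    dz = np.fft.ifft(1j * omega_bins * np.fft.fft(z))
    i, q = z.real, z.imag
    di, dq = dz.real, dz.imag
    power = i * i + q * q              # assumed nonzero (carrier present)
    omega = (i * dq - q * di) / power  # radians per sample
    freq_hz = omega * sample_rate / (2.0 * np.pi)
    amplitude = np.sqrt(power)
    return freq_hz, amplitude
```

Dividing the returned frequency by the nominal tone frequency and subtracting 1 gives the fractional speed deviation. On a real recording the FFT's periodicity assumption will produce artifacts near the start and end of the buffer, so those regions would need to be trimmed.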

Implementation

I took my existing RIAA filter equations to apply reverse RIAA if needed - my thinking here is that when recording a wow/flutter tone, the non-flat filter response around the tone frequency will lead to inaccuracies in the flutter spectrum. It should also yield a smaller AM signal amplitude. It has Orban's hardcoded coefficients for 44100 and 96000hz sampling rates. For other sample rates the bilinear transform filter is used, which is notoriously inaccurate at low sample rates.

Input conditioning is currently handled with a Kaiser FIR bandpass, centered on the tone frequency, with beta=5 and (IIRC) 2047 taps. Obviously, the filter must be linear-phase. The subjective cleanliness of the demodulated output waveforms, as well as the quality of Audacity's declick operations on the output files, demands that this filter exist and that it have relatively gentle transition bands. But the demodulation operation itself does not appear that sensitive to wideband noise, so this may be optional for a spectrum analysis. The rejection of the stopbands and the linearity of the demodulation and further analysis are high enough that post-demodulation filtering is rendered unnecessary.

Some care must be taken to keep the time delays of all the various data paths the same. Most importantly, the simple two-tap difference filter x[1]-x[0] will not work because its effective time delay is 0.5 samples. A FIR filter must be constructed with integral delay, along with delay lines to keep I(n) and Q(n) delayed the same amount as I'(n) and Q'(n).
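The delay-matching idea can be shown with the shortest derivative filter that has an integral delay, a three-tap central difference. This is only a stand-in for illustration (the hand-constructed filter used here is far longer and more accurate at high frequencies), and the function name is hypothetical.

```python
import numpy as np

def derivative_and_delayed(x):
    """Return (dx, x_delayed), both delayed by exactly one sample."""
    x = np.asarray(x, dtype=float)
    # Causal 3-tap central difference: y[n] = (x[n] - x[n-2]) / 2.
    # It estimates the slope at x[n-1], so its delay is exactly 1 sample.
    taps = np.array([0.5, 0.0, -0.5])
    dx = np.convolve(x, taps)[:len(x)]
    # Matching delay line: shift the undifferentiated path by the same
    # 1 sample so both paths refer to the same instant.
    x_delayed = np.concatenate(([0.0], x[:-1]))
    return dx, x_delayed
```

The first two output samples of `dx` are start-up transients and should be discarded. With a longer odd-length antisymmetric FIR of N taps, the same pattern applies with a delay of (N-1)/2 samples.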

The Hilbert FIR filter was constructed by crafting an impulse signal and applying a Hilbert transform to it; the derivative FIR filter was hand constructed. I'm using FFT-based convolution, so the number of taps available for the Hilbert and derivative filters is scandalously large. I use 262143-tap filters and the whole thing runs 3x realtime. At those sizes, windowing the filters is unnecessary.

Stereo recordings are summed to mono, and there is some control over the L/R balance to ensure an optimum signal. FM signal outputs are in units of the specified frequency: If the test tone is set to be 3150hz in the demodulator, then 0dbFS = 3150hz, -20dbFS = 10% and so on. The AM signal is in PCM amplitude units but makes no attempt to reverse the RIAA eq if applied in the demodulator.
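The FM output scaling is just a decibel ratio against the configured tone frequency. A small sketch of the arithmetic, with hypothetical function names:

```python
import math

def deviation_to_dbfs(deviation_hz, tone_hz):
    """Level of a frequency deviation when 0 dBFS equals the tone frequency."""
    if deviation_hz == 0:
        return float("-inf")
    return 20.0 * math.log10(abs(deviation_hz) / tone_hz)

def dbfs_to_percent(dbfs):
    """Convert an FM output level in dBFS back to percent speed deviation."""
    return 100.0 * 10.0 ** (dbfs / 20.0)
```

With the tone set to 3150hz, a 315hz deviation comes out at about -20dbFS, and -60dbFS corresponds to about 0.1% speed deviation.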

Signals had DC removal done in Audacity. If there were extremely strong clicks they were usually removed manually.

Normalization of AM signals (to a fraction of the carrier amplitude) is not yet done. This means that while the AM results may be shown in the same chart, they are not directly comparable against one another. They are nevertheless provided for comparison against the FM plots.
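One plausible way to do that normalization, sketched here as an assumption rather than something already in the demodulator, is to take the mean of the AM envelope as the carrier amplitude and express everything relative to it. The function names are invented for the example.

```python
def normalize_am(envelope):
    """Express an AM envelope as a fractional deviation from its mean."""
    carrier = sum(envelope) / len(envelope)   # mean (DC) level of the envelope
    return [e / carrier - 1.0 for e in envelope]

def peak_to_peak_percent(envelope):
    """Peak-to-peak envelope fluctuation as a percentage of the mean level."""
    carrier = sum(envelope) / len(envelope)
    return 100.0 * (max(envelope) - min(envelope)) / carrier
```

Normalized this way, AM results from recordings made at different levels could be compared directly.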

Spectrum analysis was done in my usual handrolled code. Windowing is Hamming to maximize frequency resolution and because dynamic range requirements are not great; each FFT window overlaps the previous one by 75%.
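For readers who want to reproduce this kind of plot, the following is a minimal NumPy sketch of an averaged amplitude spectrum with a Hamming window and 75% overlap. It is not the handrolled code mentioned above; the function name and the amplitude scaling choice are assumptions.

```python
import numpy as np

def averaged_spectrum(x, sample_rate, window_seconds, overlap=0.75):
    """Average Hamming-windowed amplitude spectra over overlapping blocks."""
    x = np.asarray(x, dtype=float)
    n = int(round(window_seconds * sample_rate))
    if len(x) < n:
        raise ValueError("signal is shorter than one analysis window")
    hop = max(1, int(round(n * (1.0 - overlap))))
    window = np.hamming(n)
    # Scale so a bin-centered sine of amplitude A reads approximately A.
    scale = 2.0 / window.sum()
    blocks = [np.abs(np.fft.rfft(x[s:s + n] * window)) * scale
              for s in range(0, len(x) - n + 1, hop)]
    freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate)
    return freqs, np.mean(blocks, axis=0)
```

The frequency resolution is the reciprocal of the window length, so a longer window separates closely spaced peaks better but leaves fewer blocks to average.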

IEC wow/flutter measurements are not yet implemented. IEC drift measurements are not yet implemented, although even if they were, they allegedly require a 30s averaging time which might be problematic when dealing with 60s wow/flutter tracks.

I implemented this in LabVIEW 8.2.1. (Full disclosure: I am a former employee of NI and I own shares.)

Samples

AT-LP2D-USB recording of the Ultimate LP wow/flutter, supplied by knowzy.com
My recording of the STR-151 3khz tone - Technics SL-1200, Audio Technica AT-OC9MLII. Recorded with flat eq on Yamaha GO46.
A recording of a HFS75 wow/flutter track from a currently-unnamed contributor. Unfortunately this track is only ~10 seconds long, so especially the wow results will have a lower resolution. The noise levels seem to be much higher too.

Results

Assuming a carrier frequency of 3.15khz, a peak speed deviation of 2% and a peak modulation frequency of 250hz, Carson's Rule suggests a signal bandwidth of 626hz. I therefore bandpassed the input signal to a 700hz bandwidth before demodulating. (A discussion and analysis of higher frequency flutter is well beyond the scope of this paper.)
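The bandwidth figure can be checked directly. Carson's Rule estimates the FM bandwidth as twice the sum of the peak frequency deviation and the highest modulation frequency; the sketch below takes the peak modulation frequency as 250 Hz, which is the value that reproduces the 626hz result. The function name is only for illustration.

```python
def carson_bandwidth(carrier_hz, peak_deviation_fraction, max_mod_hz):
    """Carson's Rule: BW = 2 * (peak deviation + highest modulation frequency)."""
    peak_deviation_hz = carrier_hz * peak_deviation_fraction
    return 2.0 * (peak_deviation_hz + max_mod_hz)

# 2% of 3150 Hz is 63 Hz of peak deviation; 2 * (63 + 250) is about 626 Hz.
bandwidth_hz = carson_bandwidth(3150.0, 0.02, 250.0)
```

A 700hz bandpass therefore leaves a small margin on either side of the estimated signal bandwidth.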

Spectrum analysis is done at window lengths of 28.8 and 1.8 seconds. At 28.8s, the record warp peaks are extremely visible while the underlying spectrum is easily seen. At 1.8s, many more windows are available to average together and the warp harmonics completely merge together, resulting in a clearer, more accurate plot. The 28.8s plots go from 0hz to 20hz in a linear scale while the 1.8s plots progress logarithmically from 1hz to 250hz. Plot smoothing begins at 100hz for the 1.8s plots.

For the following spectrum plots, the AT-LP2D results are in blue, my results are in red, and the HFS75's in green. The AT-LP2D had an average speed deviation of 0.167% relative to 3.15khz; the Technics deviated by 0.42% against 3khz, and the HFS75 recording deviated by -2.2% against 3khz - I don't have a good explanation for the latter, except perhaps that the HFS75 track might be something like 2950hz instead of 3000hz; also, the test records for the latter two tables are considerably older than the first, and this might reflect the quality of the lathes more than the quality of the turntable.

LP2D demodulated waveforms: black is FM, red is AM

Technics demodulated waveforms: black is FM, red is AM

HFS75 demodulated waveforms: black is FM, red is AM

Frequency modulation spectrum, 0-20hz

Frequency modulation spectrum, 1-250hz

Amplitude modulation spectrum, 0-20hz

Amplitude modulation spectrum, 1-250hz

Discussion

Analysis of the process:

In general, because different test records were used for each recording, the turntables are not entirely comparable based on these measurements except in situations where a clear interpretation can be made. They are plotted on the same charts primarily for easier viewing.
Demodulation is extremely sensitive to input noise - the HFS75 recording was substantially noisier and that really did compromise all its plots. Its modulation spectrum above 50hz simply cannot be trusted. In general, maximizing the SNR of the tone will maximize the SNR of the flutter spectrum. Using a declick utility may improve performance.
The averaged flutter spectra required roughly 30s of input signal to mostly average out, so at least this length is necessary for a good plot. The HFS75 track is 10 seconds long which is way too short. Testing suggests that enabling/disabling inverse RIAA eq does not significantly change the AM or FM waveforms.
Testing at higher bandwidths does not show significant flutter beyond 200hz. The noise levels are markedly higher starting at 500hz, and background noise tones in the recording (such as a 1khz sine and harmonics on the LP2D) show up spuriously in the modulation spectra.
Compare to Ladegaard's B&K plots - is my noise floor higher or lower? I can't really tell at the moment.

Analysis of plots:

The LP2D plots show clear peaks at 18hz, 40-50hz and 55hz, with 20-30db of SNR. Warp harmonics can be very clearly separated from the background noise floor. These are good things because they show that the flutter analysis is revealing actually useful stuff.
The Technics AM flutter plot rises remarkably above 200hz, while the other tables do not. The reason why this is so potentially remarkable is because the speed control loop is supposed to operate somewhere around this frequency. However, it does not show up on the Technics FM plot, which is a strike against this being control-related.
While I am not automatically normalizing the AM results yet, I can say that the average AM DC value for the LP2D was 0.20955, which means the signal has a peak-to-peak fluctuation of roughly 3.6%. This is astonishingly high. The Technics AM peak-to-peak value is roughly 1.6% which is also very high. There are no really good explanations for this yet.
There are other puzzling things about the AM spectra. Besides the rising response on the Technics plot and the large magnitude, there is a hump in the AM response of the LP2D at 140hz, several peaks on the FM flutter plot that do not exist in AM plots, and the FM wow plot indicates very many tones that do not exist on the AM plot.

Submarine remastering: The secret, inferior version of Pearl Jam's "Vs."

2009-01-08T14:04:00.012-06:00

Despite no documentation, there are at least two separate masterings of Pearl Jam's LP "Vs." floating around. One of them is clearly inferior to the other. Simply put, one of the masters was highpassed at around 40hz, with the resulting digital overs clipped, to generate the new master. Such a master suffers from both reduced bass and greater amounts of clipping.
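To see why highpassing an already hot master can create overs in the first place, here is a toy sketch. The one-pole filter and the test signal are assumptions chosen for simplicity and say nothing about what filter was actually used on this album; the point is only that removing low frequencies reshapes the waveform and can push peaks past full scale, which then get clipped.

```python
import math

def one_pole_highpass(samples, cutoff_hz, sample_rate):
    """First-order RC-style highpass filter."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out = []
    prev_x = prev_y = 0.0
    for x in samples:
        y = alpha * (prev_y + x - prev_x)
        out.append(y)
        prev_x, prev_y = x, y
    return out

def hard_clip(samples, limit=1.0):
    """Clip every sample to the range [-limit, +limit]."""
    return [max(-limit, min(limit, s)) for s in samples]

# Toy input: a 10 Hz square wave peaking at 0.9, i.e. below full scale.
sample_rate = 44100
square = [0.9 if (n // 2205) % 2 == 0 else -0.9 for n in range(22050)]
filtered = one_pole_highpass(square, 40.0, sample_rate)  # peaks now exceed 1.0
clipped = hard_clip(filtered)                            # overs sawed off
```

A square wave is an extreme case, but heavily limited music behaves similarly: the flat tops tilt once the low end is filtered, and the tilted corners overshoot.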

I haven't seen any documentation, either online or on the printed packaging of "Vs." that I just saw at the store, to suggest that this remaster ever took place. Therefore I'll dub it a "submarine remaster". Just like in the bad old days of vinyl, when different pressings of the same record might have used vastly different mastering practices with their only documentation being their dead wax info, a submarine CD remaster may be considerably different from the original, with a different pressing indicated by a different ISRC.

Here's a plot of the eq differences between the two masterings, measured from the first 60 seconds of "Animal". The bottom axis is frequency in hz; the side axis is attenuation in db. A 1-second window was used with a 50% overlap. At 20hz the masterings differ by 12db, so this ought to be quite audible with a suitably good system.
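A comparison like this can be sketched in a few lines of NumPy: average the power spectra of each version using 1-second windows with 50% overlap, then take the ratio in dB. The Hann window, the function names, and the assumption that both inputs are mono float arrays at the same sample rate and roughly time-aligned are choices made for this example, not a description of the tool used for the plot.

```python
import numpy as np

def average_power_spectrum(x, sample_rate, window_seconds=1.0, overlap=0.5):
    """Average windowed power spectra over overlapping blocks."""
    x = np.asarray(x, dtype=float)
    n = int(round(window_seconds * sample_rate))
    if len(x) < n:
        raise ValueError("signal is shorter than one analysis window")
    hop = max(1, int(round(n * (1.0 - overlap))))
    window = np.hanning(n)
    acc = np.zeros(n // 2 + 1)
    count = 0
    for start in range(0, len(x) - n + 1, hop):
        acc += np.abs(np.fft.rfft(x[start:start + n] * window)) ** 2
        count += 1
    return np.fft.rfftfreq(n, d=1.0 / sample_rate), acc / count

def eq_difference_db(a, b, sample_rate):
    """Per-frequency level of b relative to a, in dB (negative = b is quieter)."""
    freqs, power_a = average_power_spectrum(a, sample_rate)
    _, power_b = average_power_spectrum(b, sample_rate)
    eps = 1e-20                        # avoids log of zero in empty bins
    return freqs, 10.0 * np.log10((power_b + eps) / (power_a + eps))
```

Because the spectra are averaged over many blocks, small timing offsets between the two rips matter far less than they would in a sample-by-sample difference.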

Here's a zoom-in comparison of the waveforms, illustrating the two stages of clipping in one of the masterings, where one of the clipping stages matches the clipping found in the other mastering. This may not be audible, but it certainly could have been avoided and is unfortunate.

Unfortunately, I don't have ISRCs for the two pressings, but ~~I am 95% certain that the bassier/less clipped master is from the original 1993 CD, and the newer/more distorted master is from the 2004 Sony reissue.~~ (Update: After speaking with greynol further and comparing notes, we now have no idea exactly which master corresponds to which release. Both of our CDs predate 2004, but his has the two clipping slopes identified in the red plot above, while my CD has the single clipping slope of the black plot. Comments here affected by this have been struck.)

The pressings both ReplayGain to the same value - no gain or added dynamic range compression was applied. I don't know which version is used for iTunes/Amazon music downloads. Looking at a copy of the Vs. CD at a store, there is no visual indication on the outside of the case that it is either the 1993 or the 2004 pressing.

So what's my point? Besides being a very odd example of a "remaster", and one that's worth posting about, the circumstances surrounding this release suggest that no notice may be given if a newer pressing of an album is a remaster. ~~Those looking for the "authentic" version of Vs., and certainly the one with the most bass, should almost certainly look towards the original 1993 CD.~~

More generally, if you're looking for an "authoritative" master of a back catalog CD, it may still be important to buy the original pressing, even if a newer pressing does not advertise any remastering. Or, if the newer pressing surreptitiously corrected a flaw in the original, vice versa. But if the CRCs match between two pressings, then the newer pressing matches the old one exactly. And if they don't, it's still possible the differences are trivial, like a difference in track offsets. A thorough analysis must be done to concretely identify the differences between each master. Got that?
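The CRC-then-offset check can be sketched with the standard library. This assumes each rip is available as raw PCM bytes (16-bit stereo, so 4 bytes per frame); the function name, the return convention and the minimum-overlap rule are all inventions for the example, and a CRC match is treated as identity even though collisions are possible in principle.

```python
import zlib

def compare_pressings(pcm_a, pcm_b, max_offset_frames=2000, frame_bytes=4):
    """Classify two raw PCM byte strings.

    Returns ("identical", 0), ("offset", k) or ("different", None).
    A positive k means pcm_b starts k frames into pcm_a; negative is the reverse.
    """
    if len(pcm_a) == len(pcm_b) and zlib.crc32(pcm_a) == zlib.crc32(pcm_b):
        return ("identical", 0)
    shortest = min(len(pcm_a), len(pcm_b))
    min_overlap = shortest // 2        # ignore matches on tiny overlaps
    for frames in range(1, max_offset_frames + 1):
        shift = frames * frame_bytes
        overlap = shortest - shift
        if overlap < min_overlap or overlap <= 0:
            break
        if pcm_a[shift:shift + overlap] == pcm_b[:overlap]:
            return ("offset", frames)
        if pcm_b[shift:shift + overlap] == pcm_a[:overlap]:
            return ("offset", -frames)
    return ("different", None)
```

A result of "different" only says the audio is not bit-identical at any tested offset; working out whether that difference is EQ, clipping or something else still takes the kind of analysis shown in the plots above.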

Thanks to greynol for the heads up.

Update: See above for new notes on pre-2004 pressings.

A good USB turntable guide

2009-01-01T13:44:00.003-06:00

USB Turntable Comparison Guide

The maintainer of Knowzy (a commission-based website with detailed guides on selected topics) recently posted a guide to USB turntables based on a lot of input from the HydrogenAudio community, including myself. It is very accurate and free of audiophile hyperbole. I'd recommend it to people who are considering purchasing a USB turntable and would like to know all the technical reasons to (not) purchase one. (FWIW, I have absolutely no ties to Knowzy besides my contributions to the guide.)

Geek-out time: a CD4 record

2008-12-17T00:26:00.005-06:00

CD4 records are sufficiently uncommon (or sufficiently expensive) that I was quite pleased when I found this one for $4. (In fact, this specific pressing is not currently in eBay's auction records, or GEMM's. It's mentioned once on popsike.) Ironically, I don't have a CD4 decoder (yet), I don't own a surround sound speaker system, and I'm not a fan of the Raspberries. I haven't even listened to the music yet. I just wanted to see a used LP have a frequency response go out to 45khz.

One occasionally still sees the argument that "anything on vinyl above 30khz gets wiped out after a few plays". While that may certainly be true for a large number of turntables in use today, it certainly hasn't happened here - this record very clearly has been played before.

Think classical music mastering does not employ clipping? Think again

2008-11-13T10:56:00.005-06:00

This is a waveform plot from the opening two seconds of the third movement of Béla Bartók's second piano concerto, conducted by Pierre Boulez, recorded in this decade, and released by Deutsche Grammophon in 2005. There is an opening drum hit, shown here, that is fairly unequivocally clipped near 0dbFS for 121 samples (2.7 ms). This is a large length of clipping by most standards.
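The 2.7 ms figure follows from the sample count at the CD sample rate of 44.1 kHz. A small sketch of how such a run could be measured, assuming samples normalized to the range -1.0 to 1.0 and an arbitrary "near full scale" threshold of 0.99; the function names are hypothetical and this is not the tool used to make the plot.

```python
def longest_clipped_run(samples, threshold=0.99):
    """Length of the longest run of consecutive samples at or above threshold.

    Simplification: polarity is ignored, so adjacent positive and negative
    full-scale samples would be counted as one run.
    """
    longest = current = 0
    for s in samples:
        if abs(s) >= threshold:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest

def run_length_ms(run_samples, sample_rate=44100):
    """Convert a run length in samples to milliseconds."""
    return 1000.0 * run_samples / sample_rate
```

For a run of 121 samples, `run_length_ms(121)` gives roughly 2.74 ms.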

In many music listeners' minds this would be considered a gross mastering error. But the sound quality is nevertheless spectacular, and the long-term loudness plot shows no signs of hypercompression (or indeed, perhaps any compression): the peaks of the music are merely being sawed off. However, I have no independent proof showing that the recording has not been compressed in other ways.

A very similar style of mastering can also be found on two other recent Boulez discs: one of Varèse, and another with his own ...explosante-fixe... and Notations (the latter played by Aimard). These recordings consistently brickwall-clip at or near 0dbFS, for between 4 and 100 samples, but otherwise are mastered clean as a whistle. But another recent DG CD of mine (Aimard's performance of Bach's Art of the Fugue) has no clipping whatsoever.

It's important to note that, as I've mentioned in the past, drum transients are perhaps the kind of signal most forgiving of clipping/limiting. As they are already very spread out in the frequency domain, the added smearing from this kind of clipping could be rather effectively masked. Listening to this music, there is no "smoking gun" kind of artifact that lets me say with certainty that this distortion is audible.

Nevertheless, that DG would choose to make this sort of compromise in their modern releases is troubling. I used to consider Boulez's recording of Amériques one of my reference tracks for critical listening; I'm considering searching for alternatives now.

Original thread on HydrogenAudio.

Update 19 May 2009: Steve Hoffman notes that the recent DG CD of Boulez's Mahler 8 is also clipped.

Some clarifications on "Lars's Paradox"

2008-10-02T02:32:00.004-05:00

From what I have heard of the CD master of Death Magnetic versus the GH3 mix, the GH3 mix is quite superior and the CD master is inferior even to other contemporary records. In that I'm pretty much in agreement with most other commentary.

That said, I certainly speak for nobody besides myself when I say that. My musical tastes are, shall we say, immodest. My idea of a song with real dynamic range is the recent Boulez/CSO recording of Varèse's Amériques, or perhaps the Autechre and Hafler Trio collaboration æo³/³hæ. Or Mahler 3. It's to a certain degree ironic that we are debating relatively minuscule differences in dynamic range compared to stuff like that.

While many people apparently agree with my impressions of DM, many people don't. Look at the roadrunner thread below, or the comments for this YouTube video, or any number of dreck threads on metallicabb.com, and you'll find at least a few people who either do not mind the CD or believe it is better than the GH3 mix. It's been pointed out that some of their thinking is rather stupid - "this is what metal is supposed to sound like! you aren't a true metal fan if you don't like the CD mix! Roll over and die if you don't like loud music, pops!" - etc. But spaced between that are people who are saying, well, the CD sounds just fine. And a lot of people seem to have a problem with that - they do not understand why people would prefer music that is of far higher distortion than necessary.

I really can buy the argument that some people - artists and listeners alike - really are looking for this kind of distortion. Listen to some samples of Times New Viking's Rip It Off, and then marvel at the adoration it has received in some circles of the music community (and put the gun away - I know you hate Pitchfork, but hear me out here).

The layer of fuzz works like a security blanket-- a way of creating not just a distinctive sound, but of putting up an awning of safety over them and their listeners. Only the slightest bit of straining brings you to the pop virtues of these songs, on the band's own terms. Sure, it's an affectation, but its just another way of using the studio as an instrument in a way that makes these songs more intimate by design-- for better or worse, you can't sell a Volkswagen with a Times New Viking song. If cleaner production means truckloads of new bands who can summon their influences with little effort, and even less enthusiasm or creativity, then I'll stick with my tinnitus, thanks.

It's like I've said elsewhere: where do you draw the line on artistic intent? Where do you draw the line on unlistenability? Would people seriously believe that Venereology would sound "better" if Merzbow didn't use as much hard limiting? What kind of limits are acceptable on some genres of music when music in other genres would completely obliterate them?

Second-guessing the musician is easy, and occasionally even accurate, but it can also be cheap and unpersuasive. If mainstream production/mastering practices are being called into question, yet the musicians' music itself is not, the musicians will always set the terms of the debate. Examples abound of artists who seemingly add distortion for distortion's sake - Billy Corgan and Iggy Pop are two examples of people who helped produce records from some time ago that were reviled for their lack of dynamics and ear-piercing distortion (Zwan's Mary Star of the Sea, in Corgan's case).

If you're not prepared to put your money where your mouth is, and stop buying music from the bands you love with the mastering you hate, then you will need to empathize with those artists, and those listeners who agree with their decisions, and truly understand what's going on in their heads compared to yours. Only then will you be able to reconcile their musical tastes with your own, and hopefully, support a solution that makes everybody happy. Otherwise, Metallica and their business associates will continue to sleep quite soundly over all of this.

Lars's Paradox, or, Everything You Know Is Wrong

2008-09-30T03:11:00.009-05:00

"Listen, there's nothing up with the audio quality. It's 2008, and that's how we make records... Of course, I've heard that there are a few people complaining. But I've been listening to it the last couple of days in my car, and it sounds fuckin' smokin'."

Look, people. The dude isn't fucking deaf. Rick Rubin is also, contrary to popular opinion, not deaf (he owns a rather nice hifi, in fact). Metallica as a band is not deaf. Vlado Meller is not deaf. Millions of music listeners are not deaf. And now quite a few people are coming out of the woodwork and saying that Death Magnetic sounds just fine, thank you very much. They too are not deaf.

To suggest otherwise, or to suggest that something is inherently wrong with the way they are listening, is merely fallacious smearing, and honestly, unintelligent. Continuing to insist that music products like Death Magnetic are not of a sufficiently high quality without further proof - especially in the face of #1 sales - is only going to continue the abject apathy that the rest of the music world seems to treat this whole issue with.

Certainly, Rick Rubin knows exactly what he's doing when he produces records like this, and he is quite certain in his belief that it is towards delivering a superior product, as his interview with Michael Fremer made abundantly clear:

Ultimately, if you listen on a car sound system or in the mainstream place where most people listen to music—cars, boomboxes sound systems you get at (chain stores), and if you “A/B” the less compressed version to the more compressed version, you pick the compressed version... Even in a good car stereo. We do shoot-outs all the time. I master with as many as five different mastering engineers mastering the same album and then we “A/B” them and it’s interesting, Vlado wins nine out of ten times, and he claims it’s not him. He’s got technology in that room that’s a 2 million dollar mastering suite that other people don’t have. All I’ll tell you is that my whole job in life is to A/B things, that’s all I do, and for some reason, I don’t know that what he’s doing is necessarily the best, but I haven’t heard anything to beat it and we try.

That the album distorts needlessly is established beyond a reasonable doubt, thanks to mastering engineer Ted Jensen's comments, and comparisons with the vinyl. I haven't bought the album, but I have listened to the free clips from Metallica's web site, and the YouTube GH3 rips, enough to know that I'd prefer the GH3 versions.

But let's have some perspective here. The truth of the matter is that this is a serious counterexample to the entire narrative of the "loudness war": that, despite diverse objective and subjective evidence that modern hypercompressed mastering styles degrade sound quality and music appreciation, the vast majority of music listeners, at all experience levels, at least continue to buy such purportedly terrible masterings, and may even prefer them to less compressed styles. I am going to call this Lars's Paradox, since Lars Ulrich, belligerent bastard that he is, has managed to wade neck-deep into the middle of this like he always tends to do. But whether due to a similar level of belligerence, or devil's-advocacy, or whatnot, I'm actually going to take his side here for a minute.

I believe any fight against hypercompressed mastering in the "loudness war" will founder until this paradox is resolved. More concretely, and extending to other issues, I am claiming the following:

Claims of the hypercompressed style resulting in reduced musical enjoyment are completely unproven except on personal, anecdotal, and therefore meaningless, grounds. Real studies need to be done, in real listening environments, to show that the application of hypercompression is a detriment to popular music and the popular music industry.
Objective evidence is inaccurate in arguments regarding mastering. Objective evidence cannot prove statements about enjoyment. Such analyses must be more explicit in their relationship between the music, the dynamic range, and the dissonant distortions if they are to be ultimately taken seriously. Waveform plots, ReplayGain, RMS, and pfpf are all highly deficient in one way or another here.
(Lars's Paradox) Evidence suggests that the hypercompressed style is preferred by a large share, and probably most, of the popular music listening population. Both audio professionals and untrained listeners share this preference. For the uncompressed styles to be taken more seriously, it must be shown concretely that this preference is based on faulty measurements, or is otherwise false in meaningful and important ways.

As long as these points stand, the argument against hypercompression will remain fundamentally flawed, and popular music will continue to be released in the hypercompressed style. Regardless of how many petitions get signed. Marginal releases, such as those on vinyl and high-res formats, obviously don't follow this logic as much, nor do classical and experimental music, etc. By and large, those are not popular genres or (yet) popular formats, and this discussion revolves largely around popular music. But there's simply no hope for popular CD/iTunes releases to follow any different mastering style as long as these issues exist with this whole argument.

(I have my own ideas, revolving mostly around psychoacoustics, for resolving the paradox, but they are as yet unfinished.)

Update, October 1: Debate on the new JusticeForAudio.org forums.

Update, October 2.

Waveform Plots Considered Harmful

2008-09-17T21:00:00.006-05:00

(Revision 2.)

One of the most commonly used audio measurement tools today is the waveform plot. This is the graph of the audio signal versus time. It directly relates the amplitude of the signal to the various parts of the musical piece, and seems very easily interpreted. It is also extremely sensitive to changes in mastering, and so is the tool of choice used for illustrating "the loudness war". From sound editor screenshots to animations to elaborate, egg-in-a-frying-pan-style "this is your music on drugs!" YouTube videos, it has become something of a mascot for hypercompression.

Unfortunately, it can also be highly misleading. In general, waveform plots cannot be solely trusted as an evaluator of sound quality. Use caution when using them, because they may lie to you.

An Example: Vinyl waveforms can look better than they really sound

The most common point made with a waveform plot is to prove or disprove the existence of clipping, or more generally, hard limiting. Clipping refers to a music signal that is clamped to a fixed magnitude if that magnitude is exceeded - the peaks of the music are "clipped" off. Hard limiting is roughly the same concept - clipping is a form of hard limiting - but allows for more flexibility in how the fixed magnitude is approached. Typically, a hard limiter will allow for a softer transition between the "undisturbed" region of the music and the "limited" region.

The outer shape of a waveform plot shows the peak level of the music over time. If that peak is at exactly the same level across an entire song, down to a fraction of a dB, it is believed to be good evidence that a limiter was applied to the music; the flatter the peaks are, the more limited the music is, and thus the more distorted the music is. Conversely, the more ragged the peaks are, the less distorted the music is.

Rather than describe an elaborate explanation as to why this is not true, it is far easier to just find a counterexample. Here's a waveform plot of a song from the album Mirrored, by Battles, in both CD and LP versions, loudness-equalized.

(Battles' "Leyendecker". White: CD version amplified by -7.44 dB. Red: Vinyl version. Recorded with Technics SL-1200, AT-OC9MLII, flat transferred with Yamaha GO46 at 44.1 kHz. 2M samples of left channel at +4M samples from start of track. Time axis in seconds.)

"Excellent! The vinyl really is better!" you think. Visually, the CD is quite obviously "hypercompressed", and the vinyl is clearly not - in fact, the vinyl peaks around 6.6db higher than the CD does, once the two tracks are loudness-equalized. Based on this sort of plot, 99% of the music listeners who care about dynamic range in their music would believe the mastering on the vinyl release is superior.

But 99% of music listeners are dead wrong.

(Same plot; zoomed in to 27.43-27.44s)

Vinyl is full of linear distortions and noise sources that do not exist on CD, and so the waveforms rarely if ever match at short time scales, even if the mastering is the same. Such is the case here. But it should be quite obvious that the clipping that exists on the CD also exists on the LP. The "flat" regions on the CD very closely match the "flat" regions on the LP. [1]

This is fairly incontrovertible proof that the vinyl release of Mirrored originates from masters utilizing a similar amount of dynamic range compression as exists on the CD. And yet, the first waveform plot appeared to clearly indicate a higher dynamic range. In the case of vinyl specifically, this conclusion can be wrong for a number of reasons:

The recorded vinyl waveform is not sample-aligned with the CD waveform. Intersample peaks that are not visible on the CD waveform suddenly become visible on the vinyl waveform. In exceptional cases this can artificially raise the peak level by a few dB.
Variations in the frequency response of the vinyl system (both in recording and playback) can artificially raise peak levels. One particularly strong culprit here is low-frequency rumble at the tonearm-cartridge resonance (around 10hz) which can easily dwarf all the other musical content on the recording. (In this comparison, the rumble was filtered out.)
Tracing and tracking error in the tonearm system can artificially raise peak levels.
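
The first of these points can be illustrated with a contrived case. The sketch below is not the processing used for the plots above; it assumes an idealized periodic test signal and uses a plain DFT phase shift (the helper names `dft_shift`, `peak_db` and `clipped_quarter_rate_tone` are made up for this example) to show that the same clipped waveform, sampled half a sample off, can report a higher peak.

```python
import cmath
import math


def dft_shift(x, delay):
    """Delay a periodic real signal by a fractional number of samples.

    Uses a direct O(N^2) DFT, applies a linear phase to each bin, and
    inverts. Fine for a short toy signal; a real tool would use an FFT.
    """
    n = len(x)
    spectrum = [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
                for k in range(n)]
    out = []
    for t in range(n):
        acc = 0j
        for k in range(n):
            # Map the bin index to a signed frequency so the result stays real.
            f = k if k <= n // 2 else k - n
            acc += spectrum[k] * cmath.exp(2j * math.pi * f * (t - delay) / n)
        out.append(acc.real / n)
    return out


def peak_db(x):
    """Peak sample magnitude in dB relative to full scale (1.0)."""
    return 20 * math.log10(max(abs(v) for v in x))


def clipped_quarter_rate_tone(n):
    """An overdriven tone at a quarter of the sample rate, clipped to +/-1."""
    return [max(-1.0, min(1.0, 4.0 * math.sin(math.pi * t / 2 + math.pi / 4)))
            for t in range(n)]


original = clipped_quarter_rate_tone(16)   # samples are +1, +1, -1, -1, ...
shifted = dft_shift(original, 0.5)         # same waveform, half a sample off
print(f"peak before: {peak_db(original):+.2f} dB, after: {peak_db(shifted):+.2f} dB")
```

This test signal is chosen to make the effect large, which is consistent with the "few dB in exceptional cases" figure above; it says nothing about how often real records behave this way.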

A similar situation exists with FM radio, which is also compressed very aggressively, although waveform plots of recordings do not show an excessive amount of limiting. But applying the preemphasis, resulting in a signal analogous to what is actually transmitted, results in a much more obviously limited waveform.

Another Example, with a CD

Metallica's Death Magnetic is being widely criticized as one of the most sonically unpleasant popular records of recent memory. And yet, its waveform plot, while ugly, is not perfectly flat:

"My Apocalypse"; CD; source.

Many albums that visually appear worse than this sound much better.

Why can waveform plots lie? Blame your ears

The human ear may be relatively insensitive to limiting, if it only happens a few times a second - for instance, if only the percussion is affected. Bob Katz has commented in Mastering Audio that in regards to peak limiting, "a rule of thumb is that short duration (a few milliseconds) transients of unprocessed digital sources can be reduced by 4 to 6 dB with little effect on the sound." Those people who would worry about 10-20 sample runs of clipping on CDs should note that 1 ms is about 44 CD samples; examples of hard limiting become exponentially harder to find as the number of consecutively limited samples increases. Mastering engineers use peak limiting all the time because, in all honesty, the human ear often lets them get away with it! In other words, clipping that may appear blindingly obvious on the waveform plot may not be audible.

On the flip side, when limiting occurs with periodic signals - like those emitted by most non-percussion instruments - it becomes a form of high-order distortion, and may become very, very audible. In the worst case, limiting of very few samples at a time (1-5) may be audible.
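
To make the "high-order distortion" point concrete, here is a minimal sketch that clips a steady sine wave and measures the harmonics that appear. Everything in it is an assumption made for illustration: a 1 kHz tone at 44.1 kHz, driven 6 dB past the clipping ceiling, and the helper names `harmonic_amplitude` and `clipped_sine`. It shows only that the harmonics exist; it says nothing about their audibility.

```python
import math


def harmonic_amplitude(x, cycles):
    """Amplitude of the sinusoid that completes `cycles` periods over len(x)."""
    n = len(x)
    re = sum(v * math.cos(2 * math.pi * cycles * t / n) for t, v in enumerate(x))
    im = sum(v * math.sin(2 * math.pi * cycles * t / n) for t, v in enumerate(x))
    return 2 * math.hypot(re, im) / n


def clipped_sine(n, cycles, drive, ceiling=1.0):
    """A sine of amplitude `drive`, hard-clipped to +/-ceiling."""
    return [max(-ceiling, min(ceiling, drive * math.sin(2 * math.pi * cycles * t / n)))
            for t in range(n)]


N, CYCLES = 441, 10  # ten periods of a 1 kHz tone at 44.1 kHz
clean = clipped_sine(N, CYCLES, drive=1.0)    # never reaches past the ceiling
clipped = clipped_sine(N, CYCLES, drive=2.0)  # 6 dB over the ceiling

for h in (1, 2, 3, 5, 7):
    a_clean = harmonic_amplitude(clean, h * CYCLES)
    a_clip = harmonic_amplitude(clipped, h * CYCLES)
    print(f"harmonic {h}: clean {a_clean:.4f}, clipped {a_clip:.4f}")
```

Because the clipping here is symmetric, the energy lands on odd harmonics, and it repeats on every cycle of the tone rather than a few times a second.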

Furthermore, different forms of hard limiting have different effects on the sound quality. Clipping is merely one of the most dissonant forms of hard limiting. A large market exists for high-quality limiters that remove as much of the peak as possible, while maintaining as much of the sound quality as possible. And yet all of these limiters will appear more or less the same on a zoomed-out waveform plot. How can sound quality be evaluated from such a plot when one can't use it to tell apart different forms of limiting?

When limiting is tastefully done, by a professional mastering engineer, it can negligibly distort the music while making it much more enjoyable for those of us with less expensive or desirable listening environments. In this situation, the effect of more general forms of dynamic range compression may be more objectionable, in that the reduction of dynamic range is worse than the addition of light amounts of limiting distortion.

How can I tell if one master is superior to another if waveform plots are bogus?

Unfortunately, while many alternatives to the waveform plot exist for measuring mastering quality, they all have major drawbacks. None of them are trustworthy.

Many people use ReplayGain track gain values as an estimator for the dynamic range of the music. The most important problem is that ReplayGain is easily fooled by global changes in gain - so that the "fake" dynamic range of the Battles recording above would completely fake out ReplayGain. Certainly, ReplayGain values place limits on how much dynamic range may exist in the music, and it can be used to guess a lot about the mastering, but it does not, in fact, actually measure dynamic range, nor does it measure distortion. Many albums considered great-sounding from the likes of Radiohead and Gorillaz tend to ReplayGain in the -10 dB range, which many people would consider "hypercompressed".
Audio engineers use an "RMS" value that is conceptually about the same as ReplayGain, and shares its faults. However, it is an acceptable figure of merit proving that Iggy Pop is a crappy producer.
pfpf was more or less explicitly designed for this sort of thing, but only in the context of changes in dynamic range - not in terms of clipping, or any kind of timbral changes. This approach should be fairly good at teasing out changes in dynamic range compression, but quite insensitive to limiting. Also, pfpf is still sort of half-baked, and it's not clear yet how close the long/medium/short term numbers need to be before concluding that two samples are of the same master.
The Sparklemeter was designed by somebody else on somewhat similar lines as pfpf, but specifically relating to measuring mastering quality. See that thread for my comments.
Some people are content with using the "Peak Level" result from CD audio rippers such as EAC as an indicator of audio quality: the idea being that CDs with a peak level lower than 99% run a much lower risk of hard limiting than those at that peak level or above. This is wrong in so many ways it's hard to count. It is, quite simply, the least accurate method possible of evaluating mastering quality.
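
A small sketch shows why the peak-level and gain-based numbers in this list say so little. It assumes a synthetic clipped tone as a stand-in for a real track, and it uses plain RMS as a simple loudness figure (this is not the ReplayGain algorithm). The helper names are invented for the example.

```python
import math


def peak_dbfs(x):
    """Peak sample magnitude in dB relative to full scale (1.0)."""
    return 20 * math.log10(max(abs(v) for v in x))


def rms_dbfs(x):
    """Root-mean-square level in dB relative to full scale."""
    return 10 * math.log10(sum(v * v for v in x) / len(x))


def crest_factor_db(x):
    """Peak-to-RMS ratio in dB."""
    return peak_dbfs(x) - rms_dbfs(x)


def count_samples_at_peak(x, tolerance=1e-9):
    """Number of samples sitting at (or within rounding of) the peak value."""
    peak = max(abs(v) for v in x)
    return sum(1 for v in x if abs(v) >= peak * (1 - tolerance))


# A heavily clipped 1 kHz tone, then the very same samples turned down to 90%.
loud = [max(-1.0, min(1.0, 2.0 * math.sin(2 * math.pi * 10 * t / 441)))
        for t in range(441)]
quiet = [0.9 * v for v in loud]

for name, track in (("loud", loud), ("quiet", quiet)):
    print(f"{name}: peak {peak_dbfs(track):+.2f} dBFS, RMS {rms_dbfs(track):+.2f} dBFS, "
          f"crest {crest_factor_db(track):.2f} dB, "
          f"samples at peak {count_samples_at_peak(track)}")
```

The turned-down copy reports a peak below 99% and a lower RMS level, yet it contains exactly the same clipping as the original. Only the gain-independent numbers (the crest factor and the count of samples stuck at the peak) stay put.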

Perhaps the most accurate method of evaluating mastering quality, though, is the simplest: Asking the mastering engineer. They, of course, would know best.

Another method is to simply use your ears. Know what clipping sounds like, and what other forms of hard limiting sound like, etc., but also evaluate your own music collection's sound quality, and investigate the mastering techniques used with its songs.

What is to be done?

Avoid the use of zoomed-out waveform plots to prove points about sound quality. They convey less information than you might think, and they are easy to misinterpret.
Do not trust the sound quality of a record simply because it looks good on a waveform plot. The ear is not an oscilloscope. Waveform plots are an informational tool, but they only relate to the perception of hearing in an abstract sense. There are plenty of ways that a good-looking waveform plot can sound terrible!
Avoid using any numeric measurement for evaluating audio quality, unless you understand exactly what it means for audio perception. Most numeric tools today are fairly flawed. They can be used to make meaningful statements about mastering quality in specific circumstances, but they can also make lots of meaningless or flawed statements.
Do not buy vinyl on mastering merits alone, unless you have information coming from the vinyl mastering engineer attesting to its superiority over other releases. As a rule of thumb, the cost of a vinyl remaster is high enough that those labels that choose to remaster will make it quite clear to potential customers that they did so, and the labels that didn't, won't. But of course, just because a special mastering job was done for a vinyl release doesn't mean that the mastering changes were significant. Caveat emptor.
Enjoy music purely on its subjective merits, but pay attention to your perceptions and look for ways to quantify them. There's too much good music out there today to ignore because the mastering is crap. And despite the shrill cries of the hi-fi set, the sound quality of music today is still considerably better than what it was for (most of) the last 60 years. And if all the kids today have no problem listening to the music, who's to say that us old farts can't listen to it too?

Nevertheless, reduced dynamic range and increased distortion in modern mastering are real issues. Solving them requires subjective and objective evaluations of sound quality, rigorous in their execution, to convince the audio world that this is not mere idle talk of ignorable audiophiles.

For further reading: the HydrogenAudio forums, including some recent topics on the subject, and a wiki entry on vinyl mastering (semi-maintained by me). Related thread.

Footnotes
[1] The regions of clipping on the LP version of "Leyendecker" seem to imply that it is slightly less limited than the CD version. However, the difference is so slight that it is not believed to be audibly superior - and besides, the example still breaks the myth that vinyl must necessarily not be hypercompressed.

Changes
17 September 2008: Incorporated feedback. Previous revision. Thanks to David "2BDecided" Robinson, Lyx, krabapple, and others on HA, for their feedback.

Metallica's "Death Magnetic": Clips on both CD and Vinyl

2008-09-12T17:09:00.004-05:00

Much hay has been made about the sound quality of this album, although interestingly enough, none of it has been from mainstream music journalism (or even the blogosphere) yet. The mastering engineer has even disowned the sound quality of the record, passing the buck onto the mixing engineer, and ultimately, the band itself.

The MP3 prerelease leaks have been so commonplace (and so bad sounding) that a lot of people are buying the 2LP and 5LP versions thinking that they are getting superior sound quality. However, judging by the large amount of clipping still extant on the vinyl, they probably won't:

Image credits: hdsemaj on stevehoffman.tv. Original post.

Subjectively, some people are reporting that the warmer sound of the record is dampening the clipping somewhat, but really, that's damning with faint praise.

My recommendation is to spend as little money as possible on the release until they sell a better master. Metallica.com is streaming the album for free right now. Beyond that, buy the MP3s. Personally, I smell a rat.

Some thoughts on a new technique for clipping detection

2008-09-11T01:38:00.005-05:00

I decided to riff on an idea I had for smarter ways to detect clipping. I'm sure it's not a new approach - a quick search on Google pointed out several papers on the same basic concept - but I'm not aware of it being used for audio, or for mastering evaluation specifically.

Clipping clamps the signal to a constant value. It also tends to occur right in the middle of signal content which is of a high power. If the signal derivative is calculated, the DC component of the clipping is effectively eliminated, bringing the values to 0, while the values before/after the clipping are relatively unmodified. (Specifically, low frequencies are attenuated and high frequencies are boosted.) So naively, one could use this derivative as the basis for a clipping detector - compare y' to y, and if y' is zero or very close to 0 while y is of a high power, you may have clipping. This technique would be immune to attenuated clipping - if it occurs at -10db it should work as well as if it is at 0db.

However, this approach fails when gross frequency response distortions are introduced - like what exists on vinyl. As discussed earlier, vinyl clipping examples exist which are sloped, not flat. The derivative of a sloped line is a constant nonzero value. The workaround for this is simple: take another derivative, the second derivative, so that this constant nonzero value collapses to 0. In theory this could be extended to an arbitrary number of derivatives, but because high frequencies are amplified, background noise tends to dominate the response after the 2nd derivative, so the 3rd and beyond are pretty useless for vinyl analysis.

What I'm ultimately hoping for is to have the final output be a histogram, and to run a threshold on that to estimate how many clipped samples exist in the signal. This allows comparisons between signals that are not sample-aligned (as is usually the case with vinyl vs. CD comparisons).

Here's what I have so far. First, some clipped stuff from Leyendecker again, on CD:

Note how the clipped content is neatly crushed to 0 on the second derivative plot - and most importantly, that the histogram plot shows a lot of values on the left, outside the distribution curve with a high signal energy and a low 2nd derivative energy. Those are the clipped samples.

Now for the LP version, different part of the file:

The 2nd derivative crush still occurs, but it's not nearly as prominent as on the CD. And the clip signal is so weak on the 2nd derivative plot (or the noise is so high) that none of it really shows up on the histogram plot; there's no real indication of clipping there.

But it's a start.

Feedback on "Waveform Plots Considered Harmful"

2008-09-10T20:50:00.003-05:00

I received lots of feedback on my HA thread and some in the comments. I intend to update the paper incorporating this feedback, archive the original somewhere, and post it for more general consumption soon.

Several people commented that the tone of the article in general was too inflammatory, and given that I used waveform plots to make a few important points, potentially hypocritical. I blame the insanely late hour that I wrote it. It will be edited to be a bit more evenhanded.
One person commented that the particular waveform examples I used seemed to imply that the vinyl master is clipping significantly less than the CD master. I'm seeing that too, and I can't really explain it. It's clearly not some sort of analog effect in the playback system, and it seems to be consistent across the entire disc. The fundamental issue is the same - buying vinyl does not always mean you are getting a product without hard limiting - so I think the article still stands up well. But the specific point of that example, that there exists a vinyl master which provably has just as much clipping as the CD master, is compromised significantly.

Note, also, that other people have observed vinyl waveform plots that don't share Mirrored's subtle difference in clipping levels. The clipping levels really do seem to match in those cases.
A clearer distinction needs to be made between periodic clipping of periodic signals - which is extremely audible even in small amounts - and clipping of transients, which is far less audible. Modern mastering practices can occupy either extreme or somewhere in between.
I'm equivocating between clipping and hard/brickwall limiting; clipping is only one form of hard limiting. A proprietary hard limiter is capable of doing the same job that clipping does, but with potentially far less audible distortion. It's an open question as to how much more audible clipping is than a good hard limiter. Nevertheless, the damage done to the audio is quite significant for all hard limiters.
While we're on the subject of things that should be made more clear, the extensive use of dynamic range compression (of the non-limiting variety) clearly has a far more audible effect on the sound than limiting/clipping.
Important tip: Bob Katz commented that he always sends vinyl masters out without the use of the hard limiters used on the CD masters. Yay!

MT9: Terrible marketing, better potential?

2008-08-04T12:57:00.005-05:00

(Reposted from an Audioholics post.)

There was a little hubbub recently over this Korean codec called "MT9" that claims to attempt to unseat MP3. Most sane people rightfully call that bollocks. Even the English-language MT9 site calls it bollocks.

But I wouldn't count it out quite yet. There are a lot of subtle implications to MT9 that I don't think people have fully realized.

All MT9 appears to be is a container format for an unmixed record. That is, instead of taking a multitrack production and downmixing all the instruments to stereo, you encode each instrument to a separate track as .MT9 and let the player do the downmixing. There's no technical innovation involved here. MT9 is probably (well hopefully) just a container around a mainstream codec like MP3 or AAC.
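
The player-side downmix can be sketched in a few lines. Nothing here reflects the actual MT9 file layout, which is not documented in this post; the stem names, the `downmix` function and the mono-only mixing are all assumptions made for illustration.

```python
def downmix(stems, gains):
    """Mix mono stems into one mono output using listener-chosen gains.

    `stems` maps an instrument name to its list of samples; `gains` maps
    names to linear gain factors. Stems without an entry play at unity gain.
    """
    length = max(len(samples) for samples in stems.values())
    out = [0.0] * length
    for name, samples in stems.items():
        g = gains.get(name, 1.0)
        for i, v in enumerate(samples):
            out[i] += g * v
    return out


# Hypothetical three-sample stems; a karaoke player would mute the vocals.
stems = {
    "vocals": [0.1, 0.2],
    "drums": [0.3, 0.3, 0.3],
}
karaoke = downmix(stems, {"vocals": 0.0})
print(karaoke)
```

Since the sum is only formed at playback, any limiting of the full mix would also have to happen in the player.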

Therefore, MT9 does not in any way compete with MP3 or other mono/stereo lossy codecs - although it may be able to use them internally. As is mentioned, it could be used as an alternative means to deliver music, but the odds of it ever catching on in popular music are rather slim. That all press discussion (and MT9's own web site!) have focused on that aspect is quite unfortunate.

From an encoder standpoint, this is still kind of a win - because there's a 1 to 1 correspondence between channel and instrument, you no longer have to worry about weird stereo collapse issues, you only have to tune the encodings for mono, etc. The bitrate would likely still be much higher than MP3 for high quality, simply due to the number of channels involved.

From a playback quality point of view, the MT9 system precludes the use of global dynamic range compression and limiting. That is, because mixing is deferred until playback, mastering must also necessarily be deferred until playback. This, of course, is a partial solution for ending the loudness war and is a huge win. Compression can still be applied to individual tracks, but because the listener has control over the volume of individual tracks, there would be much less impetus for producers to try to make a particular track stand out in the mix. Of course, this also strongly implies that producers would not need to employ mastering engineers in the traditional sense, bringing costs down.

This has virtually no chance of supplanting other formats for commercial music. But the deals with karaoke and possibly cell phones are probably the perfect application for this at the moment: very closed markets, and the music is often custom produced anyway so doing a multichannel production is not a big deal. But as I mention, I suspect I wouldn't mind buying normal music in this format either.

Foot wrapped in tonearm cables and inserted in mouth - film at 11

2008-07-18T14:27:00.005-05:00

I made a recording of a 2-LP set before ever listening to it, and after mucho processing, I listened and discovered that it was corrupted by large amounts of ground noise. Bleh. I had to throw them out and re-record.

After much searching I discovered that the power supply from my Dell laptop was the culprit. Moving the turntable cables away from the PS relieved the noise. Moving them back increased it. Wrapping the cable around the power supply amplified the noise tremendously.

The noise consisted of peaks about 30db above the noise floor at 60hz and 180hz, many peaks in the 20db range from 200-10khz, and a huge mass of +20db harmonics centered at about 13khz and 18khz. Not subtle at all.

This throws a major wrench into my whole way of thinking about balanced turntable connections. I was under the impression that a well-shielded balanced cable is fairly immune to EMI of this nature; clearly I'm mistaken.

Microphone and pro audio connections have the same issues; I'll need to research how sensitive they are supposed to be to this sort of thing before I consider making cabling/equipment changes.

Recording sample: Einstein on the Beach

2008-07-11T23:33:00.005-05:00

Here is a 30-second sample of my current recording process. It's the first 30 seconds of side 3 of the CBS Masterworks pressing of "Einstein On The Beach", encoded with lossyWAV --portable in FLAC.

I bought the boxed set at Amoeba Records in SF for $9, and it's easily some of the best pressed and maintained vinyl I have ever laid eyes upon. (The music is excellent too.)

Note the hiss before the violin begins - that is the 0404 USB's mic preamp noise I was talking about. It is fairly prominent in silent passages, but is difficult to identify while most music is playing, and is also pretty hard to spot with noisy vinyl. So it's fairly important to get this fixed, as far as vinyl upgrades are concerned. Besides that, I think the sound quality is excellent.

The art of the flat balanced digital phono preamp

2008-06-27T11:07:00.010-05:00

This is a brief overview of the work I've done with recording vinyl directly to a computer, with balanced cabling, without the use of RIAA equalization. Sorry, no pictures.

The Problem

Somewhere in my audio pursuits, I wound up desiring access to a flat-eq phono preamp - one that didn't have any RIAA equalization applied. I also wanted balanced inputs and an integrated ADC, so that the signal path would go straight to the computer. (My audio stuff is very close to my computer stuff, so issues like ground hum and EM interference are pretty important to me.) Such things don't seem to exist in the audiophile world. The closest tailor-made solution is EnhancedAudio's flat preamp system, which is not balanced, and perhaps, not audiophile enough.

The Solution

But enter Pure Vinyl. Although it's Mac only, it has full support for flat transfers (and in fact recommends them), and goes into great detail on how to actually accomplish it.

In summary: a phono preamp without RIAA equalization is really just a fancy way of describing a microphone preamp. The gains for a phono preamp are about the same as for a mic preamp, 40-70db. The load impedances for most mic preamps are in a reasonable range (1.5kohm) to load an MC with, although to be honest, I've never cared much for the finer points of MC loading (but that is for a separate post). So about a year ago I bought an E-MU 0404 USB, which is widely praised in some circles for its high quality microphone inputs, and I got started.

Pros

Mic preamps tend to be much more sanely valued than phono preamps. Even major tweak territory is not going to run you more than a thousand dollars a channel (compared to perhaps 5k/channel for a phono stage).
Mic preamps can obviously perform double duty as, well, mic preamps. This may help their resale value.
External computer audio interfaces tend to be a) very inexpensively priced, b) ridiculously packed with features, and c) very high quality. There are many extremely good interfaces in the <$500 price range worthy of consideration. Many also have high quality analog/digital outs, headphone outs, etc.
When used with balanced connections, a virtually noise-free signal path is guaranteed.
If the mic preamp has accurate low-frequency response or is DC coupled, very accurate rumble measurements can be derived. This is extremely useful for several technical measurements.

Cons

Software support for flat recordings is virtually nonexistent, with the notable exceptions of Pure Vinyl and DC Six. I've pretty much written my own software for this.
MC gains may test the upper limits of mic preamp performance.
May not be compatible with MM cartridges, depending on configuration.
Because of the scarcity of users of this scheme, many unforeseen issues may develop (see below).
Per-channel mic gain control prevents completely accurate level matching. (But level matching was never that good on vinyl to begin with.)
Phantom power on XLR cabling may damage turntable gear if improperly wired.
XLR/TRS turntable wiring doesn't exist. Even balanced wiring in the form of 5-pin DIN is hard to find, and requires special cabling to terminate to XLR/TRS.
There are theoretical objections to flat-eq recordings on the basis of reduced dynamic range at some frequencies and a greater risk of overload in others. Rob Robinson of Pure Vinyl has argued convincingly that this should not be an issue.

Software Implementation

Software equalization was accomplished with a naive implementation of an RIAA filter as a prewarped bilinear transformed IIR filter, in LabVIEW. At 44khz the response isn't great (it's several db off at 20khz). I spent some time trying to make a better filter, but was mostly rebuffed. LabVIEW's facilities for optimizing FIR filters were not able to get to within +-0.1db across the audio band without fairly long lengths. Regardless, it sounds pretty good as is.
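
For readers without LabVIEW, the following Python sketch shows the general shape of such a filter. It is not the filter used for these recordings. It assumes the standard RIAA playback time constants (3180, 318 and 75 microseconds) and a plain bilinear transform, with prewarping offered at a single frequency of the caller's choosing, since the prewarp point actually used is not stated here. All function names are invented for the example.

```python
import cmath
import math

# Standard RIAA playback time constants, in seconds.
T1, T2, T3 = 3180e-6, 318e-6, 75e-6


def riaa_analog(f):
    """Analog playback response H(s) = (1 + s*T2) / ((1 + s*T1) * (1 + s*T3))."""
    s = 2j * math.pi * f
    return (1 + s * T2) / ((1 + s * T1) * (1 + s * T3))


def riaa_bilinear(fs, prewarp_hz=None):
    """Biquad (b, a) obtained by substituting s = K * (1 - z^-1) / (1 + z^-1)."""
    if prewarp_hz is None:
        k = 2.0 * fs                        # plain bilinear transform
    else:
        w = 2 * math.pi * prewarp_hz
        k = w / math.tan(w / (2.0 * fs))    # exact match at prewarp_hz
    b = [1 + T2 * k, 2.0, 1 - T2 * k]
    a = [(1 + T1 * k) * (1 + T3 * k),
         2 - 2 * T1 * T3 * k * k,
         (1 - T1 * k) * (1 - T3 * k)]
    return [c / a[0] for c in b], [c / a[0] for c in a]


def biquad_response(b, a, f, fs):
    """Complex frequency response of the biquad at f Hz."""
    z1 = cmath.exp(-2j * math.pi * f / fs)  # z^-1 on the unit circle
    return (b[0] + b[1] * z1 + b[2] * z1 * z1) / (a[0] + a[1] * z1 + a[2] * z1 * z1)


def error_db(f, fs=44100.0, prewarp_hz=None):
    """Digital response minus analog target, in dB, at f Hz."""
    b, a = riaa_bilinear(fs, prewarp_hz)
    return 20 * math.log10(abs(biquad_response(b, a, f, fs)) / abs(riaa_analog(f)))


for f in (100.0, 1000.0, 10000.0, 20000.0):
    print(f"{f:7.0f} Hz: error {error_db(f):+.2f} dB")
```

The error is negligible through the midrange and grows toward 20 kHz, where the bilinear transform squeezes the whole analog frequency axis below the Nyquist frequency. That is the kind of top-octave deviation described above.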

I've been informed of a technique by Robert Orban to use Remez optimization on IIR filters to make an extremely accurate filter. I'll probably do this in the future.

http://www.dsprelated.com/showmessage/73300/3.php

For rumble, I use two Butterworth highpass filters operating on the L+R and L-R signals. L+R is order 6 at 25hz; L-R is order 8 at 35hz. These numbers were evaluated a long time ago observing spectral content of blank grooves on HFNRR - the lateral and vertical content of the rumble is largely different. Of course, the exact nature of the rumble varies from record to record.
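
As a rough Python equivalent of this rumble stage, the sketch below builds the two Butterworth highpass filters as cascades of biquads and applies them to the sum and difference signals. The orders and cutoffs come from the paragraph above. The biquad coefficient formulas (the widely used "Audio EQ Cookbook" highpass), the division by two in the sum/difference matrix, and all function names are choices made for this example, not a description of the original LabVIEW code.

```python
import cmath
import math


def butterworth_highpass(order, cutoff_hz, fs):
    """Even-order Butterworth highpass as a cascade of biquad sections."""
    if order % 2:
        raise ValueError("this sketch only handles even orders")
    w0 = 2 * math.pi * cutoff_hz / fs
    cos_w0, sin_w0 = math.cos(w0), math.sin(w0)
    sections = []
    for k in range(order // 2):
        # Q of the k-th conjugate pole pair of the Butterworth prototype.
        q = 1.0 / (2.0 * math.cos(math.pi * (2 * k + 1) / (2 * order)))
        alpha = sin_w0 / (2.0 * q)
        a0 = 1.0 + alpha
        b = [(1 + cos_w0) / 2 / a0, -(1 + cos_w0) / a0, (1 + cos_w0) / 2 / a0]
        a = [1.0, -2.0 * cos_w0 / a0, (1.0 - alpha) / a0]
        sections.append((b, a))
    return sections


def run_cascade(sections, x):
    """Filter a list of samples through each biquad in turn (direct form I)."""
    y = list(x)
    for b, a in sections:
        x1 = x2 = y1 = y2 = 0.0
        out = []
        for v in y:
            w = b[0] * v + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
            x2, x1 = x1, v
            y2, y1 = y1, w
            out.append(w)
        y = out
    return y


def remove_rumble(left, right, fs=44100.0):
    """Highpass the lateral (L+R) and vertical (L-R) signals separately."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    mid = run_cascade(butterworth_highpass(6, 25.0, fs), mid)
    side = run_cascade(butterworth_highpass(8, 35.0, fs), side)
    return ([m + s for m, s in zip(mid, side)],
            [m - s for m, s in zip(mid, side)])


def magnitude(sections, f, fs):
    """Magnitude response of the whole cascade at f Hz."""
    z1 = cmath.exp(-2j * math.pi * f / fs)
    h = 1.0
    for b, a in sections:
        h *= (b[0] + b[1] * z1 + b[2] * z1 * z1) / (a[0] + a[1] * z1 + a[2] * z1 * z1)
    return abs(h)
```

For example, `remove_rumble(left, right)` takes two equal-length lists of samples and returns the filtered left and right channels. Like any causal IIR filter, this cascade shifts phase near its cutoff frequencies.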

It's worth noting that Pure Vinyl has its own very high quality filters that would moot all this work had I been using a Mac. In particular, the rumble filter is phase-distortion-free.

Hardware Implementation #1

I didn't have a low impedance MC cartridge the first time around, merely an AT440ML, and I didn't want to rewire the turntable I had at the time, so I built an adapter box to convert RCA to XLR and add 47kohm of resistance. (I added capacitance too but there is likely enough capacitance in the entire circuit to make that unnecessary.)

This sorta worked, but the frequency response was completely off. After reading between the lines of the 0404 USB docs, I determined that the XLR inputs were constantly loaded at 1.5kohms - completely unsuitable for MM use - but the TRS inputs were at 1Mohm. So I replaced the plugs with TRS and went on my way.

The resulting recordings sounded acceptable, but there were major EM interference issues. 60 Hz ground harmonics were severe and, oddly, peaked quite strongly at high frequencies (15 kHz), which wound up being audible. I chalked this up to three issues:

The resistance/connector adapter box I built was fairly ramshackle.
The turntable wiring itself was not great.
The AT440ML (and apparently most MM cartridges) wires one of the signal return pins to its ground. This may cause more issues than it fixes in this particular configuration.

Hardware Implementation #2

So I scored a Technics SL-1200MK2 over Christmas, and an Audio Technica OC9MLII a few months ago. I also procured a balanced audio cable from Blue Jeans cable - about 3 meters of Belden 1800F terminated with Neutrik TRS. I cut the cable in two and replaced the external wiring on the 1200 with the 1800F (causing some pretty severe damage to the 1200's cabling circuit board in the process).

Everything worked great, except for one glaring problem: noise. The SNR was way, way too low. In fact, what had happened was that the 60 dB of gain on the 0404 counted against its SNR, resulting in roughly 50 dB of SNR before equalization. This is apparently 40 dB (or more!) worse than it should be.

The built-in gain on the XLR inputs was a smidge better, and the load impedance is compatible with the cartridge, so I was able to buy a few more dB by rewiring back to XLR. But longer term the only surefire solution is to get a better preamp. One of the bigger potential drawbacks to XLR is that phantom power (+48V to the turntable ground!) is only a push button away. I deliberately pressed it, though, and saw no ill effect beyond lots of noise while it was enabled, so it should not be a concern as long as the cables are properly wired.

Recording process

Recording is currently done in the version of Cubase LE that came with the 0404. The 0404 USB has major driver issues and consumes 50% of the CPU time on a 3 GHz P4 while playing back or recording audio. Overrun susceptibility is also quite high, and it's virtually unusable on my Dell laptop (although that may be more of an issue with the network drivers than the card). So most of the time no other interrupt-intensive work can be done while a recording is in progress.

After the recording is over I run RIAA eq and derumble in LabVIEW. Then I open the wavs up in Audacity and manually remove the loudest pops, normalize, trim the edges and export to 16/44 lossless. Tagging and transcoding are done in foobar2000.

The 0404 USB (like most 2-channel mic preamps, for that matter) has separate gain pots for each channel, resulting in a fairly obvious level imbalance. This can be calibrated away somewhat by using a test record, but my de-rumble utility also estimates L/R energy content in dB, which I can then use to amplify one channel over the other in Audacity.
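The L/R energy estimate boils down to comparing the RMS level of each channel. A minimal sketch, with hypothetical function names and the assumption that the right channel is the one being corrected:

```python
import math

def rms_db(samples):
    """RMS level of a channel in dB relative to full scale (1.0)."""
    power = sum(v * v for v in samples) / len(samples)
    return 10 * math.log10(max(power, 1e-20))  # floor avoids log(0) on silence

def balance_correction(left, right):
    """Return (imbalance_db, gain): gain is the linear factor to apply to
    the right channel so that its RMS level matches the left channel."""
    imbalance_db = rms_db(left) - rms_db(right)
    return imbalance_db, 10 ** (imbalance_db / 20)
```

This only makes sense as a calibration if the program material is expected to carry equal energy in both channels, which is why a test record is the safer reference when one is available.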

Future Plans

The 0404 USB really has to go.
I would prefer some more specific tool for recording the audio over ASIO. A LabVIEW interface to PortAudio would arguably be most powerful, and would most easily let me do online monitoring, but short of that, a command line recorder would be great.
If a preamp can be obtained that has acceptable gain with Hi-Z inputs, then I could craft in-line load resistance correction for MM cartridges over the XLR pins. That would cleanly solve MM loading issues.
Develop an accurate real-time high speed meter capability, to read test band levels off in real time, to aid in adjusting mic gains.
Grab hold of (or write) some A-weighting filter routines to compare SNRs with.
Streamline the recording and processing chain. Pop/tick removal, normalization and trimming can to a large degree be automated. Those steps take up perhaps 10-30 minutes per record, and speeding them up would make large-scale recording much faster.
Of course, I could just buy a Mac and Pure Vinyl and forget about this, or (Goddess forbid) buy an MC phono stage, but what's the fun in that?
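For the A-weighting item in the list above, the usual starting point is the analog magnitude curve, which can be written down directly. The sketch below uses the commonly published pole frequencies (20.6, 107.7, 737.9 and 12194 Hz) and the +2.00 dB normalization that puts 1 kHz at 0 dB. It only evaluates the curve; turning it into a time-domain filter is a separate design step that is not shown here.

```python
import math

def a_weighting_db(f):
    """A-weighting gain in dB at frequency f (Hz), from the analog magnitude formula."""
    f2 = f * f
    ra = (12194.0 ** 2 * f2 * f2) / (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    return 20 * math.log10(ra) + 2.00

if __name__ == "__main__":
    for f in (31.5, 100, 1000, 10000):
        print(f, "Hz:", round(a_weighting_db(f), 1), "dB")
```

One way to use it for SNR comparisons would be to weight a noise spectrum bin by bin with this curve before summing its power.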

Thanks to Rob Robinson at Pure Vinyl for pointing me to this technique and overall guidance.

UPDATE 1: recording sample.

UPDATE 2: "Trouble In Vinyl-dise", Or, "How does Dell Manage To Sell Such Cheap Computers? Film At 11".

Some notes on pfpf

2008-06-27T10:11:00.003-05:00

It's been quite a while since the first update, and I didn't mean this to be a one shot deal, so I might as well give a status update on pfpf. I haven't had the opportunity to work on a new version, but plenty of comments have been made so far:

On Missing Files. First, my hosting provider, storing my screenshots and files, disappeared without any contact information. I just got a new one set up, so all the links work again.

On Download Sizes. Lots of people don't like the 90MB LabVIEW Runtime download, or that registration is required on NI's website to download. I can't get rid of the runtime download entirely, but there is a smaller (28MB) runtime that may work for you - download here.

On Magnitudes. Several people commented that the dynamic range figures seemed too low. A well-mastered pop track may show up as having only 3-4 dB of dynamic range on short/medium time scales, and virtually no range on long time scales. Extremely dynamic symphonic works may only have 16 dB of dynamic range, where by most "normal" evaluations there should be more like 40-60 dB. While the choice of scaling has little effect on comparisons between pfpf-derived numbers, it has a great effect on their overall interpretation in relation to other decibel figures.

Much of this stems from the choice of percentiles used in the variance measurement (from the 50th to the 97.7th). If this range were to be doubled, the numbers would probably fall more into line with what people normally expect. One way to accomplish this would be to simply double the 50-97.7 figure.
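As a side note on those percentiles: for a normal distribution, the 97.7th percentile sits almost exactly two standard deviations above the median, so the 50-to-97.7 range is roughly a "two sigma" half-width. The sketch below checks that with the standard library and shows that, if loudness (in dB) were normally distributed, doubling the figure would equal the range from the 2.3rd to the 97.7th percentile. The normal model and the function name are assumptions for illustration; real loudness histograms need not be symmetric.

```python
from statistics import NormalDist

# Standard-normal quantiles for the percentiles discussed above.
Z_HI = NormalDist().inv_cdf(0.977)   # just under 2
Z_LO = NormalDist().inv_cdf(0.023)   # just above -2

def percentile_ranges_db(sigma_db):
    """Return (one_sided, two_sided) percentile ranges in dB for loudness
    modelled as a normal distribution with standard deviation sigma_db."""
    one_sided = Z_HI * sigma_db            # 50th to 97.7th percentile
    two_sided = (Z_HI - Z_LO) * sigma_db   # 2.3rd to 97.7th percentile
    return one_sided, two_sided
```

For a skewed histogram the two approaches would give different numbers, so they are only interchangeable under the symmetry assumption.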

On Dynamic Range Manipulation. Michael Jamsmith and I independently came up with the idea of running pfpf's histogram plot in reverse to make a Photoshop-like levels adjustment for loudness on a music track. In other words, a reversible 2-pass dynamic range compressor. Certainly something to work on in my copious free time :)

On Resolutions. One person didn't like that a greater than 1024x768 resolution is required. I'll see what I can do to make the resolution requirements nicer, but I can't promise much. I might just require that people use a 1680x1050 or higher display.

Bob Katz's comments.

An Alternative Proposal - the Sparklemeter. Chromatix on HA has recently proposed measuring dynamic range by comparing the ratio between a PPM and a VU meter (suitably modified), preparing a histogram of the result values, and computing a figure of merit based on the mean and variance of that histogram. The resulting functionality is similar to the medium- and short-timescale figures that pfpf outputs, but the use of exponential-decay meters is new, as is using direct percentile values from the histogram rather than ranges between percentiles. Using VU/PPM meters gives the result great intuitive meaning for audio engineers, although I fear the required modifications may compromise that. Watch that thread.

pfpf: An Experimental Estimator of Dynamic Range in Music

2008-01-13T20:49:00.001-06:00

Estimating the dynamic range of a selection of music depends on both an estimate of its time-varying loudness and the timescale used for loudness evaluation. I propose a numerical method of estimating dynamic range that satisfies those dependencies using a modified ITU-R 1770 loudness filter and three moving windows to estimate loudness across three different timescales. The goal is to more accurately measure and compare dynamic range between different music genres and different masterings and processing techniques for the same music.
Summary of algorithm:

Apply ITU-R 1770 filters to convert amplitude to instantaneous loudness.
Estimate loudness across three different timescales by computing 10ms ("short term"), 200ms ("medium term") and 3000ms ("long term") windowed RMS power.
Decouple timescales by scaling 10ms loudness by 200ms loudness, and 200ms loudness by 3000ms loudness.
Threshold loudness at each timescale to remove silence (optional)
Compute histogram for each loudness estimate
Dynamic range = range between 50th and 97.7th percentile, for each timescale
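The steps above can be sketched in a few lines. This is an illustration under several assumptions, not the pfpf code: the input is taken to be already loudness-filtered (the ITU-R 1770 stage is skipped), non-overlapping blocks stand in for moving windows, "scaling" one timescale by another is read as subtracting dB values, one threshold is shared by all timescales, and all names are hypothetical.

```python
import math

def block_power(x, n):
    """Mean-square power of consecutive, non-overlapping blocks of n samples."""
    return [sum(v * v for v in x[i:i + n]) / n
            for i in range(0, len(x) - n + 1, n)]

def to_db(p):
    return 10 * math.log10(max(p, 1e-20))  # floor avoids log(0) on digital silence

def percentile(values, p):
    """p-th percentile (0-100) with linear interpolation."""
    s = sorted(values)
    pos = (len(s) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def pfpf_sketch(x, fs, threshold_db=-70.0, p_lo=50.0, p_hi=97.7):
    """Dynamic range (dB) at 10 ms / 200 ms / 3000 ms timescales."""
    n_short = int(round(0.01 * fs))
    n_med, n_long = 20 * n_short, 300 * n_short
    x = x[:len(x) - len(x) % n_long]         # keep whole long blocks only
    short = [to_db(p) for p in block_power(x, n_short)]
    med = [to_db(p) for p in block_power(x, n_med)]
    long_ = [to_db(p) for p in block_power(x, n_long)]

    long_keep, med_rel, short_rel = [], [], []
    for k, lg in enumerate(long_):
        if lg < threshold_db:
            continue                          # silence: drop it and everything inside
        long_keep.append(lg)
        for j in range(15 * k, 15 * (k + 1)):
            if med[j] < threshold_db:
                continue
            med_rel.append(med[j] - lg)       # medium relative to long
            for i in range(20 * j, 20 * (j + 1)):
                if short[i] >= threshold_db:
                    short_rel.append(short[i] - med[j])  # short relative to medium

    def spread(v):
        return percentile(v, p_hi) - percentile(v, p_lo) if v else float("nan")

    return {"long": spread(long_keep),
            "medium": spread(med_rel),
            "short": spread(short_rel)}
```

Decoupling matters because a slow fade would otherwise show up at every timescale; after subtraction, each figure only reflects variation that is faster than the next window up.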

Using the pfpf application

The algorithm is prototyped in a LabVIEW application built for Windows, downloadable here. Unzip it into a new directory. You also need to download the LabVIEW 8.2.1 runtime.

Basic instructions: Run the program and open the folder icon on top to select a WAV file. Press the "Run analysis" button. The file is scanned for instantaneous loudness (indicated by the progress bar) and then a histogram operation is performed to calculate the dynamic range. The output is displayed at the bottom. Additionally, other tabs display plots of instantaneous loudness and histograms.

Interpreting the results:

Long-term dynamic range - loudness changes across multiple seconds, or across multiple measures of a piece of music. Wide swings in orchestration and sustained loud/quiet passages increase this number. Dynamic range compression, in any form, decreases this number. Typical values range from 16 dB for extremely dynamic orchestral and experimental music to 1-2 dB for pop/rock singles.
Medium-term dynamic range - loudness changes across hundreds of milliseconds, or single notes. Aggressive dynamic range compression can reduce this.
Short-term dynamic range - loudness changes across single milliseconds. The use of extremely percussive instruments can increase this. Extremely aggressive dynamic range compression, especially limiting, can decrease it.
ITU-R 1770 loudness - estimate of loudness as per the ITU-R 1770 recommendation.

Example Results: Long/medium/short dynamic range for various tracks:

Musical piece	Long	Medium	Short
Autechre - "Sublimit"	4.0	4.0	10.6
Autechre - "Dial"	1.0	2.8	12.6
Shellac - "Genuine Lullabelle" (long term thresh=-50db)	14.3	7.4	6.9
Merzbow - "I Lead You Towards Glorious Times"	0.64	0.34	0.65
John Mayer - "Waiting On The World To Change"	2.7	3.2	8.9
Battles - "Tonto"	3.2	2.7	5.2
Soundgarden - "Black Hole Sun"	2.5	2.4	4.4
Autechre & The Hafler Trio - "æo³"	14.9	4.0	6.7
Harnoncourt, Beethoven Sym. 9, Chamber Orchestra of Europe (Harnoncourt)	13.5	4.0	4.4

Screenshots

Configuration and output tab:

Loudness plot tab:

Histogram tab:

Advanced configuration

These options affect the computation of the dynamic range; when they are modified, the results should always include the new configuration. The "Output" string was created for this purpose.

Thresholds: If the instantaneous loudness drops under the threshold associated with a time scale, the loudness at that time scale (and at any shorter time scale) is clamped to NaN and ignored in subsequent dynamic range calculations. This is to prevent silence between music (assumed to be below the listening noise floor and therefore inaudible) from affecting the results. Silence skews the histogram results so as to artificially compress dynamic range across all timescales. Its loudness also varies considerably between different formats (notably vinyl vs CD), and masking it aids in making an accurate comparison of formats.
Time scales: Controls the rms window size (in seconds) for each time scale.
Percentiles: By default, dynamic range is calculated as the loudness range between the 50th and 97.7th percentiles of the loudness histogram at each time scale. These percentile levels may be adjusted.

Application License

The pfpf application is free for non-commercial use. Do not redistribute it. Source code is available upon request (requires LabVIEW 8.2 or above and the Digital Filter Design toolkit).

Contact Info

Message me (Axon) on HydrogenAudio, or comment below.

Known Issues

It is important to take the results with a grain of salt. Transient loudness estimation is a topic of ongoing research, and no truly accurate method has yet been agreed on. pfpf currently uses a moving-window modification to Leq(RLB), but in the future, a more elaborate loudness estimator, like HEIMDAL, might be used.
DC removal is applied to each short block (0.01 seconds of signal by default) as it is read; these blocks are then composed into the larger medium/long (0.2/3 s) blocks. The end result is that the signal effectively receives a highpass at around 100 Hz before analysis, removing most bass information. This is not anticipated to be a big deal because of the relatively small contribution that LFE provides to loudness models.
Histogram computation is not factored into the progress bar, so there is a noticeable pause between the completion of the progress bar and the display of results.
Beware of falling code. Parameters may not be well tested for failure cases or obviously incorrect inputs.
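To get a feel for how strongly the per-block DC removal mentioned above affects low frequencies, the following sketch subtracts the mean from each 10 ms block of a pure tone and reports how much power survives. It is an idealized stand-in rather than the pfpf code path; the 48 kHz rate, the one-second tone and the function name are assumptions.

```python
import math

def surviving_power_db(freq, fs=48000, block_s=0.01, seconds=1.0):
    """Power of a sine at freq (Hz) after per-block mean removal,
    relative to its power before, in dB."""
    n = int(round(block_s * fs))
    total = int(round(seconds * fs))
    x = [math.sin(2 * math.pi * freq * i / fs) for i in range(total)]
    before = after = 0.0
    for start in range(0, total - n + 1, n):
        block = x[start:start + n]
        mean = sum(block) / n                     # the "DC" estimate for this block
        before += sum(v * v for v in block)
        after += sum((v - mean) ** 2 for v in block)
    return 10 * math.log10(after / before)

if __name__ == "__main__":
    for f in (20, 1000):
        print(f, "Hz:", round(surviving_power_db(f), 1), "dB")
```

In this idealized setup a 20 Hz tone comes out roughly 9 dB down, while a 1 kHz tone passes essentially untouched; the attenuation is gradual rather than a brick wall.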

Document Revision History

13 January 2008: Initial revision.
