09 November 2009

Digital Simulation of Phonograph Tracking Distortion: Download the poster; buy the paper

The paper and poster

The poster I presented for my paper at the 127th AES convention is available for download here. It provides a good overview of all aspects of the paper, although it leaves out many technical details and explanations.

KlausR has alerted me to the existence of my paper on the AES E-Library, which means I am no longer tempted to share my LaTeX PDFs with anybody. $5 for AES members, $20 for nonmembers. Or you can go all out and buy a CD-ROM of all the convention preprints on the AES store for $150 (if you can find the link). Or you can wait until 6 months after the convention, which appears to be the usual time when authors are allowed to upload their preprints themselves.

The simulator

At some point in the hopefully-near-future I will release the source and executable to the simulator itself. It is written in LabVIEW 8.2.1 with the Digital Filter Design toolkit, and is more or less the exact same code I used to build the test samples for the paper; there are a few optimizations I've left out because I first wanted to release what was documented in the paper.

The listening test

One of the points I tried very hard to emphasize in the paper was that, while the listening test I conducted produced very interesting results, it was not a very good test from the standpoint of comprehensiveness or authority.
  • Only two people were involved in the test, and they used two different protocols, neither of which was particularly great. Ideally, many more people should be involved, across a wider range of listening environments. And the protocol should resemble the listening test you get at an audiologist: as the magnitude of the distortion varies in an automatic fashion, the listener records whether or not the distortion is audible. (This is actually pretty similar to what Klaus did, as he did not have a computer audio setup in his listening room.) That is, the usual test concepts used for eg the absolute threshold of hearing should be employed, rather than ABX test concepts. This allows a test which loses no statistical meaning, is potentially easier to take without the use of a computer (eg by burning all the test tracks onto CD), and avoids all the controversy around ABX testing.
  • The audio samples used were chosen on the basis of relatively little evaluation for sensitivity to tracking distortion, and how easy they were to obtain and use. I'd like to perform a more thorough survey of audio samples, potentially employing automated tools such as PEAQ to estimate audibility.
  • The interaction between tracing distortion and tracking distortion was ignored for this paper. For the purposes of testing, and more importantly of relating the simulation to real-world playback situations, this is problematic. Tracing distortion may distort the signal in such a way as to make any extant tracking distortion more audible than it otherwise would be. For instance, with an elliptical or hyperelliptical stylus, a tracking error will shift the position of the two stylus contact areas along the time axis - ie, a time delay exists between the two channels that does not exist when either tracing or tracking distortion is considered in isolation. For this simulator to be directly applicable to real-world situations, the importance of tracing distortion must be quantified and bounded, either through placing strict limits on the magnitude of its effect on tracking distortion, or by going all-out and writing an accurate tracing simulation.
Once these issues are addressed to my satisfaction, I will announce the specifics on a listening test open to the public.

10 August 2009

Digital Simulation of Phonograph Tracking Distortion

Phonograph tracking distortion results from the misalignment of a playback cartridge with respect to the cutting head. While it has been researched for decades, it remains a source of mystery: it cannot be isolated, it has not been accurately simulated, and its importance remains undecided. Here, a PCM simulation of horizontal and vertical tracking distortion of extremely high quality is presented, operating on the principle of phase modulation, allowing tracking distortion to be evaluated in isolation with real musical content. In this context, tracking distortion is equivalent to digital audio sampling jitter, with the jitter spectrum equal to the signal spectrum. Implications of this connection, as well as simulation accuracy, preliminary listening test results, and potential applications are discussed.

127th AES Convention, New York; Poster Session P14 (Signal Processing). Currently scheduled for Sunday 11 October at 10am. Date/time possibly subject to change.

Getting this paper out the door has been the predominant reason for the recent dearth of activity on this blog. More info will follow here, including a downloadable simulator implementation, the samples used for the existing listening tests, and an open invitation for a new listening test.

28 February 2009

Some thoughts on digital wow/flutter demodulation and analysis

Background

Accurate analysis of the speed stability of turntables requires a wow/flutter spectrum: a frequency analysis of speed deviation. This is performed by treating a sine wave as a carrier for a modulated signal (the speed deviation). Perform a frequency demodulation of the wave and you have your speed deviation; run an amplitude spectrum on that deviation and you have your wow/flutter spectrum, or apply the IEC wow/flutter weighting filter and obtain a peak w/f measurement to compare against rated specs. However, the peak W/F measurements tend to be vanishingly low for virtually all turntables nowadays and the spectrum is generally considered far more useful. Ladegaard's classic app note on vibration analysis contains good explanations of the importance of wow/flutter spectra when analyzing turntable sound fidelity, and it is quite unfortunate that its measurement has remained generally out of the reach of the lay audiophile.

Historically this measurement has required expensive ($10,000 or more!) lab hardware. Nowadays there are a few ways to do it digitally, but the most commonly encountered ones are rather poor. The crudest method is simply to look at a spectrogram of the tone: this gives a rough outline of the modulation, but is generally horribly CPU-intensive and has a time and frequency resolution so low as to be nearly meaningless. A method of better utility is to simply look at the high-resolution spectrum of the wow/flutter tone; this is what Stereophile typically uses in its turntable tests. This is a trivial operation in any audio package, although some care is necessary to set block lengths, window parameters and overlap to maximize dynamic range. Alternatively, the signal may be mixed (multiplied) with a synthesized sine wave whose frequency matches the average frequency of the test tone. This shifts the test tone at F into mirror tones at 0 and 2F; the latter can be lowpassed out and the former run through an amplitude spectrum plot to obtain a baseband spectrum. This can yield improved performance by folding sidebands together to reduce noise, but any error in the synthesis frequency will cause the sidebands to not line up exactly, leading to a sort of "double vision" in the spectrum plot, where every peak appears twice.
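As a quick illustration of the mixing approach (my own sketch, with made-up parameters; not code from any of the tools mentioned), here is the multiply-and-inspect step in Python with NumPy:

```python
import numpy as np

# Made-up parameters: a 3150 Hz test tone carrying a small 4 Hz wow component.
fs = 48000
t = np.arange(fs) / fs                         # one second of signal
f_test = 3150.0
x = np.cos(2 * np.pi * f_test * t + 0.5 * np.sin(2 * np.pi * 4.0 * t))

# Mix (multiply) with a synthesized sine at the average tone frequency.
# This shifts the tone into copies at ~0 Hz and ~2*f_test; the high copy
# would be lowpassed out, and the baseband spectrum inspected directly.
mixed = x * np.sin(2 * np.pi * f_test * t)
spectrum = np.abs(np.fft.rfft(mixed * np.hanning(len(mixed))))
freqs = np.fft.rfftfreq(len(mixed), d=1 / fs)
```

With the synthesized frequency exactly on target, the wow shows up as a single peak near 4hz at baseband; an error in the synthesis frequency shifts and splits those baseband components, which is the "double vision" effect described above.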

All of these methods have major issues specific to their implementation, but by far, their biggest weakness is that they do not demodulate the signal. Accurate FM demodulation serves many extremely useful roles that simpler FFT-based schemes cannot:
  • It yields an accurate time-domain waveform of the instantaneous frequency. This waveform is absolutely necessary for computing IEC weighted wow/flutter figures.
  • It removes the many sidebands present in FM signals and integrates them into a more accurate result. Sidebands from low frequency wow may overlap the carrier band belonging to higher frequency flutter in an FFT-based scheme, ultimately leading to a loss of precision in the final spectrum.
  • Usually, it is very insensitive to amplitude modulation. All FFT-based schemes will flatten both the AM and FM signals into a single spectrum, which may be a major inaccuracy. In some cases, the FM demodulation can also output the AM signal along with the FM one.
  • Because an exact lock is made at the instantaneous frequency, frequency drift does not affect the precision of the result. This is an inevitable problem with FFT-based schemes.

Analog and digital FM demodulation is based on a variety of schemes. I implemented quadrature demodulation: a direct application of the Hilbert transform. This allows computation of the instantaneous frequency on a sample-by-sample basis, at effectively unbounded frequency resolution. What's more, quadrature demodulation separates the amplitude modulation signal from the FM signal, and once the FM is computed, the AM signal is effectively free. Also, in comparison to PLL schemes, there are very few tweakable parameters to a quadrature demodulator: set up the filter lengths, set up an input conditioning filter, and go to town. It is one of the most computationally intensive demodulation schemes, but with some light optimization it still runs faster than real time. I got the exact frequency demodulation equation from the comp.dsp archives, which is a corrected version of an equation in Frerking:


I(n)*Q'(n) - Q(n)*I'(n)
----------------------- = omega(n)
    I^2(n) + Q^2(n)


Where I(n) is the original signal, Q(n) is the Hilbert transform of I(n), and I'(n) and Q'(n) are their respective bandlimited derivatives. The formula for amplitude demodulation is:


A(n) = sqrt(I(n)^2 + Q(n)^2)


For the case of a pure single-tone signal, these equations are exact. Real recordings are never pure single tones, however, so errors can creep into the demodulation, most prominently through clicks and pops in the recording.
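A minimal sketch of this quadrature demodulator in Python, using SciPy's FFT-based Hilbert transform; the test signal and its parameters are my own invention, and a plain central difference stands in for the long bandlimited-derivative FIRs described below:

```python
import numpy as np
from scipy.signal import hilbert

# Synthetic test signal: a 3150 Hz tone frequency-modulated by 4 Hz wow
# with a 2% (63 Hz) peak deviation.
fs = 48000
t = np.arange(fs) / fs
fc, fm, dev = 3150.0, 4.0, 0.02 * 3150.0
x = np.cos(2 * np.pi * fc * t + (dev / fm) * np.sin(2 * np.pi * fm * t))

analytic = hilbert(x)                 # I(n) + j*Q(n)
I, Q = analytic.real, analytic.imag

# Stand-in derivatives: np.gradient is a zero-delay central difference.
# It estimates sin(omega) rather than omega, so arcsin undoes that bias;
# a proper bandlimited-derivative FIR avoids the problem entirely.
dI, dQ = np.gradient(I), np.gradient(Q)
omega = (I * dQ - Q * dI) / (I**2 + Q**2)          # rad/sample estimate
f_inst = np.arcsin(np.clip(omega, -1, 1)) * fs / (2 * np.pi)

# Amplitude demodulation falls out for free:
amplitude = np.abs(analytic)          # sqrt(I^2 + Q^2)
```

Trimming the edge samples, the recovered instantaneous frequency swings between roughly 3087 and 3213hz around a 3150hz mean, matching the 2% deviation that was synthesized in.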

Implementation

I took my existing RIAA filter equations to apply inverse RIAA if needed - my thinking here is that when recording a wow/flutter tone, the non-flat RIAA response around the tone frequency will lead to inaccuracies in the flutter spectrum. Reversing it should also yield a smaller AM signal amplitude. The filter uses Orban's hardcoded coefficients for the 44100 and 96000hz sampling rates; for other sample rates a bilinear transform design is used, which is notoriously inaccurate at low sample rates.
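For illustration, here is a generic bilinear-transform construction of the RIAA playback curve from the standard time constants. This is a sketch of the fallback path only - Orban's coefficients are not reproduced here - and the warping inherent in the bilinear transform is exactly why that path degrades at low sample rates:

```python
import numpy as np
from scipy.signal import bilinear, freqz

# Standard RIAA playback time constants, in seconds.
t1, t2, t3 = 3180e-6, 318e-6, 75e-6
fs = 96000

# Analog playback de-emphasis: H(s) = (1 + s*t2) / ((1 + s*t1) * (1 + s*t3))
b_s = np.array([t2, 1.0])
a_s = np.polymul([t1, 1.0], [t3, 1.0])

# Bilinear transform to a digital IIR. Frequency warping near Nyquist is
# what makes this approach inaccurate when fs is low.
b, a = bilinear(b_s, a_s, fs=fs)

# Gain difference between 100 Hz and 1 kHz, in dB (analog value: ~13.1 dB).
w, h = freqz(b, a, worN=[100.0, 1000.0], fs=fs)
gain_db = 20 * np.log10(np.abs(h[0]) / np.abs(h[1]))
```

At 96khz the low-frequency response tracks the analog prototype closely; the errors concentrate near Nyquist, where the warping is strongest.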

Input conditioning is currently handled with a Kaiser FIR bandpass, centered on the tone frequency, with beta=5 and (IIRC) 2047 taps. Obviously, the filter must be linear-phase. The subjective cleanliness of the demodulated output waveforms, as well as the quality of Audacity's declick operations on the output files, demand that this filter exist and that it have relatively gentle transition bands. The demodulation operation itself does not appear that sensitive to wideband noise, however, so the filter may be optional for a pure spectrum analysis. The stopband rejection, together with the linearity of the demodulation and further analysis, is high enough that post-demodulation filtering is unnecessary.
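A sketch of this conditioning filter using SciPy's firwin - the tap count and beta come from the text, while the 48khz sample rate and the exact 700hz band edges are my assumptions:

```python
import numpy as np
from scipy.signal import firwin, freqz

# Linear-phase Kaiser-window FIR bandpass: beta = 5, 2047 taps,
# centered on the 3150 Hz test tone with a 700 Hz bandwidth.
fs = 48000
f_tone = 3150.0
bw = 700.0
taps = firwin(2047, [f_tone - bw / 2, f_tone + bw / 2],
              window=('kaiser', 5.0), pass_zero=False, fs=fs)

# Check the response at the tone and well outside the passband.
w, h = freqz(taps, worN=[f_tone, f_tone + 2000.0], fs=fs)
```

With 2047 taps the transition bands are only a few hz wide, so the gain at the tone frequency is essentially unity while energy 2khz away is deep in the stopband.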

Some care must be taken to keep the time delays of all the various data paths the same. Most importantly, the two-tap difference filter x[n] - x[n-1] will not work because its effective time delay is 0.5 samples. An FIR differentiator with integral delay must be constructed instead, along with delay lines to keep I(n) and Q(n) delayed by the same amount as I'(n) and Q'(n).
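A tiny demonstration of the delay issue (my own example, not code from the demodulator):

```python
import numpy as np

x = np.sin(0.1 * np.arange(64))

# Two-tap difference: effective delay of half a sample, which no integer
# delay line on I(n) and Q(n) can compensate.
d_half = np.convolve(x, [1.0, -1.0])

# Three-tap central difference: exactly one sample of delay, so the
# undifferentiated paths only need a one-sample delay to stay aligned.
d_central = np.convolve(x, [0.5, 0.0, -0.5])
```

For this sinusoid the central difference satisfies d_central[k] = sin(0.1) * cos(0.1 * (k - 1)) exactly: the scaled derivative of the input, delayed by precisely one sample.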

The Hilbert FIR filter was constructed by crafting an impulse signal and applying a Hilbert transform to it; the derivative FIR filter was hand constructed. I'm using FFT-based convolution, so the number of taps available for the Hilbert and derivative filters is scandalously large. I use 262143-tap filters and the whole thing runs 3x realtime. At those sizes, windowing the filters is unnecessary.

Stereo recordings are summed to mono, and there is some control over the L/R balance to ensure an optimum signal. FM signal outputs are in units of the specified frequency: If the test tone is set to be 3150hz in the demodulator, then 0dbFS = 3150hz, -20dbFS = 10% and so on. The AM signal is in PCM amplitude units but makes no attempt to reverse the RIAA eq if applied in the demodulator.

Signals had DC removal done in Audacity. If there were extremely strong clicks they were usually removed manually.

Normalization of AM signals (to a fraction of the carrier amplitude) is not yet done. This means that while the AM results may be shown in the same chart, they are not directly comparable against one another. They are nevertheless provided for comparison against the FM plots.

Spectrum analysis was done in my usual handrolled code. The window is Hamming, to maximize frequency resolution, since the dynamic range requirements here are not great; successive FFT windows overlap by 75%.
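The same windowed, overlapped averaging can be sketched with SciPy's welch; the stand-in demodulated FM waveform and its parameters below are invented for the example:

```python
import numpy as np
from scipy.signal import welch

# Toy demodulated FM waveform: 4 Hz wow at a 2% (63/3150) deviation,
# expressed as a fraction of the carrier, plus a little noise.
fs_fm = 1000.0
t = np.arange(30 * int(fs_fm)) / fs_fm          # 30 s of demodulated signal
wow = (63.0 / 3150.0) * np.sin(2 * np.pi * 4.0 * t)
rng = np.random.default_rng(0)
fm_wave = wow + 1e-4 * rng.standard_normal(len(t))

# Hamming window, 75% overlap between successive FFT windows.
nper = 4096
f, pxx = welch(fm_wave, fs=fs_fm, window='hamming',
               nperseg=nper, noverlap=3 * nper // 4)
```

With a 4096-point window at this rate, the bin spacing is about 0.24hz, so the 4hz wow line resolves cleanly above the noise floor.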

IEC wow/flutter measurements are not yet implemented. IEC drift measurements are not yet implemented, although even if they were, they allegedly require a 30s averaging time which might be problematic when dealing with 60s wow/flutter tracks.

I implemented this in LabVIEW 8.21. (Full disclosure: I am a former employee of NI and I own shares.)

Samples

  • AT-LP2D-USB recording of the Ultimate LP wow/flutter, supplied by knowzy.com
  • My recording of the STR-151 3khz tone - Technics SL-1200, Audio Technica AT-OC9MLII. Recorded with flat eq on Yamaha GO46.
  • A recording of a HFS75 wow/flutter track from a currently-unnamed contributor. Unfortunately this track is only ~10 seconds long, so the wow results in particular will have lower resolution. The noise levels also seem to be much higher.

Results

Assuming a carrier frequency of 3.15khz, a peak speed deviation of 2% and a peak modulation frequency of 250hz, Carson's Rule suggests a signal bandwidth of 626hz. I therefore bandpassed the input signal to a 700hz bandwidth before demodulating. (A discussion and analysis of higher frequency flutter is well beyond the scope of this post.)
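The arithmetic, for reference:

```python
# Carson's rule: bandwidth = 2 * (peak frequency deviation + highest
# modulation frequency).
delta_f = 0.02 * 3150   # 2% peak speed deviation of a 3.15 kHz carrier = 63 Hz
f_mod = 250             # highest flutter (modulation) frequency considered, Hz
bandwidth = 2 * (delta_f + f_mod)   # 626 Hz, rounded up to a 700 Hz bandpass
```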

Spectrum analysis is done at window lengths of 28.8 and 1.8 seconds. At 28.8s, the record warp peaks are extremely visible while the underlying spectrum is easily seen. At 1.8s, many more windows are available to average together and the warp harmonics completely merge, resulting in a clearer, more accurate plot. The 28.8s plots go from 0hz to 20hz on a linear scale while the 1.8s plots progress logarithmically from 1hz to 250hz. Plot smoothing begins at 100hz for the 1.8s plots.

For the following spectrum plots, the AT-LP2D results are in blue, my results are in red, and the HFS75's in green. The AT-LP2D had an average speed deviation of 0.167% relative to 3.15khz; the Technics deviated by 0.42% against 3khz, and the HFS75 recording deviated by -2.2% against 3khz. I don't have a good explanation for the latter, except perhaps that the HFS75 track might be something like 2950hz instead of 3000hz; also, the test records for the latter two tables are considerably older than the first, so the deviations may reflect the quality of the cutting lathes more than the quality of the turntables.

LP2D demodulated waveforms: black is FM, red is AM


Technics demodulated waveforms: black is FM, red is AM



HFS75 demodulated waveforms: black is FM, red is AM


Frequency modulation spectrum, 0-20hz


Frequency modulation spectrum, 1-250hz



Amplitude modulation spectrum, 0-20hz



Amplitude modulation spectrum, 1-250hz


Discussion

Analysis of the process:
  • In general, because different test records were used for each recording, the turntables are not entirely comparable based on these measurements except in situations where a clear interpretation can be made. They are plotted on the same charts primarily for easier viewing.
  • Demodulation is extremely sensitive to input noise - the HFS75 recording was substantially noisier and that really did compromise all its plots. Its modulation spectrum above 50hz simply cannot be trusted. In general, maximizing the SNR of the tone will maximize the SNR of the flutter spectrum. Using a declick utility may improve performance.
  • The averaged flutter spectra required roughly 30s of input signal to mostly average out, so at least this length is necessary for a good plot. The HFS75 track is 10 seconds long which is way too short. Testing suggests that enabling/disabling inverse RIAA eq does not significantly change the AM or FM waveforms.
  • Testing at higher bandwidths does not show significant flutter beyond 200hz. The noise levels are markedly higher starting at 500hz, and background noise tones in the recording (such as a 1khz sine and harmonics on the LP2D) show up spuriously in the modulation spectra.
  • Compare to Ladegaard's B&K plots - is my noise floor higher or lower? I can't really tell at the moment.

Analysis of plots:
  • The LP2D plots show clear peaks at 18hz, 40-50hz and 55hz, with 20-30db of SNR. Warp harmonics can be very clearly separated from the background noise floor. This is encouraging: it shows the flutter analysis resolving genuinely useful detail.
  • The Technics AM flutter plot rises remarkably above 200hz, while the other tables' do not. This is potentially remarkable because the speed control loop is supposed to operate somewhere around this frequency. However, the rise does not show up on the Technics FM plot, which is a strike against it being control-related.
  • While I am not automatically normalizing the AM results yet, I can say that the average AM DC value for the LP2D was 0.20955, which means the signal has a peak-to-peak fluctuation of roughly 3.6%. This is astonishingly high. The Technics AM peak-to-peak value is roughly 1.6%, which is also very high. There are no really good explanations for this yet.
  • There are other puzzling things about the AM spectra. Besides the rising response on the Technics plot and the large magnitude, there is a hump in the AM response of the LP2D at 140hz, several peaks on the FM flutter plot that do not exist in AM plots, and the FM wow plot indicates very many tones that do not exist on the AM plot.

08 January 2009

Submarine remastering: The secret, inferior version of Pearl Jam's "Vs."

Though it is documented nowhere, there are at least two separate masterings of Pearl Jam's LP "Vs." floating around, and one of them is clearly inferior to the other. Simply put, one of the masters was highpassed at around 40hz, with the resulting digital overs clipped, to generate the new master. Such a master suffers from both reduced bass and greater amounts of clipping.

I haven't seen any documentation, either online or on the printed packaging of "Vs." that I just saw at the store, to suggest that this remaster ever took place. Therefore I'll dub it a "submarine remaster". Just like in the bad old days of vinyl, when different pressings of the same record might have used vastly different mastering practices with their only documentation being their dead wax info, a submarine CD remaster may be considerably different from the original, with a different pressing indicated by a different ISRC.

Here's a plot of the eq differences between the two masterings, measured from the first 60 seconds of "Animal". The bottom axis is frequency in hz; the side axis is attenuation in db. A 1-second window was used with a 50% overlap. At 20hz the masterings differ by 12db, so this ought to be quite audible with a suitably good system.




Here's a zoom-in comparison of the waveforms, illustrating the two stages of clipping in one of the masterings, where one of the clipping stages matches the clipping found in the other mastering. This may not be audible, but it certainly could have been avoided and is unfortunate.

Unfortunately, I don't have ISRCs for the two pressings, but I am 95% certain that the bassier/less clipped master is from the original 1993 CD, and the newer/more distorted master is from the 2004 Sony reissue. (Update: After speaking with greynol further and comparing notes, we now have no idea exactly which master corresponds to which release. Both of our CDs predate 2004, but his has the two clipping slopes identified in the red plot above, while my CD has the single clipping slope of the black plot. Comments here affected by this have been struck.)

The pressings both ReplayGain to the same value - no gain or added dynamic range compression was applied. I don't know which version is used for iTunes/Amazon music downloads. Looking at a copy of the Vs. CD at a store, there is no visual indication on the outside of the case that it is either the 1993 or the 2004 pressing.

So what's my point? Besides being a very odd example of a "remaster", and one that's worth posting about, the circumstances surrounding this release suggest that no notice may be given if a newer pressing of an album is a remaster. Those looking for the "authentic" version of Vs., and certainly the one with the most bass, should almost certainly look towards the original 1993 CD.

More generally, if you're looking for an "authoritative" master of a back catalog CD, it may still be important to buy the original pressing, even if a newer pressing does not advertise any remastering - or, if the newer pressing surreptitiously corrected a flaw in the original, vice versa. If the CRCs match between two pressings, then the newer pressing matches the old one exactly. And if they don't, it's still possible the differences are trivial, like a difference in track offsets. A thorough analysis must be done to concretely identify the differences between each master. Got that?

Thanks to greynol for the heads up.

Update: See above for new notes on pre-2004 pressings.

01 January 2009

A good USB turntable guide

USB Turntable Comparison Guide

The maintainer of Knowzy (a commission-based website with detailed guides on selected topics) recently posted a guide to USB turntables based on a lot of input from the HydrogenAudio community, including myself. It is very accurate and free of audiophile hyperbole. I'd recommend it to people who are considering purchasing a USB turntable and would like to know all the technical reasons to (not) purchase one. (FWIW, I have absolutely no ties to Knowzy besides my contributions to the guide.)