13 January 2008

pfpf: An Experimental Estimator of Dynamic Range in Music

The dynamic range of a piece of music depends both on how its time-varying loudness is estimated and on the timescale over which that loudness is evaluated. I propose a numerical method of estimating dynamic range that accounts for both, using a modified ITU-R 1770 loudness filter and three moving windows to estimate loudness on three different timescales. The goal is to measure and compare dynamic range more accurately across music genres, and across different masterings and processing techniques applied to the same music.
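For reference, the standard K-weighting filter in ITU-R BS.1770 is a cascade of two biquads: a high-shelf stage followed by the RLB high-pass stage. The sketch below uses the published 48 kHz coefficients. It is not pfpf's own filter (the exact modification is not described here), the names `biquad` and `k_weight` are illustrative, and other sample rates would need recomputed coefficients.

```python
# Minimal sketch of the standard BS.1770 K-weighting filter, assuming 48 kHz
# input. This is NOT pfpf's "modified" filter, which is not specified here.

# ((b0, b1, b2), (a0, a1, a2)) for each biquad, published 48 kHz values.
STAGE1_SHELF = (
    (1.53512485958697, -2.69169618940638, 1.19839281085285),
    (1.0, -1.69065929318241, 0.73248077421585),
)
STAGE2_RLB = (
    (1.0, -2.0, 1.0),
    (1.0, -1.99004745483398, 0.99007225036621),
)


def biquad(x, coeffs):
    """Filter a list of samples with one Direct Form I biquad (a0 == 1)."""
    (b0, b1, b2), (_, a1, a2) = coeffs
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for s in x:
        y = b0 * s + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, s
        y2, y1 = y1, y
        out.append(y)
    return out


def k_weight(x):
    """High-shelf stage followed by the RLB high-pass stage."""
    return biquad(biquad(x, STAGE1_SHELF), STAGE2_RLB)
```

Running samples through `biquad(x, STAGE2_RLB)` alone gives just the RLB weighting, the filter behind the Leq(RLB) measure mentioned under Known Issues.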
Summary of algorithm:
  1. Apply ITU-R 1770 filters to convert amplitude to instantaneous loudness.
  2. Estimate loudness across three different timescales by computing 10ms ("short term"), 200ms ("medium term") and 3000ms ("long term") windowed RMS power.
  3. Decouple timescales by normalizing 10ms loudness by 200ms loudness, and 200ms loudness by 3000ms loudness.
  4. Threshold loudness at each timescale to remove silence (optional).
  5. Compute a histogram for each loudness estimate.
  6. Dynamic range = range between the 50th and 97.7th percentiles, for each timescale.
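Steps 2, 3, 5 and 6 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not pfpf's LabVIEW implementation: the input is assumed to be mono and already loudness-weighted, the medium and long windows are assumed to be trailing moving averages of the 10 ms block powers, "decoupling" is taken to mean dividing powers (subtracting levels in dB), thresholding is omitted, and percentiles are taken from a sorted list rather than a histogram. All function names are hypothetical.

```python
import math


def block_powers(x, block):
    """Mean-square power of consecutive, non-overlapping blocks of samples."""
    return [sum(s * s for s in x[i:i + block]) / block
            for i in range(0, len(x) - block + 1, block)]


def moving_mean(p, n):
    """Trailing moving average over the last n values (fewer at the start)."""
    out, acc = [], 0.0
    for i, v in enumerate(p):
        acc += v
        if i >= n:
            acc -= p[i - n]
        out.append(acc / min(i + 1, n))
    return out


def to_db(p, floor=1e-20):
    """Power to decibels, with a floor to avoid log10(0)."""
    return 10.0 * math.log10(max(p, floor))


def percentile(values, q):
    """Percentile q (0..100) of a non-empty list, linear interpolation."""
    v = sorted(values)
    pos = (len(v) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(v) - 1)
    return v[lo] + (v[hi] - v[lo]) * (pos - lo)


def dynamic_range(x, fs, windows=(0.010, 0.200, 3.0), lo_q=50.0, hi_q=97.7):
    """Long/medium/short dynamic range in dB of already-weighted samples x."""
    block = int(round(windows[0] * fs))
    short = block_powers(x, block)
    medium = moving_mean(short, int(round(windows[1] / windows[0])))
    long_ = moving_mean(short, int(round(windows[2] / windows[0])))
    # Decouple the timescales: dividing powers == subtracting levels in dB.
    short_rel = [to_db(s) - to_db(m) for s, m in zip(short, medium)]
    medium_rel = [to_db(m) - to_db(l) for m, l in zip(medium, long_)]
    long_abs = [to_db(l) for l in long_]
    return {
        "long": percentile(long_abs, hi_q) - percentile(long_abs, lo_q),
        "medium": percentile(medium_rel, hi_q) - percentile(medium_rel, lo_q),
        "short": percentile(short_rel, hi_q) - percentile(short_rel, lo_q),
    }
```

For example, 4 seconds of a steady 1 kHz tone followed by 12 seconds of the same tone 20 dB quieter should give a long-term range of about 20 dB, because the 50th percentile falls in the quiet section and the 97.7th in the loud one. A steady tone on its own gives roughly 0 dB on all three timescales.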

Using the pfpf application

The algorithm is prototyped in a LabVIEW application built for Windows, downloadable here. Unzip it into a new directory. You also need to download the LabVIEW 8.2.1 runtime.

Basic instructions: Run the program and click the folder icon at the top to select a WAV file. Press the "Run analysis" button. The file is scanned for instantaneous loudness (indicated by the progress bar), and then a histogram operation is performed to calculate the dynamic range. The output is displayed at the bottom. Other tabs display plots of instantaneous loudness and the histograms.
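The histogram step can be illustrated with a short sketch: loudness values in dB are binned, and a percentile is read off where the cumulative count first reaches the requested fraction. The default 0.1 dB bin width and the function name are assumptions for illustration; a narrower bin gives finer percentile resolution. NaN entries are skipped, matching the way masked silence is meant to be ignored.

```python
import math


def histogram_percentile(values, q, bin_width=0.1):
    """Estimate the q-th percentile (0..100) of dB values from a histogram.

    Values are binned into fixed-width bins; the result is the centre of the
    first bin whose cumulative count reaches q percent of the total.
    NaN entries (masked silence) are skipped.
    """
    data = [v for v in values if not math.isnan(v)]
    if not data:
        return math.nan
    lo = min(data)
    counts = {}
    for v in data:
        k = int((v - lo) // bin_width)
        counts[k] = counts.get(k, 0) + 1
    target = q / 100.0 * len(data)
    running = 0
    for k in sorted(counts):
        running += counts[k]
        if running >= target:
            return lo + (k + 0.5) * bin_width
    return lo + (max(counts) + 0.5) * bin_width
```

The dynamic range at one timescale is then `histogram_percentile(v, 97.7) - histogram_percentile(v, 50)`, accurate to within about one bin width.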

Interpreting the results:
  1. Long-term dynamic range - loudness changes across multiple seconds, or across multiple measures of a piece of music. Wide swings in orchestration and sustained loud/quiet passages increase this number. Dynamic range compression, in any form, decreases it. Typical values range from 16 dB for extremely dynamic orchestral and experimental music to 1-2 dB for pop/rock singles.
  2. Medium-term dynamic range - loudness changes across hundreds of milliseconds, or single notes. Aggressive dynamic range compression can reduce this.
  3. Short-term dynamic range - loudness changes across a few milliseconds, or individual transients. The use of extremely percussive instruments can increase this. Extremely aggressive dynamic range compression, especially limiting, can decrease it.
  4. ITU-R 1770 loudness - estimate of loudness as per the ITU-R 1770 recommendation.
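In its original, ungated form, the ITU-R 1770 loudness is -0.691 plus 10·log10 of the channel-weighted mean square of the K-weighted signal. A minimal sketch, assuming the samples are already K-weighted (later revisions of the recommendation add gating, which is not shown; the function name is illustrative):

```python
import math


def bs1770_loudness(channels, weights=None):
    """Ungated BS.1770 loudness (LKFS) from already K-weighted channels.

    channels: list of equal-length sample lists.
    weights:  per-channel gains G_i (1.0 for left, right and centre;
              1.41 for the surround channels). Defaults to all 1.0.
    """
    if weights is None:
        weights = [1.0] * len(channels)
    total = 0.0
    for g, ch in zip(weights, channels):
        total += g * sum(s * s for s in ch) / len(ch)  # weighted mean square
    return -0.691 + 10.0 * math.log10(total)
```

A full-scale sine on one channel has a mean square of 0.5, so this returns about -3.70; the same sine on both stereo channels returns -0.691.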

Example results: long-, medium- and short-term dynamic range (dB) for various tracks:

Musical piece | Long | Medium | Short
Autechre - "Sublimit" | 4.0 | 4.0 | 10.6
Autechre - "Dial" | 1.0 | 2.8 | 12.6
Shellac - "Genuine Lullabelle" (long-term threshold = -50 dB) | 14.3 | 7.4 | 6.9
Merzbow - "I Lead You Towards Glorious Times" | 0.64 | 0.34 | 0.65
John Mayer - "Waiting On The World To Change" | 2.7 | 3.2 | 8.9
Battles - "Tonto" | 3.2 | 2.7 | 5.2
Soundgarden - "Black Hole Sun" | 2.5 | 2.4 | 4.4
Autechre & The Hafler Trio - "æo³" | 14.9 | 4.0 | 6.7
Beethoven - Symphony No. 9, Chamber Orchestra of Europe (Harnoncourt) | 13.5 | 4.0 | 4.4

Screenshots

[Screenshot: Configuration and output tab]
[Screenshot: Loudness plot tab]
[Screenshot: Histogram tab]

Advanced configuration

These options affect the computed dynamic range, so whenever they are changed, reported results should state the configuration used. The "Output" string was created for this purpose.

  • Thresholds: If the instantaneous loudness drops below the threshold set for a given timescale, the loudness at that timescale (and at every shorter timescale) is set to NaN and ignored in subsequent dynamic range calculations. This prevents silence between pieces of music (assumed to be below the listening noise floor, and therefore inaudible) from affecting the results: silence skews the histograms and artificially compresses the measured dynamic range at all timescales. The loudness of silence also varies considerably between formats (notably vinyl vs. CD), so masking it makes comparisons between formats more accurate.
  • Time scales: Controls the RMS window size (in seconds) for each timescale.
  • Percentiles: By default, dynamic range is calculated as the loudness range between the 50th and 97.7th percentiles of the histogram at each timescale. These percentile levels may be adjusted.
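The threshold and percentile options might look like the following sketch. It assumes time-aligned, per-block absolute loudness values in dB for each timescale and compares each timescale against its own threshold; pfpf's exact rule is not specified beyond the description above, so treat this as one possible reading. The masked (NaN) positions are then excluded when the percentiles are computed. Function and key names are hypothetical.

```python
import math

ORDER = ["short", "medium", "long"]  # shortest to longest timescale


def apply_thresholds(levels, thresholds):
    """Mask silence with NaN.

    levels:     dict of per-block absolute loudness in dB for "short",
                "medium" and "long", all the same length and time-aligned.
    thresholds: dict with the same keys; a missing key or None disables it.
    When a timescale falls below its threshold, it and every shorter
    timescale are set to NaN for that block.
    """
    out = {k: list(v) for k, v in levels.items()}
    for i in range(len(levels["short"])):
        for j, name in enumerate(ORDER):
            t = thresholds.get(name)
            if t is not None and levels[name][i] < t:
                for shorter in ORDER[: j + 1]:
                    out[shorter][i] = math.nan
    return out


def dr_ignoring_nan(values, lo_q=50.0, hi_q=97.7):
    """Range between two adjustable percentiles, skipping NaN entries."""
    v = sorted(x for x in values if not math.isnan(x))
    if not v:
        return math.nan

    def pct(q):
        pos = (len(v) - 1) * q / 100.0
        lo = int(pos)
        hi = min(lo + 1, len(v) - 1)
        return v[lo] + (v[hi] - v[lo]) * (pos - lo)

    return pct(hi_q) - pct(lo_q)
```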

Application License

The pfpf application is free for non-commercial use. Do not redistribute it. Source code is available upon request (requires LabVIEW 8.2 or above and the Digital Filter Design toolkit).

Contact Info

Message me (Axon) on HydrogenAudio, or comment below.

Known Issues

  • It is important to take the results with a grain of salt. Transient loudness estimation is a topic of ongoing research, and no truly accurate method has yet been agreed on. pfpf currently uses a moving-window modification of Leq(RLB), but a more elaborate loudness estimator, such as HEIMDAL, might be used in the future.
  • DC removal is applied to each short block (0.01 seconds of signal by default) as it is read, and these blocks are then combined into the larger medium and long (0.2 s and 3 s) blocks. As a result, the signal is crudely high-pass filtered before analysis (the 10 ms block length corresponds to 100 Hz), which attenuates bass information. This is not expected to be a big deal, because low frequencies contribute relatively little to loudness models.
  • Histogram computation is not factored into the progress bar, so there is a noticeable pause between the completion of the progress bar and the display of results.
  • Beware of falling code. Parameters may not be well tested for failure cases or obviously incorrect inputs.
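The effect of per-block DC removal can be checked numerically. This sketch (hypothetical function names; a 48 kHz sample rate and 10 ms blocks are assumed) subtracts each block's mean from a steady sine and reports how much of the RMS level survives.

```python
import math


def blockwise_dc_removal(x, block):
    """Subtract each non-overlapping block's mean from its own samples."""
    out = []
    for i in range(0, len(x), block):
        seg = x[i:i + block]
        m = sum(seg) / len(seg)
        out.extend(s - m for s in seg)
    return out


def retained_fraction(freq, fs=48000, block_s=0.010, seconds=1.0):
    """RMS after block DC removal divided by RMS before, for a steady sine."""
    n = int(fs * seconds)
    x = [math.sin(2 * math.pi * freq * k / fs) for k in range(n)]
    y = blockwise_dc_removal(x, int(round(fs * block_s)))

    def rms(v):
        return math.sqrt(sum(s * s for s in v) / len(v))

    return rms(y) / rms(x)
```

For steady tones, frequencies that fit a whole number of cycles into a 10 ms block (1 kHz, and even 100 Hz) pass essentially unchanged, while a 20 Hz tone keeps only about a third of its RMS level. The effect is therefore a crude high-pass rather than a clean filter with a sharp corner.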

Document Revision History

13 January 2008: Initial revision.

2 comments:

Sean E. Olive said...

Cool. Does it support multichannel wav files as well?

Richard Tollerton said...

No, but if you're ok with BS.1770's multichannel weighting scheme, adding it should be easy enough.