Audio toolbox: using acousticLoudness() in reverse for "normalizing" audio signal

Edited for clarity.
I have a set of speech signals that were peak normalized. I want to adjust the gain of each signal so they have an equal loudness (same idea as replay gain).
I am calling the function:
acousticLoudness()
The signal I'm passing to the function is the original recording.
First I wanted to use the ISO 532-2 method, since the sound presentation will be done with headphones, but my signal is time-varying (speech signal), so I must use the ISO 532-1 method instead. How would I compensate for the non-linearity of the headphones and the fact that the sound will not be presented in a free field? I thought of just building a filter to pre-process the signal before feeding it to acousticLoudness() but I'm not sure where to start.
The headphones used are Sennheiser HDA300 (spec sheet).
Once I have a "replay gain" value for each signal based on an arbitrary reference, I will calibrate the amplifier by measuring the dB SPL output of a 1 kHz signal.
Thank you!
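The calibration step described in the question can be tied to acousticLoudness through its CalibrationFactor parameter. The following is a minimal sketch, not a verified procedure: measuredSPL is a hypothetical reading, and the scaling assumes the documented default, where a factor of sqrt(8) corresponds to a full-scale 1 kHz sine at 100 dB SPL.

```matlab
% Sketch: derive a CalibrationFactor from a measured playback level.
% ASSUMPTION: measuredSPL is a hypothetical reading (dB SPL) taken while a
% full-scale (peak amplitude 1) 1 kHz sine is played through the headphones.
measuredSPL = 94;          % hypothetical measurement, dB SPL
pRef = 20e-6;              % reference pressure, Pa

% The default factor sqrt(8) maps a full-scale sine (RMS = 1/sqrt(2))
% to 2 Pa RMS, i.e. 100 dB SPL. Scale it by the measured offset.
calFactor = sqrt(8) * 10^((measuredSPL - 100)/20);

% Sanity check: the full-scale tone should map back to measuredSPL.
Fs = 48e3;
t = (0:Fs-1).'/Fs;
tone = sin(2*pi*1000*t);
toneRMS = sqrt(mean(tone.^2));
toneSPL = 20*log10(calFactor*toneRMS/pRef);

% With Audio Toolbox installed, the factor would be passed as:
% loudness = acousticLoudness(x, Fs, 'CalibrationFactor', calFactor);
```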

 Accepted Answer

If you want average perceived loudness, consider omitting the time-varying option even though the signal isn't stationary.
Otherwise, one way to apply a frequency response (besides the obvious filtering of the signal) would be to compute the one-third-octave band SPL first (see the example in the acousticLoudness doc), compensate for the headphone response in each band using the table you provided, and then pass those SPL values to acousticLoudness.
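The per-band compensation suggested above might look roughly like the sketch below. It assumes the band SPL values have already been measured (a flat placeholder is used here), and hpTable is a hypothetical frequency/deviation table, not the HDA300 data. Whether the deviation should be added or subtracted depends on how the table is defined, so treat the sign as an assumption.

```matlab
% Sketch: apply a headphone response to one-third-octave band SPL values.
% Nominal centre frequencies of the 28 bands from 25 Hz to 12.5 kHz.
fc = [25 31.5 40 50 63 80 100 125 160 200 250 315 400 500 630 800 ...
      1000 1250 1600 2000 2500 3150 4000 5000 6300 8000 10000 12500];

% HYPOTHETICAL headphone deviation from flat, [Hz, dB], 0 dB at 1 kHz.
hpTable = [20 -6; 100 -2; 1000 0; 4000 3; 8000 -1; 16000 -8];

% Interpolate the deviation onto the band centres on a log-frequency axis.
hpDev = interp1(log10(hpTable(:,1)), hpTable(:,2), log10(fc), 'linear');

% Placeholder band levels (dB SPL); in practice these come from the signal.
splIn = 60*ones(1, 28);

% ASSUMPTION: positive deviation means the headphone is louder in that band.
splAtEar = splIn + hpDev;

% Stationary ISO 532-1 loudness from the band levels (needs Audio Toolbox).
loudness = acousticLoudness(splAtEar);
```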

6 Comments

First, thank you for your help, it is much appreciated.
Thank you for the suggestion of omitting the time-varying option, I thought it would yield a valid average perceived loudness.
About using the data from the tables provided by Sennheiser, how would I transform the data in the sensitivity table to get a headphone calibration table? Would I simply subtract the dB SPL value at 1 kHz from the value at each frequency?
If I were to use the Zwicker method (your second suggestion) with a free-field sound field, my understanding is that I would have to correct not only for the non-linearity of the headphones, but also for the sensitivity of the ear simulator. That means I would need the "free field equivalent earphone output level". To get that from the datasheet tables, do I take the headphone sensitivity data, reference it to the level at 1 kHz and add the numbers from the first table on p.2 (Difference between free-field sensitivity level GF and ear simulator sensitivity level GC)?
Hi Frederik,
I think I should take a step back and ask what kind of precision is required to attain your objectives. On one hand, using 532-2 would be the easiest, but might not work well enough for your application. On the other hand, to be as precise as possible, there are several details to consider; some you mentioned, another would be that the SPL input is also meant for stationary signals (the function could be called every 0.5 to 2 ms, but that would bypass the persistence part of the time-varying model).
Nonetheless, you still had great questions, so let’s address those. For using the earphone table, you need “an Nx2 matrix containing N frequency-amplitude pairs that describe the earphone's deviations from a flat response. Specify the frequency in Hz (in increasing order) and the amplitude deviation in decibels.” To do with 532-1 what 532-2 does with earphone responses, you need to subtract the free-field difference from your values (since 532-1 assumes free field by default). Then, I agree that setting the offset at 1 kHz to 0 dB could be a good thing to do.
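As a rough illustration of the table construction discussed above, here is a minimal sketch. All numbers are hypothetical placeholders rather than Sennheiser data, and the sign of the free-field correction is an assumption that should be checked against the datasheet's definition of the GF minus GC difference.

```matlab
% Sketch: build an Nx2 frequency/deviation table from datasheet-style values.
% All values below are HYPOTHETICAL placeholders, not HDA300 data.
f         = [125 250 500 1000 2000 4000 8000].';          % Hz, increasing
sens      = [112 113 114 114.5 113 111 105].';            % sensitivity, dB SPL
gfMinusGc = [1.0 0.5 0.0 0.0 -2.0 -4.0 1.5].';            % free-field difference, dB

% ASSUMPTION: the free-field difference is subtracted, as suggested above.
dev = sens - gfMinusGc;

% Reference everything to the value at 1 kHz so that entry becomes 0 dB.
dev = dev - dev(f == 1000);

% N-by-2 table: frequency in Hz, deviation from flat in dB.
responseTable = [f dev];
```

The resulting table has one row per frequency and a 0 dB entry at 1 kHz, so the absolute level is left to the calibration step.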
Just to see how the earphone table works, here is an example. Since this headphone has an attenuation of 13.8 dB at 100 Hz, the computed loudness is lower (exercise: compare loudness without the earphone response to the loudness using gain=db2mag(50-100+13.8) and the earphone response).
% A 100 Hz tone presented binaurally via the Telephonics TDH-39
% earphones with a nominal output sound pressure level of 50 dB.
Fs = 48e3;
Ft = 100;
N = 1e6;
x = sin(2*pi*Ft/Fs*(1:N).');
gain = db2mag(50-100);
% 30x2 table for the TDH-39 response
tdh = [ 0, 80, 100, 200, 500, 574, 660, 758, 871, 1000, 1149, 1320, 1516, 1741, 2000, 2297, ...
2639, 3031, 3482, 4000, 4500, 5000, 5743, 6598, 7579, 8706, 10000, 12000, 16000, 20000 ; ...
-50, -15.3, -13.8, -8.1, -0.5, 0.4, 0.8, 0.9, 0.5, 0.1, -0.8, -1.5, -2.3, -3.2, ...
-3.9, -4.2, -4.3, -4.3, -3.9, -3.2, -2.3, -1.1, -0.3, -2, -5.4, -9, -12.1, -15.2, -30, -50 ].';
acousticLoudness(gain*x,Fs,'Method','ISO 532-2','SoundField','earphones','EarphoneResponse',tdh)
You can also use 532-2 to compare free field to a flat response at the eardrum (which is the same as an ideal earphone).
Hi Jimmy,
Precision
About what precision is required, the short answer is < 0.3 sones.
Say we have 10 stimuli and each will be presented at 5 different intensities. At any given presentation intensity, the difference in loudness between two stimuli should not be perceptible, averaged over all participants.
Based on the figure from the reference below, and considering that the range of intensities in our study roughly translates to 60 to 80 dB SPL, 0.3 sones seems to be the point where listeners can notice a difference in loudness at the lower intensities.
Pedrielli, F., Carletti, E., & Casazza, C. (2008). Just noticeable differences of loudness and sharpness for earth moving machines. Journal of the Acoustical Society of America, 123(5), 3164-3164.
Using the artificial ear data from Sennheiser instead of testing our physical pair of headphones with in-ear microphones according to ISO 11904-1:2002 is a limitation, but I hope to still be within 0.3 sones of precision.
Our lab uses a high quality sound card and headphone amplifier with an almost perfectly linear response, so it should not distort the spectrum significantly. I think I can get away with not including them in my model.
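The precision target above can be checked mechanically once loudness values are available. This is only a sketch with hypothetical loudness values; in practice each entry would come from acousticLoudness for one stimulus at one presentation level.

```matlab
% Sketch: check that stimuli at the same level stay within a loudness tolerance.
% HYPOTHETICAL loudness values in sones: rows = stimuli, columns = levels.
N = [4.10  8.05 15.9;
     4.25  8.20 16.4;
     4.00  8.10 16.0;
     4.15  7.95 16.1];

tolSone = 0.3;                          % target from the discussion above

% Largest pairwise difference at each level is max minus min of the column.
spread = max(N, [], 1) - min(N, [], 1);

% True where all stimuli at that level are within the tolerance.
withinTol = spread < tolSone;
```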
Sound field
Thank you for the exercises, they helped clarify how the sound field changes the computation. Here is the code for anyone who might find this thread in the future.
% A tone at frequency Ft presented binaurally via the Telephonics TDH-39
% earphones with a nominal output sound pressure level of 50 dB,
% i.e. 50 dB SPL for a full-scale 1 kHz sine wave.
hpNominalOutput = 50; % Nominal output SPL of headphones (dB)
calibrationFactor = 100; % Level (dB SPL) that the default calibration factor of acousticLoudness
% assigns to a full-scale 1 kHz sine wave
Fs = 48e3;
Ft = 12000; % Frequency of sine wave in Hz (must be a frequency listed in the table below)
N = 1e6; % Number of samples
x = sin(2*pi*Ft/Fs*(1:N).');
gain = db2mag(hpNominalOutput-calibrationFactor);
% 30x2 table for the TDH-39 response
tdh = [ 0, 80, 100, 200, 500, 574, 660, 758, 871, 1000, 1149, 1320, 1516, 1741, 2000, 2297, ...
2639, 3031, 3482, 4000, 4500, 5000, 5743, 6598, 7579, 8706, 10000, 12000, 16000, 20000 ; ...
-50, -15.3, -13.8, -8.1, -0.5, 0.4, 0.8, 0.9, 0.5, 0.1, -0.8, -1.5, -2.3, -3.2, ...
-3.9, -4.2, -4.3, -4.3, -3.9, -3.2, -2.3, -1.1, -0.3, -2, -5.4, -9, -12.1, -15.2, -30, -50 ].';
figure('Name','With earphone response calib table');
acousticLoudness(gain*x,Fs,'Method','ISO 532-2','SoundField','earphones','EarphoneResponse',tdh)
freqIdx = find(tdh(:,1)==Ft,1); % Index of the frequency of the stim in the headphone calib table
hpRspCorr = tdh(freqIdx,2); % Headphone's deviation from 0dB at stim frequency
correctedGain = db2mag(hpNominalOutput+hpRspCorr-calibrationFactor);
figure('Name','With correction for earphone sensitivity applied to signal');
acousticLoudness(correctedGain*x,Fs,'Method','ISO 532-2','SoundField','earphones')
figure('Name','No earphone calibration / correction for sensitivity');
acousticLoudness(gain*x,Fs,'Method','ISO 532-2','SoundField','earphones')
figure('Name', 'Flat response at eardrum (ideal headphone)');
acousticLoudness(gain*x,Fs,'Method','ISO 532-2','SoundField','eardrum')
figure('Name', 'Free sound field, flat signal');
acousticLoudness(gain*x,Fs,'Method','ISO 532-2','SoundField','free')
Going forward
What do you think my options are?
Is a stationary model acceptable?
If the time-varying 532-1 should be used, is the best option to filter the signal to get the free field equivalent earphone output? Are there any obvious pitfalls I need to avoid?
Thank you again for your help.
If you have short segments where the silent parts have been removed (or they are excluded from the loudness computation), you might get away with the stationary model. In my opinion it's worth experimenting with.
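To experiment with the stationary model as suggested, one option is to search for the gain that brings each signal to a common loudness, since loudness in sones is not linear in gain. The sketch below assumes Audio Toolbox with the default calibration and sound field; the test tone and the target of 8 sones are arbitrary placeholders standing in for a speech segment and the chosen reference.

```matlab
% Sketch: find the gain (dB) that brings a signal to a target loudness
% using the stationary ISO 532-1 model (the acousticLoudness default).
Fs = 48e3;
t = (0:Fs-1).'/Fs;
x = 0.1*sin(2*pi*1000*t);      % placeholder signal; use a speech segment here
targetSone = 8;                % arbitrary reference loudness, sones

% Loudness of the signal after applying a gain given in dB.
loudnessAt = @(gdB) acousticLoudness(10^(gdB/20)*x, Fs);

% Loudness grows monotonically with gain, so a bracketed root search works.
% ASSUMPTION: the target lies between the loudness at -40 dB and +20 dB gain.
gaindB = fzero(@(gdB) loudnessAt(gdB) - targetSone, [-40 20]);

% Signal scaled to the target loudness.
y = 10^(gaindB/20)*x;
```

Running the same search on every stimulus with one shared targetSone gives a per-signal gain in the spirit of replay gain.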
If you are able to design a filter like that, it would probably be ideal since you'd preserve the time persistence part of the model. Have a look at the algorithm section of the doc to see what blocks are added in time-varying mode.
I hope that this has been helpful. It looks like we might have reached the point where you're the subject matter expert.
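One possible way to design such a filter is a linear-phase FIR fitted to the response table with fir2 from Signal Processing Toolbox. This is a sketch under stated assumptions: hpTable holds hypothetical values, the filter order of 512 is an arbitrary choice, the response is simply held constant outside the table range, and noise stands in for the speech signal.

```matlab
% Sketch: pre-filter a signal with a headphone response, then compute
% time-varying ISO 532-1 loudness.
Fs = 48e3;

% HYPOTHETICAL free-field-equivalent deviation table, [Hz, dB].
hpTable = [20 -6; 100 -2; 500 -1; 1000 0; 2000 2; 4000 3; 8000 -1; 16000 -8];

% fir2 needs frequencies normalized to [0, 1], starting at 0 and ending at 1.
fHz   = [0; hpTable(:,1); Fs/2];
devdB = [hpTable(1,2); hpTable(:,2); hpTable(end,2)];   % hold the end values
b = fir2(512, fHz/(Fs/2), 10.^(devdB/20));

% Placeholder input; replace with the recorded speech signal.
x = 0.1*randn(2*Fs, 1);
y = filter(b, 1, x);

% Loudness over time (sones); needs Audio Toolbox.
loudnessOverTime = acousticLoudness(y, Fs, 'TimeVarying', true);
```

The filter adds a constant delay of 256 samples, which does not affect the loudness statistics of the whole signal.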
Thank you for all the help you provided! I will do some experimentation.
I reached out to Sennheiser about how to use their published data to make a free field equivalent earphone output level / mirrored free-field equalization filter. I will publish any answer I get here in case it can be of value to others.
Sounds great, I'm interested in getting a follow-up on your application!

Release

R2020b
