Compare 2 audio files - with a twist

19 views (last 30 days)
MadMax2015 on 27 Jul 2015
Edited: mainak GH on 23 Mar 2021
I need to compare 2 audio recordings made on 2 different devices at the same sample rate. The recordings will be made at approximately the same time.
  • Device-1 will record raw audio data
  • Device-2 will record raw audio data
A server will be used to compare the 2 audio recordings to determine if the recordings contain approximately the same audio data (e.g. using cross-correlation). THE PROBLEM: The server cannot have access to the raw audio from either device as it could be sensitive.
Is this possible? Is there a way for device-1 and device-2 to do something to the data before sending it to the server for analysis? We must ensure it is not possible to extract the audio data on the server, as it could be a recording of speech.
  3 Comments
Walter Roberson on 23 Feb 2017
ahmad shaker, you have a very different question if you are allowed to transmit the audio itself to the PC for processing. This Question dealt with a situation in which the server had to be able to tell if two "conversations" were the same without the server being able to reconstruct the conversations.
For example when you use Spotify or Google to identify what song is playing in the background, then information that is identifiably the music is transmitted, and the point is to match up with a database of known songs and tell you what it was. But in this Question, the device needed to transmit two somethings that were not permitted to be reconstructed into sound, and the server had to say whether they were "the same conversation" without being able to say which conversation they were.
mainak GH on 23 Mar 2021
Edited: mainak GH on 23 Mar 2021
If you want to compare live audio, the audio first has to be saved as a file on the hardware.
But if you are interested in processing the data after receiving it, you can use the following logic. I do not know whether it will help you or not.
desiredFs=48000;        % target sample rate for the output file
audio_frame_CH1_FF=[];  % accumulated samples, channel 1
audio_frame_CH2_FF=[];  % accumulated samples, channel 2
[x,OriginalFs] = audioread("file.mp3"); % x: audio samples, OriginalFs: sampling frequency in Hz
%
N = length(x); % number of samples
duration_audio=N/OriginalFs;
tx = linspace(0, N/OriginalFs, N);
plot(tx, x);
hold on
afr = dsp.AudioFileReader('file.mp3','ReadRange',[1 N]); % read the full recording (duration_audio seconds)
framecnt=0;
while ~isDone(afr)
audio_frame = afr();
audio_frame_CH1=audio_frame(:,1);
audio_frame_CH2=audio_frame(:,2);
audio_frame_CH1_FF=[audio_frame_CH1_FF ; audio_frame_CH1]; % append this frame, channel 1
audio_frame_CH2_FF=[audio_frame_CH2_FF ; audio_frame_CH2]; % append this frame, channel 2
framecnt=framecnt+1;
end
release(afr);
%save("audio_frame_full.mat","audio_frame_CH1_FF");
%whos('-file','audio_frame_full.mat')
fixedAudioFrame1=Audiotrim(x,audio_frame_CH1_FF); % Audiotrim is a user-supplied helper (sketched below)
fixedAudioFrame2=Audiotrim(x,audio_frame_CH2_FF);
fixedAudioFrame=[fixedAudioFrame1 fixedAudioFrame2]; % trimmed two-channel signal
plot(tx,fixedAudioFrame);
hold off
%from ---https://in.mathworks.com/matlabcentral/answers/120123-comparing-two-audio-signals-guitar-chords-recognizer%
Y=fft(abs(x));
Z=fft(abs(fixedAudioFrame));
subplot(3,1,1), plot (Y);
subplot(3,1,2), plot (Z);
[C1, lag1] = xcorr(Y(:,1),Z(:,1)); % correlate the first channel of each spectrum
subplot(3,1,3), plot(lag1/OriginalFs,C1);
ylabel('Amplitude'); grid on
title('Cross-correlation between Transmitted Audio and received AudioFrame')
%------------------------------------------------------------------------------------------
fixedAudioFrame = resample(fixedAudioFrame,desiredFs,OriginalFs); % convert to the desired output rate
audiowrite('processed.wav',fixedAudioFrame,desiredFs);
Please write the Audiotrim function as per your choice of clipping the unnecessary data generated after receiving; a possible version is sketched below. For comparing the files you can refer to https://in.mathworks.com/matlabcentral/answers/141137-how-can-i-compare-two-audio-files.
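One minimal sketch of what that Audiotrim helper could look like, assuming its only job is to clip (or zero-pad) the received channel to the length of the reference signal; the name Audiotrim comes from the code above, everything else here is an assumption.
function y = Audiotrim(ref,y)
% Clip or zero-pad the received channel so it has the same number of
% samples as the reference and can be plotted and correlated sample-for-sample.
n = size(ref,1);            % number of samples in the reference signal
y = y(:);
if numel(y) >= n
    y = y(1:n);             % clip extra samples appended during framing
else
    y(end+1:n,1) = 0;       % zero-pad if the received data is shorter
end
end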


Answers (2)

Dinesh Iyer on 27 Jul 2015
You are attempting to check whether two audio recordings are the same without allowing access to the audio data for any kind of comparison purposes.
Instead of sending audio data, you can do something as simple as sending the Fourier transform coefficients of the audio to the server and working in frequency space. However, if the "server" figures out that it is receiving FFTs, a simple IFFT will destroy the privacy.
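A short sketch of why raw FFT coefficients offer no protection (the file name speech.wav is hypothetical): the server can invert them exactly.
[x,fs] = audioread('speech.wav');   % hypothetical sensitive recording
X = fft(x);                         % what the device would transmit
xRec = real(ifft(X));               % what the server could reconstruct
max(abs(x(:) - xRec(:)))            % essentially zero: the audio is fully recovered
% sound(xRec,fs)                    % would play back the original speech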
There are some user submissions on the MATLAB File Exchange that allow you to encrypt data using various algorithms. You can use those to encrypt the signals, but you will have to decrypt them before using them.
  1 Comment
MadMax2015 on 28 Jul 2015
Thanks Dinesh. You're right - an IFFT will get back the original audio, which is not acceptable. It's not going to work for me.



Walter Roberson on 27 Jul 2015
I make no promises at all that the following would be strong enough for classified materials.
First, decide on the maximum lag that you need to be able to correlate over and convert it to a number of samples, Lag.
Next, decide on a maximum length to handle, Len.
Then quantize the signals and pad them to Len samples long.
For each possible lag L from 0 to Lag samples, hash (in the sense of trap-door, non-reversible encryption) the quantized signal (L+1:Len). Transmit all Lag+1 versions to the server.
On the server, correlate all Lag+1 versions from one device to all Lag+1 versions from the other device. If you find a match then the contents are approximately the same between the two devices.
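A minimal sketch of this scheme, assuming coarse quantization with a tuning parameter qStep and SHA-256 (via MATLAB's Java MessageDigest interface) as the non-reversible hash; the function name laggedHashes is made up for illustration.
function digests = laggedHashes(x,Lag,Len,qStep)
% Quantize, pad to a fixed length, and hash every lagged version of the
% signal before anything leaves the device.
q = int16(round(x(:)/qStep));   % coarse quantization
q(end+1:Len,1) = 0;             % pad to Len samples
q = q(1:Len);
digests = strings(Lag+1,1);
for L = 0:Lag
    md = java.security.MessageDigest.getInstance('SHA-256');
    h  = md.digest(typecast(q(L+1:Len),'int8'));   % hash the raw bytes of this lagged version
    digests(L+1) = string(reshape(dec2hex(typecast(int8(h),'uint8'),2)',1,[]));
end
end
On the server side, any hash shared between the two devices' lag sets would indicate a match:
% any common hash means the quantized signals line up at some relative lag
match = ~isempty(intersect(digestsDevice1,digestsDevice2));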
If you have done a good job on the hashing, then a difference of 1 unit in the quantization at one spot will result in very different outputs of the entire hashed signal. But you'll probably end up doing block hashing, in which case the hashed signal will only differ from that sub-block onwards. Having the leading blocks hash the same way leaves open the possibility of differential analysis which is to be avoided for classified data, so be careful about how you perform the hashing.
Having a difference of 1 unit in a single location result in a non-match is a problem from the perspective of audio correlation, but it is a security feature. When you are dealing with classified data on an unclassified server, the only thing that should be deducible is "are they the same or are they not?". Remember you can use a low-pass filter to get rid of noise.
You could consider techniques such as converting the signal into feature vectors before hashing. For example, taking blocks of data, scaling them to a constant maximum and minimum, and calculating slopes over several samples and quantizing those; one possible block-and-slope reduction is sketched below. Or looking at the techniques of perceptual encoding, or what is done for low-bitrate audio encoding. You could do a wavelet encoding and truncate coefficients.
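A rough, purely illustrative sketch of that block-and-slope idea; the helper name coarseFeatures and the parameters blockLen and winLen are assumptions, and the output is what would then be hashed.
function feat = coarseFeatures(x,blockLen,winLen)
% Split the signal into blocks, rescale each block to [0,1], average over
% short windows, and keep only the sign of the slope between windows, so
% small level differences between the two recordings give the same code.
x = x(:);
nBlocks = floor(numel(x)/blockLen);
nWin    = floor(blockLen/winLen);
feat = zeros(nBlocks,nWin-1,'int8');
for b = 1:nBlocks
    blk = x((b-1)*blockLen + (1:blockLen));
    blk = (blk - min(blk)) / max(max(blk) - min(blk), eps); % scale to [0,1]
    m   = mean(reshape(blk(1:winLen*nWin),winLen,[]),1);    % window means
    feat(b,:) = int8(sign(diff(m)));                        % +1 rising, -1 falling
end
end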
Though a lot is going to depend upon what is meant by "approximately the same". If you want to be able to tell that the trombone version of We Will Rock You is "approximately the same" as an a cappella version considering the whole clips, then you have rather more work to do.
  2 Comments
MadMax2015 on 28 Jul 2015
Thanks for taking the time to help out Walter. I really appreciate it.
I don't quite follow the solution that involves a non-reversible hash. I have certainly thought about this approach, but because the audio is recorded on 2 different devices, the result of a hash of "similar" audio recordings on 2 different devices will be wildly different. I don't see how it's possible to compare hashes.
"If you find a match then the contents are approximately the same between the two devices." That's the bit I'm struggling with. If a hash matches then the underlying data is exactly the same ... not approximately the same. And there is no way the underlying audio data will be exactly the same.
I should clarify what I mean by "approximately the same". I need to be able to tell that a recording of a conversation from 2 devices that are listening to the same conversation match. If the 2 devices are listening to 2 different conversations, then there should not be a match.
Walter Roberson on 28 Jul 2015
If one of the mics is next to the ice machine, and another is next to the TV that is playing, and you need to be able to tell whether they are listening to the same conversation, then you need to be able to filter out the background noise, which might not be simple repetitive noise (e.g., the TV might be playing a panel discussion.) With the information given so far we need to assume that there might be background speech (TV, radio, passers-by as a car drives) that needs to be automatically removed to concentrate on a "conversation of interest". I think you can see that is going to be more than a little difficult if the server is to be given only information that cannot be used to reconstruct the sound.
A non-reversible hash can be used to decide whether two things are exactly the same without there being any practical method of figuring out what the data is. The data that you would hash would be a filtered version of the original data.
For example, instead of transferring a piece of music, the pre-analysis might decompose it down to notes, perhaps transposed to a common key, discarding the information about how the instruments sound (or even what instruments they are), perhaps discarding timing information, to produce some "essential characteristic" of the music. Once that is obtained, the task becomes to compare "essential characteristics" without regard to their "meaning". And you can do that by comparing an arbitrary number produced from the characteristic data, such as by hashing. If two items do not compare the same but you wanted them to, then you did not correctly determine their "essential characteristics".
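A toy illustration of that idea, with made-up note numbers: reduce each melody to its interval pattern (which is key-invariant) and hash that, so two performances in different keys still compare equal without the audio ever being transmitted.
notes1 = [60 62 64 65 67];                    % hypothetical note sequence, device 1
notes2 = notes1 + 5;                          % same melody transposed up a fourth
iv1 = diff(notes1);                           % intervals = the "essential characteristic"
iv2 = diff(notes2);
md = java.security.MessageDigest.getInstance('SHA-256');
h1 = md.digest(typecast(int16(iv1),'int8'));
h2 = md.digest(typecast(int16(iv2),'int8'));  % digest() resets the object between calls
isequal(h1,h2)                                % true: the hashes match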

