Create new .wav files around findpeaks outputs

Hi everyone,
I would like to create short .wav files containing the audio data of each detected peak (see the plot below as an example). Each .wav file should include the peak plus a buffer of roughly 0.02 s on either side (e.g. each black box around a peak would be a separate .wav file). I have included my code so far and attached a zip file with a short sample of the data to run the code below.
% Read in audio file
[y,Fs] = audioread('test_MATLABAsk.wav');
info = audioinfo('test_MATLABAsk.wav');
sound(y,Fs) % Play the sound
% plot the data
% Create a time component using info
t = 0:seconds(1/Fs):seconds(info.Duration);
t = t(1:end-1);
figure(1)
plot(t,y)
xlabel('Time (s)')
ylabel('Audio Signal')
% Detect peaks from y
[pk_Fs, locs_Fs] = findpeaks(y,Fs, 'MinPeakDistance',0.03, 'MinPeakHeight',0.01); % See plot example below

 Accepted Answer

Here's how I'd do it:
infile = 'C:\somewhere\somefolder\test_100m.wav'; %I'd recommend you use full path instead of relying on the current directory
outfolder = 'C:\somewhere\someotherfolder';
outformat = 'split%03d.wav'; %using sprintf format to insert peak number
halfwidth = seconds(0.02); %half width of signal to keep around peak
%read file, convert to timetable, find peak locations
[samples, Fs] = audioread(infile);
audiotable = timetable(samples, 'SampleRate', Fs);
[~, peaklocs] = findpeaks(samples, Fs, 'MinPeakDistance', 0.03, 'MinPeakHeight', 0.01);
%iterate over peaks, extract signal and save to file
for peakidx = 1:numel(peaklocs)
peaktime = audiotable.Time(peaklocs(peakidx));
tokeep = isbetween(audiotable.Time, peaktime - halfwidth, peaktime + halfwidth);
audiowrite(fullfile(outfolder, sprintf(outformat, peakidx)), audiotable.samples(tokeep), Fs);
end
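As a small aside on the output naming used above, here is a minimal sketch of how the sprintf format expands into file names. The folder name 'OutWav' and the variables name7, name123 and fullpath are hypothetical, chosen only for illustration.

```matlab
% Sketch: how the 'split%03d.wav' format turns a peak number into a file name.
outfolder = 'OutWav';            % hypothetical relative folder, for illustration only
outformat = 'split%03d.wav';     % %03d pads the peak number with zeros to 3 digits

name7   = sprintf(outformat, 7);     % 'split007.wav'
name123 = sprintf(outformat, 123);   % 'split123.wav'

% fullfile joins folder and file name with the platform's file separator
fullpath = fullfile(outfolder, name7);
disp(fullpath)
```

The zero padding keeps the files in peak order when a folder listing is sorted alphabetically, at least up to 999 peaks.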

19 Comments

Hi thanks for this,
I tried the above and it all worked fine until:
%iterate over peaks, extract signal and save to file
for peakidx = 1:numel(peaklocs)
peaktime = audiotable.Time(peaklocs(peakidx));
tokeep = isbetween(audiotable.Time, peaktime - halfwidth, peaktime + halfwidth);
audiowrite(fullfile(outfolder, sprintf(outformat, peakidx)), audiotable.samples(tokeep), Fs);
end
where I'm getting this error:
Error using tabular/dotParenReference (line 108)
Array indices must be positive integers or logical values.
My peaklocs values aren't 1,2,3 etc. but are 0.723... 0.87... etc etc. Could this be the problem? I attached an example sound file to the question if that helps!
Then you didn't implement his code properly, because the second output of findpeaks gives a list of indices (i.e. integer numbers).
[peakvals, peaklocs] = findpeaks(samples, Fs, 'MinPeakDistance', 0.03, 'MinPeakHeight', 0.01);
Use peaklocs as written above, not peakvals, which Guillaume intentionally discarded by using ~.
Hi Daniel,
That still doesn't resolve the issue: when I run your amendment, my index values are not integers, so when you run the following loop:
for peakidx = 1:numel(peaklocs)
peaktime = audiotable.Time(peaklocs(peakidx));
tokeep = isbetween(audiotable.Time, (peaktime - halfwidth), (peaktime + halfwidth));
audiowrite(fullfile(outfolder, sprintf(outformat, peakidx)), audiotable.samples(tokeep), Fs);
end
It gives the same error...
Ah, yes, when you pass a sampling rate Fs to findpeaks, the 2nd output "locs is a vector of time instants".
Unfortunately, the documentation doesn't specify if that's a duration vector or a plain numeric vector and I don't have the required toolbox to test.
Assuming it's a duration vector, then replace the line:
peaktime = audiotable.Time(peaklocs(peakidx));
by
peaktime = peaklocs(peakidx);
If it's a plain vector, then
peaktime = seconds(peaklocs(peakidx));
should work.
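If it is unclear which of the two cases applies, one option is to check the type at run time and convert only when needed. This is just a sketch: the values in peaklocs are a hypothetical stand-in for the second output of findpeaks(samples, Fs, ...), so the snippet runs without the Signal Processing Toolbox.

```matlab
% Sketch: normalise peak locations to a duration vector, whatever type they arrive as.
% peaklocs is a hypothetical stand-in for the 2nd output of findpeaks called with Fs.
peaklocs = [0.7234; 0.8712];      % assumed here to be plain numeric seconds

if isduration(peaklocs)
    peaktimes = peaklocs;             % already a duration, use as is
else
    peaktimes = seconds(peaklocs);    % plain numbers: interpret them as seconds
end

disp(class(peaktimes))
```

After this, peaktimes can be compared with or subtracted from audiotable.Time without any unit ambiguity.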
Thank you Guillaume! Okay so that part of the loop now works with:
peaktime = peaklocs(peakidx);
However, another issue then crops up as:
tokeep = isbetween(audiotable.Time, (peaktime - halfwidth), (peaktime + halfwidth));
%ERROR:
Undefined function 'isbetween' for input arguments of type 'duration'.
It seems that the 'isbetween' function cannot work with type 'duration' like the audiotable.Time. Any idea what alternative functions would do the same but allow you to use duration values?
Hmm, which version of MATLAB are you using? In R2019b, isbetween is perfectly happy to work with duration arrays.
In any case, you can also do:
tokeep = abs(audiotable.Time - peaktime) <= halfwidth;
which is actually shorter anyway.
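To see what that mask does, here is a minimal sketch on synthetic data. The sampling rate, the ramp signal and the peak time of 0.5 s are all made up for illustration; only the tokeep line comes from the comment above.

```matlab
% Sketch: select the samples within +/- halfwidth of one peak time.
Fs = 1000;                                   % assumed sampling rate (Hz)
samples = (1:1000).';                        % synthetic 1 s ramp instead of real audio
audiotable = timetable(samples, 'SampleRate', Fs);

peaktime  = seconds(0.5);                    % hypothetical peak time, as a duration
halfwidth = seconds(0.02);                   % half width of the window to keep

% Logical mask: true for rows whose time lies within halfwidth of the peak
tokeep  = abs(audiotable.Time - peaktime) <= halfwidth;
snippet = audiotable.samples(tokeep);

fprintf('kept %d of %d samples\n', nnz(tokeep), height(audiotable));
```

With these numbers the window covers roughly 0.48 s to 0.52 s, so about 2*0.02*Fs samples are kept; the exact count at the window edges can vary by one because of floating-point rounding.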
So, to sum up, this code should work:
infile = 'C:\somewhere\somefolder\test_100m.wav'; %I'd recommend you use full path instead of relying on the current directory
outfolder = 'C:\somewhere\someotherfolder';
outformat = 'split%03d.wav'; %using sprintf format to insert peak number
halfwidth = seconds(0.02); %half width of signal to keep around peak
%read file, convert to timetable, find peak locations (in unit of time)
[samples, Fs] = audioread(infile);
audiotable = timetable(samples, 'SampleRate', Fs);
[~, peaktime] = findpeaks(samples, Fs, 'MinPeakDistance', 0.03, 'MinPeakHeight', 0.01);
%iterate over peaks, extract signal and save to file
for peakidx = 1:numel(peaktime)
tokeep = abs(audiotable.Time - peaktime(peakidx)) <= halfwidth;
audiowrite(fullfile(outfolder, sprintf(outformat, peakidx)), audiotable.samples(tokeep), Fs);
end
Thank you so much for your help with this - it's nearly there but one slight issue left.
When I use this line of the loop:
tokeep = abs(audiotable.Time - peaktime(peakidx)) <= halfwidth;
I get 0 values in the 'to keep' which then gives me this error:
Error using audiowrite
The value of 'y' is invalid. Expected input to be nonempty.
I'm trying to get MATLAB 2019 installed on my computer so I can try the 'isbetween' function again.
But any ideas why that part keeps throwing up 0s? I've tried changing - to == to test the rest of the loop and it outputs the whole sound file as a new wav file, not the short snippet, so the rest is working; it's just the notation in that part, I think...
At the command line issue:
dbstop if error
and run the code again. It will break into the debugger when the error happens (the command prompt will change to K>>). At this point, what are the values of peaktime(peakidx) and audiotable.Time?
Note: to cancel the dbstop if error and go back to normal operation:
dbclear if error
So the code will run correctly to the end if you just have:
for peakidx = 1:numel(peaktime)
tokeep = abs(audiotable.Time - peaktime(peakidx)) <= halfwidth;
%audiowrite(fullfile(outfolder, sprintf(outformat, peakidx)), audiotable.samples(tokeep), Fs); % Didn't work as tokeep has only 0 values - need 1s
end
but output of 'tokeep' is all 0.
If I run the full loop with the dbstop if error it just will stop at 1 because the 'tokeep' output is empty so the 'y' part of the audiowrite function has nothing to put into a wav file.
In order to understand why tokeep is all 0s, I need to see the values of
audiotable.Time
peaktime(peakidx)
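If the debugger is awkward to use, the same information can be printed directly. The sketch below uses made-up data (a silent 2 s signal and two numeric peak times), so the variable contents are assumptions; the point is only which quantities to look at: the class of peaktime and the range of the time axis.

```matlab
% Sketch: print the quantities needed to see why tokeep might be all false.
Fs = 1000;                                   % assumed sampling rate (Hz)
samples = zeros(2000, 1);                    % synthetic 2 s of silence
audiotable = timetable(samples, 'SampleRate', Fs);
peaktime = [0.25; 0.75];                     % hypothetical numeric peak times

peakidx = 1;
fprintf('class of peaktime: %s\n', class(peaktime));
fprintf('peaktime(peakidx): %g\n', peaktime(peakidx));
fprintf('time axis runs from %g s to %g s\n', ...
    seconds(audiotable.Time(1)), seconds(audiotable.Time(end)));
```

If the class is double rather than duration, or the peak time lies outside the printed time range, the comparison against audiotable.Time cannot select any samples.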
I can't get the values of audiotable.Time and peaktime(peakidx) when it breaks using the dbstop if error. I only get this output in the workspace:
dbstop.PNG
audiotable looks and stays like this, and it seems like peaktime(peakidx) is only getting to row 1, i.e. its value of 0.723402777777778
Audiotable.PNG
peaktime having any value greater than 0.43 (audiotable.Time(end)) is a problem and makes no sense to me, since 0.43 is the duration of your wav file. Clearly, I don't understand what findpeaks returns if it can return time locations greater than the signal duration. Unfortunately, as I said, I don't have the Signal Processing Toolbox so it's difficult for me to understand what it returns. Can you attach peaktime as a mat file?
As a workaround, I suggest that you use findpeaks without specifying a sampling rate (so don't pass Fs). Then the locations will be indices as my original answer expected. The downside is that you've got to convert the 'MinPeakDistance' to number of samples instead of time. That's easily done though, just multiply the original distance by the sampling rate. So:
[~, peaklocs] = findpeaks(samples, 'MinPeakDistance', 0.03*Fs, 'MinPeakHeight', 0.01);
for peakidx = 1:numel(peaklocs)
peaktime = audiotable.Time(peaklocs(peakidx));
tokeep = abs(audiotable.Time - peaktime) <= halfwidth;
audiowrite(fullfile(outfolder, sprintf(outformat, peakidx)), audiotable.samples(tokeep), Fs);
end
It's not as clean since findpeaks no longer works in units of time, but that's an easy way to fix the problem until I understand what findpeaks returns when given a sampling rate.
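Once everything is in sample indices, the timetable is not strictly needed. As an alternative to the loop above (not the method used in this thread), here is a sketch that converts both distances to samples and cuts each window by index, clamping at the ends of the signal. The peak indices are hypothetical and the audio is a block of zeros, so it runs without findpeaks.

```matlab
% Sketch: index-based windows around peaks, clamped to the signal bounds.
Fs = 44100;                                  % assumed sampling rate (Hz)
samples = zeros(Fs, 1);                      % placeholder for 1 s of audio
peaklocs = [10; 22050; 44095];               % hypothetical peak indices

minDistSamples   = round(0.03 * Fs);         % 'MinPeakDistance' expressed in samples
halfwidthSamples = round(0.02 * Fs);         % half window expressed in samples

nSamples = numel(samples);
lens = zeros(numel(peaklocs), 1);            % length of each extracted snippet
for k = 1:numel(peaklocs)
    first = max(1, peaklocs(k) - halfwidthSamples);         % do not run off the start
    last  = min(nSamples, peaklocs(k) + halfwidthSamples);  % do not run off the end
    snippet = samples(first:last);
    lens(k) = numel(snippet);
    % audiowrite(...) with snippet and Fs would go here
end
disp(lens.')
```

Peaks near either end of the file simply get a shorter snippet instead of causing an indexing error.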
That last amendment to the code seemed to work (see figure below!).
I've attached the peaktime as a .mat so you can have a look at what findpeaks returns when given a sampling rate. I'd actually be interested to know myself, if you wouldn't mind posting the answer on this thread?
peak.png
% Working code is:
infile = 'C:\somewhere\somefolder\test_100m.wav'; %I'd recommend you use full path instead of relying on the current directory
outfolder = 'C:\somewhere\somefolder\OutWav';
outformat = 'split%03d.wav'; %using sprintf format to insert peak number
halfwidth = seconds(0.02); % half width of signal to keep around peak
%read file, convert to timetable, find peak locations
[samples, Fs] = audioread(infile);
audiotable = timetable(samples, 'SampleRate', Fs);
[~, peaklocs] = findpeaks(samples, 'MinPeakDistance', 0.03*Fs, 'MinPeakHeight', 0.01);
for peakidx = 1:numel(peaklocs)
peaktime = audiotable.Time(peaklocs(peakidx));
tokeep = abs(audiotable.Time - peaktime) <= halfwidth;
audiowrite(fullfile(outfolder, sprintf(outformat, peakidx)), audiotable.samples(tokeep), Fs);
end
Well, these peaktime values don't make much sense at the moment. Could you also attach peaklocs? That would help me understand what the peaktime values are.
Ok, so peaktime makes sense, you're using a wav file much longer than the one you originally posted. The duration of that file is at least 110 seconds.
peaktime is simply (peaklocs-1) ./ Fs
and peaktime is a plain vector. I did write in a comment that if it's a plain vector you needed to convert that to a duration vector with seconds. Otherwise, it's interpreted as days when you subtract audiotable.Time. So, correct code using Fs for findpeaks:
infile = 'C:\somewhere\somefolder\test_100m.wav'; %I'd recommend you use full path instead of relying on the current directory
outfolder = 'C:\somewhere\somefolder\OutWav';
outformat = 'split%03d.wav'; %using sprintf format to insert peak number
halfwidth = seconds(0.02); % half width of signal to keep around peak
%read file, convert to timetable, find peak locations
[samples, Fs] = audioread(infile);
audiotable = timetable(samples, 'SampleRate', Fs);
[~, peaktimes] = findpeaks(samples, Fs, 'MinPeakDistance', 0.03, 'MinPeakHeight', 0.01);
peaktimes = seconds(peaktimes);
%iterate over peaks, extract signal and save to file
for peakidx = 1:numel(peaktimes)
tokeep = abs(audiotable.Time - peaktimes(peakidx)) <= halfwidth;
audiowrite(fullfile(outfolder, sprintf(outformat, peakidx)), audiotable.samples(tokeep), Fs);
end
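For anyone who wants to see the two points above in isolation, here is a small sketch with made-up numbers: the relation between indices and times, and what goes wrong when a plain number is subtracted from a duration. The indices and Fs are assumptions for illustration; no toolbox is needed.

```matlab
% Sketch: index-to-time relation and the numeric-as-days pitfall.
Fs = 1000;                                   % assumed sampling rate (Hz)
peaklocs = [101; 501];                       % hypothetical peak indices

% Times as findpeaks reports them when given Fs: first sample is at t = 0
peaktimesNumeric = (peaklocs - 1) ./ Fs;     % plain doubles, in seconds
peaktimes = seconds(peaktimesNumeric);       % explicit duration vector

t = seconds(0.5);                            % a row time taken from the time axis

wrong = t - peaktimesNumeric(2);   % the double is treated as days, not seconds
right = t - peaktimes(2);          % duration minus duration: zero, as intended

fprintf('wrong: %g s, right: %g s\n', seconds(wrong), seconds(right));
```

The huge difference in the first case is why the mask never found any samples within 0.02 s of a peak.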
For what it's worth, I've raised a service request with MathWorks to get them to improve the documentation of findpeaks.
This is great! Thank you so much for explaining and helping me to resolve this issue.
Apologies, I must have misunderstood that I needed to convert to a duration with seconds.
I think updating the findpeaks documentation is a good idea as well, just to ensure it's clearer for others using the function!
