unsupervised thresholding of signal amplitudes: MLE?

I am hoping someone with knowledge of unsupervised signal processing can advise how best to handle the following problem. I have a long time series signal whose noisy baseline fluctuates over time:.
From the image above, you can see that there are low and high amplitude components. The small amplitudes are a mixture or noise plus some very low-amplitude real background signal. The small amplitude events compose the majority of events. However, I am interested in extracing the time occurrences and ampltiudes of the larger amplitude events only.
One problem is the unstable baseline on top of which the large amiplitudes ride (top subplot, black trace). Gven the fluctuating baseline, what would be best to determine the height of large amplitudes is not obvious. Relatedly, what method would be appropriate for this problem to unbiasedly determine a threshold to detect the lage ampltudes?
An approach I have tried is to first obtain the minimums of the troughs and use these values as query points (red circles in blown up image below) for fitting an interpolated line as a moving base (red dashed line). Substracting the moving base from the raw signal gives a flatten baseline (bottom subplot, blue trace):
From the flatten signal, I use a findpeaks and threshold for peak detection for anything above 3 medians. This has worked well as the majority of events are small ampltudes and three medians is selecting out the large peaks well (red and black ticks). With this methods, I can extract the time points of the large peak events and their amplitudes (as the difference of the moving basline from the raw signal at the peak points). However this is not unbiased of course and for other recordings this approach can be less effective.
To have a completely unbiased analysis, I am entertaining using an MLE approach by first using findpeaks (or peakfinder) to get the amplitudes of all the peak amplitudes (small and large) and then using this to get an unbiased threshold for selecting only the large amplitudes. Here's a histogram of the peak amplitudes:
While you can (arguably) discern two overlapping histograms (one centered around 0.5 and the other at around 2), I was hoping what was intuitively a bimodal distribution would have been more apparent. Nonetheless is there a way to use MLE to get an amplitude value at the intersection of these two overlaps corresponding to the more frequent small-amplitude and less frequent larger-amplitude events? Or is there perhaps a better way to approach this problem.
A related question to be asked is if the interpolated moving baseline subtraction is the best way to handle the noisy nature of the signal. But since there are portions that go below starting "zeroed" reference level, it is not at all clear how else to handle this fluctuating baseline.
Much apprecation and thanks for all who have read through this and can give advise/help on running this analysis in an unbiased and well-justified manner.
Cheers.
P. S. If it helps, the ampltude data in the histogram above has been included as well.

8 Comments

Do you want every single peak, no matter it's size, to have the valleys on either side of the peak pulled down to zero? Or would you allow noisy peaks, like big peaks with smaller peaks on the shoulders of the big peak?
So do you want a time domain based solution where you basically just pull down the signal peak by peak? Or do you want a frequency domain based solution where you filter out slowly varying low frequency signals? Or do you want a non-linear time-based solution like using a Savitzky-Golay filter to smooth it, then subtract the smoothed signal from the original?
Hi Image Analyst. It'd be a time domain-based analysis and one that would allow for noisy peaks shouldering the larger big peaks. Filtering out low frequiency events might get rid of some rekevent data so rather not do that at this point but pull pull down the larger peaks. I have a series of such recordings so rather than use a hard–coded value, ideally want to reiteratively go through a series of the recording files and processing in an unbiased manner.
Thanks for the help!
Do you think my idea for using MLE to get the amplitude value for where the intersection of the overlapping histograms occur is valid? Ultimately I want to get an unbiased value returned from the data that can separate out the amplitude populations of large and small that is noticeable by eye. There must be a well-established means to do this. Again thank you for any help with this.
hello
just for my fun I tried these two approaches (quick and dirty) .
I am not sure if it really brings something to do the baseline correction, both histograms before and after are quite similar (or I am wrong ?)
functions used are attached if that brings anything to the subject
code 1
%% Load data
load amplitude_data.mat;
N = length(tpb);
%% Run the algorithm
[Base, yc]=baseline(tpb); % (see function attached)
subplot(2,1,1),plot(tpb)
hold on
plot(Base,'--')
subplot(2,1,2),plot(yc)
figure,histogram(tpb)
figure,histogram(yc)
code 2
%% Load data
load amplitude_data.mat;
N = length(tpb);
t = (0:N-1)';
%% Run the algorithm
Base = env_secant(t, tpb, 50, 'bottom'); % (see function attached)
yc = tpb - Base;
subplot(2,1,1),plot(t,tpb,t,Base)
subplot(2,1,2),plot(yc)
figure,histogram(tpb)
figure,histogram(yc)
Hi Mathieu. Thank you for the time with this. I am wrapping up my day to explore this more but want to thank you for your post! I will respond back again.
I was unable to run baseline.m as there appears an unidentified variable needed: "peak_stripping(~,~)." It's not clear what this was suppose to do to try to define on my end.
sorry , I forgot to provide it
here in attachment

Sign in to comment.

 Accepted Answer

I encourage you to investigate the ‘prominence’ values of the peaks. This is not a straightforward concept to define (see the findpeaks documentation for a discussion) however it generally takes into account the ‘context’ of each peak to define how it relates to the signal values around it. This does not require any significant smoothing or data pre-processing, and it may be more useful than other peak metrics. (I generally define peaks by setting a 'MinPeakProminence' value in findpeaks calls.) I did a brief analysis of your signal with respect to the peak values and prominence values.
A ‘prominence threshold’ might do what you want —
LD = load('amplitude_data.mat');
tpb = LD.tpb;
L = numel(tpb);
x = linspace(0, L-1, L).';
figure
plot(x, tpb)
grid
xlabel('Index')
ylabel('Signal Amplitude')
[pks,locs,wdth,prom] = findpeaks(tpb);
[pkcounts,pkedges,pkbins] = histcounts(pks, 50);
figure
bar(pkedges(1:end-1)+mean(diff(pkedges))/2, pkcounts)
xlabel('Peak Amplitudes')
ylabel('Counts')
[prcounts,predges,prbins] = histcounts(prom, 50);
figure
bar(predges(1:end-1)+mean(diff(predges))/2, prcounts)
xlabel('Prominence Amplitudes')
ylabel('Counts')
pkpr = sortrows([pks,prom],1);
figure
plot(pkpr(:,1), pkpr(:,2))
grid
xlabel('Peak Amplitude')
ylabel('Prominence Value')
promThrshld = 3;
[pks2,locs2] = findpeaks(tpb, 'MinPeakProminence',promThrshld);
figure
plot(x, tpb, 'DisplayName','Data')
hold on
plot(x(locs2), pks2, 'sr', 'DisplayName','Peaks with Prominence Values >= 3')
hold off
grid
xlabel('Index')
ylabel('Signal Amplitude')
legend('Location','best')
.

7 Comments

Hi Star Strider. Much appreciation for the time you gave with this and will read more about promience in findpeaks. Will explore this and post back again but wanted to thank you. Cheers. :)
hxen
hxen on 1 Dec 2023
Edited: hxen on 1 Dec 2023
The 'prominence' feature I think this is very promising. The only issue is that the problem remains of haivng to manually set a threshold value. Ideally it would be nice if there were a method to examine the raw data and make a decision of what the threshold should be. I was thinking of maximum likelihood estimation (MLE) or do a logisitic regression. The issue with the latter of course is that I would have to have defined data before hand (i.e. "small" very "large" peaks) and this of course again introduces subjectivity....
I know in image analysis, for contouring, MLE approaches are used to find the gradient along the contour and in such cases for say a B&W image, the pixel data has clearly nicely observed overlapping histograms: one of the background population and the other the object to define a boundary around. I was in part inspired to use such an approach as intuitively the nature of this data is also bimodal with vastly different number of occurances: again the small ampltiudes being less frequent. But I am stumped how to perform this.
I could not reproduce the histogram you posted, since I don’t know how you coded it. I did not originally pick up on the bimodality, since the peak around 2 is easily lost in the noise.
Beyond the prominence value I have no further suggestions just now, since I’m not certain what objective you want to achieve. It may help to expand on that.
hxen
hxen on 3 Dec 2023
Edited: hxen on 3 Dec 2023
Hi. I could have sworn had replied to this but seems that my post did not go through...
Forgive if not having clarified this better from the onset of the thread:
I ultimately would like a method that based on the data distributuion itself can return a threshold value and that is not dependent on any hard-coded parameter.
The prominence feature I believe is the right direction but still need to set a threshold value by hand. Right now, by visual inspection the threshold is being set.
Relatedly, I agree the bimodality is not readily apparent in the histogram, so it would seem the idea inspired by image controuring is not well suited for the nature of this kidn of data structure. But I would find it surprising if there are not already estbalished ML based analyses that could return a threshold value for data typoes of a similar nature as here. It was this feedback more than anything else about such an approach that I was seeking and how best it would be implemented.
Best.
Setting a prominence depends on the result you want. The findpeaks function returns prominence values as the fourth output if you request it. You can then use the returned prominence values to set a specific level, for example by using the prctile function to return a minimum prominence value at a specific percentile. (The percentile is distribution-independent, however it is a function of the cumulative distribution of the promenence values.) You could then run findpeaks again with that value to return the peaks, locations, and other parameters meeting that criterion.
There might be other ways as well to choose a specific prominence level adaptively, however that depends on the criteria you set for it and the result you want.
I thought I saw a Comment from you similar to the last one you posted here as well, however when I went back to look for it, I was not able to find it.
"I thought I saw a Comment from you similar to the last one you posted here as well, however when I went back to look for it, I was not able to find it. "
That is what I recalled too. Weird. Perhaps I accidentally delted somehow as had the thread going on two comps. Anyway, thank you so much Star! I am accepting this as the prctile feature of findpeaks is good. Thank you so much for your very helpful feedback.

Sign in to comment.

More Answers (0)

Asked:

on 29 Nov 2023

Commented:

on 6 Dec 2023

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!