unsupervised thresholding of signal amplitudes: MLE?

Question

0 votes

amplitude_data.mat

I am hoping someone with knowledge of unsupervised signal processing can advise how best to handle the following problem. I have a long time series signal whose noisy baseline fluctuates over time:

.

From the image above, you can see that there are low and high amplitude components. The small amplitudes are a mixture or noise plus some very low-amplitude real background signal. The small amplitude events compose the majority of events. However, I am interested in extracing the time occurrences and ampltiudes of the larger amplitude events only.

One problem is the unstable baseline on top of which the large amiplitudes ride (top subplot, black trace). Gven the fluctuating baseline, what would be best to determine the height of large amplitudes is not obvious. Relatedly, what method would be appropriate for this problem to unbiasedly determine a threshold to detect the lage ampltudes?

An approach I have tried is to first obtain the minimums of the troughs and use these values as query points (red circles in blown up image below) for fitting an interpolated line as a moving base (red dashed line). Substracting the moving base from the raw signal gives a flatten baseline (bottom subplot, blue trace):

From the flatten signal, I use a findpeaks and threshold for peak detection for anything above 3 medians. This has worked well as the majority of events are small ampltudes and three medians is selecting out the large peaks well (red and black ticks). With this methods, I can extract the time points of the large peak events and their amplitudes (as the difference of the moving basline from the raw signal at the peak points). However this is not unbiased of course and for other recordings this approach can be less effective.

To have a completely unbiased analysis, I am entertaining using an MLE approach by first using findpeaks (or peakfinder) to get the amplitudes of all the peak amplitudes (small and large) and then using this to get an unbiased threshold for selecting only the large amplitudes. Here's a histogram of the peak amplitudes:

While you can (arguably) discern two overlapping histograms (one centered around 0.5 and the other at around 2), I was hoping what was intuitively a bimodal distribution would have been more apparent. Nonetheless is there a way to use MLE to get an amplitude value at the intersection of these two overlaps corresponding to the more frequent small-amplitude and less frequent larger-amplitude events? Or is there perhaps a better way to approach this problem.

A related question to be asked is if the interpolated moving baseline subtraction is the best way to handle the noisy nature of the signal. But since there are portions that go below starting "zeroed" reference level, it is not at all clear how else to handle this fluctuating baseline.

Much apprecation and thanks for all who have read through this and can give advise/help on running this analysis in an unbiased and well-justified manner.

Cheers.

P. S. If it helps, the ampltude data in the histogram above has been included as well.

8 Comments
Show 6 older comments Hide 6 older comments

Mathieu NOE on 1 Dec 2023

peak_stripping.m

sorry , I forgot to provide it

here in attachment

hxen on 1 Dec 2023

Thanks Mathieu!

Sign in to comment.

Sign in to answer this question.

Follow Question

Answer 1

Star Strider on 30 Nov 2023

Open in MATLAB Online

0 votes

amplitude_data.mat

I encourage you to investigate the ‘prominence’ values of the peaks. This is not a straightforward concept to define (see the findpeaks documentation for a discussion) however it generally takes into account the ‘context’ of each peak to define how it relates to the signal values around it. This does not require any significant smoothing or data pre-processing, and it may be more useful than other peak metrics. (I generally define peaks by setting a 'MinPeakProminence' value in findpeaks calls.) I did a brief analysis of your signal with respect to the peak values and prominence values.

A ‘prominence threshold’ might do what you want —

LD = load('amplitude_data.mat');

tpb = LD.tpb;

L = numel(tpb);

x = linspace(0, L-1, L).';

figure

plot(x, tpb)

grid

xlabel('Index')

ylabel('Signal Amplitude')

[pks,locs,wdth,prom] = findpeaks(tpb);

[pkcounts,pkedges,pkbins] = histcounts(pks, 50);

figure

bar(pkedges(1:end-1)+mean(diff(pkedges))/2, pkcounts)

xlabel('Peak Amplitudes')

ylabel('Counts')

[prcounts,predges,prbins] = histcounts(prom, 50);

figure

bar(predges(1:end-1)+mean(diff(predges))/2, prcounts)

xlabel('Prominence Amplitudes')

ylabel('Counts')

pkpr = sortrows([pks,prom],1);

figure

plot(pkpr(:,1), pkpr(:,2))

grid

xlabel('Peak Amplitude')

ylabel('Prominence Value')

promThrshld = 3;

[pks2,locs2] = findpeaks(tpb, 'MinPeakProminence',promThrshld);

figure

plot(x, tpb, 'DisplayName','Data')

hold on

plot(x(locs2), pks2, 'sr', 'DisplayName','Peaks with Prominence Values >= 3')

hold off

grid

xlabel('Index')

ylabel('Signal Amplitude')

legend('Location','best')

.

7 Comments
Show 5 older comments Hide 5 older comments

hxen on 1 Dec 2023

Edited: hxen on 1 Dec 2023

The 'prominence' feature I think this is very promising. The only issue is that the problem remains of haivng to manually set a threshold value. Ideally it would be nice if there were a method to examine the raw data and make a decision of what the threshold should be. I was thinking of maximum likelihood estimation (MLE) or do a logisitic regression. The issue with the latter of course is that I would have to have defined data before hand (i.e. "small" very "large" peaks) and this of course again introduces subjectivity....

I know in image analysis, for contouring, MLE approaches are used to find the gradient along the contour and in such cases for say a B&W image, the pixel data has clearly nicely observed overlapping histograms: one of the background population and the other the object to define a boundary around. I was in part inspired to use such an approach as intuitively the nature of this data is also bimodal with vastly different number of occurances: again the small ampltiudes being less frequent. But I am stumped how to perform this.

hxen on 6 Dec 2023

"I thought I saw a Comment from you similar to the last one you posted here as well, however when I went back to look for it, I was not able to find it. "

That is what I recalled too. Weird. Perhaps I accidentally delted somehow as had the thread going on two comps. Anyway, thank you so much Star! I am accepting this as the prctile feature of findpeaks is good. Thank you so much for your very helpful feedback.

Star Strider on 6 Dec 2023

As always, my pleasure!

Sign in to comment.

unsupervised thresholding of signal amplitudes: MLE?

8 Comments
Show 6 older comments Hide 6 older comments

Accepted Answer

7 Comments
Show 5 older comments Hide 5 older comments

More Answers (0)

Categories

Tags

Community Treasure Hunt

unsupervised thresholding of signal amplitudes: MLE?

8 Comments Show 6 older comments Hide 6 older comments

Accepted Answer

7 Comments Show 5 older comments Hide 5 older comments

More Answers (0)

Categories

Tags

See Also

Community Treasure Hunt

8 Comments
Show 6 older comments Hide 6 older comments

7 Comments
Show 5 older comments Hide 5 older comments