unsupervised thresholding of signal amplitudes: MLE?
I am hoping someone with knowledge of unsupervised signal processing can advise how best to handle the following problem. I have a long time series signal whose noisy baseline fluctuates over time:
From the image above, you can see that there are low- and high-amplitude components. The small amplitudes are a mixture of noise plus some very low-amplitude real background signal, and they make up the majority of events. However, I am interested in extracting the occurrence times and amplitudes of the larger-amplitude events only.
One problem is the unstable baseline on which the large amplitudes ride (top subplot, black trace). Given the fluctuating baseline, it is not obvious how best to determine the height of the large amplitudes. Relatedly, what method would be appropriate for determining, in an unbiased way, a threshold to detect the large amplitudes?
An approach I have tried is to first obtain the minima of the troughs and use these values as query points (red circles in the blown-up image below) for fitting an interpolated line as a moving baseline (red dashed line). Subtracting the moving baseline from the raw signal gives a flattened baseline (bottom subplot, blue trace):
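The trough-interpolation step described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the signal is a synthetic, noise-free stand-in (slow drift, a small background ripple, and three large events), the variable names are hypothetical, and `findpeaks` requires the Signal Processing Toolbox.

```matlab
% Synthetic stand-in for the recording (assumption: not the poster's data).
t = (0:0.001:10)';                          % time in seconds, 1 kHz sampling
drift = 0.5*sin(2*pi*0.2*t);                % slowly fluctuating baseline
x = drift + 0.1*sin(2*pi*5*t);              % plus small background ripple
evt = [1.5 4.2 7.8];                        % hypothetical large-event times (s)
for k = 1:numel(evt)
    x = x + 2*exp(-((t - evt(k))/0.02).^2); % narrow large-amplitude events
end

% Troughs are the local maxima of the negated signal.
[~, troughIdx] = findpeaks(-x);

% Anchor the baseline at both ends of the record so no extrapolation is needed.
idx = unique([1; troughIdx(:); numel(x)]);

% Interpolated moving baseline through the trough minima.
base = interp1(t(idx), x(idx), t, 'linear');

% Flattened signal: raw signal minus moving baseline.
xFlat = x - base;
```

On real, noisy data every noise wiggle produces a trough, including wiggles on the flanks or top of a large event, so some smoothing or a `'MinPeakDistance'` constraint on the trough search would probably be needed before interpolating.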

On the flattened signal, I use findpeaks with a threshold to detect any peak above 3 medians. This has worked well, as the majority of events are small amplitudes and three medians selects the large peaks well (red and black ticks). With this method, I can extract the time points of the large peak events and their amplitudes (as the difference between the raw signal and the moving baseline at the peak points). However, this is of course not unbiased, and for other recordings this approach can be less effective.
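A minimal sketch of the "3 medians" detection step, for illustration only. It assumes that "3 medians" means three times the median height of all detected peaks (the post does not say which median is meant), and it uses a synthetic flattened signal with hypothetical variable names.

```matlab
% Synthetic flattened signal (assumption): small ripple events near 0.2
% plus three large events of height 2.
t = (0:0.001:10)';
xFlat = 0.1*(1 + sin(2*pi*5*t));
evt = [1.5 4.2 7.8];                        % hypothetical large-event times (s)
for k = 1:numel(evt)
    xFlat = xFlat + 2*exp(-((t - evt(k))/0.02).^2);
end

% All peaks, small and large.
[pks, locs] = findpeaks(xFlat);

% Assumption: threshold = three times the median peak height.
thr = 3*median(pks);

% Keep only peaks above the threshold.
isLarge   = pks > thr;
largeAmp  = pks(isLarge);       % amplitudes relative to the flattened baseline
largeTime = t(locs(isLarge));   % occurrence times of the large events
```

Because the small events dominate, the median tracks their typical height, which is why a fixed multiple of it can separate the large events. The multiple itself (3) remains a manual choice.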
To have a completely unbiased analysis, I am considering an MLE approach: first use findpeaks (or peakfinder) to get the amplitudes of all peaks (small and large), and then use these to get an unbiased threshold for selecting only the large amplitudes. Here's a histogram of the peak amplitudes:

While you can (arguably) discern two overlapping distributions (one centered around 0.5 and the other around 2), I was hoping that what is intuitively a bimodal distribution would be more apparent. Nonetheless, is there a way to use MLE to get an amplitude value at the intersection of these two distributions, corresponding to the more frequent small-amplitude and less frequent larger-amplitude events? Or is there perhaps a better way to approach this problem?
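One possible way to obtain such an intersection by maximum likelihood is to fit a two-component Gaussian mixture to the peak amplitudes and solve for the point where the two weighted component densities are equal. The sketch below assumes Gaussian components, which may not suit positive, skewed amplitudes (fitting log-amplitudes is one alternative). It uses synthetic amplitudes with hypothetical names, and `fitgmdist` and `normpdf` need the Statistics and Machine Learning Toolbox.

```matlab
% Synthetic peak amplitudes (assumption): 900 small events around 0.5
% and 100 large events around 2.
rng(1);                                         % reproducible example
amp = [0.5 + 0.15*randn(900,1); 2 + 0.4*randn(100,1)];

% Fit a two-component Gaussian mixture by EM (a maximum-likelihood fit).
gm = fitgmdist(amp, 2, 'Replicates', 5);

% Sort components so index 1 is the small-amplitude population.
[mu, order] = sort(gm.mu);
sigma = sqrt(squeeze(gm.Sigma));
sigma = sigma(order);
w = gm.ComponentProportion(order);

% Difference of the weighted component densities; zero at the crossover.
f = @(a) w(1)*normpdf(a, mu(1), sigma(1)) - w(2)*normpdf(a, mu(2), sigma(2));

% Crossover between the two means: equal posterior probability for both classes.
thr = fzero(f, [mu(1) mu(2)]);

nLarge = sum(amp > thr);                        % events classified as large
```

The crossover is the amplitude at which an event is equally likely to belong to either population under the fitted model, so the threshold comes from the data rather than from a hand-picked multiple.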
A related question is whether the interpolated moving-baseline subtraction is the best way to handle the noisy nature of the signal. Since there are portions that go below the starting "zeroed" reference level, it is not at all clear how else to handle this fluctuating baseline.
Much appreciation and thanks to all who have read through this and can give advice/help on running this analysis in an unbiased and well-justified manner.
Cheers.
P.S. If it helps, the amplitude data in the histogram above has been included as well.
8 Comments
Image Analyst
on 29 Nov 2023
Edited: Image Analyst
on 29 Nov 2023
Do you want every single peak, no matter its size, to have the valleys on either side of the peak pulled down to zero? Or would you allow noisy peaks, like big peaks with smaller peaks on the shoulders of the big peak?
So do you want a time-domain solution where you basically just pull down the signal peak by peak? Or do you want a frequency-domain solution where you filter out slowly varying, low-frequency components? Or do you want a smoothing-based solution, such as using a Savitzky-Golay filter to smooth the signal and then subtracting the smoothed signal from the original?
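The smooth-and-subtract option could look roughly like this. It is a sketch on a synthetic signal, with a polynomial order and frame length picked arbitrarily for illustration; `sgolayfilt` is part of the Signal Processing Toolbox.

```matlab
% Synthetic stand-in for the recording (assumption: not the poster's data).
t = (0:0.001:10)';                          % time in seconds, 1 kHz sampling
x = 0.5*sin(2*pi*0.2*t) + 0.1*sin(2*pi*5*t);   % slow drift plus small ripple
evt = [1.5 4.2 7.8];                        % hypothetical large-event times (s)
for k = 1:numel(evt)
    x = x + 2*exp(-((t - evt(k))/0.02).^2); % narrow large-amplitude events
end

% Savitzky-Golay smoothing with a long frame follows the slow drift
% but not the narrow events. Frame length must be odd and exceed the order.
order    = 3;
framelen = 1001;                            % 1 s window (illustrative choice)
trend = sgolayfilt(x, order, framelen);

% Subtract the smoothed trend to remove the slow baseline fluctuation.
xDetrended = x - trend;
```

Unlike the trough-based baseline, the smoothed trend runs through the middle of the signal rather than along its lower envelope, and it is pulled up slightly under each large event, so peak heights measured this way come out a little low.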
hxen
on 29 Nov 2023
hxen
on 30 Nov 2023
Mathieu NOE
on 30 Nov 2023
Hello,
Just for fun, I tried these two approaches (quick and dirty).
I am not sure the baseline correction really adds much: the histograms before and after are quite similar (or am I wrong?).
The functions used are attached, in case they are useful.
code 1
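%% Load data
load amplitude_data.mat;        % tpb is used below as the amplitude vector
%% Run the algorithm
[Base, yc] = baseline(tpb);     % (see function attached)
%% Plot raw signal with baseline (top) and corrected signal (bottom)
figure
subplot(2,1,1), plot(tpb)
hold on
plot(Base, '--')
hold off
subplot(2,1,2), plot(yc)
%% Histograms before and after baseline correction
figure, histogram(tpb)
figure, histogram(yc)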
code 2
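%% Load data
load amplitude_data.mat;        % tpb is used below as the amplitude vector
N = length(tpb);
t = (0:N-1)';                   % sample index used as the time axis
%% Run the algorithm
Base = env_secant(t, tpb, 50, 'bottom'); % (see function attached)
yc = tpb - Base;                % baseline-corrected signal
%% Plot raw signal with baseline (top) and corrected signal (bottom)
figure
subplot(2,1,1), plot(t, tpb, t, Base)
subplot(2,1,2), plot(yc)
%% Histograms before and after baseline correction
figure, histogram(tpb)
figure, histogram(yc)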
hxen
on 1 Dec 2023
hxen
on 1 Dec 2023
Mathieu NOE
on 1 Dec 2023
Sorry, I forgot to provide it.
Here it is in the attachment.
hxen
on 1 Dec 2023