How to remove noise (unwanted data)

I have data that contains some noise, which I need to remove. From the attachment, use only the Time and Pyrn1_Avg columns. I am also attaching two figures, one with the noise and one without it (which is what I need). I removed the noise from just one data set by manually replacing the noisy values with NaN, but that is time-consuming and I have thousands of data sets. Please suggest a suitable filter.

Accepted Answer

John D'Errico on 22 Feb 2023
Finding and removing what I might call singleton outliers while leaving small amounts of noise in place can be a difficult task. After all, how far out does a point need to be for it to count as an outlier? Could it be just a rare event from the normal population of noise? This very much depends on the usual signal-to-noise ratio in your data.
The real problem, though, is how to find large blocks of possible noise.
data = readtable('KLOG0024.CSV')
data = 1415×9 table
Date          Time        STemp_Avg    Pyrn1_Avg    Pyrn2_Avg    Pyrg1_Avg    Pyrg2_Avg    ATM_Press_Avg    Tacho_Avg
__________    ________    _________    _________    _________    _________    _________    _____________    _________
06.01.2019    00:00:00    570.25       0.349        0.054        -0.906       -0.113       96863            1768.7
06.01.2019    00:01:00    570.24       0.35         0.055        -0.905       -0.112       96862            1765.8
06.01.2019    00:02:00    570.26       0.353        0.057        -0.904       -0.11        96861            1770.3
06.01.2019    00:03:00    570.26       0.356        0.058        -0.904       -0.112       96863            1769.1
06.01.2019    00:04:00    570.24       0.356        0.055        -0.905       -0.114       96865            1762.1
06.01.2019    00:05:00    569.8        0.357        0.055        -0.903       -0.112       96864            1770.6
06.01.2019    00:06:00    569.65       0.358        0.055        -0.903       -0.111       96867            1770.8
06.01.2019    00:07:00    569.27       0.36         0.055        -0.901       -0.112       96867            1769.1
06.01.2019    00:08:00    568.77       0.36         0.054        -0.9         -0.11        96867            1758.5
06.01.2019    00:09:00    568.43       0.362        0.056        -0.899       -0.11        96867            1773.2
06.01.2019    00:10:00    567.93       0.363        0.055        -0.899       -0.109       96868            1770.6
06.01.2019    00:11:00    567.82       0.366        0.057        -0.898       -0.108       96868            1754.5
06.01.2019    00:12:00    568.15       0.37         0.06         -0.899       -0.107       96867            1773
06.01.2019    00:13:00    568.46       0.372        0.061        -0.899       -0.108       96867            1763.3
06.01.2019    00:14:00    568.89       0.373        0.06         -0.899       -0.11        96867            1763.9
06.01.2019    00:15:00    568.93       0.374        0.059        -0.899       -0.109       96868            1769.3
T = data.Time;
Y = data.Pyrn1_Avg;
plot(datenum(T),Y)
If you knew what the normal level of noise was in this data, then you might decide that any region where the variability appears to be larger than the norm should just be dropped. The problem is, the curve itself has a significant amount of signal in it. Since the time vector has a constant increment, we might consider a simple finite difference of the curve. Because the signal changes slowly from one sample to the next, differencing removes most of the signal and leaves mainly the noise.
So we might do this:
plot(datenum(T(2:end)),diff(Y))
Now you can see where crap is happening. Next, compute a moving, local estimate of the noise in that curve. I've attached my movingstd utility (it should also be available for download on the File Exchange).
Sigest = movingstd([0;diff(Y)],20,'central'); % A central moving window, width 20
plot(Sigest)
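If you would rather not depend on the attached movingstd, the built-in movstd (base MATLAB since R2016a) computes a similar centered moving standard deviation. The sketch below is only an illustration on synthetic data: the curve, the noisy block and all variable names are made up here, and movstd is a stand-in that may treat the window edges a little differently than movingstd does.

```matlab
% Synthetic stand-in for Pyrn1_Avg: a smooth daily-style curve with one noisy block.
rng(0);                                    % reproducible random numbers
n = 1000;
t = (1:n)';
y = 250*(1 - cos(2*pi*t/n)) + 0.1*randn(n,1);    % slow signal plus small noise
burst = 400:450;                           % hypothetical noisy block
y(burst) = y(burst) + 50*randn(numel(burst),1);

d = [0; diff(y)];                          % first difference, padded back to length n
sigest = movstd(d, 20);                    % moving std over a 20-sample window

% Compare the local noise level inside the block with the typical level.
fprintf('typical level: %.3f, inside block: %.3f\n', ...
    median(sigest), mean(sigest(410:440)));
```

The smooth part of the curve barely shows up in sigest, while the noisy block stands out by orders of magnitude, which is what makes a simple threshold workable in the next step.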
Now, you might decide to zap out any part of the curve where the local variability of the curve is greater than some level. If we assume the bad part is no more than 10% of the curve, that would be the 90th percentile.
Sigmax = prctile(Sigest,90)
Sigmax = 0.1446
Y(Sigest>Sigmax) = NaN;
And finally, plot the result:
plot(datenum(T),Y,'-')
And while it looks like you chose to zap out a little more of the curve than I did, this looks at least reasonable.
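One thing to keep in mind when running this over many files: a 90th-percentile cutoff always flags roughly 10% of the samples, even in a file with no bad block at all. The sketch below compares it, on synthetic data, with a cutoff tied to the typical noise level instead. The factor k = 5, the synthetic curve and the use of the built-in movstd in place of movingstd are all assumptions made for illustration, not part of the answer above.

```matlab
% Synthetic curve with one noisy block covering about 6% of the samples.
rng(1);
n = 1000;
t = (1:n)';
Y = 250*(1 - cos(2*pi*t/n)) + 0.1*randn(n,1);
bad = 600:660;                              % hypothetical noisy block
Y(bad) = Y(bad) + 50*randn(numel(bad),1);

Sigest = movstd([0; diff(Y)], 20);          % built-in stand-in for movingstd

% Percentile rule: flags about 10% of the samples regardless of the data.
maskPct = Sigest > prctile(Sigest, 90);

% Hypothetical alternative: flag samples whose local std is more than
% k times the typical (median) local std.
k = 5;
maskMed = Sigest > k*median(Sigest);

Yclean = Y;
Yclean(maskMed) = NaN;                      % zap only the flagged samples

fprintf('percentile rule: %d samples, median rule: %d samples\n', ...
    nnz(maskPct), nnz(maskMed));
```

Either mask also removes a few samples on each side of the block, because the moving window still overlaps the noisy region there. The factor k would need tuning against a few files you have already cleaned by hand.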
  2 Comments
Ritesh on 24 Feb 2023
Edited: Ritesh on 27 Feb 2023
@John D'Errico thank you so much. I wanted to remove the noise rather than manipulate my data with medfilt1.
John D'Errico on 27 Feb 2023
Then I am happy to have been of help.


More Answers (1)

Askic V on 22 Feb 2023
Edited: Askic V on 22 Feb 2023
I would suggest the following approach:
% read file into table
%T = readtable('KLOG0024.csv');
outfile = websave('KLOG0024.csv', 'https://www.mathworks.com/matlabcentral/answers/uploaded_files/1303255/KLOG0024.CSV');
T = readtable(outfile);
% read data into arrays
time_t = table2array(T(:,'Time'));
data_d = table2array(T(:,'Pyrn1_Avg'));
% plot data
plot(time_t, data_d);
hold on
% medfilt1 replaces every point of a signal by the
% median of that point and a specified number of neighboring points (15)
filtered_data = medfilt1(data_d,15);
plot(time_t, filtered_data);
legend('Noisy data', 'Filtered data');
You can now play with the number of points until it suits your needs.
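As a rough guide when choosing the number of points: a median filter only removes a run of bad samples if the run is shorter than about half the window. The sketch below illustrates this on synthetic data with a run of three spikes. The curve, the spike positions and the window lengths tried are assumptions made here for illustration, and medfilt1 requires the Signal Processing Toolbox.

```matlab
% Synthetic curve with a run of three spikes and one isolated spike.
rng(2);
n = 500;
t = (1:n)';
clean = 250*(1 - cos(2*pi*t/n));
noisy = clean + 0.1*randn(n,1);
spikes = [100 101 102 300];                 % hypothetical spike positions
noisy(spikes) = noisy(spikes) - 200;

windows = [3 7 15];                         % window lengths to compare
errs = zeros(size(windows));
for k = 1:numel(windows)
    filtered = medfilt1(noisy, windows(k));
    % Ignore the ends, where medfilt1 pads the signal with zeros by default.
    errs(k) = max(abs(filtered(50:450) - clean(50:450)));
    fprintf('window %2d: max deviation from clean curve = %.2f\n', ...
        windows(k), errs(k));
end
```

With a window of 3 the run of three spikes passes straight through, while windows of 7 and 15 remove it. Note that the filter replaces every sample with a local median, not only the spikes, so the samples next to the spikes are shifted slightly as well.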
  3 Comments
Ritesh on 23 Feb 2023
Thanks @Askic V for your wise approach. And many thanks @John D'Errico for your effort and solution.
