Finding singular outliers in data that also contains steep, non-singular changes

I want to find outliers of the type where a single value sticks out significantly from the surrounding data.
The complicating factor is that the data sometimes also contains steep changes. These are followed by many data points that continue the new trend, so they are not singular data points.
I tried
rmoutliers(Data,'movmedian',3)
but that removes far too many data points from the steep changes, not only the singular outliers.

Accepted Answer

Steven Lord on 6 Dec 2022
I think the outlier detection and removal functions in MATLAB are the right tools for you to use. Choosing the right parameters (detection method and thresholds) can be a challenge. That's one of the purposes for which the Clean Outlier Data task was created.
Open the Live Editor. Read in your data then open the task as per the instructions in the Open the Task section on that documentation page. Then tell the task the data on which it should operate and experiment with the various detection methods and parameters for those detection methods until they detect the points that you want to be considered outliers without ignoring those that look outlier-like but aren't. Once you have the parameters set the way you want, you can look at the code so you can use it for a different but similar data set in the future.
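The kind of parameter experimentation the task supports can also be sketched in a script. The following is only an illustration on synthetic data, not the asker's data: the signal, the spike positions, the window length of 21 samples, and the list of threshold factors are all assumptions chosen for the example.

```matlab
% Synthetic stand-in for the real data (assumption): a flat signal with one
% steep, sustained step and three isolated spikes.
rng(0);                                    % reproducible noise
n = 600;
y = [zeros(1,300), 5*ones(1,300)] + 0.05*randn(1,n);
spikeIdx = [100 250 450];                  % hypothetical spike positions
y(spikeIdx) = y(spikeIdx) + 3;

% For a fixed window, report how many points each threshold factor flags.
window = 21;                               % hypothetical window length in samples
for factor = [3 5 10 20]
    tf = isoutlier(y, "movmedian", window, "ThresholdFactor", factor);
    fprintf("ThresholdFactor %2d: %d points flagged\n", factor, nnz(tf));
end
```

Comparing the flagged counts (or plotting `y` with `tf` overlaid) for each setting gives roughly the same feedback as the interactive task, just without the immediate visual preview.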
  2 Comments
hans on 6 Dec 2022
The hint about the Live Editor and the Clean Outlier Data task is very helpful. I didn't know about that functionality. I can modify the parameters and see the effect at once. That's very good.
hans on 7 Dec 2022
With the help of the Live Editor and the Clean Outlier Data task, I tuned the parameters and generated suitable code. I finally came up with
[cleanedData2,outlierIndices] = filloutliers(Pressure,"linear",...
"movmedian",minutes(10),"ThresholdFactor",20,"SamplePoints",Time);
Thank you
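To see how a call with these parameters behaves, here is a minimal sketch on synthetic data. The `Time` and `Pressure` values below are invented for illustration (one sample every 10 seconds, one sustained step, two isolated spikes); only the `filloutliers` arguments are taken from the comment above.

```matlab
% Synthetic data (assumption): one sample every 10 s for two hours.
Time = datetime(2022,12,6,0,0,0) + seconds(0:10:7190)';   % 720 samples
rng(1);                                                   % reproducible noise
Pressure = 1000 + 0.1*randn(720,1);
Pressure(361:end) = Pressure(361:end) + 20;       % steep but sustained change
Pressure([120 500]) = Pressure([120 500]) + 15;   % isolated spikes

% Same detection settings as in the comment: 10-minute moving median,
% threshold factor 20, outliers replaced by linear interpolation.
[cleaned, isOut] = filloutliers(Pressure, "linear", ...
    "movmedian", minutes(10), "ThresholdFactor", 20, "SamplePoints", Time);

fprintf("%d points flagged as outliers\n", nnz(isOut));
plot(Time, Pressure, 'b', Time, cleaned, 'r');
```

Because the window is given as a duration, the sample points have to be `datetime` or `duration` values, which is why `"SamplePoints",Time` is passed along. The large threshold factor is what keeps the points around the sustained step from being treated as outliers while the isolated spikes are still caught.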


More Answers (1)

Mathieu NOE on 6 Dec 2022
Hello,
this is my result so far.
It does not look at the data in the first and last 10% of the time vector, so the focus is on the burst of peaks in the second half.
x = (1:numel(Pressure));
[dy, ddy] = firstsecondderivatives(x,Pressure);
% do not look at first and last 10% (of total signal duration) samples
n_start = round(0.1*numel(Pressure));
n_end = round(0.1*numel(Pressure));
ddy(1:n_start) = 0;
ddy(end-n_end:end) = 0;
ddy = abs(ddy);
threshold = 1;
x_zc = round(find_zc(x,ddy,threshold));
% keep only first and last index to get start / stop index of window
% and make the window a bit larger with
% 100 samples before and after
% (clamped so the window never extends beyond the data)
x_zc = [max(x_zc(1)-100,1) min(x_zc(end)+100,numel(Pressure))];
y_filtered = Pressure ;
y_filtered(x_zc(1):x_zc(end)) = filloutliers(Pressure(x_zc(1):x_zc(end)),'linear','movmean',100);
figure(1);plot(Time,Pressure,'b',Time,y_filtered,'r');
function [Zx] = find_zc(x,y,threshold)
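% positive-slope threshold crossing detection, using linear interpolation
y = y - threshold; % shift so that a threshold crossing becomes a zero crossing
zci = @(data) find(diff(sign(data))>0); % returns indices just before each positive-slope zero crossing
ix=zci(y); % indices of the positive-slope zero crossings of the shifted y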
ZeroX = @(x0,y0,x1,y1) x0 - (y0.*(x0 - x1))./(y0 - y1); % Interpolated x value for Zero-Crossing
Zx = ZeroX(x(ix),y(ix),x(ix+1),y(ix+1));
end
function [dy, ddy] = firstsecondderivatives(x,y)
% The function calculates the first & second derivative of a function that is given by a set
% of points. The first derivatives at the first and last points are calculated by
% the 3 point forward and 3 point backward finite difference scheme respectively.
% The first derivatives at all the other points are calculated by the 2 point
% central approach.
% The second derivatives at the first and last points are calculated by
% the 4 point forward and 4 point backward finite difference scheme respectively.
% The second derivatives at all the other points are calculated by the 3 point
% central approach.
% Input variables:
% x: vector with the x data points.
% y: vector with the f(x) data points.
% Output variables:
% dy: vector with the first derivative at each point.
% ddy: vector with the second derivative at each point.
n = length(x);
dy = zeros(1,n); % preallocate row vectors instead of growing them in the loop
ddy = zeros(1,n);
dy(1) = (-3*y(1) + 4*y(2) - y(3)) / (2*(x(2) - x(1))); % First derivative
ddy(1) = (2*y(1) - 5*y(2) + 4*y(3) - y(4)) / (x(2) - x(1))^2; % Second derivative
for i = 2:n-1
dy(i) = (y(i+1) - y(i-1)) / (x(i+1) - x(i-1));
ddy(i) = (y(i-1) - 2*y(i) + y(i+1)) / (x(i-1) - x(i))^2;
end
dy(n) = (y(n-2) - 4*y(n-1) + 3*y(n)) / (2*(x(n) - x(n-1)));
ddy(n) = (-y(n-3) + 4*y(n-2) - 5*y(n-1) + 2*y(n)) / (x(n) - x(n-1))^2;
end
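As a rough illustration of why the magnitude of the second derivative can help separate isolated spikes from sustained changes, the sketch below uses the built-in `diff(y,2)`, which for unit sample spacing equals the 3-point central second difference at the interior points. This is a swapped-in shortcut for illustration, not the `firstsecondderivatives` function above, and the toy signals are assumptions.

```matlab
% Sanity check: for y = x.^2 with unit spacing the second difference is 2.
x = 1:10;
y = x.^2;
ddyInterior = diff(y, 2);            % y(i-1) - 2*y(i) + y(i+1) for i = 2..9
disp(ddyInterior)                    % 2 2 2 2 2 2 2 2

% Toy signals of equal height: an isolated spike and a sustained step.
ySpike = zeros(1,10); ySpike(5) = 1;
yStep  = [zeros(1,5) ones(1,5)];
disp(abs(diff(ySpike, 2)))           % 0 0 1 2 1 0 0 0
disp(abs(diff(yStep, 2)))            % 0 0 0 1 1 0 0 0
```

For the same height, the spike produces a peak second difference twice as large as the step, and it does so at three consecutive positions. A threshold on `abs(ddy)` therefore reacts more strongly to spikes, although a steep step can still cross it if the threshold is low.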
  1 Comment
hans on 7 Dec 2022
Hi Mathieu,
thank you for this very elaborate code. It works very well and can be adapted perfectly to the situation.
But it is long, so in the end the filloutliers command, with its parameters tuned interactively in the Live Editor, is a very comfortable way to use the built-in MATLAB functionality, and it also worked well for me.
Thank you again for looking so closely into my data!

Release

R2021b