How to specify max. number of outliers on which to apply the Hampel filter?
I have a time series with outliers. However, these sometimes occur consecutively, e.g. 20 data points one after the other. This poses a problem when applying the Hampel filter to the entire dataset, because it then alters data that is actually valid.
I would like to apply the Hampel filter only when there are max. 3 outliers in a Hampel window. How can I do that?
The outliers are the values in the vector X whose absolute value exceeds a threshold of 10, i.e. abs(X) > 10.
Is there a way to create a new vector that keeps data points exceeding 10 if most of the surrounding data points are below 10? If more than 3 values in a window of 20 exceed the threshold, those extremes should be replaced with NaN.
Thank you in advance!
Scott MacKenzie on 5 Mar 2022
There's probably a simpler way to do this, but I believe the code below achieves what you are after:
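% test data
x1 = randi([5 12],1,50);
% filtered data in x2
x2 = x1;
% parameters (adjust as necessary)
ws = 6; % window size
t = 10; % threshold
n = 3; % max. points above threshold allowed per window (replace only if count > n)
% slide a window of ws points along the data
for i = ws:length(x1)
    idx1 = i-ws+1; % first index of the window
    idx2 = i; % last index of the window
    % positions (in x1) of the window values above the threshold
    idx = (idx1-1) + find(x1(idx1:idx2) > t);
    % replace them only if the window holds more than n such values
    if numel(idx) > n
        x2(idx) = nan;
    end
end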
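A loop-free variant of the same sliding-window idea might look like the sketch below. It assumes the moving-window functions movsum and movmax are available (R2016a or later) and that the data has at least ws points. It tests abs(x1) against the threshold, as described in the question, which is an assumption about what is wanted for negative values. The names exceeds, cnt, badWindow and inBadWindow are only illustrative.

```matlab
% small fixed example: a run of large values plus one isolated large value
x1 = [5 12 6 11 12 13 12 7 8 5 12 6 5 7];
ws = 6;   % window size
t  = 10;  % threshold on abs(x1)
n  = 3;   % replace only if a window holds more than n exceedances

% 1 where the absolute value exceeds the threshold, 0 elsewhere
exceeds = double(abs(x1) > t);

% number of exceedances in the trailing window of ws points ending at each index
cnt = movsum(exceeds, [ws-1 0]);

% windows (identified by their last index) with too many exceedances
badWindow = double(cnt > n);

% a point lies in a bad window if any window ending at or up to ws-1
% points after it is bad
inBadWindow = movmax(badWindow, [0 ws-1]) > 0;

% replace only the exceedances that fall inside a bad window
x2 = x1;
x2(inBadWindow & (exceeds > 0)) = NaN;
```

With this example, the exceedances inside the run at the start are set to NaN, while the isolated value of 12 near the end never shares a window with more than two other exceedances and is kept.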