File Exchange

image thumbnail

Outlier Detection and Removal [hampel]

Detect and replace outliers with appropriate local values in a non-linear time series.

48 Downloads

Updated

View License

HAMPEL(X,Y,DX,T,varargin) returns the Hampel filtered values of the
elements in Y. It was developed to detect outliers in a time series,
but it can also be used as an alternative to the standard median
filter.

References
Chapters 1.4.2, 3.2.2 and 4.3.4 in Mining Imperfect Data: Dealing with
Contamination and Incomplete Records by Ronald K. Pearson.

Acknowledgements
I would like to thank Ronald K. Pearson for the introduction to moving
window filters. Please visit his blog at:
http://exploringdatablog.blogspot.com/2012/01/moving-window-filters-and-pracma.html

X,Y are row or column vectors with an equal number of elements.
The elements in Y should be Gaussian distributed.

Input DX,T,varargin must not contain NaN values!

DX,T are optional scalar values.
DX is a scalar which defines the half width of the filter window.
It is required that DX > 0 and DX should be dimensionally equivalent to
the values in X.
T is a scalar which defines the threshold value used in the equation
|Y - Y0| > T*S0.

Standard Parameters for DX and T:
DX = 3*median(X(2:end)-X(1:end-1));
T = 3;

varargin covers addtional optional input. The optional input must be in
the form of 'PropertyName', PropertyValue.
Supported PropertyNames:
'standard': Use the standard Hampel filter.
'adaptive': Use an experimental adaptive Hampel filter. Explained under
Revision 1 details below.

Supported PropertyValues: Scalar value which defines the tolerance of
the adaptive filter. In the case of standard Hampel filter this value
is ignored.

Output YY,I,Y0,LB,UB,ADX are column vectors containing Hampel filtered
values of Y, a logical index of the replaced values, nominal data,
lower and upper bounds on the Hampel filter and the relative half size
of the local window, respectively.

NO is a scalar that specifies the Number of Outliers detected.

Examples
1. Hampel filter removal of outliers
X = 1:1000; % Pseudo Time
Y = 5000 + randn(1000, 1); % Pseudo Data
Outliers = randi(1000, 10, 1); % Index of Outliers
Y(Outliers) = Y(Outliers) + randi(1000, 10, 1); % Pseudo Outliers
[YY,I,Y0,LB,UB] = hampel(X,Y);

plot(X, Y, 'b.'); hold on; % Original Data
plot(X, YY, 'r'); % Hampel Filtered Data
plot(X, Y0, 'b--'); % Nominal Data
plot(X, LB, 'r--'); % Lower Bounds on Hampel Filter
plot(X, UB, 'r--'); % Upper Bounds on Hampel Filter
plot(X(I), Y(I), 'ks'); % Identified Outliers

2. Adaptive Hampel filter removal of outliers
DX = 1; % Window Half size
T = 3; % Threshold
Threshold = 0.1; % AdaptiveThreshold
X = 1:DX:1000; % Pseudo Time
Y = 5000 + randn(1000, 1); % Pseudo Data
Outliers = randi(1000, 10, 1); % Index of Outliers
Y(Outliers) = Y(Outliers) + randi(1000, 10, 1); % Pseudo Outliers
[YY,I,Y0,LB,UB] = hampel(X,Y,DX,T,'Adaptive',Threshold);

plot(X, Y, 'b.'); hold on; % Original Data
plot(X, YY, 'r'); % Hampel Filtered Data
plot(X, Y0, 'b--'); % Nominal Data
plot(X, LB, 'r--'); % Lower Bounds on Hampel Filter
plot(X, UB, 'r--'); % Upper Bounds on Hampel Filter
plot(X(I), Y(I), 'ks'); % Identified Outliers

3. Median Filter Based on Filter Window
DX = 3; % Filter Half Size
T = 0; % Threshold
X = 1:1000; % Pseudo Time
Y = 5000 + randn(1000, 1); % Pseudo Data
[YY,I,Y0] = hampel(X,Y,DX,T);

plot(X, Y, 'b.'); hold on; % Original Data
plot(X, Y0, 'r'); % Median Filtered Data

Comments and Ratings (8)

Sara Barros

Avaa Gao

Fiakro Castro

Fiakro Castro

E. Cheynet

E. Cheynet (view profile)

Mark Shore

An interesting and potentially very useful tool. I'm reviewing both this implementation as well as Pearson's book and webpage.

The code appears well-commented and carefully written but I'll hold off on a rating until I have tested it more using my own data.

There is a new update on the way as soon as Mathworks has reviewed it. This update corrects minor errors.

peter

peter (view profile)

thanks a lot, a very practical tool

Updates

1.6

(1) Corrected potential error in internal median function.
(2) Removed internal "keyboard" command.
(3) Optimized internal Gauss filter.

1.5

(1) The elements in X and Y are now temporarily sorted for internal computations.
(2) Performance optimization.
(3) Added Example 3.

1.4

(1) If the number of elements (X,Y) are below 2 the output YY will be a copy of Y. No outliers will be detected. No error will be issued.

1.3

(1) Changed a calculation in the adaptive Hampel filter. See details in file.
(2) Checks if DX,T or varargin contains NaN values.
(3) Now capable of ignoring NaN values in X and Y.
(4) Added output Y0 - Nominal Data.

1.1

(1) Replaced output S with lower and upper bounds on the Hampel filter.
(2) Added option to use an experimental adaptive Hampel filter.
The Principle behind this filter is described in hampel.m.

MATLAB Release
MATLAB 7.10 (R2010a)
Acknowledgements

Inspired: Hampel filter in C++

Download apps, toolboxes, and other File Exchange content using Add-On Explorer in MATLAB.

» Watch video

Win prizes and improve your MATLAB skills

Play today