File Exchange

Regression Outliers

version 1.0 (4.74 KB) by

Removes outliers from X and Y variables based on regression residuals

This function takes two vectors intended for a bivariate linear regression analysis and removes outliers from both. Outliers are detected from the regression residual vector, so only the records that lie farthest from the 1:1 regression line are detected and removed. If more than one outlier is requested, the regression residuals are recalculated before each subsequent removal to avoid swamping and masking effects; the point that is then farthest from the 1:1 line is removed, and so on.

This approach distinguishes points that may be outliers in a single variable (X or Y) yet fit the 1:1 regression line well from points that stay within the acceptable range of each individual input variable (X, Y) but show up as outliers once the two variables are fitted with the regression line. The outlier in the residual vector is detected by a subfunction, which is an enhancement of work by Vince Petaccio (2009) and is also available as a stand-alone function, "outliers", on the MATLAB File Exchange.
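
The remove-then-refit loop described above can be sketched as follows. This is a hypothetical illustration, not the File Exchange implementation: the name sketchResidualOutliers is invented here, the residuals are assumed to come from an ordinary least-squares line of y on x (polyfit), and the "farthest" point is simply taken as the largest absolute residual instead of being chosen by the outliers subfunction.

```matlab
% Hypothetical sketch, NOT the regoutliers source code.
% Assumptions: least-squares line of y on x; the outlier at each step is
% the record with the largest absolute residual.
function [x, y, removedIdx] = sketchResidualOutliers(x0, y0, nOutliers)
    x = x0(:);                                % work on column copies
    y = y0(:);
    removedIdx = zeros(nOutliers, 1);
    for k = 1:nOutliers
        keep = ~isnan(x) & ~isnan(y);         % records still in play
        p = polyfit(x(keep), y(keep), 1);     % refit after every removal
        res = abs(y - polyval(p, x));         % NaN where already removed
        [~, removedIdx(k)] = max(res);        % max skips NaN entries
        x(removedIdx(k)) = NaN;               % mark the record in both vectors
        y(removedIdx(k)) = NaN;
    end
end
```

Refitting inside the loop is the point of the design: a single extreme record can tilt the fitted line enough to hide a second one, so the residuals from the first fit are not reused for later removals.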

--Inputs:
X0: vector of dependent variable in bivariate linear regression
Y0: vector of independent variable in bivariate linear regression
noutliers: number of outliers to remove (defaults to 1 if not provided)
plotOp: plotting option (0: don't plot, 1: plot). If 1, a scatterplot of the two input variables is produced before and after each iteration of outlier removal (up to noutliers), with the plots arranged as subplots; if 0, only the calculations are performed

--Outputs:
X: vector of dependent variable after removal of the outliers
Y: vector of independent variable after removal of the outliers
rSquares: a vector of r-square values calculated from the original inputs and after removal of each outlier
outliers_idx: indices of the outliers; note that the records at these indices are set to NaN in the X and Y outputs
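
Because the detected records are set to NaN and not deleted, downstream code may need to drop them before further analysis. A minimal sketch, using made-up vectors in place of real regoutliers outputs; computing r-square as the squared correlation coefficient is valid for a simple linear fit, but whether regoutliers computes rSquares this way is an assumption.

```matlab
% X and Y stand in for regoutliers outputs; the values are made up.
X = [1 2 NaN 4 5];
Y = [2.1 3.9 NaN 8.2 9.8];

valid = ~isnan(X) & ~isnan(Y);   % records that survived outlier removal
Xc = X(valid);                   % compact vectors without the NaN entries
Yc = Y(valid);

R  = corrcoef(Xc, Yc);           % 2-by-2 correlation matrix
r2 = R(1,2)^2;                   % r-square of the simple linear regression
```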

--Dependency:
outliers subfunction, which is included in this code after the main function

--Example:
X0=10.2:0.2:30; %first vector
Y0=0.1:0.1:10; %second vector
idx=randi(length(Y0),4,1); %pick 4 random positions for the noise
Y0(idx)=randn(4,1)*10; %replace them with 4 random noise values
noutliers=3; %number of outliers to remove
plotOp=1; %0: don't plot, 1: plot
[X,Y,rSquares,outliers_idx]=regoutliers(X0,Y0,noutliers,plotOp);
rSquares %print r-square values calculated from the original
%data and after each outlier removal; these should increase
%progressively, otherwise the number of outliers to remove
%should be decreased or, in some cases, increased.
outliers_idx %print indices of the outliers in both input vectors
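
The check suggested in the comments above (r-square should rise with every removal) could be automated along these lines. This is only a sketch: the rSquares values below are made up for illustration, and the layout is taken from the output description, with the first element for the original data and one further element per removed outlier.

```matlab
rSquares = [0.62 0.81 0.93 0.97];   % made-up values, for illustration only

steps = diff(rSquares);             % change in r-square per removed outlier
isIncreasing = all(steps > 0);      % true if every removal improved the fit

if ~isIncreasing
    firstDrop = find(steps <= 0, 1);   % first removal that did not help
    fprintf('r-square stopped improving after removing %d outlier(s).\n', firstDrop - 1);
end
```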

MATLAB Release
MATLAB 7.7 (R2008b)
Acknowledgements

Inspired by: Remove Outliers
