Regression Outliers

version 1.0 (4.74 KB) by

Removes outliers from X and Y variables based on regression residuals




This function accepts two variables (vectors) on which a bivariate linear regression is to be performed, and removes the outliers from both. Because outliers are detected from the regression residual vector, only the records that lie farthest from the 1:1 regression line are detected and removed. If more than one outlier is requested, the regression residuals are recalculated before each subsequent removal to avoid swamping and masking effects; the next farthest point from the 1:1 line is then removed, and so forth.

This method distinguishes points that may be outliers in a single variable (X or Y) but still fit the 1:1 regression line well from points that stay within the acceptable range of each individual input variable (X, Y) but turn out to be outliers once the two variables are fitted to the regression line.

To detect outliers in the residual vector, a subfunction is used. It is an enhancement of work by Vince Petaccio (2009) and is also available as a stand-alone function, "outliers", on the MATLAB File Exchange.
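The refit-after-every-removal loop described above can be sketched as follows. This is a minimal illustration, not the submission's code: it assumes Y is regressed on X with an ordinary least-squares line, and it simply drops the point with the largest absolute residual, which stands in for the "outliers" subfunction. The names nRemove, removedIdx and rSq, and the planted outlier values, are made up for the example.

```matlab
% Sketch: remove one point per iteration, refitting the line each time.
% Assumption: Y is regressed on X by ordinary least squares (polyfit).
X = 10.2:0.2:30;                         % 100 points
Y = 0.1:0.1:10;                          % linear in X
Y([12 47 83]) = [40 -25 60];             % three planted outliers (made-up values)
nRemove = 3;                             % hypothetical name: points to drop

removedIdx = zeros(1, nRemove);          % order in which points are dropped
rSq = zeros(1, nRemove + 1);             % r-square before and after each removal
for k = 1:nRemove + 1
    keep = ~isnan(X) & ~isnan(Y);        % records not yet removed
    p = polyfit(X(keep), Y(keep), 1);    % refit on the remaining records
    res = Y - polyval(p, X);             % NaN where a record was removed
    c = corrcoef(X(keep), Y(keep));
    rSq(k) = c(1, 2)^2;                  % r-square of the current fit
    if k <= nRemove
        [maxRes, worst] = max(abs(res)); % max skips NaN entries
        removedIdx(k) = worst;
        X(worst) = NaN;                  % mark the record as removed
        Y(worst) = NaN;                  % in both vectors
    end
end
disp(removedIdx)
disp(rSq)
```

Recomputing the residuals inside the loop is what guards against masking: once the largest outlier is gone, the fitted line shifts, and the next most deviant point is judged against the updated fit rather than the distorted one.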

Inputs:
   X0: vector of the dependent variable in the bivariate linear regression
   Y0: vector of the independent variable in the bivariate linear regression
   noutliers: number of outliers to remove (defaults to 1 if not provided)
   plotOp: plotting option (0: don’t plot, 1: plot). If 1 is given, a scatterplot of the two input variables is produced before and after each iteration of outlier removal (up to noutliers), with the plots arranged as subplots; if 0, only the calculations are done

Outputs:
   X: vector of the dependent variable after removal of the outliers
   Y: vector of the independent variable after removal of the outliers
   rSquares: vector of r-square values calculated from the original inputs and after the removal of each outlier
   outliers_idx: indices of the outliers; note that the records at these indices are set to NaN in the X and Y outputs
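Because removed records are set to NaN rather than deleted, the output vectors keep their original length and indexing. If a later step cannot handle NaN, the retained records can be extracted with a logical mask. The sketch below uses made-up stand-ins for the X and Y outputs; valid, Xc, Yc and nanIdx are hypothetical names.

```matlab
% Made-up stand-ins for the X and Y outputs, NaN where an outlier was removed.
X = [1 2 NaN 4 5 NaN 7];
Y = [2 4 NaN 8 10 NaN 14];

valid = ~(isnan(X) | isnan(Y));   % logical mask of retained records
nanIdx = find(~valid);            % positions that were set to NaN
Xc = X(valid);                    % compact copies without NaN
Yc = Y(valid);
p = polyfit(Xc, Yc, 1);           % straight-line fit on retained records only
```

Keeping the NaN placeholders in X and Y means the indices in outliers_idx still refer to positions in the original inputs.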

Other requirements:
   the outliers subfunction, which is included in this code after the main function

Example:
 X0=10.2:0.2:30; %first vector
 Y0=0.1:0.1:10; %second vector
 idx=randi(length(Y0),4,1); %pick 4 random positions for the noise
 Y0(idx)=randn(4,1)*10; %replace them with 4 random noise values
 noutliers=3; %number of outliers to remove
 plotOp=1; %0: don't plot, 1: plot
 %call the function here with X0, Y0, noutliers and plotOp to obtain
 %X, Y, rSquares and outliers_idx
 rSquares %print r-square values calculated from the original
 %data and after each outlier removal; these should increase
 %progressively, otherwise the number of outliers to remove
 %should be decreased or, in some cases, increased
 outliers_idx %print indexes of outliers in both input vectors
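The check suggested in the example's comments, that r-square should rise with every removal, can be automated. The following is a small sketch with made-up rSquares values; steps, isIncreasing and firstDrop are hypothetical names.

```matlab
% Made-up r-square values: original data, then after each of 3 removals.
rSquares = [0.15 0.19 0.48 0.47];

steps = diff(rSquares);                 % change in r-square per removal
isIncreasing = all(steps > 0);          % true if every removal improved the fit
firstDrop = find(steps <= 0, 1);        % first removal that did not help, if any
if ~isIncreasing
    fprintf('r-square stopped improving at removal %d; consider changing noutliers.\n', firstDrop);
end
```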

MATLAB Release
MATLAB 7.7 (R2008b)

Inspired by: Remove Outliers