Code covered by the BSD License  

File Size: 4.74 KB File ID: #37212

Regression Outliers

Removes outliers from X and Y variables based on regression residuals

File Information
Description

This function accepts two vectors for which a bivariate linear regression is to be performed and removes the outliers from both. Because outliers are detected from the regression residuals, only the records that lie farthest from the 1:1 regression line are detected and removed. If more than one outlier is requested, the regression residuals are recalculated before each subsequent removal to avoid swamping and masking effects; the next-farthest point from the 1:1 line is then removed, and so on. The method thereby distinguishes points that may be outliers in a single variable (X or Y) but still fit the 1:1 regression line well from points that stay within the acceptable range of each individual variable (X, Y) but emerge as outliers once the two variables are fitted together. Outliers are detected in the residual vector by a subfunction, an enhancement of work by Vince Petaccio (2009), which is also available as a stand-alone function, "outliers", on the MATLAB File Exchange.
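The remove-then-refit loop described above can be sketched roughly as follows. This is not the submission's code: the detection rule used here (drop the record with the largest absolute residual) is an assumption standing in for the "outliers" subfunction, and the data and the names nRemove, removedIdx and rSq are made up for illustration. It uses only base MATLAB functions (polyfit, polyval, corrcoef).

```matlab
% Illustrative sketch only, not the regoutliers implementation.
% Assumption: the record with the largest absolute residual is the outlier.
x = (1:20)';
y = 2*x + 1;                 % points lying exactly on a line
y(5)  = 60;                  % inject two off-line points
y(15) = -10;
nRemove = 2;                 % hypothetical number of outliers to remove

removedIdx = zeros(nRemove, 1);
rSq = zeros(nRemove + 1, 1);
for k = 1:nRemove + 1
    keep = ~isnan(x) & ~isnan(y);          % records not yet removed
    p = polyfit(x(keep), y(keep), 1);      % refit the line on every pass
    c = corrcoef(x(keep), y(keep));
    rSq(k) = c(1, 2)^2;                    % r-square of the current fit
    if k > nRemove
        break                              % last pass only records r-square
    end
    res = abs(y - polyval(p, x));          % NaN for already removed records
    [maxRes, worst] = max(res);            % max skips NaN entries
    removedIdx(k) = worst;
    x(worst) = NaN;                        % mark the record as removed
    y(worst) = NaN;
end
```

Refitting inside the loop is the point of the design: residuals computed once from the contaminated fit can hide a second outlier behind the first, which is the masking effect the description mentions.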

 --Inputs:
   X0: vector of dependent variable in bivariate linear regression
   Y0: vector of independent variable in bivariate linear regression
   noutliers: number of outliers to remove (defaults to 1 if not provided)
   plotOp: plotting option (0: don't plot, 1: plot). If 1 is given, a scatterplot of the two input variables is drawn before and after each outlier-removal iteration (up to noutliers), with the plots arranged as subplots

 --Outputs:
   X: vector of dependent variable after removal of the outliers
   Y: vector of independent variable after removal of the outliers
   rSquares: a vector of r-square values calculated from the original inputs and after removal of each outlier
   outliers_idx: indexes of the outliers; the records at these indexes are set to NaN in the X and Y outputs
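Because removed records come back as NaN rather than being deleted, downstream code usually has to drop them before fitting. A minimal sketch, assuming outputs shaped as described above; the values of X, Y and outliers_idx here are hypothetical and were not produced by regoutliers:

```matlab
% Hypothetical outputs in the shape described above (removed records are NaN).
X = [1 2 NaN 4 5];
Y = [2 4 NaN 8 10];
outliers_idx = 3;

keep = true(size(X));
keep(outliers_idx) = false;       % here the same records as ~isnan(X) & ~isnan(Y)
Xclean = X(keep);
Yclean = Y(keep);
p = polyfit(Xclean, Yclean, 1);   % slope and intercept of the cleaned data
```

Keeping the NaN placeholders means X and Y stay the same length as X0 and Y0, so outliers_idx can still be used to look up the removed records in the original inputs.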

 --Dependency:
   the outliers subfunction, which is included in this file after the main function

 --Example:
 X0=10.2:0.2:30; %first vector
 Y0=0.1:0.1:10; %second vector
 idx=randi(length(Y0),4,1); %pick 4 random positions for the noise
 Y0(idx)=randn(4,1)*10; %overwrite them with 4 random noise values
 noutliers=3; %number of outliers to remove
 plotOp=1; %0: don't plot, 1: plot
 [X,Y,rSquares,outliers_idx]=regoutliers(X0,Y0,noutliers,plotOp);
 rSquares %print r-square values calculated from the original
 %data and after each outlier removal; these should increase
 %progressively, otherwise the number of outliers to remove
 %should be decreased or, in some cases, increased
 outliers_idx %print indexes of outliers in both input vectors
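The check on rSquares suggested in the example comments can be automated. A small sketch, assuming rSquares is a numeric vector as described under Outputs; the values below are made up for illustration and are not results from the function:

```matlab
% Made-up r-square values, for illustration only.
rSquares = [0.62 0.81 0.93 0.97];

if all(diff(rSquares) > 0)
    msg = 'r-square rose after every removal';
else
    msg = 'r-square did not rise after every removal; revisit noutliers';
end
disp(msg)
```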

Acknowledgements

Remove Outliers inspired this file.

Required Products: Statistics Toolbox
MATLAB release: MATLAB 7.7 (R2008b)
Other requirements: The code should be saved in a directory on the MATLAB path