robustfit - Robust regression

Syntax

b = robustfit(X,y)
b = robustfit(X,y,wfun,tune)
b = robustfit(X,y,wfun,tune,const)
[b,stats] = robustfit(...)

Description

b = robustfit(X,y) returns a p-by-1 vector b of coefficient estimates for a robust multilinear regression of the responses in y on the predictors in X. X is an n-by-p matrix of p predictors at each of n observations. y is an n-by-1 vector of observed responses. By default, the algorithm uses iteratively reweighted least squares with a bisquare weighting function.

robustfit treats NaNs in X or y as missing values, and removes them.

b = robustfit(X,y,wfun,tune) specifies a weighting function wfun. tune is a tuning constant that is divided into the residual vector before computing weights.

The weighting function wfun can be any one of the following strings:

Weight FunctionEquationDefault Tuning Constant
'andrews'w = (abs(r)<pi) .* sin(r) ./ r1.339
'bisquare' (default)w = (abs(r)<1) .* (1 - r.^2).^24.685
'cauchy'w = 1 ./ (1 + r.^2)2.385
'fair'w = 1 ./ (1 + abs(r))1.400
'huber'w = 1 ./ max(1, abs(r))1.345
'logistic'w = tanh(r) ./ r1.205
'ols'Ordinary least squares (no weighting function)None
'talwar'w = 1 * (abs(r)<1)2.795
'welsch'w = exp(-(r.^2))2.985

If tune is unspecified, the default value in the table is used. Default tuning constants give coefficient estimates that are approximately 95% as statistically efficient as the ordinary least-squares estimates, provided the response has a normal distribution with no outliers. Decreasing the tuning constant increases the downweight assigned to large residuals; increasing the tuning constant decreases the downweight assigned to large residuals.

The value r in the weight functions is

r = resid/(tune*s*sqrt(1-h))

where resid is the vector of residuals from the previous iteration, h is the vector of leverage values from a least-squares fit, and s is an estimate of the standard deviation of the error term given by

s = MAD/0.6745

Here MAD is the median absolute deviation of the residuals from their median. The constant 0.6745 makes the estimate unbiased for the normal distribution. If there are p columns in X, the smallest p absolute deviations are excluded when computing the median.

You can write your own M-file weight function. The function must take a vector of scaled residuals as input and produce a vector of weights as output. In this case, wfun is specified using a function handle @ (as in @myfun), and the input tune is required.

b = robustfit(X,y,wfun,tune,const) controls whether or not the model will include a constant term. const is 'on' to include the constant term (the default), or 'off' to omit it. When const is 'on', robustfit adds a first column of 1s to X. When const is 'off', robustfit does not alter X.

[b,stats] = robustfit(...) returns the structure stats, whose fields contain diagnostic statistics from the regression. The fields of stats are:

The robustfit function estimates the variance-covariance matrix of the coefficient estimates using inv(X'*X)*stats.s^2. Standard errors and correlations are derived from this estimate.

Examples

Generate data with the trend y = 10-2*x, then change one value to simulate an outlier:

x = (1:10)';
y = 10 - 2*x + randn(10,1);
y(10) = 0;

Use both ordinary least squares and robust regression to estimate a straight-line fit:

bls = regress(y,[ones(10,1) x])
bls =
  7.2481
 -1.3208

brob = robustfit(x,y)
brob =
  9.1063
 -1.8231

A scatter plot of the data together with the fits shows that the robust fit is less influenced by the outlier than the least-squares fit:

scatter(x,y,'filled'); grid on; hold on
plot(x,bls(1)+bls(2)*x,'r','LineWidth',2);
plot(x,brob(1)+brob(2)*x,'g','LineWidth',2)
legend('Data','Ordinary Least Squares','Robust Regression')

References

[1] DuMouchel, W. H., and F. L. O'Brien. "Integrating a Robust Option into a Multiple Regression Computing Environment." Computer Science and Statistics: Proceedings of the 21st Symposium on the Interface. Alexandria, VA: American Statistical Association, 1989.

[2] Holland, P. W., and R. E. Welsch. "Robust Regression Using Iteratively Reweighted Least-Squares." Communications in Statistics: Theory and Methods, A6, 1977, pp. 813–827.

[3] Huber, P. J. Robust Statistics. Hoboken, NJ: John Wiley & Sons, Inc., 1981.

[4] Street, J. O., R. J. Carroll, and D. Ruppert. "A Note on Computing Robust Regression Estimates via Iteratively Reweighted Least Squares." The American Statistician. Vol. 42, 1988, pp. 152–154.

See Also

regress, robustdemo

  


Recommended Products

Includes the most popular MATLAB recorded presentations with Q&A sessions led by MATLAB experts.

 © 1984-2009- The MathWorks, Inc.    -   Site Help   -   Patents   -   Trademarks   -   Privacy Policy   -   Preventing Piracy   -   RSS