b = robustfit(X,y)
b = robustfit(X,y,wfun,tune)
b = robustfit(X,y,wfun,tune,const)
[b,stats] = robustfit(...)
b = robustfit(X,y) returns a (p + 1)-by-1 vector b of coefficient estimates for a robust multilinear regression of the responses in y on the predictors in X. X is an n-by-p matrix of p predictors at each of n observations. y is an n-by-1 vector of observed responses. By default, the algorithm uses iteratively reweighted least squares with a bisquare weighting function.
Note: By default, robustfit adds a first column of 1s to X, corresponding to a constant term in the model. Do not enter a column of 1s directly into X. You can change the default behavior of robustfit using the input const, below.
robustfit treats NaNs in X or y as missing values, and removes them.
The weighting function wfun can be any one of the following strings:
|Weight Function||Equation||Default Tuning Constant|
|'andrews'||w = (abs(r)<pi) .* sin(r) ./ r||1.339|
|'bisquare' (default)||w = (abs(r)<1) .* (1 - r.^2).^2||4.685|
|'cauchy'||w = 1 ./ (1 + r.^2)||2.385|
|'fair'||w = 1 ./ (1 + abs(r))||1.400|
|'huber'||w = 1 ./ max(1, abs(r))||1.345|
|'logistic'||w = tanh(r) ./ r||1.205|
|'ols'||Ordinary least squares (no weighting function)||None|
|'talwar'||w = 1 * (abs(r)<1)||2.795|
|'welsch'||w = exp(-(r.^2))||2.985|
If tune is unspecified, the default value in the table is used. Default tuning constants give coefficient estimates that are approximately 95% as statistically efficient as the ordinary least-squares estimates, provided the response has a normal distribution with no outliers. Decreasing the tuning constant increases the downweight assigned to large residuals; increasing the tuning constant decreases the downweight assigned to large residuals.
The value r in the weight functions is
r = resid/(tune*s*sqrt(1-h))
where resid is the vector of residuals from the previous iteration, h is the vector of leverage values from a least-squares fit, and s is an estimate of the standard deviation of the error term given by
s = MAD/0.6745
Here MAD is the median absolute deviation of the residuals from their median. The constant 0.6745 makes the estimate unbiased for the normal distribution. If there are p columns in X, the smallest p absolute deviations are excluded when computing the median.
You can write your own weight function. The function must take a vector of scaled residuals as input and produce a vector of weights as output. In this case, wfun is specified using a function handle @ (as in @myfun), and the input tune is required.
b = robustfit(X,y,wfun,tune,const) controls whether or not the model will include a constant term. const is 'on' to include the constant term (the default), or 'off' to omit it. When const is 'on', robustfit adds a first column of 1s to X and b becomes a (p + 1)-by-1 vector . When const is 'off', robustfit does not alter X, then b is a p-by-1 vector.
ols_s — Sigma estimate (RMSE) from ordinary least squares
robust_s — Robust estimate of sigma
mad_s — Estimate of sigma computed using the median absolute deviation of the residuals from their median; used for scaling residuals during iterative fitting
s — Final estimate of sigma, the larger of robust_s and a weighted average of ols_s and robust_s
resid — Residual
rstud — Studentized residual (see regress for more information)
se — Standard error of coefficient estimates
covb — Estimated covariance matrix for coefficient estimates
coeffcorr — Estimated correlation of coefficient estimates
t — Ratio of b to se
p — p-values for t
w — Vector of weights for robust fit
R — R factor in QR decomposition of X
dfe — Degrees of freedom for error
h — Vector of leverage values for least-squares fit
The robustfit function estimates the variance-covariance matrix of the coefficient estimates using inv(X'*X)*stats.s^2. Standard errors and correlations are derived from this estimate.
Generate data with the trend y = 10 - 2* x , then change one value to simulate an outlier.
x = (1:10)'; rng default; % For reproducibility y = 10 - 2*x + randn(10,1); y(10) = 0;
Fit a straight line using ordinary least squares regression.
bls = regress(y,[ones(10,1) x])
bls = 7.8518 -1.3644
Now use robust regression to estimate a straight-line fit.
brob = robustfit(x,y)
brob = 8.4504 -1.5278
Create scatter plot of the data together with the fits.
scatter(x,y,'filled'); grid on; hold on plot(x,bls(1)+bls(2)*x,'r','LineWidth',2); plot(x,brob(1)+brob(2)*x,'g','LineWidth',2) legend('Data','Ordinary Least Squares','Robust Regression')
The robust fit is less influenced by the outlier than the least-squares fit.
 DuMouchel, W. H., and F. L. O'Brien. "Integrating a Robust Option into a Multiple Regression Computing Environment." Computer Science and Statistics: Proceedings of the 21st Symposium on the Interface. Alexandria, VA: American Statistical Association, 1989.
 Holland, P. W., and R. E. Welsch. "Robust Regression Using Iteratively Reweighted Least-Squares." Communications in Statistics: Theory and Methods, A6, 1977, pp. 813–827.
 Huber, P. J. Robust Statistics. Hoboken, NJ: John Wiley & Sons, Inc., 1981.
 Street, J. O., R. J. Carroll, and D. Ruppert. "A Note on Computing Robust Regression Estimates via Iteratively Reweighted Least Squares." The American Statistician. Vol. 42, 1988, pp. 152–154.