Robust regression
b = robustfit(X,y)
b = robustfit(X,y,wfun
,tune)
b = robustfit(X,y,wfun
,tune,const
)
[b,stats] = robustfit(...)
returns
a (p + 1)-by-1 vector b = robustfit(X,y)
b
of coefficient
estimates for a robust multilinear regression of the responses in y
on
the predictors in X
. X
is an n-by-p matrix
of p predictors at each of n observations. y
is
an n-by-1 vector of observed responses. By default,
the algorithm uses iteratively reweighted least squares with a bisquare
weighting function.
Note:
By default, |
robustfit
treats NaN
s
in X
or y
as missing values,
and removes them.
b = robustfit(X,y,
specifies
a weighting function wfun
,tune)wfun
. tune
is
a tuning constant that is divided into the residual vector before
computing weights.
The weighting function wfun
can be
any one of the following:
Weight Function | Equation | Default Tuning Constant |
---|---|---|
'andrews' | w = (abs(r)<pi) .* sin(r) ./ r | 1.339 |
'bisquare' (default) | w = (abs(r)<1) .* (1 - r.^2).^2 | 4.685 |
'cauchy' | w = 1 ./ (1 + r.^2) | 2.385 |
'fair' | w = 1 ./ (1 + abs(r)) | 1.400 |
'huber' | w = 1 ./ max(1, abs(r)) | 1.345 |
'logistic' | w = tanh(r) ./ r | 1.205 |
'ols' | Ordinary least squares (no weighting function) | None |
'talwar' | w = 1 * (abs(r)<1) | 2.795 |
'welsch' | w = exp(-(r.^2)) | 2.985 |
If tune
is unspecified, the default value
in the table is used. Default tuning constants give coefficient estimates
that are approximately 95% as statistically efficient as the ordinary
least-squares estimates, provided the response has a normal distribution
with no outliers. Decreasing the tuning constant increases the downweight
assigned to large residuals; increasing the tuning constant decreases
the downweight assigned to large residuals.
The value r
in the weight functions is
r = resid/(tune*s*sqrt(1-h))
where resid
is the vector of residuals from
the previous iteration, h
is the vector of leverage
values from a least-squares fit, and s
is an estimate
of the standard deviation of the error term given by
s = MAD/0.6745
Here MAD
is the median absolute deviation
of the residuals from their median. The constant 0.6745 makes the
estimate unbiased for the normal distribution. If there are p columns
in X
, the smallest p absolute
deviations are excluded when computing the median.
You can write your own weight function. The function must take
a vector of scaled residuals as input and produce a vector of weights
as output. In this case, wfun
is specified
using a function handle @
(as in @myfun
),
and the input tune
is required.
b = robustfit(X,y,
controls
whether or not the model will include a constant term. wfun
,tune,const
)const
is 'on'
to
include the constant term (the default), or 'off'
to
omit it. When const
is 'on'
, robustfit
adds
a first column of 1s to X
and b
becomes
a (p + 1)-by-1 vector . When const
is 'off'
, robustfit
does
not alter X
, then b
is a p-by-1
vector.
[b,stats] = robustfit(...)
returns
the structure stats
, whose fields contain diagnostic
statistics from the regression. The fields of stats
are:
ols_s
— Sigma estimate (RMSE)
from ordinary least squares
robust_s
— Robust estimate
of sigma
mad_s
— Estimate of sigma
computed using the median absolute deviation of the residuals from
their median; used for scaling residuals during iterative fitting
s
— Final estimate of sigma,
the larger of robust_s
and a weighted average of ols_s
and robust_s
resid
— Residual
rstud
— Studentized residual
(see regress
for more information)
se
— Standard error of coefficient
estimates
covb
— Estimated covariance
matrix for coefficient estimates
coeffcorr
— Estimated correlation
of coefficient estimates
t
— Ratio of b
to se
p
— p-values
for t
w
— Vector of weights for
robust fit
R
— R factor
in QR decomposition of X
dfe
— Degrees of freedom
for error
h
— Vector of leverage values
for least-squares fit
The robustfit
function estimates the variance-covariance
matrix of the coefficient estimates using inv(X'*X)*stats.s^2
.
Standard errors and correlations are derived from this estimate.
[1] DuMouchel, W. H., and F. L. O'Brien. "Integrating a Robust Option into a Multiple Regression Computing Environment." Computer Science and Statistics: Proceedings of the 21st Symposium on the Interface. Alexandria, VA: American Statistical Association, 1989.
[2] Holland, P. W., and R. E. Welsch. "Robust Regression Using Iteratively Reweighted Least-Squares." Communications in Statistics: Theory and Methods, A6, 1977, pp. 813–827.
[3] Huber, P. J. Robust Statistics. Hoboken, NJ: John Wiley & Sons, Inc., 1981.
[4] Street, J. O., R. J. Carroll, and D. Ruppert. "A Note on Computing Robust Regression Estimates via Iteratively Reweighted Least Squares." The American Statistician. Vol. 42, 1988, pp. 152–154.