regress - Multiple linear regression

Syntax

b = regress(y,X)
[b,bint] = regress(y,X)
[b,bint,r] = regress(y,X)
[b,bint,r,rint] = regress(y,X)
[b,bint,r,rint,stats] = regress(y,X)
[...] = regress(y,X,alpha)

Description

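b = regress(y,X) returns a p-by-1 vector b of coefficient estimates for a multiple linear regression of the responses in y on the predictors in X. X is an n-by-p matrix of p predictors at each of n observations. y is an n-by-1 vector of observed responses. regress does not add a constant term automatically: X should include a column of ones so that the model contains a constant term.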
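
As a minimal sketch, assuming hypothetical synthetic data and the Statistics Toolbox (which provides regress), the following checks that for a full-rank X without missing values, b matches the ordinary least-squares solution from the backslash operator:

```matlab
% Hypothetical synthetic data: 20 observations of one predictor.
x = (1:20)';
y = 2 + 3*x + sin(x);      % deterministic "noise" keeps the example reproducible
X = [ones(size(x)) x];     % column of ones supplies the constant term

b = regress(y,X);          % 2-by-1 vector: [intercept; slope]

% For a full-rank X with no NaNs, regress gives the least-squares solution,
% so it should agree with the backslash operator to rounding error.
bls = X\y;
maxDiff = max(abs(b - bls));
```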

regress treats NaNs in X or y as missing values and ignores them: any observation (row) with a NaN in y or in X is left out of the fit.
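
The following sketch, using made-up data with one NaN in y and one in X, compares regress with a manual fit that first drops every row containing a NaN. The variable names ok and bManual are illustrative only:

```matlab
x = (1:10)';
y = 1 + 0.5*x + cos(x);
y(3) = NaN;                  % hypothetical missing response
x(7) = NaN;                  % hypothetical missing predictor value
X = [ones(size(x)) x];

b = regress(y,X);            % rows 3 and 7 are excluded from the fit

% Manual equivalent: keep only rows with no NaN in y or X, then solve.
ok = ~any(isnan([y X]), 2);
bManual = X(ok,:) \ y(ok);
```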

If the columns of X are linearly dependent, regress obtains a basic solution by setting as many elements of b to zero as possible.
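
A small illustration, assuming a hypothetical design matrix whose third column is exactly twice its second: the coefficient vector should contain at least one exact zero, while the fitted values agree with a fit that simply omits the redundant column. Which coefficient is set to zero depends on regress's internal column ordering, so the sketch does not assume a particular one:

```matlab
x = (1:8)';
X = [ones(size(x)) x 2*x];   % third column is exactly twice the second
y = 4 - x + sin(x);

b = regress(y,X);            % may warn that X is rank deficient

% A basic solution: at least one coefficient is exactly zero, yet the fitted
% values match a fit that drops the redundant column.
numZero = nnz(b == 0);
yhatBasic = X*b;
yhatReduced = X(:,1:2) * (X(:,1:2) \ y);
```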

[b,bint] = regress(y,X) returns a p-by-2 matrix bint of 95% confidence intervals for the coefficient estimates. The first column of bint contains lower confidence bounds for each of the p coefficient estimates; the second column contains upper confidence bounds.

If the columns of X are linearly dependent, regress returns zeros in elements of bint corresponding to the zero elements of b.
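
For a full-rank X, the intervals in bint should match the classical t-based formula: each estimate plus or minus t(1-alpha/2, n-p) times its standard error. The sketch below recomputes them by hand with hypothetical data; tinv comes from the Statistics Toolbox, and the explicit inverse is used only for readability:

```matlab
x = (1:15)';
y = 3 + 0.8*x + sin(2*x);
X = [ones(size(x)) x];
alpha = 0.05;

[b,bint] = regress(y,X,alpha);

% Classical t-based intervals for a full-rank X.
[n,p] = size(X);
r = y - X*b;                         % residuals
s2 = (r'*r)/(n-p);                   % estimate of the error variance
se = sqrt(s2 * diag(inv(X'*X)));     % standard errors of the coefficients
tcrit = tinv(1 - alpha/2, n-p);      % two-sided t critical value
bintManual = [b - tcrit*se, b + tcrit*se];
```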

[b,bint,r] = regress(y,X) returns an n-by-1 vector r of residuals.

[b,bint,r,rint] = regress(y,X) returns an n-by-2 matrix rint of intervals that can be used to diagnose outliers. If the interval rint(i,:) for observation i does not contain zero, the residual for that observation is larger than would be expected at the 95% confidence level, suggesting an outlier.

In a linear model, observed values of y are random variables, and so are their residuals. Assuming normally distributed errors, the residuals have normal distributions with zero mean but with different variances at different values of the predictors. To put residuals on a comparable scale, they are "Studentized": each is divided by an estimate of its standard deviation that is independent of its own value. Studentized residuals have t distributions with known degrees of freedom. The intervals returned in rint are the 95% confidence intervals of these t distributions, rescaled by each residual's estimated standard deviation and centered at the residuals.
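
The computation described above can be sketched by hand. This reconstruction, using hypothetical data with an outlier injected at observation 6, divides each residual by a leave-one-out estimate of its standard deviation (one way to obtain the independence described above) and uses a t critical value with n-p-1 degrees of freedom. It is an illustration consistent with that description, not regress's internal code:

```matlab
x = (1:12)';
y = 5 - 0.4*x + cos(3*x);
y(6) = y(6) + 4;                 % hypothetical outlier added for illustration
X = [ones(size(x)) x];
alpha = 0.05;

[~,~,r,rint] = regress(y,X,alpha);

[n,p] = size(X);
nu = n - p;                      % residual degrees of freedom
h = diag(X * ((X'*X) \ X'));     % leverages: diagonal of the hat matrix
s2 = (r'*r)/nu;                  % usual error-variance estimate
% Leave-one-out variance: excludes observation i's own residual.
s2i = max(0, (nu*s2 - r.^2 ./ (1-h)) / (nu-1));
ser = sqrt((1-h) .* s2i);        % estimated standard deviation of each residual
tcrit = tinv(1 - alpha/2, nu-1); % one fewer degree of freedom than nu
rintManual = [r - tcrit*ser, r + tcrit*ser];

% Observations whose interval excludes zero are flagged as possible outliers.
outliers = find(rint(:,1) > 0 | rint(:,2) < 0);
```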

[b,bint,r,rint,stats] = regress(y,X) returns a 1-by-4 vector stats that contains, in order, the R² statistic, the F statistic and its p-value, and an estimate of the error variance.
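
A sketch of how the four entries of stats can be reproduced, assuming X includes a column of ones (without a constant term, R² and the F statistic are not well defined). The data are hypothetical, and fcdf comes from the Statistics Toolbox:

```matlab
x = (1:20)';
y = 1 + 2*x + 3*sin(x);
X = [ones(size(x)) x];           % constant term included

[b,~,~,~,stats] = regress(y,X);

[n,p] = size(X);
yhat = X*b;
SSE = sum((y - yhat).^2);        % residual sum of squares
SSR = sum((yhat - mean(y)).^2);  % regression sum of squares
SST = sum((y - mean(y)).^2);     % total sum of squares about the mean
R2 = SSR/SST;                    % equals 1 - SSE/SST when X has a constant term
F = (SSR/(p-1)) / (SSE/(n-p));   % overall F statistic with (p-1, n-p) dof
pval = 1 - fcdf(F, p-1, n-p);    % upper-tail p-value
s2 = SSE/(n-p);                  % error variance estimate
```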

[...] = regress(y,X,alpha) uses a 100*(1-alpha)% confidence level to compute bint and rint.
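
For example, with hypothetical data, lowering alpha from the default 0.05 to 0.01 gives 99% intervals, which should be wider than the 95% ones:

```matlab
x = (1:15)';
y = 3 + 0.8*x + sin(2*x);
X = [ones(size(x)) x];

[~,bint95] = regress(y,X);        % default alpha = 0.05 -> 95% intervals
[~,bint99] = regress(y,X,0.01);   % alpha = 0.01 -> 99% intervals

width95 = bint95(:,2) - bint95(:,1);
width99 = bint99(:,2) - bint99(:,1);   % higher confidence gives wider intervals
```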

Example

Load data on cars; identify weight and horsepower as predictors, mileage as the response:

load carsmall
x1 = Weight;
x2 = Horsepower; % Contains NaN data
y = MPG;

Compute regression coefficients for a linear model with an interaction term:

X = [ones(size(x1)) x1 x2 x1.*x2];
b = regress(y,X) % Removes NaN data
b =
  60.7104
  -0.0102
  -0.1882
   0.0000
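
Under the default format short display, b(4) prints as 0.0000, but the interaction coefficient is small rather than exactly zero. One way to see its magnitude (a sketch that repeats the fit above and temporarily switches the display format):

```matlab
load carsmall
X = [ones(size(Weight)) Weight Horsepower Weight.*Horsepower];
b = regress(MPG,X);          % NaN rows are removed, as above

format short g               % show small values instead of 0.0000
b
format short                 % restore the default display format
```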

Plot the data and the model:

scatter3(x1,x2,y,'filled')
hold on
x1fit = min(x1):100:max(x1);
x2fit = min(x2):10:max(x2);
[X1FIT,X2FIT] = meshgrid(x1fit,x2fit);
YFIT = b(1) + b(2)*X1FIT + b(3)*X2FIT + b(4)*X1FIT.*X2FIT;
mesh(X1FIT,X2FIT,YFIT)
xlabel('Weight')
ylabel('Horsepower')
zlabel('MPG')
view(50,10)
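
Once b is available, the same expression used for YFIT predicts MPG at any other weight and horsepower. A sketch for a single hypothetical car (the values 3000 and 130 are arbitrary illustrations, not data from carsmall):

```matlab
load carsmall
X = [ones(size(Weight)) Weight Horsepower Weight.*Horsepower];
b = regress(MPG,X);

% Hypothetical new car, values chosen for illustration only.
w = 3000;                    % weight
hp = 130;                    % horsepower
mpgHat = [1 w hp w*hp]*b;    % same column order as X
```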

Reference

[1] Chatterjee, S., and A. S. Hadi. "Influential Observations, High Leverage Points, and Outliers in Linear Regression." Statistical Science, 1986, pp. 379-416.

See Also

regstats, mvregress, robustfit, stepwisefit, rcoplot

  


© 1984-2008 The MathWorks, Inc.