Products & Services Industries Academia Support User Community Company

Learn more about Statistics Toolbox   

regress - Multiple linear regression

Syntax

b = regress(y,X)
[b,bint] = regress(y,X)
[b,bint,r] = regress(y,X)
[b,bint,r,rint] = regress(y,X)
[b,bint,r,rint,stats] = regress(y,X)
[...] = regress(y,X,alpha)

Description

b = regress(y,X) returns a p-by-1 vector b of coefficient estimates for a multilinear regression of the responses in y on the predictors in X. X is an n-by-p matrix of p predictors at each of n observations. y is an n-by-1 vector of observed responses.

regress treats NaNs in X or y as missing values, and ignores them.

If the columns of X are linearly dependent, regress obtains a basic solution by setting the maximum number of elements of b to zero.

[b,bint] = regress(y,X) returns a p-by-2 matrix bint of 95% confidence intervals for the coefficient estimates. The first column of bint contains lower confidence bounds for each of the p coefficient estimates; the second column contains upper confidence bounds.

If the columns of X are linearly dependent, regress returns zeros in elements of bint corresponding to the zero elements of b.

[b,bint,r] = regress(y,X) returns an n-by-1 vector r of residuals.

[b,bint,r,rint] = regress(y,X) returns an n-by-2 matrix rint of intervals that can be used to diagnose outliers. If the interval rint(i,:) for observation i does not contain zero, the corresponding residual is larger than expected in 95% of new observations, suggesting an outlier.

In a linear model, observed values of y are random variables, and so are their residuals. Residuals have normal distributions with zero mean but with different variances at different values of the predictors. To put residuals on a comparable scale, they are "Studentized," that is, they are divided by an estimate of their standard deviation that is independent of their value. Studentized residuals have t distributions with known degrees of freedom. The intervals returned in rint are shifts of the 95% confidence intervals of these t distributions, centered at the residuals.

[b,bint,r,rint,stats] = regress(y,X) returns a 1-by-4 vector stats that contains, in order, the R2 statistic, the F statistic and its p-value, and an estimate of the error variance.

[...] = regress(y,X,alpha) uses a 100*(1-alpha)% confidence level to compute bint and rint.

Examples

Load data on cars; identify weight and horsepower as predictors, mileage as the response:

load carsmall
x1 = Weight;
x2 = Horsepower; % Contains NaN data
y = MPG;

Compute regression coefficients for a linear model with an interaction term:

X = [ones(size(x1)) x1 x2 x1.*x2];
b = regress(y,X) % Removes NaN data
b =
  60.7104
  -0.0102
  -0.1882
   0.0000

Plot the data and the model:

scatter3(x1,x2,y,'filled')
hold on
x1fit = min(x1):100:max(x1);
x2fit = min(x2):10:max(x2);
[X1FIT,X2FIT] = meshgrid(x1fit,x2fit);
YFIT = b(1) + b(2)*X1FIT + b(3)*X2FIT + b(4)*X1FIT.*X2FIT;
mesh(X1FIT,X2FIT,YFIT)
xlabel('Weight')
ylabel('Horsepower')
zlabel('MPG')
view(50,10)

References

[1] Chatterjee, S., and A. S. Hadi. "Influential Observations, High Leverage Points, and Outliers in Linear Regression." Statistical Science. Vol. 1, 1986, pp. 379–416.

See Also

regstats, mvregress, robustfit, stepwisefit, rcoplot

  


Recommended Products

Includes the most popular MATLAB recorded presentations with Q&A sessions led by MATLAB experts.

 © 1984-2009- The MathWorks, Inc.    -   Site Help   -   Patents   -   Trademarks   -   Privacy Policy   -   Preventing Piracy   -   RSS