Multiple linear regression


b = regress(y,X)
[b,bint] = regress(y,X)
[b,bint,r] = regress(y,X)
[b,bint,r,rint] = regress(y,X)
[b,bint,r,rint,stats] = regress(y,X)
[...] = regress(y,X,alpha)


b = regress(y,X) returns a p-by-1 vector b of coefficient estimates for a multilinear regression of the responses in y on the predictors in X. X is an n-by-p matrix of p predictors at each of n observations. y is an n-by-1 vector of observed responses.
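As a minimal sketch of this call, the following fits a straight line to a small made-up data set. The data values and the variable name bCheck are illustrative assumptions, not part of the sample data used elsewhere on this page. The column of ones gives the model a constant term.

```matlab
% Hypothetical data: five observations of a single predictor.
x = (1:5)';
y = [5.1; 6.9; 9.2; 10.8; 13.0];

% Design matrix, n-by-p with n = 5 and p = 2 (constant term and slope).
X = [ones(size(x)) x];

% b is p-by-1: b(1) estimates the intercept, b(2) the slope.
b = regress(y,X);

% For a full-rank X, the backslash operator gives the same
% least-squares estimates and serves as a cross-check.
bCheck = X\y;
```

For a full-rank design matrix, b and bCheck should agree up to rounding error.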

regress treats NaNs in X or y as missing values and omits the corresponding observations from the fit.
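The following sketch illustrates the missing-value behavior on made-up data (the values and the names keep, bAll, and bKeep are assumptions for illustration). It compares a fit on data containing a NaN with a fit on the same data after the affected row has been removed by hand.

```matlab
% Hypothetical data with one missing response value.
x = (1:6)';
y = [2.1; 3.9; NaN; 8.2; 9.8; 12.1];
X = [ones(size(x)) x];

% Fit on the full data; the row containing NaN is expected to be ignored.
bAll = regress(y,X);

% Fit after removing every row with a NaN in y or X by hand.
keep = ~any(isnan([y X]),2);
bKeep = regress(y(keep),X(keep,:));
```

The two coefficient vectors should match, which shows that the NaN row does not contribute to the estimates.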

If the columns of X are linearly dependent, regress obtains a basic solution by setting the maximum number of elements of b to zero.
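A small sketch of the linearly dependent case, using made-up data in which the third column of X is exactly twice the second. The data and the name numZero are illustrative assumptions. The call is expected to issue a rank-deficiency warning, and which coefficient is set to zero is not assumed here.

```matlab
% Hypothetical data: five observations of one predictor.
x = (1:5)';
y = [5.1; 6.9; 9.2; 10.8; 13.0];

% The third column duplicates the second up to a factor of 2,
% so X has three columns but only rank 2.
X = [ones(size(x)) x 2*x];

% Fit the model; at least one element of b should be exactly zero.
b = regress(y,X);

% Count the coefficients that were set to zero.
numZero = nnz(b == 0);
```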

[b,bint] = regress(y,X) returns a p-by-2 matrix bint of 95% confidence intervals for the coefficient estimates. The first column of bint contains lower confidence bounds for each of the p coefficient estimates; the second column contains upper confidence bounds.

If the columns of X are linearly dependent, regress returns zeros in elements of bint corresponding to the zero elements of b.
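The layout of bint can be sketched as follows on made-up, full-rank data (the values and the names lower, upper, and inside are assumptions for illustration). Each row of bint brackets the corresponding element of b.

```matlab
% Hypothetical data: eight noisy observations of a linear trend.
x = (1:8)';
y = [3.2; 4.8; 7.1; 9.3; 10.7; 13.2; 14.9; 17.1];
X = [ones(size(x)) x];

% bint is p-by-2: one row per coefficient, default 95% confidence level.
[b,bint] = regress(y,X);

lower = bint(:,1);   % Lower confidence bounds
upper = bint(:,2);   % Upper confidence bounds

% Each estimate lies between its own bounds.
inside = all(lower <= b & b <= upper);
```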

[b,bint,r] = regress(y,X) returns an n-by-1 vector r of residuals.

[b,bint,r,rint] = regress(y,X) returns an n-by-2 matrix rint of intervals that can be used to diagnose outliers. If the interval rint(i,:) for observation i does not contain zero, the corresponding residual is larger than expected in 95% of new observations, suggesting an outlier.

In a linear model, observed values of y are random variables, and so are their residuals. Residuals have normal distributions with zero mean but with different variances at different values of the predictors. To put residuals on a comparable scale, they are "Studentized," that is, they are divided by an estimate of their standard deviation that is independent of their value. Studentized residuals have t distributions with known degrees of freedom. The intervals returned in rint are shifts of the 95% confidence intervals of these t distributions, centered at the residuals.
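The outlier check described above can be sketched as follows. The data are made up, with one response deliberately shifted far from the linear trend, and the name outlierIdx is an assumption for illustration. An observation is flagged when its interval in rint lies entirely above or entirely below zero.

```matlab
% Hypothetical data: a linear trend with small deviations.
x = (1:10)';
noise = [0.1; -0.1; 0.05; -0.05; 0.1; 0; -0.1; 0.05; -0.05; 0.1];
y = 1 + 2*x + noise;

% Shift observation 6 far from the trend to create an obvious outlier.
y(6) = y(6) + 10;

X = [ones(size(x)) x];
[b,bint,r,rint] = regress(y,X);

% An interval excludes zero if its lower bound is positive
% or its upper bound is negative.
outlierIdx = find(rint(:,1) > 0 | rint(:,2) < 0);
```

With these data, the shifted observation is expected to appear in outlierIdx.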

[b,bint,r,rint,stats] = regress(y,X) returns a 1-by-4 vector stats that contains, in order, the R² statistic, the F statistic and its p value, and an estimate of the error variance.

    Note:   When computing statistics, X should include a column of 1s so that the model contains a constant term. The F statistic and its p value are computed under this assumption, and they are not correct for models without a constant.

    The F statistic is the test statistic of the F-test on the regression model, for a significant linear regression relationship between the response variable and the predictor variables.

    The R² statistic can be negative for models without a constant, indicating that the model is not appropriate for the data.
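A minimal sketch of unpacking stats for a model that includes a constant term, using made-up data. The variable names and the two cross-check formulas are assumptions for illustration: they are the usual textbook definitions of the coefficient of determination and of the residual variance with n - p degrees of freedom.

```matlab
% Hypothetical data: eight noisy observations of a linear trend.
x = (1:8)';
y = [3.2; 4.8; 7.1; 9.3; 10.7; 13.2; 14.9; 17.1];
X = [ones(size(x)) x];   % Column of ones supplies the constant term

% Skip bint and rint with ~; keep the residuals and the statistics.
[b,~,r,~,stats] = regress(y,X);

R2   = stats(1);   % Coefficient of determination
F    = stats(2);   % F statistic
pval = stats(3);   % p value of the F statistic
s2   = stats(4);   % Estimate of the error variance

% Cross-checks from the residuals (textbook formulas).
R2check = 1 - sum(r.^2)/sum((y - mean(y)).^2);
s2check = sum(r.^2)/(numel(y) - size(X,2));
```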

[...] = regress(y,X,alpha) uses a 100*(1-alpha)% confidence level to compute bint and rint.
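The effect of alpha can be sketched by comparing the default 95% intervals with 99% intervals (alpha = 0.01) on made-up data. The values and the names bint95, bint99, width95, and width99 are assumptions for illustration.

```matlab
% Hypothetical data: eight noisy observations of a linear trend.
x = (1:8)';
y = [3.2; 4.8; 7.1; 9.3; 10.7; 13.2; 14.9; 17.1];
X = [ones(size(x)) x];

% Default confidence level (95%).
[~,bint95] = regress(y,X);

% alpha = 0.01 requests 100*(1-0.01)% = 99% confidence intervals.
[~,bint99] = regress(y,X,0.01);

% Interval widths: upper bound minus lower bound for each coefficient.
width95 = diff(bint95,1,2);
width99 = diff(bint99,1,2);
```

A higher confidence level yields wider intervals, so each element of width99 should exceed the matching element of width95.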


This example shows how to estimate the coefficients of a multiple linear regression.

Load the sample data. Identify weight and horsepower as predictors, and mileage as the response.

load carsmall
x1 = Weight;
x2 = Horsepower;    % Contains NaN data
y = MPG;

Compute the regression coefficients for a linear model with an interaction term.

X = [ones(size(x1)) x1 x2 x1.*x2];
b = regress(y,X)    % Removes NaN data

Plot the data and the model.

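scatter3(x1,x2,y,'filled')    % Observed data
hold on
x1fit = min(x1):100:max(x1);
x2fit = min(x2):10:max(x2);
[X1FIT,X2FIT] = meshgrid(x1fit,x2fit);
YFIT = b(1) + b(2)*X1FIT + b(3)*X2FIT + b(4)*X1FIT.*X2FIT;
mesh(X1FIT,X2FIT,YFIT)        % Fitted model surface
xlabel('Weight'), ylabel('Horsepower'), zlabel('MPG')
hold off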

Introduced before R2006a
