Multiple linear regression

`b = regress(y,X)`

`[b,bint] = regress(y,X)`

`[b,bint,r] = regress(y,X)`

`[b,bint,r,rint] = regress(y,X)`

`[b,bint,r,rint,stats] = regress(y,X)`

`[...] = regress(y,X,alpha)`

`b = regress(y,X)` returns a *p*-by-1 vector `b` of coefficient estimates for a multilinear regression of the responses in `y` on the predictors in `X`. `X` is an *n*-by-*p* matrix of *p* predictors at each of *n* observations. `y` is an *n*-by-1 vector of observed responses.
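As a minimal sketch of this basic call, the following fits a small noise-free data set. The data values and the variable names `x1` and `x2` are made up for illustration; the column of ones is included so that the model has a constant term.

```matlab
% Hypothetical noise-free data generated from y = 2 + 3*x1 - x2
x1 = (1:6)';
x2 = [2; 1; 4; 3; 6; 5];

% n = 6 observations, p = 3 predictor columns (constant, x1, x2)
X = [ones(6,1) x1 x2];
y = 2 + 3*x1 - x2;

% b is a p-by-1 vector of coefficient estimates
b = regress(y, X);
disp(b')   % should be close to [2 3 -1], up to rounding error
```

Because the responses here lie exactly on the plane, the estimates recover the generating coefficients; with real data they would only approximate them.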

`regress` treats `NaN`s in `X` or `y` as missing values, and ignores them.
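The sketch below illustrates the missing-value behavior under one assumption: that "ignores them" means every observation (row) containing a `NaN` in either `X` or `y` is left out of the fit. The data are hypothetical.

```matlab
% Hypothetical data: a slightly perturbed straight line
x = (1:8)';
y = 1 + 2*x + 0.1*sin(3*x);
X = [ones(8,1) x];

% Introduce missing values in both the response and a predictor
y(3)   = NaN;
X(6,2) = NaN;

% Fit with the NaNs left in place
bWithNaN = regress(y, X);

% Fit after removing the affected rows by hand (assumed equivalent)
keep   = ~(isnan(y) | any(isnan(X), 2));
bClean = regress(y(keep), X(keep,:));

% Largest difference between the two sets of estimates
disp(max(abs(bWithNaN - bClean)))
```

If the assumption holds, the two coefficient vectors agree to within rounding error.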

If the columns of `X` are linearly dependent, `regress` obtains a basic solution by setting the maximum number of elements of `b` to zero.
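A small sketch of the linearly dependent case, using hypothetical data in which the third column of `X` is exactly twice the second. Which element of `b` is set to zero is not specified here, so the code only counts the zeros and checks the fit. A rank-deficiency warning is likely.

```matlab
% Hypothetical design matrix with a redundant column (col 3 = 2 * col 2)
x = (1:6)';
X = [ones(6,1) x 2*x];
y = 1 + 4*x;

% regress returns a basic solution for the rank-deficient problem
b = regress(y, X);

% Count coefficients that were set exactly to zero
numZero = sum(b == 0);

% The basic solution should still reproduce the noise-free responses
fitErr = norm(X*b - y);

fprintf('zero coefficients: %d, fit error: %g\n', numZero, fitErr);
```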

`[b,bint] = regress(y,X)` returns a *p*-by-2 matrix `bint` of 95% confidence intervals for the coefficient estimates. The first column of `bint` contains lower confidence bounds for each of the *p* coefficient estimates; the second column contains upper confidence bounds.
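The following sketch shows how the two columns of `bint` might be read. The data are hypothetical, and a sine term stands in for random noise so that the example is reproducible.

```matlab
% Hypothetical data: a line plus a deterministic stand-in for noise
x = (1:10)';
y = 3 + 1.5*x + 0.5*sin(7*x);
X = [ones(10,1) x];

[b, bint] = regress(y, X);

lowerBound = bint(:,1);   % lower 95% confidence bounds, one per coefficient
upperBound = bint(:,2);   % upper 95% confidence bounds, one per coefficient

% Each estimate lies between its own bounds
inside = (lowerBound < b) & (b < upperBound);

disp([lowerBound b upperBound])
```

An interval that excludes zero suggests that the corresponding coefficient differs from zero at the 5% significance level.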

If the columns of `X` are linearly dependent, `regress` returns zeros in elements of `bint` corresponding to the zero elements of `b`.

`[b,bint,r] = regress(y,X)` returns an *n*-by-1 vector `r` of residuals.

`[b,bint,r,rint] = regress(y,X)` returns an *n*-by-2 matrix `rint` of intervals that can be used to diagnose outliers. If the interval `rint(i,:)` for observation `i` does not contain zero, the corresponding residual is larger than expected in 95% of new observations, suggesting an outlier.
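The outlier check described above can be sketched as follows. The data are hypothetical, with one deliberately corrupted response at observation 8; the variable name `outliers` is illustrative.

```matlab
% Hypothetical data: a line with small deterministic perturbations
x = (1:15)';
y = 2 + 0.5*x + 0.2*sin(5*x);
y(8) = y(8) + 10;              % inject one gross outlier
X = [ones(15,1) x];

[b, bint, r, rint] = regress(y, X);

% An interval excludes zero if it lies entirely above or entirely below it
outliers = find(rint(:,1) > 0 | rint(:,2) < 0);

disp(outliers')
```

Flagged observations are candidates for closer inspection, not automatic removal. The related function `rcoplot` plots the residuals and their intervals for the same purpose.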

In a linear model, observed values of `y` are random variables, and so are their residuals. Residuals have normal distributions with zero mean but with different variances at different values of the predictors. To put residuals on a comparable scale, they are “Studentized,” that is, they are divided by an estimate of their standard deviation that is independent of their value. Studentized residuals have *t* distributions with known degrees of freedom. The intervals returned in `rint` are shifts of the 95% confidence intervals of these *t* distributions, centered at the residuals.

`[b,bint,r,rint,stats] = regress(y,X)` returns a 1-by-4 vector `stats` that contains, in order, the *R*^{2} statistic, the *F* statistic and its *p* value, and an estimate of the error variance.
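As an illustration of the layout of `stats`, the sketch below unpacks the four elements and cross-checks two of them. The cross-checks assume the usual textbook definitions (*R*^{2} as one minus the ratio of residual to total sum of squares, and the error variance as the residual sum of squares divided by *n* − *p*); these formulas are assumptions, not taken from this page. The data are hypothetical.

```matlab
% Hypothetical data with a constant term in the model
x = (1:12)';
y = 4 - 0.8*x + 0.3*cos(4*x);
X = [ones(12,1) x];

[b, ~, r, ~, stats] = regress(y, X);

R2   = stats(1);   % R^2 statistic
F    = stats(2);   % F statistic
pval = stats(3);   % p value of the F statistic
s2   = stats(4);   % estimate of the error variance

% Cross-checks under the assumed textbook definitions
[n, p]  = size(X);
R2check = 1 - sum(r.^2) / sum((y - mean(y)).^2);
s2check = sum(r.^2) / (n - p);

fprintf('R2 = %.4f (check %.4f), s2 = %.4f (check %.4f)\n', ...
        R2, R2check, s2, s2check);
```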

When computing statistics, `X` should include a column of 1s so that the model contains a constant term. The *F* statistic and its *p* value are computed under this assumption, and they are not correct for models without a constant.

The *F* statistic is the test statistic of
the F-test on the regression model, for a significant linear regression
relationship between the response variable and the predictor variables.

The *R*^{2} statistic
can be negative for models without a constant, indicating that the
model is not appropriate for the data.
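To illustrate the point about models without a constant, the sketch below fits a line through the origin to hypothetical data that clearly do not pass through the origin. `regress` may issue a warning here because `X` has no column of ones.

```matlab
% Hypothetical data: responses decrease with x and are far from the origin
x = (1:5)';
y = [10; 9; 8; 7; 6];

% No column of ones, so the model has no constant term
[bNoConst, ~, ~, ~, statsNoConst] = regress(y, x);
R2NoConst = statsNoConst(1);

% Same data with a constant term, for comparison
[bConst, ~, ~, ~, statsConst] = regress(y, [ones(5,1) x]);
R2Const = statsConst(1);

fprintf('R2 without constant: %g, with constant: %g\n', R2NoConst, R2Const);
```

The negative value signals that the through-origin model fits worse than simply predicting the mean of `y`.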

`[...] = regress(y,X,alpha)` uses a `100*(1-alpha)`% confidence level to compute `bint` and `rint`.
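A brief sketch of the effect of `alpha`, using hypothetical data: a smaller `alpha` gives a higher confidence level and therefore wider intervals.

```matlab
% Hypothetical data: a line plus a deterministic stand-in for noise
x = (1:10)';
y = 3 + 1.5*x + 0.5*sin(7*x);
X = [ones(10,1) x];

[~, bint95] = regress(y, X);         % default: 95% confidence intervals
[~, bint99] = regress(y, X, 0.01);   % 100*(1-0.01) = 99% confidence intervals

% Interval widths for each coefficient
width95 = bint95(:,2) - bint95(:,1);
width99 = bint99(:,2) - bint99(:,1);

disp([width95 width99])
```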


`LinearModel` | `fitlm` | `mvregress` | `rcoplot` | `stepwiselm`
