Fit linear regression model using stepwise regression

`b = stepwisefit(X,y)` returns a vector `b` of coefficient estimates from stepwise
regression of the response vector `y` on the predictor variables in matrix `X`.
`stepwisefit` begins with an initial constant model and takes forward or backward
steps to add or remove variables, until a stopping criterion is satisfied.
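
A minimal sketch of the basic call, assuming the `hald` sample data set that ships with Statistics and Machine Learning Toolbox is available. The variable names `ingredients` and `heat` come from that data set, not from this page.

```matlab
% Assumption: the hald sample data set is on the path.
load hald                            % provides ingredients (predictors) and heat (response)

% Stepwise regression with all defaults: constant initial model,
% default entry and exit tolerances, per-step display on.
b = stepwisefit(ingredients, heat);

% b has one entry per column of the predictor matrix.
disp(b)
```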

`b = stepwisefit(X,y,Name,Value)` specifies additional options using one or more
name-value pair arguments. For example, you can specify a nonconstant initial model,
or a maximum number of steps that `stepwisefit` can take.
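
The following sketch illustrates the two options mentioned above, again assuming the `hald` sample data. The particular initial model and step limit are arbitrary choices made for illustration.

```matlab
% Assumption: the hald sample data set is on the path.
load hald

% Hypothetical starting point: predictors 2 and 3 are in the initial model.
inmodel0 = [false true true false];

b = stepwisefit(ingredients, heat, ...
    'InModel', inmodel0, ...         % nonconstant initial model
    'MaxIter', 10, ...               % take at most 10 steps
    'Display', 'off');               % suppress the per-step report
```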

`[b,se,pval,finalmodel,stats] = stepwisefit(___)` also returns a specification of the
variables in the final regression model `finalmodel`, and statistics `stats` about
the final model.
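
As a hedged illustration of how the extra outputs might be used, the sketch below assumes the `hald` sample data and relies on `finalmodel` being a logical vector with one element per predictor. The names `selected` and `yhat` are introduced only for this example.

```matlab
% Assumption: the hald sample data set is on the path.
load hald

[b, se, pval, finalmodel, stats] = stepwisefit(ingredients, heat, 'Display', 'off');

% Indices of the predictors that ended up in the final model.
selected = find(finalmodel);

% Coefficients, standard errors, and p-values of the selected predictors only.
disp([b(finalmodel), se(finalmodel), pval(finalmodel)])

% Fitted values from the final model; the intercept is stored in stats.
yhat = stats.intercept + ingredients(:, finalmodel) * b(finalmodel);

% Root mean squared error of the final model.
disp(stats.rmse)
```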

*Stepwise regression* is a method for adding terms to and
removing terms from a multilinear model based on their statistical significance. This
method begins with an initial model and then takes successive steps to modify the model
by adding or removing terms. At each step, the *p*-value of an
*F*-statistic is computed to test models with and without a
potential term. If a term is not currently in the model, the null hypothesis is that the
term would have a zero coefficient if added to the model. If there is sufficient
evidence to reject the null hypothesis, the term is added to the model. Conversely, if a
term is currently in the model, the null hypothesis is that the term has a zero
coefficient. If there is insufficient evidence to reject the null hypothesis, the term
is removed from the model. The method proceeds as follows:

1. Fit the initial model.

2. If any terms not in the model have *p*-values less than an entry tolerance, add the one with the smallest *p*-value and repeat this step. For example, assume the initial model is the default constant model and the entry tolerance is the default `0.05`. The algorithm first fits all models consisting of the constant plus another term and identifies the term that has the smallest *p*-value, for example term `4`. If the term `4` *p*-value is less than `0.05`, then term `4` is added to the model. Next, the algorithm performs a search among all models consisting of the constant, term `4`, and another term. If a term not in the model has a *p*-value less than `0.05`, the term with the smallest *p*-value is added to the model and the process is repeated. When no further terms exist that can be added to the model, the algorithm proceeds to step 3.

3. If any terms in the model have *p*-values greater than an exit tolerance, remove the one with the largest *p*-value and go to step 2; otherwise, end.
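
The three steps above can be sketched roughly as follows. This is a simplified illustration, not the `stepwisefit` source: the function name `stepwiseSketch` and its arguments are hypothetical, it uses `fitlm` to obtain each *p*-value (for a single added term, the coefficient *t*-test gives the same *p*-value as the *F*-test), and it does not handle missing data or a step limit.

```matlab
function inModel = stepwiseSketch(X, y, pEnter, pRemove)
% Hypothetical helper illustrating the add/remove loop; save as stepwiseSketch.m.
% inModel is a logical row vector marking the predictors in the final model.
assert(pRemove >= pEnter, 'Exit tolerance must not be below entry tolerance.');

inModel = false(1, size(X, 2));      % step 1: start from the constant model
done = false;
while ~done
    % Step 2: keep adding the most significant candidate term.
    added = true;
    while added
        added = false;
        candidates = find(~inModel);
        if isempty(candidates), break; end
        pvals = ones(size(candidates));
        for k = 1:numel(candidates)
            cols = [find(inModel), candidates(k)];
            mdl = fitlm(X(:, cols), y);              % intercept included by default
            pvals(k) = mdl.Coefficients.pValue(end); % p-value of the candidate term
        end
        [pMin, kMin] = min(pvals);
        if pMin < pEnter
            inModel(candidates(kMin)) = true;
            added = true;
        end
    end

    % Step 3: remove the least significant term, if any exceeds pRemove.
    done = true;
    cols = find(inModel);
    if ~isempty(cols)
        mdl = fitlm(X(:, cols), y);
        pvals = mdl.Coefficients.pValue(2:end);      % skip the intercept row
        [pMax, kMax] = max(pvals);
        if pMax > pRemove
            inModel(cols(kMax)) = false;
            done = false;                            % go back to step 2
        end
    end
end
end
```

With the `hald` sample data loaded, a call such as `stepwiseSketch(ingredients, heat, 0.05, 0.10)` would run the loop with the entry tolerance from the example above and an assumed exit tolerance of `0.10`.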

In each step of the algorithm, `stepwisefit` uses the method of least
squares to estimate the model coefficients. After adding a term to the model at an
earlier stage, the algorithm might subsequently drop that term if it is no longer
helpful in combination with other terms added later. The method terminates when no
single step improves the model. However, the final model is not guaranteed to be
optimal, that is, to have the best fit to the data among all candidate models. A
different initial model or a different sequence of steps might lead to a better fit.
In this sense, stepwise models are locally optimal, but are not necessarily globally
optimal.
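
One way to probe this dependence on the starting point is to run the procedure from two different initial models and compare the results. The sketch below assumes the `hald` sample data, and the second initial model is an arbitrary choice. Whether the two runs actually end at different models depends on the data.

```matlab
% Assumption: the hald sample data set is on the path.
load hald

% Run A: default constant initial model.
[~, ~, ~, modelA, statsA] = stepwisefit(ingredients, heat, 'Display', 'off');

% Run B: hypothetical initial model containing predictors 2 and 3.
[~, ~, ~, modelB, statsB] = stepwisefit(ingredients, heat, ...
    'InModel', [false true true false], 'Display', 'off');

% Compare which predictors were selected and how well each final model fits.
disp(modelA)
disp(modelB)
fprintf('RMSE (run A): %g\nRMSE (run B): %g\n', statsA.rmse, statsB.rmse);
```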

- You can create a model using `fitlm`, and then manually adjust the model using `step`, `addTerms`, and `removeTerms`.

- Use `stepwiselm` if you have data in a table, you have a mix of continuous and categorical predictors, or you want to specify model formulas that can potentially include higher-order and interaction terms.

- Use `stepwiseglm` to create stepwise generalized linear models (for example, if you have a binary response variable and want to fit a classification model).
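
A brief sketch of the first two alternatives, assuming the `hald` sample data. With matrix input, `fitlm` names the predictors `x1`, `x2`, and so on by default, and the example relies on that naming. The choice of `x4` and the step limit are arbitrary.

```matlab
% Assumption: the hald sample data set is on the path.
load hald

% Manual route: start from an intercept-only model and adjust it by hand.
mdl0 = fitlm(ingredients, heat, 'constant');
mdl1 = addTerms(mdl0, 'x4');                     % add the fourth predictor
mdl2 = removeTerms(mdl1, 'x4');                  % remove it again

% Let step choose up to 5 additions or removals, limited to linear terms.
mdl3 = step(mdl0, 'Upper', 'linear', 'NSteps', 5);

% Automated route with stepwiselm, also limited to linear terms.
mdl4 = stepwiselm(ingredients, heat, 'constant', ...
    'Upper', 'linear', 'Verbose', 0);
disp(mdl4.Formula)
```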

[1] Draper, Norman R., and Harry Smith. *Applied Regression Analysis*. Hoboken, NJ: Wiley-Interscience, 1998. pp. 307–312.

`addedvarplot` | `regress` | `stepwise` | `stepwiseglm` | `stepwiselm`