Statistics Toolbox™
b = stepwisefit(X,y)
[b,se,pval,inmodel,stats,nextstep,history] = stepwisefit(...)
[...] = stepwisefit(X,y,param1,val1,param2,val2,...)
b = stepwisefit(X,y) uses a stepwise method to perform a multilinear regression of the response values in the n-by-1 vector y on the p predictive terms in the n-by-p matrix X. Distinct predictive terms should appear in different columns of X. b is a p-by-1 vector of estimated coefficients for all of the terms in X. The value in b for a term not included in the final model is the coefficient estimate that would result from adding the term to the model.
Note: stepwisefit automatically includes a constant term in all models. Do not enter a column of 1s directly into X.
stepwisefit treats NaN values in either X or y as missing values, and ignores them.
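As a minimal sketch of the missing-value behavior, the following marks one predictor value as missing in a copy of the hald data used in the examples on this page, then inspects the wasnan field of the stats output. The names Xmissing and missingRows are illustrative, and the choice of row 3 is arbitrary.

```matlab
load hald                          % provides ingredients (13x4) and heat (13x1)
Xmissing = ingredients;            % illustrative copy, so the original data stay intact
Xmissing(3,2) = NaN;               % mark one predictor value as missing

[b,se,pval,inmodel,stats] = stepwisefit(Xmissing,heat,'display','off');

missingRows = find(stats.wasnan)   % rows of the data that contained NaN values
```

Comparing stats.dfe with the value from a fit to the complete data is one way to check how many observations took part in the regression.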
[b,se,pval,inmodel,stats,nextstep,history] = stepwisefit(...) returns the following additional information:
se — A vector of standard errors for b
pval — A vector of p-values for testing whether elements of b are 0
inmodel — A logical vector, with length equal to the number of columns in X, specifying which terms are in the final model
stats — A structure of additional statistics with the following fields. All statistics pertain to the final model except where noted.
source — The string 'stepwisefit'
dfe — Degrees of freedom for error
df0 — Degrees of freedom for the regression
SStotal — Total sum of squares of the response
SSresid — Sum of squares of the residuals
fstat — F-statistic for testing the final model vs. no model (mean only)
pval — p-value of the F-statistic
rmse — Root mean square error
xr — Residuals for predictors not in the final model, after removing the part of them explained by predictors in the model
yr — Residuals for the response using predictors in the final model
B — Coefficients for terms in final model, with values for a term not in the model set to the value that would be obtained by adding that term to the model
SE — Standard errors for coefficient estimates
TSTAT — t statistics for coefficient estimates
PVAL — p-values for coefficient estimates
intercept — Estimated intercept
wasnan — Indicates which rows in the data contained NaN values
nextstep — The recommended next step—either the index of the next term to move in or out of the model, or 0 if no further steps are recommended
history — A structure containing information on steps taken, with the following fields:
rmse — Root mean square errors for the model at each step
df0 — Degrees of freedom for the regression at each step
in — Logical array indicating which predictors are in the model at each step
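The outputs described above can be combined to compute fitted values. The sketch below assumes the hald data from the examples on this page and the default 'scale' setting of 'off'; the names selected, yhat, maxDiff, and stepsTaken are illustrative.

```matlab
load hald
[b,se,pval,inmodel,stats,nextstep,history] = ...
    stepwisefit(ingredients,heat,'display','off');

selected = find(inmodel)           % indices of the terms in the final model

% Fitted values use the intercept plus only the selected terms.
% Entries of b for terms outside the model are not part of the fit.
yhat = stats.intercept + ingredients(:,inmodel)*b(inmodel);

% The residuals computed here should agree with stats.yr up to rounding error.
maxDiff = max(abs((heat - yhat) - stats.yr))

stepsTaken = numel(history.rmse)   % one RMSE value is recorded per step
```

Indexing b with inmodel matters because b also holds coefficient estimates for terms that are not in the final model.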
[...] = stepwisefit(X,y,param1,val1,param2,val2,...) specifies one or more of the name/value pairs described in the following table.
| Parameter Name | Parameter Value |
|---|---|
| 'inmodel' | A logical vector specifying terms to include in the initial fit. The default is to specify no terms. |
| 'penter' | The maximum p-value for a term to be added. The default is 0.05. |
| 'premove' | The minimum p-value for a term to be removed. The default is the maximum of the value of 'penter' and 0.10. |
| 'display' | 'on' displays information about each step in the command window. This is the default. 'off' omits the display. |
| 'maxiter' | The maximum number of steps in the regression. The default is Inf. |
| 'keep' | A logical vector specifying terms to keep in their initial state. The default is to specify no terms. |
| 'scale' | 'on' centers and scales each column of X (computes z-scores) before fitting. 'off' does not scale the terms. This is the default. |
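As an illustration of combining several of these parameters, the following sketch starts with term 1 in the model and uses 'keep' to hold it there while the remaining terms are free to enter or leave. It uses the hald data from the examples on this page; the variable forced and the 'maxiter' value of 10 are illustrative choices, not recommendations.

```matlab
load hald
forced = [true false false false];   % illustrative choice: term 1 starts in the model

[b,se,pval,inmodel] = stepwisefit(ingredients,heat, ...
    'inmodel',forced, ...   % initial model contains term 1
    'keep',forced, ...      % term 1 stays in its initial state (in the model)
    'penter',0.05, ...
    'premove',0.10, ...
    'maxiter',10, ...       % illustrative cap on the number of steps
    'display','off');

inmodel                     % term 1 remains selected in the final model
```

Because 'keep' freezes a term in its initial state, passing the same logical vector to both 'inmodel' and 'keep' forces those terms in. Setting 'keep' alone, with the default empty initial model, would instead hold them out.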
Stepwise regression is a systematic method for adding and removing terms from a multilinear model based on their statistical significance in a regression. The method begins with an initial model and then compares the explanatory power of incrementally larger and smaller models. At each step, the p-value of an F-statistic is computed to test models with and without a potential term. If a term is not currently in the model, the null hypothesis is that the term would have a zero coefficient if added to the model. If there is sufficient evidence to reject the null hypothesis, the term is added to the model. Conversely, if a term is currently in the model, the null hypothesis is that the term has a zero coefficient. If there is insufficient evidence to reject the null hypothesis, the term is removed from the model. The method proceeds as follows:
1. Fit the initial model.
2. If any terms not in the model have p-values less than an entrance tolerance (that is, if it is unlikely that they would have zero coefficient if added to the model), add the one with the smallest p-value and repeat this step; otherwise, go to step 3.
3. If any terms in the model have p-values greater than an exit tolerance (that is, if it is unlikely that the hypothesis of a zero coefficient can be rejected), remove the one with the largest p-value and go to step 2; otherwise, end.
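The entrance test in step 2 can be sketched by hand for the simplest case, an initial model with no terms. The code below is an illustration built on regress, not the internal implementation of stepwisefit. It relies on the fact that, for a model with a constant and a single predictor, the overall F-test is the same as the test of that predictor's coefficient. It uses the hald data from the examples on this page; pEnter, best, and pmin are illustrative names, and 0.05 is the default entrance tolerance.

```matlab
load hald
n = size(ingredients,1);
p = size(ingredients,2);
pEnter = zeros(1,p);                   % p-value for adding each term on its own

for j = 1:p
    Xj = [ones(n,1) ingredients(:,j)]; % regress needs an explicit constant column
    [bj,bint,r,rint,s] = regress(heat,Xj);
    pEnter(j) = s(3);                  % p-value of the F-statistic for this fit
end

[pmin,best] = min(pEnter);
if pmin < 0.05                         % entrance tolerance
    fprintf('Add column %d, p=%g\n',best,pmin);
else
    fprintf('No term qualifies for entry.\n');
end
```

Once a term is in the model, the same idea applies to each remaining candidate, except that the test compares the fit with and without that candidate given the terms already included.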
Depending on the terms included in the initial model and the order in which terms are moved in and out, the method may build different models from the same set of potential terms. The method terminates when no single step improves the model. There is no guarantee, however, that a different initial model or a different sequence of steps will not lead to a better fit. In this sense, stepwise models are locally optimal, but may not be globally optimal.
Load the data in hald.mat, which contains observations of the heat of reaction of various cement mixtures:
load hald
whos
  Name              Size       Bytes  Class     Attributes
  Description      22x58        2552  char
  hald             13x5          520  double
  heat             13x1          104  double
  ingredients      13x4          416  double
The response (heat) depends on the quantities of the four predictors (the columns of ingredients).
Use stepwisefit to carry out the stepwise regression algorithm, beginning with no terms in the model and using entrance/exit tolerances of 0.05/0.10 on the p-values:
stepwisefit(ingredients,heat,...
'penter',0.05,'premove',0.10);
Initial columns included: none
Step 1, added column 4, p=0.000576232
Step 2, added column 1, p=1.10528e-006
Final columns included: 1 4
'Coeff' 'Std.Err.' 'Status' 'P'
[ 1.4400] [ 0.1384] 'In' [1.1053e-006]
[ 0.4161] [ 0.1856] 'Out' [ 0.0517]
[-0.4100] [ 0.1992] 'Out' [ 0.0697]
[-0.6140] [ 0.0486] 'In' [1.8149e-007]
stepwisefit automatically includes an intercept term in the model, so you do not add it explicitly to ingredients as you would for regress. For terms not in the model, coefficient estimates and their standard errors are those that result if the term is added.
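To illustrate the difference from regress, the following sketch refits the final model with regress, adding the column of ones explicitly, and compares the result with the stepwisefit estimates. The names Xfinal, bRegress, bStepwise, and maxDiff are illustrative.

```matlab
load hald
[b,se,pval,inmodel,stats] = stepwisefit(ingredients,heat,'display','off');

% regress needs the column of ones that stepwisefit adds on its own.
Xfinal = [ones(size(heat)) ingredients(:,inmodel)];
bRegress = regress(heat,Xfinal);

% First element is the intercept; the rest are the selected coefficients.
bStepwise = [stats.intercept; b(inmodel)];
maxDiff = max(abs(bRegress - bStepwise))   % expected to be near zero
```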
The inmodel parameter is used to specify terms in an initial model:
initialModel = ...
[false true false false]; % Force in 2nd term
stepwisefit(ingredients,heat,...
'inmodel',initialModel,...
'penter',.05,'premove',0.10);
Initial columns included: 2
Step 1, added column 1, p=2.69221e-007
Final columns included: 1 2
'Coeff' 'Std.Err.' 'Status' 'P'
[ 1.4683] [ 0.1213] 'In' [2.6922e-007]
[ 0.6623] [ 0.0459] 'In' [5.0290e-008]
[ 0.2500] [ 0.1847] 'Out' [ 0.2089]
[-0.2365] [ 0.1733] 'Out' [ 0.2054]
The preceding two models, built from different initial models, use different subsets of the predictive terms. Terms 2 and 4, swapped in the two models, are highly correlated:
term2 = ingredients(:,2);
term4 = ingredients(:,4);
R = corrcoef(term2,term4)
R =
1.0000 -0.9730
-0.9730 1.0000
To compare the models, use the stats output of stepwisefit:
[betahat1,se1,pval1,inmodel1,stats1] = ...
stepwisefit(ingredients,heat,...
'penter',.05,'premove',0.10,...
'display','off');
[betahat2,se2,pval2,inmodel2,stats2] = ...
stepwisefit(ingredients,heat,...
'inmodel',initialModel,...
'penter',.05,'premove',0.10,...
'display','off');
RMSE1 = stats1.rmse
RMSE1 =
2.7343
RMSE2 = stats2.rmse
RMSE2 =
2.4063
The second model has a lower Root Mean Square Error (RMSE).
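Other fields of stats allow further comparisons. As a sketch, the coefficient of determination for each model can be computed from the two sums of squares, assuming SStotal is measured about the mean of the response. The names R2_1 and R2_2 are illustrative, and the calls rely on the default tolerances, which match the 0.05/0.10 values used above.

```matlab
load hald
initialModel = [false true false false];
[b1,se1,p1,in1,stats1] = stepwisefit(ingredients,heat,'display','off');
[b2,se2,p2,in2,stats2] = stepwisefit(ingredients,heat, ...
    'inmodel',initialModel,'display','off');

% R^2 = 1 - SSresid/SStotal
R2_1 = 1 - stats1.SSresid/stats1.SStotal
R2_2 = 1 - stats2.SSresid/stats2.SStotal
```

Both models contain two terms, so they have the same error degrees of freedom, and the model with the lower RMSE also has the higher R^2.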
[1] Draper, N., and H. Smith, Applied Regression Analysis, 2nd edition, John Wiley and Sons, 1981, pp. 307-312.
See also: stepwise, addedvarplot, regress
© 1984-2008 The MathWorks, Inc.