step

Class: GeneralizedLinearModel

Improve generalized linear regression model by adding or removing terms

Syntax

mdl1 = step(mdl)
mdl1 = step(mdl,Name,Value)

Description

mdl1 = step(mdl) returns an improved generalized linear model based on mdl, with one predictor added or removed.

mdl1 = step(mdl,Name,Value) improves a generalized linear model with additional options specified by one or more Name,Value pair arguments.

Input Arguments

Generalized linear model representing a least-squares fit of the link of the response to the data, specified as a GeneralizedLinearModel object, for example one returned by fitglm or stepwiseglm.

For properties and methods of the generalized linear model object, mdl, see the GeneralizedLinearModel class page.

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Criterion to add or remove terms, specified as the comma-separated pair consisting of 'Criterion' and one of the following:

  • 'Deviance': Default for step on a GeneralizedLinearModel and for stepwiseglm. p-value for an F-test or chi-squared test of the change in the deviance by adding or removing the term.

  • 'sse': Default for stepwiselm. p-value for an F-test of the change in the sum of squared error by adding or removing the term.

  • 'aic': Change in the value of the Akaike information criterion (AIC).

  • 'bic': Change in the value of the Bayesian information criterion (BIC).

  • 'rsquared': Increase in the value of R-squared.

  • 'adjrsquared': Increase in the value of adjusted R-squared.

Example: 'Criterion','bic'
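
As an illustration of how the criterion changes a single step, the following sketch reuses the simulated Poisson data from the Examples section and takes one step under the default criterion and one under 'bic'. The variable names mdlDev and mdlBic are chosen here for illustration only, and the two calls are not guaranteed to select different terms.

```matlab
% Simulated data, as in the Examples section of this page
rng('default') % for reproducibility
X = randn(100,20);
mu = exp(X(:,[5 10 15])*[.4;.2;.3] + 1);
y = poissrnd(mu);

% Starting model with x1 as the only predictor
mdl = fitglm(X,y,'y ~ x1','Distribution','poisson');

% One step with the default criterion and one step with BIC
mdlDev = step(mdl,'Verbose',0);
mdlBic = step(mdl,'Criterion','bic','Verbose',0);

% Compare the terms in each resulting model
disp(mdlDev.CoefficientNames)
disp(mdlBic.CoefficientNames)
```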

Model specification describing terms that cannot be removed from the model, specified as the comma-separated pair consisting of 'Lower' and one of the options for modelspec naming the model.

Example: 'Lower','linear'

Number of steps to take, specified as the comma-separated pair consisting of 'NSteps' and a positive integer.

Data Types: single | double

Improvement measure for adding a term, specified as the comma-separated pair consisting of 'PEnter' and a scalar value. The default value depends on the criterion.

Criterion        Default value    Decision
'Deviance'       0.05             If the p-value of the F or chi-squared statistic is smaller than PEnter, add the term to the model.
'SSE'            0.05             If the p-value of the F statistic is smaller than PEnter, add the term to the model.
'AIC'            0                If the change in the AIC of the model is smaller than PEnter, add the term to the model.
'BIC'            0                If the change in the BIC of the model is smaller than PEnter, add the term to the model.
'Rsquared'       0.1              If the increase in the R-squared of the model is larger than PEnter, add the term to the model.
'AdjRsquared'    0                If the increase in the adjusted R-squared of the model is larger than PEnter, add the term to the model.

For more information on the criteria, see Criterion name-value pair argument.

Example: 'PEnter',0.075

Improvement measure for removing a term, specified as the comma-separated pair consisting of 'PRemove' and a scalar value.

Criterion        Default value    Decision
'Deviance'       0.10             If the p-value of the F or chi-squared statistic is larger than PRemove, remove the term from the model.
'SSE'            0.10             If the p-value of the F statistic is larger than PRemove, remove the term from the model.
'AIC'            0.01             If the change in the AIC of the model is larger than PRemove, remove the term from the model.
'BIC'            0.01             If the change in the BIC of the model is larger than PRemove, remove the term from the model.
'Rsquared'       0.05             If the increase in the R-squared value of the model is smaller than PRemove, remove the term from the model.
'AdjRsquared'    -0.05            If the increase in the adjusted R-squared value of the model is smaller than PRemove, remove the term from the model.

At each step, the stepwise algorithm also checks whether any term is redundant (linearly dependent) with other terms in the current model. Any term that is linearly dependent on other terms in the current model is removed, regardless of the criterion value.

For more information on the criteria, see Criterion name-value pair argument.

Example: 'PRemove',0.05
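
The following sketch shows one way to tighten both thresholds under the default 'Deviance' criterion, where they act as p-value cutoffs. The values 0.01 and 0.05 and the limit of 10 steps are arbitrary choices for illustration, and the data are the simulated Poisson data from the Examples section.

```matlab
% Simulated data, as in the Examples section of this page
rng('default') % for reproducibility
X = randn(100,20);
mu = exp(X(:,[5 10 15])*[.4;.2;.3] + 1);
y = poissrnd(mu);

mdl = fitglm(X,y,'y ~ x1','Distribution','poisson');

% Add a term only if its p-value is below 0.01, and remove a term
% if its p-value is above 0.05. Take at most 10 steps.
mdlStrict = step(mdl,'PEnter',0.01,'PRemove',0.05, ...
    'NSteps',10,'Verbose',0);

disp(mdlStrict.CoefficientNames)
```

With p-value criteria, keeping PEnter below PRemove avoids a term being added and then immediately becoming a candidate for removal.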

Model specification describing the largest set of terms in the fit, specified as the comma-separated pair consisting of 'Upper' and one of the character vector options for modelspec naming the model.

Example: 'Upper','quadratic'
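
As a sketch of how 'Lower' and 'Upper' bound the search, the following call restricts step to models between the constant model and the model with all main effects, so no interaction or squared terms are considered. The bounds and the step limit are illustrative choices, and the data are the simulated Poisson data from the Examples section.

```matlab
% Simulated data, as in the Examples section of this page
rng('default') % for reproducibility
X = randn(100,20);
mu = exp(X(:,[5 10 15])*[.4;.2;.3] + 1);
y = poissrnd(mu);

mdl = fitglm(X,y,'y ~ x1','Distribution','poisson');

% Search only between the constant model and the main-effects model
mdlBounded = step(mdl,'Lower','constant','Upper','linear', ...
    'NSteps',20,'Verbose',0);

% Coefficient names containing ':' would indicate interaction terms
names = mdlBounded.CoefficientNames;
hasInteraction = any(cellfun(@(s) any(s == ':'), names));
disp(names)
disp(hasInteraction)
```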

Control for display of information, specified as the comma-separated pair consisting of 'Verbose' and one of the following:

  • 0 — Suppress all display.

  • 1 — Display the action taken at each step.

  • 2 — Also display the actions evaluated at each step.

Example: 'Verbose',2

Output Arguments

Generalized linear model with terms added or removed, returned as a GeneralizedLinearModel object. mdl1 is based on mdl but reflects the steps taken. To overwrite mdl, assign the output of step to mdl.

Examples

Fit a Poisson regression model using random data and a single predictor, then step in other predictors.

Generate artificial data with 20 predictors, using three of the predictors to generate the response.

rng('default') % for reproducibility
X = randn(100,20);
mu = exp(X(:,[5 10 15])*[.4;.2;.3] + 1);
y = poissrnd(mu);

Construct a generalized linear model using X(:,1) as the only predictor.

mdl = fitglm(X,y,...
    'y ~ x1','Distribution','poisson')
mdl = 

Generalized Linear regression model:
    log(y) ~ 1 + x1
    Distribution = Poisson

Estimated Coefficients:
                   Estimate    SE          tStat     pValue    
    (Intercept)      1.1278    0.057487    19.618    1.0904e-85
    x1             0.061287     0.04848    1.2642       0.20617

100 observations, 98 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 1.59, p-value = 0.208

Add a variable to the model using step.

mdl1 = step(mdl)
1. Adding x5, Deviance = 134.2976, Chi2Stat = 50.80176, PValue = 1.021821e-12

mdl1 = 

Generalized Linear regression model:
    log(y) ~ 1 + x1 + x5
    Distribution = Poisson

Estimated Coefficients:
                   Estimate    SE          tStat      pValue    
    (Intercept)      1.0418    0.062341     16.712      1.07e-62
    x1             0.018803    0.049916    0.37671       0.70639
    x5              0.47881    0.067875     7.0542    1.7357e-12

100 observations, 97 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 52.4, p-value = 4.21e-12

Add another variable to the model using step.

mdl1 = step(mdl1)
2. Adding x15, Deviance = 105.9973, Chi2Stat = 28.30027, PValue = 1.038814e-07

mdl1 = 

Generalized Linear regression model:
    log(y) ~ 1 + x1 + x5 + x15
    Distribution = Poisson

Estimated Coefficients:
                   Estimate    SE          tStat      pValue    
    (Intercept)      1.0459      0.0627     16.681    1.7975e-62
    x1             0.026907     0.05003    0.53782        0.5907
    x5               0.3983    0.068376     5.8251    5.7073e-09
    x15             0.28949    0.053992     5.3618    8.2375e-08

100 observations, 96 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 80.7, p-value = 2.18e-17
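
The repeated calls above can also be written as a loop. This is a minimal sketch, assuming that step returns a model with the same terms when no addition or removal meets the thresholds. The cap maxSteps is a hypothetical safety limit introduced here, not part of the step interface, and 'Upper','linear' is set only to keep the search to main effects.

```matlab
% Simulated data, as in the example above
rng('default') % for reproducibility
X = randn(100,20);
mu = exp(X(:,[5 10 15])*[.4;.2;.3] + 1);
y = poissrnd(mu);

mdl = fitglm(X,y,'y ~ x1','Distribution','poisson');

maxSteps = 25; % hypothetical safety cap on the number of iterations
for k = 1:maxSteps
    next = step(mdl,'Upper','linear','Verbose',0);
    % Stop when a step leaves the set of terms unchanged
    if isequal(next.CoefficientNames, mdl.CoefficientNames)
        break
    end
    mdl = next; % overwrite the model with the stepped model
end

disp(mdl.CoefficientNames)
```

Passing a larger 'NSteps' value in a single call to step, or calling stepwiseglm, are alternatives to an explicit loop.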

Algorithms

Stepwise regression is a systematic method for adding and removing terms from a linear or generalized linear model based on their statistical significance in explaining the response variable. The method begins with an initial model, specified using modelspec, and then compares the explanatory power of incrementally larger and smaller models.

MATLAB® uses forward and backward stepwise regression to determine a final model. At each step, the method searches for terms to add to or remove from the model based on the value of the 'Criterion' argument. For stepwiselm, the default value of 'Criterion' is 'sse', and in this case the function uses the p-value of an F-statistic to test models with and without a potential term at each step. If a term is not currently in the model, the null hypothesis is that the term would have a zero coefficient if added to the model. If there is sufficient evidence to reject the null hypothesis, the term is added to the model. Conversely, if a term is currently in the model, the null hypothesis is that the term has a zero coefficient. If there is insufficient evidence to reject the null hypothesis, the term is removed from the model.

Here is how stepwise proceeds when 'Criterion' is 'sse':

  1. Fit the initial model.

  2. Examine a set of available terms not in the model. If any of these terms have p-values less than an entrance tolerance (that is, if it is unlikely that they would have zero coefficient if added to the model), add the one with the smallest p-value and repeat this step; otherwise, go to step 3.

  3. If any of the available terms in the model have p-values greater than an exit tolerance (that is, the hypothesis of a zero coefficient cannot be rejected), remove the one with the largest p-value and go to step 2; otherwise, end.
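
The three steps above can be sketched by hand. The following is not the implementation used by step or stepwiseglm. It is an illustrative version for a Poisson model that considers main effects only, starts from the constant model, and replaces the F-test with a chi-squared test on the change in deviance with one degree of freedom. The names fitCols, inModel, and the cap of 200 iterations are assumptions made for this sketch.

```matlab
% Simulated data, as in the Examples section of this page
rng('default') % for reproducibility
X = randn(100,20);
mu = exp(X(:,[5 10 15])*[.4;.2;.3] + 1);
y = poissrnd(mu);

pEnter = 0.05;   % entrance tolerance
pRemove = 0.10;  % exit tolerance
p = size(X,2);

% Helper (illustrative): fit main effects for a nonempty set of columns
fitCols = @(cols) fitglm(X(:,cols),y,'linear','Distribution','poisson');

% Step 1: fit the initial (constant) model
m0 = fitglm(X(:,1),y,'constant','Distribution','poisson');
d0 = m0.Deviance;
inModel = zeros(1,0);  % indices of predictors currently in the model
dCur = d0;             % deviance of the current model

for iter = 1:200       % guard against cycling
    % Step 2: test each term not in the model
    candidates = setdiff(1:p,inModel);
    pAdd = ones(size(candidates));
    dAdd = zeros(size(candidates));
    for j = 1:numel(candidates)
        m = fitCols([inModel candidates(j)]);
        dAdd(j) = m.Deviance;
        pAdd(j) = chi2cdf(max(dCur - dAdd(j),0),1,'upper');
    end
    [pMin,jMin] = min(pAdd);
    if ~isempty(candidates) && pMin < pEnter
        inModel = [inModel candidates(jMin)]; % add smallest p-value
        dCur = dAdd(jMin);
        continue                              % repeat step 2
    end

    % Step 3: test each term in the model
    pDrop = zeros(size(inModel));
    dDrop = zeros(size(inModel));
    for j = 1:numel(inModel)
        rest = inModel([1:j-1, j+1:end]);
        if isempty(rest)
            dDrop(j) = d0;                    % back to constant model
        else
            m = fitCols(rest);
            dDrop(j) = m.Deviance;
        end
        pDrop(j) = chi2cdf(max(dDrop(j) - dCur,0),1,'upper');
    end
    [pMax,jMax] = max(pDrop);
    if ~isempty(inModel) && pMax > pRemove
        inModel(jMax) = [];                   % remove largest p-value
        dCur = dDrop(jMax);
        continue                              % go back to step 2
    end
    break                                     % nothing to add or remove
end

disp(sort(inModel))
```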

At any stage, the function will not add a higher-order term if the model does not also include all lower-order terms that are subsets of it. For example, it will not try to add the term X1:X2^2 unless both X1 and X2^2 are already in the model. Similarly, the function will not remove lower-order terms that are subsets of higher-order terms that remain in the model. For example, it will not consider removing X1 or X2^2 if X1:X2^2 stays in the model.

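The default criterion for stepwiseglm, and for step applied to a GeneralizedLinearModel object, is 'Deviance'. It follows a similar procedure for adding or removing terms, using the p-value of an F or chi-squared test of the change in deviance.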

There are several other criteria available, which you can specify using the 'Criterion' argument. You can use the change in the value of the Akaike information criterion, Bayesian information criterion, R-squared, or adjusted R-squared as a criterion to add or remove terms.

Depending on the terms included in the initial model and the order in which terms are moved in and out, the method might build different models from the same set of potential terms. The method terminates when no single step improves the model. There is no guarantee, however, that a different initial model or a different sequence of steps will not lead to a better fit. In this sense, stepwise models are locally optimal, but might not be globally optimal.
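
To see the dependence on the starting model in practice, one option is to run the search from two different initial models and compare the results. The sketch below uses stepwiseglm on the simulated Poisson data from the Examples section, once starting from the constant model and once from the full main-effects model. Whether the two runs end at the same model depends on the data, so no particular outcome is implied.

```matlab
% Simulated data, as in the Examples section of this page
rng('default') % for reproducibility
X = randn(100,20);
mu = exp(X(:,[5 10 15])*[.4;.2;.3] + 1);
y = poissrnd(mu);

% Start from the constant model and add terms
mdlUp = stepwiseglm(X,y,'constant','Upper','linear', ...
    'Distribution','poisson','Verbose',0);

% Start from all main effects and remove terms
mdlDown = stepwiseglm(X,y,'linear','Upper','linear', ...
    'Distribution','poisson','Verbose',0);

% Compare the selected terms and the deviance of each final model
disp(mdlUp.CoefficientNames)
disp(mdlDown.CoefficientNames)
fprintf('Deviance: %.4f (from constant), %.4f (from linear)\n', ...
    mdlUp.Deviance, mdlDown.Deviance)
```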

Alternatives

Use stepwiseglm to select a model from a starting model, continuing until no single step is beneficial.

Use addTerms or removeTerms to add or remove particular terms.
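
When the terms to change are already known, addTerms and removeTerms can be called directly instead of searching. The following sketch, using the simulated Poisson data from the Examples section, adds x5 and then removes x1. The choice of terms is for illustration only.

```matlab
% Simulated data, as in the Examples section of this page
rng('default') % for reproducibility
X = randn(100,20);
mu = exp(X(:,[5 10 15])*[.4;.2;.3] + 1);
y = poissrnd(mu);

mdl = fitglm(X,y,'y ~ x1','Distribution','poisson');

% Add a specific term, then remove a specific term
mdl2 = addTerms(mdl,'x5');
mdl3 = removeTerms(mdl2,'x1');

disp(mdl2.CoefficientNames)
disp(mdl3.CoefficientNames)
```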
