Why does "stepwiselm" not remove terms with high p-values?

Hi,
I'm using stepwiselm to model some data, but I'm having difficulty understanding exactly what this command does:
stepwiselm(X,'linear','PEnter',0.05)
From what I understand, this should produce a model in which no term has a p-value above 0.05. However, the output contains terms with p-values up to 0.51.
If I use the command below on the same data, the output seems fine (no p-values above 0.05).
stepwiselm(X,'linear','upper','linear')
From what I understand, 'upper' prevents MATLAB from forming any combinations of the terms to get better results. So I assume it's rather a coincidence that it removes terms with high p-values from the model.
Thanks a lot for your help!

Answers (1)

Aylin on 12 Oct 2016
Hello,
I understand that you are trying to remove terms with high p-values from a stepwise regression model of your data. In order to do this, I would recommend setting the 'PRemove' property of the stepwiselm function.
Let me explain this further using your code. The first line of your code:
stepwiselm(X, 'linear', 'PEnter', 0.05)
builds a regression model of your X data. The starting model includes only the 'linear' terms. The 'PEnter' property then allows additional terms to be added to the model only if their p-value is less than 0.05. Note, however, that in the above line of code the 'PRemove' property is left at its default of 0.10. This means that only terms with p-values greater than 0.10 are actually removed from the regression model. Please refer to the stepwiselm documentation for more information about the 'PRemove' property.
In order to exclude regression terms with a p-value of greater than 0.05, your first line of code should be modified to:
stepwiselm(X, 'linear', 'PEnter', 0.025, 'PRemove', 0.05)
This should remove any regression terms with a p-value greater than 0.05.
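To make this concrete, here is a rough sketch on synthetic data. The table tbl and the variables x1, x2, x3 and y are made up for illustration and are not your X. It assumes the Statistics and Machine Learning Toolbox is installed.

```matlab
% Synthetic data, assumed purely for illustration (not the asker's X)
rng(0);                                    % make the random numbers reproducible
n  = 200;
x1 = randn(n,1);
x2 = randn(n,1);
x3 = randn(n,1);                           % x3 has no real effect on y
y  = 1 + 2*x1 - 1.5*x2 + 0.5*randn(n,1);
tbl = table(x1, x2, x3, y);                % the last variable is used as the response

% Start from the linear model; a term may enter only if p < 0.025
% and is a candidate for removal if p > 0.05
mdl = stepwiselm(tbl, 'linear', 'PEnter', 0.025, 'PRemove', 0.05);

% Inspect the p-values of the terms that remain in the final model
disp(mdl.Coefficients(:, {'Estimate', 'pValue'}))
```

Note that 'PEnter' has to be smaller than 'PRemove'. If you want to see which terms are tested at each step, adding 'Verbose', 2 to the call prints the evaluation process as well as the action taken.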
As you mentioned in your question, setting the 'Upper' property of the stepwiselm function constrains the regression model to use only 'linear' terms. Yes, it probably is only a coincidence here that the p-values of the regression terms with this setting are all less than 0.05.
The MATLAB documentation for stepwiselm contains some detailed examples that can help clarify its use.
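The effect of 'Upper' can be seen by comparing two fits on the same data. This is only a sketch: the data and the names tbl, mdlDefault and mdlLinear are invented for illustration. By default 'Upper' is 'interactions', so product terms such as x1:x2 are candidates for the model. With 'Upper' set to 'linear' they are never considered.

```matlab
% Synthetic data, assumed purely for illustration
rng(1);
n  = 200;
x1 = randn(n,1);
x2 = randn(n,1);
y  = 1 + 2*x1 + 0.5*randn(n,1);
tbl = table(x1, x2, y);                    % the last variable is used as the response

% Default upper bound ('interactions'): x1:x2 may be added if it passes PEnter
mdlDefault = stepwiselm(tbl, 'linear', 'Verbose', 0);

% Upper bound 'linear': no interaction term can ever enter the model
mdlLinear = stepwiselm(tbl, 'linear', 'Upper', 'linear', 'Verbose', 0);

% Compare the final model formulas
disp(mdlDefault.Formula)
disp(mdlLinear.Formula)
```

Whether the two formulas differ depends on the data, but the second model can never contain a term with a colon in its name.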
  1 Comment
M L on 13 Oct 2016
Dear Rylan, thanks a lot for your help and taking the time! I tried using
stepwiselm(X, 'linear', 'PEnter', 0.025, 'PRemove', 0.05)
as you suggested. The model still contains a term that has a p-value of 0.1 and hence shouldn't be in the model. However, a combination involving that term is in the model and has a valid p-value. The stepwiselm documentation says:
'At any stage, the function will not add a higher-order term if the model does not also include all lower-order terms that are subsets of it. For example, it will not try to add the term X1:X2^2 unless both X1 and X2^2 are already in the model. Similarly, the function will not remove lower-order terms that are subsets of higher-order terms that remain in the model. For example, it will not examine to remove X1 or X2^2 if X1:X2^2 stays in the model.'
What I do not understand is why the term with an invalid p-value is added in the first place. How exactly does MATLAB proceed when adding and removing terms? Does it do forward stepwise regression first and then backward?
Thanks again for your help!
