How do I do feature selection by maximizing R-squared for a linear regression model?

Hi everyone,
I am working on a project to build a predictive model, but first I want to keep only the features that are important for the model, so I am using feature selection.
I went through the example at https://www.mathworks.com/help/stats/feature-selection.html, and its code works properly.
However, I want to use R-squared instead of the deviance used in that example; that is, I want to select the features that give a good R-squared value (> 0.85).
Can anyone help me out with the code? Thanks!
  1 Comment
the cyclist on 11 Jun 2019
Also, I should mention that if your full model (i.e., with all features) does not achieve an in-sample R^2 > 0.85, then a reduced feature set cannot achieve it either. Is that what you were hoping for?
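To illustrate the point about the full model being an upper bound, here is a minimal sketch on synthetic data (the data, the seed, and the helper name rsq are made up for illustration). It computes the ordinary in-sample R^2 of a least-squares fit with an intercept, once with all features and once with a subset. This only holds for ordinary in-sample R^2; adjusted or cross-validated R^2 can move in either direction when features are dropped.

```matlab
rng(1);                                  % fixed seed; synthetic data, illustration only
n = 100;
X = randn(n, 5);
y = X(:,1) + 0.5*X(:,2) + randn(n, 1);   % only the first two columns carry signal

% In-sample R^2 of an ordinary least-squares fit with an intercept (hypothetical helper).
rsq = @(Xs) 1 - sum((y - [ones(n,1) Xs] * ([ones(n,1) Xs] \ y)).^2) / sum((y - mean(y)).^2);

r2Full   = rsq(X);             % all five features
r2Subset = rsq(X(:, [1 2]));   % only the first two features

% The subset model is nested in the full model, so its R^2 cannot be larger.
fprintf('full: %.4f, subset: %.4f\n', r2Full, r2Subset);
```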


Answers (1)

the cyclist on 10 Jun 2019
Edited: the cyclist on 10 Jun 2019
I believe you just need to redefine the critfun function from the one in the example:
function dev = critfun(X,Y)
model = fitglm(X,Y,'Distribution','binomial');
dev = model.Deviance;
end
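replacing the returned value with
dev = model.Rsquared.Ordinary;
(Rsquared is a structure, so you need one of its fields, such as Ordinary or Adjusted. For a linear regression you would also drop the 'Distribution','binomial' option, or fit with fitlm.)
You might want to rename that variable to something like rsqr, to avoid confusion.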
EDIT:
After reading that example, and thinking about it a bit more, there might be some other nuances. That example states, "Adding a feature with no effect reduces the deviance by an amount that has a chi-square distribution with one degree of freedom". I'm not sure the same is true for R^2. So, that might bear some thought.
Also, I believe the deviance is something that is minimized, whereas R^2 is maximized. There is probably an adjustment that needs to be made for that as well. (One simple possibility would be to return 1-R^2 from the criterion function, I guess.)
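A rough sketch of the 1-R^2 idea, under several assumptions: the data are synthetic, the helper names withIntercept and sse are made up, and the 0.01 tolerance is an arbitrary choice rather than a chi-square quantile (which was justified for deviance, not for R^2). The criterion is computed directly as SSE/SST, which equals 1-R^2 for a least-squares fit with an intercept, so sequentialfs can minimize it as usual.

```matlab
rng(0);                                       % synthetic data, for illustration only
n = 200;
X = randn(n, 6);
y = 3*X(:,1) - 2*X(:,3) + 0.5*randn(n, 1);    % only columns 1 and 3 matter

% Criterion to be minimized: 1 - R^2 = SSE/SST for a least-squares fit with intercept.
withIntercept = @(Xs) [ones(size(Xs,1),1), Xs];
sse           = @(A, ys) sum((ys - A*(A\ys)).^2);
critfun       = @(Xs, ys) sse(withIntercept(Xs), ys) / sum((ys - mean(ys)).^2);

% Stop once adding a feature lowers 1 - R^2 by less than 0.01 (assumed threshold).
opt = statset('TolFun', 0.01, 'TolTypeFun', 'abs');

% 'cv','none' means the criterion is evaluated in-sample as critfun(X,y).
inmodel = sequentialfs(critfun, X, y, 'cv', 'none', 'options', opt);
disp(find(inmodel));                          % indices of the selected columns
```

Because in-sample R^2 never decreases when a feature is added, the tolerance is what stops the search here; with cross-validation instead of 'cv','none', the criterion function would need the four-argument form (training and test sets) that sequentialfs expects.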
  4 Comments
Shreeraksha Raviprakash on 11 Jun 2019
That was just a copy-paste error; I had run it with the correct variable and still could not get the code to select features.
the cyclist on 11 Jun 2019
I have to admit that I have not tried to deeply understand the example. But it seems to me that you still need to deal with the fact that you want to maximize R^2, not minimize it.
Also, I think you have not fully understood the purpose of the lines
maxR=chi2inv(0.4,1);
...
'TolFun',maxdev,...
(where I assume the mismatch here is another typo).
Those lines do not set the absolute level of R^2 at which selection stops. They set the improvement required relative to the prior model with fewer features (I think).
All in all, my impression is that you are trying to make these changes without getting a deeper understanding of what everything is doing, which is hazardous to getting the correct result.
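One way to see the difference between the tolerance and an absolute R^2 target is to look at the second output of sequentialfs, whose Crit field holds the criterion value after each step. The sketch below is only an illustration on synthetic data: the criterion (1-R^2 computed as SSE/SST), the helper names, and the 0.01 tolerance are all assumptions, not taken from the documentation example. The absolute target of 0.85 is checked separately after selection has finished.

```matlab
rng(0);                                       % synthetic data, for illustration only
n = 200;
X = randn(n, 6);
y = 3*X(:,1) - 2*X(:,3) + 0.5*randn(n, 1);

% Criterion to be minimized: 1 - R^2 for a least-squares fit with intercept.
withIntercept = @(Xs) [ones(size(Xs,1),1), Xs];
sse           = @(A, ys) sum((ys - A*(A\ys)).^2);
critfun       = @(Xs, ys) sse(withIntercept(Xs), ys) / sum((ys - mean(ys)).^2);

% TolFun bounds the improvement between consecutive steps, not the final level.
opt = statset('TolFun', 0.01, 'TolTypeFun', 'abs');
[inmodel, history] = sequentialfs(critfun, X, y, 'cv', 'none', 'options', opt);

critPerStep = history.Crit(:);                % 1 - R^2 after each accepted step
rsqPerStep  = 1 - critPerStep;                % corresponding R^2 values
disp(rsqPerStep.');
disp(-diff(critPerStep).');                   % improvement gained at each later step

% The absolute target is a separate check on the finished selection.
reachedTarget = rsqPerStep(end) > 0.85;
fprintf('Final R^2 = %.3f, target reached: %d\n', rsqPerStep(end), reachedTarget);
```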

