feature reduction via regression analysis

joeDiHare on 16 Jul 2012

https://www.mathworks.com/matlabcentral/answers/43771-feature-reduction-via-regression-analysis

Accepted Answer: Ilya

Suppose you have a very large set of features X, used to predict a vector of target values y.

Is linear regression,

e.g.: coeff = regress(y, X);

followed by sequential feature selection,

e.g.   fun = @(XT,yT,Xt,yt) sum((yt - Xt*regress(yT,XT)).^2);   % SSE on the test fold; sequentialfs divides the total by the number of test observations
  inmodel = sequentialfs(fun, X, y, 'direction', 'backward');   % logical mask of the features that are kept

the easiest (or best) approach to obtain a reasonably sized feature set when nothing else is known about the data?

From my testing, this method rarely captures the features that matter most, and I obtained better results by selecting features at random.
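One way to make the comparison described above repeatable is to score the sequentialfs subset and several random subsets of the same size with the same cross-validation. This is a sketch, not the poster's setup: it assumes the Statistics Toolbox, uses the carsmall sample data as a stand-in for the real features, and the helper names predfun/critfun and the 20 random draws are arbitrary, hypothetical choices.

```matlab
% Illustrative comparison (assumes Statistics Toolbox; carsmall stands in for real data)
load carsmall
X = [Acceleration Cylinders Displacement Horsepower Weight];
y = MPG;
ok = all(~isnan([X y]), 2);          % drop rows with missing values
X = X(ok,:);  y = y(ok);
rng(0)                               % reproducible folds and random subsets

% Hypothetical helpers: OLS with an intercept, fitted on the training rows only
predfun = @(XT,yT,Xt) [ones(size(Xt,1),1) Xt] * regress(yT, [ones(size(XT,1),1) XT]);
critfun = @(XT,yT,Xt,yt) sum((yt - predfun(XT,yT,Xt)).^2);   % SSE on the test fold

inmodel = sequentialfs(critfun, X, y, 'direction', 'backward');
k = nnz(inmodel);
mseSelected = crossval('mse', X(:,inmodel), y, 'Predfun', predfun);

nRand = 20;
mseRandom = zeros(nRand, 1);
for i = 1:nRand
    cols = randperm(size(X,2), k);   % random subset with the same number of features
    mseRandom(i) = crossval('mse', X(:,cols), y, 'Predfun', predfun);
end
fprintf('sequentialfs subset (%d features): CV MSE = %.3f\n', k, mseSelected);
fprintf('random subsets: median CV MSE = %.3f\n', median(mseRandom));
```

Scoring both subsets with the same crossval call keeps the comparison fair; a selection method that only wins on the data it was tuned on will show up here as a CV MSE no better than the random baseline.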

10 Comments

Ilya on 18 Jul 2012

I have trouble interpreting what you wrote in 1 because I still don't know what you mean by correlation. I thought you were saying that the correlation between each individual predictor and the observed response (measured y values) was small for all predictors but one. But setting one predictor to zero cannot have any effect on correlations between the other predictors and the response. And so "correlations went from near zero to back up again" is a mystery to me. Then perhaps by "correlation" you mean correlation between the predicted response and observed response? I don't get how that can be zero after you added the predictor with 94% correlation to the model either. If that happens, something must've gone bad with the fit.

Instead of re-running stepwisefit, I would recommend playing with 'penter' and 'premove' parameters.
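Tuning the thresholds could look like the sketch below, again with carsmall standing in for the real data. The values 0.01 and 0.05 are arbitrary examples (the defaults are 0.05 for 'penter' and 0.10 for 'premove', and 'premove' should not be smaller than 'penter'); stricter thresholds tend to leave fewer predictors in the final model.

```matlab
load carsmall
X = [Acceleration Cylinders Displacement Horsepower Weight];
y = MPG;
ok = all(~isnan([X y]), 2);   % keep complete rows only
X = X(ok,:);  y = y(ok);

% Backward elimination from the full model with stricter p-value thresholds
[b, se, pval, inmodel] = stepwisefit(X, y, 'inmodel', true(1,5), ...
    'penter', 0.01, 'premove', 0.05, 'display', 'off');
disp(find(inmodel))           % indices of the predictors that survive
```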

joeDiHare on 18 Jul 2012

Thanks, I will tweak p values to get what I need.

About 1: yes, it is strange to me too, but the correlation between the predicted and observed responses drops to zero because one feature has wild values.

I don't know why it happens, but setting that bad feature (e.g. feature #71) to 0 brings the correlation back to 94% (and a bit higher).
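A quick way to locate a feature like #71 before fitting is to flag columns whose values sit far from the bulk of the data, and to track the predicted-vs-observed correlation directly. The sketch below uses synthetic data (not the poster's) and a robust z-score based on the median and scaled MAD; the cutoff of 10 is an arbitrary assumption.

```matlab
rng(1)
n = 200;  p = 5;
X = randn(n, p);
y = X * [1; 0.5; 0; -1; 2] + 0.1*randn(n, 1);
X(1:3, 3) = 1e6;                          % inject wild values into feature 3

% Robust z-score per column: distance from the median in units of scaled MAD
med  = median(X, 1);
madS = 1.4826 * median(abs(bsxfun(@minus, X, med)), 1);
robZ = abs(bsxfun(@rdivide, bsxfun(@minus, X, med), madS));
wild = max(robZ, [], 1) > 10;             % flag columns with extreme values
disp(find(wild))

% Correlation between predicted and observed responses (base MATLAB corrcoef)
coeff = regress(y, [ones(n,1) X]);
yhat  = [ones(n,1) X] * coeff;
R     = corrcoef(yhat, y);
fprintf('corr(yhat, y) = %.3f\n', R(1,2));
```

The median and MAD are barely moved by a handful of extreme values, which is why they are used here instead of the mean and standard deviation.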

Accepted Answer

Ilya on 17 Jul 2012

https://www.mathworks.com/matlabcentral/answers/43771-feature-reduction-via-regression-analysis#answer_53730

If you prefer linear regression, use the function stepwisefit or its new incarnation, LinearModel.stepwise. stepwisefit always includes an intercept term, so X should not contain a column of ones. For example, for backward elimination you can start with every predictor in the model:

load carsmall
X = [Acceleration Cylinders Displacement Horsepower];
y = MPG;
stepwisefit(X, y, 'inmodel', true(1,4))   % intercept is added automatically
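For the LinearModel.stepwise route mentioned in the answer, a comparable backward search might look like the sketch below. Starting from the 'linear' model and setting 'Upper' to 'linear' prevents the search from adding interaction terms, so it can only drop predictors; 'Verbose', 0 silences the step log. Newer releases expose the same functionality as stepwiselm. This is an illustration on carsmall, not a recommended configuration.

```matlab
load carsmall
X = [Acceleration Cylinders Displacement Horsepower];
y = MPG;
% Start with intercept + all four predictors; Upper = 'linear' forbids
% interaction/quadratic terms, so steps can only remove predictors.
% Rows containing NaN are excluded from the fit automatically.
mdl = LinearModel.stepwise(X, y, 'linear', 'Upper', 'linear', 'Verbose', 0);
disp(mdl.Formula)              % terms that remain after elimination
disp(mdl.Coefficients)         % estimates, standard errors, t-stats, p-values
```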

In general, there is no "best" approach to feature selection. What you can do depends on what assumptions you are willing to make (such as a linear model), how many features you have, and how much effort you want to invest.
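When the feature count is very large, one alternative not discussed in this thread is the lasso, which shrinks some coefficients exactly to zero and so performs selection as part of the fit. The sketch below is a swapped-in technique under the same linear-model assumption, using lasso from the Statistics Toolbox with 10-fold cross-validation; picking the lambda at FitInfo.Index1SE (the largest lambda within one standard error of the minimum CV MSE) is a common convention, not the only option.

```matlab
load carsmall
X = [Acceleration Cylinders Displacement Horsepower Weight];
y = MPG;
ok = all(~isnan([X y]), 2);             % drop incomplete rows before fitting
X = X(ok,:);  y = y(ok);

rng(0)                                  % reproducible CV folds
[B, FitInfo] = lasso(X, y, 'CV', 10);   % predictors are standardized by default
keep = B(:, FitInfo.Index1SE) ~= 0;     % nonzero coefficients at the 1-SE lambda
disp(find(keep)')                       % indices of the selected predictors
```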