Asked by Ke Dang
on 29 Mar 2012

Hi all, thank you for your time. I have a question about feature selection:

Given a training set, a test set, a list of features, and the result obtained by training with all of the features, I would like to know:

1. Some way to select the set of features that would produce the best result
2. Which features contributed most to the classification
3. Which features did not contribute to the classification

Is there a function that can do it in Matlab?


Answer by Ilya
on 29 Mar 2012

Edited by Ilya
on 20 Sep 2012

Accepted answer

Here are Statistics Toolbox utilities you should look into:

- sequentialfs
- relieff
- The predictorImportance method of ClassificationTree, or its older counterpart, the varimportance method of classregtree
- Ensembles of decision trees. In particular, TreeBagger has several properties for estimation of predictor importance, especially DeltaCritDecisionSplit and OOBPermutedVarDeltaError
- Discriminant analysis with thresholding, available as of R2012a in ClassificationDiscriminant; see its DeltaPredictor property.
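
As a rough illustration of the first few utilities, here is a minimal sketch. It assumes the fisheriris example data that ships with the toolbox stands in for your own predictor matrix and labels. The choice of linear discriminant analysis (classify) as the criterion, the 10-fold partition, the 10 neighbors, and the 100 trees are all arbitrary assumptions, not recommendations. The TreeBagger option and property names follow the ones mentioned above; I believe newer releases rename them, so check the documentation for your version.

```matlab
load fisheriris          % example data: meas (150x4 numeric), species (150x1 cell)
X = meas;
y = species;

% --- sequentialfs: wrapper-style forward selection ---
% The criterion returns the number of misclassified test observations,
% here using linear discriminant analysis (classify) as the learner.
critfun = @(Xtrain, ytrain, Xtest, ytest) ...
    sum(~strcmp(ytest, classify(Xtest, Xtrain, ytrain)));

cvp = cvpartition(y, 'kfold', 10);             % stratified 10-fold partition
[inmodel, history] = sequentialfs(critfun, X, y, 'cv', cvp);
selectedFeatures = find(inmodel)               % column indices that were kept

% --- relieff: filter-style ranking ---
% ranked lists predictor indices from most to least important;
% weights holds the corresponding scores, indexed by predictor column.
[ranked, weights] = relieff(X, y, 10);         % 10 nearest neighbors

% --- TreeBagger: out-of-bag permutation importance ---
b = TreeBagger(100, X, y, 'oobvarimp', 'on');  % 100 trees, assumed value
importance = b.OOBPermutedVarDeltaError        % one value per predictor
```

Features that sequentialfs leaves out, or that get low relieff weights and low permutation importance, are the candidates for "did not contribute". The methods need not agree, so it is worth comparing them.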

If you can recast your classification problem as a (generalized) linear regression model, the functions lasso and lassoglm would help. If you have a sufficiently recent version of MATLAB, LinearModel.stepwise and GeneralizedLinearModel.stepwise are also worth a look.
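
For the regression route, a minimal sketch with lassoglm might look like the following. It assumes a two-class problem coded as 0/1 (two of the three fisheriris species are used purely for illustration) and picks the penalty by 10-fold cross-validation. Both choices are assumptions made for the example.

```matlab
load fisheriris
keep = ~strcmp(species, 'setosa');               % reduce to a two-class problem
X = meas(keep, :);
y = double(strcmp(species(keep), 'virginica'));  % 1 = virginica, 0 = versicolor

% Lasso-penalized logistic regression with 10-fold cross-validation.
[B, FitInfo] = lassoglm(X, y, 'binomial', 'CV', 10);

% Coefficients at the largest penalty whose deviance is within one
% standard error of the minimum; predictors with zero coefficients
% were dropped by the lasso.
coef = B(:, FitInfo.Index1SE);
keptFeatures = find(coef ~= 0)
```

Predictors whose coefficients are shrunk to exactly zero are the ones the lasso considers unnecessary at that penalty level.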

As you see, there are plenty of options. Without knowing more about your data, it's hard to say what might work best for you.

