Hi,
I am running classifiers (e.g. MDA or SVM) on a data set with many explanatory variables (usually between 60 and 150). I typically divide the data into 3 to 7 groups.
Many of the explanatory variables will contribute only noise, so I would like to know which variables are important for classifying the data correctly. I have tried rerunning the classifier with each variable removed in turn, but the results are confusing. *Removing* a variable with a high signal-to-noise ratio (which should be one of the informative ones) can sometimes *increase* classification accuracy. This makes me think that there is substantial information in the correlations between variables. What would be a good way of testing this? I can't test every subset exhaustively because there are too many variable combinations. Could the optimization toolbox help?
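To make the procedure concrete, here is a rough sketch of the remove-one-variable test I describe above. It is only an illustration under several assumptions: it is written in plain Python, a simple nearest-centroid classifier stands in for MDA/SVM, accuracy is estimated by leave-one-out cross-validation, and the data are synthetic (one informative variable plus four noise variables). All function names are made up for this example.

```python
import random


def nearest_centroid_predict(train_X, train_y, x):
    """Stand-in classifier: assign x to the class whose mean is closest."""
    sums, counts = {}, {}
    for row, label in zip(train_X, train_y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    best_label, best_dist = None, float("inf")
    for label, acc in sums.items():
        # Squared Euclidean distance from x to this class mean.
        dist = sum((v - s / counts[label]) ** 2 for v, s in zip(x, acc))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_accuracy(X, y, columns):
    """Leave-one-out accuracy using only the given variable indices."""
    Xs = [[row[j] for j in columns] for row in X]
    hits = 0
    for i in range(len(Xs)):
        train_X = Xs[:i] + Xs[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += nearest_centroid_predict(train_X, train_y, Xs[i]) == y[i]
    return hits / len(Xs)


def drop_one_importance(X, y):
    """Return baseline accuracy and the accuracy change from dropping each variable.

    A negative change means the variable was helping; a positive change
    means accuracy went up when it was removed.
    """
    all_cols = list(range(len(X[0])))
    baseline = loo_accuracy(X, y, all_cols)
    deltas = [
        loo_accuracy(X, y, [c for c in all_cols if c != j]) - baseline
        for j in all_cols
    ]
    return baseline, deltas


# Synthetic example (an assumption, not my real data): 3 groups,
# variable 0 carries the group signal, variables 1-4 are pure noise.
rng = random.Random(0)
X, y = [], []
for label in range(3):
    for _ in range(30):
        X.append([rng.gauss(2.0 * label, 1.0)] + [rng.gauss(0.0, 1.0) for _ in range(4)])
        y.append(label)

baseline, deltas = drop_one_importance(X, y)
print("baseline accuracy:", baseline)
for j, d in enumerate(deltas):
    print("variable", j, "accuracy change when removed:", d)
```

In this toy case the variables are independent, so the test behaves as expected. My real data do not behave this cleanly, which is what prompts the question.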
Cheers