Can more predictor variables reduce performance when using Classification Trees?

Hi again everyone,
Say we have the explanatory variables stored in X and the response variable in Y. Y contains either 'n' or 's', where 'n' is much rarer than 's'.
I just wanted to know whether the performance of a Classification Tree (using either ClassificationTree.fit, or TreeBagger with a sufficient number of trees) can get worse when I introduce a new variable into X.
I have read that if one introduces too many 'bad' variables into X, the algorithm will eventually use some of them when building the tree, which of course degrades the results (presumably because the noise variables crowd out the variables that actually carry predictive information).
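One way I thought of checking this concretely is to compare the cross-validated error with and without the new variable, roughly like this (I'm assuming here that the new variable is the last column of X):
j = size(X,2);                           % index of the newly added variable
tFull = ClassificationTree.fit(X, Y);
tLess = ClassificationTree.fit(X(:,1:j-1), Y);
lossFull = kfoldLoss(crossval(tFull));   % 10-fold cross-validation by default
lossLess = kfoldLoss(crossval(tLess));
fprintf('CV error with new variable: %.4f, without: %.4f\n', lossFull, lossLess);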
The thing is, I have introduced one new variable into X which really covers up nearly all of the 'n' cases.
Is this a common problem? What do you think the remedy could be, apart from deleting the variable from X?
I guess one could use weights to give some variables more predictive power than others, but that would probably introduce some sort of bias, and I do not really know which variables are 'good' anyway...
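Rather than guessing, maybe I could let TreeBagger rank the variables itself via its out-of-bag permuted importance, something like this sketch (nTrees is just a placeholder for a "sufficient" number of trees):
nTrees = 200;                            % placeholder value
b = TreeBagger(nTrees, X, Y, 'Method','classification', ...
    'OOBPred','on', 'OOBVarImp','on');
imp = b.OOBPermutedVarDeltaError;        % one importance score per column of X
bar(imp);                                % high score = permuting it hurts OOB accuracy
xlabel('Predictor index');
ylabel('Increase in OOB error when permuted');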
Regards, Niklas
  1 Comment
Tom Lane on 1 Jul 2012
If the new variable "covers up" the n's, do you mean it predicts them well? Is this a problem because you suspect the relationship is spurious for some reason? It would be helpful if you would explain your concern.


Answers (0)
