How does TreeBagger handle missing values?

Kristen Weasenforth on 18 Sep 2017
Answered: Matlab on 25 Nov 2017
I've seen bits and pieces of an answer to this, namely that NaNs get ignored in TreeBagger, but nothing explicit. How are the NaNs being ignored? Is the entire row or column containing a NaN removed? Or, if an observation in the training data for an individual tree is missing a variable, is that variable simply not used in that tree but still used in other trees of the random forest? Or are the missing values imputed? If so, with what?
If anyone could give me a definitive answer on what the TreeBagger function does with them, that would be amazing.

Answers (1)

Matlab on 25 Nov 2017
A random forest is made up of decision trees, so I think the real question is how fitctree (or fitrtree for regression) handles missing values. The question splits into two parts: training and prediction.

Training: by default, when a node is split on a given variable, the samples whose value for that variable is missing are left out of the impurity computation for that split. Nothing is imputed, and the row is still used for splits on other variables. Alternatively, you can turn on surrogate decision splits to deal with missing values. The details are explained in the help documentation.

Prediction: here the sample is missing the attribute that a node tests. I'm not sure about this part, but I believe the sample is split into copies, and each copy goes down one branch with the corresponding probability. The main idea comes from the paper "Induction of Decision Trees".
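To make the two ideas above concrete, here is a small Python sketch. It is only an illustration of the concepts, not TreeBagger's or fitctree's actual code: the function names, the nested-dict tree layout, and the `left_fraction` field are all made up for this example. The first function scores a split using only the rows where the split variable is observed. The second one sends a sample with a missing value down both branches, weighting each branch by the fraction of training samples that went that way, which is the fractional-copy idea described above and may not be what MATLAB does at prediction time.

```python
import math


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def split_gain_ignoring_missing(x, y, threshold):
    """Impurity reduction for the split x < threshold.

    Rows where x is NaN are left out of the computation entirely;
    they are neither imputed nor removed from the data set.
    """
    observed = [(xi, yi) for xi, yi in zip(x, y) if not math.isnan(xi)]
    if not observed:
        return 0.0
    left = [yi for xi, yi in observed if xi < threshold]
    right = [yi for xi, yi in observed if xi >= threshold]
    n = len(observed)
    parent = gini([yi for _, yi in observed])
    children = len(left) / n * gini(left) + len(right) / n * gini(right)
    return parent - children


def predict_proba(node, sample):
    """Class probabilities for one sample.

    Hypothetical tree layout: a leaf is {"probs": {...}}; an internal
    node has "feature", "threshold", "left", "right", and
    "left_fraction" (share of training samples sent left).
    A sample missing the tested feature goes down both branches.
    """
    if "probs" in node:
        return dict(node["probs"])
    value = sample[node["feature"]]
    if math.isnan(value):
        w_left = node["left_fraction"]
        left = predict_proba(node["left"], sample)
        right = predict_proba(node["right"], sample)
        classes = set(left) | set(right)
        return {
            c: w_left * left.get(c, 0.0) + (1.0 - w_left) * right.get(c, 0.0)
            for c in classes
        }
    branch = "left" if value < node["threshold"] else "right"
    return predict_proba(node[branch], sample)


if __name__ == "__main__":
    nan = float("nan")
    x = [1.0, 2.0, nan, 8.0, 9.0]
    y = ["a", "a", "b", "b", "b"]
    print(split_gain_ignoring_missing(x, y, threshold=5.0))

    tree = {
        "feature": 0,
        "threshold": 5.0,
        "left_fraction": 0.5,
        "left": {"probs": {"a": 1.0}},
        "right": {"probs": {"b": 1.0}},
    }
    print(predict_proba(tree, [nan]))
    print(predict_proba(tree, [1.0]))
```

In the toy data, the third row has a NaN in `x`, so only the other four rows count toward the split score. The same row would still take part in a split on any other variable where its value is present.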
