Questions about OOB error in TreeBagger

Emmanuel on 14 May 2014
Answered: Emmanuel on 27 May 2014
Hello,
I'm currently working on a classification problem with random forests, using MATLAB's TreeBagger. To estimate the discriminative power of my features, I would like to visualize the prediction ratio for each class. So far I have used a train set and a test set, and since each forest gives a slightly different result due to its random nature, I build 100 forests and average the ratios.
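
For reference, here is a minimal sketch of what I do (X and Y are my feature matrix and label cell array; the 70/30 split and the 50 trees per forest are placeholder choices, only the 100-repetition averaging is what I actually described):

    % Sketch: average per-class test-set prediction ratio over 100 forests.
    % Assumes X (Nobs-by-Nfeatures) and Y (cell array of label strings) exist.
    classes = unique(Y);
    nReps   = 100;
    ratios  = zeros(nReps, numel(classes));

    cv  = cvpartition(Y, 'HoldOut', 0.3);     % stratified 70/30 split
    Xtr = X(training(cv), :);  Ytr = Y(training(cv));
    Xte = X(test(cv), :);      Yte = Y(test(cv));

    for r = 1:nReps
        b    = TreeBagger(50, Xtr, Ytr, 'Method', 'classification');
        pred = predict(b, Xte);               % predicted labels on the test set
        for c = 1:numel(classes)
            idx = strcmp(Yte, classes{c});    % test observations of this class
            ratios(r, c) = mean(strcmp(pred(idx), classes{c}));
        end
    end
    disp(mean(ratios, 1))                     % average prediction ratio per class
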
However, on Breiman's site (http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm#ooberr) it is stated:
"In random forests, there is no need for cross-validation or a separate test set to get an unbiased estimate of the test set error. It is estimated internally, during the run, as follows: Each tree is constructed using a different bootstrap sample from the original data. About one-third of the cases are left out of the bootstrap sample and not used in the construction of the kth tree."
I have seen papers using random forests for classification where the authors still use train/test sets and cross-validation. I am confused: with random forests, how should the classification ratio be computed? With the "classic" method (train/test sets and cross-validation), or with the out-of-bag (OOB) estimate, as Breiman suggests?
So I wanted to try the out-of-bag estimates. The TreeBagger documentation shows that one can use the 'OOBPred' option and plot(oobError(b)) to visualize the classification error. My questions about this function are:
  • How can I visualize the OOB error for EACH class, and not only the overall error? (See the sketch after this list.)
  • As far as I understand, OOB estimation requires bagging ("About one-third of the cases are left out"). How does TreeBagger behave when I turn on the 'OOBPred' option while the 'FBoot' option is 1 (the default value)? Does FBoot=1 mean that there is no bagging? ("Fraction of input data to sample with replacement from the input data for growing each new tree")
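
For concreteness, this is the kind of per-class summary I am after, computed from the OOB predictions (a sketch only; it assumes oobPredict returns a cell array of label strings and that the responses are stored in b.Y, which is what the documentation suggests):

    % Sketch: per-class OOB misclassification rate.
    b = TreeBagger(100, X, Y, 'Method', 'classification', 'OOBPred', 'on');
    oobLabels = oobPredict(b);            % OOB prediction for every observation
    classes   = unique(b.Y);
    for c = 1:numel(classes)
        idx = strcmp(b.Y, classes{c});    % observations belonging to this class
        err = mean(~strcmp(oobLabels(idx), classes{c}));
        fprintf('OOB error for class %s: %.3f\n', classes{c}, err);
    end
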

Answers (1)

Emmanuel on 27 May 2014
Hi,
I found the following statement in Breiman's Random Forests paper:
"The study of error estimate for bagged classifiers in Breiman [1996b], gives empirical evidence to show that the OOB estimate is as accurate as using a test set of the same size as the training set."
So I suppose people continue to use train/test sets and cross-validation with RF when they have to compare their classification performance with other methods, where the test set and the train set are not the same size.
But I still don't know how to compute the OOB error for each class with TreeBagger, or how it behaves with FBoot=1. I could use some advice! ;-)
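
For the FBoot question, one empirical check I am considering (a sketch; it assumes the OOBIndices property described in the TreeBagger documentation, and X/Y as in my original post):

    % Sketch: check the out-of-bag fraction per tree when FBoot = 1.
    % With sampling WITH replacement, each tree should still leave out
    % about exp(-1) ~ 37% of the observations, so OOB estimates would
    % remain available even at the default FBoot = 1.
    b = TreeBagger(100, X, Y, 'Method', 'classification', ...
                   'OOBPred', 'on', 'FBoot', 1);
    oobFrac = mean(b.OOBIndices, 1);      % OOB fraction for each tree
    fprintf('Mean OOB fraction per tree: %.3f (expected ~%.3f)\n', ...
            mean(oobFrac), exp(-1));
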
