Hello,
I'm currently working on a classification problem with random forests, using Matlab's TreeBagger. To estimate the discriminative power of my features, I would like to visualize the prediction ratio for each class. So far I have used a train and a test set, and since each forest gives slightly different results due to its random nature, I build 100 forests and average the ratios.
However, on Breiman's site (<http://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm#ooberr>) it is stated:
"In random forests, there is no need for cross-validation or a separate test set to get an unbiased estimate of the test set error. It is estimated internally, during the run, as follows: Each tree is constructed using a different bootstrap sample from the original data. About one-third of the cases are left out of the bootstrap sample and not used in the construction of the kth tree."
I have seen papers using random forests for classification where the authors still use train/test sets and cross-validation. I am confused: with random forests, how should the classification ratio be computed? With the "classic" method (train/test sets and cross-validation), or with the out-of-bag (OOB) estimates (following what Breiman says)?
So I wanted to try the out-of-bag estimates. In the TreeBagger doc, I have seen that one can use the 'OOBPred' option and plot(oobError(b)) to visualize the classification error. My questions about this function are:
- How can I visualize the OOB error for EACH class, rather than only the overall error?
- As far as I understand, OOB estimation requires bagging ("About one-third of the cases are left out"). How does TreeBagger behave when I turn on the 'OOBPred' option while the 'FBoot' option is set to 1 (its default value)? Does FBoot=1 mean that there is no bagging? ("Fraction of input data to sample with replacement from the input data for growing each new tree")
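For reference, here is a minimal sketch of what I am trying to do for the first question: compute the OOB error separately for each class from the OOB predictions, rather than the aggregate error from oobError. I am assuming X (predictors) and Y (class labels as a cell array of strings) are already in the workspace; those variable names are just illustrative.

```matlab
% Grow a forest with OOB predictions enabled.
% 100 trees is an arbitrary choice for this sketch.
b = TreeBagger(100, X, Y, 'Method', 'classification', 'OOBPred', 'on');

% OOB-predicted label for each training case (each case is predicted
% only by the trees that did NOT see it in their bootstrap sample).
oobLabels = oobPredict(b);

% Misclassification rate restricted to each true class.
classes = b.ClassNames;
for k = 1:numel(classes)
    idx  = strcmp(Y, classes{k});                       % cases truly in class k
    errK = mean(~strcmp(oobLabels(idx), classes{k}));   % fraction misclassified
    fprintf('OOB error for class %s: %.3f\n', classes{k}, errK);
end
```

Is this per-class computation from oobPredict the right approach, or is there a built-in way to get it?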