I originally ran my data through the code at this link and got an average error rate of ~3%. When I realized I couldn't easily calculate variable importance with that code, I switched over to TreeBagger.
RF_ensemble = TreeBagger(ntrees,meanValuesPerPitcher,string(pitcherClusters),'Method','classification',...
    'OOBPredictorImportance','on');
oobError(RF_ensemble,'Mode','individual') returns a vector with values ranging from 0.7 to 0.94.
oobError(RF_ensemble,'Mode','ensemble') returns 0.44.
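For anyone who wants to reproduce the calls, here is a minimal sketch of how the out-of-bag error and the permutation importance can be read from a TreeBagger model. The data is random placeholder data with the same shape as mine (not my actual data), and the tree count of 200 is a hypothetical value, so the numbers it produces will not match the ones above.

```matlab
% Sketch only: synthetic stand-ins for the real inputs (assumptions).
rng(1);                              % reproducible random numbers
X = randn(50, 14);                   % placeholder for meanValuesPerPitcher
Y = string(randi(10, 50, 1));        % placeholder for string(pitcherClusters)
ntrees = 200;                        % hypothetical tree count

RF_ensemble = TreeBagger(ntrees, X, Y, 'Method', 'classification', ...
    'OOBPrediction', 'on', 'OOBPredictorImportance', 'on');

% Default mode is 'cumulative': element k is the OOB error using trees 1..k.
cumErr = oobError(RF_ensemble);

% 'ensemble' mode gives a single value for the full set of trees.
finalErr = oobError(RF_ensemble, 'Mode', 'ensemble');

% Permutation-based importance, one value per predictor column.
importance = RF_ensemble.OOBPermutedPredictorDeltaError;

% Shows whether the OOB error has levelled off as trees are added.
plot(cumErr);
xlabel('Number of grown trees');
ylabel('OOB classification error');
```

The 'individual' mode scores each tree on its own, so its values are expected to be worse than the 'ensemble' value, which uses the combined vote.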
I would rather go with the TreeBagger function since I'm more confident it is correct, but I don't understand how or why the error rate is so high.
My data is a 50x14 matrix (50 observations with 14 variables), and my labels vector is a 50x1 numeric vector with a cluster number 1-10 for each observation.
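As a point of comparison for the error rates, a rough sketch of two simple baselines for a 10-class problem with 50 observations. The labels here are random placeholders standing in for pitcherClusters, so the per-cluster counts are an assumption and not my real cluster sizes.

```matlab
% Sketch only: random labels standing in for pitcherClusters (assumption).
rng(1);
labels = randi(10, 50, 1);

% Number of observations in each of the 10 clusters.
counts = accumarray(labels, 1, [10 1]);

% Error rate of always predicting the largest cluster.
majorityBaselineErr = 1 - max(counts) / numel(labels);

% Error rate of guessing uniformly at random among 10 clusters.
uniformChanceErr = 1 - 1/10;
```

With only about 5 observations per cluster on average, each tree also sees very few examples of any one class.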
I must be doing something wrong because there is no way the error is this high, but I don't know what to do. Let me know if more information would be helpful.