B = TreeBagger(NumTrees,Tbl,ResponseVarName)
B = TreeBagger(NumTrees,Tbl,Formula)
B = TreeBagger(NumTrees,Tbl,Y)
B = TreeBagger(NumTrees,X,Y)
B = TreeBagger(NumTrees,X,Y,Name,Value)
B = TreeBagger(NumTrees,Tbl,ResponseVarName) creates
an ensemble B of NumTrees bagged decision trees for predicting the responses stored in
Tbl.ResponseVarName as a function of the predictors in the table
Tbl. ResponseVarName is the name of the response variable in
Tbl. By default, TreeBagger builds
an ensemble of classification trees. The function can build an ensemble
of regression trees instead if you set the optional name-value pair 'Method' to 'regression'.
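For instance, a call of this form grows a regression ensemble from a table. This is a sketch: the table Tbl is built here from the carsmall sample data shipped with the toolbox, and the choices of predictors and of 20 trees are illustrative only.

```matlab
% Build a table whose last variable, MPG, is the numeric response.
load carsmall                              % sample data: Weight, Horsepower, MPG, ...
Tbl = table(Weight,Horsepower,MPG);        % predictors plus response in one table

% Grow 20 bagged regression trees; 'Method','regression' overrides the
% default of growing classification trees.
B = TreeBagger(20,Tbl,'MPG','Method','regression');
```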
B = TreeBagger(NumTrees,Tbl,Formula) creates
an ensemble of NumTrees bagged decision trees, using
the formula Formula to specify the response
and predictor variables in
Tbl. Formula is a character vector in Wilkinson notation. For more information, see Wilkinson Notation.
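As a sketch of the formula syntax (again using the carsmall sample data; the particular predictors are illustrative), a Wilkinson formula names the response on the left of the tilde and the predictors on the right:

```matlab
load carsmall
Tbl = table(Weight,Horsepower,Acceleration,MPG);

% 'MPG ~ Weight + Horsepower' selects MPG as the response and only
% Weight and Horsepower as predictors; Acceleration is ignored.
B = TreeBagger(20,Tbl,'MPG ~ Weight + Horsepower','Method','regression');
```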
B = TreeBagger(NumTrees,Tbl,Y) creates
an ensemble of NumTrees bagged decision trees for predicting the responses in the vector
Y as a
function of the predictors stored in the table
Tbl. Y is an array of response data, and the elements of
Y correspond to the rows of
Tbl. For classification, Y is the set of true class
labels. Labels can be any grouping variable,
that is, a numeric or logical vector, character matrix, cell array
of character vectors, or categorical vector.
TreeBagger converts the labels to a cell array of character vectors for classification. For regression,
Y is a numeric vector.
B = TreeBagger(NumTrees,X,Y) creates an
ensemble of NumTrees bagged decision trees for predicting the response
Y as a function of the
predictors in the numeric matrix of training data,
X. Each row in
X represents an observation, and each
column represents a predictor or feature.
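A minimal sketch of the matrix syntax, using the fisheriris sample data (the choice of 20 trees is illustrative):

```matlab
load fisheriris
X = meas;       % 150-by-4 numeric matrix: one row per observation, one column per predictor
Y = species;    % 150-by-1 cell array of class labels, one per row of X

% With no 'Method' specified, TreeBagger grows classification trees.
B = TreeBagger(20,X,Y);
```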
B = TreeBagger(NumTrees,X,Y,Name,Value) specifies
optional parameters using one or more name-value pair arguments:
'InBagFraction': Fraction of input data to sample with replacement from the input data for growing each new tree. The default value is 1.
'NumPredictorsToSample': Number of variables to select at random for each decision split.
The default is the square root of the number of variables for classification
and one third of the number of variables for regression. Valid values
are 'all' or a positive integer.
'NumPrint': Number of training cycles (grown trees) after which
TreeBagger displays a diagnostic message about training progress. The default is no diagnostic messages.
'MinLeafSize': Minimum number of observations per tree leaf. The default is 1 for classification and 5 for regression.
'Options': A structure that specifies options that govern the computation
when growing the ensemble of decision trees. One option requests that
the computation of decision trees on multiple bootstrap replicates
uses multiple processors, if the Parallel Computing Toolbox™ is
available. Two options specify the random number streams to use in
selecting bootstrap replicates. You can create this argument with
a call to statset.
'Prior': Prior probabilities for each class. Specify as 'Empirical' (the default, the class frequencies in Y), 'Uniform' (equal probabilities for all classes), a numeric vector of probabilities ordered by the class order, or a structure with fields ClassNames and ClassProbs.
If you set values for both 'Prior' and 'Weights', the weights are renormalized to add up to the value of the prior probability in the respective class.
'CategoricalPredictors': Categorical predictors list, specified as the comma-separated
pair consisting of 'CategoricalPredictors' and an index vector, a logical vector, or 'all', identifying the predictors that are categorical.
In addition to the optional
arguments above, this method accepts all optional fitctree and fitrtree arguments,
with the exception of
'MinParent'. Refer to the reference pages for fitctree and
fitrtree for more detail.
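The following sketch combines several of the name-value pairs described above on the fisheriris sample data; the particular values (70% in-bag fraction, 2 predictors per split) are illustrative only:

```matlab
load fisheriris

% Computation options for ensemble growth; set 'UseParallel' to true
% if the Parallel Computing Toolbox is available.
opts = statset('UseParallel',false);

B = TreeBagger(50,meas,species, ...
    'InBagFraction',0.7, ...           % sample 70% of observations (with replacement) per tree
    'NumPredictorsToSample',2, ...     % predictors considered at each decision split
    'MinLeafSize',1, ...               % minimum observations per tree leaf
    'Options',opts);
```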
Load Fisher's iris data set.
load fisheriris
Train a bagged ensemble of classification trees using the data and specifying
50 weak learners. Store which observations are out of bag for each tree.
rng(1); % For reproducibility
BaggedEnsemble = TreeBagger(50,meas,species,'OOBPrediction','On',...
    'Method','classification')
BaggedEnsemble = 

  TreeBagger
Ensemble with 50 bagged decision trees:
                    Training X:              [150x4]
                    Training Y:              [150x1]
                        Method:       classification
                 NumPredictors:                    4
         NumPredictorsToSample:                    2
                   MinLeafSize:                    1
                 InBagFraction:                    1
         SampleWithReplacement:                    1
          ComputeOOBPrediction:                    1
 ComputeOOBPredictorImportance:                    0
                     Proximity:                   []
                    ClassNames:  'setosa'  'versicolor'  'virginica'
BaggedEnsemble is a TreeBagger model object.
BaggedEnsemble.Trees is the property that stores a 50-by-1 cell vector of the trained classification trees (
CompactClassificationTree model objects) that compose the ensemble.
Plot a graph of the first trained classification tree.
view(BaggedEnsemble.Trees{1},'Mode','graph')
TreeBagger grows deep trees.
BaggedEnsemble.OOBIndices is the property that stores the out-of-bag indices as a matrix of logical values.
Plot the out-of-bag error over the number of grown classification trees.
oobErrorBaggedEnsemble = oobError(BaggedEnsemble);
plot(oobErrorBaggedEnsemble)
xlabel 'Number of grown trees';
ylabel 'Out-of-bag classification error';
The out-of-bag error decreases with the number of grown trees.
To label out-of-bag observations, pass BaggedEnsemble to oobPredict.
TreeBagger generates in-bag samples by oversampling
classes with large misclassification costs and undersampling classes
with small misclassification costs. Consequently, out-of-bag samples
have fewer observations from classes with large misclassification
costs and more observations from classes with small misclassification
costs. If you train a classification ensemble using a small data set
and a highly skewed cost matrix, then the number of out-of-bag observations
per class might be very low. Therefore, the estimated out-of-bag error
might have a large variance and might be difficult to interpret. The
same phenomenon can occur for classes with large prior probabilities.
Avoid large estimated out-of-bag error variances by setting a more balanced misclassification cost matrix or a less skewed prior probability vector.
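As a sketch of the second remedy, the call below supplies an explicit, balanced misclassification cost matrix through the 'Cost' name-value pair (which TreeBagger passes through to the underlying tree fitter); the unit off-diagonal costs are illustrative only:

```matlab
load fisheriris

% Balanced cost matrix for the 3 iris classes: every misclassification
% costs 1, correct classification costs 0, so no class is over- or
% undersampled when in-bag samples are drawn.
C = ones(3) - eye(3);

B = TreeBagger(50,meas,species,'OOBPrediction','On','Cost',C);
oobErr = oobError(B);   % out-of-bag error as a function of the number of grown trees
```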