
TreeBagger - Class: TreeBagger

Create ensemble of bagged decision trees

Syntax

B = TreeBagger(ntrees,X,Y)
B = TreeBagger(ntrees,X,Y,'param1',val1,'param2',val2,...)

Description

B = TreeBagger(ntrees,X,Y) creates an ensemble B of ntrees decision trees for predicting response Y as a function of predictors X. By default, TreeBagger builds an ensemble of classification trees. To build an ensemble of regression trees instead, set the optional input argument 'Method' to 'regression'.

X is a numeric matrix of training data. Each row represents an observation and each column represents a predictor or feature. Y is an array of true class labels for classification or numeric function values for regression. True class labels can be a numeric vector, character matrix, vector cell array of strings or categorical vector. TreeBagger converts labels to a cell array of strings for classification.
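The label conversion described above can be illustrated with a minimal sketch. The toy data and the variable names (classNames, labels, numericLabels) are assumptions made for illustration only; the point is that numeric class labels come back as strings.

```matlab
% Hypothetical toy data: two well-separated classes with numeric labels 1 and 2.
X = [randn(50,2); randn(50,2) + 5];
Y = [ones(50,1); 2*ones(50,1)];

% Classification is the default method.
B = TreeBagger(10,X,Y);

% Class labels are stored as strings, not as the original numbers.
classNames = B.ClassNames;       % cell array of strings
labels = predict(B,X(1:3,:));    % predicted labels, also a cell array of strings

% Convert back to numbers if the original numeric form is needed.
numericLabels = str2double(labels);
```

If the numeric form matters downstream, converting with str2double as in the last line is one option; keeping the labels as strings throughout is another.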

B = TreeBagger(ntrees,X,Y,'param1',val1,'param2',val2,...) specifies optional parameter name/value pairs:

'FBoot' - Fraction of input data to sample with replacement from the input data for growing each new tree. Default is 1.
'oobpred' - 'on' to store information on which observations are out of bag for each tree. oobPredict can use this information to compute the predicted class probabilities for each tree in the ensemble. Default is 'off'.
'OOBVarImp' - 'on' to store out-of-bag estimates of feature importance in the ensemble. Default is 'off'. Specifying 'on' also sets the 'oobpred' value to 'on'.
'Method' - Either 'classification' or 'regression'. Regression requires a numeric Y.
'NVarToSample' - Number of variables to select at random for each decision split. Default is the square root of the number of variables for classification and one third of the number of variables for regression. Valid values are 'all' or a positive integer.
'NPrint' - Number of training cycles (grown trees) after which TreeBagger displays a diagnostic message showing training progress. Default is no diagnostic messages.
'MinLeaf' - Minimum number of observations per tree leaf. Default is 1 for classification and 5 for regression.
'Options' - A struct that specifies options that govern the computation when growing the ensemble of decision trees. One option requests that the computation of decision trees on multiple bootstrap replicates uses multiple processors, if the Parallel Computing Toolbox is available. Two options specify the random number streams to use in selecting bootstrap replicates. You can create this argument with a call to statset. You can retrieve values of the individual fields with a call to statget. Applicable statset parameters are:
  • 'UseParallel' - If 'always' and a matlabpool of the Parallel Computing Toolbox is open, compute decision trees drawn on separate bootstrap replicates in parallel. If the Parallel Computing Toolbox is not installed, or a matlabpool is not open, computation occurs in serial mode. Default is 'never', or serial computation.

  • 'UseSubstreams' - If 'always', select each bootstrap replicate using a separate Substream of the random number generator (also known as a Stream). This option is available only with RandStream types that support Substreams. Default is 'never': do not use a different Substream to compute each bootstrap replicate.

  • 'Streams' - An object of the RandStream class, or a cell array of RandStream objects. Default is an empty cell array. If you do not supply a value for this parameter, TreeBagger uses the default RandStream on each MATLAB executable in selecting bootstrap replicates. Otherwise, TreeBagger selects bootstrap replicates using the supplied RandStream object(s). If you select 'UseSubstreams', the Streams parameter, if present, must be a scalar RandStream object. If you do not select 'UseSubstreams', the Streams parameter, if present, must match the number of processors used for the computation: for serial computation it must be a scalar, and for distributed computation ('UseParallel' is 'always' and a matlabpool is open) it must be a cell array of the same length as the matlabpool size. In the distributed case, each element of the cell array supplies the random number generator for bootstrap sampling on one of the parallel workers.
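As a rough sketch of how the 'Options' struct might be assembled for serial computation, the example below builds it with statset and passes it to TreeBagger. The generator type 'mlfg6331_64' and the seed value are arbitrary choices for illustration, not requirements from this page; the 'always' setting follows the values listed above.

```matlab
load fisheriris

% A generator type that supports Substreams, with an arbitrary fixed seed.
s = RandStream('mlfg6331_64','Seed',1);

% Serial computation with Substreams, so 'Streams' is a single RandStream object.
opts = statset('UseSubstreams','always','Streams',s);

% Grow 20 trees, selecting bootstrap replicates from the supplied stream.
b = TreeBagger(20,meas,species,'Options',opts);

% Individual fields of the options struct can be read back with statget.
useSub = statget(opts,'UseSubstreams');
```

For parallel computation without Substreams, the same pattern would apply with 'UseParallel' set to 'always' and 'Streams' given as a cell array holding one RandStream object per matlabpool worker.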

In addition to the optional arguments above, this method accepts all optional classregtree arguments with the exception of 'minparent'. Refer to the documentation for classregtree for more detail.
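The name/value syntax can be sketched with a small regression ensemble. The synthetic data and the particular parameter values below are assumptions chosen only to show how several pairs combine in one call.

```matlab
% Hypothetical synthetic data: the response depends on the first two predictors.
X = rand(200,4);
Y = 3*X(:,1) - 2*X(:,2) + 0.1*randn(200,1);

B = TreeBagger(30,X,Y, ...
    'Method','regression', ...   % Y is numeric, as regression requires
    'MinLeaf',10, ...            % larger leaves than the regression default of 5
    'NVarToSample','all', ...    % consider every predictor at each split
    'oobpred','on');             % keep out-of-bag information

Yfit = predict(B,X);   % numeric predictions, one per row of X
err = oobError(B);     % out-of-bag error as trees are added to the ensemble
```

Setting 'NVarToSample' to 'all' removes the random choice of predictors at each split, so the trees differ only through their bootstrap samples.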

Examples

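% Load Fisher's iris data: meas (150x4 measurements) and species (class labels)
load fisheriris
% Grow 50 classification trees and keep out-of-bag information
b = TreeBagger(50,meas,species,'oobpred','on')
% Plot the out-of-bag classification error against the number of grown trees
plot(oobError(b))
xlabel('number of grown trees')
ylabel('out-of-bag classification error')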

returns

b = 

Ensemble with 50 bagged decision trees:
               Training X:              [150x4]
               Training Y:              [150x1]
                   Method:       classification
                    Nvars:                    4
             NVarToSample:                    2
                  MinLeaf:                    1
                    FBoot:                    1
    SampleWithReplacement:                    1
     ComputeOOBPrediction:                    1
         ComputeOOBVarImp:                    0
                Proximity:                   []
                    Prune:                    0
              MergeLeaves:                    0
                 TreeArgs:
               ClassNames:'setosa' 'versicolor' 'virginica'
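Once the ensemble is grown, it can be queried. The following is a minimal sketch that reuses three training rows as stand-in "new" observations, which is an assumption made for brevity; the variable names finalErr, labels and scores are illustrative.

```matlab
load fisheriris
b = TreeBagger(50,meas,species,'oobpred','on');

% Out-of-bag error for ensembles of growing size; the last element uses all 50 trees.
err = oobError(b);
finalErr = err(end);

% Predict classes for one observation from each species.
[labels,scores] = predict(b,meas([1 51 101],:));
% labels is a cell array of strings; scores has one column per class in b.ClassNames.
```

The out-of-bag error is usually a fairer estimate of generalization than the error on the training rows, because each observation is scored only by trees that did not see it.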

See Also

Regression and Classification by Bagging Decision Trees, Grouped Data

classregtree, CompactTreeBagger

  



© 1984-2009 The MathWorks, Inc.