Create decision tree template
t = templateTree returns a default decision tree learner template suitable for training ensembles or error-correcting output code (ECOC) multiclass models. Specify t as a learner using:

fitensemble for classification or regression ensembles

fitcecoc for ECOC model classification
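For orientation, here is a minimal usage sketch of both routes; the Fisher iris data and the 'AdaBoostM2' method with 50 learning cycles are illustrative choices, not requirements.

load fisheriris
t = templateTree;                                      % default template
EnsMdl = fitensemble(meas,species,'AdaBoostM2',50,t);  % ensemble learner
EcocMdl = fitcecoc(meas,species,'Learners',t);         % ECOC binary learner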
If you specify a default decision tree template, then the software uses default values for all input arguments during training. It is good practice to specify the type of decision tree, e.g., for a classification tree template, specify 'Type','classification'. If you specify the type of decision tree and display t in the Command Window, then all options except Type appear empty ([]).
t = templateTree(Name,Value) creates a template with additional options specified by one or more name-value pair arguments. For example, you can specify the algorithm used to find the best split on a categorical predictor, the split criterion, or the number of predictors selected for each split.
If you display t in the Command Window, then all options appear empty ([]), except those that you specify using name-value pair arguments. During training, the software uses default values for empty options.
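For example, creating and displaying a typed template shows only the options you set; the MaxNumSplits value here is arbitrary.

t = templateTree('Type','classification','MaxNumSplits',10)
% All options other than Type and MaxNumSplits display as [] and are
% filled with their default values during training.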
Create a decision tree template with surrogate splits, and use the template to train an ensemble using sample data.
Load Fisher's iris data set.
load fisheriris
Create a decision tree template with surrogate splits.
t = templateTree('Surrogate','on')
t = 

Fit template for Tree.
    Surrogate: 'on'
All options of the template object are empty except for Surrogate. When you pass t to the training function, the software fills in the empty options with their respective default values.
Specify t as a weak learner for a classification ensemble.
Mdl = fitensemble(meas,species,'AdaBoostM2',100,t)
Mdl = 

  classreg.learning.classif.ClassificationEnsemble
             ResponseName: 'Y'
    CategoricalPredictors: []
               ClassNames: {'setosa'  'versicolor'  'virginica'}
           ScoreTransform: 'none'
          NumObservations: 150
               NumTrained: 100
                   Method: 'AdaBoostM2'
             LearnerNames: {'Tree'}
     ReasonForTermination: 'Terminated normally after completing the reque...'
                  FitInfo: [100×1 double]
       FitInfoDescription: {2×1 cell}
Display the in-sample (resubstitution) misclassification error.
L = resubLoss(Mdl)
L = 0.0333
Use a trained, boosted regression tree ensemble to predict the fuel economy of a car. Choose the number of cylinders, volume displaced by the cylinders, horsepower, and weight as predictors. Then, train an ensemble using fewer predictors and compare its in-sample predictive accuracy against the first ensemble.
Load the carsmall data set. Store the training data in a table.
load carsmall
Tbl = table(Cylinders,Displacement,Horsepower,Weight,MPG);
Specify a regression tree template that uses surrogate splits to improve predictive accuracy in the presence of NaN values.
t = templateTree('Surrogate','on');
Train the regression tree ensemble using LSBoost and 100 learning cycles.
Mdl1 = fitensemble(Tbl,'MPG','LSBoost',100,t);
Mdl1 is a trained RegressionEnsemble regression ensemble. Because MPG is a variable in the MATLAB® Workspace, you can obtain the same result by entering

Mdl1 = fitensemble(Tbl,MPG,'LSBoost',100,t);
Use the trained regression ensemble to predict the fuel economy for a four-cylinder car with a 200-cubic-inch displacement, 150 horsepower, and a weight of 3000 lbs.
predMPG = predict(Mdl1,[4 200 150 3000])
predMPG = 22.8462
The average fuel economy of a car with these specifications is 21.78 mpg.
Train a new ensemble using all predictors in Tbl except Displacement.
formula = 'MPG ~ Cylinders + Horsepower + Weight';
Mdl2 = fitensemble(Tbl,formula,'LSBoost',100,t);
Compare the resubstitution MSEs between Mdl1 and Mdl2.
mse1 = resubLoss(Mdl1)
mse2 = resubLoss(Mdl2)
mse1 = 6.4721

mse2 = 7.8599
The in-sample MSE for the ensemble that trains on all predictors is lower.
You can control the depth of the trees in an ensemble of decision trees, or in an ECOC model containing decision tree binary learners, using the MaxNumSplits, MinLeafSize, or MinParentSize name-value pair arguments.
When bagging decision trees, fitensemble grows deep decision trees by default. You can grow shallower trees to reduce model complexity or computation time.

When boosting decision trees, fitensemble grows stumps (a tree with one split) by default. You can grow deeper trees for better accuracy.
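The following sketch shows both adjustments on the carsmall data also used below; the split limits of 8 and 16 and the 50 learning cycles are arbitrary illustrative values.

load carsmall
X = [Horsepower Weight];
Y = MPG;
tShallow = templateTree('MaxNumSplits',8);  % shallower than default bagged trees
MdlBag = fitensemble(X,Y,'Bag',50,tShallow,'Type','regression');
tDeeper = templateTree('MaxNumSplits',16);  % deeper than default boosting stumps
MdlBoost = fitensemble(X,Y,'LSBoost',50,tDeeper);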
Load the carsmall data set. Specify the variables Acceleration, Displacement, Horsepower, and Weight as predictors, and MPG as the response.
load carsmall
X = [Acceleration Displacement Horsepower Weight];
Y = MPG;
The default values of the tree depth controllers for boosting regression trees are:

1 for MaxNumSplits. This option grows stumps.

5 for MinLeafSize

10 for MinParentSize
To search for the optimal number of splits:

Train a set of ensembles. Exponentially increase the maximum number of splits for subsequent ensembles from stump to at most n - 1 splits. Also, decrease the learning rate for each ensemble from 1 to 0.1.

Cross-validate the ensembles.

Estimate the cross-validated mean squared error (MSE) for each ensemble.

Compare the cross-validated MSEs. The ensemble with the lowest one performs the best, and indicates the optimal maximum number of splits, number of trees, and learning rate for the data set.
Grow and cross-validate a deep regression tree and a stump. Specify to use surrogate splits because the data contain missing values. These trees serve as benchmarks.
MdlDeep = fitrtree(X,Y,'CrossVal','on','MergeLeaves','off',...
    'MinParentSize',1,'Surrogate','on');
MdlStump = fitrtree(X,Y,'MaxNumSplits',1,'CrossVal','on','Surrogate','on');
Train the boosting ensembles using 150 regression trees. Cross-validate the ensembles using 5-fold cross-validation. Vary the maximum number of splits using the values in the sequence 2^0, 2^1, ..., 2^m, where m is such that 2^m is no greater than n - 1. For each variant, adjust the learning rate to each value in the set {0.1, 0.25, 0.5, 1}.
n = size(X,1);
m = floor(log2(n - 1));
lr = [0.1 0.25 0.5 1];
maxNumSplits = 2.^(0:m);
numTrees = 150;
Mdl = cell(numel(maxNumSplits),numel(lr));
rng(1); % For reproducibility
for k = 1:numel(lr)
    for j = 1:numel(maxNumSplits)
        t = templateTree('MaxNumSplits',maxNumSplits(j),'Surrogate','on');
        Mdl{j,k} = fitensemble(X,Y,'LSBoost',numTrees,t,...
            'Type','regression','KFold',5,'LearnRate',lr(k));
    end
end
Compute the cross-validated MSE for each ensemble.
kflAll = @(x)kfoldLoss(x,'Mode','cumulative');
errorCell = cellfun(kflAll,Mdl,'Uniform',false);
error = reshape(cell2mat(errorCell),[numTrees numel(maxNumSplits) numel(lr)]);
errorDeep = kfoldLoss(MdlDeep);
errorStump = kfoldLoss(MdlStump);
Plot how the cross-validated MSE behaves as the number of trees in the ensemble increases for a few of the ensembles, the deep tree, and the stump. Plot the curves with respect to learning rate in the same plot, and plot separate plots for varying tree complexities. Choose a subset of tree complexity levels.
mnsPlot = [1 round(numel(maxNumSplits)/2) numel(maxNumSplits)];
figure;
for k = 1:3
    subplot(2,2,k);
    plot(squeeze(error(:,mnsPlot(k),:)),'LineWidth',2);
    axis tight;
    hold on;
    h = gca;
    plot(h.XLim,[errorDeep errorDeep],'-.b','LineWidth',2);
    plot(h.XLim,[errorStump errorStump],'-.r','LineWidth',2);
    plot(h.XLim,min(min(error(:,mnsPlot(k),:))).*[1 1],'--k');
    h.YLim = [10 50];
    xlabel 'Number of trees';
    ylabel 'Cross-validated MSE';
    title(sprintf('MaxNumSplits = %0.3g', maxNumSplits(mnsPlot(k))));
    hold off;
end
hL = legend([cellstr(num2str(lr','Learning Rate = %0.2f'));...
    'Deep Tree';'Stump';'Min. MSE']);
hL.Position(1) = 0.6;
Each curve contains a minimum cross-validated MSE occurring at the optimal number of trees in the ensemble.
Identify the maximum number of splits, number of trees, and learning rate that yield the lowest MSE overall.
[minErr,minErrIdxLin] = min(error(:));
[idxNumTrees,idxMNS,idxLR] = ind2sub(size(error),minErrIdxLin);
fprintf('\nMin. MSE = %0.5f',minErr)
fprintf('\nOptimal Parameter Values:\nNum. Trees = %d',idxNumTrees);
fprintf('\nMaxNumSplits = %d\nLearning Rate = %0.2f\n',...
    maxNumSplits(idxMNS),lr(idxLR))
Min. MSE = 18.42979
Optimal Parameter Values:
Num. Trees = 1
MaxNumSplits = 4
Learning Rate = 1.00
For a different approach to optimizing this ensemble, see Optimize a Boosted Regression Ensemble.
This example requires a Parallel Computing Toolbox™ license.
Load the carsmall data set. Consider a model that predicts the mean fuel economy of a car given its acceleration, number of cylinders, engine displacement, horsepower, manufacturer, model year, and weight. Consider Cylinders, Mfg, and Model_Year as categorical variables.
load carsmall
Cylinders = categorical(Cylinders);
Mfg = categorical(cellstr(Mfg));
Model_Year = categorical(Model_Year);
X = table(Acceleration,Cylinders,Displacement,Horsepower,Mfg,...
    Model_Year,Weight,MPG);
rng('default'); % For reproducibility
Display the number of categories represented in the categorical variables.
numCylinders = numel(categories(Cylinders))
numMfg = numel(categories(Mfg))
numModelYear = numel(categories(Model_Year))
numCylinders = 3

numMfg = 28

numModelYear = 3
Because there are only 3 categories in Cylinders and Model_Year, the standard CART predictor-splitting algorithm prefers splitting a continuous predictor over these two variables.
Train a random forest of 500 regression trees using the entire data set. To grow unbiased trees, specify usage of the curvature test for splitting predictors. Because there are missing values in the data, specify usage of surrogate splits.
t = templateTree('PredictorSelection','curvature','Surrogate','on');
Mdl = fitrensemble(X,'MPG','Method','bag','NumLearningCycles',500,...
    'Learners',t);
Estimate predictor importance measures by permuting out-of-bag observations. Perform calculations in parallel. Compare the estimates using a bar graph.
options = statset('UseParallel',true);
imp = oobPermutedPredictorImportance(Mdl,'Options',options);
figure;
bar(imp);
title('Out-of-Bag Permuted Predictor Importance Estimates');
ylabel('Estimates');
xlabel('Predictors');
h = gca;
h.XTickLabel = Mdl.PredictorNames;
h.XTickLabelRotation = 45;
h.TickLabelInterpreter = 'none';
Starting parallel pool (parpool) using the 'local' profile ... connected to 4 workers.
In this case, Model_Year is the most important predictor, followed by Cylinders. Compare these results to the results in Estimate Importance of Predictors.
Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.
Example: 'Surrogate','on','NVarToSample','all' specifies a template with surrogate splits, and uses all available predictors at each split.

'MaxNumSplits' — Maximal number of decision splits

Maximal number of decision splits (or branch nodes) per tree, specified as the comma-separated pair consisting of 'MaxNumSplits' and a positive integer. templateTree splits MaxNumSplits or fewer branch nodes. For more details on splitting behavior, see Algorithms.
For bagged decision trees and decision tree binary learners in ECOC models, the default is size(X,1) - 1. For boosted decision trees, the default is 1.
Example: 'MaxNumSplits',5

Data Types: single | double
'MergeLeaves' — Leaf merge flag
'off' | 'on'

Leaf merge flag, specified as the comma-separated pair consisting of 'MergeLeaves' and either 'on' or 'off'.

When 'on', the decision tree merges leaves that originate from the same parent node, and that provide a sum of risk values greater than or equal to the risk associated with the parent node. When 'off', the decision tree does not merge leaves.

For ensemble models, the default is 'off'. For decision tree binary learners in ECOC models, the default is 'on'.
Example: 'MergeLeaves','on'
'MinLeafSize' — Minimum observations per leaf

Minimum observations per leaf, specified as the comma-separated pair consisting of 'MinLeafSize' and a positive integer value. Each leaf has at least MinLeafSize observations. If you supply both MinParentSize and MinLeafSize, the decision tree uses the setting that gives larger leaves: MinParentSize = max(MinParentSize,2*MinLeafSize).

For boosted and bagged decision trees, the defaults are 1 for classification and 5 for regression. For decision tree binary learners in ECOC models, the default is 1.
Example: 'MinLeafSize',2
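A small sketch of this interaction; the values 8 and 10 are arbitrary:

t = templateTree('MinLeafSize',8,'MinParentSize',10);
% The effective minimum parent size during training is
% max(10,2*8) = 16, so every split can still yield leaves of size 8.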
'MinParentSize' — Minimum observations per branch node

Minimum observations per branch node, specified as the comma-separated pair consisting of 'MinParentSize' and a positive integer value. Each branch node in the tree has at least MinParentSize observations. If you supply both MinParentSize and MinLeafSize, the decision tree uses the setting that gives larger leaves: MinParentSize = max(MinParentSize,2*MinLeafSize).

For boosted and bagged decision trees, the defaults are 2 for classification and 10 for regression. For decision tree binary learners in ECOC models, the default is 10.
Example: 'MinParentSize',4
'NumVariablesToSample' — Number of predictors to select at random for each split
'all' | positive integer value

Number of predictors to select at random for each split, specified as the comma-separated pair consisting of 'NumVariablesToSample' and a positive integer value. Alternatively, you can specify 'all' to use all available predictors.

For boosted decision trees and decision tree binary learners in ECOC models, the default is 'all'. The default for bagged decision trees is the square root of the number of predictors for classification, or one third of the number of predictors for regression.
Example: 'NumVariablesToSample',3
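As a hedged sketch, the bagging defaults expressed as formulas; the exact rounding the software applies is an assumption here:

p = 7;                          % number of predictors (example value)
nClass = max(1,floor(sqrt(p)))  % classification default (rounding assumed)
nReg = max(1,floor(p/3))        % regression default (rounding assumed)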
'PredictorSelection' — Algorithm used to select the best split predictor
'allsplits' (default) | 'curvature' | 'interaction-curvature'

Algorithm used to select the best split predictor at each node, specified as the comma-separated pair consisting of 'PredictorSelection' and a value in this table.

'allsplits' — Standard CART. Selects the split predictor that maximizes the split-criterion gain over all possible splits of all predictors [1].

'curvature' — Curvature test. Selects the split predictor that minimizes the p-value of chi-square tests of independence between each predictor and the response [3][4]. Training speed is similar to standard CART.

'interaction-curvature' — Interaction test. Chooses the split predictor that minimizes the p-value of chi-square tests of independence between each predictor and the response, and that minimizes the p-value of a chi-square test of independence between each pair of predictors and response [3]. Training speed can be slower than standard CART.
For 'curvature' and 'interaction-curvature', if all tests yield p-values greater than 0.05, then MATLAB® stops splitting nodes.
Example: 'PredictorSelection','curvature'
Data Types: char
'Prune' — Flag to estimate optimal sequence of pruned subtrees
'off' (default) | 'on'

Flag to estimate the optimal sequence of pruned subtrees, specified as the comma-separated pair consisting of 'Prune' and 'on' or 'off'.

If Prune is 'on', then the software trains the classification tree learners without pruning them, but estimates the optimal sequence of pruned subtrees for each learner in the ensemble or each decision tree binary learner in ECOC models. Otherwise, the software trains the classification tree learners without estimating the optimal sequence of pruned subtrees.

For ensembles, the default is 'off'. For decision tree binary learners in ECOC models, the default is 'on'.
Example: 'Prune','on'
'PruneCriterion' — Pruning criterion
'error' | 'impurity' | 'mse'

Pruning criterion, specified as the comma-separated pair consisting of 'PruneCriterion' and a pruning criterion valid for the tree type.

For classification trees, you can specify 'error' (default) or 'impurity'. For regression, you can only specify 'mse' (default).
Example: 'PruneCriterion','impurity'
'SplitCriterion' — Split criterion
'gdi' | 'twoing' | 'deviance' | 'mse'

Split criterion, specified as the comma-separated pair consisting of 'SplitCriterion' and a split criterion valid for the tree type.

For classification trees:

'gdi' for Gini's diversity index (default)

'twoing' for the twoing rule

'deviance' for maximum deviance reduction (also known as cross entropy)

For regression trees:

'mse' for mean squared error (default)
Example: 'SplitCriterion','deviance'
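A quick usage sketch with the Fisher iris data; the twoing rule and the 50 learning cycles are arbitrary choices:

load fisheriris
t = templateTree('SplitCriterion','twoing');
Mdl = fitensemble(meas,species,'Bag',50,t,'Type','classification');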
'Surrogate' — Surrogate decision splits
'off' (default) | 'on' | 'all' | positive integer value

Surrogate decision splits flag, specified as the comma-separated pair consisting of 'Surrogate' and one of 'off', 'on', 'all', or a positive integer value.

When 'off', the decision tree does not find surrogate splits at the branch nodes.

When 'on', the decision tree finds at most 10 surrogate splits at each branch node.

When set to 'all', the decision tree finds all surrogate splits at each branch node. The 'all' setting can consume considerable time and memory.

When set to a positive integer value, the decision tree finds at most the specified number of surrogate splits at each branch node.

Use surrogate splits to improve the accuracy of predictions for data with missing values. This setting also lets you compute measures of predictive association between predictors.
Example: 'Surrogate','on'
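A hedged sketch of the predictive-association point for a single grown tree; using carsmall and only two predictors here is illustrative:

load carsmall
Tree = fitrtree([Horsepower Weight],MPG,'Surrogate','on');
assoc = surrogateAssociation(Tree)  % predictive measures of association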
'AlgorithmForCategorical' — Algorithm for best categorical predictor split
'Exact' | 'PullLeft' | 'PCA' | 'OVAbyClass'

Algorithm to find the best split on a categorical predictor for data with C categories and K ≥ 3 classes, specified as the comma-separated pair consisting of 'AlgorithmForCategorical' and one of the following.
'Exact' — Consider all 2^(C-1) - 1 combinations.

'PullLeft' — Start with all C categories on the right branch. Consider moving each category to the left branch as it achieves the minimum impurity for the K classes among the remaining categories. From this sequence, choose the split that has the lowest impurity.

'PCA' — Compute a score for each category using the inner product between the first principal component of a weighted covariance matrix (of the centered class probability matrix) and the vector of class probabilities for that category. Sort the scores in ascending order, and consider all C - 1 splits.

'OVAbyClass' — Start with all C categories on the right branch. For each class, order the categories based on their probability for that class. For the first class, consider moving each category to the left branch in order, recording the impurity criterion at each move. Repeat for the remaining classes. From this sequence, choose the split that has the minimum impurity.
ClassificationTree selects the optimal subset of algorithms for each split using the known number of classes and levels of a categorical predictor. For two classes, ClassificationTree always performs the exact search. Use the 'AlgorithmForCategorical' name-value pair argument to specify a particular algorithm.
Example: 'AlgorithmForCategorical','PCA'
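A usage sketch with a synthetic categorical predictor and a three-class response; the random Grp variable, the table names, and the 'PullLeft' choice are all illustrative assumptions:

load fisheriris
rng(0);                             % for reproducibility
Grp = categorical(randi(5,150,1));  % hypothetical 5-level categorical predictor
Tbl = table(meas(:,3),meas(:,4),Grp,species,...
    'VariableNames',{'PL','PW','Grp','Species'});
t = templateTree('AlgorithmForCategorical','PullLeft');
Mdl = fitensemble(Tbl,'Species','AdaBoostM2',50,t);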
'MaxNumCategories' — Maximum category levels in split node
10 (default) | nonnegative scalar value

Maximum category levels in the split node, specified as the comma-separated pair consisting of 'MaxNumCategories' and a nonnegative scalar value. ClassificationTree splits a categorical predictor using the exact search algorithm if the predictor has at most MaxNumCategories levels in the split node. Otherwise, ClassificationTree finds the best categorical split using one of the inexact algorithms. Note that passing a small value can lead to loss of accuracy, and passing a large value can increase computation time and memory overload.
Example: 'MaxNumCategories',8
'QuadraticErrorTolerance' — Quadratic error tolerance
1e-6 (default) | nonnegative scalar value

Quadratic error tolerance per node, specified as the comma-separated pair consisting of 'QuadraticErrorTolerance' and a nonnegative scalar value. RegressionTree stops splitting nodes when the quadratic error per node drops below QuadraticErrorTolerance*QED, where QED is the quadratic error for the entire data computed before the decision tree is grown. QED = NORM(Y - YBAR), where YBAR is estimated as the average of the input array Y.
Example: 'QuadraticErrorTolerance',1e-4
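A hedged sketch of the tolerance scale using carsmall; the NaN filtering and the 1e-4 value are illustrative:

load carsmall
Y = MPG(~isnan(MPG));
QED = norm(Y - mean(Y));  % quadratic error of the entire data, as above
t = templateTree('QuadraticErrorTolerance',1e-4);
% During training, a node stops splitting once its quadratic error
% drops below 1e-4*QED.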
t — Decision tree template for classification or regression

Decision tree template for classification or regression suitable for training ensembles or error-correcting output code (ECOC) multiclass models, returned as a template object. Pass t to fitensemble or fitcecoc to specify how to create the decision tree for the ensemble or ECOC model, respectively.

If you display t in the Command Window, then all unspecified options appear empty ([]). However, the software replaces empty options with their corresponding default values during training.
To accommodate MaxNumSplits, the software splits all nodes in the current layer, and then counts the number of branch nodes. A layer is the set of nodes that are equidistant from the root node. If the number of branch nodes exceeds MaxNumSplits, then the software follows this procedure.

Determine how many branch nodes in the current layer need to be unsplit so that there would be at most MaxNumSplits branch nodes.

Sort the branch nodes by their impurity gains.

Unsplit the desired number of least successful branches.

Return the decision tree grown so far.
This procedure aims at producing maximally balanced trees.
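A small check of this cap, using the IsBranchNode property of a grown tree; the data set and the split limit of 5 are arbitrary:

load fisheriris
Tree = fitctree(meas,species,'MaxNumSplits',5);
numBranchNodes = sum(Tree.IsBranchNode)  % never exceeds 5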
The software splits branch nodes layer by layer until at least one of these events occurs.

There are MaxNumSplits + 1 branch nodes.

A proposed split causes the number of observations in at least one branch node to be fewer than MinParentSize.

A proposed split causes the number of observations in at least one leaf node to be fewer than MinLeafSize.

The algorithm cannot find a good split within a layer (i.e., the pruning criterion (see PruneCriterion) does not improve for all proposed splits in a layer). A special case of this event is when all nodes are pure (i.e., all observations in the node have the same class).

For values 'curvature' or 'interaction-curvature' of PredictorSelection, all tests yield p-values greater than 0.05.
MaxNumSplits and MinLeafSize do not affect splitting at their default values. Therefore, if you set 'MaxNumSplits', then splitting might stop due to the value of MinParentSize before MaxNumSplits splits occur.
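A quick illustration of this interplay (the parameter values are arbitrary): with 150 observations and a MinParentSize of 60, growth stops well before 20 splits occur.

load fisheriris
Tree = fitctree(meas,species,'MaxNumSplits',20,'MinParentSize',60);
sum(Tree.IsBranchNode)  % far fewer than 20 branch nodes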
For details on selecting split predictors and nodesplitting algorithms when growing decision trees, see Algorithms for classification trees and Algorithms for regression trees.
[1] Breiman, L., J. Friedman, R. Olshen, and C. Stone. Classification and Regression Trees. Boca Raton, FL: CRC Press, 1984.
[2] Coppersmith, D., S. J. Hong, and J. R. M. Hosking. "Partitioning Nominal Attributes in Decision Trees." Data Mining and Knowledge Discovery, Vol. 3, 1999, pp. 197–217.
[3] Loh, W.Y. "Regression Trees with Unbiased Variable Selection and Interaction Detection." Statistica Sinica, Vol. 12, 2002, pp. 361–386.
[4] Loh, W.Y. and Y.S. Shih. "Split Selection Methods for Classification Trees." Statistica Sinica, Vol. 7, 1997, pp. 815–840.
ClassificationTree | fitcecoc | fitctree | fitensemble | RegressionTree