
templateTree

Create decision tree template

Syntax

t = templateTree
t = templateTree(Name,Value)

Description

t = templateTree returns a default decision tree learner template suitable for training ensembles or error-correcting output code (ECOC) multiclass models. Specify t as a learner in fitensemble or fitcecoc.

If you specify a default decision tree template, then the software uses default values for all input arguments during training. It is good practice to specify the type of decision tree, e.g., for a classification tree template, specify 'Type','classification'. If you specify the type of decision tree and display t in the Command Window, then all options except Type appear empty ([]).
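For example, a minimal sketch of this practice (the variable names and the fitting call in the comment are illustrative, not part of the original example set):

t = templateTree('Type','classification')   % classification tree template
% Displaying t shows only Type; every other option is [] and receives its
% default value during training, e.g., when you pass t to a fitting
% function such as fitcecoc(X,Y,'Learners',t), assuming a predictor
% matrix X and class labels Y are in the workspace.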


t = templateTree(Name,Value) creates a template with additional options specified by one or more name-value pair arguments.

For example, you can specify the algorithm used to find the best split on a categorical predictor, the split criterion, or the number of predictors selected for each split.

If you display t in the Command Window, then all options appear empty ([]), except those that you specify using name-value pair arguments. During training, the software uses default values for empty options.

Examples


Create a Classification Template with Surrogate Splits

Create a decision tree template with surrogate splits, and use the template to train an ensemble using sample data.

Load Fisher's iris data set.

load fisheriris

Create a decision tree template with surrogate splits.

t = templateTree('Surrogate','on')
t = 

Fit template for Tree.
    Surrogate: 'on'

All options of the template object are empty except for Surrogate. When you pass t to the training function, the software fills in the empty options with their respective default values.

Specify t as a weak learner for a classification ensemble.

Mdl = fitensemble(meas,species,'AdaBoostM2',100,t)
Mdl = 

  classreg.learning.classif.ClassificationEnsemble
          PredictorNames: {'x1'  'x2'  'x3'  'x4'}
            ResponseName: 'Y'
              ClassNames: {'setosa'  'versicolor'  'virginica'}
          ScoreTransform: 'none'
         NumObservations: 150
              NumTrained: 100
                  Method: 'AdaBoostM2'
            LearnerNames: {'Tree'}
    ReasonForTermination: 'Terminated normally after completing the reques...'
                 FitInfo: [100x1 double]
      FitInfoDescription: {2x1 cell}


Display the in-sample (resubstitution) misclassification error.

L = resubLoss(Mdl)
L =

    0.0333

Train a Regression Ensemble

Use a trained, boosted regression tree ensemble to predict the fuel economy of a car. Choose the number of cylinders, volume displaced by the cylinders, horsepower, and weight as predictors.

Load the carsmall data set. Set the predictors to X.

load carsmall
X = [Cylinders,Displacement,Horsepower,Weight];
xnames = {'Cylinders','Displacement','Horsepower','Weight'};

Specify a regression tree template that uses surrogate splits to improve predictive accuracy in the presence of NaN values.

RegTreeTemp = templateTree('Surrogate','on');

Train the regression tree ensemble using LSBoost and 100 learning cycles.

RegTreeEns = fitensemble(X,MPG,'LSBoost',100,RegTreeTemp,...
    'PredictorNames',xnames);

RegTreeEns is a trained RegressionEnsemble regression ensemble.

Use the trained regression ensemble to predict the fuel economy for a four-cylinder car with a 200-cubic inch displacement, 150 horsepower, and weighing 3000 lbs.

predMPG = predict(RegTreeEns,[4 200 150 3000])
predMPG =

   22.6290

The average fuel economy of a car with these specifications is 21.78 mpg.

Find the Optimal Number of Splits and Trees for an Ensemble

You can control the depth of the trees in an ensemble of decision trees, or of the decision tree binary learners in an ECOC model, by using the MaxNumSplits, MinLeafSize, or MinParentSize name-value pair arguments.

  • When bagging decision trees, fitensemble grows deep decision trees by default. You can grow shallower trees to reduce model complexity or computation time (see the sketch after this list).

  • When boosting decision trees, fitensemble grows stumps (a tree with one split) by default. You can grow deeper trees for better accuracy.
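For instance, a minimal sketch contrasting the two cases (assuming a numeric predictor matrix X and response vector Y are already in the workspace; the variable names are illustrative):

% Shallower bagged trees (bagging grows deep trees by default)
tBag = templateTree('MaxNumSplits',10);
BagEns = fitensemble(X,Y,'Bag',100,tBag,'Type','regression');

% Deeper boosted trees (boosting grows stumps by default)
tBoost = templateTree('MaxNumSplits',10);
BoostEns = fitensemble(X,Y,'LSBoost',100,tBoost);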

Load the carsmall data set. Specify the variables Acceleration, Displacement, Horsepower, and Weight as predictors, and MPG as the response.

load carsmall
X = [Acceleration Displacement Horsepower Weight];
Y = MPG;

The default values of the tree depth controllers for boosting regression trees are:

  • 1 for MaxNumSplits. This option grows stumps.

  • 5 for MinLeafSize

  • 10 for MinParentSize

To search for the optimal number of splits:

  1. Train a set of ensembles. Exponentially increase the maximum number of splits for subsequent ensembles from stump to at most n - 1 splits. Also, decrease the learning rate for each ensemble from 1 to 0.1.

  2. Cross validate the ensembles.

  3. Estimate the cross-validated mean-squared error (MSE) for each ensemble.

  4. Compare the cross-validated MSEs. The ensemble with the lowest one performs the best, and indicates the optimal maximum number of splits, number of trees, and learning rate for the data set.

Grow and cross validate a deep regression tree and a stump. Specify to use surrogate splits because the data contain missing values. These serve as benchmarks.

MdlDeep = fitrtree(X,Y,'CrossVal','on','MergeLeaves','off',...
    'MinParentSize',1,'Surrogate','on');
MdlStump = fitrtree(X,Y,'MaxNumSplits',1,'CrossVal','on','Surrogate','on');

Train the boosting ensembles using 250 regression trees. Cross validate the ensembles using 10-fold cross validation. Vary the maximum number of splits using the values in the sequence $\{2^0, 2^1,...,2^m\}$, where m is such that $2^m$ is no greater than n - 1. For each variant, adjust the learning rate to each value in the set {0.1, 0.25, 0.5, 1}.

n = size(X,1);
m = floor(log2(n - 1));
lr = [0.1 0.25 0.5 1];
maxNumSplits = 2.^(0:m);
numTrees = 250;
Mdl = cell(numel(maxNumSplits),numel(lr));
rng(1); % For reproducibility
for k = 1:numel(lr)
    for j = 1:numel(maxNumSplits)
        t = templateTree('MaxNumSplits',maxNumSplits(j),'Surrogate','on');
        Mdl{j,k} = fitensemble(X,Y,'LSBoost',numTrees,t,...
            'Type','regression','CrossVal','on','LearnRate',lr(k));
    end
end

Compute the cross-validated MSE for each ensemble.

kflAll = @(x)kfoldLoss(x,'Mode','cumulative');
errorCell = cellfun(kflAll,Mdl,'Uniform',false);
error = reshape(cell2mat(errorCell),[numTrees numel(maxNumSplits) numel(lr)]);
errorDeep = kfoldLoss(MdlDeep);
errorStump = kfoldLoss(MdlStump);

Plot how the cross-validated MSE behaves as the number of trees in the ensemble increases, for a few of the ensembles, the deep tree, and the stump. In each plot, show one curve per learning rate, and use a separate plot for each tree-complexity level. Choose a subset of tree-complexity levels to plot.

mnsPlot = [1 round(numel(maxNumSplits)/2) numel(maxNumSplits)];
figure;
for k = 1:3
    subplot(2,2,k);
    plot(squeeze(error(:,mnsPlot(k),:)),'LineWidth',2);
    axis tight;
    hold on;
    h = gca;
    plot(h.XLim,[errorDeep errorDeep],'-.b','LineWidth',2);
    plot(h.XLim,[errorStump errorStump],'-.r','LineWidth',2);
    plot(h.XLim,min(min(error(:,mnsPlot(k),:))).*[1 1],'--k');
    h.YLim = [10 50];
    xlabel 'Number of trees';
    ylabel 'Cross-validated MSE';
    title(sprintf('MaxNumSplits = %0.3g', maxNumSplits(mnsPlot(k))));
    hold off;
end
hL = legend([cellstr(num2str(lr','Learning Rate = %0.2f'));...
        'Deep Tree';'Stump';'Min. MSE']);
hL.Position(1) = 0.6;

Each curve contains a minimum cross-validated MSE occurring at the optimal number of trees in the ensemble.

Identify the maximum number of splits, number of trees, and learning rate that yield the lowest MSE overall.

[minErr minErrIdxLin] = min(error(:));
[idxNumTrees idxMNS idxLR] = ind2sub(size(error),minErrIdxLin);

fprintf('\nMin. MSE = %0.5f',minErr)
fprintf('\nOptimal Parameter Values:\nNum. Trees = %d',idxNumTrees);
fprintf('\nMaxNumSplits = %d\nLearning Rate = %0.2f\n',...
    maxNumSplits(idxMNS),lr(idxLR))
Min. MSE = 17.87423
Optimal Parameter Values:
Num. Trees = 79
MaxNumSplits = 1
Learning Rate = 0.25
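As a follow-on step (not part of the original example), you could retrain a single, non-cross-validated ensemble using the parameter values identified above; this sketch assumes the variables from the preceding code are still in the workspace:

tFinal = templateTree('MaxNumSplits',maxNumSplits(idxMNS),'Surrogate','on');
MdlFinal = fitensemble(X,Y,'LSBoost',idxNumTrees,tFinal,...
    'Type','regression','LearnRate',lr(idxLR));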

Input Arguments


Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'Surrogate','on','NumVariablesToSample','all' specifies a template with surrogate splits, and uses all available predictors at each split.

For Classification Trees and Regression Trees

'MaxNumSplits' — Maximal number of decision splits
positive integer

Maximal number of decision splits (or branch nodes) per tree, specified as the comma-separated pair consisting of 'MaxNumSplits' and a positive integer. templateTree splits MaxNumSplits or fewer branch nodes. For more details on splitting behavior, see Algorithms.

For bagged decision trees and decision tree binary learners in ECOC models, the default is size(X,1) - 1. For boosted decision trees, the default is 1.

Example: 'MaxNumSplits',5

Data Types: single | double

'MergeLeaves' — Leaf merge flag
'off' | 'on'

Leaf merge flag, specified as the comma-separated pair consisting of 'MergeLeaves' and either 'on' or 'off'.

When 'on', the decision tree merges leaves that originate from the same parent node and whose sum of risk values is greater than or equal to the risk associated with the parent node. When 'off', the decision tree does not merge leaves.

For ensembles of decision trees, the default is 'off'. For decision tree binary learners in ECOC models, the default is 'on'.

Example: 'MergeLeaves','on'

'MinLeafSize' — Minimum observations per leaf
positive integer value

Minimum observations per leaf, specified as the comma-separated pair consisting of 'MinLeafSize' and a positive integer value. Each leaf has at least MinLeafSize observations. If you supply both MinParentSize and MinLeafSize, the decision tree uses the setting that gives larger leaves: MinParentSize = max(MinParentSize,2*MinLeafSize).

For boosted and bagged decision trees, the defaults are 1 for classification and 5 for regression. For decision tree binary learners in ECOC models, the default is 1.

Example: 'MinLeafSize',2

'MinParentSize' — Minimum observations per branch node
positive integer value

Minimum observations per branch node, specified as the comma-separated pair consisting of 'MinParentSize' and a positive integer value. Each branch node in the tree has at least MinParentSize observations. If you supply both MinParentSize and MinLeafSize, the decision tree uses the setting that gives larger leaves: MinParentSize = max(MinParentSize,2*MinLeafSize).

For boosted and bagged decision trees, the defaults are 2 for classification and 10 for regression. For decision tree binary learners in ECOC models, the default is 10.

Example: 'MinParentSize',4
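To make the interaction between MinLeafSize and MinParentSize concrete, a brief sketch (the values are illustrative only):

% With MinLeafSize = 8 and MinParentSize = 10, training uses
% max(10, 2*8) = 16 as the effective minimum parent size.
t = templateTree('MinLeafSize',8,'MinParentSize',10);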

'NumVariablesToSample' — Number of predictors to select at random for each split
positive integer value | 'all'

Number of predictors to select at random for each split, specified as the comma-separated pair consisting of 'NumVariablesToSample' and a positive integer value. Alternatively, you can specify 'all' to use all available predictors.

For boosted decision trees and decision tree binary learners in ECOC models, the default is 'all'. The default for bagged decision trees is the square root of the number of predictors for classification, or one third of the number of predictors for regression.

Example: 'NumVariablesToSample',3
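For example, a sketch of a random-forest-style bagged classification ensemble that samples three predictors at each split (assuming a numeric predictor matrix X and class labels Y are in the workspace):

t = templateTree('NumVariablesToSample',3);   % 3 predictors considered per split
Mdl = fitensemble(X,Y,'Bag',200,t,'Type','classification');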

'Prune' — Flag to estimate optimal sequence of pruned subtrees
'off' (default) | 'on'

Flag to estimate the optimal sequence of pruned subtrees, specified as the comma-separated pair consisting of 'Prune' and 'on' or 'off'.

If Prune is 'on', then the software trains the decision tree learners without pruning them, but estimates the optimal sequence of pruned subtrees for each learner in the ensemble, or for each decision tree binary learner in an ECOC model. Otherwise, the software trains the decision tree learners without estimating the optimal sequence of pruned subtrees.

For ensembles, the default is 'off'.

For decision tree binary learners in ECOC models, the default is 'on'.

Example: 'Prune','on'

'PruneCriterion' — Pruning criterion
'error' | 'impurity' | 'mse'

Pruning criterion, specified as the comma-separated pair consisting of 'PruneCriterion' and a pruning criterion string valid for the tree type.

  • For classification trees, you can specify 'error' (default) or 'impurity'.

  • For regression trees, you can specify only 'mse' (default).

Example: 'PruneCriterion','impurity'

'SplitCriterion' — Split criterion
'gdi' | 'twoing' | 'deviance' | 'mse'

Split criterion, specified as the comma-separated pair consisting of 'SplitCriterion' and a split criterion string valid for the tree type.

  • For classification trees:

    • 'gdi' for Gini's diversity index (default)

    • 'twoing' for the twoing rule

    • 'deviance' for maximum deviance reduction (also known as cross entropy)

  • For regression trees:

    • 'mse' for mean squared error (default)

Example: 'SplitCriterion','deviance'

'Surrogate' — Surrogate decision splits
'off' (default) | 'on' | 'all' | positive integer value

Surrogate decision splits flag, specified as the comma-separated pair consisting of 'Surrogate' and one of 'off', 'on', 'all', or a positive integer value.

  • When 'off', the decision tree does not find surrogate splits at the branch nodes.

  • When 'on', the decision tree finds at most 10 surrogate splits at each branch node.

  • When set to 'all', the decision tree finds all surrogate splits at each branch node. The 'all' setting can consume considerable time and memory.

  • When set to a positive integer value, the decision tree finds at most the specified number of surrogate splits at each branch node.

Use surrogate splits to improve the accuracy of predictions for data with missing values. This setting also lets you compute measures of predictive association between predictors.

Example: 'Surrogate','on'
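For instance, a short sketch that allows at most three surrogate splits per branch node and uses the template for ECOC binary learners (fisheriris is used only for concreteness; surrogate splits matter most when predictors contain missing values):

load fisheriris
t = templateTree('Surrogate',3);             % at most 3 surrogate splits per branch node
Mdl = fitcecoc(meas,species,'Learners',t);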

For Classification Trees Only

'AlgorithmForCategorical' — Algorithm for best categorical predictor split
'Exact' | 'PullLeft' | 'PCA' | 'OVAbyClass'

Algorithm to find the best split on a categorical predictor with C categories for data with K ≥ 3 classes, specified as the comma-separated pair consisting of 'AlgorithmForCategorical' and one of the following.

  • 'Exact' — Consider all 2^(C–1) – 1 combinations.

  • 'PullLeft' — Start with all C categories on the right branch. Consider moving each category to the left branch as it achieves the minimum impurity for the K classes among the remaining categories. From this sequence, choose the split that has the lowest impurity.

  • 'PCA' — Compute a score for each category using the inner product between the first principal component of a weighted covariance matrix (of the centered class probability matrix) and the vector of class probabilities for that category. Sort the scores in ascending order, and consider all C – 1 splits.

  • 'OVAbyClass' — Start with all C categories on the right branch. For each class, order the categories based on their probability for that class. For the first class, consider moving each category to the left branch in order, recording the impurity criterion at each move. Repeat for the remaining classes. From this sequence, choose the split that has the minimum impurity.

ClassificationTree selects the optimal subset of algorithms for each split using the known number of classes and levels of a categorical predictor. For two classes, ClassificationTree always performs the exact search. Use the 'AlgorithmForCategorical' name-value pair argument to specify a particular algorithm.

Example: 'AlgorithmForCategorical','PCA'

'MaxNumCategories' — Maximum category levels in split node
10 (default) | nonnegative scalar value

Maximum category levels in the split node, specified as the comma-separated pair consisting of 'MaxNumCategories' and a nonnegative scalar value. ClassificationTree splits a categorical predictor using the exact search algorithm if the predictor has at most MaxNumCategories levels in the split node. Otherwise, ClassificationTree finds the best categorical split using one of the inexact algorithms. Passing a small value can lead to a loss of accuracy, and passing a large value can increase computation time and memory overload.

Example: 'MaxNumCategories',8

For Regression Trees Only

'QuadraticErrorTolerance' — Quadratic error tolerance
1e-6 (default) | nonnegative scalar value

Quadratic error tolerance per node, specified as the comma-separated pair consisting of 'QuadraticErrorTolerance' and a nonnegative scalar value. RegressionTree stops splitting nodes when the quadratic error per node drops below QuadraticErrorTolerance*QED, where QED is the quadratic error for the entire data computed before the decision tree is grown. QED = NORM(Y - YBAR), where YBAR is estimated as the average of the input array Y.

Example: 'QuadraticErrorTolerance',1e-4
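To see what this threshold amounts to in practice, a brief sketch (assuming a numeric response vector Y is in the workspace) that evaluates the documented quantities:

qed = norm(Y - mean(Y));                  % QED: quadratic error of the whole data
stopBelow = 1e-4*qed;                     % nodes stop splitting below this quadratic error
t = templateTree('QuadraticErrorTolerance',1e-4);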

Output Arguments


t — Decision tree template for classification or regression
template object

Decision tree template for classification or regression suitable for training ensembles or error-correcting output code (ECOC) multiclass models, returned as a template object. Pass t to fitensemble or fitcecoc to specify how to create the decision tree for the ensemble or ECOC model, respectively.

If you display t in the Command Window, then all unspecified options appear empty ([]). However, the software replaces empty options with their corresponding default values during training.

More About


Algorithms

  • To accommodate MaxNumSplits, the software splits all nodes in the current layer, and then counts the number of branch nodes. A layer is the set of nodes that are equidistant from the root node. If the number of branch nodes exceeds MaxNumSplits, then the software follows this procedure.

    1. Determine how many branch nodes in the current layer need to be unsplit so that there would be at most MaxNumSplits branch nodes.

    2. Sort the branch nodes by their impurity gains.

    3. Unsplit the desired number of least successful branches.

    4. Return the decision tree grown so far.

    This procedure aims at producing maximally balanced trees.

  • The software splits branch nodes layer by layer until at least one of these events occurs.

    • There are MaxNumSplits + 1 branch nodes.

    • A proposed split causes the number of observations in at least one branch node to be fewer than MinParentSize.

    • A proposed split causes the number of observations in at least one leaf node to be fewer than MinLeafSize.

    • The algorithm cannot find a good split within a layer (i.e., the pruning criterion (see PruneCriterion) does not improve for all proposed splits in a layer). A special case of this event is when all nodes are pure (i.e., all observations in the node have the same class).

    MaxNumSplits and MinLeafSize do not affect splitting at their default values. Therefore, if you set 'MaxNumSplits', then splitting might stop due to the value of MinParentSize before MaxNumSplits splits occur.
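A brief sketch of this last point (using the carsmall data; the specific values are illustrative): even with MaxNumSplits set, the number of branch nodes the tree actually grows can be smaller, because MinParentSize keeps its default of 10 for regression.

load carsmall
X = [Acceleration Displacement Horsepower Weight];
Mdl = fitrtree(X,MPG,'MaxNumSplits',20,'Surrogate','on','MergeLeaves','off');
numSplits = sum(Mdl.IsBranchNode)   % can be fewer than 20 if MinParentSize stops splitting first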

