templateTree

Create classification template

Syntax

t = templateTree
t = templateTree(Name,Value)

Description


t = templateTree returns a learner template suitable to use in the fitensemble function.


t = templateTree(Name,Value) creates a template with additional options specified by one or more name-value pair arguments. For example, you can specify the algorithm used to find the best split on a categorical predictor, the split criterion, or the number of predictors selected for each split.
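For instance, a minimal sketch combining two options in one template (the particular values are illustrative, not recommendations):

% Sketch: combine several options in one call.
t = templateTree('SplitCriterion','deviance','Surrogate','on');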

Examples


Create a Classification Template with Surrogate Splits

Create a decision tree template with surrogate splits, and use the template to train an ensemble using sample data.

Create a decision tree template with surrogate splits.

t = templateTree('Surrogate','on')
t = 

Fit template for Tree.
    surrogate: 'on'

Load the sample data. Use the template to train an ensemble using the sample data.

load fisheriris;
ens = fitensemble(meas,species,'AdaBoostM2',100,t)
ens = 

  classreg.learning.classif.ClassificationEnsemble
          PredictorNames: {'x1'  'x2'  'x3'  'x4'}
            ResponseName: 'Y'
              ClassNames: {1x3 cell}
          ScoreTransform: 'none'
         NumObservations: 150
              NumTrained: 100
                  Method: 'AdaBoostM2'
            LearnerNames: {'Tree'}
    ReasonForTermination: 'Terminated normally after co...'
                 FitInfo: [100x1 double]
      FitInfoDescription: {2x1 cell}



Input Arguments


Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'Surrogate','on','NVarToSample','all' specifies a template with surrogate splits, and uses all available predictors at each split.

For Classification Trees and Regression Trees

'MergeLeaves' — Leaf merge flag
'off' (default) | 'on'

Leaf merge flag, specified as the comma-separated pair consisting of 'MergeLeaves' and either 'on' or 'off'.

When 'on', the decision tree merges leaves that originate from the same parent node and that give a sum of risk values greater than or equal to the risk associated with the parent node. When 'off', the decision tree does not merge leaves.

Example: 'MergeLeaves','on'
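As a brief sketch using the fisheriris sample data (the method and number of learning cycles are illustrative):

% Sketch: boost trees that merge same-parent leaves
% whose combined risk does not improve on the parent.
t = templateTree('MergeLeaves','on');
load fisheriris                        % meas (predictors), species (labels)
ens = fitensemble(meas,species,'AdaBoostM2',50,t);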

'MinLeaf' — Minimum observations per leaf
positive integer value

Minimum observations per leaf, specified as the comma-separated pair consisting of 'MinLeaf' and a positive integer value. Each leaf has at least MinLeaf observations. If you supply both MinParent and MinLeaf, the decision tree uses the setting that gives larger leaves: MinParent = max(MinParent,2*MinLeaf).

For boosting, the default is 1. For bagging, the default is 1 for classification, or 5 for regression.

Example: 'MinLeaf',2
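For instance, a hedged sketch that grows bagged trees with larger leaves (the leaf size of 10 is illustrative):

% Sketch: require at least 10 observations per leaf in bagged trees.
t = templateTree('MinLeaf',10);
load fisheriris
ens = fitensemble(meas,species,'Bag',100,t,'Type','classification');
resubLoss(ens)                         % resubstitution classification error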

'MinParent' — Minimum observations per branch node
positive integer value

Minimum observations per branch node, specified as the comma-separated pair consisting of 'MinParent' and a positive integer value. Each branch node in the tree has at least MinParent observations. If you supply both MinParent and MinLeaf, the decision tree uses the setting that gives larger leaves: MinParent = max(MinParent,2*MinLeaf).

For boosting, the default is the number of training observations. For bagging, the default is 2 for classification, or 10 for regression.

Example: 'MinParent',4
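As a regression sketch on the carsmall sample data (the predictor choice and MinParent value are illustrative):

% Sketch: only split nodes containing at least 20 observations.
load carsmall                          % Weight, Cylinders, MPG
t = templateTree('MinParent',20);
ens = fitensemble([Weight Cylinders],MPG,'LSBoost',100,t,'CategoricalPredictors',2);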

'NVarToSample' — Number of predictors selected for each split
positive integer value | 'all'

Number of predictors selected at random for each split, specified as the comma-separated pair consisting of 'NVarToSample' and a positive integer value. Alternatively, you can specify 'all' to use all available predictors. The default for boosting is 'all'. The default for bagging is the square root of the number of predictors for classification, or one third of the number of predictors for regression.

Example: 'NVarToSample',3
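Sampling predictors at each split is what turns bagged trees into a random-forest-style ensemble; a sketch (sampling 2 of the 4 fisheriris predictors is illustrative):

% Sketch: random-forest-style bagging; sample 2 predictors per split.
t = templateTree('NVarToSample',2);
load fisheriris
ens = fitensemble(meas,species,'Bag',100,t,'Type','classification');
oobLoss(ens)                           % out-of-bag classification error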

'Prune' — Pruning flag
'off' (default) | 'on'

Pruning flag, specified as the comma-separated pair consisting of 'Prune' and either 'on' or 'off'. When 'on', fitensemble grows unpruned trees and computes the optimal sequence of pruned subtrees for each tree. When 'off', fitensemble grows the tree without pruning.

Example: 'Prune','on'

'PruneCriterion' — Pruning criterion
'error' | 'impurity' | 'mse'

Pruning criterion, specified as the comma-separated pair consisting of 'PruneCriterion' and a pruning criterion string valid for the tree type.

  • For classification trees, you can specify either 'error' (default) or 'impurity'.

  • For regression trees, you can specify only 'mse' (default).

Example: 'PruneCriterion','impurity'
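The pruning criterion matters only when pruning is on; a hedged sketch combining the two options (the choice of 'impurity' is illustrative):

% Sketch: grow boosted classification trees, prune by impurity.
t = templateTree('Prune','on','PruneCriterion','impurity');
load fisheriris
ens = fitensemble(meas,species,'AdaBoostM2',50,t);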

'SplitCriterion' — Split criterion
'gdi' | 'twoing' | 'deviance' | 'mse'

Split criterion, specified as the comma-separated pair consisting of 'SplitCriterion' and a split criterion string valid for the tree type.

  • For classification trees:

    • 'gdi' for Gini's diversity index (default)

    • 'twoing' for the twoing rule

    • 'deviance' for maximum deviance reduction (also known as cross entropy)

  • For regression trees:

    • 'mse' for mean squared error (default)

Example: 'SplitCriterion','deviance'
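One way to choose between criteria is cross-validated error; a sketch (5 folds and 50 learning cycles are illustrative):

% Sketch: compare 'gdi' and 'deviance' by 5-fold cross-validated error.
load fisheriris
tGdi = templateTree('SplitCriterion','gdi');
tDev = templateTree('SplitCriterion','deviance');
cvGdi = fitensemble(meas,species,'AdaBoostM2',50,tGdi,'kfold',5);
cvDev = fitensemble(meas,species,'AdaBoostM2',50,tDev,'kfold',5);
[kfoldLoss(cvGdi) kfoldLoss(cvDev)]    % smaller is better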

'Surrogate' — Surrogate decision splits
'off' (default) | 'on' | 'all' | positive integer value

Surrogate decision splits flag, specified as the comma-separated pair consisting of 'Surrogate' and one of 'off', 'on', 'all', or a positive integer value.

  • When 'off', the decision tree does not find surrogate splits at the branch nodes.

  • When 'on', the decision tree finds at most 10 surrogate splits at each branch node.

  • When set to 'all', the decision tree finds all surrogate splits at each branch node. The 'all' setting can use considerable time and memory.

  • When set to a positive integer value, the decision tree finds at most the specified number of surrogate splits at each branch node.

Use surrogate splits to improve the accuracy of predictions for data with missing values. The setting also lets you compute measures of predictive association between predictors.

Example: 'Surrogate','on'
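The payoff of surrogates shows at prediction time; a sketch with a hypothetical observation containing a missing value:

% Sketch: surrogate splits let the ensemble classify an observation
% even when one of its predictor values is missing.
load fisheriris
t = templateTree('Surrogate','on');
ens = fitensemble(meas,species,'AdaBoostM2',100,t);
xNew = meas(1,:);
xNew(3) = NaN;                         % hypothetical missing predictor value
predict(ens,xNew)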

For Classification Trees Only

'AlgorithmForCategorical' — Algorithm for best categorical predictor split
'Exact' | 'PullLeft' | 'PCA' | 'OVAbyClass'

Algorithm to find the best split on a categorical predictor for data with C categories and K ≥ 3 classes, specified as the comma-separated pair consisting of 'AlgorithmForCategorical' and one of the following.

  • 'Exact' — Consider all 2^(C–1) – 1 combinations.

  • 'PullLeft' — Start with all C categories on the right branch. Consider moving each category to the left branch as it achieves the minimum impurity for the K classes among the remaining categories. From this sequence, choose the split that has the lowest impurity.

  • 'PCA' — Compute a score for each category using the inner product between the first principal component of a weighted covariance matrix (of the centered class probability matrix) and the vector of class probabilities for that category. Sort the scores in ascending order, and consider all C – 1 splits.

  • 'OVAbyClass' — Start with all C categories on the right branch. For each class, order the categories based on their probability for that class. For the first class, consider moving each category to the left branch in order, recording the impurity criterion at each move. Repeat for the remaining classes. From this sequence, choose the split that has the minimum impurity.

ClassificationTree selects the optimal subset of algorithms for each split using the known number of classes and levels of a categorical predictor. For two classes, ClassificationTree always performs the exact search. Use the 'AlgorithmForCategorical' name-value pair argument to specify a particular algorithm.

Example: 'AlgorithmForCategorical','PCA'
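A hedged sketch on the carsmall data, whose Origin response has more than two classes so the heuristics can apply (the choice of 'PullLeft' is illustrative):

% Sketch: force the 'PullLeft' heuristic for categorical predictors.
load carsmall                          % Cylinders, Model_Year, Origin
X = [Cylinders Model_Year];
t = templateTree('AlgorithmForCategorical','PullLeft');
ens = fitensemble(X,Origin,'AdaBoostM2',50,t,'CategoricalPredictors',[1 2]);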

'MaxCat' — Maximum category levels in split node
10 (default) | nonnegative scalar value

Maximum category levels in the split node, specified as the comma-separated pair consisting of 'MaxCat' and a nonnegative scalar value. ClassificationTree splits a categorical predictor using the exact search algorithm if the predictor has at most MaxCat levels in the split node. Otherwise, ClassificationTree finds the best categorical split using one of the inexact algorithms. Note that a small value can lead to loss of accuracy, and a large value can increase computation time and memory usage.

Example: 'MaxCat',8
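A one-line sketch (the threshold of 4 is illustrative): use exact search only when a categorical predictor has at most 4 levels at a node, and fall back to the inexact algorithms otherwise.

% Sketch: lower the exact-search threshold from 10 to 4 levels.
t = templateTree('MaxCat',4);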

For Regression Trees Only

'QEToler' — Quadratic error tolerance
1e-6 (default) | nonnegative scalar value

Quadratic error tolerance per node, specified as the comma-separated pair consisting of 'QEToler' and a nonnegative scalar value. RegressionTree stops splitting nodes when the quadratic error per node drops below QEToler*QED, where QED is the quadratic error for the entire data computed before the decision tree is grown. QED = NORM(Y - YBAR), where YBAR is estimated as the average of the input array Y.

Example: 'QEToler',1e-4
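A regression sketch on the carsmall data (the loosened tolerance of 1e-4 is illustrative):

% Sketch: stop splitting earlier by loosening the tolerance.
load carsmall
t = templateTree('QEToler',1e-4);
ens = fitensemble([Weight Cylinders],MPG,'LSBoost',100,t,'CategoricalPredictors',2);
resubLoss(ens)                         % resubstitution mean squared error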

Output Arguments


t — Decision tree learner template
decision tree template object

Decision tree learner template, returned as a decision tree template object suitable to use in the fitensemble function. In an ensemble, t specifies how to grow the decision trees.

References

[1] Coppersmith, D., S. J. Hong, and J. R. M. Hosking. "Partitioning Nominal Attributes in Decision Trees." Data Mining and Knowledge Discovery, Vol. 3, 1999, pp. 197–217.

See Also

fitensemble | ClassificationTree | RegressionTree
