ClassificationTree.template

Class: ClassificationTree

Create classification template (to be removed)

ClassificationTree.template will be removed in a future release. Use templateTree instead.

Syntax

t = ClassificationTree.template
t = ClassificationTree.template(Name,Value)

Description

t = ClassificationTree.template returns a learner template suitable for use with the fitensemble function.

t = ClassificationTree.template(Name,Value) creates a template with additional options specified by one or more Name,Value pair arguments.
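For example, the following sketch (the MinLeaf value is illustrative) creates a template and passes it to fitensemble as the weak learner:

load fisheriris
t = ClassificationTree.template('MinLeaf',5);   % illustrative value
ens = fitensemble(meas,species,'AdaBoostM2',100,t);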

Input Arguments

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

'AlgorithmForCategorical'

Algorithm to find the best split on a categorical predictor for data with K = 3 or more classes. The available algorithms are:

  • 'Exact': For a categorical predictor with C categories, consider all 2^(C-1) - 1 combinations.

  • 'PullLeft': Start with all C categories on the right branch. Consider moving each category to the left branch as it achieves the minimum impurity for the K classes among the remaining categories. Out of this sequence, choose the split that has the lowest impurity.

  • 'PCA': Compute a score for each category using the inner product between the first principal component of a weighted covariance matrix (of the centered class probability matrix) and the vector of class probabilities for that category. Sort the scores in ascending order, and consider all C - 1 splits.

  • 'OVAbyClass': Start with all C categories on the right branch. For each class, order the categories based on their probability for that class. For the first class, consider moving each category to the left branch in order, recording the impurity criterion at each move. Repeat for the remaining classes. Out of this sequence, choose the split that has the minimum impurity.

Default: ClassificationTree selects the optimal subset of algorithms for each split using the known number of classes and levels of a categorical predictor. For two classes, ClassificationTree always performs the exact search.

'MaxCat'

ClassificationTree splits a categorical predictor using the exact search algorithm if the predictor has at most MaxCat levels in the split node. Otherwise, ClassificationTree finds the best categorical split using one of the inexact algorithms.

Specify MaxCat as a numeric nonnegative scalar. Passing a large value can lead to long computation time and memory overload.

Default: 10
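For example, a minimal sketch (both values are illustrative) that forces the pull-left algorithm and raises the exact-search threshold to 20 levels:

t = ClassificationTree.template('AlgorithmForCategorical','PullLeft','MaxCat',20);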

'MergeLeaves'

String that specifies whether to merge leaves after the tree is grown. Values are 'on' or 'off'.

When 'on', ClassificationTree merges leaves that originate from the same parent node, and that give a sum of risk values greater than or equal to the risk associated with the parent node. When 'off', ClassificationTree does not merge leaves.

Default: 'off'

'MinLeaf'

Each leaf has at least MinLeaf observations. If you supply both MinParent and MinLeaf, ClassificationTree uses the setting that gives larger leaves: MinParent=max(MinParent,2*MinLeaf).

Default: Half the number of training observations for boosting, 1 for bagging

'MinParent'

Each branch node in the tree has at least MinParent observations. If you supply both MinParent and MinLeaf, ClassificationTree uses the setting that gives larger leaves: MinParent=max(MinParent,2*MinLeaf).

Default: Number of training observations for boosting, 2 for bagging
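For example, in this sketch (illustrative values), MinLeaf dominates the supplied MinParent:

t = ClassificationTree.template('MinParent',8,'MinLeaf',10);
% Effective branch-node minimum is max(8,2*10) = 20 observations,
% and every leaf keeps at least 10 observations.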

'NVarToSample'

Number of predictors to select at random for each split. Specify a positive integer, or 'all' to use all available predictors.

Default: 'all' for boosting, square root of number of predictors for bagging
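For example, a sketch of a bagged ensemble that samples two of the four iris predictors at each split, in the style of a random forest (the subset size is illustrative):

load fisheriris
t = ClassificationTree.template('NVarToSample',2);   % 2 of 4 predictors per split
ens = fitensemble(meas,species,'Bag',100,t,'type','classification');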

'Prune'

When 'on', ClassificationTree grows the classification tree and computes the optimal sequence of pruned subtrees. When 'off', ClassificationTree grows the tree without pruning.

Default: 'off'

'PruneCriterion'

String with the pruning criterion, either 'error' or 'impurity'.

Default: 'error'

'SplitCriterion'

Criterion for choosing a split: 'gdi' (Gini's diversity index), 'twoing' (twoing rule), or 'deviance' (maximum deviance reduction, also known as cross entropy).

Default: 'gdi'
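For example, a sketch of a template that splits on maximum deviance reduction and retains the optimal pruning sequence:

t = ClassificationTree.template('SplitCriterion','deviance','Prune','on');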

'Surrogate'

String describing whether to find surrogate decision splits at each branch node. Specify as 'on', 'off', 'all', or a positive integer value.

  • When 'on', ClassificationTree finds at most 10 surrogate splits at each branch node.

  • When set to a positive integer value, ClassificationTree finds at most the specified number of surrogate splits at each branch node.

  • When set to 'all', ClassificationTree finds all surrogate splits at each branch node. The 'all' setting can consume considerable time and memory.

Use surrogate splits to improve the accuracy of predictions for data with missing values. The setting also enables you to compute measures of predictive association between predictors.

Default: 'off'

Output Arguments

t

Classification tree template suitable for use with the fitensemble function. In an ensemble, t specifies how to grow the classification trees.

Examples


Construct a Classification Template with Surrogate Splits

Create a classification template with surrogate splits, and use the template to train an ensemble on the Fisher iris data.

% Create a tree template that uses surrogate splits
t = ClassificationTree.template('surrogate','on');

% Load the Fisher iris data and train a boosted ensemble of 100 trees
load fisheriris
ens = fitensemble(meas,species,'AdaBoostM2',100,t);
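As a follow-up, you can check the training (resubstitution) error of the ensemble; resubLoss is the resubstitution-loss method of classification ensembles:

loss = resubLoss(ens)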

