t = templateTree(Name,Value) creates
a template with additional options specified by one or more name-value
pair arguments. For example, you can specify the algorithm used to
find the best split on a categorical predictor, the split criterion,
or the number of predictors selected for each split.

Specify optional comma-separated pairs of Name,Value arguments.
Name is the argument
name and Value is the corresponding
value. Name must appear
inside single quotes (' ').
You can specify several name and value pair
arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'Surrogate','on','NVarToSample','all' specifies
a template with surrogate splits that uses all available predictors
at each split.
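
As a minimal sketch of a complete call, such a template can be passed
to fitensemble; this example uses the built-in fisheriris data, and
the ensemble size of 100 is an arbitrary choice for illustration.

load fisheriris                          % built-in data: meas, species
t = templateTree('Surrogate','on','NVarToSample','all');
ens = fitensemble(meas,species,'Bag',100,t,'Type','classification');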

Leaf merge flag, specified as the comma-separated pair consisting
of 'MergeLeaves' and either 'on' or 'off'.

When 'on', the decision tree merges leaves
that originate from the same parent node and whose summed risk
is greater than or equal to the risk associated with the parent node.
When 'off', the decision tree does not merge leaves.

Minimum number of observations per leaf, specified as the comma-separated
pair consisting of 'MinLeaf' and a positive integer
value. Each leaf has at least MinLeaf observations.
If you supply both MinParent and MinLeaf,
the decision tree uses the setting that gives larger leaves: MinParent
= max(MinParent,2*MinLeaf).

For boosting, the default is 1. For bagging,
the default is 1 for classification, or 5 for
regression.

Minimum number of observations per branch node, specified as the comma-separated
pair consisting of 'MinParent' and a positive integer
value. Each branch node in the tree has at least MinParent observations.
If you supply both MinParent and MinLeaf,
the decision tree uses the setting that gives larger leaves: MinParent
= max(MinParent,2*MinLeaf).

For boosting, the default is the number of training observations.
For bagging, the default is 2 for classification,
or 10 for regression.
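
A minimal sketch of the MinParent/MinLeaf interaction: with the
arbitrary settings below, the effective minimum branch node size is
max(10,2*20) = 40 observations, so MinLeaf dominates.

t = templateTree('MinParent',10,'MinLeaf',20);   % effective MinParent is max(10,40) = 40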

Number of predictors selected at random for each split, specified
as the comma-separated pair consisting of 'NVarToSample' and
a positive integer value. Alternatively, you can specify 'all' to
use all available predictors. The default for boosting is 'all'.
The default for bagging is the square root of the number of predictors
for classification, or one third of predictors for regression.
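
For example, to approximate the classification default by hand, you
can sample roughly the square root of the number of predictors at
each split; in this sketch, X stands for a hypothetical predictor matrix.

t = templateTree('NVarToSample',round(sqrt(size(X,2))));   % X: hypothetical predictor matrix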

Pruning flag, specified as the comma-separated pair consisting
of 'Prune' and either 'on' or 'off'.
When 'on', fitensemble grows
unpruned trees and computes the optimal sequence of pruned subtrees
for each tree. When 'off', fitensemble grows
the trees without pruning.
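
For instance, bagged ensembles typically use fully grown trees; this
minimal one-line sketch turns off pruning explicitly.

t = templateTree('Prune','off');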

Surrogate decision splits flag, specified as the comma-separated
pair consisting of 'Surrogate' and one of 'off', 'on', 'all',
or a positive integer value.

When 'off', the decision tree does
not find surrogate splits at the branch nodes.

When 'on', the decision tree finds
at most 10 surrogate splits at each branch node.

When set to 'all', the decision
tree finds all surrogate splits at each branch node. The 'all' setting
can use considerable time and memory.

When set to a positive integer value, the decision
tree finds at most the specified number of surrogate splits at each
branch node.

Use surrogate splits to improve the accuracy of predictions
for data with missing values. The setting also lets you compute measures
of predictive association between predictors.
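
As a sketch of the missing-value use case, the code below (using the
built-in fisheriris data) blanks out a few predictor values and grows
a bagged ensemble whose trees can still route those observations
through surrogate splits; the ensemble size of 50 is arbitrary.

load fisheriris                          % built-in data: meas, species
measNaN = meas;
measNaN(1:10,1) = NaN;                   % artificially remove some predictor values
t = templateTree('Surrogate','on');
ens = fitensemble(measNaN,species,'Bag',50,t,'Type','classification');
label = predict(ens,measNaN(1:10,:));    % predictions despite the NaNs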

Algorithm to find the best split on a categorical predictor
with C categories for data with K ≥
3 classes, specified as the comma-separated pair consisting of 'AlgorithmForCategorical' and
one of the following.

'Exact'

Consider all 2^(C-1) - 1 combinations. For example, a predictor
with C = 4 categories yields 2^3 - 1 = 7 candidate splits.

'PullLeft'

Start with all C categories on the right
branch. At each step, move to the left branch the category that
achieves the minimum impurity for the K classes among the
remaining categories. From this sequence of splits, choose the one
that has the lowest impurity.

'PCA'

Compute a score for each category using the inner product between
the first principal component of a weighted covariance matrix (of
the centered class probability matrix) and the vector of class probabilities
for that category. Sort the scores in ascending order, and consider
all C - 1 splits.

'OVAbyClass'

Start with all C categories on the right
branch. For each class, order the categories based on their probability
for that class. For the first class, consider moving each category
to the left branch in order, recording the impurity criterion at each
move. Repeat for the remaining classes. From this sequence, choose
the split that has the minimum impurity.

ClassificationTree selects the optimal subset
of algorithms for each split using the known number of classes and
levels of a categorical predictor. For two classes, ClassificationTree always
performs the exact search. Use the 'AlgorithmForCategorical' name-value
pair argument to specify a particular algorithm.

Maximum category levels in the split node, specified as the
comma-separated pair consisting of 'MaxCat' and
a nonnegative scalar value. ClassificationTree splits
a categorical predictor using the exact search algorithm if the predictor
has at most MaxCat levels in the split node. Otherwise, ClassificationTree finds
the best categorical split using one of the inexact algorithms. Note
that passing a small value can lead to a loss of accuracy, and passing
a large value can increase computation time and memory use.
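
A minimal sketch combining this threshold with a specific inexact
algorithm: nodes with at most 20 category levels (an arbitrary
threshold) use exact search, and larger nodes fall back to the
pull-left heuristic.

t = templateTree('MaxCat',20,'AlgorithmForCategorical','PullLeft');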

Quadratic error tolerance per node, specified as the comma-separated
pair consisting of 'QEToler' and a nonnegative
scalar value. RegressionTree stops splitting nodes
when the quadratic error per node drops below QEToler*QED,
where QED is the quadratic error for the entire
data computed before the decision tree is grown. QED = NORM(Y
- YBAR), where YBAR is estimated as the
average of the input array Y.
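
As a regression sketch, the built-in carsmall data set can illustrate
the tolerance; the value 1e-4 is arbitrary, and rows with a missing
response are removed first.

load carsmall                            % built-in data: Weight, MPG
ok = ~isnan(MPG);                        % drop observations with missing response
t = templateTree('QEToler',1e-4);
ens = fitensemble(Weight(ok),MPG(ok),'Bag',50,t,'Type','regression');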

Decision tree learner template, returned as a decision tree
template object suitable for use in the fitensemble function.
In an ensemble, t specifies how to grow the decision
trees.

References

[1] Coppersmith, D., S. J. Hong, and J. R.
M. Hosking. "Partitioning Nominal Attributes in Decision Trees." Data
Mining and Knowledge Discovery, Vol. 3, 1999, pp. 197–217.