Superclasses: CompactClassificationTree
Binary decision tree for classification
A ClassificationTree
object represents a decision
tree with binary splits for classification. An object of this class
can predict responses for new data using the predict
method.
The object contains the data used for training, so it can also compute
resubstitution predictions.
tree = fitctree(TBL,ResponseVarName) returns
a fitted binary classification decision tree based on the input variables
(also known as predictors, features, or attributes) contained in the
table TBL and output (response or labels) contained
in ResponseVarName. The returned binary tree
splits branching nodes based on the values of a column of TBL.
tree = fitctree(TBL,formula) returns
a fitted binary classification decision tree based on the input variables
contained in the table TBL. formula is
a formula that identifies the response and predictor variables in TBL used
to fit tree. The returned binary tree splits
branching nodes based on the values of a column of TBL.
tree = fitctree(TBL,Y) returns
a fitted binary classification decision tree based on the input variables
contained in the table TBL and output in vector Y.
The returned binary tree splits branching nodes based on the values
of a column of TBL.
tree = fitctree(X,Y) returns
a fitted binary classification decision tree based on the input variables
contained in matrix X and output Y.
The returned binary tree splits branching nodes based on the values
of a column of X.
tree = fitctree(___,Name,Value) fits
a tree with additional options specified by one or more name-value
pair arguments, using any of the previous syntaxes. For example, you
can specify the algorithm used to find the best split on a categorical
predictor or grow a cross-validated tree.
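As a quick illustration of these syntaxes, the sketch below grows a tree on the fisheriris sample data set that ships with Statistics and Machine Learning Toolbox; the short predictor names are chosen here for illustration only.

```matlab
% Grow a classification tree from a numeric matrix and a label vector,
% then classify a new observation. The predictor names are illustrative.
load fisheriris                          % meas (150-by-4), species (labels)
tree = fitctree(meas, species, ...
    'PredictorNames', {'SL' 'SW' 'PL' 'PW'});
label = predict(tree, [5.8 3.0 4.2 1.2]) % predicted class for one flower
view(tree, 'Mode', 'text')               % print the branching rules
```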

CategoricalPredictors  List of categorical predictors, a numeric vector with indices
from 1 to p, where p is the number of predictors used to train the model.

CategoricalSplit  An n-by-2 cell array, where n is
the number of categorical splits in tree.

Children  An n-by-2 array containing the numbers of
the child nodes for each node in tree, where n is the number of nodes.

ClassCount  An n-by-k array of class
counts for the nodes in tree, where n is the number of nodes and k is the number of classes.

ClassNames  List of the elements in Y with duplicates removed.

ClassProbability  An n-by-k array of class
probabilities for the nodes in tree, where n is the number of nodes and k is the number of classes.

Cost  Square matrix, where Cost(i,j) is the cost of classifying a point into class j if its true class is i.

CutCategories  An n-by-2 cell array of the categories used
at branches in tree, where n is the number of nodes.

CutPoint  An n-element vector of the values used as
cut points in tree, where n is the number of nodes.

CutType  An n-element cell array indicating the type
of cut at each node in tree, where n is the number of nodes.

CutPredictor  An n-element cell array of the names of the
variables used for branching in each node in tree, where n is the number of nodes.

ExpandedPredictorNames  Expanded predictor names, stored as a cell array of character vectors. If the model uses encoding for categorical variables, then ExpandedPredictorNames includes the names that describe the expanded variables; otherwise, it is the same as PredictorNames.

HyperparameterOptimizationResults  Description of the cross-validation optimization of hyperparameters,
stored as a BayesianOptimization object or a table of hyperparameters and associated values.

IsBranchNode  An n-element logical vector that is true for each branch node and false for each leaf node of tree.

ModelParameters  Parameters used in training tree.

NumObservations  Number of observations in the training data, a numeric scalar.

NodeClass  An n-element cell array with the names of
the most probable classes in each node of tree, where n is the number of nodes.

NodeError  An n-element vector of the errors of the
nodes in tree, where n is the number of nodes.

NodeProbability  An n-element vector of the probabilities
of the nodes in tree, where n is the number of nodes.

NodeRisk  An n-element vector of the risk of the nodes in the tree, where n is the number of nodes. The risk for each node is the measure of impurity (Gini index or deviance) for this node weighted by the node probability. If the tree is grown by twoing, the risk for each node is zero.

NodeSize  An n-element vector of the sizes of the nodes
in tree, where n is the number of nodes.

NumNodes  The number of nodes in tree.

Parent  An n-element vector containing the number
of the parent node for each node in tree, where n is the number of nodes.

PredictorNames  Cell array of character vectors containing the predictor names,
in the order in which they appear in X.

Prior  Numeric vector of prior probabilities for each class. The order
of the elements of Prior corresponds to the order of the classes in ClassNames.

PruneAlpha  Numeric vector with one element per pruning level. If the pruning
level ranges from 0 to M, then PruneAlpha has M + 1 elements sorted in ascending order.

PruneList  An n-element numeric vector with the pruning
levels in each node of tree, where n is the number of nodes.

ResponseName  A character vector that specifies the name of the response variable
(Y).

RowsUsed  An n-element logical vector indicating which
rows of the original predictor data (X) are used in fitting.

ScoreTransform  Function handle for transforming predicted classification scores, or character vector representing a built-in transformation function.
To change the score transformation function, use dot notation, for example,
tree.ScoreTransform = 'logit';.

SurrogateCutCategories  An n-element cell array of the categories
used for surrogate splits in tree, where n is the number of nodes.

SurrogateCutFlip  An n-element cell array of the numeric cut
assignments used for surrogate splits in tree, where n is the number of nodes.

SurrogateCutPoint  An n-element cell array of the numeric values
used for surrogate splits in tree, where n is the number of nodes.

SurrogateCutType  An n-element cell array indicating the types
of surrogate splits at each node in tree, where n is the number of nodes.

SurrogateCutPredictor  An n-element cell array of the names of the
variables used for surrogate splits in each node in tree, where n is the number of nodes.

SurrogatePredictorAssociation  An n-element cell array of the predictive
measures of association for surrogate splits in tree, where n is the number of nodes.

W  The scaled observation weights, a vector with length n, the number of rows in X.

X  A matrix of predictor values. Each column of X represents one variable, and each row represents one observation.

Y  A categorical array, cell array of character vectors, character
array, logical vector, or a numeric vector. Each row of Y represents the classification of the corresponding row of X.
compact  Compact tree 
crossval  Cross-validated decision tree 
cvloss  Classification error by cross validation 
prune  Produce sequence of subtrees by pruning 
resubEdge  Classification edge by resubstitution 
resubLoss  Classification error by resubstitution 
resubMargin  Classification margins by resubstitution 
resubPredict  Predict resubstitution response of tree 
compareHoldout  Compare accuracies of two classification models using new data 
edge  Classification edge 
loss  Classification error 
margin  Classification margins 
predict  Predict labels using classification tree 
predictorImportance  Estimates of predictor importance 
surrogateAssociation  Mean predictive measure of association for surrogate splits in decision tree 
view  View tree 
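A typical workflow combines several of these methods. The sketch below uses the ionosphere sample data set (which provides the variables X and Y); the exact loss values vary with the random cross-validation partition.

```matlab
% Estimate generalization error and prune to the best level found by
% cross-validation (sketch only).
load ionosphere                          % X (predictors), Y (labels)
tree = fitctree(X, Y);
resubLoss(tree)                          % optimistic resubstitution error
cvtree = crossval(tree, 'KFold', 10);
kfoldLoss(cvtree)                        % 10-fold cross-validation error
[~, ~, ~, bestLevel] = cvloss(tree, 'SubTrees', 'all', 'TreeSize', 'min');
ptree = prune(tree, 'Level', bestLevel); % smaller tree, similar accuracy
```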
ClassificationTree splits
nodes based on either impurity or node
error.
Impurity means one of several things, depending on your choice
of the SplitCriterion
name-value pair argument:
Gini's Diversity Index ('gdi') —
The Gini index of a node is
$$1-{\displaystyle \sum _{i}{p}^{2}(i)},$$
where the sum is over the classes i at the
node, and p(i) is the observed
fraction of classes with class i that reach the
node. A node with just one class (a pure node)
has Gini index 0
; otherwise the Gini index is positive.
So the Gini index is a measure of node impurity.
Deviance ('deviance'
) —
With p(i) defined the same as
for the Gini index, the deviance of a node is
$$-{\displaystyle \sum _{i}p(i)\mathrm{log}\,p(i)}.$$
A pure node has deviance 0
; otherwise, the
deviance is positive.
Twoing rule ('twoing'
) —
Twoing is not a purity measure of a node, but is a different measure
for deciding how to split a node. Let L(i)
denote the fraction of members of class i in the
left child node after a split, and R(i)
denote the fraction of members of class i in the
right child node after a split. Choose the split criterion to maximize
$$P(L)P(R){\left({\displaystyle \sum _{i}\left|L(i)-R(i)\right|}\right)}^{2},$$
where P(L) and P(R) are the fractions of observations that split to the left and right respectively. If the expression is large, the split made each child node purer. Similarly, if the expression is small, the split made each child node similar to each other, and hence similar to the parent node, and so the split did not increase node purity.
Node error — The node error is the fraction of misclassified classes at a node. If j is the class with the largest number of training samples at a node, the node error is
1 – p(j).
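To make these formulas concrete, the following sketch evaluates each measure for a single hypothetical node whose observations fall into three classes with fractions 0.5, 0.25, and 0.25:

```matlab
% Impurity and error measures for one node (hypothetical class fractions).
p = [0.5 0.25 0.25];            % observed class fractions p(i) at the node
gini      = 1 - sum(p.^2)       % Gini index: 0.625
deviance  = -sum(p .* log(p))   % deviance: about 1.0397
nodeError = 1 - max(p)          % node error 1 - p(j): 0.5
```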
Value. To learn how value classes affect copy operations, see Copying Objects in the MATLAB® documentation.
[1] Breiman, L., J. Friedman, R. Olshen, and C. Stone. Classification and Regression Trees. Boca Raton, FL: CRC Press, 1984.
ClassificationEnsemble | CompactClassificationTree | fitctree | predict | RegressionTree