Superclasses: CompactClassificationTree
Binary decision tree for classification
A decision tree with binary splits for classification. An object of class ClassificationTree can predict responses for new data with the predict method. The object contains the data used for training, so it can compute resubstitution predictions.
tree = fitctree(x,y) returns a classification tree based on the input variables (also known as predictors, features, or attributes) x and output (response) y. tree is a binary tree, where each branching node is split based on the values of a column of x.
tree = fitctree(x,y,Name,Value) fits a tree with additional options specified by one or more Name,Value pair arguments. If you use one of the following five options, tree is of class ClassificationPartitionedModel: 'CrossVal', 'KFold', 'Holdout', 'Leaveout', or 'CVPartition'. Otherwise, tree is of class ClassificationTree.
CategoricalPredictors 
List of categorical predictors, a numeric vector with indices from 1 to p, where p is the number of columns of X. 
CatSplit 
An n-by-2 cell array, where n is the number of categorical splits in tree. Each row in CatSplit gives left and right values for a categorical split. For each branch node with categorical split j based on a categorical predictor variable z, the left child is chosen if z is in CatSplit(j,1) and the right child is chosen if z is in CatSplit(j,2). The splits are in the same order as nodes of the tree. Find the nodes for these splits by selecting 'categorical' cuts from top to bottom in the CutType property. 
Children 
An n-by-2 array containing the numbers of the child nodes for each node in tree, where n is the number of nodes. Leaf nodes have child node 0. 
ClassCount 
An n-by-k array of class counts for the nodes in tree, where n is the number of nodes and k is the number of classes. For any node number i, the class counts ClassCount(i,:) are counts of observations (from the data used in fitting the tree) from each class satisfying the conditions for node i. 
ClassNames 
List of the elements in Y with duplicates removed. ClassNames can be a categorical array, cell array of strings, character array, logical vector, or a numeric vector. ClassNames has the same data type as the data in the argument Y. 
ClassProb 
An n-by-k array of class probabilities for the nodes in tree, where n is the number of nodes and k is the number of classes. For any node number i, the class probabilities ClassProb(i,:) are the estimated probabilities for each class for a point satisfying the conditions for node i. 
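With uniform priors, a node's class probabilities are just its class counts normalized to sum to one. A minimal language-agnostic sketch of that relationship (the toolbox additionally reweights counts by any non-uniform prior probabilities, which this omits):

```python
# Estimate per-node class probabilities from class counts, assuming
# uniform priors (non-uniform priors would reweight the counts first).
def class_prob(class_count):
    total = sum(class_count)
    return [c / total for c in class_count]

# A node containing 6 observations of class 1 and 2 of class 2:
print(class_prob([6, 2]))  # [0.75, 0.25]
```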
Cost 
Square matrix, where Cost(i,j) is the cost of classifying a point into class j if its true class is i. 
CutCategories 
An n-by-2 cell array of the categories used at branches in tree, where n is the number of nodes. For each branch node i based on a categorical predictor variable x, the left child is chosen if x is among the categories listed in CutCategories{i,1}, and the right child is chosen if x is among those listed in CutCategories{i,2}. Both columns of CutCategories are empty for branch nodes based on continuous predictors and for leaf nodes. CutPoint contains the cut points for 'continuous' cuts, and CutCategories contains the set of categories. 
CutPoint 
An n-element vector of the values used as cut points in tree, where n is the number of nodes. For each branch node i based on a continuous predictor variable x, the left child is chosen if x<CutPoint(i) and the right child is chosen if x>=CutPoint(i). CutPoint is NaN for branch nodes based on categorical predictors and for leaf nodes. CutPoint contains the cut points for 'continuous' cuts, and CutCategories contains the set of categories. 
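The Children, CutVar, and CutPoint arrays together determine how an observation descends the tree. A schematic traversal in Python, using 1-based node numbers as in the property descriptions, with a hypothetical two-leaf tree and cut variables given as column indices rather than names:

```python
# Descend a binary tree encoded MATLAB-style: at node i, go left when
# x[cut_var[i]] < cut_point[i], otherwise go right; child 0 marks a leaf.
# Nodes are numbered from 1, so arrays are indexed with i - 1.
def predict_node(x, children, cut_var, cut_point):
    i = 1  # start at the root node
    while children[i - 1] != (0, 0):        # stop at a leaf
        left, right = children[i - 1]
        i = left if x[cut_var[i - 1]] < cut_point[i - 1] else right
    return i

# Hypothetical 3-node tree: the root splits on feature 0 at 2.5.
children  = [(2, 3), (0, 0), (0, 0)]
cut_var   = [0, None, None]
cut_point = [2.5, float('nan'), float('nan')]  # NaN at leaf nodes
print(predict_node([1.0], children, cut_var, cut_point))  # leaf node 2
print(predict_node([3.0], children, cut_var, cut_point))  # leaf node 3
```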
CutType 
An n-element cell array indicating the type of cut at each node in tree, where n is the number of nodes. For each node i, CutType{i} is:
'continuous' — If the cut is defined in the form x < v for a variable x and cut point v.
'categorical' — If the cut is defined by whether x takes a value in a set of categories.
'' (empty) — If node i is a leaf node.
CutPoint contains the cut points for 'continuous' cuts, and CutCategories contains the set of categories. 
CutVar 
An n-element cell array of the names of the variables used for branching in each node in tree, where n is the number of nodes. These variables are sometimes known as cut variables. For leaf nodes, CutVar contains an empty string. CutPoint contains the cut points for 'continuous' cuts, and CutCategories contains the set of categories. 
IsBranch 
An n-element logical vector that is true for each branch node and false for each leaf node of tree. 
ModelParameters 
Parameters used in training tree. 
NumObservations 
Number of observations in the training data, a numeric scalar. NumObservations can be less than the number of rows of input data X when there are missing values in X or response Y. 
NodeClass 
An n-element cell array with the names of the most probable classes in each node of tree, where n is the number of nodes in the tree. Every element of this array is a string equal to one of the class names in ClassNames. 
NodeErr 
An n-element vector of the errors of the nodes in tree, where n is the number of nodes. NodeErr(i) is the misclassification probability for node i. 
NodeProb 
An n-element vector of the probabilities of the nodes in tree, where n is the number of nodes. The probability of a node is computed as the proportion of observations from the original data that satisfy the conditions for the node. This proportion is adjusted for any prior probabilities assigned to each class. 
NodeRisk 
An n-element vector of the risk of the nodes in the tree, where n is the number of nodes. The risk for each node is the measure of impurity (Gini index or deviance) for this node weighted by the node probability. If the tree is grown by twoing, the risk for each node is zero. 
NodeSize 
An n-element vector of the sizes of the nodes in tree, where n is the number of nodes. The size of a node is defined as the number of observations from the data used to create the tree that satisfy the conditions for the node. 
NumNodes 
The number of nodes in tree. 
Parent 
An n-element vector containing the number of the parent node for each node in tree, where n is the number of nodes. The parent of the root node is 0. 
PredictorNames 
Cell array of strings containing the predictor names, in the order in which they appear in X. 
Prior 
Numeric vector of prior probabilities for each class. The order of the elements of Prior corresponds to the elements of ClassNames. 
PruneAlpha 
Numeric vector with one element per pruning level. If the pruning level ranges from 0 to M, then PruneAlpha has M + 1 elements sorted in ascending order. PruneAlpha(1) is for pruning level 0 (no pruning), PruneAlpha(2) is for pruning level 1, and so on. 
PruneList 
An n-element numeric vector with the pruning levels in each node of tree, where n is the number of nodes. The pruning levels range from 0 (no pruning) to M, where M is the distance between the deepest leaf and the root node. 
ResponseNames 
String describing the response variable Y. 
ScoreTransform 
Function handle for transforming predicted classification scores, or a string representing a built-in transformation function. 'none' means no transformation; equivalently, @(x)x. To change the score transformation function to fcn, for example, use dot notation: tree.ScoreTransform = 'fcn'.

SurrCutCategories 
An n-element cell array of the categories used for surrogate splits in tree, where n is the number of nodes in tree. For each node k, SurrCutCategories{k} is a cell array. The length of SurrCutCategories{k} is equal to the number of surrogate predictors found at this node. Every element of SurrCutCategories{k} is either an empty string for a continuous surrogate predictor, or a two-element cell array with categories for a categorical surrogate predictor. The first element of this two-element cell array lists categories assigned to the left child by this surrogate split, and the second element lists categories assigned to the right child. The order of the surrogate split variables at each node is matched to the order of variables in SurrCutVar. The optimal-split variable at this node does not appear. For non-branch (leaf) nodes, SurrCutCategories contains an empty cell. 
SurrCutFlip 
An n-element cell array of the numeric cut assignments used for surrogate splits in tree, where n is the number of nodes in tree. For each node k, SurrCutFlip{k} is a numeric vector whose length equals the number of surrogate predictors found at this node. Every element of SurrCutFlip{k} is either zero for a categorical surrogate predictor, or a numeric cut assignment for a continuous surrogate predictor. The numeric cut assignment can be either –1 or +1. For every surrogate split with a numeric cut C based on a continuous predictor variable Z, the left child is chosen if Z<C and the cut assignment for this surrogate split is +1, or if Z≥C and the cut assignment for this surrogate split is –1. Similarly, the right child is chosen if Z≥C and the cut assignment for this surrogate split is +1, or if Z<C and the cut assignment for this surrogate split is –1. The order of the surrogate split variables at each node is matched to the order of variables in SurrCutVar. The optimal-split variable at this node does not appear. For non-branch (leaf) nodes, SurrCutFlip contains an empty array. 
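The flip logic above amounts to mirroring the usual "less than goes left" rule when the cut assignment is –1. A small sketch of that rule with hypothetical values:

```python
# For a continuous surrogate split with cut C and assignment flip (+1 or -1):
# go left when (Z < C and flip == +1) or (Z >= C and flip == -1).
def surrogate_goes_left(z, c, flip):
    return (z < c) if flip == +1 else (z >= c)

print(surrogate_goes_left(1.0, 2.0, +1))  # True: Z < C with flip +1
print(surrogate_goes_left(1.0, 2.0, -1))  # False: the split is mirrored
```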
SurrCutPoint 
An n-element cell array of the numeric values used for surrogate splits in tree, where n is the number of nodes in tree. For each node k, SurrCutPoint{k} is a numeric vector whose length equals the number of surrogate predictors found at this node. Every element of SurrCutPoint{k} is either NaN for a categorical surrogate predictor, or a numeric cut for a continuous surrogate predictor. For every surrogate split with a numeric cut C based on a continuous predictor variable Z, the left child is chosen if Z<C and SurrCutFlip for this surrogate split is +1, or if Z≥C and SurrCutFlip for this surrogate split is –1. Similarly, the right child is chosen if Z≥C and SurrCutFlip for this surrogate split is +1, or if Z<C and SurrCutFlip for this surrogate split is –1. The order of the surrogate split variables at each node is matched to the order of variables returned by SurrCutVar. The optimal-split variable at this node does not appear. For non-branch (leaf) nodes, SurrCutPoint contains an empty cell. 
SurrCutType 
An n-element cell array indicating the types of surrogate splits at each node in tree, where n is the number of nodes in tree. For each node k, SurrCutType{k} is a cell array with the types of the surrogate split variables at this node. The variables are sorted by the predictive measure of association with the optimal predictor in descending order, and only variables with a positive predictive measure are included. The order of the surrogate split variables at each node is matched to the order of variables in SurrCutVar. The optimal-split variable at this node does not appear. For non-branch (leaf) nodes, SurrCutType contains an empty cell. A surrogate split type can be either 'continuous' if the cut is defined in the form Z<V for a variable Z and cut point V, or 'categorical' if the cut is defined by whether Z takes a value in a set of categories. 
SurrCutVar 
An n-element cell array of the names of the variables used for surrogate splits in each node in tree, where n is the number of nodes in tree. Every element of SurrCutVar is a cell array with the names of the surrogate split variables at this node. The variables are sorted by the predictive measure of association with the optimal predictor in descending order, and only variables with a positive predictive measure are included. The optimal-split variable at this node does not appear. For non-branch (leaf) nodes, SurrCutVar contains an empty cell. 
SurrVarAssoc 
An n-element cell array of the predictive measures of association for surrogate splits in tree, where n is the number of nodes in tree. For each node k, SurrVarAssoc{k} is a numeric vector. The length of SurrVarAssoc{k} is equal to the number of surrogate predictors found at this node. Every element of SurrVarAssoc{k} gives the predictive measure of association between the optimal split and this surrogate split. The order of the surrogate split variables at each node is matched to the order of variables in SurrCutVar. The optimal-split variable at this node does not appear. For non-branch (leaf) nodes, SurrVarAssoc contains an empty cell. 
W 
The scaled weights, a vector with length n, the number of rows in X. 
X 
A matrix of predictor values. Each column of X represents one variable, and each row represents one observation. 
Y 
A categorical array, cell array of strings, character array, logical vector, or a numeric vector. Each row of Y represents the classification of the corresponding row of X. 
compact  Compact tree 
crossval  Cross-validated decision tree 
cvloss  Classification error by cross validation 
prune  Produce sequence of subtrees by pruning 
resubEdge  Classification edge by resubstitution 
resubLoss  Classification error by resubstitution 
resubMargin  Classification margins by resubstitution 
resubPredict  Predict resubstitution response of tree 
edge  Classification edge 
loss  Classification error 
margin  Classification margins 
meanSurrVarAssoc  Mean predictive measure of association for surrogate splits in decision tree 
predict  Predict classification 
predictorImportance  Estimates of predictor importance 
view  View tree 
ClassificationTree splits nodes based on either impurity or node error. Impurity means one of several things, depending on your choice of the SplitCriterion name-value pair argument:
Gini's Diversity Index ('gdi') — The Gini index of a node is
1 – ∑_i p²(i),
where the sum is over the classes i at the node, and p(i) is the observed fraction of observations with class i that reach the node. A node with just one class (a pure node) has Gini index 0; otherwise the Gini index is positive. So the Gini index is a measure of node impurity.
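As a quick check of the definition, the Gini index of a node can be computed directly from its class fractions:

```python
# Gini diversity index: 1 - sum_i p(i)^2 over the class fractions p(i).
def gini(p):
    return 1 - sum(pi ** 2 for pi in p)

print(gini([1.0, 0.0]))  # 0.0 -- a pure node
print(gini([0.5, 0.5]))  # 0.5 -- maximally impure two-class node
```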
Deviance ('deviance') — With p(i) defined the same as for the Gini index, the deviance of a node is
–∑_i p(i) log p(i).
A pure node has deviance 0; otherwise, the deviance is positive.
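The deviance can be checked the same way, with the usual convention that a term with p(i) = 0 contributes nothing:

```python
import math

# Deviance (cross-entropy): -sum_i p(i) * log(p(i)), with 0*log(0) taken as 0.
def deviance(p):
    return sum(-pi * math.log(pi) for pi in p if pi > 0)

print(deviance([1.0, 0.0]))            # 0.0 for a pure node
print(round(deviance([0.5, 0.5]), 4))  # log(2), about 0.6931
```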
Twoing rule ('twoing') — Twoing is not a purity measure of a node, but is a different measure for deciding how to split a node. Let L(i) denote the fraction of members of class i in the left child node after a split, and R(i) denote the fraction of members of class i in the right child node after a split. Choose the split criterion to maximize
P(L) P(R) (∑_i |L(i) – R(i)|)²,
where P(L) and P(R) are the fractions of observations that split to the left and right, respectively. If the expression is large, the split made each child node purer. Similarly, if the expression is small, the split made each child node similar to the other, and hence similar to the parent node, and so the split did not increase node purity.
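The twoing criterion is maximized by a split that sends the classes cleanly to opposite children; a sketch of the expression with hypothetical class fractions:

```python
# Twoing criterion: P(L) * P(R) * (sum_i |L(i) - R(i)|)^2, where L(i) and
# R(i) are class fractions in the left/right children, and P(L), P(R) are
# the fractions of observations sent left and right.
def twoing(p_left, p_right, L, R):
    return p_left * p_right * sum(abs(l - r) for l, r in zip(L, R)) ** 2

# A split that cleanly separates two equally sized classes scores 1.0:
print(twoing(0.5, 0.5, [1.0, 0.0], [0.0, 1.0]))  # 1.0
# A split that leaves both children identical to the parent scores 0.0:
print(twoing(0.5, 0.5, [0.5, 0.5], [0.5, 0.5]))  # 0.0
```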
Node error — The node error is the fraction of misclassified observations at a node. If j is the class with the largest number of training samples at a node, the node error is
1 – p(j).
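Equivalently, since p(j) is the largest class fraction at the node, the node error is one minus the maximum class probability:

```python
# Node error: fraction misclassified when the node predicts its majority
# class j, i.e. 1 - p(j) where p(j) is the largest class fraction.
def node_error(p):
    return 1 - max(p)

print(node_error([0.75, 0.25]))  # 0.25
print(node_error([1.0, 0.0]))    # 0.0 for a pure node
```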
Value. To learn how value classes affect copy operations, see Copying Objects in the MATLAB® documentation.
ClassificationEnsemble  CompactClassificationTree  fitctree  predict  RegressionTree