label = predict(tree,X)
[label,score] = predict(tree,X)
[label,score,node] = predict(tree,X)
[label,score,node,cnum] = predict(tree,X)
[label,...] = predict(tree,X,Name,Value)
label = predict(tree,X) returns a vector of predicted class labels for a matrix X, based on tree, a trained full or compact classification tree.
[label,score] = predict(tree,X) also returns a matrix of scores, indicating the likelihood that a label comes from a particular class.
[label,score,node] = predict(tree,X) also returns a vector of predicted node numbers for the classification, based on tree.
[label,score,node,cnum] = predict(tree,X) also returns a vector of predicted class numbers for the classification, based on tree.
[label,...] = predict(tree,X,Name,Value) returns labels with additional options specified by one or more Name,Value pair arguments.
tree
A classification tree created by fitctree, or a compact classification tree created by compact.
X
A matrix where each row represents an observation, and each column represents a predictor. The number of columns in X must equal the number of predictors in tree.
Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.
'Subtrees'
Numeric vector of pruning levels, with 0 representing the full, unpruned tree. To use the Subtrees name-value pair, tree must include a pruning sequence as created by the fitctree or prune methods. If Subtrees has T elements, and X has N rows, then label is an N-by-T matrix. The ith column of label contains the fitted values produced by the Subtrees(i) subtree. Similarly, score is an N-by-K-by-T array, and node and cnum are N-by-T matrices. Subtrees must be sorted in ascending order. (To compute fitted values for a tree that is not part of the optimal pruning sequence, first use prune to prune the tree.)
Default: 0
label
Vector of class labels of the same type as the response data used in training tree. Each entry of label corresponds to the class with minimal expected cost for the corresponding row of X. See Predicted Class Label.
score
Numeric matrix of size N-by-K, where N is the number of observations (rows) in X, and K is the number of classes (in tree.ClassNames). score(i,j) is the posterior probability that row i of X is of class j.
node
Numeric vector of node numbers for the predicted classes. Each entry corresponds to the predicted node in tree for the corresponding row of X.
cnum
Numeric vector of class numbers corresponding to the predicted labels. Each entry of cnum corresponds to a predicted class number for the corresponding row of X.
predict classifies so as to minimize the expected classification cost:
$$\widehat{y}=\underset{y=1,\mathrm{...},K}{\mathrm{arg}\mathrm{min}}{\displaystyle \sum _{k=1}^{K}\widehat{P}\left(k|x\right)C\left(y|k\right)},$$
where
$$\widehat{y}$$ is the predicted classification.
K is the number of classes.
$$\widehat{P}\left(k|x\right)$$ is the posterior probability of class k for observation x.
$$C\left(y|k\right)$$ is the cost of classifying an observation as y when its true class is k.
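As a minimal sketch of this rule for a single observation, the following code computes the expected cost of each candidate label and picks the smallest. The posterior vector and the cost matrix are made-up values for illustration, and Cost(k,y) is assumed to store C(y|k).

```matlab
% Hypothetical posterior probabilities P(k|x) for K = 3 classes
posterior = [0.5 0.3 0.2];

% Hypothetical cost matrix: Cost(k,y) is the cost of predicting class y
% when the true class is k. Misclassifying class 3 is made expensive.
Cost = [ 0  1  1
         1  0  1
        10 10  0];

% Expected cost of each candidate label y: sum over k of P(k|x)*C(y|k)
expectedCost = posterior * Cost;   % [2.3 2.5 0.8]

% The predicted class number minimizes the expected cost
[~,yhat] = min(expectedCost);      % yhat is 3
```

With these numbers the predicted class is 3, even though class 1 has the largest posterior probability, because errors on class 3 are costly.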
For trees, the score of a classification of a leaf node is the posterior probability of the classification at that node. The posterior probability of the classification at a node is the number of training sequences that lead to that node with the classification, divided by the number of training sequences that lead to that node.
For example, consider classifying a predictor X as true when X < 0.15 or X > 0.95, and as false otherwise.
Generate 100 random points and classify them:
rng(0,'twister') % for reproducibility
X = rand(100,1);
Y = (abs(X - .55) > .4);
tree = fitctree(X,Y);
view(tree,'Mode','Graph')
Prune the tree:
tree1 = prune(tree,'Level',1);
view(tree1,'Mode','Graph')
The pruned tree correctly classifies observations that are less than 0.15 as true. It also correctly classifies observations between 0.15 and 0.95 as false. However, it incorrectly classifies observations that are greater than 0.95 as false. Therefore, the score for observations that are greater than 0.15 should be about 0.05/0.85 = 0.06 for true, and about 0.8/0.85 = 0.94 for false.
Compute the prediction scores for the first 10 rows of X:
[~,score] = predict(tree1,X(1:10));
[score X(1:10,:)]
ans =
    0.9059    0.0941    0.8147
    0.9059    0.0941    0.9058
         0    1.0000    0.1270
    0.9059    0.0941    0.9134
    0.9059    0.0941    0.6324
         0    1.0000    0.0975
    0.9059    0.0941    0.2785
    0.9059    0.0941    0.5469
    0.9059    0.0941    0.9575
    0.9059    0.0941    0.9649
Indeed, every value of X (the right-most column) that is less than 0.15 has associated scores (the left and center columns) of 0 and 1, while the other values of X have associated scores of 0.91 and 0.09. The difference (score 0.09 instead of the expected .06) is due to a statistical fluctuation: there are 8 observations in X in the range (.95,1) instead of the expected 5 observations.
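The arithmetic behind these scores can be checked directly. This sketch takes the count of 8 observations in (.95,1) from the text above. It also assumes that 85 training points reach the pruned leaf for X greater than 0.15, a count inferred from the displayed scores rather than read from the tree.

```matlab
nTrue = 8;     % observations in (.95,1), labeled true (stated above)
nLeaf = 85;    % assumed number of training points that reach the leaf

% Score = fraction of training points at the leaf with each label
scoreTrue  = nTrue/nLeaf;             % about 0.0941
scoreFalse = (nLeaf - nTrue)/nLeaf;   % about 0.9059
```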
There are two costs associated with classification: the true misclassification cost per class, and the expected misclassification cost per observation.
You can set the true misclassification cost per class in the Cost name-value pair when you create the classifier using the fitctree method. Cost(i,j) is the cost of classifying an observation into class j if its true class is i. By default, Cost(i,j)=1 if i~=j, and Cost(i,j)=0 if i=j. In other words, the cost is 0 for correct classification, and 1 for incorrect classification.
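As a brief sketch, the default cost matrix described above can be built explicitly and a modified copy passed to fitctree. The custom cost values are hypothetical and chosen only for illustration. The rows and columns are assumed to follow the order of the class names in the trained tree.

```matlab
K = 3;                            % number of classes in the iris data
defaultCost = ones(K) - eye(K);   % 0 on the diagonal, 1 elsewhere

% Hypothetical custom cost: misclassifying the third class costs 5
customCost = defaultCost;
customCost(3,1:2) = 5;

load fisheriris
tree = fitctree(meas,species,'Cost',customCost);
```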
Suppose you have Nobs observations that you want to classify with a trained classifier. Suppose you have K classes. You place the observations into a matrix Xnew with one observation per row.
The expected cost matrix CE has size Nobs-by-K. Each row of CE contains the expected (average) cost of classifying the observation into each of the K classes. CE(n,k) is
$$\sum _{i=1}^{K}\widehat{P}\left(i|Xnew(n)\right)C\left(k|i\right),$$
where
K is the number of classes.
$$\widehat{P}\left(i|Xnew(n)\right)$$ is the posterior probability of class i for observation Xnew(n).
$$C\left(k|i\right)$$ is the true misclassification cost of classifying an observation as k when its true class is i.
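The expected cost matrix can be sketched as a single matrix product of the scores and the cost matrix. This example assumes the Fisher iris data and the default costs stored in the Cost property of the trained tree. The names CE, cnumFromCE, and labelFromCE are illustrative.

```matlab
load fisheriris
tree = fitctree(meas,species);
Xnew = meas(99:102,:);                % four observations, one per row

[label,score] = predict(tree,Xnew);   % score is Nobs-by-K

% CE(n,k) = sum over i of score(n,i)*Cost(i,k)
CE = score * tree.Cost;               % Nobs-by-K expected costs

% The column with the smallest expected cost gives the class number
[~,cnumFromCE] = min(CE,[],2);
labelFromCE = tree.ClassNames(cnumFromCE);
```

For these rows, labelFromCE should agree with the labels that predict returns.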
The predictive measure of association between the optimal split on variable i and a surrogate split on variable j is:
$${\lambda}_{i,j}=\frac{\text{min}\left({P}_{L},{P}_{R}\right)-\left(1-{P}_{{L}_{i}{L}_{j}}-{P}_{{R}_{i}{R}_{j}}\right)}{\text{min}\left({P}_{L},{P}_{R}\right)}.$$
Here
$${P}_{L}$$ and $${P}_{R}$$ are the node probabilities for the optimal split of node i into Left and Right nodes respectively.
$${P}_{{L}_{i}{L}_{j}}$$ is the probability that both (optimal) node i and (surrogate) node j send an observation to the Left.
$${P}_{{R}_{i}{R}_{j}}$$ is the probability that both (optimal) node i and (surrogate) node j send an observation to the Right.
The value of $${\lambda}_{i,j}$$ lies in the interval $$(-\infty ,1]$$. Variable j is a worthwhile surrogate split for variable i if $${\lambda}_{i,j}>0$$.
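A worked example may make the formula concrete. The four probabilities below are hypothetical values chosen for illustration, not quantities taken from a fitted tree.

```matlab
PL  = 0.4;    % probability that the optimal split sends an observation Left
PR  = 0.6;    % probability that the optimal split sends an observation Right
PLL = 0.35;   % both the optimal and the surrogate split send it Left
PRR = 0.55;   % both the optimal and the surrogate split send it Right

% 1 - PLL - PRR is the probability that the two splits disagree
lambda = (min(PL,PR) - (1 - PLL - PRR)) / min(PL,PR);   % about 0.75
```

Here the two splits disagree on 10% of observations, which is less than min(PL,PR) = 0.4, so lambda is positive and the surrogate is worthwhile under the criterion above.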
Examine predictions for a few rows in the Fisher iris data.
load fisheriris
tree = fitctree(meas,species);
X = meas(99:102,:); % take four rows
[label,score,node,cnum] = predict(tree,X)
label =
    'versicolor'
    'versicolor'
    'virginica'
    'virginica'
score =
         0    1.0000         0
         0    1.0000         0
         0    0.0217    0.9783
         0    0.0217    0.9783
node =
     8
     8
     5
     5
cnum =
     2
     2
     3
     3
Examine predictions from pruned trees for the Fisher iris model.
load fisheriris
tree = fitctree(meas,species);
X = meas(99:102,:); % take four rows
[label,score,node,cnum] = predict(tree,X,'Subtrees',[2 3 4])
label =
    'versicolor'    'versicolor'    'setosa'
    'versicolor'    'versicolor'    'setosa'
    'virginica'     'versicolor'    'setosa'
    'virginica'     'versicolor'    'setosa'
score(:,:,1) =
         0    0.9074    0.0926
         0    0.9074    0.0926
         0    0.0217    0.9783
         0    0.0217    0.9783
score(:,:,2) =
         0    0.5000    0.5000
         0    0.5000    0.5000
         0    0.5000    0.5000
         0    0.5000    0.5000
score(:,:,3) =
    0.3333    0.3333    0.3333
    0.3333    0.3333    0.3333
    0.3333    0.3333    0.3333
    0.3333    0.3333    0.3333
node =
     4     3     1
     4     3     1
     5     3     1
     5     3     1
cnum =
     2     2     1
     2     2     1
     3     2     1
     3     2     1
predict generates predictions by following the branches of tree until it reaches a leaf node or a missing value. If predict reaches a leaf node, it returns the classification of that node.
If predict reaches a node with a missing value for a predictor, its behavior depends on the setting of the Surrogate name-value pair when fitctree constructs tree.
Surrogate = 'off' (default) — predict returns the label with the largest number of training samples that reach the node.
Surrogate = 'on' — predict uses the best surrogate split at the node. If all surrogate split variables with positive predictive measure of association are missing, predict returns the label with the largest number of training samples that reach the node. For a definition, see Predictive Measure of Association.
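The two settings can be compared with a short sketch, assuming the Fisher iris data. The choice of row and of which predictors to set to NaN is arbitrary and only for illustration. The outputs are not shown here because they depend on the fitted trees.

```matlab
load fisheriris
treeOff = fitctree(meas,species);                   % Surrogate = 'off' (default)
treeOn  = fitctree(meas,species,'Surrogate','on');  % store surrogate splits

% Take one observation and mark its petal measurements as missing
x = meas(100,:);
x(3:4) = NaN;

% Without surrogates, predict stops at the first node whose split
% predictor is missing. With surrogates, it tries the surrogate splits.
labelOff = predict(treeOff,x)
labelOn  = predict(treeOn,x)
```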