label = predict(tree,X)
[label,score] = predict(tree,X)
[label,score,node] = predict(tree,X)
[label,score,node,cnum] = predict(tree,X)
[label,...] = predict(tree,X,Name,Value)
label = predict(tree,X) returns a vector of predicted class labels for a matrix X, based on tree, a trained full or compact classification tree.

[label,score] = predict(tree,X) also returns a matrix of scores, indicating the likelihood that a label comes from a particular class.

[label,score,node] = predict(tree,X) also returns a vector of predicted node numbers for the classification, based on tree.

[label,score,node,cnum] = predict(tree,X) also returns a vector of predicted class numbers for the classification, based on tree.

[label,...] = predict(tree,X,Name,Value) returns labels with additional options specified by one or more Name,Value pair arguments.
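As a brief sketch of these syntaxes (the Fisher iris data used here is a built-in example set; any trained classification tree works the same way):

```matlab
% Sketch: train a tree and request all four outputs of predict.
load fisheriris                     % meas: 150-by-4 predictors, species: labels
tree = fitctree(meas,species);      % full classification tree
[label,score,node,cnum] = predict(tree,meas(1:5,:));
% label - predicted class labels for the first five observations
% score - 5-by-3 matrix of class likelihood scores
% node  - node numbers of the leaves reached in tree
% cnum  - class numbers corresponding to label
```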

A classification tree created by fitctree, or a compact classification tree created by compact.

A matrix where each row represents an observation, and each column represents a predictor. The number of columns in X must equal the number of predictors used to train tree.
Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

A vector of nonnegative integers in ascending order, or 'all'. If you specify a vector, then all elements must be at least 0 and at most max(tree.PruneList). If you specify 'all', then predict operates on all subtrees (the entire pruning sequence). To invoke Subtrees, the PruneList property of tree must be nonempty. Default: 0

Vector of class labels of the same type as the response data used in training tree. If you specify Subtrees, label contains one column for each pruning level.

Numeric matrix of size N-by-K, where N is the number of rows of X and K is the number of classes. score(n,k) is the posterior probability that row n of X belongs to class k. If you specify Subtrees, score contains one such matrix for each pruning level.

Numeric vector of node numbers for the predicted classes. Each entry corresponds to the predicted node in tree.

Numeric vector of class numbers corresponding to the predicted labels.
predict classifies so as to minimize the expected classification cost:

$$\widehat{y}=\underset{y=1,\dots,K}{\arg\min}\,\sum_{k=1}^{K}\widehat{P}(k\mid x)\,C(y\mid k),$$

where

$$\widehat{y}$$ is the predicted classification.
K is the number of classes.
$$\widehat{P}(k\mid x)$$ is the posterior probability of class k for observation x.
$$C(y\mid k)$$ is the cost of classifying an observation as y when its true class is k.
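As a hypothetical two-class illustration of this rule (the posterior values below are invented), take $$\widehat{P}(1\mid x)=0.3$$, $$\widehat{P}(2\mid x)=0.7$$, and the default 0-1 cost:

```latex
% Expected cost of predicting each class for one observation x:
\sum_{k=1}^{2}\widehat{P}(k\mid x)\,C(1\mid k) = 0.3\cdot 0 + 0.7\cdot 1 = 0.7,
\qquad
\sum_{k=1}^{2}\widehat{P}(k\mid x)\,C(2\mid k) = 0.3\cdot 1 + 0.7\cdot 0 = 0.3.
% The arg min over y is y = 2, so predict returns class 2.
```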
For trees, the score of a classification of a leaf node is the posterior probability of the classification at that node. The posterior probability of the classification at a node is the number of training sequences that lead to that node with the classification, divided by the number of training sequences that lead to that node.
For example, consider classifying a predictor X as true when X < 0.15 or X > 0.95, and as false otherwise.
Generate 100 random points and classify them:
rng(0,'twister') % for reproducibility
X = rand(100,1);
Y = (abs(X - .55) > .4);
tree = fitctree(X,Y);
view(tree,'Mode','Graph')
Prune the tree:
tree1 = prune(tree,'Level',1);
view(tree1,'Mode','Graph')
The pruned tree correctly classifies observations that are less than 0.15 as true. It also correctly classifies observations from .15 to .94 as false. However, it incorrectly classifies observations that are greater than .94 as false. Therefore, the score for observations that are greater than .15 should be about .05/.85=.06 for true, and about .8/.85=.94 for false.
Compute the prediction scores for the first 10 rows of X:

[~,score] = predict(tree1,X(1:10));
[score X(1:10,:)]
ans =
    0.9059    0.0941    0.8147
    0.9059    0.0941    0.9058
         0    1.0000    0.1270
    0.9059    0.0941    0.9134
    0.9059    0.0941    0.6324
         0    1.0000    0.0975
    0.9059    0.0941    0.2785
    0.9059    0.0941    0.5469
    0.9059    0.0941    0.9575
    0.9059    0.0941    0.9649
Indeed, every value of X (the rightmost column) that is less than 0.15 has associated scores (the left and center columns) of 0 and 1, while the other values of X have associated scores of 0.91 and 0.09. The difference (score 0.09 instead of the expected .06) is due to a statistical fluctuation: there are 8 observations in X in the range (.95,1) instead of the expected 5 observations.
There are two costs associated with classification: the true misclassification cost per class, and the expected misclassification cost per observation.
You can set the true misclassification cost per class in the Cost name-value pair when you create the classifier using the fitctree method. Cost(i,j) is the cost of classifying an observation into class j if its true class is i. By default, Cost(i,j)=1 if i~=j, and Cost(i,j)=0 if i=j. In other words, the cost is 0 for correct classification, and 1 for incorrect classification.
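As a hedged sketch of this option (the data and cost values below are invented for illustration), an asymmetric Cost matrix biases predictions away from the expensive mistake:

```matlab
% Sketch: make misclassifying true class 2 as class 1 ten times
% costlier than the reverse.
rng(1)                                 % for reproducibility
X = rand(200,1);
Y = (X + 0.1*randn(200,1) > 0.5) + 1;  % noisy labels, classes 1 and 2
C = [0 1; 10 0];                       % C(i,j): cost of predicting j when the truth is i
treeDefault = fitctree(X,Y);           % default 0-1 cost
treeCostly  = fitctree(X,Y,'Cost',C);  % asymmetric misclassification cost
% Near the decision boundary, the costly tree prefers class 2, because
% predicting class 1 for a true class-2 observation costs 10.
```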
Suppose you have Nobs observations that you want to classify with a trained classifier, and you have K classes. You place the observations into a matrix Xnew with one observation per row.

The expected cost matrix CE has size Nobs-by-K. Each row of CE contains the expected (average) cost of classifying the observation into each of the K classes. CE(n,k) is

$$\sum_{i=1}^{K}\widehat{P}\bigl(i\mid Xnew(n)\bigr)\,C(k\mid i),$$

where

K is the number of classes.
$$\widehat{P}(i\mid Xnew(n))$$ is the posterior probability of class i for observation Xnew(n).
$$C(k\mid i)$$ is the true misclassification cost of classifying an observation as k when its true class is i.
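As a hypothetical worked row of CE (posterior values invented for illustration), take K = 3, posteriors 0.5, 0.3, 0.2 for one row n of Xnew, and the default 0-1 cost, under which each off-diagonal cost is 1:

```latex
% Under 0-1 cost, CE(n,k) reduces to 1 - P(k | Xnew(n)):
CE(n,\cdot) = \bigl(1-0.5,\; 1-0.3,\; 1-0.2\bigr) = (0.5,\; 0.7,\; 0.8).
% Column 1 has the minimal expected cost, so predict returns class 1.
```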
The predictive measure of association between the optimal split on variable i and a surrogate split on variable j is:

$$\lambda_{i,j}=\frac{\min\left(P_L,P_R\right)-\left(1-P_{L_iL_j}-P_{R_iR_j}\right)}{\min\left(P_L,P_R\right)}.$$

Here

P_L and P_R are the node probabilities for the optimal split of node i into Left and Right nodes, respectively.
$$P_{L_iL_j}$$ is the probability that both (optimal) node i and (surrogate) node j send an observation to the Left.
$$P_{R_iR_j}$$ is the probability that both (optimal) node i and (surrogate) node j send an observation to the Right.

Clearly, λ_{i,j} lies from –∞ to 1. Variable j is a worthwhile surrogate split for variable i if λ_{i,j} > 0.
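As a hypothetical numeric check of this formula (the probabilities below are invented for illustration), suppose P_L = 0.6, P_R = 0.4, P_{L_iL_j} = 0.55, and P_{R_iR_j} = 0.35:

```latex
% Hypothetical numbers, chosen only to illustrate the formula.
\lambda_{i,j}
  = \frac{\min(0.6,\,0.4) - \left(1 - 0.55 - 0.35\right)}{\min(0.6,\,0.4)}
  = \frac{0.4 - 0.1}{0.4}
  = 0.75 .
% Since 0.75 > 0, variable j is a worthwhile surrogate for variable i.
```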
predict generates predictions by following the branches of tree until it reaches a leaf node or a missing value. If predict reaches a leaf node, it returns the classification of that node.

If predict reaches a node with a missing value for a predictor, its behavior depends on the setting of the Surrogate name-value pair when fitctree constructs tree.
Surrogate = 'off' (default) — predict returns the label with the largest number of training samples that reach the node.

Surrogate = 'on' — predict uses the best surrogate split at the node. If all surrogate split variables with positive predictive measure of association are missing, predict returns the label with the largest number of training samples that reach the node. For a definition, see Predictive Measure of Association.
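The surrogate behavior can be sketched as follows (using the built-in Fisher iris data; the choice of which predictor to blank out is arbitrary):

```matlab
% Sketch: grow a tree with surrogate splits so predict can route
% observations whose split variable is missing (NaN).
load fisheriris                          % built-in example data
tree = fitctree(meas,species,'Surrogate','on');
Xmiss = meas(1,:);
Xmiss(3) = NaN;                          % remove one predictor value
label = predict(tree,Xmiss)              % routed via the best surrogate split
```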