L = loss(tree,X,Y)
L = loss(tree,X,Y,Name,Value)
L = loss(tree,X,Y,'Subtrees',subtreevector)
[L,se] = loss(tree,X,Y,'Subtrees',subtreevector)
[L,se,NLeaf] = loss(tree,X,Y,'Subtrees',subtreevector)
[L,se,NLeaf,bestlevel] = loss(tree,X,Y,'Subtrees',subtreevector)
[L,...] = loss(tree,X,Y,'Subtrees',subtreevector,Name,Value)
L = loss(tree,X,Y) returns a scalar representing how well tree classifies the data in X, when Y contains the true classifications.
When computing the loss, loss normalizes the class probabilities in Y to the class probabilities used for training, stored in the Prior property of tree.
L = loss(tree,X,Y,Name,Value) returns the loss with additional options specified by one or more Name,Value pair arguments.
L = loss(tree,X,Y,'Subtrees',subtreevector) returns a vector of classification errors for the trees in the pruning sequence subtreevector.
[L,se] = loss(tree,X,Y,'Subtrees',subtreevector) returns the vector of standard errors of the classification errors.
[L,se,NLeaf] = loss(tree,X,Y,'Subtrees',subtreevector) returns the vector of numbers of leaf nodes in the trees of the pruning sequence.
[L,se,NLeaf,bestlevel] = loss(tree,X,Y,'Subtrees',subtreevector) returns the best pruning level as defined in the TreeSize name-value pair. By default, bestlevel is the pruning level that gives loss within one standard deviation of minimal loss.
[L,...] = loss(tree,X,Y,'Subtrees',subtreevector,Name,Value) returns loss statistics with additional options specified by one or more Name,Value pair arguments.
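The pruning-sequence syntaxes above can be sketched as follows. This is a minimal illustration, assuming the fisheriris sample data set is available and using resubstitution (the training data itself) as X and Y; the 'all' value for 'Subtrees' requests every pruning level.

```matlab
load fisheriris                    % sample data: meas (predictors), species (labels)
tree = fitctree(meas,species);     % grow a classification tree

% Loss statistics for every subtree in the pruning sequence
[L,se,NLeaf,bestlevel] = loss(tree,meas,species,'Subtrees','all');

numel(L)       % one classification error per pruning level
numel(NLeaf)   % matching number of leaf counts
bestlevel      % scalar pruning level chosen by the default TreeSize rule
```

Resubstitution loss tends to be optimistic, so in practice X and Y would usually be held-out data.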

A classification tree or compact classification tree constructed
by 

Matrix of data to classify. Each row of 

Classification of 
Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Function handle or string representing a loss function. Built-in loss functions:
You can write your own loss function in the syntax described in Loss Functions. Default: 

A numeric vector of length Default: 
Name,Value arguments associated with pruning subtrees:

A vector of nonnegative integers in ascending order or If you specify a vector, then all elements must be at least If you specify
To invoke Default: 

One of the following strings:


Classification error, a vector the length of 

Standard error of loss, a vector the length of 

Number of leaves (terminal nodes) in the pruned subtrees, a
vector the length of 

A scalar whose value depends on

The default classification error is the fraction of the data X that tree misclassifies, where Y represents the true classifications.
Weighted classification error is the sum over i of weight i times the Boolean value that is 1 when tree misclassifies the ith row of X, divided by the sum of the weights.
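As a small numeric illustration of the weighted definition, the sketch below uses made-up weights and a made-up misclassification indicator (neither comes from a fitted tree) and compares the weighted error with the unweighted fraction.

```matlab
w    = [1; 2; 1; 4];              % hypothetical observation weights
miss = logical([0; 1; 0; 1]);     % 1 where a row is misclassified (made up)

weightedErr   = sum(w .* miss) / sum(w);   % (2 + 4) / 8
unweightedErr = mean(miss);                % 2 of 4 rows
```

With these values the weighted error is 0.75 and the unweighted error is 0.5, because the misclassified rows carry most of the weight.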
The built-in loss functions are:
'binodeviance': For binary classification, assume the classes y_{n} are -1 and 1. With weight vector w normalized to have sum 1, and predictions of row n of data X as f(X_{n}), the binomial deviance is
$$\sum_{n} w_{n}\log\left(1+\exp\left(-2y_{n}f\left(X_{n}\right)\right)\right).$$
'exponential': With the same definitions as for 'binodeviance', the exponential loss is
$$\sum_{n} w_{n}\exp\left(-y_{n}f\left(X_{n}\right)\right).$$
'classiferror': Predict the label with the largest posterior probability. The loss is then the fraction of misclassified observations.
'hinge': Classification error measure that has the form
$$L=\frac{\sum_{j=1}^{n} w_{j}\max\left\{0,\,1-y_{j}^{\prime}f\left(X_{j}\right)\right\}}{\sum_{j=1}^{n} w_{j}},$$
where:
w_{j} is weight j.
For binary classification, y_{j} = 1 for the positive class and -1 for the negative class. For problems where the number of classes K ≥ 3, y_{j} is a vector of 0s, but with a 1 in the position corresponding to the true class. For example, if the second observation is in the third class and K = 4, then y_{2} = [0 0 1 0]′.
$$f(X_{j})$$ is, for binary classification, the posterior probability or, for K ≥ 3, a vector of posterior probabilities for each class given observation j.
'mincost': Predict the label with the smallest expected misclassification cost, with expectation taken over the posterior probability, and cost as given by the Cost property of the classifier (a matrix). The loss is then the true misclassification cost averaged over the observations.
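The binary formulas above can be evaluated by hand on a few values. The sketch below is only an illustration of the arithmetic: the labels y, the predictions f, and the equal weights are made up, and it does not reproduce how loss computes these quantities internally.

```matlab
y = [1; -1; 1; -1];            % true classes coded as -1 and 1
f = [0.8; -0.3; -0.2; 0.6];    % hypothetical predictions f(X_n)
w = ones(4,1) / 4;             % weights normalized to sum to 1

binodeviance = sum(w .* log(1 + exp(-2 * y .* f)));
exponential  = sum(w .* exp(-y .* f));
hinge        = sum(w .* max(0, 1 - y .* f)) / sum(w);
```

All three losses grow when y and f disagree in sign, which is why the last two rows (where the prediction has the wrong sign) contribute the most.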
To write your own loss function, create a function file in this form:
function loss = lossfun(C,S,W,COST)
N is the number of rows of X.
K is the number of classes in the classifier, represented in the ClassNames property.
C is an N-by-K logical matrix, with one true per row for the true class. The index for each class is its position in the ClassNames property.
S is an N-by-K numeric matrix. S is a matrix of posterior probabilities for classes with one row per observation, similar to the posterior output from predict.
W is a numeric vector with N elements, the observation weights. If you pass W, the elements are normalized to sum to the prior probabilities in the respective classes.
COST is a K-by-K numeric matrix of misclassification costs. For example, you can use COST = ones(K) - eye(K), which means a cost of 0 for correct classification, and 1 for misclassification.
The output loss should be a scalar.
Pass the function handle @lossfun as the value of the LossFun name-value pair.
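A custom loss function following this signature might look like the sketch below. The body is an illustrative choice (a weighted average of the misclassification cost of the most probable class), not a function supplied by the toolbox; it would be saved as lossfun.m.

```matlab
function loss = lossfun(C,S,W,COST)
% Weighted average misclassification cost (illustrative only).
[~,trueIdx] = max(double(C),[],2);   % column of the true class in each row
[~,predIdx] = max(S,[],2);           % class with the largest posterior
% Cost of each prediction: COST(true class, predicted class)
costs = COST(sub2ind(size(COST),trueIdx,predIdx));
loss = sum(W(:) .* costs(:)) / sum(W);
end
```

It could then be passed as L = loss(tree,X,Y,'LossFun',@lossfun). With the default COST = ones(K) - eye(K), this reduces to the weighted fraction of misclassified rows.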
There are two costs associated with classification: the true misclassification cost per class, and the expected misclassification cost per observation.
You can set the true misclassification cost per class in the Cost name-value pair when you create the classifier using the fitctree method. Cost(i,j) is the cost of classifying an observation into class j if its true class is i. By default, Cost(i,j)=1 if i~=j, and Cost(i,j)=0 if i=j. In other words, the cost is 0 for correct classification, and 1 for incorrect classification.
Suppose you have Nobs observations that you want to classify with a trained classifier. Suppose you have K classes. You place the observations into a matrix Xnew with one observation per row.
The expected cost matrix CE has size Nobs-by-K. Each row of CE contains the expected (average) cost of classifying the observation into each of the K classes. CE(n,k) is
$$\sum_{i=1}^{K}\widehat{P}\left(i\mid Xnew(n)\right)C\left(k\mid i\right),$$
where
K is the number of classes.
$$\widehat{P}\left(i\mid Xnew(n)\right)$$ is the posterior probability of class i for observation Xnew(n).
$$C\left(k\mid i\right)$$ is the true misclassification cost of classifying an observation as k when its true class is i.
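Because CE(n,k) sums posterior times cost over the true class i, the whole matrix is a single matrix product when the posteriors and costs are laid out as described. The sketch below uses made-up posteriors P and a made-up asymmetric cost matrix; the variable names are illustrative only.

```matlab
% Hypothetical posteriors for Nobs = 2 observations and K = 2 classes
P = [0.9 0.1;
     0.2 0.8];             % P(n,i): posterior probability of class i for row n
Cost = [0 1;
        5 0];              % Cost(i,k): cost of predicting k when the truth is i

CE = P * Cost;             % CE(n,k) = sum over i of P(n,i) * Cost(i,k)
[~,minCostClass] = min(CE,[],2);   % class index with the smallest expected cost
```

Here CE is [0.5 0.9; 4.0 0.2], so the minimum-expected-cost class is 1 for the first row and 2 for the second.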
For trees, the score of a classification of a leaf node is the posterior probability of the classification at that node. The posterior probability of the classification at a node is the number of training sequences that lead to that node with the classification, divided by the number of training sequences that lead to that node.
For example, consider classifying a predictor X as true when X < 0.15 or X > 0.95, and as false otherwise.
Generate 100 random points and classify them:
rng(0,'twister') % for reproducibility
X = rand(100,1);
Y = (abs(X - .55) > .4);
tree = fitctree(X,Y);
view(tree,'Mode','Graph')
Prune the tree:
tree1 = prune(tree,'Level',1);
view(tree1,'Mode','Graph')
The pruned tree correctly classifies observations that are less than 0.15 as true. It also correctly classifies observations from .15 to .94 as false. However, it incorrectly classifies observations that are greater than .94 as false. Therefore, the score for observations that are greater than .15 should be about .05/.85=.06 for true, and about .8/.85=.94 for false.
Compute the prediction scores for the first 10 rows of X:
[~,score] = predict(tree1,X(1:10));
[score X(1:10,:)]
ans =
    0.9059    0.0941    0.8147
    0.9059    0.0941    0.9058
         0    1.0000    0.1270
    0.9059    0.0941    0.9134
    0.9059    0.0941    0.6324
         0    1.0000    0.0975
    0.9059    0.0941    0.2785
    0.9059    0.0941    0.5469
    0.9059    0.0941    0.9575
    0.9059    0.0941    0.9649
Indeed, every value of X (the rightmost column) that is less than 0.15 has associated scores (the left and center columns) of 0 and 1, while the other values of X have associated scores of 0.91 and 0.09. The difference (score 0.09 instead of the expected .06) is due to a statistical fluctuation: there are 8 observations in X in the range (.95,1) instead of the expected 5 observations.
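The leaf score can also be checked directly from the training data, without the tree. The sketch below assumes that the pruned tree routes exactly the rows with X > 0.15 to the impure leaf (an assumption based on the description above, not read from the tree object); under that assumption the ratio is the posterior probability of true at that leaf.

```matlab
rng(0,'twister')              % same seed as the example above
X = rand(100,1);
Y = (abs(X - .55) > .4);

inLeaf = X > 0.15;            % assumption: rows routed to the impure leaf
nLeaf  = sum(inLeaf);         % training rows reaching that leaf
nTrue  = sum(Y(inLeaf));      % of those, rows whose label is true
scoreTrue = nTrue / nLeaf     % empirical posterior for true at that leaf
```

If the assumption holds, scoreTrue should agree with the second score column reported for those rows.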