L = resubLoss(tree)
L = resubLoss(tree,Name,Value)
L = resubLoss(tree,'Subtrees',subtreevector)
[L,se] = resubLoss(tree,'Subtrees',subtreevector)
[L,se,NLeaf] = resubLoss(tree,'Subtrees',subtreevector)
[L,se,NLeaf,bestlevel] = resubLoss(tree,'Subtrees',subtreevector)
[L,...] = resubLoss(tree,'Subtrees',subtreevector,Name,Value)
L = resubLoss(tree) returns the resubstitution loss, meaning the loss computed for the data that fitctree used to create tree.
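A minimal sketch of the basic call follows. It assumes the fisheriris sample data set (variables meas and species) is available on the path; any training data accepted by fitctree would serve equally well.

```matlab
% Load Fisher's iris data: meas holds the predictors, species the class labels
load fisheriris

% Grow a classification tree with default settings
tree = fitctree(meas,species);

% Loss computed on the same data that was used to grow the tree
L = resubLoss(tree)
```

Because the loss is evaluated on the training data, it is usually an optimistic estimate of the loss on new data.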
L = resubLoss(tree,Name,Value) returns the loss with additional options specified by one or more Name,Value pair arguments. You can specify several name-value pair arguments in any order as Name1,Value1,…,NameN,ValueN.
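As an illustration of a name-value pair, the sketch below requests the classification error explicitly and compares it with a hand-computed weighted misclassification fraction. It assumes the fisheriris sample data, and it relies on the W (normalized weights) and Y (training labels) properties of the fitted tree; the variable Lmanual is introduced here for illustration only.

```matlab
load fisheriris
tree = fitctree(meas,species);

% Ask for the classification error by name
L = resubLoss(tree,'LossFun','classiferror');

% Manual check: sum the normalized weights of the misclassified
% training observations
labels  = resubPredict(tree);
Lmanual = sum(tree.W .* ~strcmp(labels,tree.Y));
```

The two values should agree up to floating-point rounding.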
L = resubLoss(tree,'Subtrees',subtreevector) returns a vector of classification errors for the trees in the pruning sequence subtreevector.
[L,se] = resubLoss(tree,'Subtrees',subtreevector) returns the vector of standard errors of the classification errors.
[L,se,NLeaf] = resubLoss(tree,'Subtrees',subtreevector) returns the vector of numbers of leaf nodes in the trees of the pruning sequence.
[L,se,NLeaf,bestlevel] = resubLoss(tree,'Subtrees',subtreevector) returns the best pruning level as defined in the TreeSize name-value pair. By default, bestlevel is the pruning level that gives loss within one standard deviation of minimal loss.
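The following sketch requests all four outputs for every pruning level. It assumes the fisheriris sample data and uses 'all' as the Subtrees value so that the whole pruning sequence is evaluated.

```matlab
load fisheriris
tree = fitctree(meas,species);

% Evaluate every subtree in the pruning sequence
[L,se,NLeaf,bestlevel] = resubLoss(tree,'Subtrees','all');

% One row per pruning level: loss, standard error, number of leaves
disp([L(:) se(:) NLeaf(:)])

% Pruning level selected by the default 'TreeSize' rule
bestlevel
```

L, se, and NLeaf have one element per evaluated pruning level, while bestlevel is a single level.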
[L,...] = resubLoss(tree,'Subtrees',subtreevector,Name,Value) returns loss statistics with additional options specified by one or more Name,Value pair arguments. You can specify several name-value pair arguments in any order as Name1,Value1,…,NameN,ValueN.
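As a sketch of combining 'Subtrees' with another name-value pair, the code below compares the pruning level chosen under the two TreeSize settings, 'se' (the default one-standard-error rule) and 'min' (the level with the smallest loss). It assumes the fisheriris sample data; bestSE and bestMin are names chosen for this example.

```matlab
load fisheriris
tree = fitctree(meas,species);

% Best level under the one-standard-error rule (the default)
[~,~,~,bestSE]  = resubLoss(tree,'Subtrees','all','TreeSize','se');

% Best level as the subtree with the smallest loss
[~,~,~,bestMin] = resubLoss(tree,'Subtrees','all','TreeSize','min');
```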

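L is the classification loss, a vector the length of subtreevector.

se is the standard error of loss, a vector the length of subtreevector.

NLeaf is the number of leaves (terminal nodes) in the pruned subtrees, a vector the length of subtreevector.

bestlevel is a scalar whose value depends on the TreeSize name-value pair.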

Classification loss functions measure the predictive inaccuracy of classification models. When comparing the same type of loss among many models, lower loss indicates a better predictive model.
Suppose that:
L is the weighted average classification loss.
n is the sample size.
For binary classification:
y_{j} is the observed class label. The software codes it as –1 or 1 indicating the negative or positive class, respectively.
f(X_{j}) is the raw classification score for observation (row) j of the predictor data X.
m_{j} = y_{j}f(X_{j}) is the classification score for classifying observation j into the class corresponding to y_{j}. Positive values of m_{j} indicate correct classification and do not contribute much to the average loss. Negative values of m_{j} indicate incorrect classification and contribute to the average loss.
For algorithms that support multiclass classification (that is, K ≥ 3):
y_{j}^{*} is a vector of K – 1 zeros, with a 1 in the position corresponding to the true, observed class y_{j}. For example, if the true class of the second observation is the third class and K = 4, then y^{*}_{2} = [0 0 1 0]′. The order of the classes corresponds to the order in the ClassNames property of the input model.
f(X_{j}) is the length K vector of class scores for observation j of the predictor data X. The order of the scores corresponds to the order of the classes in the ClassNames property of the input model.
m_{j} = y_{j}^{*}′f(X_{j}). Therefore, m_{j} is the scalar classification score that the model predicts for the true, observed class.
The weight for observation j is w_{j}. The software normalizes the observation weights so that they sum to the corresponding prior class probability. The software also normalizes the prior probabilities so they sum to 1. Therefore,
$$\sum _{j=1}^{n}{w}_{j}=1.$$
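The normalization can be checked on a fitted tree. This sketch assumes the fisheriris sample data and reads the normalized weights from the W property and the class priors from the Prior property; totalWeight and classWeight are names introduced for the example.

```matlab
load fisheriris
tree = fitctree(meas,species);

% Normalized observation weights sum to 1 overall
totalWeight = sum(tree.W);

% Within each class (in ClassNames order) the weights sum to the class prior
classWeight = cellfun(@(c) sum(tree.W(strcmp(tree.Y,c))), tree.ClassNames);
priorCheck  = [classWeight(:) tree.Prior(:)]
```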
The supported loss functions are:
Binomial deviance, specified using 'LossFun','binodeviance'. Its equation is
$$L={\displaystyle \sum _{j=1}^{n}{w}_{j}\mathrm{log}\left\{1+\mathrm{exp}\left[-2{m}_{j}\right]\right\}}.$$
Exponential loss, specified using 'LossFun','exponential'. Its equation is
$$L={\displaystyle \sum _{j=1}^{n}{w}_{j}\mathrm{exp}\left(-{m}_{j}\right)}.$$
Classification error, specified using 'LossFun','classiferror'. It is the weighted fraction of misclassified observations, with equation
$$L={\displaystyle \sum _{j=1}^{n}{w}_{j}}I\left\{{\widehat{y}}_{j}\ne {y}_{j}\right\}.$$
$${\widehat{y}}_{j}$$ is the class label corresponding to the class with the maximal posterior probability. I{x} is the indicator function.
Hinge loss, specified using 'LossFun','hinge'. Its equation is
$$L={\displaystyle \sum _{j=1}^{n}{w}_{j}\mathrm{max}\left\{0,1-{m}_{j}\right\}}.$$
Logit loss, specified using 'LossFun','logit'. Its equation is
$$L={\displaystyle \sum _{j=1}^{n}{w}_{j}\mathrm{log}\left(1+\mathrm{exp}\left(-{m}_{j}\right)\right)}.$$
Minimal cost, specified using 'LossFun','mincost'. The software computes the weighted minimal cost using this procedure for observations j = 1,...,n:
Estimate the 1-by-K vector of expected classification costs for observation j:
$${\gamma}_{j}=f{\left({X}_{j}\right)}^{\prime}C.$$
f(X_{j}) is the column vector of class posterior probabilities for binary and multiclass classification. C is the cost matrix the input model stores in the property Cost.
For observation j, predict the class label corresponding to the minimum expected classification cost:
$${\widehat{y}}_{j}=\underset{k=1,\mathrm{...},K}{\mathrm{argmin}}\left({\gamma}_{jk}\right),$$
where γ_{jk} is the kth element of γ_{j}.
Using C, identify the cost incurred (c_{j}) for making the prediction.
The weighted average minimum cost loss is
$$L={\displaystyle \sum _{j=1}^{n}{w}_{j}{c}_{j}}.$$
Quadratic loss, specified using 'LossFun','quadratic'. Its equation is
$$L={\displaystyle \sum _{j=1}^{n}{w}_{j}{\left(1-{m}_{j}\right)}^{2}}.$$
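The margin-based formulas above can be evaluated directly. The sketch below uses hypothetical margins m and weights w (made-up values, not taken from any fitted model) purely to show the arithmetic.

```matlab
% Hypothetical classification scores m_j and normalized weights w_j
m = [2; 0.5; -1];        % the negative value marks a misclassified observation
w = [0.5; 0.3; 0.2];     % weights sum to 1

binodeviance = sum(w .* log(1 + exp(-2*m)));
exponential  = sum(w .* exp(-m));
hinge        = sum(w .* max(0, 1 - m));
logit        = sum(w .* log(1 + exp(-m)));
quadratic    = sum(w .* (1 - m).^2);
```

With these values, the hinge loss is 0.5*0 + 0.3*0.5 + 0.2*2 = 0.55 and the quadratic loss is 0.5*1 + 0.3*0.25 + 0.2*4 = 1.375. The misclassified observation (m = -1) dominates both totals despite having the smallest weight.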
This figure compares some of the loss functions for one observation over m (some functions are normalized to pass through the point (0,1)).
There are two costs associated with classification: the true misclassification cost per class, and the expected misclassification cost per observation.
You can set the true misclassification cost per class in the Cost name-value pair when you create the classifier using fitctree. Cost(i,j) is the cost of classifying an observation into class j if its true class is i. By default, Cost(i,j)=1 if i~=j, and Cost(i,j)=0 if i=j. In other words, the cost is 0 for correct classification, and 1 for incorrect classification.
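A short sketch of inspecting the default cost matrix and supplying a different one follows. It assumes the fisheriris sample data; the matrix C and its values are hypothetical, and ClassNames is passed explicitly so that the row and column order of C is unambiguous.

```matlab
load fisheriris

% Default costs: 0 on the diagonal, 1 elsewhere
tree = fitctree(meas,species);
defaultCost = tree.Cost

% Hypothetical costs: misclassifying a true 'setosa' (row 1) costs 10
names = {'setosa','versicolor','virginica'};
C = ones(3) - eye(3);
C(1,:) = [0 10 10];
treeC = fitctree(meas,species,'ClassNames',names,'Cost',C);

% Minimal-cost loss under the custom cost matrix
Lcost = resubLoss(treeC,'LossFun','mincost')
```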
Suppose you have Nobs observations that you want to classify with a trained classifier. Suppose you have K classes. You place the observations into a matrix Xnew with one observation per row.
The expected cost matrix CE has size Nobs-by-K.
Each row of CE contains the expected (average) cost of classifying the observation into each of the K classes. CE(n,k) is
$$\sum _{i=1}^{K}\widehat{P}\left(i|Xnew(n)\right)C\left(k|i\right),$$
where
K is the number of classes.
$$\widehat{P}\left(i|Xnew(n)\right)$$ is the posterior probability of class i for observation Xnew(n).
$$C\left(k|i\right)$$ is the true misclassification cost of classifying an observation as k when its true class is i.
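The expected cost matrix can be sketched as a matrix product of the posterior probabilities and the cost matrix, since C(k|i) corresponds to Cost(i,k). The example assumes the fisheriris sample data and uses the first five training rows as a stand-in for Xnew.

```matlab
load fisheriris
tree = fitctree(meas,species);

Xnew = meas(1:5,:);                        % Nobs = 5 observations, one per row
[label,posterior] = predict(tree,Xnew);    % posterior is Nobs-by-K

% CE(n,k) = sum over i of P(i|Xnew(n)) * Cost(i,k)
CE = posterior * tree.Cost;

% Index of the class with the smallest expected cost for each observation
[~,kBest] = min(CE,[],2);
```

With the default cost matrix, each entry of CE equals one minus the corresponding posterior probability, so minimizing expected cost is the same as choosing the class with the largest posterior.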
fitctree
 loss
 resubEdge
 resubMargin
 resubPredict