`L = loss(mdl,tbl,ResponseVarName)`

`L = loss(mdl,tbl,Y)`

`L = loss(mdl,X,Y)`

`L = loss(___,Name,Value)`

`L = loss(mdl,tbl,ResponseVarName)` returns a scalar representing how well `mdl` classifies the data in `tbl`, when `tbl.ResponseVarName` contains the true classifications.

When computing the loss, `loss` normalizes the class probabilities in `tbl.ResponseVarName` to the class probabilities used for training, stored in the `Prior` property of `mdl`.

`L = loss(___,Name,Value)` returns the loss with additional options specified by one or more `Name,Value` pair arguments, using any of the previous syntaxes.
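As an illustration of the syntaxes above, the following minimal sketch trains a KNN classifier and evaluates its loss on the training data. It assumes the `fisheriris` sample data set that ships with the toolbox (this page does not name a data set); the table variable names and the choice `'NumNeighbors',5` are arbitrary values picked for the example.

```matlab
% Load Fisher's iris data: meas (150-by-4 predictors), species (150-by-1 labels).
load fisheriris

% Matrix syntax: L = loss(mdl,X,Y)
mdl = fitcknn(meas,species,'NumNeighbors',5);
L = loss(mdl,meas,species);

% Table syntax: L = loss(mdl,tbl,ResponseVarName)
% The variable names below are illustrative, not required by the function.
tbl = array2table(meas,'VariableNames', ...
    {'SepalLength','SepalWidth','PetalLength','PetalWidth'});
tbl.Species = species;
mdlTbl = fitcknn(tbl,'Species','NumNeighbors',5);
Ltbl = loss(mdlTbl,tbl,'Species');

% Name-value syntax: request the classification error explicitly.
Lerr = loss(mdl,meas,species,'LossFun','classiferror');
```

Because the loss here is computed on the same data used for training, it is a resubstitution estimate and will usually look better than the loss on new data.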

*Classification loss* functions
measure the predictive inaccuracy of classification models. When comparing
the same type of loss among many models, lower loss indicates a better
predictive model.

Suppose that:

- *L* is the weighted average classification loss.
- *n* is the sample size.
- For binary classification:
    - $$y_j$$ is the observed class label. The software codes it as -1 or 1, indicating the negative or positive class, respectively.
    - $$f(X_j)$$ is the raw classification score for observation (row) *j* of the predictor data *X*.
    - $$m_j = y_j f(X_j)$$ is the classification score for classifying observation *j* into the class corresponding to $$y_j$$. Positive values of $$m_j$$ indicate correct classification and do not contribute much to the average loss. Negative values of $$m_j$$ indicate incorrect classification and contribute to the average loss.
- For algorithms that support multiclass classification (that is, *K* ≥ 3):
    - $$y_j^{*}$$ is a vector of *K* - 1 zeros, and a 1 in the position corresponding to the true, observed class $$y_j$$. For example, if the true class of the second observation is the third class and *K* = 4, then $$y_2^{*} = [0\;0\;1\;0]^{\prime}$$. The order of the classes corresponds to the order in the `ClassNames` property of the input model.
    - $$f(X_j)$$ is the length *K* vector of class scores for observation *j* of the predictor data *X*. The order of the scores corresponds to the order of the classes in the `ClassNames` property of the input model.
    - $$m_j = {y_j^{*}}^{\prime} f(X_j)$$. Therefore, $$m_j$$ is the scalar classification score that the model predicts for the true, observed class.
- The weight for observation *j* is $$w_j$$. The software normalizes the observation weights so that they sum to the corresponding prior class probability. The software also normalizes the prior probabilities so they sum to 1. Therefore,

  $$\sum_{j=1}^{n} w_j = 1.$$
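To make the notation above concrete, here is a small sketch that computes the scores $$m_j$$ and the normalized weights $$w_j$$ by hand. All scores, labels, and prior probabilities are made-up values chosen for illustration; the code uses only base MATLAB and does not reproduce what the software does internally.

```matlab
% Binary case: labels coded as -1 or 1; the raw scores f(X_j) are made up.
y = [1; -1; 1; -1];
f = [0.8; -0.3; -0.2; 0.6];
m = y .* f;                  % m_j = y_j * f(X_j); positive means correct

% Multiclass case (K = 4): the true class of the observation is class 3.
K = 4;
yStar = zeros(K,1);
yStar(3) = 1;                % the vector [0 0 1 0]'
fx = [0.1; 0.2; 0.6; 0.1];   % hypothetical length-K vector of class scores
mMulti = yStar' * fx;        % score predicted for the true class

% Weights: within each class, rescale so the weights sum to that class's
% prior probability (assumed priors, already summing to 1).
w = ones(4,1);
classes = [-1 1];
prior = [0.5 0.5];
for k = 1:numel(classes)
    idx = (y == classes(k));
    w(idx) = w(idx) / sum(w(idx)) * prior(k);
end
total = sum(w);              % the weights now sum to 1 (up to rounding)
```

In this example the first two observations have positive $$m_j$$ (correctly classified) and the last two have negative $$m_j$$ (misclassified).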

The supported loss functions are:

- Binomial deviance, specified using `'LossFun','binodeviance'`. Its equation is

  $$L = \sum_{j=1}^{n} w_j \log\left\{1 + \exp\left[-2 m_j\right]\right\}.$$

- Exponential loss, specified using `'LossFun','exponential'`. Its equation is

  $$L = \sum_{j=1}^{n} w_j \exp\left(-m_j\right).$$

- Classification error, specified using `'LossFun','classiferror'`. It is the weighted fraction of misclassified observations, with equation

  $$L = \sum_{j=1}^{n} w_j I\left\{\widehat{y}_j \ne y_j\right\}.$$

  $$\widehat{y}_j$$ is the class label corresponding to the class with the maximal posterior probability. *I*{*x*} is the indicator function.

- Hinge loss, specified using `'LossFun','hinge'`. Its equation is

  $$L = \sum_{j=1}^{n} w_j \max\left\{0, 1 - m_j\right\}.$$

- Logit loss, specified using `'LossFun','logit'`. Its equation is

  $$L = \sum_{j=1}^{n} w_j \log\left(1 + \exp\left(-m_j\right)\right).$$

- Minimal cost, specified using `'LossFun','mincost'`. The software computes the weighted minimal cost using this procedure for observations *j* = 1,...,*n*:

    1. Estimate the 1-by-*K* vector of expected classification costs for observation *j*:

       $$\gamma_j = f\left(X_j\right)^{\prime} C.$$

       $$f(X_j)$$ is the column vector of class posterior probabilities for binary and multiclass classification. *C* is the cost matrix the input model stores in the property `Cost`.

    2. For observation *j*, predict the class label corresponding to the minimum expected classification cost:

       $$\widehat{y}_j = \underset{k=1,\ldots,K}{\arg\min}\; \gamma_{jk},$$

       where $$\gamma_{jk}$$ is the *k*th element of $$\gamma_j$$.

    3. Using *C*, identify the cost incurred ($$c_j$$) for making the prediction.

  The weighted, average, minimum cost loss is

  $$L = \sum_{j=1}^{n} w_j c_j.$$

- Quadratic loss, specified using `'LossFun','quadratic'`. Its equation is

  $$L = \sum_{j=1}^{n} w_j \left(1 - m_j\right)^{2}.$$
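The formulas in this list can be evaluated directly. The sketch below does so in base MATLAB for a handful of hypothetical scores, weights, posterior probabilities, and true classes (none of these values come from this page). For the classification error it assumes the binary case, where a negative $$m_j$$ means a misclassified observation. It illustrates the equations and is not the toolbox implementation.

```matlab
m = [0.8; 0.3; -0.2; -0.6];      % hypothetical scores m_j
w = [0.25; 0.25; 0.25; 0.25];    % weights that sum to 1

binodeviance = sum(w .* log(1 + exp(-2*m)));
exponential  = sum(w .* exp(-m));
hinge        = sum(w .* max(0, 1 - m));
logit        = sum(w .* log(1 + exp(-m)));
quadratic    = sum(w .* (1 - m).^2);

% Binary case only: an observation is misclassified when m_j is negative.
classiferror = sum(w .* (m < 0));

% Minimal cost for K = 3 classes with made-up posteriors (one row per observation).
P = [0.7 0.2 0.1;
     0.1 0.3 0.6;
     0.4 0.5 0.1];
C = ones(3) - eye(3);            % default cost: 0 on the diagonal, 1 elsewhere
yTrue = [1; 3; 1];               % hypothetical true class indices
wc = [1/3; 1/3; 1/3];            % weights for these three observations

gamma = P * C;                   % row j holds the expected costs gamma_j
[~, yHat] = min(gamma, [], 2);   % step 2: class with the minimum expected cost
c = C(sub2ind(size(C), yTrue, yHat));   % step 3: cost incurred per observation
mincost = sum(wc .* c);
```

With the default cost matrix, the minimal cost loss coincides with the classification error: here the third observation is assigned to class 2 while its true class is 1, so one of three equally weighted observations incurs a cost of 1.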

This figure compares some of the loss functions for one observation
over *m* (some functions are normalized to pass through
[0,1]).

There are two costs associated with KNN classification: the true misclassification cost per class, and the expected misclassification cost per observation.

You can set the true misclassification cost per class in the `Cost` name-value pair when you run `fitcknn`. `Cost(i,j)` is the cost of classifying an observation into class `j` if its true class is `i`. By default, `Cost(i,j)=1` if `i~=j`, and `Cost(i,j)=0` if `i=j`. In other words, the cost is `0` for correct classification, and `1` for incorrect classification.
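For instance, a nondefault cost matrix might be passed to `fitcknn` as sketched below. The data set (`fisheriris`), the number of neighbors, and the cost values are assumptions made for illustration. The example also passes `'ClassNames'` so that the row and column order of the cost matrix is explicit.

```matlab
load fisheriris

% Fix the class order so that rows and columns of the cost matrix are unambiguous.
classNames = {'setosa','versicolor','virginica'};

% Hypothetical costs: row i is the true class, column j is the predicted class.
% Misclassifying a true virginica (row 3) costs 5 instead of the default 1.
C = [0 1 1;
     1 0 1;
     5 5 0];

mdl = fitcknn(meas,species,'NumNeighbors',5, ...
    'ClassNames',classNames,'Cost',C);

% The trained model stores the matrix in its Cost property.
storedCost = mdl.Cost;
```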

The expected misclassification cost per observation is the third output of `predict`.

Suppose you have `Nobs` observations that you want to classify with a trained classifier `mdl`. Suppose you have `K` classes. You place the observations into a matrix `Xnew` with one observation per row. The command

`[label,score,cost] = predict(mdl,Xnew)`

returns, among other outputs, a `cost` matrix of size `Nobs`-by-`K`. Each row of the `cost` matrix contains the expected (average) cost of classifying the observation into each of the `K` classes. `cost(n,k)` is

$$\sum_{i=1}^{K} \widehat{P}\left(i|Xnew(n)\right) C\left(k|i\right),$$

where

- *K* is the number of classes.
- $$\widehat{P}\left(i|Xnew(n)\right)$$ is the posterior probability of class *i* for observation *Xnew*(*n*).
- $$C\left(k|i\right)$$ is the true misclassification cost of classifying an observation as *k* when its true class is *i*.
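The sum above is a matrix product of the posterior probabilities with the cost matrix. The following base MATLAB sketch evaluates it for two hypothetical observations and three classes. The posterior probabilities and cost values are made up, and `Cost(i,k)` plays the role of $$C(k|i)$$.

```matlab
% Hypothetical posteriors for Nobs = 2 observations and K = 3 classes.
posterior = [0.5 0.3 0.2;
             0.1 0.6 0.3];

% Cost(i,k): cost of predicting class k when the true class is i (made-up values).
Cost = [0 1 1;
        1 0 1;
        5 5 0];

% cost(n,k) = sum over i of posterior(n,i) * Cost(i,k)
cost = posterior * Cost;

% Class with the smallest expected cost for each observation.
[~, label] = min(cost, [], 2);
```

Here the expected costs are `[1.3 1.5 0.8]` and `[2.1 1.6 0.7]`, so both observations are assigned to class 3 even though class 3 never has the largest posterior probability. The high cost of misclassifying a true class 3 observation drives the decision.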
