`L = loss(mdl,tbl,ResponseVarName)`

`L = loss(mdl,tbl,Y)`

`L = loss(mdl,X,Y)`

`L = loss(___,Name,Value)`

`L = loss(mdl,tbl,ResponseVarName)` returns a scalar representing how well `mdl` classifies the data in `tbl`, when `tbl.ResponseVarName` contains the true classifications.

When computing the loss, `loss` normalizes the class probabilities in `tbl.ResponseVarName` to the class probabilities used for training, stored in the `Prior` property of `mdl`.

`L = loss(___,Name,Value)` returns the loss with additional options specified by one or more `Name,Value` pair arguments, using any of the previous syntaxes.
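As a rough illustration of the syntaxes above, the following sketch trains a KNN model and computes its resubstitution loss three ways. It assumes Statistics and Machine Learning Toolbox and its bundled `fisheriris` sample data (variables `meas` and `species`); the choice of five neighbors and the variable names `tbl`, `mdlT`, `L1`, `L2`, and `L3` are illustrative only.

```matlab
% Assumption: the fisheriris sample data set is available on the path.
load fisheriris                      % provides meas (150-by-4) and species (150-by-1 cell)

% Matrix syntax: L = loss(mdl,X,Y)
mdl = fitcknn(meas,species,'NumNeighbors',5);
L1 = loss(mdl,meas,species);

% Name-value syntax: L = loss(___,Name,Value)
L2 = loss(mdl,meas,species,'LossFun','classiferror');

% Table syntax: L = loss(mdl,tbl,ResponseVarName)
tbl = array2table(meas);             % predictor columns meas1..meas4
tbl.Species = species;               % response stored as a table variable
mdlT = fitcknn(tbl,'Species','NumNeighbors',5);
L3 = loss(mdlT,tbl,'Species');
```

Because the loss here is evaluated on the training data, it is likely an optimistic estimate; held-out data would give a more realistic figure.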

*Classification loss* functions
measure the predictive inaccuracy of classification models. When comparing
the same type of loss among many models, lower loss indicates a better
predictive model.

Suppose that:

- *L* is the weighted average classification loss.
- *n* is the sample size.
- For binary classification:
  - $$y_j$$ is the observed class label. The software codes it as –1 or 1, indicating the negative or positive class, respectively.
  - $$f(X_j)$$ is the raw classification score for observation (row) *j* of the predictor data *X*.
  - $$m_j = y_j f(X_j)$$ is the classification score for classifying observation *j* into the class corresponding to $$y_j$$. Positive values of $$m_j$$ indicate correct classification and do not contribute much to the average loss. Negative values of $$m_j$$ indicate incorrect classification and contribute to the average loss.
- For algorithms that support multiclass classification (that is, *K* ≥ 3):
  - $$y_j^{*}$$ is a vector of *K* – 1 zeros, and a 1 in the position corresponding to the true, observed class $$y_j$$. For example, if the true class of the second observation is the third class and *K* = 4, then $$y_2^{*} = [0\ 0\ 1\ 0]'$$. The order of the classes corresponds to the order in the `ClassNames` property of the input model.
  - $$f(X_j)$$ is the length *K* vector of class scores for observation *j* of the predictor data *X*. The order of the scores corresponds to the order of the classes in the `ClassNames` property of the input model.
  - $$m_j = {y_j^{*}}' f(X_j)$$. Therefore, $$m_j$$ is the scalar classification score that the model predicts for the true, observed class.
- The weight for observation *j* is $$w_j$$. The software normalizes the observation weights so that they sum to the corresponding prior class probability. The software also normalizes the prior probabilities so they sum to 1. Therefore, $$\sum_{j=1}^{n} w_j = 1.$$
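The definitions above can be made concrete with a small numeric sketch. All values below (labels, scores, raw weights, priors) are made up for illustration, and the normalization loop is only one plausible reading of the description, not the library's internal code.

```matlab
% Binary case: labels coded as -1/+1, raw scores f(X_j) are hypothetical.
y = [1; -1; 1; -1];
f = [2.0; 0.5; -0.3; -1.2];
m = y .* f;                          % m_j = y_j * f(X_j) -> [2.0; -0.5; -0.3; 1.2]

% Multiclass case with K = 4: the true class is the third class.
yStar = [0; 0; 1; 0];                % indicator vector y_j^*
fK    = [0.1; 0.2; 0.6; 0.1];        % hypothetical class scores
mK    = yStar' * fK;                 % score of the true class -> 0.6

% Weight normalization: within each class, weights sum to that class's prior.
classIdx = [1; 1; 1; 2];             % hypothetical class index per observation
wRaw     = ones(4,1);                % raw observation weights
prior    = [0.5 0.5];
prior    = prior / sum(prior);       % priors normalized to sum to 1
w = zeros(size(wRaw));
for k = 1:numel(prior)
    idx = (classIdx == k);
    w(idx) = wRaw(idx) / sum(wRaw(idx)) * prior(k);
end
% w is [1/6; 1/6; 1/6; 1/2], and sum(w) is 1 up to rounding.
```

Note that the single observation in class 2 receives as much total weight as the three observations in class 1, because the priors are equal.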

The supported loss functions are:

- Binomial deviance, specified using `'LossFun','binodeviance'`. Its equation is

  $$L=\sum_{j=1}^{n} w_j \log\left\{1+\exp\left[-2 m_j\right]\right\}.$$

- Exponential loss, specified using `'LossFun','exponential'`. Its equation is

  $$L=\sum_{j=1}^{n} w_j \exp\left(-m_j\right).$$

- Classification error, specified using `'LossFun','classiferror'`. It is the weighted fraction of misclassified observations, with equation

  $$L=\sum_{j=1}^{n} w_j I\left\{\widehat{y}_j \ne y_j\right\}.$$

  $$\widehat{y}_j$$ is the class label corresponding to the class with the maximal posterior probability. *I*{*x*} is the indicator function.

- Hinge loss, specified using `'LossFun','hinge'`. Its equation is

  $$L=\sum_{j=1}^{n} w_j \max\left\{0, 1-m_j\right\}.$$

- Logit loss, specified using `'LossFun','logit'`. Its equation is

  $$L=\sum_{j=1}^{n} w_j \log\left(1+\exp\left(-m_j\right)\right).$$

- Minimal cost, specified using `'LossFun','mincost'`. The software computes the weighted minimal cost using this procedure for observations *j* = 1,...,*n*:

  1. Estimate the 1-by-*K* vector of expected classification costs for observation *j*:

     $$\gamma_j = f\left(X_j\right)' C.$$

     $$f(X_j)$$ is the column vector of class posterior probabilities for binary and multiclass classification. *C* is the cost matrix the input model stores in the property `Cost`.

  2. For observation *j*, predict the class label corresponding to the minimum expected classification cost:

     $$\widehat{y}_j = \underset{k=1,\ldots,K}{\operatorname{arg\,min}}\ \gamma_{jk}.$$

  3. Using *C*, identify the cost incurred ($$c_j$$) for making the prediction.

  The weighted, average, minimum cost loss is

  $$L=\sum_{j=1}^{n} w_j c_j.$$

- Quadratic loss, specified using `'LossFun','quadratic'`. Its equation is

  $$L=\sum_{j=1}^{n} w_j \left(1-m_j\right)^{2}.$$
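The minimal cost procedure in the list above can be sketched with plain matrix operations. The posterior matrix `P`, cost matrix `C`, true labels `yTrue`, and weights `w` below are invented for illustration; this follows the three steps as written and is not the toolbox's own implementation.

```matlab
% Hypothetical inputs: 3 observations, K = 2 classes.
P = [0.8 0.2;                        % row j holds the posteriors f(X_j)'
     0.4 0.6;
     0.3 0.7];
C = [0 1;                            % C(i,k): cost of predicting k when truth is i
     2 0];
yTrue = [2; 2; 1];                   % true class indices
w = [1/3; 1/3; 1/3];                 % observation weights summing to 1

% Step 1: expected cost of each candidate class, gamma_j = f(X_j)' * C.
gamma = P * C;                       % [0.4 0.8; 1.2 0.4; 1.4 0.3]

% Step 2: predict the class with the smallest expected cost.
[~, yHat] = min(gamma, [], 2);       % [1; 2; 2]

% Step 3: look up the cost actually incurred for each prediction.
c = C(sub2ind(size(C), yTrue, yHat));   % [2; 0; 1]

% Weighted average minimal cost loss.
L = sum(w .* c);                     % approximately 1
```

With the default cost matrix (0 on the diagonal, 1 elsewhere), the incurred cost is 1 for every error and 0 otherwise, so this quantity reduces to a weighted error rate for the minimum-cost predictions.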

This figure compares some of the loss functions for one observation
over *m* (some functions are normalized to pass through
[0,1]).
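In place of the figure, the margin-based losses can be compared numerically. The sketch below evaluates the formulas listed earlier on made-up margins `m` and weights `w`; the struct `lossFuns` is a hypothetical helper, and the functions are not rescaled to pass through [0,1] as some are in the figure.

```matlab
% Hypothetical margins and weights for four observations.
m = [2.0; -0.5; 0.3; 1.2];
w = [0.25; 0.25; 0.25; 0.25];

% Per-observation loss as a function of the margin m_j.
lossFuns = struct( ...
    'binodeviance', @(m) log(1 + exp(-2*m)), ...
    'exponential',  @(m) exp(-m), ...
    'hinge',        @(m) max(0, 1 - m), ...
    'logit',        @(m) log(1 + exp(-m)), ...
    'quadratic',    @(m) (1 - m).^2);

% Weighted sum over observations for each loss.
names = fieldnames(lossFuns);
L = zeros(numel(names),1);
for i = 1:numel(names)
    fun  = lossFuns.(names{i});
    L(i) = sum(w .* fun(m));
end
% For these inputs the hinge loss is about 0.55 and the quadratic loss about 0.945.
```

The quadratic loss also penalizes confidently correct margins above 1 (here *m* = 2.0 and 1.2), which the hinge loss ignores.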

There are two costs associated with KNN classification: the true misclassification cost per class, and the expected misclassification cost per observation.

You can set the true misclassification cost per class in the `Cost` name-value pair when you run `fitcknn`. `Cost(i,j)` is the cost of classifying an observation into class `j` if its true class is `i`. By default, `Cost(i,j)=1` if `i~=j`, and `Cost(i,j)=0` if `i=j`. In other words, the cost is `0` for correct classification, and `1` for incorrect classification.
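A minimal sketch of the default and a custom cost matrix follows. It assumes Statistics and Machine Learning Toolbox and the `fisheriris` sample data; the particular cost values in `C` are arbitrary and chosen only to show the indexing convention.

```matlab
% Default cost for K classes: 0 on the diagonal, 1 everywhere else.
K = 3;
defaultCost = ones(K) - eye(K);

% Custom cost: confusing classes 2 and 3 costs twice as much as other errors.
% Rows index the true class, columns the predicted class, in ClassNames order.
load fisheriris
C = [0 1 1;
     1 0 2;
     1 2 0];
mdl = fitcknn(meas,species,'Cost',C);

mdl.ClassNames                       % class order that the rows/columns of Cost follow
mdl.Cost                             % the stored cost matrix
```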

The third output of `predict` is the expected misclassification cost per observation.

Suppose you have `Nobs` observations that you want to classify with a trained classifier `mdl`. Suppose you have `K` classes. You place the observations into a matrix `Xnew` with one observation per row. The command

`[label,score,cost] = predict(mdl,Xnew)`

returns, among other outputs, a `cost` matrix of size `Nobs`-by-`K`. Each row of the `cost` matrix contains the expected (average) cost of classifying the observation into each of the `K` classes. `cost(n,k)` is

$$\sum_{i=1}^{K}\widehat{P}\left(i|Xnew(n)\right)C\left(k|i\right),$$

where

- *K* is the number of classes.
- $$\widehat{P}\left(i|Xnew(n)\right)$$ is the posterior probability of class *i* for observation *Xnew*(*n*).
- $$C\left(k|i\right)$$ is the true misclassification cost of classifying an observation as *k* when its true class is *i*.
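The expected cost formula above can be checked against the outputs of `predict`. This sketch assumes Statistics and Machine Learning Toolbox and the `fisheriris` sample data, and it assumes that `score` holds the posterior probabilities for a KNN model; the variable names `Xnew`, `manualCost`, and `maxDiff` are illustrative.

```matlab
load fisheriris
mdl  = fitcknn(meas,species,'NumNeighbors',5);
Xnew = meas(1:5,:);                  % Nobs = 5 observations, one per row

[label,score,cost] = predict(mdl,Xnew);

% cost(n,k) = sum over i of P(i | Xnew(n)) * C(k | i), where C(k|i) = mdl.Cost(i,k).
manualCost = score * mdl.Cost;

% If the assumption about score holds, this difference should be near zero.
maxDiff = max(max(abs(cost - manualCost)));
```

Each row of `cost` has one entry per class, so the predicted `label` for an observation should correspond to the column with the smallest entry in that row.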
