**Class:** ClassificationSVM

Classification loss for support vector machine classifiers by resubstitution

`L = resubLoss(SVMModel)` returns the classification loss by resubstitution (`L`), the in-sample
classification loss, for the support vector machine (SVM) classifier `SVMModel` using
the training data stored in `SVMModel.X` and the corresponding
class labels stored in `SVMModel.Y`.

`L = resubLoss(SVMModel,Name,Value)` returns
the classification loss by resubstitution with additional options
specified by one or more `Name,Value` pair arguments.

*Classification loss* functions
measure the predictive inaccuracy of classification models. When comparing
the same type of loss among many models, lower loss indicates a better
predictive model.

Suppose that:

- *L* is the weighted average classification loss.
- *n* is the sample size.
- For binary classification:
  - $$y_j$$ is the observed class label. The software codes it as –1 or 1, indicating the negative or positive class, respectively.
  - $$f(X_j)$$ is the raw classification score for observation (row) *j* of the predictor data *X*.
  - $$m_j = y_j f(X_j)$$ is the classification score for classifying observation *j* into the class corresponding to $$y_j$$. Positive values of $$m_j$$ indicate correct classification and do not contribute much to the average loss. Negative values of $$m_j$$ indicate incorrect classification and contribute to the average loss.
- For algorithms that support multiclass classification (that is, *K* ≥ 3):
  - $$y_j^*$$ is a vector of *K* – 1 zeros, and a 1 in the position corresponding to the true, observed class $$y_j$$. For example, if the true class of the second observation is the third class and *K* = 4, then $$y_2^* = [0\ 0\ 1\ 0]'$$. The order of the classes corresponds to the order in the `ClassNames` property of the input model.
  - $$f(X_j)$$ is the length-*K* vector of class scores for observation *j* of the predictor data *X*. The order of the scores corresponds to the order of the classes in the `ClassNames` property of the input model.
  - $$m_j = {y_j^*}' f(X_j)$$. Therefore, $$m_j$$ is the scalar classification score that the model predicts for the true, observed class.
- The weight for observation *j* is $$w_j$$. The software normalizes the observation weights so that they sum to the corresponding prior class probability. The software also normalizes the prior probabilities so they sum to 1. Therefore, $$\sum_{j=1}^{n} w_j = 1.$$
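The definitions above can be illustrated with a small sketch in plain Python (not MATLAB, and not the software's own implementation). The helper names `normalize_weights`, `binary_margins`, and `multiclass_margin`, and all sample values, are hypothetical and chosen only for illustration.

```python
def normalize_weights(weights, labels, priors):
    """Rescale weights so each class sums to its normalized prior probability."""
    total_prior = sum(priors.values())  # priors are normalized to sum to 1
    class_sums = {}
    for w, y in zip(weights, labels):
        class_sums[y] = class_sums.get(y, 0.0) + w
    return [w * (priors[y] / total_prior) / class_sums[y]
            for w, y in zip(weights, labels)]


def binary_margins(labels, scores):
    """m_j = y_j * f(X_j), with labels coded as -1 or +1."""
    return [y * f for y, f in zip(labels, scores)]


def multiclass_margin(true_index, class_scores):
    """m_j = y_j*' f(X_j): dot product of the one-hot label with the scores."""
    one_hot = [1 if k == true_index else 0 for k in range(len(class_scores))]
    return sum(o * s for o, s in zip(one_hot, class_scores))


# Made-up data: four observations, two classes, unnormalized priors.
labels = [1, -1, 1, -1]
scores = [2.0, -0.5, -1.0, 0.3]
w = normalize_weights([1.0, 1.0, 1.0, 1.0], labels, {1: 2.0, -1: 2.0})
m = binary_margins(labels, scores)
m_multi = multiclass_margin(2, [0.1, -0.4, 0.7, -0.2])  # true class is the third
```

In this example the third and fourth observations have negative margins, so they are the misclassified ones, and the normalized weights sum to 1 as stated above.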

The supported loss functions are:

- Binomial deviance, specified using `'LossFun','binodeviance'`. Its equation is $$L=\sum_{j=1}^{n} w_j \log\left\{1+\exp\left[-2m_j\right]\right\}.$$

- Exponential loss, specified using `'LossFun','exponential'`. Its equation is $$L=\sum_{j=1}^{n} w_j \exp\left(-m_j\right).$$

- Classification error, specified using `'LossFun','classiferror'`. It is the weighted fraction of misclassified observations, with equation $$L=\sum_{j=1}^{n} w_j I\left\{\widehat{y}_j \ne y_j\right\}.$$ $$\widehat{y}_j$$ is the class label corresponding to the class with the maximal posterior probability. *I*{*x*} is the indicator function.

- Hinge loss, specified using `'LossFun','hinge'`. Its equation is $$L=\sum_{j=1}^{n} w_j \max\left\{0,1-m_j\right\}.$$

- Logit loss, specified using `'LossFun','logit'`. Its equation is $$L=\sum_{j=1}^{n} w_j \log\left(1+\exp\left(-m_j\right)\right).$$

- Minimal cost, specified using `'LossFun','mincost'`. The software computes the weighted minimal cost using this procedure for observations *j* = 1,...,*n*:

  1. Estimate the 1-by-*K* vector of expected classification costs for observation *j*: $$\gamma_j = f\left(X_j\right)' C.$$ $$f(X_j)$$ is the column vector of class posterior probabilities for binary and multiclass classification. *C* is the cost matrix the input model stores in the property `Cost`.
  2. For observation *j*, predict the class label corresponding to the minimum expected classification cost: $$\widehat{y}_j = \underset{k=1,\ldots,K}{\arg\min}\ \gamma_{jk},$$ where $$\gamma_{jk}$$ is the *k*th element of $$\gamma_j$$.
  3. Using *C*, identify the cost incurred ($$c_j$$) for making the prediction.

  The weighted, average, minimum cost loss is $$L=\sum_{j=1}^{n} w_j c_j.$$

- Quadratic loss, specified using `'LossFun','quadratic'`. Its equation is $$L=\sum_{j=1}^{n} w_j \left(1-m_j\right)^{2}.$$
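The minimal cost procedure in the list above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the software's implementation: the function name `min_cost_loss` is hypothetical, classes are indexed from 0, and `cost[i][k]` is assumed to hold the cost of predicting class `k` when the true class is `i`.

```python
def min_cost_loss(posteriors, true_classes, cost, weights):
    """Weighted minimal cost loss.

    posteriors[j][k] -- posterior probability of class k for observation j
    true_classes[j]  -- index of the observed class of observation j
    cost[i][k]       -- assumed cost of predicting class k when the truth is i
    weights[j]       -- observation weights, assumed to sum to 1
    """
    n_classes = len(cost)
    loss = 0.0
    for p, y, w in zip(posteriors, true_classes, weights):
        # Step 1: expected cost of each candidate prediction, gamma_j = f(X_j)' C.
        gamma = [sum(p[i] * cost[i][k] for i in range(n_classes))
                 for k in range(n_classes)]
        # Step 2: predict the class with the smallest expected cost.
        y_hat = min(range(n_classes), key=lambda k: gamma[k])
        # Step 3: cost actually incurred, c_j, weighted and accumulated.
        loss += w * cost[y][y_hat]
    return loss


# Made-up example: three observations, two classes, 0-1 cost matrix.
posteriors = [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]]
true_classes = [0, 0, 1]
cost = [[0, 1], [1, 0]]
weights = [0.25, 0.25, 0.5]
L = min_cost_loss(posteriors, true_classes, cost, weights)
```

With a 0-1 cost matrix, as in this example, the result coincides with the weighted classification error: only the second observation is misclassified, so the loss equals its weight.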

This figure compares some of the loss functions for one observation
over *m* (some functions are normalized to pass through
[0,1]).
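The margin-based loss functions listed above can also be compared numerically. The following plain-Python sketch evaluates each formula for given margins $$m_j$$ and weights $$w_j$$. The dictionary `LOSSES`, the helpers `weighted_loss` and `classif_error`, and the sample values are hypothetical. For the classification error, the predicted labels are taken as given rather than derived from posterior probabilities.

```python
import math

# Per-observation loss as a function of the margin m, one entry per 'LossFun' name.
LOSSES = {
    "binodeviance": lambda m: math.log(1 + math.exp(-2 * m)),
    "exponential": lambda m: math.exp(-m),
    "hinge": lambda m: max(0.0, 1 - m),
    "logit": lambda m: math.log(1 + math.exp(-m)),
    "quadratic": lambda m: (1 - m) ** 2,
}


def weighted_loss(name, margins, weights):
    """L = sum_j w_j * g(m_j) for the per-observation loss g selected by name."""
    g = LOSSES[name]
    return sum(w * g(m) for w, m in zip(weights, margins))


def classif_error(predicted, observed, weights):
    """Weighted fraction of observations whose predicted label differs."""
    return sum(w for w, p, y in zip(weights, predicted, observed) if p != y)


# Made-up margins and weights (weights sum to 1).
margins = [2.0, 0.5, -1.0, -0.3]
weights = [0.25, 0.25, 0.25, 0.25]
results = {name: weighted_loss(name, margins, weights) for name in LOSSES}
error = classif_error([1, -1, -1, 1], [1, -1, 1, -1], weights)
```

Note that the hinge loss assigns zero loss to any margin of at least 1, while the quadratic loss also penalizes margins greater than 1.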

The SVM *classification score* for
classifying observation *x* is the signed distance
from *x* to the decision boundary, ranging from -∞
to +∞. A positive score for a class indicates that *x* is
predicted to be in that class; a negative score indicates otherwise.

The score for predicting *x* into the positive
class, also the numerical, predicted response for *x*, $$f(x)$$, is the trained SVM classification
function

$$f(x)={\displaystyle \sum _{j=1}^{n}{\alpha}_{j}}{y}_{j}G({x}_{j},x)+b,$$

where $$({\alpha}_{1},\mathrm{...},{\alpha}_{n},b)$$ are
the estimated SVM parameters, $$G({x}_{j},x)$$ is
the dot product in the predictor space between *x* and
the support vectors, and the sum includes the training set observations.
The score for predicting *x* into the negative class
is –*f*(*x*).

If $$G({x}_{j},x)={x}_{j}'x$$ (the linear kernel), then the score function reduces to

$$f\left(x\right)=\left(x/s\right)'\beta +b.$$

*s* is
the kernel scale and *β* is the vector of fitted
linear coefficients.
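The relation between the kernel form and the linear form of the score can be checked with a small plain-Python sketch. It assumes that the predictors are divided by the kernel scale *s* before the dot product is taken, so that $$\beta = \sum_j \alpha_j y_j (x_j/s)$$. That assumption, the function names, and the parameter values are illustrative and are not taken from a trained model.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def kernel_score(x, support_vectors, alphas, labels, bias, scale=1.0):
    """f(x) = sum_j alpha_j * y_j * G(x_j, x) + b with a linear kernel.

    Assumption: G is the dot product of the predictors after dividing by scale.
    """
    xs = [v / scale for v in x]
    return sum(a * y * dot([v / scale for v in sv], xs)
               for a, y, sv in zip(alphas, labels, support_vectors)) + bias


def linear_coefficients(support_vectors, alphas, labels, scale=1.0):
    """beta = sum_j alpha_j * y_j * (x_j / s) under the same assumption."""
    dim = len(support_vectors[0])
    return [sum(a * y * sv[d] / scale
                for a, y, sv in zip(alphas, labels, support_vectors))
            for d in range(dim)]


def linear_score(x, beta, bias, scale=1.0):
    """f(x) = (x / s)' beta + b."""
    return dot([v / scale for v in x], beta) + bias


# Made-up parameters: two support vectors in two dimensions.
svs = [[1.0, 2.0], [3.0, 0.0]]
alphas, ys, b, s = [0.5, 0.5], [1, -1], 0.25, 2.0
x_new = [4.0, 2.0]
f_kernel = kernel_score(x_new, svs, alphas, ys, b, s)
beta = linear_coefficients(svs, alphas, ys, s)
f_linear = linear_score(x_new, beta, b, s)
```

Both forms give the same score for `x_new`. The score is negative here, so under the sign convention described above the observation would be assigned to the negative class.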


`ClassificationSVM` | `CompactClassificationSVM` | `fitcsvm` | `loss` | `resubMargin` | `resubPredict`
