# loss

Classification error

## Syntax

L = loss(obj,X,Y)
L = loss(obj,X,Y,Name,Value)

## Description

L = loss(obj,X,Y) returns a scalar representing how well obj classifies the data in X, when Y contains the true classifications.

When computing the loss, loss normalizes the class probabilities in Y to the class probabilities used for training, stored in the Prior property of obj.

L = loss(obj,X,Y,Name,Value) returns the loss with additional options specified by one or more Name,Value pair arguments.

## Input Arguments

• obj: Discriminant analysis classifier of class ClassificationDiscriminant or CompactClassificationDiscriminant, typically constructed with fitcdiscr.

• X: Matrix where each row represents an observation, and each column represents a predictor. The number of columns in X must equal the number of predictors in obj.

• Y: Class labels, with the same data type as exists in obj. The number of elements of Y must equal the number of rows of X.

### Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

• 'lossfun': Function handle or string representing a loss function. The built-in loss functions are 'binodeviance', 'classiferror' (fraction of misclassified observations), 'exponential', 'hinge', and 'mincost' (smallest misclassification cost as given by the obj.Cost matrix); see Loss Functions for each. You can write your own loss function using the syntax described in Loss Functions.

  Default: 'mincost'

• 'weights': Numeric vector of length N, where N is the number of rows of X. The weights must be nonnegative. loss normalizes the weights so that observation weights in each class sum to the prior probability of that class. When you supply weights, loss computes weighted classification loss.

  Default: ones(N,1)
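As a rough illustration of the name-value syntax, the following sketch reuses the Fisher iris data from the Examples section. The variable names Lerr, Lw, and w are illustrative only, and the uniform weight vector is a placeholder rather than a recommended choice.

```matlab
% Sketch: calling loss with name-value pair arguments.
load fisheriris
obj = fitcdiscr(meas,species);

% Use the built-in 'classiferror' loss instead of the default 'mincost'.
Lerr = loss(obj,meas,species,'lossfun','classiferror')

% Supply observation weights (here uniform, one per row of meas).
w  = ones(size(meas,1),1);
Lw = loss(obj,meas,species,'weights',w)
```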

## Output Arguments

• L: Classification error, a scalar. The meaning of the error depends on the values in weights and lossfun. See Classification Error.

## Definitions

### Classification Error

The default classification error is the fraction of data X that obj misclassifies, where Y represents the true classifications.

Weighted classification error is the sum of weight i times the Boolean value that is 1 when obj misclassifies the ith row of X, divided by the sum of the weights.
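The weighted error defined above can be checked by hand. The following is a minimal sketch with made-up labels and weights (yTrue, yPred, and w are hypothetical); it ignores the prior-based weight normalization that loss applies internally and only illustrates the arithmetic.

```matlab
% Sketch: weighted classification error computed directly.
yTrue = [1; 1; 2; 2];        % hypothetical true class labels
yPred = [1; 2; 2; 2];        % hypothetical predicted class labels
w     = [1; 3; 2; 2];        % hypothetical nonnegative observation weights

miss = (yPred ~= yTrue);     % logical vector, 1 where an observation is misclassified
Lw   = sum(w .* miss) / sum(w)   % weighted error: 3/8 = 0.375
```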

### Loss Functions

The built-in loss functions are:

• 'binodeviance' — For binary classification, assume the classes yn are -1 and 1. With weight vector w normalized to have sum 1, and predictions of row n of data X as f(Xn), the binomial deviance is

$\sum {w}_{n}\mathrm{log}\left(1+\mathrm{exp}\left(-2{y}_{n}f\left({X}_{n}\right)\right)\right).$

• 'exponential' — With the same definitions as for 'binodeviance', the exponential loss is

$\sum {w}_{n}\mathrm{exp}\left(-{y}_{n}f\left({X}_{n}\right)\right).$

• 'classiferror' — Predict the label with the largest posterior probability. The loss is then the fraction of misclassified observations.

• 'hinge' — Classification error measure that has the form

$L=\frac{\sum _{j=1}^{n}{w}_{j}\mathrm{max}\left\{0,1-{y}_{j}^{\prime }f\left({X}_{j}\right)\right\}}{\sum _{j=1}^{n}{w}_{j}},$

where:

• wj is weight j.

• For binary classification, yj = 1 for the positive class and -1 for the negative class. For problems where the number of classes K ≥ 3, yj is a vector of 0s, but with a 1 in the position corresponding to the true class. For example, if the second observation is in the third class and K = 4, then y2 = [0 0 1 0]′.

• $f\left({X}_{j}\right)$ is, for binary classification, the posterior probability or, for K ≥ 3, a vector of posterior probabilities for each class given observation j.

• 'mincost' — Predict the label with the smallest expected misclassification cost, with expectation taken over the posterior probability, and cost as given by the Cost property of the classifier (a matrix). The loss is then the true misclassification cost averaged over the observations.
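The binary formulas above can be evaluated directly. This sketch uses hypothetical labels y and hypothetical values f standing in for f(Xn); it shows only the arithmetic of each formula and is not how loss computes these quantities internally.

```matlab
% Sketch: evaluating the binary loss formulas on made-up values.
y = [1; -1; 1; -1];            % true classes coded as -1 and 1
f = [0.8; -0.3; -0.2; 0.5];    % hypothetical values of f(X_n)
w = ones(4,1) / 4;             % weights normalized to sum to 1

binodeviance = sum(w .* log(1 + exp(-2 * y .* f)));
exponential  = sum(w .* exp(-y .* f));
hinge        = sum(w .* max(0, 1 - y .* f)) / sum(w)   % 0.9 for these values
```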

To write your own loss function, create a function file in this form:

`function loss = lossfun(C,S,W,COST)`

where:

• N is the number of rows of X.

• K is the number of classes in the classifier, represented in the ClassNames property.

• C is an N-by-K logical matrix, with one true per row for the true class. The index for each class is its position in the ClassNames property.

• S is an N-by-K numeric matrix. S is a matrix of posterior probabilities for classes with one row per observation, similar to the posterior output from predict.

• W is a numeric vector with N elements, the observation weights. If you pass W, the elements are normalized to sum to the prior probabilities in the respective classes.

• COST is a K-by-K numeric matrix of misclassification costs. For example, you can use COST = ones(K) - eye(K), which means a cost of 0 for correct classification, and 1 for misclassification.

• The output loss should be a scalar.

Pass the function handle @lossfun as the value of the LossFun name-value pair.
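As one possible illustration of this signature, the function file below sketches a custom loss that returns the weighted fraction of misclassified observations. The body is an assumption chosen for illustration, not a built-in implementation; COST is accepted but unused.

```matlab
function loss = lossfun(C,S,W,COST)
% Hypothetical custom loss: weighted fraction of misclassified observations.
% Possible usage: L = loss(obj,X,Y,'LossFun',@lossfun)
% COST is part of the required signature but is not used in this sketch.
[~,predIdx] = max(S,[],2);          % predicted class: column with the largest posterior
[~,trueIdx] = max(double(C),[],2);  % true class: column holding the true entry
W = W(:);                           % make sure the weights form a column vector
loss = sum(W .* (predIdx ~= trueIdx)) / sum(W);
end
```

Because the function depends only on its four inputs, it can also be called directly with small matrices to check its behavior before passing it to loss.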

### Posterior Probability

The posterior probability that a point x belongs to class k is proportional to the product of the prior probability and the multivariate normal density. The density function of the multivariate normal with mean μk and covariance Σk at a point x is

$P\left(x|k\right)=\frac{1}{{\left({\left(2\pi \right)}^{d}|{\Sigma }_{k}|\right)}^{1/2}}\mathrm{exp}\left(-\frac{1}{2}{\left(x-{\mu }_{k}\right)}^{T}{\Sigma }_{k}^{-1}\left(x-{\mu }_{k}\right)\right),$

where d is the number of predictors (the dimension of x), $|{\Sigma }_{k}|$ is the determinant of Σk, and ${\Sigma }_{k}^{-1}$ is the inverse matrix.

Let P(k) represent the prior probability of class k. Then the posterior probability that an observation x is of class k is

$\hat{P}\left(k|x\right)=\frac{P\left(x|k\right)P\left(k\right)}{P\left(x\right)},$

where P(x) is a normalization constant, the sum over k of P(x|k)P(k).
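The posterior computation can be sketched in a few lines. All values below (x, mu, Sigma, prior) are hypothetical, and the density uses the standard multivariate normal normalization $((2\pi)^d|\Sigma_k|)^{1/2}$, where d is the number of predictors. In practice, the posterior output of predict provides these values for a trained classifier.

```matlab
% Sketch: posterior probabilities for two classes from priors and normal densities.
x     = [1; 0];                 % hypothetical observation (d = 2 predictors)
mu    = {[0; 0], [2; 0]};       % hypothetical class means
Sigma = {eye(2), eye(2)};       % hypothetical class covariance matrices
prior = [0.5 0.5];              % prior probabilities P(k)

d   = numel(x);
lik = zeros(1,2);               % P(x|k) for each class
for k = 1:2
    r = x - mu{k};
    lik(k) = exp(-0.5 * (r' / Sigma{k}) * r) / sqrt((2*pi)^d * det(Sigma{k}));
end

% P(x) is the sum over k of P(x|k)P(k); dividing by it normalizes the posteriors.
post = lik .* prior / sum(lik .* prior)   % [0.5 0.5], since x is equidistant from both means
```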

### Prior Probability

The prior probability is one of three choices:

• 'uniform' — The prior probability of class k is one over the total number of classes.

• 'empirical' — The prior probability of class k is the number of training samples of class k divided by the total number of training samples.

• Custom — The prior probability of class k is the kth element of the prior vector. See fitcdiscr.

After creating a classifier obj, you can set the prior using dot notation:

`obj.Prior = v;`

where v is a vector of positive elements representing the relative frequency with which each class occurs. You do not need to retrain the classifier when you set a new prior.
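For instance, the following sketch changes the prior of a classifier trained on the Fisher iris data and recomputes the loss without retraining. The vector [1 1 2] is an arbitrary illustrative choice, and the variable names Lbefore and Lafter are hypothetical.

```matlab
% Sketch: setting a new prior with dot notation, then recomputing the loss.
load fisheriris
obj = fitcdiscr(meas,species);
Lbefore = loss(obj,meas,species);   % loss under the prior used for training

obj.Prior = [1 1 2];                % hypothetical relative class frequencies
Lafter = loss(obj,meas,species)     % loss under the new prior, no retraining needed
```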

### Cost

The matrix of expected costs per observation is defined in Cost.

## Examples

Compute the resubstituted classification error for the Fisher iris data:

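```
% Load the Fisher iris data (meas: predictors, species: class labels)
load fisheriris
% Train a discriminant analysis classifier
obj = fitcdiscr(meas,species);
% Compute the loss on the training data (resubstitution)
L = loss(obj,meas,species)

L =
0.0200
```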