L = loss(obj,X,Y)
L = loss(obj,X,Y,Name,Value)
L = loss(obj,X,Y) returns the classification loss, a scalar representing how well obj classifies the data in X when Y contains the true classifications.
When computing the loss, loss normalizes the class probabilities in Y to the class probabilities used for training, stored in the Prior property of obj.
L = loss(obj,X,Y,Name,Value) returns the loss with additional options specified by one or more Name,Value pair arguments.
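As a minimal sketch of both syntaxes, the following assumes the Statistics and Machine Learning Toolbox is installed and uses the fisheriris sample data set that ships with it. The variable names L1 and L2 are illustrative only.

```matlab
% Load Fisher's iris data: meas (150-by-4 predictors), species (150-by-1 labels).
load fisheriris

% Train a discriminant analysis classifier on the full data set.
obj = fitcdiscr(meas,species);

% Syntax 1: loss with default options.
L1 = loss(obj,meas,species);

% Syntax 2: loss with a Name,Value pair, here the weighted fraction
% of misclassified observations.
L2 = loss(obj,meas,species,'LossFun','classiferror');

fprintf('Default loss: %.4f\nClassification error: %.4f\n',L1,L2);
```

Because the loss here is evaluated on the training data, it is a resubstitution estimate and is likely to be optimistic for new data.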

Discriminant analysis classifier of class 

Matrix where each row represents an observation, and each column
represents a predictor. The number of columns in 

Class labels, with the same data type as exists in 
Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Built-in loss-function name (character vector in the table) or function handle.
For more details on loss functions, see Classification Loss. Default:  

Numeric vector of length Default: 

Classification
loss, a scalar. The interpretation of 
Classification loss functions measure the predictive inaccuracy of classification models. When comparing the same type of loss among many models, lower loss indicates a better predictive model.
Suppose that:
L is the weighted average classification loss.
n is the sample size.
For binary classification:
y_{j} is the observed class label. The software codes it as –1 or 1 indicating the negative or positive class, respectively.
f(X_{j}) is the raw classification score for observation (row) j of the predictor data X.
m_{j} = y_{j}f(X_{j}) is the classification score for classifying observation j into the class corresponding to y_{j}. Positive values of m_{j} indicate correct classification and do not contribute much to the average loss. Negative values of m_{j} indicate incorrect classification and contribute to the average loss.
For algorithms that support multiclass classification (that is, K ≥ 3):
y_{j}^{*} is a vector of K – 1 zeros, and a 1 in the position corresponding to the true, observed class y_{j}. For example, if the true class of the second observation is the third class and K = 4, then y^{*}_{2} = [0 0 1 0]′. The order of the classes corresponds to the order in the ClassNames property of the input model.
f(X_{j}) is the length K vector of class scores for observation j of the predictor data X. The order of the scores corresponds to the order of the classes in the ClassNames property of the input model.
m_{j} = y_{j}^{*}′f(X_{j}). Therefore, m_{j} is the scalar classification score that the model predicts for the true, observed class.
The weight for observation j is w_{j}. The software normalizes the observation weights so that they sum to the corresponding prior class probability. The software also normalizes the prior probabilities so they sum to 1. Therefore,
$$\sum_{j=1}^{n}w_{j}=1.$$
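The weight normalization can be illustrated with a short sketch in base MATLAB. The data, the variable names, and the loop are assumptions made for illustration; they are not the internal implementation of loss.

```matlab
% Hypothetical data: four observations in two classes.
y     = [1; 1; 2; 2];          % class index of each observation
w     = [1; 3; 2; 2];          % raw observation weights
prior = [3 1];                 % unnormalized prior class probabilities

prior = prior / sum(prior);    % priors now sum to 1: [0.75 0.25]

% Within each class, rescale the weights so they sum to that class's prior.
wNorm = zeros(size(w));
for k = 1:numel(prior)
    inClass = (y == k);
    wNorm(inClass) = w(inClass) / sum(w(inClass)) * prior(k);
end

disp(wNorm')                   % values: 0.1875 0.5625 0.125 0.125
disp(sum(wNorm))               % the normalized weights sum to 1
```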
The supported loss functions are:
Binomial deviance, specified using 'LossFun','binodeviance'. Its equation is
$$L=\sum_{j=1}^{n}w_{j}\log\left\{1+\exp\left[-2m_{j}\right]\right\}.$$
Exponential loss, specified using 'LossFun','exponential'. Its equation is
$$L=\sum_{j=1}^{n}w_{j}\exp\left(-m_{j}\right).$$
Classification error, specified using 'LossFun','classiferror'. It is the weighted fraction of misclassified observations, with equation
$$L=\sum_{j=1}^{n}w_{j}I\left\{\widehat{y}_{j}\ne y_{j}\right\}.$$
$${\widehat{y}}_{j}$$ is the class label corresponding to the class with the maximal posterior probability. I{x} is the indicator function.
Hinge loss, specified using 'LossFun','hinge'. Its equation is
$$L=\sum_{j=1}^{n}w_{j}\max\left\{0,1-m_{j}\right\}.$$
Logit loss, specified using 'LossFun','logit'. Its equation is
$$L=\sum_{j=1}^{n}w_{j}\log\left(1+\exp\left(-m_{j}\right)\right).$$
Minimal cost, specified using 'LossFun','mincost'. The software computes the weighted minimal cost using this procedure for observations j = 1,...,n:
Estimate the 1-by-K vector of expected classification costs for observation j:
$$\gamma_{j}=f\left(X_{j}\right)^{\prime}C.$$
f(X_{j}) is the column vector of class posterior probabilities for binary and multiclass classification. C is the cost matrix the input model stores in the property Cost.
For observation j, predict the class label corresponding to the minimum expected classification cost:
$$\widehat{y}_{j}=\underset{k=1,\ldots,K}{\operatorname{arg\,min}}\;\gamma_{jk},$$
where γ_{jk} is the kth element of γ_{j}.
Using C, identify the cost incurred (c_{j}) for making the prediction.
The weighted average minimum-cost loss is
$$L=\sum_{j=1}^{n}w_{j}c_{j}.$$
Quadratic loss, specified using 'LossFun','quadratic'. Its equation is
$$L=\sum_{j=1}^{n}w_{j}\left(1-m_{j}\right)^{2}.$$
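The minimal-cost procedure described for 'LossFun','mincost' can be sketched in base MATLAB as follows. The posterior matrix P, the cost matrix C, the labels, and the weights are hypothetical values chosen for illustration, and the code is a sketch of the formulas rather than the toolbox implementation. It assumes that C(i,k) is the cost of predicting class k when the true class is i.

```matlab
% Hypothetical inputs: n = 3 observations, K = 2 classes.
P = [0.9 0.1;                  % row j holds f(X_j)', the posteriors of observation j
     0.4 0.6;
     0.3 0.7];
C = [0 1;                      % C(i,k): cost of predicting k when the truth is i
     2 0];
yTrue = [1; 2; 1];             % true class indices
w = [0.25; 0.25; 0.5];         % observation weights, already summing to 1

gamma = P * C;                 % n-by-K expected costs; row j is gamma_j
[~,yHat] = min(gamma,[],2);    % predicted class: column with minimum expected cost

% Cost incurred for each prediction: C(true class, predicted class).
c = C(sub2ind(size(C),yTrue,yHat));

L = sum(w .* c);               % weighted average minimum-cost loss
disp(L)                        % 0.5 for these inputs
```

In this example only the third observation is misclassified (true class 1, predicted class 2), so the loss equals its weight times C(1,2).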
This figure compares some of the loss functions for one observation over m (some functions are normalized to pass through [0,1]).
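To make the comparison concrete, the margin-based loss functions can be evaluated over a grid of margins. This is a sketch in base MATLAB under assumed inputs: the grid m, the equal weights w, and the struct lossFuns are illustrative and not part of the toolbox API.

```matlab
m = linspace(-2,2,9)';         % hypothetical margins m_j, one per observation
w = ones(size(m)) / numel(m);  % equal weights that sum to 1

% Per-observation loss as a function of the margin m.
lossFuns = struct( ...
    'binodeviance', @(m) log(1 + exp(-2*m)), ...
    'exponential',  @(m) exp(-m), ...
    'hinge',        @(m) max(0,1 - m), ...
    'logit',        @(m) log(1 + exp(-m)), ...
    'quadratic',    @(m) (1 - m).^2);

% Weighted average loss L for each function.
names = fieldnames(lossFuns);
for i = 1:numel(names)
    g = lossFuns.(names{i});
    fprintf('%-13s L = %.4f\n',names{i},sum(w .* g(m)));
end
```

Each of these functions except the quadratic loss is nonincreasing in m, so larger positive margins (confident, correct classifications) contribute less to the average loss. The quadratic loss is smallest at m = 1 and grows again for larger margins.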
The posterior probability that a point x belongs to class k is proportional to the product of the prior probability and the multivariate normal density. The density function of the multivariate normal with mean μ_{k} and covariance Σ_{k} at a d-dimensional point x is
$$P\left(x|k\right)=\frac{1}{\left(\left(2\pi\right)^{d}\left|\Sigma_{k}\right|\right)^{1/2}}\exp\left(-\frac{1}{2}\left(x-\mu_{k}\right)^{T}\Sigma_{k}^{-1}\left(x-\mu_{k}\right)\right),$$
where $$\left|\Sigma_{k}\right|$$ is the determinant of Σ_{k}, and $$\Sigma_{k}^{-1}$$ is the inverse matrix.
Let P(k) represent the prior probability of class k. Then the posterior probability that an observation x is of class k is
$$\widehat{P}\left(k|x\right)=\frac{P\left(x|k\right)P\left(k\right)}{P\left(x\right)},$$
where P(x) is a normalization constant, the sum over k of P(x|k)P(k).
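The posterior computation can be sketched directly from these formulas in base MATLAB. The means, covariances, prior, and query point below are hypothetical, and the code is an illustration of the equations rather than the way the toolbox evaluates posteriors.

```matlab
% Hypothetical two-class model in d = 2 dimensions.
mu    = [0 0; 2 2];                    % row k is the class mean mu_k
Sigma = cat(3,eye(2),[1 0.5; 0.5 1]);  % Sigma(:,:,k) is the class covariance
prior = [0.5 0.5];                     % P(k)
x     = [1 1];                         % point to classify (row vector)

K = size(mu,1);
d = numel(x);
lik = zeros(1,K);                      % lik(k) = P(x|k)
for k = 1:K
    r = x - mu(k,:);                   % deviation from the class mean
    S = Sigma(:,:,k);
    % r / S solves for r * inv(S) without forming the inverse explicitly.
    lik(k) = exp(-0.5 * (r / S) * r') / sqrt((2*pi)^d * det(S));
end

% Posterior P(k|x): dividing by P(x) = sum_k P(x|k)P(k) normalizes the result.
post = lik .* prior / sum(lik .* prior);
disp(post)
```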
The prior probability is one of three choices:
'uniform': The prior probability of class k is one over the total number of classes.
'empirical': The prior probability of class k is the number of training samples of class k divided by the total number of training samples.
Custom: The prior probability of class k is the kth element of the prior vector. See fitcdiscr.
After creating a classifier obj, you can set the prior using dot notation:
obj.Prior = v;
where v is a vector of positive elements representing the relative frequency with which each class occurs. You do not need to retrain the classifier when you set a new prior.
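As a hedged usage sketch, the following sets a new prior on a trained classifier and recomputes the loss without retraining. It assumes the Statistics and Machine Learning Toolbox and the fisheriris sample data. The prior vector [1 1 2] and the variable names Lbefore and Lafter are illustrative.

```matlab
load fisheriris
obj = fitcdiscr(meas,species);     % train once

Lbefore = loss(obj,meas,species);  % loss under the prior used for training

% Set a new prior using dot notation. The elements follow the class order
% in obj.ClassNames and are given here as relative frequencies.
obj.Prior = [1 1 2];

Lafter = loss(obj,meas,species);   % loss under the new prior, no retraining

disp(obj.ClassNames)               % class order that the prior refers to
disp(obj.Prior)                    % prior as stored by the classifier
```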
The matrix of expected costs per observation is defined in Cost.