`predict` uses three quantities to classify observations: posterior probability, prior probability, and cost.

`predict` classifies so as to minimize the expected classification cost:

$$\widehat{y}=\underset{y=1,\mathrm{...},K}{\mathrm{arg}\mathrm{min}}{\displaystyle \sum _{k=1}^{K}\widehat{P}\left(k|x\right)C\left(y|k\right)},$$

where

- $$\widehat{y}$$ is the predicted classification.
- *K* is the number of classes.
- $$\widehat{P}\left(k|x\right)$$ is the posterior probability of class *k* for observation *x*.
- $$C\left(y|k\right)$$ is the cost of classifying an observation as *y* when its true class is *k*.
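The decision rule can be sketched in base MATLAB, assuming the posterior probabilities are already available. The matrices `posterior` and `C` are made-up illustrative values, not output from a trained model. `C(k,y)` holds the cost of predicting class `y` when the true class is `k`.

```matlab
% Hypothetical posteriors for 2 observations and K = 3 classes (rows sum to 1).
posterior = [0.6 0.3 0.1;
             0.2 0.5 0.3];

% Hypothetical cost matrix: C(k,y) is the cost of predicting y when the
% true class is k. Misclassifying a true class-3 observation is expensive.
C = [0 1 1;
     1 0 1;
     5 5 0];

% Expected cost of each candidate label y: sum over k of P(k|x) * C(k,y).
expectedCost = posterior * C;

% The predicted label is the column index with the smallest expected cost.
[~, yhat] = min(expectedCost, [], 2);
disp(expectedCost)
disp(yhat)
```

In this sketch the second observation has its largest posterior on class 2 but is assigned to class 3, because the cost matrix penalizes missing a true class-3 observation.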

The space of `X` values divides into regions where a classification `Y` is a particular value. The regions are separated by straight
lines for linear discriminant analysis, and by conic sections (ellipses, hyperbolas, or
parabolas) for quadratic discriminant analysis. For a visualization of these regions,
see Create and Visualize Discriminant Analysis Classifier.

The posterior probability that a point *x* belongs to class
*k* is proportional to the product of the prior
probability and the multivariate normal density. The density function of
the multivariate normal with mean $${\mu}_{k}$$ and
covariance $${\Sigma}_{k}$$ at a point *x* is

$$P\left(x|k\right)=\frac{1}{{\left({\left(2\pi \right)}^{d}\left|{\Sigma}_{k}\right|\right)}^{1/2}}\mathrm{exp}\left(-\frac{1}{2}{\left(x-{\mu}_{k}\right)}^{T}{\Sigma}_{k}^{-1}\left(x-{\mu}_{k}\right)\right),$$

where *d* is the number of predictors (the dimension of *x*), $$\left|{\Sigma}_{k}\right|$$ is the determinant of
$${\Sigma}_{k}$$, and $${\Sigma}_{k}^{-1}$$ is the inverse matrix.
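As a rough illustration, the density can be evaluated in base MATLAB. This sketch uses the standard multivariate normal normalizing factor, the square root of (2π)^d times the determinant, where d is the dimension of the point. The values of `mu`, `Sigma`, and `x` are hypothetical.

```matlab
% Hypothetical parameters for one class in d = 2 dimensions.
mu    = [1; 2];            % class mean (column vector)
Sigma = [2.0 0.3;
         0.3 1.0];         % class covariance (symmetric positive definite)
x     = [1.5; 1.0];        % point at which to evaluate the density

d     = numel(x);
delta = x - mu;

% Quadratic form (x - mu)' * inv(Sigma) * (x - mu), computed with a linear
% solve rather than an explicit matrix inverse.
q = delta' * (Sigma \ delta);

% Multivariate normal density at x.
p = exp(-0.5 * q) / sqrt((2*pi)^d * det(Sigma));
disp(p)
```

If Statistics and Machine Learning Toolbox is available, `mvnpdf(x', mu', Sigma)` should give the same value.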

Let *P*(*k*) represent the prior probability of
class *k*. Then the posterior probability that an observation
*x* is of class *k* is

$$\widehat{P}\left(k|x\right)=\frac{P\left(x|k\right)P\left(k\right)}{P\left(x\right)},$$

where *P*(*x*) is a normalization constant,
namely, the sum over *k* of
*P*(*x*|*k*)*P*(*k*).
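The normalization step can be sketched as follows. The density and prior values are hypothetical, and the code relies on implicit expansion, which assumes MATLAB R2016b or later.

```matlab
% Hypothetical class-conditional densities P(x|k) for 2 observations, K = 3.
density = [0.10 0.02 0.01;
           0.03 0.03 0.06];

% Prior probabilities P(k), one per class.
prior = [0.5 0.3 0.2];

% Numerator P(x|k) * P(k); implicit expansion applies the prior to each row.
joint = density .* prior;

% The normalization constant P(x) is the row sum over classes.
posterior = joint ./ sum(joint, 2);
disp(posterior)
```

Each row of `posterior` sums to 1, which is what dividing by *P*(*x*) guarantees.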

The prior probability is one of three choices:

- `'uniform'`: The prior probability of class `k` is 1 over the total number of classes.
- `'empirical'`: The prior probability of class `k` is the number of training samples of class `k` divided by the total number of training samples.
- A numeric vector: The prior probability of class `k` is the `k`th element of the `Prior` vector. See `fitcdiscr`.
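The three choices can be illustrated with a small base MATLAB sketch. The label vector and the vector `v` are hypothetical. Normalizing `v` to sum to 1 is an assumption made here for illustration, based on the vector representing relative class frequencies.

```matlab
% Hypothetical training labels for K = 3 classes, coded as integers 1..K.
labels = [1 1 1 1 2 2 2 3 3 3]';
K = 3;

% 'uniform': every class gets probability 1/K.
uniformPrior = ones(1, K) / K;

% 'empirical': class counts divided by the total number of samples.
counts = accumarray(labels, 1, [K 1])';
empiricalPrior = counts / numel(labels);

% Numeric vector: relative frequencies, normalized here (an assumption)
% so that the entries sum to 1.
v = [1 1 2];
customPrior = v / sum(v);

disp(uniformPrior)
disp(empiricalPrior)
disp(customPrior)
```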

After creating a classifier `obj`, you can set the prior using
dot notation:

obj.Prior = v;

where `v` is a vector of positive elements representing the
frequency with which each class occurs. You do not need to retrain the classifier
when you set a new prior.
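A minimal sketch of changing the prior on a trained classifier, assuming Statistics and Machine Learning Toolbox and its bundled `fisheriris` data set are available. The prior vector `[1 1 2]` is an arbitrary illustrative choice.

```matlab
% Requires Statistics and Machine Learning Toolbox.
load fisheriris                       % provides meas (150-by-4) and species
obj = fitcdiscr(meas, species);       % train with the default prior

labelsBefore = predict(obj, meas);

% Treat the third class in obj.ClassNames as twice as frequent as the
% others. The classifier is not retrained.
obj.Prior = [1 1 2];

labelsAfter = predict(obj, meas);
nChanged = sum(~strcmp(labelsBefore, labelsAfter));
fprintf('%d of %d labels changed\n', nChanged, numel(labelsBefore));
```

Comparing the labels before and after shows whether the new prior moves any observations across a decision boundary.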

There are two costs associated with discriminant analysis classification: the true misclassification cost per class, and the expected misclassification cost per observation.

`Cost(i,j)` is the cost of classifying an observation into
class `j` if its true class is `i`. By
default, `Cost(i,j)=1` if `i~=j`, and
`Cost(i,j)=0` if `i=j`. In other words,
the cost is `0` for correct classification, and
`1` for incorrect classification.

You can set any cost matrix you like when creating a classifier. Pass the cost
matrix in the `Cost` name-value pair in `fitcdiscr`.

After you create a classifier `obj`, you can set a custom
cost using dot notation:

obj.Cost = B;

`B` is a square matrix of size `K`-by-`K` when there are
`K` classes. You do not need to retrain the classifier when
you set a new cost.
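Both ways of supplying a cost matrix can be sketched as follows, assuming Statistics and Machine Learning Toolbox and the `fisheriris` data set. The matrix `B` is a hypothetical example in which misclassifying a true class-3 observation costs 5.

```matlab
% Requires Statistics and Machine Learning Toolbox.
load fisheriris
K = 3;

% Default-style cost: 0 on the diagonal, 1 everywhere else.
defaultCost = ones(K) - eye(K);

% Hypothetical custom cost: rows index the true class, columns the
% predicted class, so row 3 penalizes missing a true class-3 observation.
B = defaultCost;
B(3, 1:2) = 5;

% Option 1: pass the cost matrix when creating the classifier.
obj1 = fitcdiscr(meas, species, 'Cost', B);

% Option 2: train with the default cost, then set it with dot notation.
obj2 = fitcdiscr(meas, species);
obj2.Cost = B;
```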

Suppose you have `Nobs` observations that you want to
classify with a trained discriminant analysis classifier `obj`.
Suppose you have `K` classes. You place the observations into a
matrix `Xnew` with one observation per row. The command

[label,score,cost] = predict(obj,Xnew)

returns, among other outputs, a cost matrix of size
`Nobs`-by-`K`. Each row of the cost matrix
contains the expected (average) cost of classifying the observation into each of
the `K` classes. `cost(n,k)` is

$$\sum _{i=1}^{K}\widehat{P}\left(i|Xnew(n)\right)C\left(k|i\right),$$

where

- *K* is the number of classes.
- $$\widehat{P}\left(i|Xnew(n)\right)$$ is the posterior probability of class *i* for observation *Xnew*(*n*).
- $$C\left(k|i\right)$$ is the cost of classifying an observation as *k* when its true class is *i*.
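One way to check this relationship is to recompute the expected cost from the `score` output (the posterior probabilities) and the classifier's cost matrix. This is a sketch assuming Statistics and Machine Learning Toolbox and the `fisheriris` data set. The cost matrix and the choice of `Xnew` are illustrative.

```matlab
% Requires Statistics and Machine Learning Toolbox.
load fisheriris
obj = fitcdiscr(meas, species);
obj.Cost = [0 1 1; 1 0 1; 5 5 0];     % hypothetical custom cost

Xnew = meas(1:10:end, :);             % 15 observations, one per row
[label, score, cost] = predict(obj, Xnew);

% Recompute cost(n,k) as the sum over i of P(i|Xnew(n)) * Cost(i,k).
expected = score * obj.Cost;
maxDiff = max(abs(cost(:) - expected(:)));
fprintf('max abs difference: %g\n', maxDiff);

% The predicted label should be the class with the smallest expected cost.
[~, idx] = min(cost, [], 2);
agree = isequal(label, obj.ClassNames(idx));
fprintf('labels match argmin of cost: %d\n', agree);
```

A difference near zero would indicate that the returned `cost` matches the formula, up to floating-point rounding.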