
Discriminant analysis

- `class = classify(sample,training,group)`
- `class = classify(sample,training,group,'type')`
- `class = classify(sample,training,group,'type',prior)`
- `[class,err] = classify(...)`
- `[class,err,POSTERIOR] = classify(...)`
- `[class,err,POSTERIOR,logp] = classify(...)`
- `[class,err,POSTERIOR,logp,coeff] = classify(...)`

`class = classify(sample,training,group)` classifies
each row of the data in `sample` into one of the
groups in `training`. `sample` and `training` must
be matrices with the same number of columns. `group` is
a grouping variable for `training`. Its unique values
define groups; each element defines the group to which the corresponding
row of `training` belongs. `group` can
be a categorical variable, a numeric vector, a string array, or a
cell array of strings. `training` and `group` must
have the same number of rows. `classify` treats `NaN`s
or empty strings in `group` as missing values, and
ignores the corresponding rows of `training`. The
output `class` indicates the group to which each
row of `sample` has been assigned, and is of the
same type as `group`.
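As a minimal sketch of the basic call, the following uses the Fisher iris data set (`fisheriris`, assumed to be available with the toolbox). The hold-out split and the variable names `testIdx`, `trainIdx`, and `accuracy` are illustrative choices, not part of `classify`.

```matlab
% Fisher iris data: meas is 150-by-4 numeric, species is a 150-by-1 cell array of strings
load fisheriris

% Illustrative split: every fifth flower becomes the sample, the rest is training data
testIdx  = 5:5:150;
trainIdx = setdiff(1:150, testIdx);

sample   = meas(testIdx, :);
training = meas(trainIdx, :);     % same number of columns as sample
group    = species(trainIdx);     % same number of rows as training

class = classify(sample, training, group);   % default 'linear' discriminant

% class has one entry per row of sample and the same type as group (a cell array here)
accuracy = mean(strcmp(class, species(testIdx)));
fprintf('Hold-out accuracy: %.2f\n', accuracy);
```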

`class = classify(sample,training,group,'type')` allows
you to specify the type of discriminant function. Specify `type` as one of the following:

- `linear`: Fits a multivariate normal density to each group, with a pooled estimate of covariance. This is the default.
- `diaglinear`: Similar to `linear`, but with a diagonal covariance matrix estimate (naive Bayes classifiers).
- `quadratic`: Fits multivariate normal densities with covariance estimates stratified by group.
- `diagquadratic`: Similar to `quadratic`, but with a diagonal covariance matrix estimate (naive Bayes classifiers).
- `mahalanobis`: Uses Mahalanobis distances with stratified covariance estimates.
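To get a feel for how the choice of `type` affects the fit, one option is to loop over the five types and compare the apparent error rate (the second output of `classify`). This is only a sketch on the `fisheriris` data, restricted to two predictors for brevity; the variable names are illustrative.

```matlab
load fisheriris
X = meas(:, 1:2);    % sepal length and width only, to keep the example small

types = {'linear', 'diaglinear', 'quadratic', 'diagquadratic', 'mahalanobis'};
for k = 1:numel(types)
    % Classify the training data itself and keep only the apparent error rate
    [~, err] = classify(X, X, species, types{k});
    fprintf('%-14s apparent error = %.3f\n', types{k}, err);
end
```

The apparent error is computed on the training data, so it tends to be optimistic; it is used here only to compare the discriminant types side by side.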

`class = classify(sample,training,group,'type',prior)` allows
you to specify prior probabilities for the groups.

`prior` can be one of the following:

- A numeric vector the same length as the number of unique values in `group` (or the number of levels defined for `group`, if `group` is categorical). If `group` is numeric or categorical, the order of `prior` must correspond to the ordered values in `group`, or, if `group` contains strings, to the order of first occurrence of the values in `group`.
- A 1-by-1 structure with fields:
  - `prob`: A numeric vector.
  - `group`: Of the same type as `group`, containing unique values indicating the groups to which the elements of `prob` correspond.

  As a structure, `prior` can contain groups that do not appear in `group`. This can be useful if `training` is a subset of a larger training set. `classify` ignores any groups that appear in the structure but not in the `group` array.
- The string `'empirical'`, indicating that group prior probabilities should be estimated from the group relative frequencies in `training`.

`prior` defaults to a numeric vector
of equal probabilities, i.e., a uniform distribution.
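The three ways of passing `prior` can be sketched as follows, again on `fisheriris`. The probability values are made up for illustration, and the structure variable `p` is a hypothetical name.

```matlab
load fisheriris
X = meas(:, 3:4);    % petal length and width

% 1) Numeric vector. species is a cell array of strings, so the order follows
%    first occurrence in group: setosa, versicolor, virginica
c1 = classify(X, X, species, 'linear', [0.2 0.3 0.5]);

% 2) Structure. The group field states which class each probability belongs to,
%    so the entries may be listed in any order
p.group = {'virginica'; 'setosa'; 'versicolor'};
p.prob  = [0.5 0.2 0.3];
c2 = classify(X, X, species, 'linear', p);

% 3) Empirical. Priors are estimated from the relative group frequencies in training
c3 = classify(X, X, species, 'linear', 'empirical');

% The vector and the structure encode the same priors, so the labels should match
disp(isequal(c1, c2))
```

The structure form is the safer choice when the order of the groups in `group` is not obvious, because each probability is tied to a group name explicitly.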

`[class,err] = classify(...)` also
returns an estimate `err` of the misclassification
error rate based on the `training` data. `classify` returns
the apparent error rate, i.e., the percentage of observations in `training` that
are misclassified, weighted by the prior probabilities for the groups.
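One way to see what the prior weighting means is to recompute the apparent error by hand: take the fraction of misclassified training rows within each group, then average those fractions using the priors. The sketch below assumes the default uniform priors; `miss` and `manualErr` are illustrative names.

```matlab
load fisheriris
X = meas(:, 1:2);

% With sample equal to training, class holds the resubstitution labels
[resubClass, err] = classify(X, X, species);

% Per-group misclassification fraction, weighted by the (default, uniform) priors
groups = unique(species);
prior  = ones(1, numel(groups)) / numel(groups);
miss   = zeros(1, numel(groups));
for g = 1:numel(groups)
    inG     = strcmp(species, groups{g});
    miss(g) = mean(~strcmp(resubClass(inG), groups{g}));
end
manualErr = sum(prior .* miss);

fprintf('err = %.4f, manual = %.4f\n', err, manualErr);
```

Because `err` depends only on `training`, `group`, the discriminant type, and the priors, it does not change with the contents of `sample`.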

`[class,err,POSTERIOR] = classify(...)` also
returns a matrix `POSTERIOR` of estimates of the
posterior probabilities that the *j*th training group
was the source of the *i*th sample observation, i.e., *Pr*(*group j*|*obs i*).
`POSTERIOR` is not computed for Mahalanobis discrimination.
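A short sketch of how `POSTERIOR` relates to `class`: each row should sum to 1, and the assigned group should be the one with the largest posterior in that row. The column order is assumed here to be setosa, versicolor, virginica, which for this data set is both the sorted order and the order of first occurrence.

```matlab
load fisheriris
testIdx  = 5:5:150;                      % illustrative hold-out split
trainIdx = setdiff(1:150, testIdx);

[class, ~, POSTERIOR] = classify(meas(testIdx,:), meas(trainIdx,:), species(trainIdx));

disp(size(POSTERIOR))            % one row per sample observation, one column per group
rowSums = sum(POSTERIOR, 2);     % each row is a probability distribution over the groups

% Pick the most probable group per row and compare with the returned labels
[~, best] = max(POSTERIOR, [], 2);
groups    = unique(species(trainIdx));   % assumed to match the column order of POSTERIOR
agreement = mean(strcmp(class, groups(best)));
fprintf('Fraction of rows where class matches the largest posterior: %.2f\n', agreement);
```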

`[class,err,POSTERIOR,logp] = classify(...)` also
returns a vector `logp` containing estimates of the
logarithms of the unconditional predictive probability density of
the sample observations, *p*(*obs i*) = ∑ *p*(*obs i*|*group j*) *Pr*(*group j*),
where the sum runs over all groups *j*. `logp` is not computed for Mahalanobis
discrimination.
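Since `logp` measures how plausible each sample row is under the fitted mixture of groups, one possible use is flagging observations that do not resemble any group. In this sketch the point `outlier` is a made-up observation placed far from the iris measurements.

```matlab
load fisheriris
X = meas(:, 1:2);                 % sepal length and width

outlier = [10 10];                % hypothetical point far from all training data
sample  = [X(1:5, :); outlier];   % five ordinary flowers plus the outlier

[~, ~, ~, logp] = classify(sample, X, species);

% The last entry should be far lower than the others, because the outlier has
% very low density under every group
disp(logp)
```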

`[class,err,POSTERIOR,logp,coeff] = classify(...)` also
returns a structure array `coeff` containing coefficients
of the boundary curves between pairs of groups. Each element `coeff(I,J)`
contains information for comparing group `I` to group `J` in
the following fields:

- `type`: Type of discriminant function, from the `type` input.
- `name1`: Name of the first group.
- `name2`: Name of the second group.
- `const`: Constant term of the boundary equation (K)
- `linear`: Linear coefficients of the boundary equation (L)
- `quadratic`: Quadratic coefficient matrix of the boundary equation (Q)

For the `linear` and `diaglinear` types,
the `quadratic` field is absent, and a row `x` from
the `sample` array is classified into group `I` rather
than group `J` if `0 < K+x*L`.
For the other types, `x` is classified into group `I` if
`0 < K+x*L+x*Q*x'`.
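The boundary rule can be checked numerically. The sketch below fits a quadratic discriminant to two iris species, evaluates `K + x*L + x*Q*x'` for every training row, and compares the resulting labels with the ones `classify` returns. The names `f`, `manual`, and `agreement` are illustrative.

```matlab
load fisheriris
X     = meas(51:end, 1:2);        % versicolor and virginica, sepal length and width
group = species(51:end);

[class, ~, ~, ~, coeff] = classify(X, X, group, 'quadratic');

K = coeff(1,2).const;             % scalar
L = coeff(1,2).linear;            % column vector, one entry per predictor
Q = coeff(1,2).quadratic;         % square matrix

% Row-wise evaluation of K + x*L + x*Q*x' without an explicit loop
f = K + X*L + sum((X*Q) .* X, 2);

% Positive values go to the first group of the pair, the rest to the second
manual        = repmat({coeff(1,2).name2}, size(X, 1), 1);
manual(f > 0) = {coeff(1,2).name1};

agreement = mean(strcmp(class, manual));
fprintf('Agreement with classify: %.2f\n', agreement);
```

With only two groups, the single boundary `coeff(1,2)` decides every assignment, so the two sets of labels are expected to agree apart from points lying numerically on the boundary.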

