
`label = predict(Mdl,X)`

`[label,score,cost] = predict(Mdl,X)`

`label = predict(Mdl,X)` returns a vector of predicted class labels for the predictor data in the table or matrix `X`, based on the trained *k*-nearest neighbor classification model `Mdl`.

`[label,score,cost] = predict(Mdl,X)` also returns:

- A matrix of classification scores (`score`) indicating the likelihood that a label comes from a particular class. For *k*-nearest neighbor, scores are posterior probabilities.
- A matrix of expected classification costs (`cost`). For each observation in `X`, the predicted class label corresponds to the minimum expected classification cost among all classes.

`predict` classifies so as to minimize the expected classification cost:

$$\widehat{y}=\underset{y=1,\mathrm{...},K}{\mathrm{arg}\mathrm{min}}{\displaystyle \sum _{k=1}^{K}\widehat{P}\left(k|x\right)C\left(y|k\right)},$$

where

- $$\widehat{y}$$ is the predicted classification.
- *K* is the number of classes.
- $$\widehat{P}\left(k|x\right)$$ is the posterior probability of class *k* for observation *x*.
- $$C\left(y|k\right)$$ is the cost of classifying an observation as *y* when its true class is *k*.
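The decision rule above can be sketched numerically. This is a minimal illustration, not MathWorks code; the posterior and cost values are made up for the example.

```python
# Choose the class y that minimizes the expected cost
# sum_k P(k|x) * C(y|k), as in the formula above.
posterior = [0.7, 0.2, 0.1]          # illustrative P(k|x) for K = 3 classes
C = [[0, 1, 1],                      # C[y][k]: default 0/1 misclassification cost
     [1, 0, 1],
     [1, 1, 0]]

K = len(posterior)
expected_cost = [sum(C[y][k] * posterior[k] for k in range(K))
                 for y in range(K)]
label = min(range(K), key=lambda y: expected_cost[y])
```

With the 0/1 default cost, the expected cost of class `y` is `1 - posterior[y]`, so this rule reduces to picking the class with the largest posterior probability.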

For a vector (single query point) `Xnew` and model `mdl`, let:

- `K` be the number of nearest neighbors used in prediction, `mdl.NumNeighbors`
- `nbd(mdl,Xnew)` be the `K` nearest neighbors to `Xnew` in `mdl.X`
- `Y(nbd)` be the classifications of the points in `nbd(mdl,Xnew)`, namely `mdl.Y(nbd)`
- `W(nbd)` be the weights of the points in `nbd(mdl,Xnew)`
- `prior` be the priors of the classes in `mdl.Y`

If there is a vector of prior probabilities, then the observation weights `W` are normalized by class to sum to the priors. This might involve a calculation for the point `Xnew`, because weights can depend on the distance from `Xnew` to the points in `mdl.X`.

The posterior probability *p*(*j*|`Xnew`) is

$$p\left(j|\text{Xnew}\right)=\frac{{\displaystyle \sum _{i\in \text{nbd}}W(i){1}_{Y(X(i)=j)}}}{{\displaystyle \sum _{i\in \text{nbd}}W(i)}}.$$

Here, $${1}_{Y(X(i)=j)}$$ means `1` when `mdl.Y(i) = j`, and `0` otherwise.
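The weighted-vote posterior above can be sketched as follows. This is an illustrative computation, not MathWorks code; the neighbor weights and labels are made-up values.

```python
# p(j | Xnew) = (total weight of neighbors labeled j) / (total neighbor weight)
W = [0.5, 0.3, 0.2]          # weights of the K = 3 nearest neighbors
Y = ['a', 'b', 'a']          # class labels of those neighbors

total = sum(W)
posterior = {j: sum(w for w, y in zip(W, Y) if y == j) / total
             for j in set(Y)}
```

Because the weights are normalized by their sum, the posteriors over all classes add up to 1.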

There are two costs associated with KNN classification: the true misclassification cost per class, and the expected misclassification cost per observation.

You can set the true misclassification cost per class in the `Cost` name-value pair argument when you run `fitcknn`. `Cost(i,j)` is the cost of classifying an observation into class `j` if its true class is `i`. By default, `Cost(i,j) = 1` if `i ~= j`, and `Cost(i,j) = 0` if `i = j`. In other words, the cost is `0` for correct classification and `1` for incorrect classification.
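The default cost structure described above can be written out as a small sketch (illustrative only, not MathWorks code):

```python
# Default 0/1 cost matrix: Cost(i,j) = 1 when i != j, 0 when i == j,
# here for K = 3 classes (zero-based indices in this sketch).
K = 3
Cost = [[0 if i == j else 1 for j in range(K)] for i in range(K)]
```

Any square nonnegative matrix of this size is a valid cost specification; the 0/1 matrix simply penalizes every kind of misclassification equally.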

The third output of `predict` is the expected misclassification cost per observation.

Suppose you have `Nobs` observations that you want to classify with a trained classifier `mdl`, and `K` classes. You place the observations into a matrix `Xnew` with one observation per row. The command

`[label,score,cost] = predict(mdl,Xnew)`

returns, among other outputs, a `cost` matrix of size `Nobs`-by-`K`. Each row of the `cost` matrix contains the expected (average) cost of classifying the observation into each of the `K` classes. `cost(n,k)` is

$${\displaystyle \sum _{i=1}^{K}\widehat{P}\left(i|Xnew(n)\right)C\left(k|i\right)},$$

where

- *K* is the number of classes.
- $$\widehat{P}\left(i|Xnew(n)\right)$$ is the posterior probability of class *i* for observation *Xnew*(*n*).
- $$C\left(k|i\right)$$ is the true misclassification cost of classifying an observation as *k* when its true class is *i*.
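The full `cost` matrix computation can be sketched numerically. This is an illustration of the formula, not MathWorks code; the posteriors below are made-up values for `Nobs = 2` observations and `K = 2` classes.

```python
# cost[n][k] = sum_i P(i | Xnew(n)) * C(k | i), where C[i][k] follows the
# fitcknn convention: cost of predicting class k when the true class is i.
P = [[0.9, 0.1],             # posterior P(i | Xnew(n)), one row per observation
     [0.4, 0.6]]
C = [[0, 1],                 # default 0/1 misclassification cost
     [1, 0]]

Nobs, K = len(P), len(C)
cost = [[sum(P[n][i] * C[i][k] for i in range(K)) for k in range(K)]
        for n in range(Nobs)]
label = [min(range(K), key=lambda k: cost[n][k]) for n in range(Nobs)]
```

Each predicted label is the column index of the smallest entry in the corresponding row of `cost`, matching the decision rule stated earlier.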
