
# resubPredict

Class: ClassificationKNN

Predict resubstitution response of k-nearest neighbor classifier

## Syntax

```
label = resubPredict(mdl)
[label,score] = resubPredict(mdl)
[label,score,cost] = resubPredict(mdl)
```

## Description

`label = resubPredict(mdl)` returns the labels that `mdl` predicts for the training data `mdl.X`. That is, `label` contains the predictions of `mdl` on the data that `fitcknn` used to create `mdl`.

`[label,score] = resubPredict(mdl)` also returns the posterior class probabilities for the predictions.

`[label,score,cost] = resubPredict(mdl)` also returns the expected misclassification costs.

## Input Arguments


`mdl` — k-nearest neighbor classifier model, specified as a `ClassificationKNN` object created by `fitcknn`.

Note that using the `'CrossVal'`, `'KFold'`, `'Holdout'`, `'Leaveout'`, or `'CVPartition'` options results in a model of class `ClassificationPartitionedModel`. You cannot use a partitioned model for prediction, so this kind of model does not have a `predict` method.

Otherwise, `mdl` is of class `ClassificationKNN`, and you can use the `predict` method to make predictions.
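The distinction can be sketched as follows, using the `fisheriris` data that the Examples section also uses:

```matlab
load fisheriris

% A plain KNN model supports resubPredict and predict.
mdl = fitcknn(meas,species,'NumNeighbors',5);
label = resubPredict(mdl);

% A cross-validated model is a ClassificationPartitionedModel.
% It has no predict method; use kfoldPredict instead.
cvmdl = fitcknn(meas,species,'NumNeighbors',5,'KFold',10);
cvlabel = kfoldPredict(cvmdl);
```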

## Output Arguments

`label`

Predicted class labels for the points in the training data `X`, returned as a vector with length equal to the number of rows in `X`. The label of each point is the class with minimal expected cost (see Expected Cost).

`score`

Numeric matrix of size `N`-by-`K`, where `N` is the number of observations (rows) in the training data `X`, and `K` is the number of classes (in `mdl.ClassNames`). `score(i,j)` is the posterior probability that row `i` of `X` is of class `j`. See Posterior Probability.

`cost`

Matrix of expected costs of size `N`-by-`K`, where `N` is the number of observations (rows) in the training data `X`, and `K` is the number of classes (in `mdl.ClassNames`). `cost(i,j)` is the cost of classifying row `i` of `X` as class `j`. See Expected Cost.
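A brief sketch of the output shapes, using the `fisheriris` data (150 observations, 3 classes):

```matlab
load fisheriris
mdl = fitcknn(meas,species,'NumNeighbors',5);
[label,score,cost] = resubPredict(mdl);

size(label)   % 150-by-1: one predicted label per training row
size(score)   % 150-by-3: posterior probability of each class
size(cost)    % 150-by-3: expected misclassification cost of each class
```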

## Definitions

### Posterior Probability

For a vector (single query point) `X` and model `mdl`, let

• `K` be the number of nearest neighbors used in prediction, `mdl.NumNeighbors`

• `nbd(mdl,X)` be the `K` nearest neighbors to `X` in `mdl.X`

• `Y(nbd)` be the classifications of the points in `nbd(mdl,X)`, namely `mdl.Y(nbd)`

• `W(nbd)` be the weights of the points in `nbd(mdl,X)`

• `prior` be the priors of the classes in `mdl.Y`

If there is a vector of prior probabilities, then the observation weights `W` are normalized by class to sum to the priors. This might involve a calculation for the point `X`, because weights can depend on the distance from `X` to the points in `mdl.X`.

The posterior probability p(j|X) is

$$p(j \mid X) = \frac{\sum_{i \in \mathrm{nbd}} W(i)\, \mathbf{1}_{\{Y(X(i)) = j\}}}{\sum_{i \in \mathrm{nbd}} W(i)}.$$

Here $\mathbf{1}_{\{Y(X(i)) = j\}}$ equals `1` when `mdl.Y(i) = j`, and `0` otherwise.
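In the unweighted, equal-prior case, the posterior reduces to the fraction of the `K` neighbors belonging to each class. A minimal sketch that recomputes the posterior for one training point, for comparison against `score`:

```matlab
load fisheriris
mdl = fitcknn(meas,species,'NumNeighbors',5);
[~,score] = resubPredict(mdl);

x = meas(1,:);                                 % a single query (training) point
idx = knnsearch(mdl.X,x,'K',mdl.NumNeighbors); % nbd(mdl,x), includes x itself
p = mean(strcmp(mdl.Y(idx),'setosa'));         % fraction of neighbors labeled setosa
% With unit weights and default priors, p matches
% score(1, strcmp(mdl.ClassNames,'setosa')).
```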

### Expected Cost

There are two costs associated with KNN classification: the true misclassification cost per class, and the expected misclassification cost per observation. The third output of `resubPredict` is the expected misclassification cost per observation.

Suppose you have `Nobs` observations that you want to classify with a trained classifier `mdl`. Suppose you have `K` classes. You place the observations into a matrix `X` with one observation per row. The command

`[label,score,cost] = predict(mdl,X)`

returns, among other outputs, a `cost` matrix of size `Nobs`-by-`K`. Each row of the `cost` matrix contains the expected (average) cost of classifying the observation into each of the `K` classes. `cost(n,k)` is

$$\mathrm{cost}(n,k) = \sum_{i=1}^{K} \hat{P}(i \mid X(n))\, C(k \mid i),$$

where

• `K` is the number of classes.

• $\hat{P}(i \mid X(n))$ is the posterior probability of class $i$ for observation $X(n)$.

• $C(k \mid i)$ is the true misclassification cost of classifying an observation as $k$ when its true class is $i$.
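In matrix form, the sum above is the posterior matrix times the stored cost matrix, since `mdl.Cost(i,k)` holds $C(k \mid i)$. A sketch that checks the third output against this definition:

```matlab
load fisheriris
mdl = fitcknn(meas,species,'NumNeighbors',5);
[~,score,cost] = resubPredict(mdl);

% cost(n,k) = sum_i P(i|X(n)) * C(k|i), i.e. score * mdl.Cost
costCheck = score * mdl.Cost;
max(abs(cost(:) - costCheck(:)))   % should be zero, up to round-off
```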

### True Misclassification Cost


You can set the true misclassification cost per class in the `Cost` name-value pair when you run `fitcknn`. `Cost(i,j)` is the cost of classifying an observation into class `j` if its true class is `i`. By default, `Cost(i,j)=1` if `i~=j`, and `Cost(i,j)=0` if `i=j`. In other words, the cost is `0` for correct classification, and `1` for incorrect classification.
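For example, a hypothetical asymmetric cost matrix that makes misclassifying the first class twice as expensive as the default:

```matlab
load fisheriris
% C(i,j): cost of predicting class j when the true class is i.
C = [0 2 2;
     1 0 1;
     1 1 0];
mdlC = fitcknn(meas,species,'NumNeighbors',5,'Cost',C);
labelC = resubPredict(mdlC);   % predictions now reflect the asymmetric costs
```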

## Algorithms

If you standardized the predictor data when training, that is, `mdl.Mu` and `mdl.Sigma` are not empty (`[]`), then `resubPredict` standardizes the predictor data before predicting labels.
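For instance, with standardization requested at training time:

```matlab
load fisheriris
mdlS = fitcknn(meas,species,'NumNeighbors',5,'Standardize',true);
% mdlS.Mu and mdlS.Sigma are now nonempty; resubPredict standardizes
% each predictor as (X - Mu)./Sigma before finding nearest neighbors.
labelS = resubPredict(mdlS);
```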

## Examples


Examine the quality of a classifier by its resubstitution predictions.

Load the data.

```
load fisheriris
X = meas;
Y = species;
```

Construct a classifier for 5-nearest neighbors.

```mdl = fitcknn(X,Y,'NumNeighbors',5); ```

Generate the resubstitution predictions.

```label = resubPredict(mdl); ```

Calculate the number of differences between the predictions `label` and the original data `Y`.

```
mydiff = not(strcmp(Y,label)); % mydiff(i) = 1 means they differ
sum(mydiff) % Number of differences
```
```ans = 5 ```

A value of `1` in `mydiff` indicates that the observed label differs from the corresponding predicted label. There are `5` misclassifications.
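The same error rate is available directly from `resubLoss`, which should agree with the count above:

```matlab
load fisheriris
mdl = fitcknn(meas,species,'NumNeighbors',5);
resubLoss(mdl)   % resubstitution error rate, i.e. 5/150 here
```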

## See Also

`fitcknn` | `predict` | `resubLoss` | `ClassificationKNN`