resubPredict

Class: ClassificationKNN

Predict resubstitution response of k-nearest neighbor classifier

Syntax

label = resubPredict(mdl)
[label,score] = resubPredict(mdl)
[label,score,cost] = resubPredict(mdl)

Description

label = resubPredict(mdl) returns the labels that mdl predicts for its training data mdl.X, that is, the data that fitcknn used to create mdl.

[label,score] = resubPredict(mdl) also returns the posterior class probabilities for the predictions.

[label,score,cost] = resubPredict(mdl) also returns the expected misclassification costs.

Input Arguments

mdl — Classifier model
classifier model object

k-nearest neighbor classifier model, specified as a classifier model object.

Note that using the 'CrossVal', 'KFold', 'Holdout', 'Leaveout', or 'CVPartition' options results in a model of class ClassificationPartitionedModel. You cannot use a partitioned model for prediction, so this kind of model does not have a predict method.

Otherwise, mdl is of class ClassificationKNN, and you can use the predict method to make predictions.
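The distinction between the two model classes can be checked directly. The following is a minimal sketch, assuming the fisheriris sample data and the illustrative variable names cvmdl and cvLabel, neither of which comes from the text above. It uses kfoldPredict to obtain predictions from the partitioned model.

```matlab
load fisheriris                       % meas (150-by-4), species (150-by-1 cell)

% Requesting cross-validation yields a partitioned model, not a ClassificationKNN.
cvmdl = fitcknn(meas,species,'NumNeighbors',5,'CrossVal','on');
class(cvmdl)                          % display the class of the partitioned model

% Without any cross-validation option, the result is a ClassificationKNN object.
mdl = fitcknn(meas,species,'NumNeighbors',5);
isKNN = isa(mdl,'ClassificationKNN');

% Out-of-fold predictions for the partitioned model
cvLabel = kfoldPredict(cvmdl);

% Resubstitution predictions for the full model
label = resubPredict(mdl);
```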

Output Arguments

label

Predicted class labels for the points in the training data X, a vector with length equal to the number of rows in the training data X. The label is the class with minimal expected cost (see Expected Cost).

score

Numeric matrix of size N-by-K, where N is the number of observations (rows) in the training data X, and K is the number of classes (in mdl.ClassNames). score(i,j) is the posterior probability that row i of X is of class j. See Posterior Probability.

cost

Matrix of expected costs of size N-by-K, where N is the number of observations (rows) in the training data X, and K is the number of classes (in mdl.ClassNames). cost(i,j) is the expected cost of classifying row i of X as class j. See Expected Cost.
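The shapes of the three outputs can be inspected as follows. This is a small sketch, assuming the fisheriris sample data and a 5-nearest-neighbor model. The variables N, K, and rowSums are introduced here only for illustration.

```matlab
load fisheriris
mdl = fitcknn(meas,species,'NumNeighbors',5);
[label,score,cost] = resubPredict(mdl);

N = size(mdl.X,1);            % number of training observations
K = numel(mdl.ClassNames);    % number of classes

size(label)   % one label per training observation
size(score)   % N-by-K, columns ordered as in mdl.ClassNames
size(cost)    % N-by-K, same column order as score

% Each row of score is a probability distribution over the classes.
rowSums = sum(score,2);
```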

Definitions

Posterior Probability

For a vector (single query point) X and model mdl, let

  • K be the number of nearest neighbors used in prediction, mdl.NumNeighbors

  • nbd(mdl,X) be the K nearest neighbors to X in mdl.X

  • Y(nbd) be the classifications of the points in nbd(mdl,X), namely mdl.Y(nbd)

  • W(nbd) be the weights of the points in nbd(mdl,X)

  • prior be the priors of the classes in mdl.Y

If there is a vector of prior probabilities, then the observation weights W are normalized by class to sum to the priors. This might involve a calculation for the point X, because weights can depend on the distance from X to the points in mdl.X.

The posterior probability p(j|X) is

$$p(j \mid X) = \frac{\sum_{i \in \mathrm{nbd}} W(i)\, 1_{Y(X(i)) = j}}{\sum_{i \in \mathrm{nbd}} W(i)}.$$

Here $1_{Y(X(i)) = j}$ is 1 when mdl.Y(i) = j, and 0 otherwise.
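As a worked instance of the posterior formula, the sketch below computes p(j|X) for one hypothetical neighborhood. The neighbor labels and weights are made-up values, classes are encoded as the integers 1 to 3, and the weights are taken as already normalized, so the prior adjustment described above is not reproduced.

```matlab
% Hypothetical neighborhood of one query point (values chosen for illustration)
Ynbd = [1; 1; 2; 1; 3];      % class indices of the 5 nearest neighbors
W    = [1; 1; 1; 1; 1];      % weights of those neighbors (uniform here)
numClasses = 3;

% Sum the weights of the neighbors in each class, then divide by the total weight.
p = accumarray(Ynbd, W, [numClasses 1]) / sum(W);
% p is approximately [0.6; 0.2; 0.2]: three of the five neighbors are in class 1
```

With uniform weights, the posterior reduces to the fraction of neighbors that belong to each class.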

Expected Cost

There are two costs associated with KNN classification: the true misclassification cost per class, and the expected misclassification cost per observation. The third output of predict is the expected misclassification cost per observation.

Suppose you have Nobs observations that you want to classify with a trained classifier mdl. Suppose you have K classes. You place the observations into a matrix X with one observation per row. The command

[label,score,cost] = predict(mdl,X)

returns, among other outputs, a cost matrix of size Nobs-by-K. Each row of the cost matrix contains the expected (average) cost of classifying the observation into each of the K classes. cost(n,k) is

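$$\sum_{i=1}^{K} \hat{P}(i \mid X(n))\, C(k \mid i),$$

where

  • K is the number of classes.

  • $\hat{P}(i \mid X(n))$ is the posterior probability of class i for observation X(n).

  • $C(k \mid i)$ is the true misclassification cost of classifying an observation as k when its true class is i.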

True Misclassification Cost

You can set the true misclassification cost per class in the Cost name-value pair when you run fitcknn. Cost(i,j) is the cost of classifying an observation into class j if its true class is i. By default, Cost(i,j)=1 if i~=j, and Cost(i,j)=0 if i=j. In other words, the cost is 0 for correct classification, and 1 for incorrect classification.
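The relation between the two costs can be sketched numerically. The example below uses made-up posterior probabilities and the default cost matrix described above. The variable names score, Cost, cost, and labelIdx are illustrative, and the code is not taken from the fitcknn implementation.

```matlab
score = [0.6 0.2 0.2;        % hypothetical posteriors, one row per observation
         0.1 0.5 0.4];
K = size(score,2);
Cost = ones(K) - eye(K);     % default: 0 on the diagonal, 1 elsewhere

% cost(n,k) = sum over true classes i of score(n,i) * Cost(i,k)
cost = score * Cost;

% The predicted class is the one with minimal expected cost.
[~,labelIdx] = min(cost,[],2);
```

With the default cost matrix, the expected cost of choosing class k is one minus its posterior probability, so minimizing the expected cost selects the class with the highest posterior.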

Algorithms

If you specified to standardize the predictor data, that is, mdl.Mu and mdl.Sigma are not empty ([]), then resubPredict standardizes the predictor data before predicting labels.
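A brief sketch of this behavior, assuming the fisheriris sample data and the 'Standardize' name-value pair of fitcknn. The variable names mdlStd, hasStats, and labelStd are illustrative.

```matlab
load fisheriris
mdlStd = fitcknn(meas,species,'NumNeighbors',5,'Standardize',true);

% Mu and Sigma are non-empty when standardization was requested at training time.
hasStats = ~isempty(mdlStd.Mu) && ~isempty(mdlStd.Sigma);

% resubPredict applies the standardization itself; no manual scaling is needed.
labelStd = resubPredict(mdlStd);
```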

Examples

Predict the Labels of the Training Data

Examine the quality of a classifier by its resubstitution predictions.

Load the data.

load fisheriris
X = meas;
Y = species;

Construct a classifier for 5-nearest neighbors.

mdl = fitcknn(X,Y,'NumNeighbors',5);

Generate the resubstitution predictions.

label = resubPredict(mdl);

Calculate the number of differences between the predictions label and the original data Y.

mydiff = not(strcmp(Y,label)); % mydiff(i) = 1 means they differ
sum(mydiff) % Number of differences
ans =

     5

A value of 1 in mydiff indicates that the observed label differs from the corresponding predicted label. There are 5 misclassifications.
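The same comparison can be summarized as an error rate and broken down by class. The sketch below repeats the example and adds a confusion matrix computed with confusionmat. The variable names errRate, C, order, and numOffDiag are illustrative.

```matlab
load fisheriris
X = meas;
Y = species;
mdl = fitcknn(X,Y,'NumNeighbors',5);
label = resubPredict(mdl);

mydiff = ~strcmp(Y,label);          % true where prediction and observed label differ
errRate = mean(mydiff);             % fraction of misclassified training points

% Rows of C are true classes, columns are predicted classes, in the order given by order.
[C,order] = confusionmat(Y,label);
numOffDiag = sum(C(:)) - trace(C);  % off-diagonal total equals sum(mydiff)
```

The off-diagonal entries of C show which pairs of classes the classifier confuses on the training data.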
