predict

Class: ClassificationKNN

Predict k-nearest neighbor classification

Syntax

label = predict(mdl,Xnew)
[label,score] = predict(mdl,Xnew)
[label,score,cost] = predict(mdl,Xnew)

Description

label = predict(mdl,Xnew) returns a vector of predicted class labels for a matrix Xnew, based on mdl, a ClassificationKNN model.

[label,score] = predict(mdl,Xnew) returns a matrix of scores, indicating the likelihood that a label comes from a particular class.

[label,score,cost] = predict(mdl,Xnew) returns a matrix of expected misclassification costs; label is the class with minimal expected cost for each row of cost.

Input Arguments

mdl — Classifier model
classifier model object

k-nearest neighbor classifier model, specified as a ClassificationKNN model object.

Note that using the 'CrossVal', 'KFold', 'Holdout', 'Leaveout', or 'CVPartition' options results in a model of class ClassificationPartitionedModel. You cannot use a partitioned model for prediction, so this kind of model does not have a predict method.

Otherwise, mdl is of class ClassificationKNN, and you can use the predict method to make predictions.

Xnew — Prediction points
matrix

Points at which mdl predicts classifications. Each row of Xnew is one point. The number of columns in Xnew must equal the number of predictors in mdl.

If you specified to standardize the predictor data, that is, mdl.Mu and mdl.Sigma are not empty ([]), then predict standardizes Xnew before predicting labels.
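
If you need the standardized coordinates yourself, a minimal sketch of the equivalent computation, assuming mdl.Mu and mdl.Sigma are populated row vectors:

Xs = bsxfun(@minus, Xnew, mdl.Mu);     % center each predictor column
Xs = bsxfun(@rdivide, Xs, mdl.Sigma);  % scale by the training standard deviations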

Output Arguments

label

Predicted class labels for the points in Xnew, a vector with length equal to the number of rows of Xnew. The label is the class with minimal expected cost. See Predicted Class Label.

score

Numeric matrix of size N-by-K, where N is the number of observations (rows) in Xnew, and K is the number of classes (in mdl.ClassNames). score(i,j) is the posterior probability that row i of Xnew is of class j. See Posterior Probability.

cost

Matrix of expected costs of size N-by-K, where N is the number of observations (rows) in Xnew, and K is the number of classes (in mdl.ClassNames). cost(i,j) is the expected cost of classifying row i of Xnew as class j. See Expected Cost.

Definitions

Predicted Class Label

predict classifies so as to minimize the expected classification cost:

$$\hat{y} = \underset{y=1,\dots,K}{\arg\min}\ \sum_{k=1}^{K} \hat{P}(k \mid x)\, C(y \mid k),$$

where

  • $\hat{y}$ is the predicted classification.

  • $K$ is the number of classes.

  • $\hat{P}(k \mid x)$ is the posterior probability of class k for observation x.

  • $C(y \mid k)$ is the cost of classifying an observation as y when its true class is k.
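
In terms of the outputs of predict, this rule can be sketched directly; the snippet below assumes score holds the posterior probabilities and mdl.Cost(k,y) stores C(y|k), and mirrors the formula rather than the internal implementation:

expCost = score * mdl.Cost;      % expCost(n,y) = sum over k of P(k|x_n)*C(y|k)
[~,idx] = min(expCost, [], 2);   % column of minimal expected cost per row
label   = mdl.ClassNames(idx);   % map column indices back to class names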

Posterior Probability

For a vector (single query point) Xnew and model mdl, let:

  • K be the number of nearest neighbors used in prediction, mdl.NumNeighbors

  • nbd(mdl,Xnew) be the K nearest neighbors to Xnew in mdl.X

  • Y(nbd) be the classifications of the points in nbd(mdl,Xnew), namely mdl.Y(nbd)

  • W(nbd) be the weights of the points in nbd(mdl,Xnew)

  • prior be the priors of the classes in mdl.Y

If there is a vector of prior probabilities, then the observation weights W are normalized by class to sum to the priors. This might involve a calculation for the point Xnew, because weights can depend on the distance from Xnew to the points in mdl.X.

The posterior probability p(j|Xnew) is

$$p(j \mid X_{\mathrm{new}}) = \frac{\sum_{i \in \mathrm{nbd}} W(i)\, \mathbf{1}_{\{Y(X(i)) = j\}}}{\sum_{i \in \mathrm{nbd}} W(i)}.$$

Here, $\mathbf{1}_{\{Y(X(i)) = j\}}$ is 1 when mdl.Y(i) = j, and 0 otherwise.
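
For equal weights, the posterior reduces to the fraction of neighbors in each class. A minimal sketch for one query row xnew (a hypothetical variable), assuming unit weights, the default Euclidean distance on unstandardized predictors, and cell-array class labels:

k    = mdl.NumNeighbors;
nbd  = knnsearch(mdl.X, xnew, 'K', k);   % indices of the k nearest training rows
nbrY = mdl.Y(nbd);                       % class labels of those neighbors
post = zeros(1, numel(mdl.ClassNames));
for j = 1:numel(mdl.ClassNames)
    post(j) = sum(strcmp(nbrY, mdl.ClassNames{j})) / k;  % fraction in class j
end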

True Misclassification Cost

There are two costs associated with KNN classification: the true misclassification cost per class, and the expected misclassification cost per observation.

You can set the true misclassification cost per class in the Cost name-value pair when you run fitcknn. Cost(i,j) is the cost of classifying an observation into class j if its true class is i. By default, Cost(i,j)=1 if i~=j, and Cost(i,j)=0 if i=j. In other words, the cost is 0 for correct classification, and 1 for incorrect classification.
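
For example, a sketch of a nondefault cost matrix for three classes, where the penalty of 5 is an arbitrary illustrative value:

C = ones(3) - eye(3);  % default pattern: 0 for correct, 1 for incorrect
C(3,2) = 5;            % classifying a true class-3 observation as class 2 costs 5
mdl = fitcknn(X,Y,'NumNeighbors',5,'Cost',C);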

Expected Cost

The third output of predict is the expected misclassification cost per observation.

Suppose you have Nobs observations that you want to classify with a trained classifier mdl, and K classes. You place the observations into a matrix Xnew with one observation per row. The command

[label,score,cost] = predict(mdl,Xnew)

returns, among other outputs, a cost matrix of size Nobs-by-K. Each row of the cost matrix contains the expected (average) cost of classifying the observation into each of the K classes. cost(n,k) is

$$\sum_{i=1}^{K} \hat{P}\bigl(i \mid X_{\mathrm{new}}(n)\bigr)\, C(k \mid i),$$

where

  • $\hat{P}\bigl(i \mid X_{\mathrm{new}}(n)\bigr)$ is the posterior probability of class i for observation n.

  • $C(k \mid i)$ is the cost of classifying an observation as k when its true class is i.

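As a sanity check, a single entry of cost can be recomputed from score and mdl.Cost; n and k below are hypothetical observation and class indices:

n = 1; k = 2;   % any valid observation row and class column
c = 0;
for i = 1:numel(mdl.ClassNames)
    c = c + score(n,i) * mdl.Cost(i,k);  % accumulate P(i|Xnew(n)) * C(k|i)
end
% c should match cost(n,k) up to round-off
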
Examples

k-Nearest Neighbor Classification Predictions

Construct a k-nearest neighbor classifier for Fisher's iris data, where k = 5. Evaluate some model predictions on new data.

Load the data.

load fisheriris
X = meas;
Y = species;

Construct a classifier for 5-nearest neighbors. It is good practice to standardize non-categorical predictor data.

mdl = fitcknn(X,Y,'NumNeighbors',5,'Standardize',1);

Predict the classifications for flowers with minimum, mean, and maximum characteristics.

Xnew = [min(X);mean(X);max(X)];
[label,score,cost] = predict(mdl,Xnew)
label = 

    'versicolor'
    'versicolor'
    'virginica'


score =

    0.4000    0.6000         0
         0    1.0000         0
         0         0    1.0000


cost =

    0.6000    0.4000    1.0000
    1.0000         0    1.0000
    1.0000    1.0000         0

The second and third rows of the score and cost matrices are binary, meaning all five nearest neighbors of the mean and maximum points have identical classifications. For the minimum point, the five neighbors split between setosa (score 0.4) and versicolor (score 0.6).
