loss

Class: ClassificationKNN

Loss of k-nearest neighbor classifier

Syntax

L = loss(mdl,X,Y)
L = loss(mdl,X,Y,Name,Value)

Description

L = loss(mdl,X,Y) returns a scalar representing how well mdl classifies the data in X, when Y contains the true classifications.

When computing the loss, loss normalizes the class probabilities in Y to the class probabilities used for training, stored in the Prior property of mdl.

L = loss(mdl,X,Y,Name,Value) returns the loss with additional options specified by one or more Name,Value pair arguments.

Input Arguments


mdl — Classifier model
classifier model object

k-nearest neighbor classifier model, specified as a classifier model object returned by fitcknn.

Note that using the 'CrossVal', 'KFold', 'Holdout', 'Leaveout', or 'CVPartition' options when you create the model results in a model of class ClassificationPartitionedModel. You cannot use a partitioned model for prediction, so this kind of model does not have a predict method.

Otherwise, mdl is of class ClassificationKNN, and you can use the predict method to make predictions.
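The distinction between the two model classes can be sketched as follows. This is a minimal illustration, assuming the Fisher iris data set and an arbitrary choice of 5 folds. It evaluates the cross-validated model with kfoldLoss, the loss method of ClassificationPartitionedModel, rather than with loss.

```matlab
load fisheriris

% Ordinary model: class ClassificationKNN, supports predict and loss.
mdl = fitcknn(meas,species,'NumNeighbors',5);
class(mdl)

% Cross-validated model: class ClassificationPartitionedModel.
% The fold count of 5 is an arbitrary choice for illustration.
cvmdl = fitcknn(meas,species,'NumNeighbors',5,'KFold',5);
class(cvmdl)

% Estimate the classification loss on the held-out folds.
cvL = kfoldLoss(cvmdl)
```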

X — Matrix of predictor values
matrix

Matrix of predictor values. Each column of X represents one variable, and each row represents one observation.

Y — Categorical variables
categorical array | cell array of strings | character array | logical vector | numeric vector

A categorical array, cell array of strings, character array, logical vector, or a numeric vector with the same number of rows as X. Each row of Y represents the classification of the corresponding row of X.

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

'lossfun'

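Function handle or string representing a loss function. The built-in loss functions are 'binodeviance', 'exponential', 'classiferror', and 'mincost'. For their definitions, see Loss Functions.

You can write your own loss function using the syntax described in Loss Functions.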

Default: 'mincost'

'weights'

Numeric vector of length N with nonnegative entries, where N is the number of rows of X. loss normalizes the weights so that observation weights in each class sum to the prior probability of that class. When you supply weights, loss computes the weighted classification loss.

Default: ones(N,1)

Output Arguments

L

Classification error, a scalar. The meaning of the error depends on the values in weights and lossfun. See Classification Error.

Definitions

Classification Error

The default classification error is the fraction of data X that mdl misclassifies, where Y represents the true classifications.

The weighted classification error is the sum of weight i times the Boolean value that is 1 when mdl misclassifies the ith row of X, divided by the sum of the weights.
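The two definitions above can be worked through by hand. The following sketch uses made-up labels and weights (all values are hypothetical) and applies the formulas directly. It does not reproduce the prior-based weight normalization that loss performs internally.

```matlab
% Hypothetical true labels, predicted labels, and observation weights.
Ytrue = [1; 1; 2; 2; 2];
Ypred = [1; 2; 2; 2; 1];
w     = [1; 2; 1; 1; 3];

% Boolean vector that is 1 where the prediction is wrong (rows 2 and 5).
miss = (Ypred ~= Ytrue);

% Default error: fraction of misclassified rows, 2/5.
errUnweighted = mean(miss);

% Weighted error: sum of weights of misclassified rows over the
% sum of all weights, (2 + 3)/8.
errWeighted = sum(w .* miss) / sum(w);
```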

Loss Functions

The built-in loss functions are:

  • 'binodeviance' — For binary classification, assume the classes yn are -1 and 1. With weight vector w normalized to have sum 1, and predictions of row n of data X as f(Xn), the binomial deviance is

    $$ L = \sum_{n=1}^{N} w_n \log\left(1 + \exp\left(-2 y_n f(X_n)\right)\right). $$

  • 'exponential' — With the same definitions as for 'binodeviance', the exponential loss is

    $$ L = \sum_{n=1}^{N} w_n \exp\left(-y_n f(X_n)\right). $$

  • 'classiferror' — Predict the label with the largest posterior probability. The loss is then the fraction of misclassified observations.

  • 'mincost' — Predict the label with the smallest expected misclassification cost, with expectation taken over the posterior probability, and cost as given by the Cost property of the classifier (a matrix). The loss is then the true misclassification cost averaged over the observations.
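As a rough numerical illustration of the first two entries, the sketch below evaluates the binomial deviance, taken here as the weighted sum of log(1 + exp(-2*yn*f(Xn))), and the exponential loss, taken as the weighted sum of exp(-yn*f(Xn)). The labels and scores are hypothetical values, not output from a trained model.

```matlab
% Hypothetical binary labels in {-1, 1} and prediction scores f(Xn).
y = [-1; 1; 1; -1];
f = [-0.8; 0.3; -0.2; 0.5];

% Weights normalized to sum to 1.
w = ones(4,1) / 4;

% Binomial deviance: weighted sum of log(1 + exp(-2*y*f)).
binoDev = sum(w .* log(1 + exp(-2 * y .* f)));

% Exponential loss: weighted sum of exp(-y*f).
expLoss = sum(w .* exp(-y .* f));

% Sanity check: with all scores equal to 0, the deviance is log(2)
% and the exponential loss is 1.
binoDevZero = sum(w .* log(1 + exp(-2 * y .* zeros(4,1))));
expLossZero = sum(w .* exp(-y .* zeros(4,1)));
```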

To write your own loss function, create a function file in this form:

function loss = lossfun(C,S,W,COST)

where:

  • N is the number of rows of X.

  • K is the number of classes in the classifier, represented in the ClassNames property.

  • C is an N-by-K logical matrix, with one true per row for the true class. The index for each class is its position in the ClassNames property.

  • S is an N-by-K numeric matrix. S is a matrix of posterior probabilities for classes with one row per observation, similar to the posterior output from predict.

  • W is a numeric vector with N elements, the observation weights. If you pass W, the elements are normalized to sum to the prior probabilities in the respective classes.

  • COST is a K-by-K numeric matrix of misclassification costs. For example, you can use COST = ones(K) - eye(K), which means a cost of 0 for correct classification, and 1 for misclassification.

  • The output loss should be a scalar.

Pass the function handle @lossfun as the value of the lossfun name-value pair.
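For illustration, a custom loss function following the interface above might look like the sketch below. The name weightedMisclass is hypothetical, and the function simply recomputes a weighted misclassification rate from C, S, and W while ignoring COST.

```matlab
function loss = weightedMisclass(C,S,W,COST) %#ok<INUSD>
% Hypothetical custom loss: weighted fraction of misclassified rows.
% COST is accepted to match the required signature but is not used.

% Predicted class index: column with the largest posterior in each row.
[~,predIdx] = max(S,[],2);

% True class index: column holding the single true value in each row.
[~,trueIdx] = max(C,[],2);

% Weighted misclassification rate. W(:) forces a column vector.
W = W(:);
loss = sum(W .* (predIdx ~= trueIdx)) / sum(W);
end
```

Saved as weightedMisclass.m on the path, it could then be used as L = loss(mdl,X,Y,'lossfun',@weightedMisclass).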

True Misclassification Cost

There are two costs associated with KNN classification: the true misclassification cost per class, and the expected misclassification cost per observation.

You can set the true misclassification cost per class in the Cost name-value pair when you run fitcknn. Cost(i,j) is the cost of classifying an observation into class j if its true class is i. By default, Cost(i,j)=1 if i~=j, and Cost(i,j)=0 if i=j. In other words, the cost is 0 for correct classification, and 1 for incorrect classification.
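A minimal sketch of setting a non-default cost matrix follows. The penalty value of 5 is an arbitrary assumption, chosen only to show which entry corresponds to which pair of classes.

```matlab
load fisheriris

% Start from the default cost: 0 on the diagonal, 1 elsewhere.
K = 3;
C = ones(K) - eye(K);

% Hypothetical penalty: classifying a true class-2 observation into
% class 3 costs 5 instead of 1 (row = true class, column = predicted).
C(2,3) = 5;

mdl = fitcknn(meas,species,'NumNeighbors',5,'Cost',C);

% The row and column order of Cost follows the ClassNames property.
mdl.ClassNames
mdl.Cost
```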

Expected Cost

The third output of predict is the expected misclassification cost per observation.

Suppose you have Nobs observations that you want to classify with a trained classifier mdl. Suppose you have K classes. You place the observations into a matrix Xnew with one observation per row. The command

[label,score,cost] = predict(mdl,Xnew)

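returns, among other outputs, a cost matrix of size Nobs-by-K. Each row of the cost matrix contains the expected (average) cost of classifying the observation into each of the K classes. cost(n,k) is

    $$ \sum_{i=1}^{K} \hat{P}\left(i \mid Xnew(n)\right) C(k \mid i), $$

where

  • \hat{P}(i | Xnew(n)) is the posterior probability of class i for observation Xnew(n).

  • C(k | i) is the true misclassification cost of classifying an observation as k when its true class is i, that is, Cost(i,k).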
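The expected cost can be sketched as a matrix product: summing posterior probability times true cost over the true classes, for each candidate class k, amounts to multiplying the posterior matrix by the cost matrix. The posterior values below are hypothetical, not output from predict.

```matlab
% Hypothetical posterior probabilities: 2 observations, 3 classes.
% Each row sums to 1.
post = [0.8 0.2 0.0;
        0.2 0.2 0.6];

% Default cost matrix: Cost(i,k) = 1 if i ~= k, and 0 if i == k.
Cost = ones(3) - eye(3);

% expCost(n,k) = sum over i of post(n,i) * Cost(i,k).
expCost = post * Cost;

% With the default cost, this reduces to 1 minus the posterior.
% The predicted class is the one with the smallest expected cost.
[~,label] = min(expCost,[],2);
```

With the default cost matrix, choosing the smallest expected cost is the same as choosing the largest posterior probability, so the 'mincost' and 'classiferror' predictions coincide in that case.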

Examples


Loss Calculation

Construct a k-nearest neighbor classifier for the Fisher iris data, where k = 5.

Load the data.

load fisheriris

Construct a classifier for 5-nearest neighbors.

mdl = fitcknn(meas,species,'NumNeighbors',5);

Examine the loss of the classifier for a mean observation classified 'versicolor'.

X = mean(meas);
Y = {'versicolor'};
L = loss(mdl,X,Y)
L =

     0

The classifier has no doubt that 'versicolor' is the correct classification (all five nearest neighbors classify as 'versicolor').
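The name-value pair arguments can be exercised in the same way. The following sketch computes the resubstitution classification error on the full training set, then repeats the calculation with observation weights. The weights are hypothetical and chosen only to show the syntax.

```matlab
load fisheriris
mdl = fitcknn(meas,species,'NumNeighbors',5);

% Fraction of training observations that mdl misclassifies.
Lerr = loss(mdl,meas,species,'lossfun','classiferror')

% Hypothetical weights: every other observation counts twice as much.
% loss rescales these so that each class sums to its prior probability.
w = ones(size(meas,1),1);
w(1:2:end) = 2;
Lw = loss(mdl,meas,species,'lossfun','classiferror','weights',w)
```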
