ClassificationKNN class

k-nearest neighbor classification

Description

ClassificationKNN is a nearest-neighbor classification model in which you can alter both the distance metric (the meaning of "nearest") and the number of neighbors. Use the predict method to classify new observations. Because the object stores the data used for training, it can also compute resubstitution predictions.

Construction

mdl = fitcknn(x,y) creates a k-nearest neighbor classification model. For details, see fitcknn.

mdl = fitcknn(x,y,Name,Value) creates a classifier with additional options specified by one or more Name,Value pair arguments. For details, see fitcknn.

Input Arguments

x — Predictor values
matrix of scalar values

Predictor values, specified as a matrix of scalar values. Each column of x represents one variable, and each row represents one observation.

Data Types: single | double

y — Classification values
numeric vector | categorical vector | logical vector | character array | cell array of strings

Classification values, specified as a numeric vector, categorical vector, logical vector, character array, or cell array of strings, with the same number of rows as x. Each row of y represents the classification of the corresponding row of x.

Data Types: single | double | cell | logical | char

Properties

BreakTies

String specifying the method predict uses to break ties if multiple classes have the same smallest cost. By default, ties occur when multiple classes have the same number of nearest points among the K nearest neighbors.

  • 'nearest' — Use the class with the nearest neighbor among tied groups.

  • 'random' — Use a random tiebreaker among tied groups.

  • 'smallest' — Use the smallest index among tied groups.

'BreakTies' applies when 'IncludeTies' is false.

Change BreakTies using dot notation: mdl.BreakTies = newBreakTies.

CategoricalPredictors

Specification of which predictors are categorical.

  • 'all' — All predictors are categorical.

  • [] — No predictors are categorical.

ClassNames

List of elements in the training data Y with duplicates removed. ClassNames can be a numeric vector, vector of categorical variables, logical vector, character array, or cell array of strings. ClassNames has the same data type as the data in the argument Y.

Change ClassNames using dot notation: mdl.ClassNames = newClassNames.

Cost

Square matrix, where Cost(i,j) is the cost of classifying a point into class j if its true class is i. Cost is K-by-K, where K is the number of classes.

Change a Cost matrix using dot notation: mdl.Cost = costMatrix.
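As a rough illustration of the Cost(i,j) convention, the sketch below assigns an asymmetric cost matrix to a model trained on the Fisher iris data and requests the expected costs from predict. The cost values are hypothetical and chosen only for illustration; the row and column order is assumed to follow ClassNames (setosa, versicolor, virginica for this data set).

```matlab
load fisheriris                               % sample data shipped with the toolbox
mdl = fitcknn(meas, species, 'NumNeighbors', 5);

% Rows are true classes, columns are predicted classes, in ClassNames order.
costMatrix = [0 1 1;
              1 0 5;    % hypothetical: labeling a true versicolor as virginica costs 5
              1 1 0];
mdl.Cost = costMatrix;

% The third output of predict holds the expected cost of each class per row.
[labels, ~, expectedCost] = predict(mdl, meas(51:55,:));
```

Each row of expectedCost has one entry per class, and labels holds the class with the smallest entry in that row.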

Distance

String or function handle specifying the distance metric. The allowable strings depend on the NSMethod parameter, which you set in fitcknn, and which exists as a field in ModelParameters.

Distance metric names allowed for each NSMethod value:

  • exhaustive — Any distance metric of ExhaustiveSearcher

  • kdtree — 'cityblock', 'chebychev', 'euclidean', or 'minkowski'

For definitions, see Distance Metrics.

The distance metrics of ExhaustiveSearcher:

  • 'cityblock' — City block distance.

  • 'chebychev' — Chebychev distance (maximum coordinate difference).

  • 'correlation' — One minus the sample linear correlation between observations (treated as sequences of values).

  • 'cosine' — One minus the cosine of the included angle between observations (treated as vectors).

  • 'euclidean' — Euclidean distance.

  • 'hamming' — Hamming distance, the percentage of coordinates that differ.

  • 'jaccard' — One minus the Jaccard coefficient, the percentage of nonzero coordinates that differ.

  • 'mahalanobis' — Mahalanobis distance, computed using a positive definite covariance matrix C. The default value of C is the sample covariance matrix of X, as computed by nancov(X). To specify a different value for C, use the 'Cov' name-value pair.

  • 'minkowski' — Minkowski distance. The default exponent is 2. To specify a different exponent, use the 'P' name-value pair.

  • 'seuclidean' — Standardized Euclidean distance. Each coordinate difference between X and a query point is scaled, meaning divided by a scale value S. The default value of S is the standard deviation computed from X, S = nanstd(X). To specify another value for S, use the 'Scale' name-value pair.

  • 'spearman' — One minus the sample Spearman's rank correlation between observations (treated as sequences of values).

  • @distfun — Distance function handle.

A distance function distfun has the form

function D2 = distfun(ZI,ZJ)
% calculation of distance
...

where

  • ZI is a 1-by-N vector containing one row of X or Y.

  • ZJ is an M2-by-N matrix containing multiple rows of X or Y.

  • D2 is an M2-by-1 vector of distances, and D2(k) is the distance between observations ZI and ZJ(k,:).

Change Distance using dot notation: mdl.Distance = newDistance.

If NSMethod is kdtree, you can use dot notation to change Distance only among the types 'cityblock', 'chebychev', 'euclidean', or 'minkowski'.
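To make the function-handle form concrete, here is a minimal sketch of a custom metric passed to fitcknn. The weighted city block metric, the weight vector w, and the name weightedCityblock are hypothetical and exist only for illustration. NSMethod is set to 'exhaustive' explicitly because the kdtree searcher accepts only the four named metrics listed above.

```matlab
load fisheriris                               % sample data shipped with the toolbox

% Hypothetical metric: city block distance with per-feature weights.
% ZI is 1-by-N, ZJ is M2-by-N, and the result is M2-by-1.
w = [1 1 2 2];                                % illustrative weights, one per column of meas
weightedCityblock = @(ZI, ZJ) abs(bsxfun(@minus, ZJ, ZI)) * w';

mdl = fitcknn(meas, species, 'NumNeighbors', 5, ...
    'NSMethod', 'exhaustive', ...             % function handles need exhaustive search
    'Distance', weightedCityblock);

labels = predict(mdl, meas([1 51 101],:));    % classify one row from each species
```

The matrix product with w' collapses the N weighted coordinate differences in each row of ZJ into a single distance, which yields the required M2-by-1 shape.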

DistanceWeight

String or function handle specifying the distance weighting function.

  • 'equal' — No weighting.

  • 'inverse' — Weight is 1/distance.

  • 'inversesquared' — Weight is 1/distance^2.

  • @fcn — fcn is a function that accepts a matrix of nonnegative distances, and returns a matrix the same size containing nonnegative distance weights. For example, 'inversesquared' is equivalent to @(d)d.^(-2).

Change DistanceWeight using dot notation: mdl.DistanceWeight = newDistanceWeight.
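A weighting function handle might look like the following sketch. The Gaussian-style decay and the name gaussWeight are assumptions made for illustration; any function that maps a matrix of nonnegative distances to a same-size matrix of nonnegative weights should fit the description above.

```matlab
load fisheriris                               % sample data shipped with the toolbox

% Hypothetical weighting: closer neighbors count more, decaying smoothly with distance.
gaussWeight = @(d) exp(-d.^2);

mdl = fitcknn(meas, species, 'NumNeighbors', 5, ...
    'DistanceWeight', gaussWeight);

labels = predict(mdl, meas([1 51 101],:));    % classify one row from each species
```

Unlike an inverse-distance rule, this particular function stays finite when a query point coincides with a training point (distance 0 gives weight 1).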

DistParameter

Additional parameter for the distance metric.

  • 'mahalanobis' — Positive definite covariance matrix C.

  • 'minkowski' — Minkowski distance exponent, a positive scalar.

  • 'seuclidean' — Vector of positive scale values with length equal to the number of columns of X.

For values of the distance metric other than those in the table, DistParameter must be []. Change DistParameter using dot notation: mdl.DistParameter = newDistParameter.
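For instance, the Minkowski exponent could be changed after training roughly as follows. This is a sketch on the Fisher iris data, and the exponent value 3 is arbitrary.

```matlab
load fisheriris                               % sample data shipped with the toolbox
mdl = fitcknn(meas, species, 'NumNeighbors', 5, 'Distance', 'minkowski');

% For 'minkowski', DistParameter holds the exponent.
mdl.DistParameter = 3;                        % arbitrary illustrative exponent

labels = predict(mdl, meas([1 51 101],:));    % predictions now use the new exponent
```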

IncludeTies

Logical value indicating whether predict includes all the neighbors whose distance values are equal to the Kth smallest distance. If IncludeTies is true, predict includes all these neighbors. Otherwise, predict uses exactly K neighbors (see 'BreakTies').

Change IncludeTies using dot notation: mdl.IncludeTies = newIncludeTies.
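The interaction between IncludeTies and BreakTies can be explored with a sketch like the one below. It uses an even number of neighbors (an arbitrary choice) so that tied votes are possible, and compares resubstitution-style predictions under the two settings. How many labels change, if any, depends on the data, so no particular count is claimed here.

```matlab
load fisheriris                               % sample data shipped with the toolbox
mdl = fitcknn(meas, species, 'NumNeighbors', 4);

% Exactly 4 neighbors per query; tied classes are resolved by the closest neighbor.
mdl.BreakTies = 'nearest';
labelsExactK = predict(mdl, meas);

% Also count every neighbor whose distance equals the 4th smallest distance.
mdl.IncludeTies = true;
labelsWithTies = predict(mdl, meas);

numChanged = sum(~strcmp(labelsExactK, labelsWithTies));
```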

ModelParameters

Parameters used in training mdl.

NumObservations

Number of observations used in training mdl. This can be less than the number of rows in the training data, because data rows containing NaN values are not part of the fit.

NumNeighbors

Positive integer specifying the number of nearest neighbors in X to find for classifying each point when predicting. Change NumNeighbors using dot notation: mdl.NumNeighbors = newNumNeighbors.

PredictorNames

Cell array of names for the predictor variables, in the order in which they appear in the training data X. Change PredictorNames using dot notation: mdl.PredictorNames = newPredictorNames.


Prior

Prior probabilities for each class. Prior is a numeric vector whose entries relate to the corresponding ClassNames property.

Add or change a Prior vector using dot notation: mdl.Prior = priorVector.

ResponseName

String describing the response variable Y. Change ResponseName using dot notation: mdl.ResponseName = newResponseName.

W

Numeric vector of nonnegative weights with the same number of rows as Y. Each entry in W specifies the relative importance of the corresponding observation in Y. Change W using dot notation: mdl.W = newW.

X

Numeric matrix of predictor values. Each column of X represents one predictor (variable), and each row represents one observation.

Y

A numeric vector, vector of categorical variables, logical vector, character array, or cell array of strings, with the same number of rows as X.

Y is of the same type as the passed-in Y data.

Methods

  • crossval — Cross-validated k-nearest neighbor classifier

  • edge — Edge of k-nearest neighbor classifier

  • loss — Loss of k-nearest neighbor classifier

  • margin — Margin of k-nearest neighbor classifier

  • predict — Predict k-nearest neighbor classification

  • resubEdge — Edge of k-nearest neighbor classifier by resubstitution

  • resubLoss — Loss of k-nearest neighbor classifier by resubstitution

  • resubMargin — Margin of k-nearest neighbor classifier by resubstitution

  • resubPredict — Predict resubstitution response of k-nearest neighbor classifier
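A typical way to combine several of these methods is sketched below: estimate the error on the training data with resubLoss, then compare it with a cross-validated estimate. Note that kfoldLoss is not listed above; it is a method of the partitioned model that crossval returns. The random seed is arbitrary and only makes the fold assignment repeatable.

```matlab
load fisheriris                               % sample data shipped with the toolbox
rng(1);                                       % arbitrary seed for a repeatable partition

mdl = fitcknn(meas, species, 'NumNeighbors', 5);

resubErr = resubLoss(mdl);                    % misclassification rate on the training data
cvmdl = crossval(mdl);                        % partitioned model, 10 folds by default
cvErr = kfoldLoss(cvmdl);                     % average out-of-fold misclassification rate
```

The resubstitution error tends to be optimistic because each training point can count among its own neighbors, so the cross-validated figure is usually the more useful estimate.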

Definitions

Prediction

ClassificationKNN predicts the classification of a point Xnew using a procedure equivalent to this:

  1. Find the NumNeighbors points in the training set X that are nearest to Xnew.

  2. Find the NumNeighbors response values Y corresponding to those nearest points.

  3. Assign the classification label Ynew that has smallest expected misclassification cost among the values in Y.

For details, see Posterior Probability and Expected Cost in the predict documentation.
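The three steps can be sketched by hand with knnsearch. This is an illustration, not the internal implementation: it assumes equal misclassification costs, equal observation weights, and 'equal' distance weighting, in which case the smallest expected cost reduces to a majority vote among the neighbors. The query points in Xnew are hypothetical.

```matlab
load fisheriris                               % sample data shipped with the toolbox
X = meas;
Y = species;
k = 5;
Xnew = [5.0 3.4 1.5 0.2;                      % hypothetical query points,
        6.0 2.9 4.5 1.5];                     % one per row

% Step 1: indices of the k training rows nearest to each query row.
idx = knnsearch(X, Xnew, 'K', k);

% Step 2: response values of those neighbors, encoded as class indices.
[classIdx, classNames] = grp2idx(Y);
neighborClasses = reshape(classIdx(idx), size(idx));   % one row per query point

% Step 3: majority vote per row. mode returns the smallest index on a tie.
Ynew = classNames(mode(neighborClasses, 2));
```

The reshape call keeps one row per query point even when Xnew has a single row, where plain indexing would otherwise return a column.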

Copy Semantics

Value. To learn how value classes affect copy operations, see Copying Objects in the MATLAB® documentation.

Examples

Construct a KNN Classifier

Construct a k-nearest neighbor classifier for the Fisher iris data, where k = 5.

Load the data.

load fisheriris
X = meas;
Y = species;

Construct a classifier for 5-nearest neighbors.

mdl = fitcknn(X,Y,'NumNeighbors',5)
mdl = 

ClassificationKNN:
    PredictorNames: {'x1'  'x2'  'x3'  'x4'}
      ResponseName: 'Y'
        ClassNames: {'setosa'  'versicolor'  'virginica'}
    ScoreTransform: 'none'
   NumObservations: 150
          Distance: 'euclidean'
      NumNeighbors: 5

Alternatives

knnsearch finds the k-nearest neighbors of points. rangesearch finds all the points within a fixed distance. You can use these functions for classification, as shown in Classifying Query Data Using knnsearch. If you want to perform classification, ClassificationKNN can be more convenient, in that you can construct a classifier in one step and classify in other steps. Also, ClassificationKNN has cross-validation options.
