knnsearch

Class: ExhaustiveSearcher

Find k-nearest neighbors using ExhaustiveSearcher object

Syntax

IDX = knnsearch(NS,Y)
[IDX,D] = knnsearch(NS,Y)
[IDX,D] = knnsearch(NS,Y,'Name',Value)

Description

IDX = knnsearch(NS,Y) finds the nearest neighbor (closest point) in NS.X for each point in Y. Rows of Y correspond to observations and columns correspond to features. Y must have the same number of columns as NS.X. IDX is a column vector with ny rows, where ny is the number of rows in Y. Each row in IDX contains the index of the observation in NS.X that has the smallest distance to the corresponding observation in Y.

[IDX,D] = knnsearch(NS,Y) returns a column vector D containing the distances between each observation in Y and the corresponding closest observation in NS.X. That is, D(i) is the distance between NS.X(IDX(i),:) and Y(i,:).

[IDX,D] = knnsearch(NS,Y,'Name',Value) accepts one or more comma-separated argument name/value pairs. Specify Name inside single quotes.
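
For illustration, a minimal sketch of the basic syntax might look like the following; the searcher and the query points here are hypothetical, built from the bundled Fisher iris data.

load fisheriris
NS = ExhaustiveSearcher(meas(:,3:4));  % train the searcher on the petal measurements
Y = [5 1.45; 6 2];                     % two hypothetical query points (same number of columns as NS.X)
[IDX,D] = knnsearch(NS,Y);             % IDX and D are 2-by-1 column vectors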

Input Arguments

Name-Value Pair Arguments

'K'

A positive integer, k, specifying the number of nearest neighbors in NS.X to find for each point in Y. IDX and D are ny-by-k matrices. The distances in each row of D are sorted in ascending order, and each row in IDX contains the indices of the k closest neighbors in NS.X corresponding to those k smallest distances in D.

Default: 1
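
For example, a hypothetical call requesting three neighbors per query point might look like this (the data and query point are illustrative only):

load fisheriris
NS = ExhaustiveSearcher(meas(:,3:4));
[IDX,D] = knnsearch(NS,[5 1.45],'K',3);  % IDX and D are 1-by-3; D(1) <= D(2) <= D(3)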

'IncludeTies'

A logical value indicating whether knnsearch includes all the neighbors whose distance values are equal to the Kth smallest distance. If IncludeTies is true, knnsearch includes all these neighbors. In this case, IDX and D are ny-by-1 cell arrays. Each row in IDX and D contains a vector with at least K elements. The distances in each vector of D are sorted in ascending order, and each row in IDX contains the indices of the closest neighbors corresponding to those smallest distances in D.

Default: false
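
The following sketch, with hypothetical data, shows the cell array output produced when ties are included:

load fisheriris
NS = ExhaustiveSearcher(meas(:,3:4));
[IDX,D] = knnsearch(NS,[5 1.45],'K',3,'IncludeTies',true);
class(IDX)     % 'cell'
numel(IDX{1})  % at least 3; more if several points tie at the 3rd smallest distance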

'Distance'

A string or function handle specifying the distance metric. The value can be one of the following:

  • 'euclidean' — Euclidean distance (default).

  • 'seuclidean' — Standardized Euclidean distance. Each coordinate difference between X and each query point is scaled by dividing by a scale value S. The default value of S is NS.DistParameter if NS.Distance is 'seuclidean'; otherwise, the default is the standard deviation computed from X, S = nanstd(X). To specify another value for S, use the 'Scale' argument.

  • 'cityblock' — City block distance.

  • 'chebychev' — Chebychev distance (maximum coordinate difference).

  • 'minkowski' — Minkowski distance.

  • 'mahalanobis' — Mahalanobis distance, which is computed using a positive definite covariance matrix C. The default value of C is nancov(X). To change the value of C, use the Cov parameter.

  • 'cosine' — One minus the cosine of the included angle between observations (treated as vectors).

  • 'correlation' — One minus the sample linear correlation between observations (treated as sequences of values).

  • 'spearman' — One minus the sample Spearman's rank correlation between observations (treated as sequences of values).

  • 'hamming' — Hamming distance, which is the percentage of coordinates that differ.

  • 'jaccard' — One minus the Jaccard coefficient, which is the percentage of nonzero coordinates that differ.

  • custom distance function — A distance function specified using @ (for example, @distfun). A distance function must be of the form function D2 = distfun(ZI, ZJ), taking as arguments a 1-by-n vector ZI containing a single row from X or from the query points Y, and an m2-by-n matrix ZJ containing multiple rows of X or Y, and returning an m2-by-1 vector of distances D2, whose jth element is the distance between the observations ZI and ZJ(j,:). A sketch of such a function appears after this argument description.

For more information on these distance metrics, see Distance Metrics.

Default: NS.Distance
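
As a sketch of the custom distance function form described above, the following defines a hypothetical weighted Euclidean distance; the weight vector w is illustrative and not part of the knnsearch interface.

load fisheriris
NS = ExhaustiveSearcher(meas(:,3:4));
w = [2 1];                                              % hypothetical per-column weights
wdist = @(ZI,ZJ) sqrt((bsxfun(@minus,ZJ,ZI).^2)*w(:));  % returns an m2-by-1 vector of distances
[IDX,D] = knnsearch(NS,[5 1.45],'K',5,'Distance',wdist);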

'P'

A positive scalar, p, indicating the exponent of the Minkowski distance. This parameter is only valid if knnsearch uses the 'minkowski' distance metric.

Default: NS.DistParameter if NS.Distance is 'minkowski' and 2 otherwise.
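
For instance, a hypothetical call overriding the Minkowski exponent might look like this:

load fisheriris
NS = ExhaustiveSearcher(meas(:,3:4),'Distance','minkowski');
[IDX,D] = knnsearch(NS,[5 1.45],'K',3,'P',1);  % exponent 1 gives city block distances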

'Cov'

A positive definite matrix indicating the covariance matrix when computing the Mahalanobis distance. This parameter is only valid when knnsearch uses the 'mahalanobis' distance metric.

Default: NS.DistParameter if NS.Distance is 'mahalanobis', or nancov(X) otherwise.
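
As a sketch, a hypothetical call supplying a custom covariance matrix might look like the following (here C is simply the sample covariance of the training data):

load fisheriris
X = meas(:,3:4);
NS = ExhaustiveSearcher(X);
C = cov(X);                                    % illustrative covariance matrix
[IDX,D] = knnsearch(NS,[5 1.45],'K',3,'Distance','mahalanobis','Cov',C);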

'Scale'

A vector S with length equal to the number of columns in X. Each coordinate difference between X and each query point is divided by the corresponding element of S when computing the standardized Euclidean distance. This parameter is only valid when Distance is 'seuclidean'.

Default: nanstd(X)
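
For example, a hypothetical call using an interquartile-range scale instead of the default standard deviation might look like this:

load fisheriris
X = meas(:,3:4);
NS = ExhaustiveSearcher(X,'Distance','seuclidean');
S = iqr(X);                                    % illustrative per-column scale values
[IDX,D] = knnsearch(NS,[5 1.45],'K',3,'Scale',S);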

Examples


Search for k-Nearest Neighbors

Find k-nearest neighbors of query data given training data using knnsearch on an ExhaustiveSearcher model.

Load Fisher's iris data. Focus on the petal dimensions.

load fisheriris
X = meas(:,3:4); % Predictors
Y = species;     % Response

Train a k-nearest neighbors searcher using the predictors. Specify to use the cosine distance.

ESMdl = ExhaustiveSearcher(X,'Distance','cosine')
ESMdl = 

  ExhaustiveSearcher with properties:

         Distance: 'cosine'
    DistParameter: []
                X: [150x2 double]

ESMdl is an ExhaustiveSearcher model. You can access its properties using dot notation.

Find the 10 nearest neighbors in X to a query point (newpoint), first using the cosine distance metric and then the Mahalanobis distance metric. The query point must have the same column dimension as the data used to train the model.

newpoint = [5 1.45];
[IdxCs,DCs] = knnsearch(ESMdl,newpoint,'k',10);
[IdxMs,DMs] = knnsearch(ESMdl,newpoint,'k',10,...
   'Distance','mahalanobis');

IdxCs and IdxMs are 1-by-10 matrices containing the row indices of X corresponding to the nearest neighbors to newpoint using cosine and Mahalanobis distances, respectively. Element (1,1) is the nearest, element (1,2) is the next nearest, and so on.

Plot the training data.

figure;
gscatter(X(:,1),X(:,2),Y);
title('Fisher''s Iris Data -- Nearest Neighbors');
xlabel('Petal length (cm)');
ylabel('Petal width (cm)');
hold on

Plot a cross for the query point.

plot(newpoint(1),newpoint(2),'kx','MarkerSize',10,...
    'LineWidth',2);

Plot circles to identify the cosine nearest neighbors.

plot(X(IdxCs,1),X(IdxCs,2),'o','Color',[.5 .5 .5],'MarkerSize',10);

Plot pentagrams to identify the Mahalanobis nearest neighbors.

plot(X(IdxMs,1),X(IdxMs,2),'p','Color',[.5 .5 .5],'MarkerSize',10);
legend('setosa','versicolor','virginica','query point',...
   'cosine','mahalanobis','Location','Best');
hold off;

Algorithms

For information on a specific search algorithm, see Distance Metrics.
