knnsearch

Class: KDTreeSearcher

Find k-nearest neighbors using KDTreeSearcher object

Syntax

IDX = knnsearch(NS,Y)
[IDX,D] = knnsearch(NS,Y)
[IDX,D] = knnsearch(NS,Y,'Name',Value)

Description

IDX = knnsearch(NS,Y) finds the nearest neighbor (closest point) in NS.X for each point in Y. Rows of Y correspond to observations and columns correspond to features. Y must have the same number of columns as NS.X. IDX is a column vector with ny rows, where ny is the number of rows in Y. Each row in IDX contains the index of observation in NS.X which has the smallest distance to the corresponding observation in Y.

[IDX,D] = knnsearch(NS,Y) returns a column vector D containing the distances between each observation in Y and the corresponding closest observation in NS.X. That is, D(i) is the distance between NS.X(IDX(i),:) and Y(i,:).

[IDX,D] = knnsearch(NS,Y,'Name',Value) accepts one or more comma-separated name-value pair arguments. Specify Name inside single quotes.

Input Arguments

NS

KDTreeSearcher object, constructed using KDTreeSearcher or createns.

Y

my-by-n numeric matrix, where each row represents one n-dimensional point. The number of columns n must equal the number of columns in NS.X.

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

'K'

A positive integer, k, specifying the number of nearest neighbors in NS.X for each point in Y. Default is 1. IDX and D are ny-by-k matrices. D sorts the distances in each row in ascending order. Each row in IDX contains the indices of the k closest neighbors in NS.X corresponding to the k smallest distances in D.

'Distance'

Select one of the following distance algorithms.

  • 'euclidean' — Euclidean distance (default).

  • 'cityblock' — City block distance.

  • 'chebychev' — Chebychev distance (maximum coordinate difference).

  • 'minkowski' — Minkowski distance.

Default is NS.Distance. For details on these distance metrics, see Distance Metrics.

'IncludeTies'

A logical value indicating whether knnsearch includes all the neighbors whose distance values are equal to the Kth smallest distance. If IncludeTies is true, knnsearch includes all these neighbors. In this case, IDX and D are ny-by-1 cell arrays. Each row in IDX and D contains a vector with at least K numeric numbers. D sorts the distances in each vector in ascending order. Each row in IDX contains the indices of the closest neighbors corresponding to these smallest distances in D.

Default: false

'P'

A positive scalar, p, indicating the exponent of the Minkowski distance. This parameter is only valid when the Distance is 'minkowski'. Default is NS.DistParameter if NS.Distance is 'minkowski' and 2 otherwise.

Examples

expand all

Search for k-Nearest Neighbors

Find k-nearest neighbors of query data given training data using knnserach on a KDTreeSearcher model.

Load Fisher's iris data. Focus on the petal dimensions.

load fisheriris
X = meas(:,3:4); % Predictors
Y = species;     % Response

Train a k-nearest neighbors searcher using the predictors. Specify to use the Minkowski distance with exponent 5.

KDTreeMdl = KDTreeSearcher(X,'Distance','minkowski','P',5)
KDTreeMdl = 

  KDTreeSearcher with properties:

       BucketSize: 50
         Distance: 'minkowski'
    DistParameter: 5
                X: [150x2 double]

KDTreeMdl is a KDTreeSearcher model. You can access its properties using dot notation.

Find the 10 nearest neighbors from X to a query point (newpoint), using first Minkowski then Chebychev distance metrics. The query point must have the same column dimension as the data used to train the model.

newpoint = [5 1.45];
[IdxMk,DMk] = knnsearch(KDTreeMdl,newpoint,'k',10);
[IdxCb,DCb] = knnsearch(KDTreeMdl,newpoint,'k',10,...
   'Distance','chebychev');

IdxMk and IdxCb are 1-by-10 matrices containing the row indices of X corresponding to the nearest neighbors to newpoint using Minkowski and then Chebychev distances, respectively. Element (1,1) is the nearest, element (1,2) is the next nearest, and so on.

Plot the training data.

figure;
gscatter(X(:,1),X(:,2),Y);
title('Fisher''s Iris Data -- Nearest Neighbors');
xlabel('Petal length (cm)');
ylabel('Petal width (cm)');
hold on

Zoom in on the points of interest.

h = gca; % Get current axis handle.
h.XLim = [4.5 5.5];
h.YLim = [1 2];
axis square;

Plot a cross for the query point.

plot(newpoint(1),newpoint(2),'kx','MarkerSize',10,...
    'LineWidth',2);

Plot circles to identify the Minkowski nearest neighbors.

plot(X(IdxMk,1),X(IdxMk,2),'o','Color',[.5 .5 .5],'MarkerSize',10);

Plot pentagrams to identify the Chebychev nearest neighbors.

plot(X(IdxCb,1),X(IdxCb,2),'p','Color',[.5 .5 .5],'MarkerSize',10);
legend('setosa','versicolor','virginica','query point',...
   'minkowski','chebychev','Location','Best');
hold off;

Several observations are equal, which is why only eight nearest neighbors are identified in the plot.

Algorithms

For information on a specific search algorithm, see Distance Metrics.

References

[1] Friedman, J. H., Bentely, J., and Finkel, R. A. (1977). An Algorithm for Finding Best Matches in Logarithmic Expected Time, ACM Transactions on Mathematical Software 3, 209.

Was this topic helpful?