knnsearch

Class: ExhaustiveSearcher

Find k-nearest neighbors using ExhaustiveSearcher object

Syntax

IDX = knnsearch(NS,Y)
[IDX,D] = knnsearch(NS,Y)
[IDX,D] = knnsearch(NS,Y,'Name',Value)

Description

IDX = knnsearch(NS,Y) finds the nearest neighbor (closest point) in NS.X for each point in Y. Rows of Y correspond to observations and columns correspond to features. Y must have the same number of columns as NS.X. IDX is a column vector with ny rows, where ny is the number of rows in Y. Each row in IDX contains the index of the observation in NS.X that has the smallest distance to the corresponding observation in Y.

[IDX,D] = knnsearch(NS,Y) returns a column vector D containing the distances between each observation in Y and the corresponding closest observation in NS.X. That is, D(i) is the distance between NS.X(IDX(i),:) and Y(i,:).

[IDX,D] = knnsearch(NS,Y,'Name',Value) accepts one or more comma-separated argument name/value pairs. Specify Name inside single quotes.
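As a minimal sketch of the first two syntaxes, the following uses a small hand-made data set (the values of X and Y are illustrative assumptions, not taken from this page) and recomputes each returned distance by hand to check the relationship between IDX and D:

```matlab
% Small hand-made data set (illustrative values only).
X = [0 0; 1 0; 0 2; 5 5];
NS = ExhaustiveSearcher(X);      % default metric is 'euclidean'

% Two query points; Y has the same number of columns as NS.X.
Y = [0.9 0.1; 4 4];
[IDX,D] = knnsearch(NS,Y);       % IDX and D are 2-by-1 column vectors

% Recompute each distance by hand to confirm that D(i) is the
% distance between NS.X(IDX(i),:) and Y(i,:).
Dcheck = zeros(size(D));
for i = 1:size(Y,1)
    Dcheck(i) = norm(NS.X(IDX(i),:) - Y(i,:));
end
disp([IDX D Dcheck])
```

For this data, the closest rows of X are row 2 for the first query point and row 4 for the second.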

Input Arguments

Name-Value Pair Arguments

'K'

A positive integer, k, specifying the number of nearest neighbors in NS.X for each point in Y. Default is 1. IDX and D are ny-by-k matrices. D sorts the distances in each row in ascending order. Each row in IDX contains the indices of the k closest neighbors in NS.X corresponding to the k smallest distances in D.

'IncludeTies'

A logical value indicating whether knnsearch includes all the neighbors whose distance values are equal to the Kth smallest distance. If IncludeTies is true, knnsearch includes all these neighbors. In this case, IDX and D are ny-by-1 cell arrays. Each row in IDX and D contains a vector with at least K numeric values. D sorts the distances in each vector in ascending order. Each row in IDX contains the indices of the closest neighbors corresponding to these smallest distances in D.

Default: false
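The difference in output shape between 'K' alone and 'K' with 'IncludeTies' can be sketched as follows. The data is an assumption chosen so that three points are exactly equidistant from the query point:

```matlab
% Three points at distance 1 from the origin and one at distance 3
% (illustrative values only).
X = [1 0; -1 0; 0 1; 3 0];
NS = ExhaustiveSearcher(X);
Y = [0 0];

% Without ties: IDX and D are ny-by-K numeric matrices (here 1-by-2).
[idx,d] = knnsearch(NS,Y,'K',2);

% With ties: IDX and D are ny-by-1 cell arrays. The single cell holds
% all three neighbors, because the third one ties with the 2nd
% smallest distance.
[idxT,dT] = knnsearch(NS,Y,'K',2,'IncludeTies',true);

disp(size(idx))
disp(idxT{1})
disp(dT{1})
```

Which two of the three tied points are returned in the non-tie case is not specified on this page, so the sketch inspects only the size of idx.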

'Distance'

  • 'euclidean' — Euclidean distance (default).

  • 'seuclidean' — Standardized Euclidean distance. Each coordinate difference between X and each query point is scaled by dividing by a scale value S. The default value of S is NS.DistParameter if NS.Distance is 'seuclidean', otherwise the default is the standard deviation computed from X, S=nanstd(X). To specify another value for S, use the 'Scale' argument.

  • 'cityblock' — City block distance.

  • 'chebychev' — Chebychev distance (maximum coordinate difference).

  • 'minkowski' — Minkowski distance.

  • 'mahalanobis' — Mahalanobis distance, which is computed using a positive definite covariance matrix C. The default value of C is nancov(X). To change the value of C, use the Cov parameter.

  • 'cosine' — One minus the cosine of the included angle between observations (treated as vectors).

  • 'correlation' — One minus the sample linear correlation between observations (treated as sequences of values).

  • 'spearman' — One minus the sample Spearman's rank correlation between observations (treated as sequences of values).

  • 'hamming' — Hamming distance, which is the percentage of coordinates that differ.

  • 'jaccard' — One minus the Jaccard coefficient, which is the percentage of nonzero coordinates that differ.

  • custom distance function — A distance function specified using @ (for example, @distfun). A distance function must be of the form function D2 = distfun(ZI, ZJ), taking as arguments a 1-by-n vector ZI containing a single row of X or of the query points Y, and an m2-by-n matrix ZJ containing multiple rows of X or Y. It must return an m2-by-1 vector of distances D2, whose jth element is the distance between the observations ZI and ZJ(j,:).

Default is NS.Distance. For more information on these distance metrics, see Distance Metrics.
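To illustrate the custom distance function form described above, the following sketch defines a hypothetical function handle, distfun, that reimplements city block distance, and compares its results with the built-in 'cityblock' metric. The data values and the choice of metric are assumptions made for illustration:

```matlab
% Illustrative data with no tied distances.
X = [0 0; 1 3; 4 1; 2 2.5];
NS = ExhaustiveSearcher(X);
Y = [1 1.2];

% Hypothetical custom metric: ZI is 1-by-n, ZJ is m2-by-n, and the
% result is an m2-by-1 vector of city block distances.
distfun = @(ZI,ZJ) sum(abs(bsxfun(@minus,ZJ,ZI)),2);

[idxCustom,dCustom]   = knnsearch(NS,Y,'K',2,'Distance',distfun);
[idxBuiltin,dBuiltin] = knnsearch(NS,Y,'K',2,'Distance','cityblock');

% Both searches should select the same neighbors at the same distances.
disp([idxCustom; idxBuiltin])
disp([dCustom; dBuiltin])
```

The handle uses bsxfun so that the subtraction of the single row ZI from every row of ZJ also works in releases without implicit expansion.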

'P'

A positive scalar, p, indicating the exponent of the Minkowski distance. This parameter is only valid if knnsearch uses the 'minkowski' distance metric. Default is NS.DistParameter if NS.Distance is 'minkowski' and 2 otherwise.
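As a rough check of how 'P' behaves, the sketch below relies on the standard fact that Minkowski distance with exponent 1 is city block distance and with exponent 2 is Euclidean distance. The data values are illustrative assumptions:

```matlab
% Illustrative data with no tied distances under either metric.
X = [0 0; 3 2; 1 3.5; 4 4];
NS = ExhaustiveSearcher(X);      % NS.Distance is 'euclidean'
Y = [1 1.2];

% Minkowski with P = 1 should match 'cityblock'.
[i1,d1] = knnsearch(NS,Y,'K',3,'Distance','minkowski','P',1);
[ic,dc] = knnsearch(NS,Y,'K',3,'Distance','cityblock');

% Minkowski with P = 2 should match 'euclidean'.
[i2,d2] = knnsearch(NS,Y,'K',3,'Distance','minkowski','P',2);
[ie,de] = knnsearch(NS,Y,'K',3,'Distance','euclidean');

disp([d1; dc])
disp([d2; de])
```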

'Cov'

A positive definite matrix indicating the covariance matrix when computing the Mahalanobis distance. This parameter is only valid when knnsearch uses the 'mahalanobis' distance metric. Default is NS.DistParameter if NS.Distance is 'mahalanobis', or nancov(X) otherwise.

'Scale'

A vector S with length equal to the number of columns in X. Each coordinate difference between X and each query point is divided by the corresponding element of S when computing the standardized Euclidean distance. This parameter is only valid when Distance is 'seuclidean'. Default is NS.DistParameter if NS.Distance is 'seuclidean', or nanstd(X) otherwise.
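The effect of 'Scale' and 'Cov' can be sketched by comparing against plain Euclidean searches. The example assumes that standardized Euclidean distance with scale S equals Euclidean distance on data divided column-wise by S, and that Mahalanobis distance with an identity covariance matrix equals Euclidean distance. The data, S, and variable names are illustrative:

```matlab
% Illustrative data with no tied distances.
X = [0 0; 3 2; 1 3.5; 4 4];
Y = [1 1.2];
S = [1 10];                      % assumed scale: de-emphasize column 2
NS = ExhaustiveSearcher(X);

% Standardized Euclidean search with an explicit scale vector.
[iS,dS] = knnsearch(NS,Y,'K',2,'Distance','seuclidean','Scale',S);

% Same search done by rescaling the data and using plain Euclidean.
NSscaled = ExhaustiveSearcher(bsxfun(@rdivide,X,S));
[iR,dR] = knnsearch(NSscaled,bsxfun(@rdivide,Y,S),'K',2);

% Mahalanobis with identity covariance versus plain Euclidean.
[iM,dM] = knnsearch(NS,Y,'K',2,'Distance','mahalanobis','Cov',eye(2));
[iE,dE] = knnsearch(NS,Y,'K',2);

disp([dS; dR])
disp([dM; dE])
```

With S = [1 10], differences in the second column count for much less, so the nearest neighbor changes from row 1 of X (plain Euclidean) to row 3.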

Examples

Create an ExhaustiveSearcher object specifying 'cosine' as the distance metric. Perform a k-nearest neighbors search on the object, first using the object's cosine metric and then using the Mahalanobis metric, and compare the results:

load fisheriris
x = meas(:,3:4);
exhaustiveobj = ExhaustiveSearcher(x,'Distance','cosine')

exhaustiveobj = 

  ExhaustiveSearcher

  Properties:
                X: [150x2 double]
         Distance: 'cosine'
    DistParameter: []

% Perform a knnsearch between x and a query point, using 
% first cosine then mahalanobis distance metrics:
newpoint = [5 1.45];
[n,d]=knnsearch(exhaustiveobj,newpoint,'k',10);
[nmah,dmah] = knnsearch(exhaustiveobj,newpoint,'k',10,...
   'distance','mahalanobis');

% Visualize the results of the two different nearest 
% neighbors searches:

% First plot the training data:
gscatter(x(:,1),x(:,2),species)
% Plot an X for the query point:
line(newpoint(1),newpoint(2),'marker','x','color','k',...
   'markersize',10,'linewidth',2,'linestyle','none')
% Use circles to denote the cosine nearest neighbors:
line(x(n,1),x(n,2),'color',[.5 .5 .5],'marker','o',...
   'linestyle','none','markersize',10)
% Use pentagrams to denote the mahalanobis nearest neighbors:
line(x(nmah,1),x(nmah,2),'color',[.5 .5 .5],'marker','p',...
   'linestyle','none','markersize',10)
legend('setosa','versicolor','virginica','query point',...
   'cosine','mahalanobis')
set(legend,'location','best')

Algorithms

For information on a specific search algorithm, see Distance Metrics.
