File Exchange


Efficient K-Nearest Neighbor Search using JIT

version 1.3 (2.48 KB) by Yi Cao

A small but efficient tool to perform KNN search

4.1 (10 Ratings)

47 Downloads


Editor's Note: Popular File 2008

This is a small but efficient tool to perform K-nearest neighbor search, which has wide applications in science and engineering, such as pattern recognition, data mining and signal processing.

The code was initially implemented through vectorization. After discussions with John D'Errico, I realized that my algorithm would suffer numerical accuracy problems for data with large values. Then, after trying several approaches, I found that simple loops with JIT acceleration are the most efficient solution. Now, the performance of the code is comparable with a kd-tree, even though the latter is coded in a MEX file.

The code is very simple, and hence also suitable for beginners learning KNN search.
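The loop-based, JIT-friendly approach described above can be sketched roughly as follows. This is a minimal illustration, not the submission's actual code; the function name and argument order are assumptions based on the examples in the comments below.

```matlab
% Minimal sketch of loop-based KNN search that MATLAB's JIT accelerates well.
% Q: M-by-D query points, R: N-by-D reference points, K: neighbours to return.
function idx = knnsearch_sketch(Q, R, K)
[M, D] = size(Q);
N = size(R, 1);
idx = zeros(M, K);
for k = 1:M
    d = zeros(N, 1);
    for t = 1:D
        % Accumulate squared differences one dimension at a time; plain
        % loops avoid the large intermediate arrays (and the associated
        % numerical issues) of a fully vectorized implementation.
        d = d + (R(:, t) - Q(k, t)).^2;
    end
    % Sort distances and keep the indices of the K nearest reference points.
    [dummy, sorted] = sort(d); %#ok<ASGLU>
    idx(k, :) = sorted(1:K);
end
end
```

The sort-based selection matches the brute-force strategy discussed in the comments; a partial selection would be cheaper for K much smaller than N.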

Comments and Ratings (26)

I tried it but did not find any improvement over MATLAB's built-in knnsearch (R2016b, Windows 10) :(

Rafael


Alex Hoang

Thank you!

Greg


Insanely fast. Thank you!

Orly amsalem

ahmed


I have a matrix (M,N), with M=8 vectors, each containing N=100 samples.

I need your help: can I apply KNN in the case of a matrix containing M vectors? Thanks in advance.

Venkat R

Dear Cao,
Your code is amazingly fast.
Can you help me change the metric from Euclidean distance to chi-square?

with regards,
Ramana

Daniel Lagos

Hi Yi Cao,
Your code is amazing, congratulations!
I'm simulating dipolar forces in a crystal structure. How could I implement periodic boundary conditions (PBC) in your code?

joshua


This seems great. Do you have a wrapper script that implements the KNN classification algorithm using the knnsearch file?

Yi Cao


Hi Bree,

Thanks for pointing this out to me. This is because, for the former, the function did not check whether the first and second input arguments are equal, so they are treated as different. This causes some differences in the results compared with the latter. I have updated the code to do the check. You should be able to download the new version within a day or so.

Yi
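The difference explained above can be illustrated with a short snippet. This is a hypothetical example following the calling convention from Bree's question; before the fix described here, the first call would include each point as its own nearest neighbour, while the second excludes self-matches.

```matlab
A = rand(10, 3);

% Q and R passed as two separate (but equal) arrays: every point's
% nearest neighbour is itself at distance 0, so self-matches appear.
idx1 = knnsearch(A, A, 2);

% R = [] tells the function that Q is searched against itself, so each
% point can be excluded from its own neighbour list.
idx2 = knnsearch(A, [], 2);
```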

Bree


Hi Yi Cao,
simple question,
why when I used
idx = knnsearch(A,A,2)
will return differently with
idx = knnsearch(A,[],2)?

hope you don't mind to answer it.

best,
-Bree

aarif


Easy to follow, but much slower than several MEX kd-tree implementations on the FEX for large data.

Yi Cao


You have to copy and paste the example into the command window to test it. Alternatively, highlight the lines of the example and then press F9.

HTH
Yi

Larry


Hi Yi,

My version supports block comments. From my understanding of the code, I should be able to take one of the examples out of the block comments to test run the code, am I correct? I have tried using block comments on all the examples, but I get this error when I do so:

??? Error using ==> knnsearch>parseinputs at 140
Not enough input arguments.

Error in ==> knnsearch at 97
[Q,R,K,fident] = parseinputs(varargin{:});

I'm just not understanding, thanks for your help.

Yi Cao


Larry,

Line 32 is within a block comment for an example. The error means your MATLAB version is too old to support block comments. To correct this, you have to change the block comments, i.e. the lines between "%{" and "%}", into comment lines by putting "%" at the beginning of each of these lines.

Yi

Larry


I have been trying to run this file using MATLAB 7.4.0 Student Version (R2007a). However, I keep getting the following error message:

 ??? Maximum recursion limit of 500 reached. Use set(0,'RecursionLimit',N)
to change the limit. Be aware that exceeding your available stack space can
crash MATLAB and/or your computer.

Error in ==> knnsearch at 32
idx=knnsearch(Q,R);

Can someone help me?

lucy


It is fast! Thank you very much!

Chuan-Yung

Oh bingo! I forgot that I'm still using MATLAB 7.0 (which is too old, I guess).

Thank you so much for all the patient and kind replies!

Sincerely,
CY

Yi Cao


Strange. This could be the MATLAB version. On my machine with R2009b, I got almost the same time for both L1 and L2 using your example. I have tested this on two different machines, one with 2 cores and another with 4.

Yi

Chuan-Yung

Here is a test close to my dataset:

Q = rand(1,4000); R = rand(4000,4000);
Original L2 knnsearch(Q,R,10); = 0.09375 seconds
"abs" L1 knnsearch(Q,R,10); = 0.65625 seconds
".*sign" L1 knnsearch(Q,R,10); = 0.21875 seconds

They're all very fast, but since I have to repeat searching sequentially incoming Q for thousands of times, the speed differences become pretty obvious.

CY

Yi Cao


It could be because of your data. I tested abs with random data (example 5) but could not find any significant difference in CPU time compared with the L2 distance.

Yi

Chuan-Yung

Dear Dr. Cao,

Yes, I've also tried d = d + abs(R(:,t)-Q(k,t)), which was sadly even slower than the ".*sign" style. Maybe JIT doesn't really support the "abs" function in this case?

Thank you in advance for any further suggestion.

CY

Yi Cao


Chuan-Yung,

You may try using abs instead of the sign function.

Good luck.
Yi

Chuan-Yung

This code runs really fast, especially for my over-4000-dimensional dataset! However, when I changed the original L2 distance into the L1 distance, e.g. rewriting
d = d + (R(:,t)-Q(k,t)).^2; into
d = d + (R(:,t)-Q(k,t)).*sign(R(:,t)-Q(k,t));
the overall speed became much slower. Any suggestion for a better L1 distance implementation? Thanks a lot!
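For reference, the three per-dimension accumulation variants compared in this thread could look like the following inside the inner loop. This is a sketch reconstructed from the snippets in the comments; variable names (d, R, Q, k, t) follow the code quoted above.

```matlab
% L2 (squared Euclidean) accumulation, as in the original code:
d = d + (R(:,t) - Q(k,t)).^2;

% L1 via multiplication by sign, as in Chuan-Yung's rewrite:
d = d + (R(:,t) - Q(k,t)) .* sign(R(:,t) - Q(k,t));

% L1 via abs, the variant discussed in the replies:
d = d + abs(R(:,t) - Q(k,t));
```

All three produce a valid distance accumulation; the reported speed differences come down to which elementwise operations the JIT of a given MATLAB release accelerates well.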

V. Poor

I think someone, maybe not the author, is faking the download number.
In my opinion, this is a very useful routine, very well commented and entirely M-coded.
The performance, of course, cannot be what is claimed in the description. The algorithm is just a brute-force search and can only be competitive against a kd-tree data structure for a small number of search and reference points.
The KNN search uses the sort function to find the K neighbours, which is a very brute-force approach. I know this is the fastest pure M-code way to find KNN, but it is clear that kd-trees will overtake this algorithm for all but very small models.

Updates

1.3

a bug fixed

MATLAB Release
MATLAB 7.5 (R2007b)
