Code covered by the BSD License  

Highlights from
Efficient K-Nearest Neighbor Search using JIT

4.42857
4.4 | 7 ratings Rate this file 78 Downloads (last 30 days) File Size: 2.48 KB File ID: #19345
image thumbnail

Efficient K-Nearest Neighbor Search using JIT

by

Yi Cao (view profile)

 

27 Mar 2008 (Updated )

A small but efficient tool to perform KNN search

| Watch this File

File Information
Description

This is a small but efficient tool to perform K-nearest neighbor search, which has wide Science and Engineering applications, such as pattern recognition, data mining and signal processing.

The code was initially implemented through vectorization. After discussions with John D'Errico, I realized that my algorithm will suffer numerical accurancy problem for data with large values. Then, after trying several approaches, I found simple loops with JIT acceleration is the most efficient solution. Now, the performance of the code is comparable with kd-tree even the latter is coded in a mex file.

The code is very simple, hence is also suitable for beginner to learn knn search.

Acknowledgements

Distance.M, Kd Tree Nearest Neighbor And Range Search, Nearestpoint (Sep 2012), Nearestneighbour.M, and Ipdm: Inter Point Distance Matrix inspired this file.

This file inspired Compute Surface Variation and Near2.

MATLAB release MATLAB 7.5 (R2007b)
Tags for This File   Please login to tag files.
Please login to add a comment or rating.
Comments and Ratings (23)
11 Mar 2015 Greg

Greg (view profile)

Insanely fast. Thank you!

29 Aug 2012 Orly amsalem  
25 Oct 2011 ahmed

ahmed (view profile)

i have matrix (M,N), with M=8 vectors and each vectors contain N=100 samples.

i need your help, if i can apply KNN in the case of matrix, contain M vectors? thanks in advance

Comment only
21 Oct 2011 Venkat R

Dear Cao,
Your code is amazingly fast.
Can you help change the metric from euclidean distance to chi-square.

with regards,
Ramana

Comment only
02 Mar 2011 Daniel Lagos

Hi Yi Cao,
Your code is amazing, congratulations!.
I´m simulating dipolar forces in a crystal structure. How Could I implement periodic boundary conditions (PBC) in your code?.

Comment only
13 Jul 2010 joshua

joshua (view profile)

this seems great. do you have a wrapper script that implements the knn classification algorithm using the knnsearch file?

Comment only
22 Apr 2010 Yi Cao

Yi Cao (view profile)

Hi Bree,

Thanks for pointing out this to me. This is because for the former, the function did not check whether the first and second input arguments are equal, hence they are treated as different. This causes some difference in results from the later. I have updated the code to do the check. You should be able to download the new version within a day or so.

Yi

Comment only
22 Apr 2010 Bree

Bree (view profile)

Hi Yi Cao,
simple question,
why when I used
idx = knnsearch(A,A,2)
will return differently with
idx = knnsearch(A,[],2)?

hope you don't mind to answer it.

best,
-Bree

Comment only
08 Apr 2010 aarif

aarif (view profile)

Easy to follow but much slower compared to several mex kdtree implementations on the FEX for large data.

30 Mar 2010 Yi Cao

Yi Cao (view profile)

You have to copy and past the example into the command window for test. Or, highlight lines of the example then press F9.

HTH
Yi

Comment only
29 Mar 2010 Larry

Larry (view profile)

Hi Yi,

My version supports block comments. From my understanding of the code, I should be able to take one example the examples out of block comments to test run the code, am I correct? I have tried using block comments on all the examples, but I get this error when I do this:

??? Error using ==> knnsearch>parseinputs at 140
Not enough input arguments.

Error in ==> knnsearch at 97
[Q,R,K,fident] = parseinputs(varargin{:});

Im just not understanding, thanks for your help.

Comment only
25 Mar 2010 Yi Cao

Yi Cao (view profile)

Larry,

Line 32 is within a block comment for an example. The error means your Matlab version is too old to support block comments. To correct this, you have to change the block comment, i.e. lines between "%{" and "%}" to comment lines by putting "%" at beginning of these line.

Yi

Comment only
24 Mar 2010 Larry

Larry (view profile)

I have been trying to run this fille using Matlab 7.4.0 Student Version 2007a. However I keep getting the following error message:

??? Maximum recursion limit of 500 reached. Use set(0,'RecursionLimit',N)
to change the limit. Be aware that exceeding your available stack space can
crash MATLAB and/or your computer.

Error in ==> knnsearch at 32
idx=knnsearch(Q,R);

Can someone help me?

Comment only
01 Mar 2010 lucy

lucy (view profile)

It is fast! Thank you very much!

04 Dec 2009 Chuan-Yung

Oh bingo! I forgot that I'm still using Matlab 7.0 (which is too old, I guess).

Thank you so much for all the patient and kind replies!

Sincerely,
CY

Comment only
04 Dec 2009 Yi Cao

Yi Cao (view profile)

Strange. This could be the Matlab versio. On my machine with R2009B, I got almost the same time for both L1 and L2 using your example. I have tested this on two differet machines, one with 2 cores another with 4.

Yi

Comment only
04 Dec 2009 Chuan-Yung

Here is a test close to my dataset:

Q = rand(1,4000); R = rand(4000,4000);
Original L2 knnsearch(Q,R,10); = 0.09375 seconds
"abs" L1 knnsearch(Q,R,10); = 0.65625 seconds
".*sign" L1 knnsearch(Q,R,10); = 0.21875 seconds

They're all very fast, but since I have to repeat searching sequentially incoming Q for thousands of times, the speed differences become pretty obvious.

CY

Comment only
03 Dec 2009 Yi Cao

Yi Cao (view profile)

It could be because your data. I tested abs with random data (example 5), but could find any significant difference in cpu time comparing with L2 distance.

Yi

Comment only
03 Dec 2009 Chuan-Yung

Dear Dr. Cao,

Yes, I've also tried d= d+abs((R(:,t)-Q(k,t)), which was sadly even slower than the ".*sign" style. Maybe JIT doesn't really support "abs" function in this case?

Thank you in advance for any further suggestion.

CY

Comment only
03 Dec 2009 Yi Cao

Yi Cao (view profile)

Chuan-Yung,

You may try to use abs instead of sign function.

Good luck.
Yi

Comment only
03 Dec 2009 Chuan-Yung

This code runs really fast, especially for my over 4000d dataset! However, if I changed the original L2 distance into L1 distance, e.g. rewriting
d= d+((R(:,t)-Q(k,t)).^2; into
d=d+((R(:,t)-Q(k,t)).*sign(R(:,t)-Q(k,t)));
the overall speed became much slower. Any suggestion for a better L1 distance implementation? Thanks a lot!

05 Apr 2009 V. Poor  
20 Dec 2008 Luigi Giaccari

I think someone, maybe not the author, is faking the download number.
My opinion is that this is a very usefull routine, very well commented and totally m-coded.
The performances of course can not be the ones declared in the presentation. The algorithm it is just a brute search can only be competitive against kd-tree data structure for a small number of search and reference points.
For KNN search is used the sort function to find K-neighbours, and this is a very brute algorithm. I Know this the fastest m-code way to find KNN, but it is clear
that kd-trees will overcome this algorithm for very small models.

Updates
22 Apr 2010

a bug fixed

Contact us