Code covered by the BSD License  

4.3 | 6 ratings | 95 Downloads (last 30 days) | File Size: 2.48 KB | File ID: #19345

Efficient K-Nearest Neighbor Search using JIT

by Yi Cao

27 Mar 2008 (Updated 22 Apr 2010)

A small but efficient tool to perform KNN search


File Information
Description

This is a small but efficient tool for K-nearest neighbor search, which has wide applications in science and engineering, such as pattern recognition, data mining, and signal processing.

The code was initially implemented through vectorization. After discussions with John D'Errico, I realized that the algorithm would suffer from numerical accuracy problems for data with large values. After trying several approaches, I found that simple loops with JIT acceleration are the most efficient solution. Now the performance of the code is comparable with a kd-tree, even though the latter is coded as a MEX file.

The code is very simple, hence it is also suitable for beginners learning KNN search.
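
A minimal usage sketch, assuming the calling convention seen in the comments below (idx = knnsearch(Q,R,K) with query set Q, reference set R and neighbour count K; the shape of idx is my reading of that convention, not documented here):

  R = rand(100,2);        % reference set, one point per row
  Q = rand(10,2);         % query set
  idx = knnsearch(Q,R,3); % idx(i,:) should hold the row indices in R of the
                          % 3 nearest neighbours of Q(i,:)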

Acknowledgements

Distance.M, Kd Tree Nearest Neighbor And Range Search, Nearestpoint (Sep 2012), Nearestneighbour.M, and Ipdm: Inter Point Distance Matrix inspired this file.

This file inspired Compute Surface Variation and Near2.

MATLAB release MATLAB 7.5 (R2007b)
Comments and Ratings (22)
29 Aug 2012 Orly amsalem  
25 Oct 2011 ahmed

I have a matrix (M,N), with M=8 vectors where each vector contains N=100 samples.

I need your help: can I apply KNN in the case of a matrix containing M vectors? Thanks in advance.

21 Oct 2011 Venkat R

Dear Cao,
Your code is amazingly fast.
Can you help me change the metric from Euclidean distance to chi-square?

with regards,
Ramana
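
A sketch of one possible change, assuming the per-coordinate accumulation line quoted further down this page (d = d + (R(:,t)-Q(k,t)).^2) and the symmetric chi-square form; the eps guard and the non-negative-data assumption are mine, not part of the file:

  % replace the squared-difference accumulation inside the loop over t with
  d = d + (R(:,t)-Q(k,t)).^2 ./ (R(:,t)+Q(k,t)+eps); % chi-square term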

02 Mar 2011 Daniel Lagos

Hi Yi Cao,
Your code is amazing, congratulations!
I'm simulating dipolar forces in a crystal structure. How could I implement periodic boundary conditions (PBC) in your code?
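
For reference, a common way to impose PBC on the distance computation is the minimum-image convention; a minimal sketch assuming a cubic box of side L and the same per-coordinate loop quoted elsewhere on this page (not part of the submitted file):

  dt = R(:,t) - Q(k,t);    % raw coordinate difference
  dt = dt - L*round(dt/L); % minimum-image convention: wrap into [-L/2, L/2]
  d  = d + dt.^2;          % accumulate the squared distance as before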

13 Jul 2010 joshua

This seems great. Do you have a wrapper script that implements the KNN classification algorithm using the knnsearch file?
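
A hypothetical wrapper might look like the sketch below; knn_classify, Rlabels and the majority-vote rule are my assumptions, not part of the submission:

  function labels = knn_classify(Q, R, Rlabels, K)
  % Classify each row of Q by majority vote among the labels of its
  % K nearest rows of R, as returned by knnsearch.
  idx = knnsearch(Q, R, K);               % K nearest reference rows per query
  nl  = reshape(Rlabels(idx), size(idx)); % neighbour labels (guards the 1-query case)
  labels = mode(nl, 2);                   % majority vote along each row
  end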

22 Apr 2010 Yi Cao

Hi Bree,

Thanks for pointing this out to me. This is because, in the former, the function did not check whether the first and second input arguments are equal, so they were treated as different. This causes some differences in results from the latter. I have updated the code to do the check. You should be able to download the new version within a day or so.

Yi

22 Apr 2010 Bree

Hi Yi Cao,
simple question,
why, when I use
idx = knnsearch(A,A,2)
does it return a different result from
idx = knnsearch(A,[],2)?

hope you don't mind answering it.

best,
-Bree

08 Apr 2010 aarif

Easy to follow, but much slower than several MEX kd-tree implementations on the FEX for large data.

30 Mar 2010 Yi Cao

You have to copy and paste the example into the command window to test it. Or highlight the lines of the example and press F9.

HTH
Yi

29 Mar 2010 Larry

Hi Yi,

My version supports block comments. From my understanding of the code, I should be able to take one of the examples out of the block comments to test-run the code, am I correct? I have tried using block comments on all the examples, but I get this error when I do this:

??? Error using ==> knnsearch>parseinputs at 140
Not enough input arguments.

Error in ==> knnsearch at 97
[Q,R,K,fident] = parseinputs(varargin{:});

I'm just not understanding; thanks for your help.

25 Mar 2010 Yi Cao

Larry,

Line 32 is within a block comment for an example. The error means your Matlab version is too old to support block comments. To fix this, you have to convert the block comments, i.e. the lines between "%{" and "%}", into comment lines by putting "%" at the beginning of each line.

Yi
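
For example, a block-commented example such as (the surrounding lines are illustrative; only the knnsearch call is quoted from the error below):

  %{
  Q = rand(10,2); R = rand(100,2);
  idx = knnsearch(Q,R);
  %}

becomes

  % Q = rand(10,2); R = rand(100,2);
  % idx = knnsearch(Q,R);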

24 Mar 2010 Larry

I have been trying to run this file using Matlab 7.4.0 Student Version (R2007a). However, I keep getting the following error message:

??? Maximum recursion limit of 500 reached. Use set(0,'RecursionLimit',N)
to change the limit. Be aware that exceeding your available stack space can
crash MATLAB and/or your computer.

Error in ==> knnsearch at 32
idx=knnsearch(Q,R);

Can someone help me?

01 Mar 2010 lucy

It is fast! Thank you very much!

04 Dec 2009 Chuan-Yung

Oh bingo! I forgot that I'm still using Matlab 7.0 (which is too old, I guess).

Thank you so much for all the patient and kind replies!

Sincerely,
CY

04 Dec 2009 Yi Cao

Strange. This could be the Matlab version. On my machine with R2009b, I got almost the same time for both L1 and L2 using your example. I have tested this on two different machines, one with 2 cores and another with 4.

Yi

04 Dec 2009 Chuan-Yung

Here is a test close to my dataset:

Q = rand(1,4000); R = rand(4000,4000);
Original L2 knnsearch(Q,R,10); = 0.09375 seconds
"abs" L1 knnsearch(Q,R,10); = 0.65625 seconds
".*sign" L1 knnsearch(Q,R,10); = 0.21875 seconds

They're all very fast, but since I have to repeat the search for sequentially incoming Q thousands of times, the speed differences become pretty obvious.

CY

03 Dec 2009 Yi Cao

It could be because of your data. I tested abs with random data (example 5), but could not find any significant difference in CPU time compared with the L2 distance.

Yi

03 Dec 2009 Chuan-Yung

Dear Dr. Cao,

Yes, I've also tried d = d + abs(R(:,t)-Q(k,t));, which was sadly even slower than the ".*sign" style. Maybe JIT doesn't really support the "abs" function in this case?

Thank you in advance for any further suggestion.

CY

03 Dec 2009 Yi Cao

Chuan-Yung,

You may try to use abs instead of the sign function.

Good luck.
Yi

03 Dec 2009 Chuan-Yung

This code runs really fast, especially for my over-4000-dimensional dataset! However, if I change the original L2 distance into the L1 distance, e.g. rewriting
d = d + (R(:,t)-Q(k,t)).^2; into
d = d + (R(:,t)-Q(k,t)).*sign(R(:,t)-Q(k,t));
the overall speed becomes much slower. Any suggestion for a better L1 distance implementation? Thanks a lot!
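
One variant worth trying computes the difference once before taking the absolute value; whether it helps depends on the JIT, and the replies above note that abs alone was slower here, so this is only a sketch:

  dt = R(:,t) - Q(k,t); % compute the coordinate difference once
  d  = d + abs(dt);     % L1 accumulation, reusing dt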

05 Apr 2009 V. Poor  
20 Dec 2008 Luigi Giaccari

I think someone, maybe not the author, is faking the download number.
My opinion is that this is a very useful routine, very well commented and entirely m-coded.
The performance, of course, cannot be what is declared in the presentation. The algorithm is just a brute-force search; it can only be competitive against a kd-tree data structure for a small number of search and reference points.
The KNN search uses the sort function to find the K neighbours, which is a very brute-force approach. I know this is the fastest m-code way to find KNN, but it is clear that kd-trees will overcome this algorithm for all but very small models.

Updates
22 Apr 2010

a bug fixed
