Code covered by the BSD License  

Highlights from
Efficient K-Nearest Neighbor Search using JIT

4.2

4.2 | 5 ratings Rate this file 127 Downloads (last 30 days) File Size: 2.48 KB File ID: #19345
image thumbnail

Efficient K-Nearest Neighbor Search using JIT

by Yi Cao

 

27 Mar 2008 (Updated 22 Apr 2010)

A small but efficient tool to perform KNN search

| Watch this File

File Information
Description

This is a small but efficient tool to perform K-nearest neighbor search, which has wide Science and Engineering applications, such as pattern recognition, data mining and signal processing.

The code was initially implemented through vectorization. After discussions with John D'Errico, I realized that my algorithm will suffer numerical accurancy problem for data with large values. Then, after trying several approaches, I found simple loops with JIT acceleration is the most efficient solution. Now, the performance of the code is comparable with kd-tree even the latter is coded in a mex file.

The code is very simple, hence is also suitable for beginner to learn knn search.

Acknowledgements

The author wishes to acknowledge the following in the creation of this submission:
distance.m, KD Tree Nearest Neighbor and Range Search, NEARESTPOINT, nearestneighbour.m
This submission has inspired the following:
Compute surface variation

MATLAB release MATLAB 7.5 (R2007b)
Tags for This File  
Everyone's Tags
Tags I've Applied
Add New Tags Please login to tag files.
Comments and Ratings (21)
20 Dec 2008 Luigi Giaccari

I think someone, maybe not the author, is faking the download number.
My opinion is that this is a very usefull routine, very well commented and totally m-coded.
The performances of course can not be the ones declared in the presentation. The algorithm it is just a brute search can only be competitive against kd-tree data structure for a small number of search and reference points.
For KNN search is used the sort function to find K-neighbours, and this is a very brute algorithm. I Know this the fastest m-code way to find KNN, but it is clear
that kd-trees will overcome this algorithm for very small models.

05 Apr 2009 V. Poor  
03 Dec 2009 Chuan-Yung

This code runs really fast, especially for my over 4000d dataset! However, if I changed the original L2 distance into L1 distance, e.g. rewriting
d= d+((R(:,t)-Q(k,t)).^2; into
d=d+((R(:,t)-Q(k,t)).*sign(R(:,t)-Q(k,t)));
the overall speed became much slower. Any suggestion for a better L1 distance implementation? Thanks a lot!

03 Dec 2009 Yi Cao

Chuan-Yung,

You may try to use abs instead of sign function.

Good luck.
Yi

03 Dec 2009 Chuan-Yung

Dear Dr. Cao,

Yes, I've also tried d= d+abs((R(:,t)-Q(k,t)), which was sadly even slower than the ".*sign" style. Maybe JIT doesn't really support "abs" function in this case?

Thank you in advance for any further suggestion.

CY

03 Dec 2009 Yi Cao

It could be because your data. I tested abs with random data (example 5), but could find any significant difference in cpu time comparing with L2 distance.

Yi

04 Dec 2009 Chuan-Yung

Here is a test close to my dataset:

Q = rand(1,4000); R = rand(4000,4000);
Original L2 knnsearch(Q,R,10); = 0.09375 seconds
"abs" L1 knnsearch(Q,R,10); = 0.65625 seconds
".*sign" L1 knnsearch(Q,R,10); = 0.21875 seconds

They're all very fast, but since I have to repeat searching sequentially incoming Q for thousands of times, the speed differences become pretty obvious.

CY

04 Dec 2009 Yi Cao

Strange. This could be the Matlab versio. On my machine with R2009B, I got almost the same time for both L1 and L2 using your example. I have tested this on two differet machines, one with 2 cores another with 4.

Yi

04 Dec 2009 Chuan-Yung

Oh bingo! I forgot that I'm still using Matlab 7.0 (which is too old, I guess).

Thank you so much for all the patient and kind replies!

Sincerely,
CY

01 Mar 2010 lucy

It is fast! Thank you very much!

24 Mar 2010 Larry

I have been trying to run this fille using Matlab 7.4.0 Student Version 2007a. However I keep getting the following error message:

 ??? Maximum recursion limit of 500 reached. Use set(0,'RecursionLimit',N)
to change the limit. Be aware that exceeding your available stack space can
crash MATLAB and/or your computer.

Error in ==> knnsearch at 32
idx=knnsearch(Q,R);

Can someone help me?

25 Mar 2010 Yi Cao

Larry,

Line 32 is within a block comment for an example. The error means your Matlab version is too old to support block comments. To correct this, you have to change the block comment, i.e. lines between "%{" and "%}" to comment lines by putting "%" at beginning of these line.

Yi

29 Mar 2010 Larry

Hi Yi,

My version supports block comments. From my understanding of the code, I should be able to take one example the examples out of block comments to test run the code, am I correct? I have tried using block comments on all the examples, but I get this error when I do this:

??? Error using ==> knnsearch>parseinputs at 140
Not enough input arguments.

Error in ==> knnsearch at 97
[Q,R,K,fident] = parseinputs(varargin{:});

Im just not understanding, thanks for your help.

30 Mar 2010 Yi Cao

You have to copy and past the example into the command window for test. Or, highlight lines of the example then press F9.

HTH
Yi

08 Apr 2010 aarif

Easy to follow but much slower compared to several mex kdtree implementations on the FEX for large data.

22 Apr 2010 Bree

Hi Yi Cao,
simple question,
why when I used
idx = knnsearch(A,A,2)
will return differently with
idx = knnsearch(A,[],2)?

hope you don't mind to answer it.

best,
-Bree

22 Apr 2010 Yi Cao

Hi Bree,

Thanks for pointing out this to me. This is because for the former, the function did not check whether the first and second input arguments are equal, hence they are treated as different. This causes some difference in results from the later. I have updated the code to do the check. You should be able to download the new version within a day or so.

Yi

13 Jul 2010 joshua

this seems great. do you have a wrapper script that implements the knn classification algorithm using the knnsearch file?

02 Mar 2011 Daniel Lagos

Hi Yi Cao,
Your code is amazing, congratulations!.
I´m simulating dipolar forces in a crystal structure. How Could I implement periodic boundary conditions (PBC) in your code?.

21 Oct 2011 Venkat R

Dear Cao,
Your code is amazingly fast.
Can you help change the metric from euclidean distance to chi-square.

with regards,
Ramana

25 Oct 2011 ahmed

i have matrix (M,N), with M=8 vectors and each vectors contain N=100 samples.

i need your help, if i can apply KNN in the case of matrix, contain M vectors? thanks in advance
 
 

Please login to add a comment or rating.
Updates
07 Jul 2008

fix a bug

22 Apr 2010

a bug fixed

Tag Activity for this File
Tag Applied By Date/Time
statistics Yi Cao 22 Oct 2008 09:55:26
probability Yi Cao 22 Oct 2008 09:55:26
knearest neighbors Yi Cao 22 Oct 2008 09:55:26
nearest neighbor search Yi Cao 22 Oct 2008 09:55:26
delaunay Yi Cao 22 Oct 2008 09:55:26
jit Yi Cao 22 Oct 2008 09:55:26
kdtree Yi Cao 22 Oct 2008 09:55:26
knearest neighbors Company Din 08 Feb 2011 02:29:18
delaunay Company Din 09 Feb 2011 02:15:36
nearest neighbor search Gabe Espinosa 28 Oct 2011 02:05:12
knearest neighbors Katerina Rota 12 Jan 2012 14:11:36

Contact us at files@mathworks.com