There are several utilities available to compute inter-point distances, but none of them did everything I thought was important. What were my goals?
1. Inter-point distances are computed either within a single set of points or between two sets, so a tool must handle both cases.
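2. Efficiency is important, but a common method for computing inter-point (Euclidean) distances uses a trick that loses accuracy: expanding ||a - b||^2 as ||a||^2 + ||b||^2 - 2*a*b', which suffers from subtractive cancellation when the points are close together relative to their magnitude. Fortunately, bsxfun allows us to compute distances both efficiently and accurately.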
3. Many times we wish to compute an inter-point distance matrix, but we only need some subset of it. For example, we might want only the single nearest neighbor of each point in our set, or only the distances that are larger or smaller than some limit.
4. Where appropriate, a sparse distance matrix might be useful.
5. Really large problems can sometimes be solved by breaking the problem into smaller chunks. IPDM does this where appropriate.
6. There are many special cases that can be solved efficiently. For example, finding the nearest neighbor in one-dimensional data is simple and costs no more than a sort; there is no need to compute all the distances when only the closest point is of interest. (Several other special cases could be sped up, perhaps using k-d trees or other algorithms. If there is interest, I'll try to provide them.)
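Goal 2 above can be illustrated with a small sketch (this is not IPDM's own code). It assumes points are stored one per row and compares a direct, difference-based computation using bsxfun with the common expansion ||a||^2 + ||b||^2 - 2*a*b'. The offset of 1e4 is an arbitrary choice that makes the cancellation in the expansion visible.

```matlab
% Two sets of points in 3-D, one point per row (layout assumed for this sketch).
% The large offset makes the cancellation in the expansion trick noticeable.
A = rand(5,3) + 1e4;
B = rand(4,3) + 1e4;

% Accurate: form the coordinate differences explicitly with bsxfun,
% accumulating squared differences one dimension at a time.
D2 = zeros(size(A,1), size(B,1));
for k = 1:size(A,2)
  D2 = D2 + bsxfun(@minus, A(:,k), B(:,k).').^2;
end
Daccurate = sqrt(D2);

% Fast but inaccurate: ||a - b||^2 = ||a||^2 + ||b||^2 - 2*a*b'.
% Subtracting large, nearly equal quantities loses significant digits.
D2trick = bsxfun(@plus, sum(A.^2, 2), sum(B.^2, 2).') - 2*(A*B.');
Dtrick = sqrt(max(D2trick, 0));   % rounding can make tiny values negative

% Largest discrepancy between the two approaches.
disp(max(abs(Daccurate(:) - Dtrick(:))))
```

Looping over the dimensions keeps only one nA-by-nB array in memory per step, and every entry is built from true coordinate differences, so the result is accurate to within a few rounding errors regardless of where the points sit.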
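Goals 3 and 4 above (returning only a subset of the distances, possibly as a sparse matrix) can be sketched as follows. This is an illustration under stated assumptions, not IPDM's implementation: it still forms the full distance matrix first, and the cutoff value limit is hypothetical.

```matlab
% Two sets of points in the plane, one point per row.
A = rand(200,2);
B = rand(300,2);

% Full 200-by-300 Euclidean distance matrix, one dimension at a time.
D2 = zeros(size(A,1), size(B,1));
for k = 1:size(A,2)
  D2 = D2 + bsxfun(@minus, A(:,k), B(:,k).').^2;
end
D = sqrt(D2);

% Subset 1: the single nearest neighbor in B of each point in A.
[nearestDist, nearestIdx] = min(D, [], 2);

% Subset 2: only the small distances, stored in a sparse matrix.
% Note: sparse() does not store zero values, so coincident points
% (distance exactly 0) would not appear in S.
limit = 0.05;                      % hypothetical cutoff for this sketch
mask = D <= limit;
[iRow, jCol] = find(mask);         % column-major order, same as D(mask)
S = sparse(iRow, jCol, D(mask), size(A,1), size(B,1));
```

With a small cutoff, only a few percent of the entries survive, which is where a sparse result pays off. A dedicated tool can avoid ever holding the full matrix, for example by processing it in pieces.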
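A sketch of the chunking idea in goal 5, again not IPDM's own implementation: the nearest-neighbor search runs over a block of rows of A at a time, so at most chunkSize-by-size(B,1) distances are held in memory at once. The value of chunkSize is a hypothetical choice; a real tool would pick it based on available memory.

```matlab
% Two sets of points in 3-D, one point per row.
A = rand(1000,3);
B = rand(1500,3);
chunkSize = 256;                   % hypothetical number of rows of A per chunk

nA = size(A,1);
nearestDist = zeros(nA,1);
nearestIdx = zeros(nA,1);
for first = 1:chunkSize:nA
  rows = first:min(first + chunkSize - 1, nA);
  % Squared distances from this chunk of A to every point of B.
  D2 = zeros(numel(rows), size(B,1));
  for k = 1:size(A,2)
    D2 = D2 + bsxfun(@minus, A(rows,k), B(:,k).').^2;
  end
  % Keep only the per-row minimum; the chunk's matrix is then discarded.
  [m, idx] = min(D2, [], 2);
  nearestDist(rows) = sqrt(m);
  nearestIdx(rows) = idx;
end
```

Taking the square root only of the per-row minima, rather than of the whole chunk, is a small saving that works because sqrt is monotonic.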
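Goal 6 notes that nearest neighbors in one dimension cost no more than a sort. A minimal sketch of that idea, assuming column vectors A and B (this is not necessarily how IPDM does it internally): sort B, locate each A(i) within the sorted list, and compare only the two bracketing values.

```matlab
% One-dimensional data, the same sizes as the ipdm example in this description.
A = randn(10000,1);
B = randn(15000,1);
nA = numel(A);
nB = numel(B);

% Sort B once, remembering where each sorted value came from.
[Bs, order] = sort(B);

% Merge step: sort B and A together. sort is stable, so a B value that ties
% with an A value stays ahead of it. Counting the B values seen so far
% gives, for each A(i), pos(i) = number of B values <= A(i).
[~, p] = sort([Bs; A]);
isB = p <= nB;
countB = cumsum(isB);
pos = zeros(nA,1);
pos(p(~isB) - nB) = countB(~isB);

% The nearest neighbor of A(i) is either Bs(pos(i)) or Bs(pos(i)+1);
% clamp the indices at the ends of the sorted list.
lo = max(pos, 1);
hi = min(pos + 1, nB);
dLo = abs(A - Bs(lo));
dHi = abs(A - Bs(hi));
nearestDist = min(dLo, dHi);
nearestIdx = order(lo);            % index into the original (unsorted) B
useHi = dHi < dLo;
nearestIdx(useHi) = order(hi(useHi));
```

The work is dominated by the two sorts, O((nA + nB) log(nA + nB)), instead of the nA*nB distances a full matrix would need.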
IPDM addresses all of these goals, and more. It uses a property/value pair interface to specify its options.
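For readers unfamiliar with property/value pair interfaces, here is a generic sketch of how such options can be parsed in MATLAB. Only the 'subset' property and its 'nearest' value come from the example in this description; the 'Limit' property and all default values are hypothetical, and this is not IPDM's actual option list or parsing code.

```matlab
% Hypothetical defaults; only 'Subset' appears in the ipdm call shown here.
opts = struct('Subset', 'all', 'Limit', []);

% Property/value pairs as they would arrive in varargin,
% e.g. from a call like ipdm(A, B, 'subset', 'nearest').
args = {'subset', 'nearest'};

if mod(numel(args), 2) ~= 0
  error('Options must be given as property/value pairs.');
end
names = fieldnames(opts);
for k = 1:2:numel(args)
  hit = strcmpi(args{k}, names);   % property names are case-insensitive
  if ~any(hit)
    error('Unknown property: %s', args{k});
  end
  opts.(names{hit}) = args{k+1};   % override the default value
end
```

Keeping the defaults in a struct means an unrecognized property name is caught immediately, and any property the caller omits simply keeps its default.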
Find the nearest neighbors in 1-dimensional data:
A = randn(10000,1);
B = randn(15000,1);
tic,
d=ipdm(A,B,'subset','nearest');
toc
Elapsed time is 0.151346 seconds.
A note for those worried about absolute speed on small data sets: I have now considerably sped up the code for simple calls, reducing the basic overhead by a factor of roughly 4.
See the demo file for many examples of use.