7 views (last 30 days)

Hi, I have a set of 3d points (x,y,z). I wish to compute the distance between every point and all of the other points in the set. so this is the code hat does this:

nPoints=size(Points, 1);

[idx, ~] = ndgrid(uint32(1:nPoints),uint32(1:nPoints)); %row indices to calculate distance between

CartesianDistance = arrayfun(@(row1, row2) norm(Points(row2, 1:3) - Points(row1, 1:3)), idx, idx');

The problem is that now i have 80000 points. and i get an out of memory error (an 80000X80000 matrix does that).note that it is very likely that i will have even more points in the futre. is there a way to go around this out of memory issue ?

Alfonso Nieto-Castanon
on 21 May 2015

Alfonso Nieto-Castanon
on 21 May 2015

In its current form, the only simplification I can think of would be to compute this measure in a for loop separately for each column of your distance matrix (since at the end you are only interested in the sum of dH across each column). That would trade the memory requirement for speed.

Alternatively if you simplify your dissimilarity measure formula perhaps you can also arrive a fast-computation implementation using simple linear algebra tricks (e.g. summing of squared distances can be done much faster, but summing root sum square differences cannot)

EDIT: just to add an example, if your measure was only based on cartesian distances (e.g. dH = CartesianDistance.^2 instead of dH = CartesianDistance.*FeatureDistance ), you could get the alternative dHigh measure directly (without for loop or full distance matrix computation) using:

normPoints = sum(Points.^2,2);

dH_mean = normPoints + mean(normPoints) - 2*Points*mean(Points,1)';

Dhigh = 1-exp(-dH_mean);

that involves O(nPoints) instead of O(nPoints^2) operations so it gets done considerably faster (not that this dissmilarity measure would be particularly useful, only posting as an example of how one goes about simplifying/speeding-up computations).

Jan
on 21 May 2015

By this method your output has 80'000^2*8 Byte = 51.2 GB . Therefore either use a 64 bit machine and install a lot of RAM - a fair guess is the double size of the largest used matrix.

But the distance matrix is symmetric and half of the information is redundant. So better use a smarter way to calculate the pairwise distances. Beside Matlab's pdist there are many tools in the FileExchange, simply search them: http://www.mathworks.com/matlabcentral/fileexchange/?utf8=%E2%9C%93&term=pairwise+distance

But a general problem remains: The more points you have, the larger is the output. But do you really need all distances? Usually there is a smarter way to obtain the required information.

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!
## 0 Comments

Sign in to comment.