Speed up Matrix Subtraction for Euclidean Distance calculation
Show older comments
I have 2 matrices, both of them are feature matrices, M1 is 9216x26310, M2 is 9216x34000. each column represents 9216x1 represents one feature for a particular image, in this case 26310 images for the M1, 34000 for M2. I want to speed up the time for it to calculate euclidean distance, my current code takes average 1.3 secs or 1.4 secs per calculation.
tic;
for idx = 1:Number_of_Test_Images
for TrnIdx = 1 : Number_of_Train_Images
E_distance(TrnIdx) = sqrt(sum(abs((ftest(:,idx)-ftrain(:,TrnIdx)).^2)));
end
%
%[smallest_value, Idx_smallest] = min(E_distance);
%
Esimate_Test_labels(idx,1) = TrnLabels(Idx_smallest,1);
toc;
end
even if I replace the nested for loop with this
[smallest_value,Idx_smallest] = min(sum(abs(bsxfun(@minus,ftest(:,idx),ftrain))));
it still takes roughly the same time, although this one doesn't calculate sqrt.
Is there other way to speed this up to about 0.2 secs per calculation?
I have a i7 6700k CPU sadly no Nvidia GPU, owning AMD GPU. Else I can use gpuarray to speed things up.
Any ideas guys?
Both matrices are type double, I've tried to use sparse double on the my original code, my computer hangs every time after it reaches maximum memory usage, had to restart. and Sparse double matrices actually takes longer to compute.
Update! I've tried convert both matrices into single precision, and it runs faster, average of 0.63 secs per calculation by using the formula below, which runs the fastest compare to others.
I still need it to run faster, preferably under 0.2 secs if that's possible.
for TrnIdx = 1 : Number_of_Train_Images
E_distance(TrnIdx) = norm(ftest(:,idx)-ftrain(:,TrnIdx));
end
% runs averagely of 0.63secs which is the best at the moment, after converting both matrices from double to single, double runs at 1.05secs
14 Comments
James Tursa
on 30 Dec 2016
Do you have a supported C compiler installed for building mex routines?
TYS
on 30 Dec 2016
Walter Roberson
on 30 Dec 2016
What are the timing results for pdist2 of the transpose of the matrices?
Roger Stafford
on 31 Dec 2016
Assuming ‘ftest’ and ‘ftrain’ vectors are real-valued, then the ‘abs’ operation is totally unnecessary and a waste of time. The square of any real number is always non-negative.
Also you might try the default 2-norm operator in place of the computation you have there:
E_distance(TrnIdx) = norm(ftest(:,idx)-ftrain(:,TrnIdx));
It might be faster and gives the same answer.
Walter Roberson
on 31 Dec 2016
One bit I would suggest is to calculate the squared distance and then call sqrt on the result. Or at least sqrt blocks instead of individual calculations. This is to avoid the overhead of multiple calls, and to give a chance for SIMD instructions to be used.
Walter Roberson
on 31 Dec 2016
One thing I would suggest is to move ftest(:,idx) to before the loop:
this_ftest = ftest(:,idx);
for TrnIdx = 1 : Number_of_Train_Images
E_distance(TrnIdx) = norm(this_ftest - ftrain(:,TrnIdx));
end
Walter Roberson
on 31 Dec 2016
The time you are measuring corresponds to 34000 norm calculations of length 9216 each. So that is 9216 subtractions each, 9216 squares, 9215 sums, and 1 square root, so a minimum of 9216*3 = 27648 floating point operations per norm, ignoring for the moment any memory access costs. 27648 * 34000 = 940032000 floating point operations. That's about a gigaflop. For that to happen in 0.2 seconds your system would have to be running at 5 gigaflop (even assuming that the squares and square roots took the same time as a additions and subtractions). The i7 6700k is rated at most 4 GHz, so at most 4 gigaflop. Therefore with the most efficient possible operations all perfectly pipelined, you cannot expect to do better than 0.25 seconds.
Unless, that is, you can take advantage of multiple cores. Which is plausible.
Walter Roberson
on 31 Dec 2016
My timing tests with norm() suggest that for 34000 items of length 9216, you probably are not going to be able to do better than 0.46 seconds (unless you use multiple cores.)
Ahmet Cecen
on 31 Dec 2016
This is correct. However, to achieve what he actually wants to do, he doesn't actually need to do that many subtractions. His original code is asking the wrong question, and finding the answer inefficiently on top of that.
The right question here is the nearest neighbor problem. Optimizations of queries of this nature (search) is a very well studied area, with many exact and approximate solutions.
I believe the MATLAB implementation is taking advantage of some tree structure to do a fast exact search, reducing the number of necessary comparisons by structuring the searched data in a particular way.
A very optimized approximate algorithm is FLANN which has MATLAB bindings. This should further reduce the execution time of the entire task to around 3-5 minutes, with a minor fraction of inaccurate results (corresponding to the 5th nearest neighbor instead of 1st, for example).
Walter Roberson
on 31 Dec 2016
My testing with parfor and norm indicated it was about 7 times slower than a straight forward norm loop.
Accepted Answer
More Answers (1)
E_distance = DNorm2( bsxfun(@minus,ftest(:,idx),ftrain) ,1) ; %Equation (*)
You could also try parallelizing the outer loop with parfor and maybe even the computation of E_distance as well (break ftrain into smaller parallel chunks and perform pieces of Equation (*) above on different workers).
Finally, if ftest and ftrain are type double, you might try moving to type single.
5 Comments
TYS
on 31 Dec 2016
Walter Roberson
on 31 Dec 2016
Try installing a compiler, using mex -setup and then using mex to compile the code.
TYS
on 31 Dec 2016
Matt J
on 31 Dec 2016
No, it's usually not a big problem. Tech support might be able to help you with the installation.
Categories
Find more on GPU Computing in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!