How can I use C++ to speed up my matlab code ?

18 views (last 30 days)
Hi every one, I am working on image processing and I generated a matlab code with many for loops, it takes too long to display results, can any one please tell how can I speed it up using C++ ?
thank you !
  5 Comments
Stephen23
Stephen23 on 7 Apr 2017
@mayssa: your is slow because you wrote it to expand the array of every iteration. Much easier than switching to C++ would be to simply write efficient MATLAB code:

Sign in to comment.

Accepted Answer

Jan
Jan on 7 Apr 2017
Edited: Jan on 7 Apr 2017
Sorry, I cannot resist :-)
MyCorrCoeff2 can still be improved by considering the columnwise storing of elements in arrays. The normalization happens repeatedly inside the inner loops, so it is cheaper to compute it once outside the loops:
function H = MyCorrCoeff3(Mat)
% Author: Jan Simon, License: CC BY-SA 3.0
c = size(Mat,1);
nc = (0.5 * c * (c + 1));
H = ones(nc, nc); % Pre-allocate!!!
MP = permute(Mat, [3, 2, 1]);
MP = MP - sum(MP, 1) / size(MP, 1);
% MP = bsxfun(@minus, MP, sum(MP, 1) / size(MP, 1));
CovN = squeeze(sqrt(sum(MP .* MP, 1)));
jH = 1;
for i = 1:c
for j = i:c
cov1 = MP(:, j, i);
cov1n = CovN(j, i);
iH = 1;
for d = 1:c
for l = d:c
if iH < jH % Use symmetry of H
% Emulate: r = cov([Mat(i, j, :), Mat(d.l,:)]);
r21 = (MP(:, l, d).' * cov1) / cov1n / CovN(l, d);
H(iH, jH) = r21;
H(jH, iH) = r21;
end
iH = iH + 1; % Update indices related to H:
end
end
jH = jH + 1;
end
end
H(isnan(H)) = 0;
0.085 sec, 38 times faster than the original version. :-)
[07-Apr-2017 16:34 UTC] Enough for today. Now the code is in a form which could profit from C-coding. The bottleneck is still with about 85% of the total time:
r21 = (MP(:, d, l).' * cov1) / cov1n / CovN(d, l);
The vector multiplication is calculated in an optimized BLAS-library already. Therefore I think you can reduce the time for the overhead of calling the library and some percent of the loops only, perhaps 10%. It will be more useful to try a parallelization with PARFOR.
  4 Comments
Mayssa
Mayssa on 8 Apr 2017
@Jan:Indeed, I must admit that this was an opportunity for me to learn a lot; still too much to learn though. For the time problem, it is solved, it takes about five minutes now and to answer your question, yes this is wanted, Mat is already symmetric.

Sign in to comment.

More Answers (3)

John D'Errico
John D'Errico on 6 Apr 2017
Almost always using compiled code will not give you great speedups, UNLESS your code was inefficiently written in MATLAB in the first place. Of course then almost anything you do will be an improvement.
But if your code is already well written in MATLAB, using a good choice of algorithms, then trying to go to C or C++ will probably not be worth the time and effort it will take to code. This is certainly true unless you have quite good programming skills in C/C++, AND you understand the essential algorithms well.
The point is almost always the most impact you can get for a given investment of time will come from a better understanding of MATLAB, learning how to improve your code. Often great gains can come from a completely different algorithmic approach, rethinking how you will solve the problem.
  3 Comments
John D'Errico
John D'Errico on 7 Apr 2017
Edited: John D'Errico on 7 Apr 2017
The slowness of your code is PURELY due to the fact that you never preallocated v !!!!!!!!! Writing c++ code here is a waste of time. Just write EFFICIENT MATLAB code. MATLAB is NOT "really slow". It is your code as written that is really slow!
When you append one element at a time to v, that forces MATLAB to reallocate the vector v in memory at EVERY iteration of the loop! It then needs to copy the entire array over. Small at first, but as v grows in size, this take serious time to do. This dynamic array growth is a bad idea. It is fine for tiny arrays, but as the arrays grow in size, not true. Instead, use the function zeros to fill v with elements of the final size of v.
You know (or can compute) what the final size of v will be, based on the value of c. Based on your loops, it will be:
c^2*(c+1)^2/4
So instead of setting v as an empty array to start with, use
v = zeros(1,c^2*(c+1)^2/4);
Inside your loop, maintain a counter that tells you where to put the next element of the vector. So your loops will look like this now:
c=size(Mat,1);
v = zeros(1,c^2*(c+1)^2/4);
ind = 0;
for i = 1:c
for j =i:c
for d=1:c
for l =d:c
ind = ind + 1;
a = corrcoef(Mat(i,j,:),Mat(d,l,:));
v(ind) = a(2,1);
end
end
end
end
Could you have written those loops themselves more efficiently, with fewer loops? This is probably also true, but not the immediate problem.
In fact, the editor will actually have warned you of this need to preallocate v. On the right hand side, you will see red and orange lines in the editor, indicating potential problems in your code. Read what they say, and what they suggest.
Note that the very same comments apply to the array H. You never bothered to preallocate H to the proper size, but then you stuff one element at a time into H. Even worse, all you are really doing is a simple RESHAPE!!!!!! Thus, you are creating a new array that has the same elements, in the same order, but the array is a different shape. So you can replace the entire set of loops that create H with one line of efficient code.
H = reshape(v,c*(c+1)/2,c*(c+1)/2);
As I said in my answer, look to your own code for the problem. Writing c++ code to fix these problems would be silly, and a complete waste of time.
Mayssa
Mayssa on 7 Apr 2017
for the loops, I tried several times and couldn't see a better way to do it to obtain what I need. It turned out that most of the time was spent when calling corrcoef, I replaced it by another built function, it reduced time to more than half, so yes as you said :) you convinced me and thank you for the reshape (didn't know about it),

Sign in to comment.


Jan
Jan on 7 Apr 2017
Edited: Jan on 7 Apr 2017
After your comment:
Letting an array grow iteratively is a massive waste of resources. You should see a corresponding MLint warning in the editor.
c = size(Mat,1);
nv = (0.5 * c * (c + 1)) ^ 2; % sum(1:c)^2
v = zeros(1, nv);
iv = 0;
for i = 1:c
for j = i:c
Matij = Mat(i, j, :);
for d = 1:c
for l = d:c
a = corrcoef(Matij, Mat(d,l,:));
iv = iv + 1;
v(iv) = a(2,1);
end
end
end
end
For Mat = [20 x 20 x 10] this reduces the runtim from 5.3 to 4.0 sec already, but the benefit will be higher for larger inputs.
This can be accelerated, if you do not compute all elements by corrcoef, but only the needed one:
function r21 = mycorrel(A, B)
r = cov([A(:), B(:)]);
r21 = r(2,1) / sqrt(r(1) * r(4));
% Exactly the same output as CORCOEFF without rounding differences:
% Slightly slower, but considers over/underflow:
% r21 = r(2, 1) / sqrt(r(1)) / sqrt(r(4));
and
... innermost loop:
for l = d:c
iv = iv + 1;
v(iv) = mycorrel(Matij, Mat(d,l,:));
end
2.5 seconds run time. :-)
The 2nd part: nchoosek is slow, so do not call it repeatedly:
k = 1;
nc = (0.5 * c * (c + 1)); % nchoosek(c,2)+c;
H = zeros(nc, nc); % Pre-allocate!!!
for i = 1:nc
for j = 1:nc
H(i,j) = v(k);
k = k+1;
end
end
H(isnan(H)) = 0;
What do you observe finally?
H(1:5, 1:5)
H is symmetric. This means that half of the computations are redundant. Then you can reduce the run time by 50% if you consider this in the first part already. Now the main diagonal can be ignored also. I leave this up to you, it is not hard to create the matrix H directly. Calculate the elements below the diagonal only and copy it above the diagonal. Something like:
for d = i:c % Instead of 1:c
...
H(x, y) = mycorrel...
H(y, x) = H(x, y);
Conclusion: With some reordering and avoiding of unnecessary computations you can accelerate the code by a factor of 4 without the need to create a C-Mex function. Because the main work is spent in COV, a straight translation to C would not have been faster.
  5 Comments
Mayssa
Mayssa on 7 Apr 2017
Edited: Mayssa on 7 Apr 2017
@Jan: I think reshape does the job better for H as mentioned above, don't you think ? Indeed it is faster but my problem is still with the main which takes near half an hour (well better than 1 hour yesterday), that's why I thought of creating separate C functions in the first place; I have seen this in a clustering program, although a very much larger data than mine is used, the code doesn't take more than 1 min.
Mayssa
Mayssa on 7 Apr 2017
@Stephen: thank you for the link, greatly appreciated

Sign in to comment.


Jan
Jan on 7 Apr 2017
Edited: Jan on 7 Apr 2017
function H = MyCorrCoeff1(Mat)
c = size(Mat,1);
nc = (0.5 * c * (c + 1));
H = zeros(nc, nc); % Pre-allocate!!!
iH = 1;
jH = 1;
for i = 1:c
for j = i:c
Matij = Mat(i, j, :); % Tiny speedup to copy this once only
for d = 1:c
for l = d:c
if iH == jH % Diagonal element is 1.0
H(iH, jH) = 1;
elseif iH > jH % Use symmetry of H
% Equivalent:
% r = corrcoef(Mat(i,j,:),Mat(d,l,:)); r21 = r(2, 1);
Matdl = Mat(d,l,:);
r = cov([Matij(:), Matdl(:)]);
r21 = r(2,1) / sqrt(r(1)) / sqrt(r(4));
H(iH, jH) = r21;
H(jH, iH) = r21;
end
% Update indices related to H:
iH = iH + 1;
if iH > nc
iH = 1;
jH = jH + 1;
end
end
end
end
end
H(isnan(H)) = 0;
This reduces the runtime from 3.26 to 0.63 sec (R2016b/64, Win7).
Now COV is still the bottleneck, but is called less often. Let's see if inlining the code of COV helps (in addition H is initialized with 1 and the diagonal is not set inside the loops):
function H = MyCorrCoeff2(Mat)
c = size(Mat,1);
nc = (0.5 * c * (c + 1));
H = ones(nc, nc); % Pre-allocate!!!
covM = size(Mat, 3);
iH = 1;
jH = 1;
for i = 1:c
for j = i:c
Matij = Mat(i, j, :);
cov1 = Matij(:) - sum(Matij(:)) / covM; % Remove mean
cov1n = sqrt(cov1.' * cov1);
for d = 1:c
for l = d:c
if iH > jH % Use symmetry of H
Matdl = Mat(d,l,:);
% Emulate: r = cov([Matij(:), Matdl(:)]);
cov2 = Matdl(:) - sum(Matdl(:)) / covM;
r21 = (cov2.' * cov1) / cov1n / sqrt(cov2.' * cov2);
H(iH, jH) = r21;
H(jH, iH) = r21;
end
iH = iH + 1; % Update indices related to H:
if iH > nc
iH = 1;
jH = jH + 1;
end
end
end
end
end
H(isnan(H)) = 0;
Yepp, it is faster: 0.24 sec. 13.6 times faster, and still no need to struggle with tricky C code (and as you can see from my FileExchange submissions, I do love C!).
But look at the code: It is ugly! The original code was much easier to understand. Here a clean version:
c = size(Mat,1);
nc = (0.5 * c * (c + 1)); % sum(1:c)
H = zeros(nc, nc)
iH = 0;
for i = 1:c
for j = i:c
for d = 1:c
for l = d:c
a = corrcoef(Mat(i, j, :), Mat(d,l,:));
iH = iH + 1;
H(iH) = a(2,1);
end
end
end
end
The results are equal except for rounding differences. This nicer version should be included as comment.

Categories

Find more on Introduction to Installation and Licensing in Help Center and File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!