how can i vectorize this for loop??

1 view (last 30 days)
nelson
nelson on 27 Jul 2017
Commented: nelson on 28 Jul 2017
clc;
clear all;
close all;
k = 1:0.5:10;
a = 1:0.5:5;
n1 = numel(k);
nump = kron(k,ones(1,length(a)))';
denp = repmat([ones(n2,1), a.', zeros(n2,1)] ,n1, 1);
w=[0.1,0.5,0.8,1,2,8,15,50,100];
n2 = numel(a);
delay=0;
[rn,cn]=size(nump);
[rd,cd]=size(denp);
[rw,cw]=size(w);
[rdel,cdel]=size(delay);
if rw>1,
w = w(:)';
end
if rn~=rd & (rn~=1 & rd~=1),
end
if rdel~=rn & rdel~=rd & rdel~=1,
end
i=sqrt(-1);
s=i*w;
mx = max(rn,rd);
q=1; r=1;
for h=1:mx,
q=(rn>1)*h+(rn==1); r=(rd>1)*h+(rd==1); d=(rdel>1)*h+(rdel==1);
upper = polyval(nump(q,:),s);
lower = polyval(denp(r,:),s);
cp(h,:)=(upper./lower).*exp(-s*delay(d));
end
toc
  1 Comment
Jan
Jan on 27 Jul 2017
Edited: Jan on 27 Jul 2017
You need to resort the code lines: n2 is used before it is defined. The toc is useless without a tic. Omit the darn clear all, because it removes all loaded functions from the memory. The reloading from disk wastes time and interfer with measuring the run time.
i=sqrt(-1) is useless, because this is the definition of "i" already.
Omit the useless code
if rn~=rd & (rn~=1 & rd~=1),
end
if rdel~=rn & rdel~=rd & rdel~=1,
end
except if you want to confuse the readers.

Sign in to comment.

Accepted Answer

Jan
Jan on 27 Jul 2017
Edited: Jan on 27 Jul 2017
Do you really want to vectorize the loop? Or is it enough to increase the speed remarkably? For the latter, use the profiler to find the bottleneck at first. Pre-allocate cp and use a leaner version of polyval:
function YourFcn
k = 1:0.5:10;
a = 1:0.5:5;
n1 = numel(k);
nump = kron(k,ones(1,length(a)))';
n2 = numel(a);
denp = repmat([ones(n2,1), a.', zeros(n2,1)] ,n1, 1);
w = [0.1,0.5,0.8,1,2,8,15,50,100];
delay = 0;
[rn,cn] = size(nump);
[rd,cd] = size(denp);
[rw,cw] = size(w);
[rdel,cdel] = size(delay);
if rw > 1
w = w(:)';
end
s = 1i * w;
mx = max(rn,rd);
cp = zeros(mx, numel(s));
for h = 1:mx
q = (rn>1)*h + (rn==1);
r = (rd>1)*h + (rd==1);
d = (rdel>1)*h + (rdel==1);
upper = mypolyval(nump(q,:), s);
lower = mypolyval(denp(r,:), s);
cp(h, :) = (upper ./ lower) .* exp(-s * delay(d));
end
end
function y = mypolyval(p,x)
% Use Horner's method for general case where X is an array.
nc = length(p);
y = zeros(size(x));
y(:) = p(1);
for i = 2:nc
y = x .* y + p(i);
end
end
This reduced the runtime on my computer from 0.46 to 0.21 seconds for 100 calls.
Use the profile with showing the time for the built-in functions:
profile on -detail builtin
This reveals the time spent in the expensive exp() function. Avoid the repeated calculation of the same exp output and inline the Horner scheme:
s = 1i * w;
mx = max(rn, rd);
cp = zeros(mx, numel(s));
if rdel == 1
C = exp(-s * delay(1));
end
nc1 = size(nump, 2);
nc2 = size(denp, 2);
upper = zeros(size(s));
lower = zeros(size(s));
for h=1:mx
q = (rn > 1) * h + (rn == 1);
r = (rd > 1) * h + (rd == 1);
if rdel > 1
C = exp(-s * delay(h));
end
% upper = polyval(nump(q,:), s);
upper(:) = nump(q, 1);
for i = 2:nc1
upper = s .* upper + nump(q, i);
end
% lower = polyval(denp(r,:), s);
lower(:) = denp(r, 1);
for i = 2:nc2
lower = s .* lower + denp(r, i);
end
cp(h,:) = (upper./lower) .* C;
end
Now it runs in 0.104 seconds, a speedup of factor 4.5 without vectorizing.
  3 Comments
Stephen23
Stephen23 on 27 Jul 2017
Edited: Stephen23 on 27 Jul 2017
@nelson: Jan Simon just showed you how to speed up your code without vectorization. It is quite possible that vectorization is not worth the bother if it makes your code more complicated. If you are not sure what code vectorization is, then why not learn by reading the MATLAB documentation:
@Jan Simon: +1 for a very comprehensive solution and explanation.
Jan
Jan on 27 Jul 2017
@nelson: The last code is the faster version, but it is not vectorized. I assume the Horner scheme can be vectorized, but this requires the creation of intermediate matrices. In consequence the result might (or might not) be slower than an efficient loop. Usually vectorizing the code is not the actual need, but improving the speed.
I've offered at least the efficient loop version. This is useful for a fair comparison, and now the code is much more clear, which allows for an easier vectorization. See my next answer for a vectorized code.

Sign in to comment.

More Answers (2)

Jan
Jan on 27 Jul 2017
Edited: Jan on 27 Jul 2017
Now the vectorized version:
function cp = YourFcn()
k = 1:0.5:10;
a = 1:0.5:5;
n1 = numel(k);
nump = kron(k, ones(1, length(a)))';
n2 = numel(a);
denp = repmat([ones(n2,1), a.', zeros(n2,1)], n1, 1);
w = [0.1,0.5,0.8,1,2,8,15,50,100];
delay = 0;
w = w(:)';
s = 1i * w;
upper = nump(:, 1);
for i = 2:size(nump, 2)
upper = s .* upper + nump(:, i); % requires >= R2016b !!!
end
lower = denp(:, 1);
for i = 2:size(denp, 2)
lower = s .* lower + denp(:, i); % requires >= R2016b !!!
end
cp = (upper ./ lower) .* exp(-s * delay); % requires >= R2016b !!!
end
:-) Nice. 0.023 seconds. 4 times faster than the efficient loop.
If you run this on older Matlab versions, you need some bsxfun() calls:
...
s = 1i * w;
upper = nump(:, 1);
for i = 2:size(nump, 2)
upper = bsxfun(@plus, bsxfun(@times, s, upper), nump(:, i));
end
lower = denp(:, 1);
for i = 2:size(denp, 2)
lower = bsxfun(@plus, bsxfun(@times, s, lower), denp(:, i));
end
cp = bsxfun(@times, bsxfun(@rdivide, upper, lower), exp(-s * delay));
While this code is compact and efficient, exhaustive comments are required to recognize, that this is a vectorized Horner scheme from polyval.
Note that the vectorized code does not need the case differentiation for rn>1, rd>1 and del>1 and that the number of exp() calls is reduced to the required minimum automatically.
I hope the real problem is much larger. Otherwise the absolute speed up is only some milliseconds from 0.0046 to 0.00023. For the inputs:
k = 1:0.05:10;
a = 1:0.05:5;
the original version needs 217, the efficient loop 8 and the vectorized code 1.06 seconds. And they even reply the same values ;-)
  2 Comments
nelson
nelson on 27 Jul 2017
Thanks for the help..
Jan
Jan on 28 Jul 2017
@nelson: You are welcome. This was an interesting question and I was not sure at first how to vectorize this and if this increases the speed. I've posted the intermediate steps also to demonstrate how the solution can be developped.

Sign in to comment.


nelson
nelson on 28 Jul 2017
thnku sir...... is it possible to vectorize these two for loops??
  2 Comments
Jan
Jan on 28 Jul 2017
@nelson: Please post comments as comments. When you use the section for answers, it is not clear to which answer they belong to. If you mean the Horner scheme:
upper = nump(:, 1);
for i = 2:size(nump, 2)
upper = s .* upper + nump(:, i);
end
No, this cannot be vectorized furtherly. Remember that the Horner scheme is known for its numerical stability and efficiency.
If you still need more speed, hire a programmer to create a fast C-Mex function.

Sign in to comment.

Categories

Find more on Loops and Conditional Statements in Help Center and File Exchange

Products

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!