# Double summation with vectorized loops

29 views (last 30 days)
Julián Francisco on 30 May 2011
Commented: Walter Roberson on 7 Apr 2019
Hi. I want to vectorize this double for loop because it is a bottleneck in my code. Since MATLAB is a based-one indexing language I have to create an additional term for M = 0.
R,r,lambda,phi,mu_p are constants
Slm(L,M),Clm(L,M) are matrices 70x70
Plm(L,M) is a matrix 70x71
Cl(L),Pl(L) are vectors 70x1
% function dU_r
s1 = 0;
for L = 2:70
s1 = s1 + ((R/r)^L)*(L+1)*Pl(L)*Cl(L);
for M = 1:L
s1 = s1 + ((R/r)^L)*(L+1)*Plm(L,M)*(Clm(L,M)*...
cos(M*lambda) + Slm(L,M)*sin(M*lambda));
end
end
dU_r = -(mu_p/(r^2))*s1;
% function dU_phi
s2=0;
for L = 2:70
s2 = s2 + ((R/r)^L))*Plm(L,1)*Cl(L);
for M = 1:L
s2 = s2 + ((R/r)^L)*(Plm(L,M+1)-M*tan(phi)*Plm(L,M))*...
(Clm(L,M)*cos(M*lambda) + Slm(L,M)*sin(M*lambda));
end;
end;
dU_phi = (mu_p/r)*s2;
% function dU_lambda
s3=0;
for L=2:70
for M=1:L
s3 = s3 + ((R/r)^L)*M*Plm(L,M)*(Slm(L,M)*cos(M*lambda)...
- Clm(L,M)*sin(M*lambda));
end;
dU_lambda = (mu_p/r)*s3;
Jan on 31 May 2011
@Julian: Now you have expanded your code. Following my example you can calcluate the three sums simultaneously.
The general rule is: Move all repeated claculations out of the loop. E.g. for dU_phi
you calculate TAN(phi) 2484 times, although it is constant.

Jan on 30 May 2011
Let's solve the the inner loop at first (I prefer "j", because the lower case "L" looks like a one):
R = rand; r = rand; lambda = rand;
Slm = rand(70); Clm = rand(70);
Plm = rand(70, 71);
Cl = rand(70, 1); Pl = rand(70, 1);
s = 0;
for j = 2:70
s = s + ((R/r)^j) * (j+1) * Pl(j) * Cl(j) + ...
sum(((R/r)^j) * (j+1) * Plm(L,1:j) .* ...
(Clm(L, 1:j) .* cos((1:j) * lambda) + ...
Slm(L, 1:j) .* sin((1:j) * lambda)));
end
But this is 4 times slower than the original version under Matlab 2009a!
Let's try to avoid the repeated power, COS and SIN:
Rr = R / r;
RrL = RrL; % EDITED: No cumprod anymore -> 5% faster
cosLambda = cos((1:70)* lambda);
sinLambda = sin((1:70)* lambda);
u1 = uint8(1);
s = 0;
for j = uint8(2):uint8(70)
RrL = RrL * Rr;
q = RrL * (double(j) + 1);
t = Pl(j) * Cl(j);
for m = u1:j
t = t + Plm(j,m) * ...
(Clm(j, m) * cosLambda(m) + ...
Slm(j, m) * sinLambda(m));
end
s = s + q * t;
end
EDITED: 40% faster with UINT8 loop indices instead of DOUBLEs! Same speed for INT32, but only 25% for UINT32.
This is 12 times faster than the original version - with FOR loops!
So vectorized does not necessarily mean faster. The JIT acceleration introduced with Matlab 6.5 increases the speed of this loop remarkably. And avoiding powers and trigonometric calculations is important also.
The old tale of the slow FOR loops is very sticky.
Walter Roberson on 7 Apr 2019
Please start a new Question for this.

Matt Fig on 31 May 2011
Here is a vectorized version, but I must say that I would have probably just went with Jan's FOR loop. It seems you need to see a vectorization, even if it is not the most efficient solution. Here you go:
V = ((R/r).^(2:70));
s = repmat(1:70,69,1);
s = sum(V(1:69).*(3:71).*sum(tril(Plm(2:70,1:70).*...
(Clm(2:70,1:70).*cos(s*lambda)+Slm(2:70,1:70).*...
sin(s*lambda)),1),2).'+V.*(3:71).*Pl(2:70).'.*Cl(2:70).');
Bjorn Gustavsson on 2 Jun 2011
@Jan, you got my hopes up with that 25%! (On the other hand maybe it was just as well that it was "by 25%" - saves me a whole lot of dreary rewriting. But with the speedup we're getting closer to the practices of "normal" programming - having to make sure that we use the optimal type for each variable...)

Julián Francisco on 1 Jun 2011
Hi. Taking like reference the solution Jan3, given by Jan Simon, I have written the following code for my problem. Thank all that have collaborated with comments and suggestions.
Rr = R/r;
RrL = Rr;
cosLambda = cos((1:70)* lambda);
sinLambda = sin((1:70)* lambda);
u1 = uint8(1);
s1 = 0;
s2 = 0;
s3 = 0;
for j = uint8(2):uint8(70)
RrL = RrL * Rr;
q1 = RrL * (double(j) + 1);
t1 = Pl(j) * datos.Cl(j);
q2 = RrL;
t2 = Plm(j,1) * Cl(j);
t3 = 0;
for m = u1:j
t1 = t1 + Plm(j,m) * ...
(Clm(j, m) * cosLambda(m) + ...
Slm(j, m) * sinLambda(m));
t2 = t2 + (Plm(j,m+1)-double(m)*tan_phi*Plm(j,m))*...
(Clm(j,m)*cosLambda(m) + Slm(j,m)*sinLambda(m));
t3 = t3 + double(m)*Plm(j,m)*(Slm(j,m)*cosLambda(m)...
- Clm(j,m)*sinLambda(m));
end
s1 = s1 + q1 * t1;
s2 = s2 + q2 * t2;
s3 = s3 + q2 * t3;
end
dU_r = -(mu_p/(r^2))*s1;
dU_phi = (mu_p/r)*s2;
dU_lambda = (mu_p/r)*s3;