Vectorizing loop over multidimensional data

I would like to increase the speed of the following code as it is currently a bottleneck in a piece of software that I am developing. As can be seen, I have vectorized the code over the "M" dimension but I haven't had much luck with getting rid of the loop over W. Ultimately, the code should result in a row vector the size of W containing the summed terms shown in the second-to-last line.
For brevity, I've replaced my real data with random data.
Any assistance with further vectorizing this code or other recommendations for making it more computationally efficient are greatly appreciated.
W = -599:2:500;
numW = length(W);
Wp = 500;
M = 200;
U = rand(M,1);
V = rand(M,1);
X = rand(M,1);
Y = rand(M,1);
K = rand(M,1);
Ep = rand(M,M);
En = rand(M,M);
% Loop over W terms
WSUM = zeros(1,numW);
tic
% Precompute L terms that are invariant under the loop
L1 = U*V.'; % M x M
L2 = (K*K.').^(-1/2); % M x M
L5 = exp(1i*K.'*Wp); % 1 x M
% Precompute Q terms that are invariant under the loop
Q1 = U*V.'; % M x M
Q2 = X*Y.'; % M x M
Q3 = K*K.'; % M x M
QQ1 = (Q1 - (Q2./Q3)).*Ep; % M x M
QQ2 = (Q1 + (Q2./Q3)).*En; % M x M
% Combine Q terms
QQ = QQ1 + QQ2; % M x M
for wi = 1:numW
L3 = W(wi)^(-1/2); % 1 x 1 scalar
L4 = exp(1i*K.'*W(wi)); % 1 x M
% Combine L terms
LL = (L1.*L2.*L3).*(L4.'*L5); % M x M
% combine L and Q terms
WW = (LL.*QQ); % M x M
% Compute sum
WSUM(wi) = sum(WW,'all'); % sum terms
end % end for loop
toc

4 Comments

As you've written it, only
L3 = W(wi)^(-1/2); % 1 x 1 scalar
L4 = exp(1i*K.'*W(wi)); % 1 x M
are not invariant inside the loop -- and they can be computed first and then just referenced.
I've not tried to factor out all the constants and figure out the algebra in the end, but looks like probably simply a summation over the array could be worked and eliminate the loop entirely.
dpb,
Thank you for the response. I've taken several stabs at full vectorization before making my post, but the results I obtain do not match the looped version of the code. I believe the issue is that I am somehow missing multiplications between cross terms in the MxM datastructures and the terms of dimension W.
CW
Have you just moved everything that isn't dependent upon the loop index out of the loop? Not doing all the other work over and over should be a noticeable savings unless the JIT compiler is smart enough to be able to recognize and cache results.
I just tested this suggestion and it yielded a 67% improvement in overall execution time of the code (keeping in mind there are lots of other things going on in the actual software). This was definitely one of those issues of me not seeing the forest for the trees as I was so focused on a fully vectorized solution.
That being said, thank you very much!
I've updated my original post to include the modified code should anyone have additional ideas on how to completely eradicate the for loop.

Sign in to comment.

 Accepted Answer

Jan
Jan on 1 May 2021
Edited: Jan on 1 May 2021
Some further improvements:
WSUM = zeros(1,numW);
L1 = U*V.'; % M x M
L2 = (K*K.').^(-1/2); % M x M
L5 = exp(1i * K.' * Wp); % 1 x M
L12 = L1 .* L2; % MOVED out of the loop
% Compute Q terms
Q1 = U * V.'; % M x M
Q2 = X * Y.'; % M x M
Q3 = K * K.'; % M x M
QQ1 = (Q1 - (Q2./Q3)).*Ep; % M x M
QQ2 = (Q1 + (Q2./Q3)).*En; % M x M
% Combine Q terms
QQ = QQ1 + QQ2; % M x M
for wi = 1:numW
% Compute L terms
L3 = 1 / sqrt(W(wi)); % Instead of: L3 = W(wi)^(-1/2)
L4 = exp(K * (1i * W(wi))); % 1 x M
% Combine L terms
% LL = (L1.*L2.*L3).*(L4.'*L5); % M x M
% ([200x200]*[200x200]*[1])*([200x1]*[1x200])
% 40000 * 4 multiplications
LL = L12 .* ((L3 .* L4) * L5); % M x M
% [200x200] .* [1] .* [200 x 1] * [1 x 200]
% 40000 * 2 + 200 multiplications
% combine L and Q terms
WW = LL .* QQ; % M x M
% Compute sum
WSUM(wi) = sum(WW, 'all'); % sum terms
end % end for loop
toc
See:
tic; for k = 1:1e6; y = k^(-1/2); end; toc % 0.134446 seconds.
tic; for k = 1:1e6; y = 1/sqrt(k); end; toc % 0.009798 seconds.
I'm getting from 0.33 sec for your version to 0.28sec.
But we can move the multiplication with L5 out of the loop also:
WSUM = zeros(1,numW);
L1 = U*V.'; % M x M
L2 = 1 ./ sqrt(K * K.'); % M x M
L5 = exp(1i * K.' * Wp); % 1 x M
L125 = L1 .* L2 .* L5; % MOVED out of the loop
% Compute Q terms
Q1 = U * V.'; % M x M
Q2 = X * Y.'; % M x M
Q3 = K * K.'; % M x M
QQ1 = (Q1 - (Q2./Q3)).*Ep; % M x M
QQ2 = (Q1 + (Q2./Q3)).*En; % M x M
% Combine Q terms
QQ = QQ1 + QQ2; % M x M
for wi = 1:numW
% Compute L terms
% L3 = W(wi)^(-1/2); % 1 x 1 scalar
L3 = 1 / sqrt(W(wi));
L4 = exp(K * (1i * W(wi))); % 1 x M
% Combine L terms
LL = L125 .* (L3 .* L4); % M x M: 40200 multiplications
% combine L and Q terms
WW = LL .* QQ; % M x M
% Compute sum
WSUM(wi) = sum(WW, 'all'); % sum terms
end % end for loop
Now I am at 0.15 seconds on my i7 / Matlab R2018b. The original code before moving the contants out takes 1.5 seconds.

1 Comment

Jan,
Just tested the suggested changes and it yielded an additional 14% improvement in overall execution time. This adds up to a total of 81% improvement in speed when combined with dpb's suggestion of moving invariant lines of code outside of the loop.
This really highlights that improving execution time is much more than just ridding one's code of loops. I would have never thought that 1./sqrt() was faster than raising a quantity to the power of (-1/2). Awesome catch!
Much thanks to both of you for your time and assistance!
CW

Sign in to comment.

More Answers (0)

Categories

Find more on Loops and Conditional Statements in Help Center and File Exchange

Products

Release

R2019b

Asked:

on 1 May 2021

Commented:

on 1 May 2021

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!