how to vectorize a for-loop or use parfor to speed it up?
Show older comments
I have this for loop right now:
for x = 2:Nx_max-1
for y = 2:Ny_max-1
lp(x,y,t1) = (1/(2*h^2))*dxy(x,y)*p(x+1,y+1,t1) + ...
(dxx(x,y)/(h^2) - vx(x,y)/(2*h) - (x-Nx-1)/(2*gamma))* p(x+1,y,t1) - ...
(1/(2*h^2))*dxy(x,y)*p(x+1,y-1,t1) + (dyy(x,y)/(h^2) - ...
vy(x,y)/(2*h) - (y-Ny-1)/(2*gamma))*p(x,y+1,t1) - ...
(2*dxx(x,y) + 2*dyy(x,y))*(1/h^2)*p(x,y,t1) + ...
(dyy(x,y)/(h^2) + vy(x,y)/(2*h) +(y-Ny-1)/(2*gamma))*p(x,y-1,t1) - ...
(dxy(x,y)/(2*h^2))*p(x-1,y+1,t1) +(dxx(x,y)/(h^2) +vx(x,y)/(2*h) +...
(x-Nx-1)/(2*gamma))*p(x-1,y,t1) + (dxy(x,y)/(2*h^2))*p(x-1,y-1,t1);
end
end
I asked a question previously where I didn't actually write out the full function here, and I was recommended to vectorize it for speed by replacing x by 2:(Nx_max-1) and y by 2:(Ny_max-1). This is causing problems because it looks like the multiplications would cause it to be matrix multiplication instead of multiplying the specific elements of the matrix that I'm interested in mylitplying together. And I also have a section where there ends up being a vector (2:(Nx_max-1)) multiplying a matrix. I don't want the end result to be a vector (as it would in regular matlab notation), I want each element of the matrix in that case to be multiplied by its x value.
Is there a way to vectorize this or somehow speed up the process?
I had thought about parallel computation, but it looks like I can't nest parfor loops.
Any suggestions would be appreciated
Answers (1)
An elementwise multiplication is performed by the .* operator, while * is a matrix multiplication. This should solve your problem already. Please post your trial to vectorize the code instead of letting us create it from scratch.
An optimization of code depends on the input values. It matters if Nx_max is 10 or 1e7. It matters if dxy and gamma are arrays or functions. So please post a version of your code, which can be started by copy&paste. Otherwise suggestions contains a lot of guessing.
But if guessing is required, consider the standard tips fopr loop optimization at first:
- avoid repeated calculations
- pre-allocate the output
E.g. (1/(2*h^2)) is calculated nearly Ny_max * Nx_max times. So better create a variable before the loop to carry the result. So befpre creating the vectorized version, measure the speed with a cleaner version like this:
hs = h^2;
h2 = 2 * h;
hs2 = 2 * hs;
c1 = 1/hs2;
g2 = 2 * gamma;
lp = zeros(Nx_max-1, Ny_max-1, t1);
for x = 2:Nx_max-1
c2 = (x-Nx-1)/g2;
for y = 2:Ny_max-1
lp(x,y,t1) = c1 * dxy(x,y) * p(x+1,y+1,t1) + ...
(dxx(x,y) / hs - vx(x,y) / h2 - c2) * p(x+1,y,t1) - ...
c1 * dxy(x,y) * p(x+1,y-1,t1) + ...
(dyy(x,y)/hs - vy(x,y)/h2 - (y-Ny-1)/g2) * p(x,y+1,t1) - ...
(2 * dxx(x,y) + 2 * dyy(x,y)) * p(x,y,t1) / hs + ...
(dyy(x,y)/hs + vy(x,y)/h2 +(y-Ny-1)/g2) * p(x,y-1,t1) - ...
dxy(x,y) / hs2 * p(x-1,y+1,t1) + ...
(dxx(x,y)/hs + vx(x,y) / h2 + c2) * p(x-1,y,t1) + ...
dxy(x,y) / hs2 * p(x-1,y-1,t1);
end
end
Then, in a next step, replace the loop over x by replacing all "x" by "(2:Nx_max-1)" and the operators "*" and "/" by their elementwise versions ".*" and "./". Test, if the result is still not changed except for rounding errors. Fix "(2:Nx_max-1)+1" to the faster "3:Nx_max", because in the 2nd version the index vector is not calculated exlicitly.
Finally you can try to replace the inner loop also, but perform a speed test afterwards. A complete vectorization is not necessarily the fastest version.
4 Comments
sasha
on 10 Aug 2014
Jan
on 10 Aug 2014
I still do not understand, why the elementwise mutliplications "did not work". Please post your code and the corresponding error message. Actually replacing the outer loop by inserting the index vector should be trivial. And I'd actually did this for you, if you post a running example, such that I could debug my suggestion.
Jan
on 10 Aug 2014
Here a guessed version of a vectorized outer loop:
hs = h^2;
h2 = 2 * h;
hs2 = 2 * hs;
g2 = 2 * gamma;
Nx1 = Nx_max - 1;
Nx2 = Nx_max - 2;
c2 = ((2 - Nx-1):(Nx_max - 2 - Nx)) / g2;
lp = zeros(Nx1, Ny_max-1, t1);
for y = 2:Ny_max-1
c3 = dxy(2:Nx1,y) ./ hs2;
c4 = dxy(2:Nx1,y) ./ hs2;
c5 = vy(2:Nx1,y) ./ h2;
c6 = vx(2:Nx1,y) ./ h2;
c7 = dyy(2:Nx1,y) ./ hs;
lp(2:Nx1, y, t1) = ...
c3 .* p(3:Nx_max,y+1,t1) + ...
(dxx(2:Nx1,y) ./ hs - c6 - c2) .* p(3:Nx_max,y,t1) - ...
c3 .* p(3:Nx_max,y-1,t1) + ...
(c7 - c5 - (y-Ny-1) ./ g2) .* p(2:Nx1,y+1,t1) - ...
(dxx(2:Nx1,y) + dyy(2:Nx1,y)) .* p(2:Nx1,y,t1) .* (2 ./ hs) + ...
(c7 + c5 + (y-Ny-1) ./ g2) .* p(2:Nx1,y-1,t1) - ...
c4 .* p(1:Nx2,y+1,t1) + ...
(dxx(2:Nx1,y) ./ hs + c6 + c2) .* p(1:Nx2,y,t1) + ...
c4 .* p(1:Nx2,y-1,t1);
end
Unfortunately I'm in doubt that this produces still the same results, because this kind of editing is prone to typos. But you see the strategy and can replace the repeated calculations step by step by your own and check, if teh results are not changed.
Categories
Find more on Loops and Conditional Statements in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!