How to replace a "for loop" with vectorization?

Hi everyone, I have built in Simulink a user-defined function block for solving a set of algebraic nonlinear equations (with sine and cosine). It works alright. However, I used for loop to construct the equations, which is very slow when the system size increases. It takes up to one hour to solve a system consisting of 136 equations with a reasonable accuracy and step size. I have read that vectorizing can enormously speed up the simulation. I was not able to do so, however. Any help is greatly appreciated. This is my code (part of it):
function outt = DAE_Full01(x)
n=68;m=16;
%======= Defining the variables=====================
S=1;
for i1=1:m
Id(i1)=x(S);S=S+1;
end
for i2=1:m
Iq(i2)=x(S);S=S+1;
end
for i3=1:n
V(i3)=x(S);S=S+1;
end
for i4=1:n
TH(i4)=x(S);S=S+1;
end
%===End of defining the variables===============
%=== Construct the equations ================
for i6=1:m
SE1(i6,:)=Edp(i6)-V(i6)*sin(D(i6)-TH(i6))-Rs(i6)*Id(i6)+Xqp(i6)*Iq(i6);
SE2(i6,:)=Eqp(i6)-V(i6)*cos(D(i6)-TH(i6))-Rs(i6)*Iq(i6)-Xdp(i6)*Id(i6);
end
.
.
.
[xx ]=solve...
Where Rs, Xdp, Xqp are constant values. Id, Iq, Edp, Eqp, and D are inputs to the block (variables) with size m. x0 is a vector of initial values already defined. How one can replace the above for loop with vectorization form?

 Accepted Answer

Stephan
Stephan on 3 May 2018
Edited: Stephan on 3 May 2018
Hi,
I finished ;-)
You could change the code as follows:
1.) To replace the nested loop this is needed:
[b, a] = ndgrid(1:m, 1:n);
2.) I made a function for this calculation because the Performance of a function against a script is usually better. I tested both of it and found the function faster. Call the function this way:
[PV1, PV2] = vectorized_run(a, b, m,n, V, TH, Iq, Id, IC, Yabs, Yang, D);
3.) Due to matlab wants defintions of a function behind all other code, define the function as follows:
function [PV1, PV2] = vectorized_run(a, b, m, n, V, TH, Iq, Id, IC, Yabs, Yang, D)
% The first step saves calculation time, because the first part of both nested
% loops is the same for calculating sum1 and sum2 - same for PV_common
sum_common = V(b).*V(a) .* Yabs(1:m,1:n);
sum1 = sum((sum_common .* cos(TH(b)-TH(a)-Yang(1:m,1:n))),2);
sum2 = sum((sum_common .* sin(TH(b)-TH(a)-Yang(1:m,1:n))),2);
PV_common = (Id(1:m) .* V(1:m));
PV1 = diag(PV_common .* (sin(D(1:m)-TH(1:m))) + Iq(1:m).*V(1:m) .* (cos(D(1:m)-TH(1:m))) - IC(1:m,5) - sum1(1:m));
PV2 = diag(PV_common .* (cos(D(1:m)-TH(1:m))) - Iq(1:m).*V(1:m) .* (sin(D(1:m)-TH(1:m))) - IC(1:m,6) - sum2(1:m));
end
Thats it ;-)
When i tested with different sizes of the problem the vectorized code took 33...40% less time to calculate PV1 and PV2. I guess that's a pretty good result.
Please test for your purposes and give me a Feedback to this.
Best regards
Stephan

11 Comments

Hi again,
i tested calculation times for different sizes on my machine:
  • m = 16, n = 68 --> ~ 40...45% faster
  • m = 160, n = 680 --> ~ 45...55% faster
  • m = 1600, n = 6800 --> ~ 20...30% faster
m =
160
n =
680
t_for =
0.0245
t_vec =
0.0122
Improvement_in_percent =
-50.4252
or
m =
1600
n =
6800
t_for =
1.4106
t_vec =
0.9789
Improvement_in_percent =
-30.6025
Best regards
Stephan
Thank you Stephasn for your time and efforts. Could you please upload your modified files so that I look into it? Thanks
Stephan
Stephan on 3 May 2018
Edited: Stephan on 3 May 2018
Hello Ishmael,
When I last wrote, I was in the car and could not answer in detail. Between my comment from this morning, in which I described the worse performance of the vectorized version of your code and the solution that occurred to me, there were not two hours. During this time, I sat in a meeting and could not clarify in time that the problem described is no more.
Unfortunately, you had deleted the files in the meantime and I had no gullibility to look at these. I had not downloaded it yet.
For me that was unimportant at first, because then I was successful to revise your code by means of vectorization. So I did not edit your files, but used your code snippet as a template and created the vectorized version in parallel.
Together with the matrices and vectors (created by random and a little try), which were required for testing, I could then test whether:
1.) I get correct results that match the calculations of your code
2.) How big (or small) is the performance gain over your variant with the nested loops
I have inserted the essential code excerpts in my answer. If you want, I can also look at the whole again in your files. Then you would have to make this available to me again. Another variation would be to just upload what I've written - but that's a bit dirty due to the debugging activity I've done and maybe not necessarily obvious to others.
It does not matter to me, you can tell me what you prefer.
In this context, I would like to point out a question asked by me in the forum, which takes up your problem - but so that your code does not appear there. There was a very interesting answer with possible approaches to further improvements in performance given by John D'Errico which is a power user of Matlab.
You can find it here .
P.S .: If it is important for your dissertation that certain data does not become public before you publish your work, you can also send me an e-mail and send the files along the way. About my user profile, it should be possible to send me an e-mail.
Best regards
Stephan
Dear Stephan, I am uploading the code one more time, only to get some possible improvements but not for sharing. If you come up with an alternative method better than mine, please upload it to this comment at any time. Yes, I am a novice and need help from anyone can help. Please look into it one more time, thankfully. If you found yourself not interested in my problem, that is ok, I appreciate your efforts and time spent on it, thank you for it. If you hopefully find a faster and efficient way than the for loops, please let me know even after a week or a month. If you want to ask the group about it, you are also free. Meanwhile, I will try to read and make it better. Thanks
Hi,
i have now downloaded it - feel free to delete the files.
I tend to finish accepted tasks - I'll get in touch when I'm ready.
In the meantime, it would be interesting to know what your dissertation is about. A short - understandable to laymen - description of your doctoral project would be interesting. I proudly told my wife that I could help you with your work ;-)
Best regards
Stephan
Hi,
first (easier) part is done. As you said, this part is not particularly involved in accelerating the calculation. The performance gain for this small example was in the range of a few %. The share of DAE clearly outweighs. I'll take a look at this file tomorrow - now I'm going to bed.
Best regards
Stephan
Hello,
I'm done with the DAE file. On my computer, the processing time for running the DAE module with size m=4 and n=11 has improved from 856ms to 626ms. That means 230ms faster and is about 26.8%:
I'd expect the changes to be more significant for bigger issues - but you'll have to answer that when you've tested it.
In many places vectorization did not make sense in other places it had a positive impact - the effect of the preallocation of the variables (proposed by John - on my question) is significantly greater. With my knowledge, I can not improve the code any more - maybe others here can do some more.
My other suggestions are:
1.) At the moment the two lines of code which declare the global variables (I made 2 of them) have the largest share - with almost 42% (corresponding to 262ms (!) For the problem size of m = 4, n = 11 - on my machine) of the computation time in the DAE file. Maybe you should try to find a more efficient approach to this - I would ask that in your place in the forum. That would save you a lot of computing time. This is currently the biggest problem in your code in terms of performance.
2.) I also wanted to solve the problem with sum sum3 and sum4 like with sum1 and sum2 (which worked very well) - but I did not succeed. If you solve this problem, you will win about 7.5% (equivalent to 47ms for the problem size of m = 4, n = 11 - on my machine).
3.) What you could still try out would be to rewrite the existing for loops in the DAE file with the parallel computing toolbox in parfor loops and see if that solves the problem more efficiently. But I have no experience - you just have to try it.
If you only could solve the problems 1.) and 2.) i guess you can save about 200...250ms or even more.
Please keep me up to date on the performance of major problems and whether you have come along with the approach of parallel calculations.
I will send you the edited file by e-mail..
Bye for now
Stephan
Thank you so much Stephan. That is a lot of work and time you spent on it, really appreciate it. I like your above analysis too. Note that I have already tried parfor but it was worse than using for, at least for my code. Thanks one more time.
Maybe you hear from me next time again. Got some ideas but i want to try first and think about it
Best regards
Stephan
Stephan
Stephan on 6 May 2018
Edited: Stephan on 6 May 2018
Hi Ismael,
i have some pretty good news for you. i could solve the problems regarding the global variables and improve the performance of sum3 and sum4 so that we got this result:
To reach this all funtions are now in only .m-file. The overall execution time reaced <1s and the part for solving DAE is now clearly below 600ms - in the attached test it is 576ms.
You should still keep both variants, because my computer today obviously has a good day and the variant of previously also achieved very similar results. I can not say which of the two variants works better in the end for larger problems - please test. Therefore, I will send you 3 files by email:
1.) Main_vec.m & DAE_vec.m
These now additonaly include the improved code for sum3 and sum4 for optimal performance, but still work with global variables.
2.) solve_DAE_fast.m
This file contains all the improvements that have been incorporated into the other variant, but eliminates the global variables. If you want to know more about how this works, have a look at this link:
So my suggestions for the improvement of your code are implemented and now I really have nothing more to improve ... (or maybe even later? ;-)
Best regards and keep me up to date please
Stephan

Sign in to comment.

More Answers (1)

Stephan
Stephan on 1 May 2018
Edited: Stephan on 3 May 2018
Hi,
You can find a good start to this topic in the link below.
You assign the decision variables x(1) ... x(168) in the 4 loops by increasing S successively. Instead you could omit S and take advantage of the known quantities of m and n like this:
Id(1:m)=x(1:m); % x(1) ... x(16)
Iq(1:m)=x(m+1:2*m); % x(17) ... x(32)
V(1:n)=x(2*m+1:2*m+n); % x(33) ... x(100)
TH(1:n)=x(2*m+1+n:2*m+2*n); % x(101) ... x(168))
I hope that's what you wanted to achieve in your code.
The equations below in the code should also work without the for loop:
SE1(1:m,:)=Edp(1:m)-V(1:m)*sin(D(1:m)-TH(1:m))-Rs(1:m)*Id(1:m)+Xqp(1:m)*Iq(1:m);
SE2(1:m,:)=Eqp(1:m)-V(1:m)*cos(D(1:m)-TH(1:m))-Rs(1:m)*Iq(1:m)-Xdp(1:m)*Id(1:m);
Since I could not test the code, I would be very grateful for any feedback as to whether it works. I have found that with every help that I offer here in the forum, I am learning and would be happy if you contributed with your feedback.
I would also be interested to know if you have seen an improvement in the performance of your code and how large it is.
Best regards
Stephan

10 Comments

Thank you Stephan for your brilliant idea. To run my entire system, I need to modify this last for-loop as what you proposed for the first loop:
for i6=1:m
sum1=0;sum2=0;
for k=1:n
sum1=sum1+ V(i6)*V(k)*Yabs(i6,k)*cos(TH(i6)-TH(k)-Yang(i6,k));
sum2=sum2+ V(i6)*V(k)*Yabs(i6,k)*sin(TH(i6)-TH(k)-Yang(i6,k));
end
sum1(i6)=sum1;sum2(i6)=sum2;
PV1(i6,:)=Id(i6)*V(i6)*sin(D(i6)-TH(i6))+Iq(i6)*V(i6)*cos(D(i6)-TH(i6))-IC(i6,5)-sum1(i6);
PV2(i6,:)=Id(i6)*V(i6)*cos(D(i6)-TH(i6))-Iq(i6)*V(i6)*sin(D(i6)-TH(i6))-IC(i6,6)-sum2(i6);
end
Note that sum1 and sum2 are used later after closing the loop. Note also IC is initial condition matrix with size nX6, Yabs and Yang are constant matrices with sizes nXn. If you can find a similar alternative to the above loop, you will solve a major problem in my system. I will have to include your name in my dissertation as an acknowledgment letter. Please your help Thanks in advance
Hi,
nice that i could help.
I'll think about this. But now I have to go to work. I probably will not come to this today - but I'll stick to it.
Best regards
Stephan
I appreciate it, Stephan. Please at your convenient time. Thanks
Hi,
i have some questions about the code, as I suspect there might be inconsistencies. Please help me to understand:
1.) sum1 & sum2 are scalars and should be scalars?
2.) after every run through the inner loop is finished your code says:
sum1(i6)=sum1
sum2(i6)=sum2
this will turn your scalar values for sum1 and sum2 into a vector which looks like this:
i6 = 1:
sum1 = [Value_of_sum1]
sum2 = [Value_of_sum2]
i6 = 2:
sum1 = [Value_of_sum1, Value_of_sum1]
sum2 = [Value_of_sum2, Value_of_sum2]
i6 = 3
sum1 = [Value_of_sum1, 0, Value_of_sum1]
sum2 = [Value_of_sum2, 0, Value_of_sum2]
i6 = 4
sum1 = [Value_of_sum1, 0, 0, Value_of_sum1]
sum2 = [Value_of_sum2, 0, 0, Value_of_sum2]
...
i6 = 16:
sum1 = [Value_of_sum1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, Value_of_sum1]
sum2 = [Value_of_sum2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, Value_of_sum2]
Is this what you wanted to do?
At the moment, this does not play a decisive role in the coding you are using because you also call this index in the formulas for PV1 and PV2:
PV1 (i6, :) = ... -sum1 (i6)
PV2 (i6, :) = ... -sum2 (i6)
If I understood that correctly, you calculate the scalars sum1 and sum2 over 68 loop passes in the inner loop. Repeat this procedure 16 times with the outer loop, with the summations sum1 and sum2 starting again from zero. With these scalars you go into the calculation of the values for PV1 and PV2, where each one comes out a vector that is 16x1 large.
If that was really understood by me, you could therefore consider the sums sum1 and sum2 independently of the rest of the calculation and represent the sums as two vectors, each 16x1 large and can be traversed in the calculation of PV1 and PV2 , I suspect that is also thought so. Currently this does not happen in the form - see above.
Before it makes any sense to continue thinking about the vectorization, I want to understand the whole and to eliminate possible errors that are contained.
Please try to explain it to me clearly.
Best regards
Stephan
Dear Stephan, Thank you for your detailed reply. Please note that sum1 sum2 are vectors. Yes, they are scalar inside the loop. To make it easy for you, I attached my simplified code for a small system (11 nodes), which takes only a few seconds to get it solved. However, I want to solve a system with 2007 nodes, and with the detailed model which is double than the attached one. Also, my program must be working in Simulink, it is working but takes hours. So, my attempt is to vectorize the for-loops inside the second file called DAE. The loops in the initial values calculation phase can be easily converted to vector as you proposed; I can do that part later. Please note that for some reasons, I want to delete the attached files after the end of this discussion, if allowed.
Hi,
there is already progress in vectorizing your code, but I'm not quite finished yet. At the same time, the topic described by me on sum1 and sum2 as a "waste product" is also solved. I'll take a look at the files and also delete them when I do not need them anymore.
Best regards
Stephan
Ismaeel
Ismaeel on 2 May 2018
Edited: Ismaeel on 2 May 2018
That is good news, so excited to see it vectorized. Please note that, for some reasons, I do not get a notification when there is a new comment to my post; so I have to check it regularly, sorry if I lately reply. Thanks
Hi,
I'm done, but I'm still not satisfied with the result. I have generated needed matrices and vectors by means of a random generator and then saved these values as a .mat file. It turns out that the vectorized code does not run faster, or even slower, to be precise. I'm amazed about that and would like to understand why that is. I am also considering whether to post these two code snippets together with my .mat file as a question here in the forum, because it can be that the very experienced power users here can solve this new unexpected problem - provided you agree. I had not expected such a result. First, I'll see if I can find a solution myself.
Best regards
Stephan
Thanks, Stephan. You did your best, and sorry for taking your time. Please just delete the files, appreciate it. I don't think generating matrices is precise or practical. If you want to ask about the problem, please don't mention this post or a piece of the code. Thanks once again.
Stephan
Stephan on 3 May 2018
Edited: Stephan on 3 May 2018
Hi,
i could find the issue... the sums sum1 and sum2 work fine now. i saved about 50% calculation time for the loops for the m=68 and n=16 dimensions. Excited to hear about the runtime behavior for larger problems.
This works for both sums, but still i have to look at the PV values.
The matrices and vectors i have made are only for comparision of the results with your formulas to eliminate possible errors that i could have made. If my calculation brings the same values as your calculation i made no errors. I guess that should work for this purpose.
I keep you updated ;-)
Best regards
Stephan

Sign in to comment.

Categories

Find more on Loops and Conditional Statements in Help Center and File Exchange

Asked:

on 30 Apr 2018

Edited:

on 8 Jun 2019

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!