How to replace a "for loop" with vectorization?

Question

0 votes

Hi everyone, I have built in Simulink a user-defined function block for solving a set of algebraic nonlinear equations (with sine and cosine). It works alright. However, I used for loop to construct the equations, which is very slow when the system size increases. It takes up to one hour to solve a system consisting of 136 equations with a reasonable accuracy and step size. I have read that vectorizing can enormously speed up the simulation. I was not able to do so, however. Any help is greatly appreciated. This is my code (part of it):

function outt = DAE_Full01(x)
n=68;m=16;
%======= Defining the variables=====================
S=1;
for i1=1:m
Id(i1)=x(S);S=S+1;
end
for i2=1:m
Iq(i2)=x(S);S=S+1;
end
for i3=1:n
V(i3)=x(S);S=S+1;
end
for i4=1:n
TH(i4)=x(S);S=S+1;
end
%===End of defining the variables===============
%=== Construct the equations ================
for i6=1:m
SE1(i6,:)=Edp(i6)-V(i6)*sin(D(i6)-TH(i6))-Rs(i6)*Id(i6)+Xqp(i6)*Iq(i6);
SE2(i6,:)=Eqp(i6)-V(i6)*cos(D(i6)-TH(i6))-Rs(i6)*Iq(i6)-Xdp(i6)*Id(i6);
end
.
.
.
[xx ]=solve...

Where Rs, Xdp, Xqp are constant values. Id, Iq, Edp, Eqp, and D are inputs to the block (variables) with size m. x0 is a vector of initial values already defined. How one can replace the above for loop with vectorization form?

0 Comments
Show -2 older comments Hide -2 older comments

Sign in to comment.

Sign in to answer this question.

Follow Question

Answer 1

Stephan on 3 May 2018

Edited: Stephan on 3 May 2018

Open in MATLAB Online

1 vote

Hi,

I finished ;-)

You could change the code as follows:

1.) To replace the nested loop this is needed:

[b, a] = ndgrid(1:m, 1:n);

2.) I made a function for this calculation because the Performance of a function against a script is usually better. I tested both of it and found the function faster. Call the function this way:

[PV1, PV2] = vectorized_run(a, b, m,n, V, TH, Iq, Id, IC, Yabs, Yang, D);

3.) Due to matlab wants defintions of a function behind all other code, define the function as follows:

function [PV1, PV2] = vectorized_run(a, b, m, n, V, TH, Iq, Id, IC, Yabs, Yang, D)
% The first step saves calculation time, because the first part of both nested
% loops is the same for calculating sum1 and sum2 - same for PV_common
sum_common = V(b).*V(a) .* Yabs(1:m,1:n);
sum1 = sum((sum_common .* cos(TH(b)-TH(a)-Yang(1:m,1:n))),2);
sum2 = sum((sum_common .* sin(TH(b)-TH(a)-Yang(1:m,1:n))),2);
PV_common = (Id(1:m) .* V(1:m));
PV1 = diag(PV_common .* (sin(D(1:m)-TH(1:m))) + Iq(1:m).*V(1:m) .* (cos(D(1:m)-TH(1:m))) - IC(1:m,5) - sum1(1:m));
PV2 = diag(PV_common .* (cos(D(1:m)-TH(1:m))) - Iq(1:m).*V(1:m) .* (sin(D(1:m)-TH(1:m))) - IC(1:m,6) - sum2(1:m));
end

Thats it ;-)

When i tested with different sizes of the problem the vectorized code took 33...40% less time to calculate PV1 and PV2. I guess that's a pretty good result.

Please test for your purposes and give me a Feedback to this.

Best regards

Stephan

11 Comments
Show 9 older comments Hide 9 older comments

Stephan on 3 May 2018

Edited: Stephan on 3 May 2018

Hello Ishmael,

When I last wrote, I was in the car and could not answer in detail. Between my comment from this morning, in which I described the worse performance of the vectorized version of your code and the solution that occurred to me, there were not two hours. During this time, I sat in a meeting and could not clarify in time that the problem described is no more.

Unfortunately, you had deleted the files in the meantime and I had no gullibility to look at these. I had not downloaded it yet.

For me that was unimportant at first, because then I was successful to revise your code by means of vectorization. So I did not edit your files, but used your code snippet as a template and created the vectorized version in parallel.

Together with the matrices and vectors (created by random and a little try), which were required for testing, I could then test whether:

1.) I get correct results that match the calculations of your code

2.) How big (or small) is the performance gain over your variant with the nested loops

I have inserted the essential code excerpts in my answer. If you want, I can also look at the whole again in your files. Then you would have to make this available to me again. Another variation would be to just upload what I've written - but that's a bit dirty due to the debugging activity I've done and maybe not necessarily obvious to others.

It does not matter to me, you can tell me what you prefer.

In this context, I would like to point out a question asked by me in the forum, which takes up your problem - but so that your code does not appear there. There was a very interesting answer with possible approaches to further improvements in performance given by John D'Errico which is a power user of Matlab.

You can find it here .

P.S .: If it is important for your dissertation that certain data does not become public before you publish your work, you can also send me an e-mail and send the files along the way. About my user profile, it should be possible to send me an e-mail.

Best regards

Stephan

Stephan on 4 May 2018

Hello,

I'm done with the DAE file. On my computer, the processing time for running the DAE module with size m=4 and n=11 has improved from 856ms to 626ms. That means 230ms faster and is about 26.8%:

I'd expect the changes to be more significant for bigger issues - but you'll have to answer that when you've tested it.

In many places vectorization did not make sense in other places it had a positive impact - the effect of the preallocation of the variables (proposed by John - on my question) is significantly greater. With my knowledge, I can not improve the code any more - maybe others here can do some more.

My other suggestions are:

1.) At the moment the two lines of code which declare the global variables (I made 2 of them) have the largest share - with almost 42% (corresponding to 262ms (!) For the problem size of m = 4, n = 11 - on my machine) of the computation time in the DAE file. Maybe you should try to find a more efficient approach to this - I would ask that in your place in the forum. That would save you a lot of computing time. This is currently the biggest problem in your code in terms of performance.

2.) I also wanted to solve the problem with sum sum3 and sum4 like with sum1 and sum2 (which worked very well) - but I did not succeed. If you solve this problem, you will win about 7.5% (equivalent to 47ms for the problem size of m = 4, n = 11 - on my machine).

3.) What you could still try out would be to rewrite the existing for loops in the DAE file with the parallel computing toolbox in parfor loops and see if that solves the problem more efficiently. But I have no experience - you just have to try it.

If you only could solve the problems 1.) and 2.) i guess you can save about 200...250ms or even more.

Please keep me up to date on the performance of major problems and whether you have come along with the approach of parallel calculations.

I will send you the edited file by e-mail..

Bye for now

Stephan

Stephan on 4 May 2018

Maybe you hear from me next time again. Got some ideas but i want to try first and think about it

Best regards

Stephan

Stephan on 6 May 2018

Edited: Stephan on 6 May 2018

Hi Ismael,

i have some pretty good news for you. i could solve the problems regarding the global variables and improve the performance of sum3 and sum4 so that we got this result:

To reach this all funtions are now in only .m-file. The overall execution time reaced <1s and the part for solving DAE is now clearly below 600ms - in the attached test it is 576ms.

You should still keep both variants, because my computer today obviously has a good day and the variant of previously also achieved very similar results. I can not say which of the two variants works better in the end for larger problems - please test. Therefore, I will send you 3 files by email:

1.) Main_vec.m & DAE_vec.m

These now additonaly include the improved code for sum3 and sum4 for optimal performance, but still work with global variables.

2.) solve_DAE_fast.m

This file contains all the improvements that have been incorporated into the other variant, but eliminates the global variables. If you want to know more about how this works, have a look at this link:

https://de.mathworks.com/help/matlab/math/parameterizing-functions.html

So my suggestions for the improvement of your code are implemented and now I really have nothing more to improve ... (or maybe even later? ;-)

Best regards and keep me up to date please

Stephan

Sign in to comment.

Answer 2

Stephan on 1 May 2018

Edited: Stephan on 3 May 2018

Open in MATLAB Online

0 votes

Hi,

You can find a good start to this topic in the link below.

https://de.mathworks.com/help/matlab/matlab_prog/vectorization.html?searchHighlight=vectorization&s_tid=doc_srchtitle

You assign the decision variables x(1) ... x(168) in the 4 loops by increasing S successively. Instead you could omit S and take advantage of the known quantities of m and n like this:

Id(1:m)=x(1:m);               % x(1) ... x(16)
Iq(1:m)=x(m+1:2*m);           % x(17) ... x(32)
V(1:n)=x(2*m+1:2*m+n);        % x(33) ... x(100)
TH(1:n)=x(2*m+1+n:2*m+2*n);   % x(101) ... x(168))

I hope that's what you wanted to achieve in your code.

The equations below in the code should also work without the for loop:

SE1(1:m,:)=Edp(1:m)-V(1:m)*sin(D(1:m)-TH(1:m))-Rs(1:m)*Id(1:m)+Xqp(1:m)*Iq(1:m);
SE2(1:m,:)=Eqp(1:m)-V(1:m)*cos(D(1:m)-TH(1:m))-Rs(1:m)*Iq(1:m)-Xdp(1:m)*Id(1:m);

Since I could not test the code, I would be very grateful for any feedback as to whether it works. I have found that with every help that I offer here in the forum, I am learning and would be happy if you contributed with your feedback.

I would also be interested to know if you have seen an improvement in the performance of your code and how large it is.

Best regards

Stephan

10 Comments
Show 8 older comments Hide 8 older comments

Stephan on 2 May 2018

Edited: Stephan on 2 May 2018

Open in MATLAB Online

Hi,

i have some questions about the code, as I suspect there might be inconsistencies. Please help me to understand:

1.) sum1 & sum2 are scalars and should be scalars?

2.) after every run through the inner loop is finished your code says:

    sum1(i6)=sum1
    sum2(i6)=sum2

this will turn your scalar values for sum1 and sum2 into a vector which looks like this:

i6 = 1:

sum1 = [Value_of_sum1]
sum2 = [Value_of_sum2]

i6 = 2:

sum1 = [Value_of_sum1, Value_of_sum1]
sum2 = [Value_of_sum2, Value_of_sum2]

i6 = 3

sum1 = [Value_of_sum1, 0, Value_of_sum1]
sum2 = [Value_of_sum2, 0, Value_of_sum2]

i6 = 4

sum1 = [Value_of_sum1, 0, 0, Value_of_sum1]
sum2 = [Value_of_sum2, 0, 0, Value_of_sum2]

...

i6 = 16:

sum1 = [Value_of_sum1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, Value_of_sum1]
sum2 = [Value_of_sum2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, Value_of_sum2]

Is this what you wanted to do?

At the moment, this does not play a decisive role in the coding you are using because you also call this index in the formulas for PV1 and PV2:

PV1 (i6, :) = ... -sum1 (i6)
PV2 (i6, :) = ... -sum2 (i6)

If I understood that correctly, you calculate the scalars sum1 and sum2 over 68 loop passes in the inner loop. Repeat this procedure 16 times with the outer loop, with the summations sum1 and sum2 starting again from zero. With these scalars you go into the calculation of the values for PV1 and PV2, where each one comes out a vector that is 16x1 large.

If that was really understood by me, you could therefore consider the sums sum1 and sum2 independently of the rest of the calculation and represent the sums as two vectors, each 16x1 large and can be traversed in the calculation of PV1 and PV2 , I suspect that is also thought so. Currently this does not happen in the form - see above.

Before it makes any sense to continue thinking about the vectorization, I want to understand the whole and to eliminate possible errors that are contained.

Please try to explain it to me clearly.

Best regards

Stephan

Ismaeel on 3 May 2018

Thanks, Stephan. You did your best, and sorry for taking your time. Please just delete the files, appreciate it. I don't think generating matrices is precise or practical. If you want to ask about the problem, please don't mention this post or a piece of the code. Thanks once again.

Stephan on 3 May 2018

Edited: Stephan on 3 May 2018

Hi,

i could find the issue... the sums sum1 and sum2 work fine now. i saved about 50% calculation time for the loops for the m=68 and n=16 dimensions. Excited to hear about the runtime behavior for larger problems.

This works for both sums, but still i have to look at the PV values.

The matrices and vectors i have made are only for comparision of the results with your formulas to eliminate possible errors that i could have made. If my calculation brings the same values as your calculation i made no errors. I guess that should work for this purpose.

I keep you updated ;-)

Best regards

Stephan

Sign in to comment.

How to replace a "for loop" with vectorization?

0 Comments
Show -2 older comments Hide -2 older comments

Accepted Answer

11 Comments
Show 9 older comments Hide 9 older comments

More Answers (1)

10 Comments
Show 8 older comments Hide 8 older comments

Categories

Products

Tags

Community Treasure Hunt

How to replace a "for loop" with vectorization?

0 Comments Show -2 older comments Hide -2 older comments

Accepted Answer

11 Comments Show 9 older comments Hide 9 older comments

More Answers (1)

10 Comments Show 8 older comments Hide 8 older comments

Categories

Products

Tags

See Also

Community Treasure Hunt

0 Comments
Show -2 older comments Hide -2 older comments

11 Comments
Show 9 older comments Hide 9 older comments

10 Comments
Show 8 older comments Hide 8 older comments