Thread Subject:
Avoiding loops

Subject: Avoiding loops

From: Jose

Date: 14 Jan, 2009 09:58:02

Message: 1 of 7

Can anyone help me with this? I want to make this code more efficient.
Can I vectorize the first loop, over the parameter mu?

Thanks,

Jose.

    nv=1000;
    nx=2000;
    x=rand(nx,2); % data 2D
    sigma=cov(x)
    y=zeros(1,nx);

    for j=1:nv % Loop in free parameter mu
        mu= [rand(1,1) rand(1,1)];

        for i=1:nx % Loop to calculate the likelihood function (lf)
            ir=round(1+(nx-1)*rand(1,1));
            y(i)=(x(ir,:)-mu)*inv(sigma)*(x(ir,:)-mu)';
        end

        lf=0.5*sum(y)+nx*0.5*log((det(sigma)))+nx*0.5*d*log(2.0*pi); % likelihood
        p1(j)=lf;
    end
    p=exp(-sum(p1(j))/nv)
    

Subject: Avoiding loops

From: Doug Hull

Date: 19 Jan, 2009 19:56:01

Message: 2 of 7

This code did not run as posted, but it did after I added

d = 1;

to the beginning.

Using the profiler, I saw that this line:

> y(i)=(x(ir,:)-mu)*inv(sigma)*(x(ir,:)-mu)';

was the problem. I did not know which part was slow, so I broke it into a smaller set of lines:

    aaa = (x(ir,:)-mu);
    bbb = inv(sigma);
    ccc = (x(ir,:)-mu)';
    y(i)=aaa*bbb*ccc;

Now the profiler shows that the bottleneck is the line that computes bbb.

This value never changes, so there is no reason to calculate it every time through the nested for loops.

I moved it outside the loops, and cut 50% of the time.
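
A minimal sketch of how such a comparison could be timed with tic/toc, assuming the same kind of 2-D data as in the thread. It is not the code that was profiled above; the names sigmaInv, yA, yB, tA and tB are illustrative, and the timings will vary by machine and MATLAB release.

```matlab
% Compare recomputing inv(sigma) inside the loop against computing it once.
nx = 2000;
x = rand(nx, 2);          % 2-D data, as in the original post
sigma = cov(x);
mu = rand(1, 2);

% Version A: inv(sigma) is recomputed on every iteration.
yA = zeros(1, nx);
tic
for i = 1:nx
    v = x(i, :) - mu;
    yA(i) = v * inv(sigma) * v';
end
tA = toc;

% Version B: the inverse is computed once, before the loop.
sigmaInv = inv(sigma);
yB = zeros(1, nx);
tic
for i = 1:nx
    v = x(i, :) - mu;
    yB(i) = v * sigmaInv * v';
end
tB = toc;

fprintf('inside loop: %.4f s, hoisted: %.4f s\n', tA, tB);
```

Both versions compute the same quadratic forms, so yA and yB should agree to rounding error; only the run time differs.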

Doug

FINAL CODE:

-------------
d = 1;
nv=500;
nx=400;
x=rand(nx,2); % data 2D
sigma=cov(x);
bbb = inv(sigma); %%%%% MOVED
y=zeros(1,nx);

for j=1:nv % Loop in free parameter mu
    mu= [rand(1,1) rand(1,1)];

    for i=1:nx % Loop to calculate the likelihood function (lf)
        ir=round(1+(nx-1)*rand(1,1));
        aaa = (x(ir,:)-mu);

        ccc = (x(ir,:)-mu)';
        y(i)=aaa*bbb*ccc;
    end

    lf=0.5*sum(y)+nx*0.5*log((det(sigma)))+nx*0.5*d*log(2.0*pi); % likelihood
    p1(j)=lf;
end
p=exp(-sum(p1(j))/nv)
---------------------
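
The original question asked about vectorizing. One possible sketch, not taken from the thread, removes the inner loop over i by drawing all nx random row indices at once and evaluating every quadratic form with matrix operations. The names sigmaInv, constTerm and diffs are illustrative assumptions, and the final computation of p is left out.

```matlab
d = 1;
nv = 500;
nx = 400;
x = rand(nx, 2);                       % 2-D data
sigma = cov(x);
sigmaInv = inv(sigma);                 % computed once, outside all loops

% Terms of lf that do not depend on mu are also computed once.
constTerm = nx*0.5*log(det(sigma)) + nx*0.5*d*log(2.0*pi);

p1 = zeros(1, nv);                     % preallocate the result
for j = 1:nv                           % loop over the free parameter mu
    mu = rand(1, 2);
    ir = round(1 + (nx-1)*rand(nx, 1));      % nx random row indices at once
    diffs = bsxfun(@minus, x(ir, :), mu);    % nx-by-2 matrix of (x - mu)
    % Row-wise quadratic form: y(k) = diffs(k,:) * sigmaInv * diffs(k,:)'
    y = sum((diffs * sigmaInv) .* diffs, 2);
    p1(j) = 0.5*sum(y) + constTerm;
end
```

The expression sum((diffs*sigmaInv).*diffs, 2) returns the diagonal of diffs*sigmaInv*diffs' without forming the full nx-by-nx product, which is why it can stand in for the inner loop.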

Subject: Avoiding loops

From: Jose

Date: 20 Jan, 2009 12:30:04

Message: 3 of 7

Dear Doug,
thanks very much for your help and for correcting my misprint;
the code now runs more efficiently.
I would still like to reduce its run time, though.
Do you think using (parfor) instead of (for) in one of the loops, with the
Parallel Computing Toolbox, could reduce the time for this code?
Thanks in advance,
Jose.

Subject: Avoiding loops

From: Doug Hull

Date: 20 Jan, 2009 14:42:01

Message: 4 of 7



Jose,

I would caution against premature optimization. Get your algorithm working and then see if it needs optimization. People often spend time optimizing code without knowing whether it really needs it.

That being said, your code is a candidate for an "embarrassingly parallel" approach. Essentially, you are doing a fixed number of iterations, 2,000,000 (nv * nx). Each of these iterations is completely independent of the others. Thus, you could do them in any order, or all at the same time (if you had 2,000,000 MATLAB sessions).

Basically, the more processors you throw at this, the faster it will go.

Here is a job time estimator that I made a few years ago. It should still give a reasonable estimate of how long a job will take on a cluster.

http://www.mathworks.com/matlabcentral/fileexchange/13075

-Doug

Subject: Avoiding loops

From: Jose

Date: 21 Jan, 2009 15:45:03

Message: 5 of 7

Hello Doug, thanks for your reply.
I have now installed the Parallel Computing Toolbox to try it, but something is wrong
in my approach, because parallelizing does not improve the run time:

%code without parallelization

clc
clear all
d = 1;
nv=1000;
nx=2000;
x=rand(nx,2); % data 2D
sigma=cov(x);
bbb = inv(sigma); %%%%% MOVED
y=zeros(1,nx);

tic
 for j=1:nv % Loop in free parameter mu
 mu= [rand(1,1) rand(1,1)];

 for i=1:nx % Loop to calculate the likelihood function (lf)
 ir=round(1+(nx-1)*rand(1,1));
 aaa = (x(ir,:)-mu);

 ccc = (x(ir,:)-mu)';
 y(i)=aaa*bbb*ccc;
 end

 lf=0.5*sum(y)+nx*0.5*log((det(sigma)))+nx*0.5*d*log(2.0*pi); % likelihood
 p1(j)=lf;
 end
 p=exp(-sum(p1(j))/nv)
 
 toc

22.60 sec.



%code using parallelization

Now I start matlabpool and replace (for) with (parfor) in the second loop, over nx:


Starting matlabpool using the parallel configuration 'local'.
Waiting for parallel job to start...
Connected to a matlabpool session with 4 labs.

clc
clear all
d = 1;
nv=1000;
nx=2000;
x=rand(nx,2); % data 2D
sigma=cov(x);
bbb = inv(sigma); %%%%% MOVED
y=zeros(1,nx);

tic
 for j=1:nv % Loop in free parameter mu
 mu= [rand(1,1) rand(1,1)];

 parfor (i=1:nx) % the only line I modified in my code
 ir=round(1+(nx-1)*rand(1,1));
 aaa = (x(ir,:)-mu);

 ccc = (x(ir,:)-mu)';
 y(i)=aaa*bbb*ccc;
 end

 lf=0.5*sum(y)+nx*0.5*log((det(sigma)))+nx*0.5*d*log(2.0*pi); % likelihood
 p1(j)=lf;
 end
 p=exp(-sum(p1(j))/nv)
 
 toc

101.39 sec

The parallelized version is much slower.
Please, Doug, do you know what I am doing wrong?

My laptop is a Centrino Duo, i.e. it has two processors.

Thanks in advance.

Jose.
 

Subject: Avoiding loops

From: Doug Hull

Date: 21 Jan, 2009 16:41:02

Message: 6 of 7



Jose,

There are several things that can lead to this.

1.) You are parallelizing over the INNER loop. I would have done it across the outer loop. You are setting up 1000 jobs of 2000 tasks each (nv = 1000, nx = 2000). There is fixed overhead with each job and task. I would recommend the outer loop, so you have 1 job of 1000 tasks.

2.) Your problem is very small by Parallel Computing Toolbox (PCT) standards; the fixed overhead may be too much to overcome.

3.) I am not sure how many MATLAB labs are working in your MATLAB pool.

Try changing the structure as specified in the first point, and we can see what happens from there.

-Doug
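
The restructuring suggested in point 1 could look roughly like the sketch below, which moves parfor to the outer loop over j. It is an assumption about how the change might be written, not code posted in the thread; constTerm is an illustrative name, y is allocated inside the loop so that each iteration has its own copy, and the final computation of p is left out. A pool is assumed to have been opened beforehand (matlabpool in the release used in this thread).

```matlab
d = 1;
nv = 1000;
nx = 2000;
x = rand(nx, 2);                 % 2-D data, sent to every worker
sigma = cov(x);
bbb = inv(sigma);                % computed once, outside the loops
constTerm = nx*0.5*log(det(sigma)) + nx*0.5*d*log(2.0*pi);

p1 = zeros(1, nv);               % each iteration writes its own element p1(j)
tic
parfor j = 1:nv                  % parallel loop over the free parameter mu
    mu = [rand(1,1) rand(1,1)];
    y = zeros(1, nx);            % temporary, local to this iteration
    for i = 1:nx                 % ordinary serial inner loop
        ir = round(1 + (nx-1)*rand(1,1));
        aaa = x(ir, :) - mu;
        y(i) = aaa * bbb * aaa';
    end
    p1(j) = 0.5*sum(y) + constTerm;
end
toc
```

With this layout the parallel overhead is paid once for the whole run instead of once per value of j, although for a problem this small the gain may still be modest.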

Subject: Avoiding loops

From: Jose

Date: 21 Jan, 2009 17:07:01

Message: 7 of 7

Hello Doug, you are right. Parallelizing the outer loop is a big improvement over
the inner loop: time = 21.39 s. But there is still not much difference from my code
without parallelization: time = 22.83 s.
As you say, my problem is very small for PCT,
so I will try another approach to make my Monte Carlo integration
more efficient.
Thanks a lot for your help, Doug,
Jose.
