
Thread Subject:
Distributed Computing

Subject: Distributed Computing

From: Steffen

Date: 15 Mar, 2009 17:27:01

Message: 1 of 8

Hello,

I am kind of confused by the stuff I have read over the last few days about the Distributed/Parallel Computing Toolbox; perhaps someone can give me some general advice.

Very briefly, my problem: I have a 300x300 movie and each pixel contains in its third dimension a spectrum which I need to fit. Basically, I run two functions for all 90000 pixels, a fitting routine and some math to convert the fitted peaks. Since this takes about 2 days, I thought of distributing the fitting/conversion routine to other computers (4 to 8 PCs).

for i=1:300
    for j=1:300
        spectrum=series(i,j,:);
        peaks=func_fit_spectrum(spectrum);
        handles=func_peak_conversion(peaks);
    end
end

1) Would that task be solvable via distributed computing, or am I missing some major pitfalls here?
2) Am I right that the other computers require MATLAB and the distributed toolbox to be installed? How about the functions that are called: do I just copy them to the workers, or does the toolbox take care of that automatically?
3) The example given in the help somehow suggests just replacing the for loops with parfor loops. Does that hold true for this particular case with two for loops? Or do I have to cut the main image into small pieces myself and then send them to the workers?

Many thanks in advance for any advice, much appreciated!

Steffen

Subject: Distributed Computing

From: russell.fung@gmail.com

Date: 15 Mar, 2009 17:51:58

Message: 2 of 8

On Mar 15, 12:27 pm, "Steffen" <rile...@gmail.com> wrote:
> Hello,
>
> I am kind of confused by the stuff I have read over the last few days about
> the Distributed/Parallel Computing Toolbox; perhaps someone can give me
> some general advice.
>
> Very briefly, my problem: I have a 300x300 movie and each pixel contains
> in its third dimension a spectrum which I need to fit. Basically, I run two
> functions for all 90000 pixels, a fitting routine and some math to convert
> the fitted peaks. Since this takes about 2 days, I thought of distributing
> the fitting/conversion routine to other computers (4 to 8 PCs).
>
> for i=1:300
>     for j=1:300
>         spectrum=series(i,j,:);
>         peaks=func_fit_spectrum(spectrum);
>         handles=func_peak_conversion(peaks);
>     end
> end
>
> 1) Would that task be solvable via distributed computing, or am I missing
> some major pitfalls here?
> 2) Am I right that the other computers require MATLAB and the distributed
> toolbox to be installed? How about the functions that are called: do I just
> copy them to the workers, or does the toolbox take care of that automatically?
> 3) The example given in the help somehow suggests just replacing the for
> loops with parfor loops. Does that hold true for this particular case with
> two for loops? Or do I have to cut the main image into small pieces myself
> and then send them to the workers?
>
> Many thanks in advance for any advice, much appreciated!
>
> Steffen

1) It seems that you have 300*300 = 90000 independent sets of analysis
to perform; I would say distributed computing will speed up the
solution to your problem greatly.
2) Let's say you have a cluster of M computers and you would like to
access this cluster from N clients (meaning M computers that actually
do the work, and N computers that can send jobs); then I think you
need M worker licenses and N parallel computing toolbox licenses.
3) My understanding is that you can't have nested parfor loops unless
the second level of parfor is in a function. So you can have one level
of parfor loop that calls a function, and within the function you can
have a second level of parfor loop. You need the parallel computing
toolbox to use parfor.
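
A rough, untested sketch of the single-parfor pattern (fit_one_row is a
hypothetical helper function that would contain your inner loop over j):

rowResults = cell(300,1);
parfor i = 1:300
    rowResults{i} = fit_one_row(series(i,:,:)); % helper returns the results for one row
end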

Russell

Subject: Distributed Computing

From: Steffen

Date: 15 Mar, 2009 18:09:01

Message: 3 of 8

Hi Russell,

thanks for the comment.
Am I getting you right that I just need the parallel toolbox for my single client, which sends out the tasks to the workers (each having a licence), and not the distributed toolbox? Or is the parallel toolbox for a maximum of 4 workers, which could then be extended via the distributed toolbox?

To make the code ready for parallel computing, would it only be a matter of replacing the inner for loop with a parfor loop (basically meaning to look at a single row and send the corresponding columns to the workers, then the next row and so forth)? I copy the functions to the workers and that's it?

Steffen

Subject: Distributed Computing

From: Ashish Uthama

Date: 15 Mar, 2009 21:16:58

Message: 4 of 8

On Sun, 15 Mar 2009 13:27:01 -0400, Steffen <rileksn@gmail.com> wrote:

> Hello,
>
> I am kind of confused by the stuff I have read over the last few days about
> the Distributed/Parallel Computing Toolbox; perhaps someone can give me
> some general advice.
>
> Very briefly, my problem: I have a 300x300 movie and each pixel contains
> in its third dimension a spectrum which I need to fit. Basically, I run two
> functions for all 90000 pixels, a fitting routine and some math to convert
> the fitted peaks. Since this takes about 2 days, I thought of distributing
> the fitting/conversion routine to other computers (4 to 8 PCs).
>
> for i=1:300
>     for j=1:300
>         spectrum=series(i,j,:);
>         peaks=func_fit_spectrum(spectrum);
>         handles=func_peak_conversion(peaks);
>     end
> end
>
> 1) Would that task be solvable via distributed computing, or am I missing
> some major pitfalls here?
> 2) Am I right that the other computers require MATLAB and the distributed
> toolbox to be installed? How about the functions that are called: do I just
> copy them to the workers, or does the toolbox take care of that automatically?
> 3) The example given in the help somehow suggests just replacing the for
> loops with parfor loops. Does that hold true for this particular case with
> two for loops? Or do I have to cut the main image into small pieces myself
> and then send them to the workers?
>
> Many thanks in advance for any advice, much appreciated!
>
> Steffen

1) I think parallel computing is a good bet to speed this up. The
'pitfall' might be the overhead of copying the data to and from the workers;
this would depend on the size of your third dimension, the size of your
computed output, etc.

2) Yes, you are right. The other computers need to be set up as 'workers'.
This is not the same as installing MATLAB, since the workers can only
be started/stopped/communicated with by the MATLAB Distributed Computing
Engine (what you seem to refer to as the 'distributed toolbox'). You will
not be able to run a MATLAB session like your desktop on them.

Have a look at:
http://www.mathworks.com/cmsimages/dm_workflow_wl_18819.gif
The 'workers' on the left (Desktop System) refer to processes on your
machine; you just need PCT for this. If you only wanted to harness
multiple processors on your desktop, the box on the right could be omitted.
The 'workers' on the right refer to processes on other machines. There could
be more than one on a single physical machine, depending on how you set it
up. This setup requires the MATLAB Distributed Computing Server licence and
product to be set up (in addition to PCT on your desktop).

More:
http://www.mathworks.com/products/parallel-computing/

3) Nested PARFOR loops are not supported. You could easily convert your nested
FOR loops into a single loop.

air code:

parfor ind=1:300*300
    [i,j]=ind2sub([300 300],ind); % look up "doc ind2sub" for more info
    spectrum=series(i,j,:);
    peaks=func_fit_spectrum(spectrum); % aren't you overwriting these variables on every iteration?
    handles=func_peak_conversion(peaks);
end
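
If you need to keep the results for every pixel, one possible way (untested,
and assuming func_peak_conversion can return its result rather than growing a
shared structure) is to collect them in sliced cell arrays, so that parfor
sends each iteration's output back to the client:

allPeaks   = cell(300*300,1);
allHandles = cell(300*300,1);
parfor ind = 1:300*300
    [i,j] = ind2sub([300 300],ind);
    spectrum = series(i,j,:);
    p = func_fit_spectrum(spectrum);
    allPeaks{ind}   = p;
    allHandles{ind} = func_peak_conversion(p);
end
allPeaks   = reshape(allPeaks,[300 300]);   % back to the 300x300 image layout
allHandles = reshape(allHandles,[300 300]);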

Subject: Distributed Computing

From: Edric M Ellis

Date: 16 Mar, 2009 09:01:05

Message: 5 of 8

"Steffen" <rileksn@gmail.com> writes:

> Am I getting you right that I just need the parallel toolbox for my single
> client, which sends out the tasks to the workers (each having a licence), and
> not the distributed toolbox? Or is the parallel toolbox for a maximum of 4
> workers, which could then be extended via the distributed toolbox?

On your client machine, you always need MATLAB + PCT, and with that you get 4
local workers.

If you want more workers, or workers on 1 or more separate machines, you need
MDCS licences for those machines - 1 licence per worker.

> To make the code ready for parallel computing, would it only be a matter of
> replacing the inner for loop with a parfor loop (basically meaning to look at
> a single row and send the corresponding columns to the workers, then the next
> row and so forth)? I copy the functions to the workers and that's it?

Looking at your code, I think that should work; however, I would be tempted to
make the outer loop the PARFOR - it's generally better to parallelise in the
largest possible chunks (unless you need to handle cases where the execution
times for the iterations are vastly different).

There are several different ways of getting your code to run on the workers - if
they can see the same filesystem as your client, you can simply add the path;
otherwise, PCT/MDCS have mechanisms for copying your code to the workers so they
can find it.
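
For example, if your functions sit on a share that all the machines can see,
something along these lines might do (an untested sketch; the configuration
name and path are just placeholders for your own setup):

matlabpool('open', 'MyClusterConfig', 8);          % pool of MDCS workers
pctRunOnAll addpath('\\fileserver\matlab\fitcode') % make the functions visible on every worker
% ... run the parfor loop ...
matlabpool('close');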

Cheers,

Edric.

Subject: Distributed Computing

From: Steffen

Date: 16 Mar, 2009 10:11:12

Message: 6 of 8

Thanks for all the input so far!

Since I want to distribute the task to multiple other machines, I need MDCS as well as the Parallel Computing Toolbox.

Admittedly I have not totally understood the use of ind2sub, but it seems that I can squeeze the two for loops into one and thereby use a single parfor, right?

parfor ind=1:300*300
    [i,j]=ind2sub([300 300],ind);
    spectrum=series(i,j,:);
    peaks=func_fit_spectrum(spectrum);
    handles=func_peak_conversion(peaks);
end

Is my understanding correct that parfor now sends, for each 'ind', the spectrum plus the two functions to one worker (e.g. ind=1 to worker 1, ind=2 to worker 2, etc.)? This somehow seems wrong to me, since that would once again be serial computing, right?
Currently, the structure 'handles' within func_peak_conversion grows with each run (peaks can be overwritten, as it is just the input for the second function). I presume/hope this growing of 'handles' would still work when each worker adds a value separately?

Thanks for the help!!

Steffen

Subject: Distributed Computing

From: Edric M Ellis

Date: 16 Mar, 2009 11:22:17

Message: 7 of 8

"Steffen" <rileksn@gmail.com> writes:

> Admittedly I have not totally understood the use of ind2sub, but it seems that
> I can squeeze the two for loops into one and thereby use a single parfor, right?
>
> parfor ind=1:300*300
>     [i,j]=ind2sub([300 300],ind);
>     spectrum=series(i,j,:);
>     peaks=func_fit_spectrum(spectrum);
>     handles=func_peak_conversion(peaks);
> end
>
> Is my understanding correct that parfor now sends, for each 'ind', the spectrum
> plus the two functions to one worker (e.g. ind=1 to worker 1, ind=2 to worker 2,
> etc.)? This somehow seems wrong to me, since that would once again be serial
> computing, right?

Each worker works on different values of "ind" in parallel, and then sends the
results back to the client. (In practice, we start by sending out "ind" in large
chunks, which diminish as we get closer to the end of the PARFOR loop).

Unfortunately, using the "ind2sub" approach is not desirable here, since the
PARFOR machinery cannot then tell that you only need a subset of "series" for
each iteration (the machinery can only deduce that if you index "series" using
the loop index, and other constant terms). Something like the following allows
PARFOR to minimise the data transfer:

parfor r = 1:300
  thisrow = series( r, :, : ); % index "series" using only "r" and ":"
  for c = 1:300
    spectrum = thisrow( 1, c, : );
    ...
  end
end

However, if the total size of "series" is small, then the overhead of sending it
to the workers might not be significant.
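(For example, if the third dimension held 1000 points, "series" stored as doubles would be 300*300*1000*8 bytes, i.e. about 720 MB - large enough that you would want to avoid broadcasting the whole array to every worker.)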

> Currently, the structure 'handles' within func_peak_conversion grows with each
> run (peaks can be overwritten, as it is just the input for the second function).
> I presume/hope this growing of 'handles' would still work when each worker adds
> a value separately?

Ah, sorry, I didn't look at your code closely enough to spot that detail.

Because the iterations of the PARFOR loop are executed on different machines in
totally separate MATLAB instances, there can be no interdependence between the
iterations of your PARFOR loop - so, to use PARFOR, you might need to
restructure the way you're using "peaks" and "handles".

In general though, a PARFOR loop can return data structures which grow providing
you use explicit concatenation, like this:

x = [];
parfor ii=1:10
  if ii > 3
    x = [x, ii];
  end
end
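
Applied to your loop, one possible shape (again untested, and assuming
func_peak_conversion is changed to return its result rather than growing a
shared 'handles' structure) would be:

handlesPerRow = cell(300,1);
parfor r = 1:300
  thisrow = series( r, :, : );      % only this row is sent to the worker
  rowHandles = cell(1,300);
  for c = 1:300
    spectrum = thisrow( 1, c, : );
    peaks = func_fit_spectrum(spectrum);
    rowHandles{c} = func_peak_conversion(peaks);
  end
  handlesPerRow{r} = rowHandles;    % sliced output, sent back to the client
end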

Cheers,

Edric.

Subject: Distributed Computing

From: Steffen

Date: 16 Mar, 2009 16:55:15

Message: 8 of 8

Hi Edric and also Ashish Uthama,

thanks a lot for the input. I'll give it a try and play around a bit with the code; hopefully it will work... ;)

Cheers,
Steffen
