From: Edric M Ellis <eellis@mathworks.com>
Newsgroups: comp.soft-sys.matlab
Subject: Re: spmd overhead and sharing small amounts of data
Date: Thu, 20 Dec 2012 07:44:59 +0000
Organization: The Mathworks, Ltd.
Message-ID: <ytw623xcglg.fsf@uk-eellis0l.dhcp.mathworks.com>
References: <kasr64$a73$1@newscl01ah.mathworks.com>

"Chuck37 " <chuck3737@yahooremovethis.com> writes:

> I think my processing time is being chewed up by spmd overhead.  I
> have a simulation that looks like this:
>
> for x = 1:N
>     spmd
>         D = expensiveFunction(D);
>     end
>     % <gather small bits from workers>
>     % <fast simple function>
>     spmd
>         D = expensiveFunction2(D);
>     end
>
>     % <repeat the above basic idea 3-4 times>
>
> end
>
> When I watch my processors, they are seldom above 20% utilization, which
> makes me think that I'm suffering from going in and out of spmd, and
> that maybe gathering even small pieces of data is a problem.

What size MATLABPOOL did you open? One thing to note: when choosing the
default number of workers for the local scheduler, MATLAB counts only
physical cores, not hyperthreaded ones. So even if all the workers were
completely busy, you might not see more than 50% utilisation. (That's
what happens here on my system: the OS thinks it has 12 cores, but
really it has 6 hyperthreaded cores, so by default MATLAB launches 6
local workers.)
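If you want to check the default or open a pool that also uses the hyperthreaded cores, something along these lines should work. This is a minimal sketch, assuming a release with PARCLUSTER and PARPOOL (PARPOOL replaced MATLABPOOL in later releases; in R2012-era releases the rough equivalent is `matlabpool open local N`). The value 12 simply mirrors the 6-core, 12-thread machine above and is an assumption to adjust for your hardware.

```matlab
% Sketch: inspect the local cluster's default worker count and, if wanted,
% open a pool that also uses the hyperthreaded (logical) cores.
c = parcluster('local');
fprintf('Default local workers: %d\n', c.NumWorkers);

% PARPOOL refuses to start if a pool is already open, so close it first.
oldPool = gcp('nocreate');
if ~isempty(oldPool)
    delete(oldPool);
end

% Assumed value: 12 logical cores on a 6-core hyperthreaded machine.
c.NumWorkers = 12;
pool = parpool(c, 12);
fprintf('Pool size: %d\n', pool.NumWorkers);
```

Whether the extra workers actually help depends on the workload; a 50% reading may just mean every physical core is already busy.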

> Any alternatives?  For one, is there a way to gather directly worker
> to worker without exiting spmd and bringing it all to the master?

Yes. You can use labBroadcast to send data from one worker to all the
others, and gcat or gop to perform 'reduction' operations, all without
leaving the spmd block. Very briefly, here's how they work:

% assume matlabpool size 4
spmd
  data = rand();

  % broadcast from lab 1, x1 gets the value from 'data' on lab 1.
  x1 = labBroadcast(1, data); 

  % concatenation - each lab gets x2 = [1 2 3 4];
  x2 = gcat(labindex);

  % general reduction - in this case, each lab gets
  % x3 = 1 + 2 + 3 + 4
  x3 = gop(@plus, labindex);
end

> The data is small and functions cheap, so maybe I'd be ahead to let
> everyone do the same computations just to keep things flowing.

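That sounds like a very good idea. If every worker can compute the same
small result itself, you avoid leaving spmd and gathering to the client
on every iteration.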
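Putting those pieces together, here is a minimal sketch of the loop restructured so it enters spmd only once. gcat shares each worker's small result with every worker, and each worker then runs the cheap step itself on identical input. expensiveFunction, expensiveFunction2, smallSummary and fastSimpleFunction are hypothetical placeholders for the poster's code, and passing the shared value to expensiveFunction2 as a second argument is an assumption. It assumes the file is saved as runSimulation.m somewhere the workers can see (normally the case for a local pool started from the same folder).

```matlab
function D = runSimulation(N)
%RUNSIMULATION Sketch: keep the whole loop inside a single SPMD block.
% All helper functions below are hypothetical stand-ins.
spmd
    D = rand(1, 1000);                 % hypothetical per-worker data
    for x = 1:N
        D = expensiveFunction(D);
        % Every worker receives all the small bits, lab by lab,
        % without a round trip through the client.
        allBits = gcat(smallSummary(D));
        % The cheap step runs redundantly on every worker; identical
        % input gives identical output, so no extra communication.
        p = fastSimpleFunction(allBits);
        D = expensiveFunction2(D, p);
    end
end
end

function D = expensiveFunction(D)
D = sqrt(D .^ 2 + 1);                  % placeholder for real work
end

function s = smallSummary(D)
s = mean(D);                           % one scalar per worker
end

function p = fastSimpleFunction(bits)
p = max(bits);                         % cheap, same on every worker
end

function D = expensiveFunction2(D, p)
D = D / p;                             % placeholder for real work
end
```

Called from the client, `D = runSimulation(10);` returns a Composite holding each worker's final data. If the cheap step only needs a sum or a maximum, a single gop call such as `gop(@max, smallSummary(D))` gives every worker the reduced value directly.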

Cheers,

Edric.