Path: news.mathworks.com!not-for-mail
From: Edric M Ellis <eellis@mathworks.com>
Newsgroups: comp.soft-sys.matlab
Subject: Re: Codistributed arrays performance
Date: Wed, 11 Nov 2009 08:40:18 +0000
Organization: The Mathworks, Ltd.
Lines: 53
Message-ID: <ytweio5cv4d.fsf@uk-eellis-deb5-64.mathworks.co.uk>
References: <hd70vh$n62$1@fred.mathworks.com> <ytwws20dqco.fsf@uk-eellis-deb5-64.mathworks.co.uk> <ytw7htzeou2.fsf@uk-eellis-deb5-64.mathworks.co.uk> <hd9m1i$oa3$1@fred.mathworks.com> <ytw3a4mer8y.fsf@uk-eellis-deb5-64.mathworks.co.uk> <hdcj6q$qcc$1@fred.mathworks.com>
NNTP-Posting-Host: uk-eellis-deb5-64.mathworks.co.uk
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: fred.mathworks.com 1257928818 27210 172.16.27.232 (11 Nov 2009 08:40:18 GMT)
X-Complaints-To: news@mathworks.com
NNTP-Posting-Date: Wed, 11 Nov 2009 08:40:18 +0000 (UTC)
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.2 (gnu/linux)
Cancel-Lock: sha1:nBvjZRy/IipaFmzzFeW1RpEM17Q=
Xref: news.mathworks.com comp.soft-sys.matlab:584099


"Scott " <lorentz-spampadded@fastmail.fm> writes:

> Edric M Ellis <eellis@mathworks.com> wrote in message
> <ytw3a4mer8y.fsf@uk-eellis-deb5-64.mathworks.co.uk>...
>> "Scott " <lorentz-spampadded@fastmail.fm> writes:
>> 
>> > Ah, I see, good to know. Is parfor then the route to better performance when
>> > applicable, or is the parallel toolbox really just for large data sets at
>> > this point?
>>  In general, if a problem can be addressed using parfor, it will almost
>> certainly be quicker as there are fewer synchronisation points for
>> communication, and the dynamic scheduling attempts to get better
>> load-balancing.
>> 
>> Cheers,
>> 
>> Edric.
>
> My code is highly vectorized with very few for-loops. Would you expect the
> parfor performance to exceed that of a vectorized multi-threaded computation for
> large datasets? Or should I be considering semi-vectorized coding to take
> greater advantage of parfor? Seems like the array indexing necessary for that
> would slow it down, but I don't have a good handle on the performance
> trade-offs.

I'm afraid it's hard to say. The usual principle of using the profiler to work
out where the time is going should help. Generally, to get speedup with PARFOR,
you need to ensure that the overhead of sending out the data and getting the
results back doesn't exceed the time spent on computation. Basically, it comes
down to having each loop iteration perform a largish amount of computation
compared to the amount of input and output data it needs. A couple of extreme
examples:

y = rand( 1, N );
parfor ii=1:N
  x(ii) = y(ii) + 1;  % one element in, one element out, a single addition
end

In that case, all of y has to be sent to the workers and all of x has to be sent
back, but the amount of computation per element is trivial. This will be much
slower than the obvious "x = y + 1".

parfor ii=1:N
  pause( ii );
end

This example is slightly silly, but should give almost perfect speedup compared
to the "for" version of the same loop, since the amount of data transferred is
zero, and the "work" done takes a long time compared to the PARFOR overheads.
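
A less silly case between those two extremes might look like the sketch below.
It is only an illustration, not a benchmark: the names N, n and maxEig and the
sizes are made up for the example, and it assumes a pool of workers is already
open. Each iteration builds its matrix on the worker, does a fair amount of
computation, and returns a single scalar, so very little data is transferred
per iteration.

```matlab
% Illustrative sketch: substantial work per iteration, tiny input and output.
N = 200;                 % number of iterations (arbitrary choice)
n = 300;                 % matrix size per iteration (arbitrary choice)
maxEig = zeros( 1, N );  % sliced output: one scalar comes back per iteration

tic
parfor ii=1:N
  A = rand( n );                     % created on the worker, never transferred
  maxEig(ii) = max( abs( eig( A ) ) );  % expensive compared to one scalar out
end
tParfor = toc;

tic
for ii=1:N
  A = rand( n );
  maxEig(ii) = max( abs( eig( A ) ) );
end
tFor = toc;

fprintf( 'parfor: %.2f s, for: %.2f s\n', tParfor, tFor );
```

Whether the PARFOR version actually wins depends on the number of workers and
on how much multi-threading the serial "eig" already gets, so it is worth
timing both on your own machine.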

Cheers,

Edric.