Path: news.mathworks.com!not-for-mail
From: Edric M Ellis <eellis@mathworks.com>
Newsgroups: comp.soft-sys.matlab
Subject: Re: Codistributed arrays performance
Date: Mon, 09 Nov 2009 09:01:11 +0000
Organization: The Mathworks, Ltd.
Lines: 35
Message-ID: <ytwws20dqco.fsf@uk-eellis-deb5-64.mathworks.co.uk>
References: <hd70vh$n62$1@fred.mathworks.com>
NNTP-Posting-Host: uk-eellis-deb5-64.mathworks.co.uk
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Trace: fred.mathworks.com 1257757271 4275 172.16.27.232 (9 Nov 2009 09:01:11 GMT)
X-Complaints-To: news@mathworks.com
NNTP-Posting-Date: Mon, 9 Nov 2009 09:01:11 +0000 (UTC)
User-Agent: Gnus/5.11 (Gnus v5.11) Emacs/22.2 (gnu/linux)
Cancel-Lock: sha1:y0KZIDgkLafBZuQtEcajnLEFCyA=
Xref: news.mathworks.com comp.soft-sys.matlab:583490


"Scott " <lorentz-spampadded@fastmail.fm> writes:

> 5) Be disappointed by the fact that the parallel benchmarks clearly show a
> significantly reduced performance in elementwise binary operations,
> e.g. codistributed.times, codistributed.rdivide, codistributed.mtimes, etc.
>
> I have enough previous experience writing MPI in C++ to understand how to avoid
> communication bottlenecks and using the mpiprofile I was able to reduce the
> communication overhead to < 4.8% of the total execution time.
>
> The majority of the execution, 61% was taken by the element wise operations,
> codistributor1d.hElementwiseBinaryOpImpl, which seem to reduce the performance
> for an identical serial operation by at least 3x.
>
> Can someone explain why these operations, in the absence of communication
> overhead, are so much slower than an identical serial execution? I would buy the
> multi-threading argument if I hadn't made sure to find and push the data size
> beyond what multi-threading seems to handle efficiently before benchmarking.

The main overhead when doing "embarrassingly parallel" operations of low
numerical intensity, such as "times" or "rdivide", is the time taken to get into
and out of the underlying operation.
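
To see the effect of a fixed per-call cost, here is a rough sketch using plain
(non-distributed) arrays. The sizes and repeat count are arbitrary choices for
illustration, and the absolute numbers will vary by machine and release; the
point is only that the time per element rises sharply once the arrays are small
enough for the call overhead to dominate.

```matlab
% Illustration only: time a cheap elementwise operation at several sizes.
% "sizes" and "reps" are arbitrary values chosen for this sketch.
sizes = [1e2 1e4 1e6];
reps  = 200;
for n = sizes
    a = rand(n, 1);
    b = rand(n, 1);
    tic;
    for k = 1:reps
        c = a .* b; %#ok<NASGU>  % result unused; we only want the timing
    end
    t = toc / reps;              % average time for one call
    fprintf('n = %8d: %.3g s per call, %.3g s per element\n', n, t, t / n);
end
```

For codistributed arrays the fixed cost per call is considerably larger, so the
break-even size is correspondingly larger too.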

As you have seen, "codistributed.times" and so on are implemented using MATLAB
objects. Unfortunately, the overhead of object method dispatch is larger than
the cost of the relatively small amount of numerical computation required. We
are aware that this is a problem, and are working to improve the performance of
codistributed arrays. For now, the main advantages of codistributed arrays are
that they allow you to work with data sizes that do not fit onto a single
machine, and that the more complex linear algebra routines (such as mldivide)
can show a performance benefit.
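
In the meantime, one possible way to sidestep the dispatch cost for purely
elementwise work is to operate on the local part of each array directly. The
following is a minimal sketch, not a benchmarked recommendation. It assumes an
open pool of workers, and that both operands have the same size and the same
distribution scheme, so that the local parts line up on every worker. The array
size is an arbitrary choice.

```matlab
% Sketch only: requires Parallel Computing Toolbox and an open pool of workers.
spmd
    % Two codistributed arrays built with the default distribution scheme.
    % Same size and same scheme, so their local parts match on each worker.
    A = codistributed.rand(4000, 4000);
    B = codistributed.rand(4000, 4000);

    % Multiply the plain local arrays instead of calling codistributed.times.
    localC = getLocalPart(A) .* getLocalPart(B);

    % Reassemble a codistributed array that reuses A's distribution scheme.
    C = codistributed.build(localC, getCodistributor(A));
end
```

This does not apply to operations that need data from other workers, such as
mtimes or mldivide, where the codistributed implementations do the real work.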

Cheers,

Edric.