From: Michael Johnston <mkjohnst@gmail.com>
Newsgroups: comp.soft-sys.matlab
Subject: Re: Multithreading: Negative returns?? Puzzling benchmarks
Date: Tue, 17 Mar 2009 18:36:05 -0700 (PDT)
Message-ID: <826a059a-31c4-41c3-9196-1fcccdb0b765@r28g2000vbp.googlegroups.com>
References: <3b948ba5-b0c4-4b53-8352-de5641ddd313@o2g2000prl.googlegroups.com> 
	<gppg89$6v$1@fred.mathworks.com>


On Mar 17, 8:46 pm, "Derek O'Connor" <derekrocon...@eircom.net> wrote:
> Michael Johnston <mkjoh...@gmail.com> wrote in message <3b948ba5-b0c4-4b53-8352-de5641ddd...@o2g2000prl.googlegroups.com>...
> > I just got a new computer with dual Xeon 5450s at 3 GHz (8 CPUs,
> > total). I decided to see how well the multi-threading in BLAS and
> > LAPACK would work, so I ran a simple test: Multiply two matrices
> > together a bunch of times, and then do the same thing with the
> > division operator. Perform this test for a number of threads up to the
> > number of CPU cores. Then plot the percentage change in the execution
> > time relative to the single threaded case.
>
> > The result is strange. I certainly expected diminishing returns to
> > multi-threading as the number of threads increased. But I never
> > expected to see *negative* returns. While smaller matrices perform
> > relatively worse, presumably as a result of overhead from thread
> > creation, my benchmarks indicate that even for reasonably sized
> > matrices (e.g., 500-by-500) the returns to multi-threading become
> > negative surprisingly quickly.
>
> > I'm very surprised to see this on a new shared-memory system. Has
> > anyone else gotten benchmarks like this? I have posted a graph of the
> > plot, as well as the benchmark code I wrote, on my web site with more
> > information: http://michaelkjohnston.com/perm/mt8bench/
>
> > Any ideas?? Anecdotes? Theories?
>
> > Best regards,
>
> > Michael
>
> Dear Michael,
>
> The matrices used in your test above are tiny: 14x14 and 200x200.
>
> Take a look at these test results on a Dell Precision 690 with dual Xeon 5345s at 2.3GHz, 8GB RAM:
> http://www.derekroconnor.net/Software/Benchmarks.htm
>
> These tests show substantial multicore speedups for Matmult and LU Decomp, but very little speedup for SVD or EIG.
>
> Regards,
>
> Derek O'Connor

Dear Derek,

Thanks very much for your reply. That's really helpful! My prior was
that the BLAS and LAPACK libraries would make optimal decisions about
threading (I think the MathWorks documentation says something to this
effect), so I was surprised to see run times actually increase. Perhaps
the worst part is that, in all of these tests, CPU utilization rises
steadily until it hits 100% as the number of threads increases. I also
tested a new 24-CPU Xeon machine and found that, for one piece of code,
run times were effectively the same with multi-threading on and off,
but with it on, CPU utilization was 24x higher. I'll use your code to
replicate your benchmarks on my hardware tomorrow when I get back to
work and post an update.
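
To make that last comparison concrete, here is a minimal sketch of the
arithmetic in Python (not MATLAB, and not the benchmark code from either
web site). The function names and the 10-second timing are hypothetical
placeholders, not measurements; the only figures taken from the tests
above are the 24 threads and the roughly unchanged run time.

```python
def speedup(t_single, t_multi):
    """Ratio of single-threaded to multi-threaded wall-clock time."""
    return t_single / t_multi


def parallel_efficiency(t_single, t_multi, n_threads):
    """Speedup per thread: 1.0 is ideal scaling, 1/n means no gain at all."""
    return speedup(t_single, t_multi) / n_threads


def percent_change(t_single, t_multi):
    """Percentage change in run time relative to the single-threaded case.

    Positive values mean the multi-threaded run was slower.
    """
    return 100.0 * (t_multi - t_single) / t_single


def cpu_seconds(t_multi, n_threads, utilization=1.0):
    """Total CPU time consumed, assuming every thread runs at `utilization`."""
    return t_multi * n_threads * utilization


if __name__ == "__main__":
    # Hypothetical timing: same wall-clock time with threading on and off.
    t1 = 10.0   # seconds, single-threaded (placeholder value)
    t24 = 10.0  # seconds, 24 threads (placeholder value)

    print("speedup:       ", speedup(t1, t24))
    print("efficiency:    ", parallel_efficiency(t1, t24, 24))
    print("percent change:", percent_change(t1, t24))
    print("CPU seconds:   ", cpu_seconds(t24, 24))
```

With equal run times the speedup is 1, so the parallel efficiency is
1/24, about 4%: all 24 cores are busy, yet the job finishes no sooner.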

Best regards,

Michael