On Mar 17, 8:46 pm, "Derek O'Connor" <derekrocon...@eircom.net> wrote:
> Michael Johnston <mkjoh...@gmail.com> wrote in message <3b948ba5-b0c4-4b53-8352-de5641ddd...@o2g2000prl.googlegroups.com>...
> > I just got a new computer with dual Xeon 5450s at 3 GHz (8 cores in
> > total). I decided to see how well the multi-threading in BLAS and
> > LAPACK would work, so I ran a simple test: Multiply two matrices
> > together a bunch of times, and then do the same thing with the
> > division operator. Perform this test for a number of threads up to the
> > number of CPU cores. Then plot the percentage change in the execution
> > time relative to the single threaded case.
>
> > The result is strange. I certainly expected diminishing returns to
> > multi-threading as the number of threads increased. But I
> > never expected to see *negative* returns. While smaller
> > matrices perform relatively worse, presumably as a result of overhead
> > from thread creation, my benchmarks indicate that even for reasonably
> > sized matrices (e.g., 500-by-500) the returns to multi-threading
> > become negative surprisingly quickly.
>
> > I'm very surprised to see this on a new shared-memory system. Has
> > anyone else gotten benchmarks like this? I have posted a graph of the
> > plot, as well as the benchmark code I wrote, on my web site with more
> > information: http://michaelkjohnston.com/perm/mt8bench/
>
> > Any ideas?? Anecdotes? Theories?
>
> > Best regards,
>
> > Michael
>
> Dear Michael,
>
> The matrices used in your test above are tiny: 14x14 and 200x200.
>
> Take a look at these test results on a Dell Precision 690 with dual Xeon 5345s at 2.3GHz, 8GB RAM. http://www.derekroconnor.net/Software/Benchmarks.htm
>
> These tests show substantial multicore speedups for Matmult and LU Decomp, but very little speedup for SVD or EIG.
>
> Regards,
>
> Derek O'Connor
Dear Derek,
Thanks very much for your reply. That's really helpful! My prior was
that the BLAS+LAPACK libraries would make optimal decisions with
respect to threading -- I think the MathWorks documentation says
something to this effect -- so I was surprised to see run times
actually increase. The worst part of this is perhaps that CPU
utilization rises steadily until it hits 100% in all of these tests as
the number of threads increases. I tested a new 24-CPU Xeon machine
and found that, for one piece of code, run times were effectively the
same with multi-threading on and off, but that, with it on, CPU
utilization was 24x higher. I'll use your code to replicate your
benchmarks on my hardware tomorrow when I get back to work and post an
update.
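
For reference, the quantities discussed in this thread (percentage change in run time relative to the single-threaded case, speedup, and per-thread efficiency) can be tabulated as in the rough sketch below. Python is used purely for illustration; the function name scaling_table and the sample timings are made up and are not measurements from either machine.

```python
def scaling_table(timings):
    """Summarise thread scaling.

    timings maps thread count -> wall-clock seconds and must contain an
    entry for 1 thread, which is used as the baseline.
    Returns a list of (threads, pct_change, speedup, efficiency) tuples.
    """
    base = timings[1]
    rows = []
    for n in sorted(timings):
        t = timings[n]
        pct_change = 100.0 * (t - base) / base  # negative means faster than 1 thread
        speedup = base / t                      # > 1 means faster than 1 thread
        efficiency = speedup / n                # 1.0 would be perfect scaling
        rows.append((n, pct_change, speedup, efficiency))
    return rows


# Hypothetical timings in seconds, for illustration only.
example = {1: 10.0, 2: 6.0, 4: 5.0, 8: 12.0}

for n, pct, s, e in scaling_table(example):
    print(f"{n:2d} threads: {pct:+6.1f}% run time, "
          f"speedup {s:.2f}x, efficiency {e:.0%}")
```

If the run time with 24 threads equals the single-threaded run time, the speedup is 1 and the efficiency is 1/24, roughly 4%. Assuming CPU utilization grows in proportion to the number of busy threads, that corresponds to the case described above: the same wall-clock time at about 24 times the CPU usage.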
Best regards,
Michael