
Thread Subject:
Multithreading: Negative returns?? Puzzling benchmarks

Subject: Multithreading: Negative returns?? Puzzling benchmarks

From: Michael Johnston

Date: 17 Mar, 2009 22:46:01

Message: 1 of 3

I just got a new computer with dual Xeon 5450s at 3 GHz (8 cores
total). I decided to see how well the multi-threading in BLAS and
LAPACK would work, so I ran a simple test: multiply two matrices
together a number of times, then do the same thing with the
division operator. Perform this test for each number of threads up to the
number of CPU cores, then plot the percentage change in execution
time relative to the single-threaded case.
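
In rough outline, the test looks something like the sketch below. This is not my actual script (that is posted on the site linked further down); the matrix size, the repetition count, and the use of maxNumCompThreads to set the thread count are placeholders for illustration.

n    = 500;                      % matrix dimension (placeholder)
reps = 20;                       % repetitions per timing (placeholder)
ncores = 8;                      % number of cores to test up to
A = rand(n);  B = rand(n);
t = zeros(ncores, 2);
for nthreads = 1:ncores
    maxNumCompThreads(nthreads);                              % limit BLAS/LAPACK threads
    tic; for k = 1:reps, C = A*B; end; t(nthreads,1) = toc;   % matrix multiply
    tic; for k = 1:reps, X = A\B; end; t(nthreads,2) = toc;   % matrix divide (solve)
end
% percentage change in execution time relative to the single-threaded case
pct = 100 * bsxfun(@rdivide, bsxfun(@minus, t, t(1,:)), t(1,:));
plot(1:ncores, pct, '-o');
xlabel('number of threads'); ylabel('% change in time vs. 1 thread');
legend('matrix multiply', 'matrix divide');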

The result is strange. I certainly expected diminishing returns to
scale as the number of threads increased, but I never expected to
see *negative* returns. While smaller matrices perform relatively
worse, presumably because of thread-creation overhead, my benchmarks
indicate that even for reasonably sized matrices (e.g., 500-by-500)
the returns to multi-threading become negative surprisingly quickly.

I'm very surprised to see this on a new shared-memory system. Has
anyone else gotten benchmarks like this? I have posted a graph of the
plot, as well as the benchmark code I wrote, on my web site with more
information: http://michaelkjohnston.com/perm/mt8bench/

Any ideas?? Anecdotes? Theories?

Best regards,

Michael

Subject: Multithreading: Negative returns?? Puzzling benchmarks

From: Derek O'Connor

Date: 18 Mar, 2009 00:46:01

Message: 2 of 3

Michael Johnston <mkjohnst@gmail.com> wrote in message <3b948ba5-b0c4-4b53-8352-de5641ddd313@o2g2000prl.googlegroups.com>...
> I just got a new computer with dual Xeon 5450s at 3 GHz (8 cores
> total). I decided to see how well the multi-threading in BLAS and
> LAPACK would work, so I ran a simple test: multiply two matrices
> together a number of times, then do the same thing with the
> division operator. Perform this test for each number of threads up to the
> number of CPU cores, then plot the percentage change in execution
> time relative to the single-threaded case.
>
> The result is strange. I certainly expected diminishing returns to
> scale as the number of threads increased, but I never expected to
> see *negative* returns. While smaller matrices perform relatively
> worse, presumably because of thread-creation overhead, my benchmarks
> indicate that even for reasonably sized matrices (e.g., 500-by-500)
> the returns to multi-threading become negative surprisingly quickly.
>
> I'm very surprised to see this on a new shared-memory system. Has
> anyone else gotten benchmarks like this? I have posted a graph of the
> plot, as well as the benchmark code I wrote, on my web site with more
> information: http://michaelkjohnston.com/perm/mt8bench/
>
> Any ideas?? Anecdotes? Theories?
>
> Best regards,
>
> Michael




Dear Michael,

The matrices used in your test above are tiny: 14x14 and 200x200.

Take a look at these test results on a Dell Precision 690 with dual Xeon 5345s at 2.3 GHz and 8 GB RAM:
http://www.derekroconnor.net/Software/Benchmarks.htm

These tests show substantial multicore speedups for Matmult and LU Decomp, but very little speedup for SVD or EIG.
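
To get a quick feel for that pattern on your own machine, something along these lines will do. This is only a rough sketch, not the code behind the page above; the matrix size and the use of maxNumCompThreads to set the thread count are arbitrary choices.

n = 1000;  A = rand(n);  B = rand(n);         % arbitrary test size
for nthreads = [1 8]
    maxNumCompThreads(nthreads);              % limit BLAS/LAPACK threads
    tic; C = A*B;         tMul = toc;         % matrix multiply
    tic; [L,U,P] = lu(A); tLU  = toc;         % LU decomposition
    tic; s = svd(A);      tSVD = toc;         % singular values
    tic; e = eig(A);      tEig = toc;         % eigenvalues
    fprintf('%d thread(s): mul %.2fs  lu %.2fs  svd %.2fs  eig %.2fs\n', ...
            nthreads, tMul, tLU, tSVD, tEig);
end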

Regards,

Derek O'Connor

Subject: Multithreading: Negative returns?? Puzzling benchmarks

From: Michael Johnston

Date: 18 Mar, 2009 01:36:05

Message: 3 of 3

On Mar 17, 8:46 pm, "Derek O'Connor" <derekrocon...@eircom.net> wrote:
> Michael Johnston <mkjoh...@gmail.com> wrote in message <3b948ba5-b0c4-4b53-8352-de5641ddd...@o2g2000prl.googlegroups.com>...
> > I just got a new computer with dual Xeon 5450s at 3 GHz (8 cores
> > total). I decided to see how well the multi-threading in BLAS and
> > LAPACK would work, so I ran a simple test: multiply two matrices
> > together a number of times, then do the same thing with the
> > division operator. Perform this test for each number of threads up to the
> > number of CPU cores, then plot the percentage change in execution
> > time relative to the single-threaded case.
>
> > The result is strange. I certainly expected diminishing returns to
> > scale as the number of threads increased, but I never expected to
> > see *negative* returns. While smaller matrices perform relatively
> > worse, presumably because of thread-creation overhead, my benchmarks
> > indicate that even for reasonably sized matrices (e.g., 500-by-500)
> > the returns to multi-threading become negative surprisingly quickly.
>
> > I'm very surprised to see this on a new shared-memory system. Has
> > anyone else gotten benchmarks like this? I have posted a graph of the
> > plot, as well as the benchmark code I wrote, on my web site with more
> > information: http://michaelkjohnston.com/perm/mt8bench/
>
> > Any ideas?? Anecdotes? Theories?
>
> > Best regards,
>
> > Michael
>
> Dear Michael,
>
> The matrices used in your test above are tiny: 14x14 and 200x200.
>
> Take a look at these test results on a Dell Precision 690 with dual Xeon 5345s at 2.3 GHz and 8 GB RAM:
> http://www.derekroconnor.net/Software/Benchmarks.htm
>
> These tests show substantial multicore speedups for Matmult and LU Decomp, but very little speedup for SVD or EIG.
>
> Regards,
>
> Derek O'Connor

Dear Derek,

Thanks very much for your reply. That's really helpful! My prior was
that the BLAS and LAPACK libraries would make near-optimal threading
decisions on their own -- I think the MathWorks documentation says
something to this effect -- so I was surprised to see run times
actually increase. Perhaps the worst part is that CPU utilization
rises steadily to 100% in all of these tests as the number of threads
increases. I tested a new 24-core Xeon machine and found that, for
one piece of code, run times were effectively the same with
multi-threading on and off, but with it on, CPU utilization was 24x
higher. I'll use your code to replicate your benchmarks on my
hardware tomorrow when I get back to work and post an update.
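
For reference, the on/off comparison was done along these lines (a sketch only: mybenchmark is a stand-in for the code actually being timed, and toggling via maxNumCompThreads is my assumption about how the threading is controlled):

maxNumCompThreads(1);             % effectively single-threaded
tic; mybenchmark(); t_off = toc;  % mybenchmark is a hypothetical stand-in
maxNumCompThreads('automatic');   % let the libraries use all cores
tic; mybenchmark(); t_on = toc;
fprintf('single-threaded: %.2fs   multi-threaded: %.2fs\n', t_off, t_on);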

Best regards,

Michael
