Thread Subject: Multithreading: Negative returns?? Puzzling benchmarks

Subject: Multithreading: Negative returns?? Puzzling benchmarks

From: Michael Johnston

Date: 17 Mar, 2009 22:46:01

Message: 1 of 3

I just got a new computer with dual Xeon 5450s at 3 GHz (8 cores
total). I decided to see how well the multi-threading in BLAS and
LAPACK would work, so I ran a simple test: multiply two matrices
together many times, then do the same thing with the division
operator. I repeated this test for each thread count up to the number
of CPU cores, then plotted the percentage change in execution time
relative to the single-threaded case.
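
For readers who want the arithmetic behind the plot, here is a minimal sketch in Python rather than MATLAB. The names `time_workload` and `percent_change` are illustrative assumptions, not the benchmark code posted on the site, and the timings in the example are made-up placeholders rather than measurements. Setting the thread count for each run is left out because it depends on the BLAS library in use.

```python
import time


def time_workload(workload, repeats=5):
    """Return the best wall-clock time (seconds) over several runs of workload()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        best = min(best, time.perf_counter() - start)
    return best


def percent_change(times_by_threads):
    """Map thread count -> % change in run time relative to the 1-thread run.

    Negative values mean the run got faster; positive values mean it got slower.
    """
    baseline = times_by_threads[1]
    return {
        n: 100.0 * (t - baseline) / baseline
        for n, t in sorted(times_by_threads.items())
    }


# Placeholder timings in seconds, only to show the calculation (not measured data).
example_times = {1: 2.0, 2: 1.2, 4: 1.0, 8: 2.5}
for n, pct in percent_change(example_times).items():
    print(f"{n} thread(s): {pct:+.1f}%")
```

A positive percentage for a multi-threaded run is the "negative return" case: the run took longer than the single-threaded baseline.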

The result is strange. I certainly expected diminishing returns to
multi-threading as the number of threads increased, but I never
expected to see *negative* returns. Smaller matrices perform
relatively worse, presumably because of thread-creation overhead, but
my benchmarks indicate that even for reasonably sized matrices (e.g.,
500-by-500) the returns to multi-threading become negative
surprisingly quickly.
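
One way to see how overhead alone could produce negative returns is a toy cost model, sketched below in Python. It is an assumption for illustration, not a description of what BLAS or LAPACK actually do: the work divides perfectly across threads, and each extra thread adds a fixed cost. The names `modeled_time` and `best_thread_count` and all parameter values are hypothetical.

```python
def modeled_time(work, overhead, threads):
    """Toy model: perfectly divisible work plus a fixed cost per extra thread."""
    return work / threads + overhead * (threads - 1)


def best_thread_count(work, overhead, max_threads):
    """Thread count in 1..max_threads that minimizes the modeled run time."""
    return min(
        range(1, max_threads + 1),
        key=lambda p: modeled_time(work, overhead, p),
    )


# Small problem: overhead dominates quickly, so few threads are optimal.
print(best_thread_count(work=1.0, overhead=0.2, max_threads=8))

# Large problem with the same overhead: using every thread still pays off.
print(best_thread_count(work=100.0, overhead=0.2, max_threads=8))
```

Under this model the run time bottoms out near the square root of work divided by overhead and rises beyond that point, so a small enough problem can end up slower on 8 threads than on 1.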

I'm very surprised to see this on a new shared-memory system. Has
anyone else gotten benchmarks like this? I have posted a graph of the
plot, as well as the benchmark code I wrote, on my web site with more
information: http://michaelkjohnston.com/perm/mt8bench/

Any ideas?? Anecdotes? Theories?

Best regards,

Michael

Subject: Multithreading: Negative returns?? Puzzling benchmarks

From: Derek O'Connor

Date: 18 Mar, 2009 00:46:01

Message: 2 of 3

Dear Michael,

The matrices used in your test above are tiny: 14x14 and 200x200.

Take a look at these test results on a Dell Precision 690 with dual Xeon 5345s at 2.3 GHz and 8 GB of RAM.
http://www.derekroconnor.net/Software/Benchmarks.htm

These tests show substantial multicore speedups for Matmult and LU Decomp, but very little speedup for SVD or EIG.

Regards,

Derek O'Connor

Subject: Multithreading: Negative returns?? Puzzling benchmarks

From: Michael Johnston

Date: 18 Mar, 2009 01:36:05

Message: 3 of 3

Dear Derek,

Thanks very much for your reply. That's really helpful! My prior was
that the BLAS and LAPACK libraries would make optimal decisions about
threading (I think the MathWorks documentation says something to this
effect), so I was surprised to see run times actually increase.
Perhaps the worst part is that in all of these tests CPU utilization
rises steadily with the number of threads until it hits 100%. I also
tested a new 24-CPU Xeon machine and found that, for one piece of
code, run times were effectively the same with multi-threading on and
off, but CPU utilization was 24x higher with it on. I'll use your code
to replicate your benchmarks on my hardware tomorrow when I get back
to work, and I'll post an update.
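
To put a number on the 24-CPU case, speedup and parallel efficiency can be worked out as in the Python sketch below. The function names are illustrative, and the 10.0-second run time is a placeholder assumption; only the ratio of the two run times matters.

```python
def speedup(t_single, t_multi):
    """Ratio of single-threaded to multi-threaded run time (>1 means faster)."""
    return t_single / t_multi


def parallel_efficiency(t_single, t_multi, threads):
    """Speedup per thread: 1.0 is ideal scaling, 1/threads means no gain at all."""
    return speedup(t_single, t_multi) / threads


# Equal run times with threading on and off, using 24 threads.
# The 10.0 s value is a placeholder; only the ratio of the two times matters.
print(f"{parallel_efficiency(10.0, 10.0, 24):.1%}")  # prints 4.2%
```

With identical run times the speedup is 1, so each of the 24 busy CPUs contributes only about 4% of useful work.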

Best regards,

Michael
