Thread Subject:
Performance bug with solving symmetric linear systems with backslash?

Subject: Performance bug with solving symmetric linear systems with backslash?

From: Grady

Date: 18 Oct, 2007 20:40:52

Message: 1 of 5

I just got my new server (Dell PowerEdge 2950 with two Quad
Core Intel Xeon X5355 processors, running 64-bit CentOS
Linux) up and running with MATLAB 7.5.0 R2007b (64-bit
version) and noticed a performance issue when solving
(dense) symmetric linear systems with backslash.

Here is a simple example to illustrate the issue.

First, tell MATLAB it can use all 8 of the cores:
>maxNumCompThreads(8)

Create a 3000-by-3000 dense symmetric matrix:
>A = rand(3000); A = (A + A.')/2;
Create 1000 right-hand sides:
>B = rand(3000,1000);
Time how long it takes to solve the systems AX=B using
backslash:
>tic; X = A\B; toc
The result is:
Elapsed time is 29.206247 seconds

Now create a non-symmetric 3000-by-3000 dense matrix and do
the same calculation:
>C = rand(3000);
>tic; X = C\B; toc
The result is:
Elapsed time is 1.79076 seconds

That is a huge difference between solving two linear systems
of the same size. I would expect the two times to be
roughly the same, with perhaps the symmetric version faster.
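
For anyone who wants to rerun the comparison, the commands above can be collected into one script. This is only a minimal sketch: the sizes n and nrhs are reduced, illustrative values (not the ones timed in this post), and the symmetry assertion and residual checks are additions that were not part of the original experiment.

```matlab
% Reproduction sketch: time backslash on a symmetric vs. a general dense matrix.
% n and nrhs are illustrative; set them to 3000 and 1000 to match the post.
n = 1000;
nrhs = 300;

A = rand(n);  A = (A + A.')/2;   % dense symmetric (generally indefinite)
C = rand(n);                     % dense non-symmetric
B = rand(n, nrhs);               % multiple right-hand sides

% Confirm that A is exactly symmetric, as the symmetric solver path requires
assert(isequal(A, A.'));

tic; XA = A\B; tSym = toc;       % symmetric system
tic; XC = C\B; tGen = toc;       % non-symmetric system
fprintf('symmetric: %.3f s, non-symmetric: %.3f s\n', tSym, tGen);

% Normalized residuals, to confirm that both solves are accurate
resSym = norm(A*XA - B, 'fro') / (norm(A, 'fro') * norm(XA, 'fro'));
resGen = norm(C*XC - B, 'fro') / (norm(C, 'fro') * norm(XC, 'fro'));
fprintf('residuals: %.2e (symmetric), %.2e (non-symmetric)\n', resSym, resGen);
```

The timings will depend on the MATLAB release, the BLAS/LAPACK libraries it ships with, and the machine, so the ratio reported in this post may not reproduce elsewhere.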

One thing I noticed while tracking processor activity during
these calculations is that the symmetric solve uses only one
core, while the non-symmetric solve appears to use all
eight. To see whether that is the only issue, I forced MATLAB
not to multithread the computations by turning off
multithreading under File > Preferences > General >
Multithreading. Here are the results:

Symmetric:
>tic; X = A\B; toc
The result is:
Elapsed time is 30.243275 seconds

Non-symmetric:
>tic; X = C\B; toc
The result is:
Elapsed time is 5.138421 seconds

This seems to indicate that the problem is not solely due to
the non-symmetric solve using multithreading while the
symmetric solve uses only one thread (core).

To make absolutely sure the problem lies in the solver that
backslash (mldivide) chooses, and not in the particular A
and C matrices, I also used the linsolve command with the A
matrix and told MATLAB which solver to use. Here are the
commands (note that multithreading is again turned off):

Use symmetric solver on AX=B
>opts.SYM=true; tic; X=linsolve(A,B,opts); toc
The result is:
Elapsed time is 29.817919 seconds

Use a non-symmetric solver on AX=B
>opts.SYM=false; tic; X=linsolve(A,B,opts); toc
The result is:
Elapsed time is 5.051546 seconds
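
The two linsolve calls can also be compared for agreement, not just speed. The following is a hedged sketch with reduced, illustrative sizes; the variable names optsSym, optsGen and relDiff are introduced here for illustration and do not appear in the original commands.

```matlab
% Sketch: solve the same symmetric system with both linsolve code paths
% and check that the answers agree. Sizes are illustrative only.
n = 500;
nrhs = 100;
A = rand(n);  A = (A + A.')/2;     % dense symmetric matrix
B = rand(n, nrhs);

optsSym = struct('SYM', true);     % tell linsolve that A is symmetric
optsGen = struct('SYM', false);    % treat A as a general matrix

tic; Xsym = linsolve(A, B, optsSym); tSym = toc;
tic; Xgen = linsolve(A, B, optsGen); tGen = toc;

% Relative difference between the two solutions
relDiff = norm(Xsym - Xgen, 'fro') / norm(Xgen, 'fro');
fprintf('SYM=true: %.3f s, SYM=false: %.3f s, rel. diff: %.2e\n', ...
        tSym, tGen, relDiff);
```

Using a separate options struct for each call avoids carrying a stale field from one run into the next. The relative difference should be small, but its size depends on the conditioning of the random matrix.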

According to the release notes for R2007b, the new function
ldl was added for factorizing symmetric indefinite
matrices. I'm not sure whether this function (or the
corresponding LAPACK routine) is what is causing the
performance issue. I previously had 7.1 (R14SP3, 32-bit)
installed on this same machine and found that backslash
with the symmetric matrix performed as well as backslash
with a non-symmetric matrix, although I don't have the exact
results any more.

I searched a bit on the MW website to see if this issue had
been commented on, but found no previous posts. Has anyone
seen a similar performance problem on their system, and does
anyone know if MW is aware of this issue?

-Grady

Subject: Performance bug with solving symmetric linear systems with backslash?

From: Bobby Cheng

Date: 19 Oct, 2007 20:05:33

Message: 2 of 5

Here is the surprise (even to me).

dsytrs.f in LAPACK uses only Level 2 BLAS instead of the usual Level 3
BLAS used in dgetrs.f. So with multiple RHS, the performance difference
really shows.

This is an implementation issue in LAPACK, so there is no quick fix for
it.

But I hope to address this at least in MATLAB in a future release.
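
To make the factor/solve split concrete, here is a hedged sketch that computes the symmetric indefinite factorization explicitly with ldl and then applies the triangular solves to all right-hand sides at once through linsolve. It is an illustration of the structure of the computation, not a description of what backslash does internally, and whether it is actually faster than A\B on a given release has not been measured here. The sizes and the variable names optsLT, optsUT, Y, Z and res are illustrative.

```matlab
% Sketch: solve A*X = B via an explicit LDL' factorization,
% where P'*A*P = L*D*L'. Sizes are illustrative only.
n = 500;
nrhs = 100;
A = rand(n);  A = (A + A.')/2;      % dense symmetric indefinite matrix
B = rand(n, nrhs);

[L, D, P] = ldl(A);                 % L unit lower triangular, D block diagonal

optsLT = struct('LT', true);        % L is lower triangular
optsUT = struct('UT', true);        % L' is upper triangular

Y = linsolve(L, P'*B, optsLT);      % forward substitution on all columns
Z = sparse(D) \ Y;                  % D has 1x1 and 2x2 diagonal blocks
X = P * linsolve(L', Z, optsUT);    % back substitution, then undo permutation

% Normalized residual of the computed solution
res = norm(A*X - B, 'fro') / (norm(A, 'fro') * norm(X, 'fro'));
fprintf('normalized residual: %.2e\n', res);
```

Converting D to sparse storage is a convenience so that the block-diagonal solve stays cheap; it is a choice made for this sketch.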

Good catch and thanks,
---Bob.

Subject: Performance bug with solving symmetric linear systems with backslash?

From: Olaf Bousche

Date: 20 Dec, 2007 09:37:33

Message: 3 of 5

I found something similar just a few days ago. We have some
old code running under R2006a. We ported the code to
R2007b and suddenly the program ran 4 times slower on a dual
core machine than on the old single core machine. After
profiling we were able to find the offending statement. The
simplified code can be seen here:

n = 1000;
k = rand(n-1,1);
a = diag(k,-1)+diag(k,1)+diag(-[0;k]-[k;k(end)]);
f = rand(n)+1i*rand(n);
tic; x = a\f; toc

This runs slowly. The matrix a is symmetric and tridiagonal.
The fix I had for Grady's code (yes, there is a quick fix!!)

opt.SYM = false;
x = linsolve(a,f,opt);

doesn't help here because it only works with a full matrix.

However, in our case adding

aa = sparse(a);
x = aa\f;

is in some cases more than 10 times faster!
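
The dense and sparse solves can be put side by side in one script. This is a minimal sketch built from the snippets above; the timing variables and the residual checks (tDense, tSparse, resDense, resSparse) are additions for illustration.

```matlab
% Sketch: compare dense and sparse backslash on a symmetric tridiagonal matrix.
n = 1000;
k = rand(n-1, 1);
a = diag(k,-1) + diag(k,1) + diag(-[0;k] - [k;k(end)]);  % dense storage
f = rand(n) + 1i*rand(n);                                % complex right-hand sides

tic; x1 = a\f;  tDense = toc;     % dense symmetric path

aa = sparse(a);                   % same matrix in sparse storage
tic; x2 = aa\f; tSparse = toc;    % sparse path

fprintf('dense: %.3f s, sparse: %.3f s\n', tDense, tSparse);

% Normalized residuals, to confirm that both solves are accurate
resDense  = norm(a*x1 - f, 'fro') / (norm(a, 'fro') * norm(x1, 'fro'));
resSparse = norm(a*x2 - f, 'fro') / (norm(a, 'fro') * norm(x2, 'fro'));
fprintf('residuals: %.2e (dense), %.2e (sparse)\n', resDense, resSparse);
```

The conversion to sparse is timed separately from the solve here; in production code the matrix would normally be assembled in sparse form from the start.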


Instead of waiting for a full new release, wouldn't it be
possible to write a quick and dirty mex file that calls
the right parts of BLAS and LAPACK directly? Or just fix the
LAPACK dll?

Or does this problem run much deeper?

Olaf

Subject: Performance bug with solving symmetric linear systems with backslash?

From: Derek O'Connor

Date: 19 Jan, 2008 10:41:01

Message: 4 of 5

Dear Grady,

I got exactly the same results as yours on my Dell Precision
690 with 2x Intel (quad-core) Xeon E5345 CPUs @ 2.33 GHz, running
64-bit Windows Vista with MATLAB 7.5.0 R2007b (64-bit version).

Here is a statement from a MathWorker that may have some
bearing on the problem :

>Subject: Re: MEX File built with MKL crashes

>From: Duncan Po (MathWorker)

>Date: 13 Dec, 2007 12:22:21

>Is there any reason why you need to use the LAPACK in MKL?
>In MATLAB, we only support using the LAPACK version 3.1
>that can be downloaded from www.netlib.org. We have never
>tested using the LAPACK from the MKL package, therefore we
>cannot guarantee it works without crashing.

It would be useful to know which parts of (Intel's MKL) BLAS
& LAPACK MATLAB uses in its matrix functions. Also, it would
be nice to have the VER command tell us which version of MKL
is in use. The latest from Intel is MKL 10.1.
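
For readers on more recent MATLAB releases, the version function accepts '-blas' and '-lapack' options that report the linked libraries. The sketch below assumes such a release; nothing in this thread establishes that these options exist in R2007b, and the variable names are illustrative.

```matlab
% Sketch: query the BLAS and LAPACK libraries used by this MATLAB session.
% Assumes a release in which version('-blas') and version('-lapack') exist.
blasInfo   = version('-blas');
lapackInfo = version('-lapack');
fprintf('BLAS:   %s\nLAPACK: %s\n', blasInfo, lapackInfo);
```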
 


Derek O'Connor

Subject: Performance bug with solving symmetric linear systems with backslash?

From: Olaf Bousche

Date: 14 Apr, 2008 13:01:03

Message: 5 of 5

Just checked with MATLAB R2008a.

Unfortunately, the problem is still there.

Olaf
