Path: news.mathworks.com!not-for-mail
From: <HIDDEN>
Newsgroups: comp.soft-sys.matlab
Subject: Re: Performance bug with solving symmetric linear systems with backslash?
Date: Thu, 20 Dec 2007 09:37:33 +0000 (UTC)
Organization: KROHNE Altometer
Lines: 172
Message-ID: <fkdd4t$7o8$1@fred.mathworks.com>
References: <ff8gck$404$1@fred.mathworks.com> <ffb2md$3p9$1@fred.mathworks.com>
Reply-To: <HIDDEN>



"Bobby Cheng" <bcheng@mathworks.com> wrote in message 
<ffb2md$3p9$1@fred.mathworks.com>...
> Here is the surprise (even to me).
> 
> dsytrs.f in LAPACK is using only level 2 BLAS instead of the usual level 3
> BLAS like in dgetrs.f. So with multiple RHS, the performance difference
> really shows.
> 
> So this is an implementation issue with LAPACK. So there is no quick fix for
> this.
> 
> But I hope to address this at least in MATLAB in a future release.
> 
> Good catch and thanks,
> ---Bob.
> 
> "Grady " <rbfstuff@hotmail.com> wrote in message 
> news:ff8gck$404$1@fred.mathworks.com...
> >I just got my new server (Dell PowerEdge 2950 with two Quad
> > Core Intel Xeon X5355 processors, running 64-bit Linux
> > CentOS dist.) up and running with Matlab 7.5.0 R2007b
> > (64-bit version) and noticed a performance issue when solving
> > a (dense) symmetric linear system with backslash.
> >
> > Here is a simple example to illustrate the issue.
> >
> > First, tell matlab it can use all 8 of the cores:
> >>maxNumCompThreads(8)
> >
> > Create a 3000-by-3000 dense symmetric matrix:
> >>A = rand(3000); A = (A + A.')/2;
> > Create 1000 right hand sides
> >>B = rand(3000,1000);
> > Time how long it takes to solve the systems AX=B using
> > backslash:
> >>tic; X = A\B; toc
> > The result is:
> > Elapsed time is 29.206247 seconds
> >
> > Now create a dense non-symmetric 3000-by-3000 matrix
> > and do the same calculation:
> >>C = rand(3000);
> >>tic; X = C\B; toc
> > The result is:
> > Elapsed time is 1.79076 seconds
> >
> > That is a huge difference between solving two linear systems
> > of the same size.  I would expect the two times to be
> > roughly the same, with perhaps the symmetric version faster.
> >
> > One thing I noticed while tracking the activity of the
> > processor during these calculations is that the version with
> > the symmetric solve only uses one core, while the
> > non-symmetric solve appears to use all eight.  To see if
> > that is the only issue, I forced matlab to not multithread
> > the computations by turning off multithreading in the
> > file>preferences>general>multithreading box.  Here are the
> > results:
> >
> > Symmetric:
> >>tic; X = A\B; toc
> > The result is:
> > Elapsed time is 30.243275 seconds
> >
> > Non-symmetric:
> >>tic; X = C\B; toc
> > The result is:
> > Elapsed time is 5.138421 seconds
> >
> > This seems to indicate the problem is not solely from the
> > non-symmetric solve using multithreading and the symmetric
> > solve only using one thread (core).
> >
> > To make absolutely sure the problem is with the solver
> > matlab chooses in the backslash (mldivide) function
> > and not with the particular A and C matrices, I also used
> > the linsolve command with the A matrix and told matlab which
> > solver to use.  Here are the commands (note multithreading
> > is again turned off):
> >
> > Use symmetric solver on AX=B
> >>opts.SYM=true; tic; X=linsolve(A,B,opts); toc
> > The result is:
> > Elapsed time is 29.817919 seconds
> >
> > Use a non-symmetric solver on AX=B
> >>opts.SYM=false; tic; X=linsolve(A,B,opts); toc
> > The result is:
> > Elapsed time is 5.051546 seconds
> >
> > According to the release notes for 2007b, the new function
> > ldl was added for decomposing symmetric indefinite linear
> > systems.  I'm not sure if this function (or the
> > corresponding LAPACK function) is what is causing the
> > performance issue.  I previously had 7.1R14SP3 (32-bit)
> > installed on this same machine and found that backslash
> > with the symmetric matrix performed as well as backslash on
> > a nonsymmetric matrix, although I don't have the exact
> > results any more.
> >
> > I searched a bit on the MW website to see if this issue had
> > been commented on, but found no previous posts.  Has anyone
> > seen a similar performance problem on their systems and does
> > anyone know if MW is aware of this issue?
> >
> > -Grady
> > 
> 
> 


I found something similar just a few days ago. We have some 
old code running under version 2006a. We ported the code to 
2007b and suddenly the program ran 4 times slower on a dual 
core machine than on the old single core machine. After 
profiling we were able to find the offending statement. The 
simplified code can be seen here:

n = 1000;
k = rand(n-1,1);
a = diag(k,-1)+diag(k,1)+diag(-[0;k]-[k;k(end)]);
f = rand(n)+1i*rand(n);
tic; x = a\f; toc

This runs slowly. The matrix a is symmetric and tridiagonal.
The fix I had for Grady's code (yes there is a quick fix!!)

opt.SYM = false;
x = linsolve(a,f,opt);

doesn't help here because this only works with a full matrix.

However in our case adding

aa = sparse(a);
x = aa\f;

runs in some cases more than 10 times faster!
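
For anyone who wants to reproduce the comparison, a minimal sketch along these lines should do. The variable names (t_full, t_sparse, x_full, x_sparse) are only for illustration, and the timings will of course depend on the machine and the MATLAB release, so no numbers are claimed here.

```matlab
% Sketch: time the full and sparse solves of the same system.
n = 1000;
k = rand(n-1,1);
a = diag(k,-1) + diag(k,1) + diag(-[0;k]-[k;k(end)]);  % symmetric tridiagonal, full storage
f = rand(n) + 1i*rand(n);                               % n complex right-hand sides

% Solve with the matrix in full storage
tic; x_full = a\f; t_full = toc;

% Same values in sparse storage, so backslash takes its sparse path
aa = sparse(a);
tic; x_sparse = aa\f; t_sparse = toc;

fprintf('full:   %.3f s\n', t_full);
fprintf('sparse: %.3f s\n', t_sparse);

% The two answers should agree up to rounding; how closely depends on
% the conditioning of the random matrix.
relDiff = norm(x_full - x_sparse, 'fro') / norm(x_full, 'fro');
fprintf('relative difference: %.2e\n', relDiff);
```

Converting to sparse only pays off here because a has just three nonzero diagonals; for a dense matrix like Grady's it would not be the right workaround.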


Instead of waiting for a full new release, wouldn't it be 
possible to write a quick and dirty mex file that calls the 
right parts of BLAS and LAPACK directly? Or just fix the 
LAPACK dll?

Or does this problem run much deeper?

Olaf