"Bobby Cheng" <bcheng@mathworks.com> wrote in message
<ffb2md$3p9$1@fred.mathworks.com>...
> Here is the surprise (even to me).
>
> dsytrs.f in LAPACK is using only level 2 BLAS instead of the usual level 3
> BLAS like in dgetrs.f. So with multiple RHS, the performance difference
> really shows.
>
> So this is an implementation issue with LAPACK. So there is no quick fix for
> this.
>
> But I hope to address this at least in MATLAB in a future release.
>
> Good catch and thanks,
> Bob.
>
> "Grady " <rbfstuff@hotmail.com> wrote in message
> news:ff8gck$404$1@fred.mathworks.com...
> > I just got my new server (Dell PowerEdge 2950 with two Quad
> > Core Intel Xeon X5355 processors, running 64-bit Linux
> > CentOS dist.) up and running with Matlab 7.5.0 R2007b
> > (64-bit version) and noticed a performance issue when solving
> > a (dense) symmetric linear system with backslash.
> >
> > Here is a simple example to illustrate the issue.
> >
> > First, tell matlab it can use all 8 of the cores:
> >>maxNumCompThreads = 8
> >
> > Create a 3000-by-3000 dense symmetric matrix:
> >>A = rand(3000); A = (A + A.')/2;
> > Create 1000 right hand sides
> >>B = rand(3000,1000);
> > Time how long it takes to solve the systems AX=B using
> > backslash:
> >>tic; X = A\B; toc
> > The result is:
> > Elapsed time is 29.206247 seconds
> >
> > Now create a nonsymmetric 3000-by-3000 dense
> > matrix and do the same calculation:
> >>C = rand(3000);
> >>tic; X = C\B; toc
> > The result is:
> > Elapsed time is 1.79076 seconds
> >
> > That is a huge difference between solving two linear systems
> > of the same size. I would expect the two times to be
> > roughly the same, with perhaps the symmetric version faster.
> >
> > One thing I noticed while tracking the activity of the
> > processor during these calculations is that the version with
> > the symmetric solve only uses one core, while the
> > nonsymmetric solve appears to use all eight. To see if
> > that is the only issue, I forced matlab to not multithread
> > the computations by turning off multithreading in the
> > file>preferences>general>multithreading box. Here are the
> > results:
> >
> > Symmetric:
> >>tic; X = A\B; toc
> > The result is:
> > Elapsed time is 30.243275 seconds
> >
> > Nonsymmetric:
> >>tic; X = C\B; toc
> > The result is:
> > Elapsed time is 5.138421 seconds
> >
> > This seems to indicate the problem is not solely from the
> > nonsymmetric solve using multithreading and the symmetric
> > solve only using one thread (core).
> >
> > To make absolutely sure the problem is with the solver
> > matlab chooses in the backslash (mldivide) function
> > and not with the particular A and C matrices, I also used
> > the linsolve command with the A matrix and told matlab which
> > solver to use. Here are the commands (note multithreading
> > is again turned off):
> >
> > Use symmetric solver on AX=B
> >>opts.SYM=true; tic; X=linsolve(A,B,opts); toc
> > The result is:
> > Elapsed time is 29.817919 seconds
> >
> > Use a nonsymmetric solver on AX=B
> >>opts.SYM=false; tic; X=linsolve(A,B,opts); toc
> > The result is:
> > Elapsed time is 5.051546 seconds
> >
> > According to the release notes for 2007b, the new function
> > ldl was added for decomposing symmetric indefinite linear
> > systems. I'm not sure if this function (or the
> > corresponding LAPACK function) is what is causing the
> > performance issue. I previously had 7.1R14SP3 (32-bit)
> > installed on this same machine and found that backslash
> > with the symmetric matrix performed as well as backslash on
> > a nonsymmetric matrix, although I don't have the exact
> > results any more.
> >
> > I searched a bit on the MW website to see if this issue had
> > been commented on, but found no previous posts. Has anyone
> > seen a similar performance problem on their systems and does
> > anyone know if MW is aware of this issue?
> >
> > Grady
> >
>
>
I found something similar just a few days ago. We have some
old code running under version 2006a. We ported the code to
2007b and suddenly the program ran 4 times slower on a dual
core machine than on the old single core machine. After
profiling we were able to find the offending statement. The
simplified code can be seen here:
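n = 1000;
k = rand(n-1,1);                                    % off-diagonal entries
a = diag(k,1)+diag(k,-1)+diag(-[0;k]-[k;k(end)]);   % symmetric tridiagonal, stored as a full matrix
f = rand(n)+1i*rand(n);                             % complex right-hand sides
tic; x = a\f; toc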
This runs slowly. The matrix a is symmetric and tridiagonal.
The fix I had for Grady's code (yes, there is a quick fix!!)
opt.SYM = false;
x = linsolve(a,f,opt);
doesn't help here because this only works with a full matrix.
However, in our case adding
aa = sparse(a);
x = aa\f;
works in some cases more than 10 times faster!
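
For anyone who wants to reproduce the comparison, it can be put in one
script. This is only a sketch under assumptions: the minus signs in the
matrix construction (n-1, diag(k,-1) and the negated main diagonal) are
assumed, and the names tDense, tSparse and relDiff are made up for
illustration. It times backslash on the dense and on the sparse storage of
the same matrix and checks that both give the same answer up to rounding.

```matlab
% Sketch: dense vs. sparse backslash on a symmetric tridiagonal system.
n = 1000;
k = rand(n-1,1);                                      % off-diagonal entries
a = diag(k,1) + diag(k,-1) + diag(-[0;k]-[k;k(end)]); % assumed form of the matrix
f = rand(n) + 1i*rand(n);                             % complex right-hand sides

tic; x1 = a\f; tDense = toc;                          % full (dense) storage

aa = sparse(a);                                       % same values, sparse storage
tic; x2 = aa\f; tSparse = toc;

% Both solves should agree up to rounding error.
relDiff = norm(x1 - x2, 'fro') / norm(x1, 'fro');
fprintf('dense: %.3f s, sparse: %.3f s, relative difference: %.2e\n', ...
        tDense, tSparse, relDiff);
```

The timings will depend on the machine and on the MATLAB release, so no
numbers are given here. The conversion sparse(a) is left outside the timed
part; move it inside the tic/toc pair to include its cost.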
Instead of waiting for a full new release, wouldn't it be
possible to write a quick and dirty mex file that calls the
right parts of BLAS and LAPACK directly? Or just fix the
LAPACK dll?
Or does this problem run much deeper?
Olaf
