<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/157962</link>
    <title>MATLAB Central Newsreader - Performance bug with solving symmetric linear systems with backslash?</title>
    <description>Feed for thread: Performance bug with solving symmetric linear systems with backslash?</description>
    <language>en-us</language>
    <copyright>&amp;copy;1994-2012 by MathWorks, Inc.</copyright>
    <webmaster>webmaster@mathworks.com</webmaster>
    <generator>MATLAB Central Newsreader</generator>
    <docs>http://blogs.law.harvard.edu/tech/rss</docs>
    <ttl>60</ttl>
    <image>
      <title>MathWorks</title>
      <url>http://www.mathworks.com/images/membrane_icon.gif</url>
    </image>
    <item>
      <pubDate>Thu, 18 Oct 2007 20:40:52 -0400</pubDate>
      <title>Performance bug with solving symmetric linear systems with backslash?</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/157962#397421</link>
      <author>Grady </author>
      <description>I just got my new server (Dell PowerEdge 2950 with two Quad&lt;br&gt;
Core Intel Xeon X5355 processors, running 64-bit Linux&lt;br&gt;
CentOS dist.) up and running with Matlab 7.5.0 R2007b&lt;br&gt;
(64-bit version) and noticed a performance issue when solving&lt;br&gt;
a (dense) symmetric linear system with backslash.&lt;br&gt;
&lt;br&gt;
Here is a simple example to illustrate the issue.&lt;br&gt;
&lt;br&gt;
First, tell MATLAB it can use all 8 of the cores:&lt;br&gt;
&amp;gt;maxNumCompThreads(8)&lt;br&gt;
&lt;br&gt;
Create a 3000-by-3000 dense symmetric matrix:&lt;br&gt;
&amp;gt;A = rand(3000); A = (A + A.')/2;&lt;br&gt;
Create 1000 right hand sides &lt;br&gt;
&amp;gt;B = rand(3000,1000);&lt;br&gt;
Time how long it takes to solve the systems AX=B using&lt;br&gt;
backslash:&lt;br&gt;
&amp;gt;tic; X = A\B; toc&lt;br&gt;
The result is:&lt;br&gt;
Elapsed time is 29.206247 seconds&lt;br&gt;
&lt;br&gt;
Now create a dense non-symmetric 3000-by-3000 matrix and do&lt;br&gt;
the same calculation:&lt;br&gt;
&amp;gt;C = rand(3000);&lt;br&gt;
&amp;gt;tic; X = C\B; toc&lt;br&gt;
The result is:&lt;br&gt;
Elapsed time is 1.79076 seconds&lt;br&gt;
&lt;br&gt;
That is a huge difference between solving two linear systems&lt;br&gt;
of the same size.  I would expect the two times to be&lt;br&gt;
roughly the same, with perhaps the symmetric version faster.&lt;br&gt;
&lt;br&gt;
One thing I noticed while tracking the activity of the&lt;br&gt;
processor during these calculations is that the version with&lt;br&gt;
the symmetric solve only uses one core, while the&lt;br&gt;
non-symmetric solve appears to use all eight.  To see if&lt;br&gt;
that is the only issue, I forced matlab to not multithread&lt;br&gt;
the computations by turning off multithreading in the&lt;br&gt;
file&amp;gt;preferences&amp;gt;general&amp;gt;multithreading box.  Here are the&lt;br&gt;
results:&lt;br&gt;
&lt;br&gt;
Symmetric:&lt;br&gt;
&amp;gt;tic; X = A\B; toc&lt;br&gt;
The result is:&lt;br&gt;
Elapsed time is 30.243275 seconds&lt;br&gt;
&lt;br&gt;
Non-symmetric:&lt;br&gt;
&amp;gt;tic; X = C\B; toc&lt;br&gt;
The result is:&lt;br&gt;
Elapsed time is 5.138421 seconds&lt;br&gt;
&lt;br&gt;
This seems to indicate the problem is not solely from the&lt;br&gt;
non-symmetric solve using multithreading and the symmetric&lt;br&gt;
solve only using one thread (core).&lt;br&gt;
&lt;br&gt;
To make sure the problem lies in the solver that MATLAB&lt;br&gt;
chooses inside the backslash (mldivide) function, and not in&lt;br&gt;
the particular A and C matrices, I also used the linsolve&lt;br&gt;
command with the A matrix and told MATLAB which solver to&lt;br&gt;
use.  Here are the commands (note that multithreading is&lt;br&gt;
again turned off):&lt;br&gt;
&lt;br&gt;
Use symmetric solver on AX=B&lt;br&gt;
&amp;gt;opts.SYM=true; tic; X=linsolve(A,B,opts); toc&lt;br&gt;
The result is:&lt;br&gt;
Elapsed time is 29.817919 seconds&lt;br&gt;
&lt;br&gt;
Use a non-symmetric solver on AX=B&lt;br&gt;
&amp;gt;opts.SYM=false; tic; X=linsolve(A,B,opts); toc&lt;br&gt;
The result is:&lt;br&gt;
Elapsed time is 5.051546 seconds&lt;br&gt;
&lt;br&gt;
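For reference, the measurements above can be collected into one&lt;br&gt;
script.  This is only a sketch: the size n = 1000 and the 200&lt;br&gt;
right-hand sides are assumptions chosen to keep the run short,&lt;br&gt;
the variable names are made up for illustration, and the timings&lt;br&gt;
will differ by machine and release.  It also checks that the two&lt;br&gt;
linsolve paths return the same solution, so that only the solver&lt;br&gt;
differs between the timed calls.&lt;br&gt;
&lt;br&gt;
```matlab
% Sketch: time the symmetric and the general dense solver on one system.
% n and nrhs are illustrative assumptions, smaller than the 3000/1000 above.
n = 1000;
nrhs = 200;
A = rand(n);
A = (A + A.')/2;            % dense symmetric, indefinite in general
B = rand(n, nrhs);

optsSym.SYM = true;         % ask linsolve for the symmetric solver
optsGen.SYM = false;        % ask linsolve for the general (LU) solver

tic; Xsym = linsolve(A, B, optsSym); tSym = toc;
tic; Xgen = linsolve(A, B, optsGen); tGen = toc;
tic; Xbs  = A\B;                     tBs  = toc;

fprintf('linsolve, SYM = true : %8.3f s\n', tSym);
fprintf('linsolve, SYM = false: %8.3f s\n', tGen);
fprintf('backslash            : %8.3f s\n', tBs);

% Both solvers should agree up to rounding error.
relDiff = norm(Xsym - Xgen, 'fro') / norm(Xgen, 'fro');
fprintf('relative difference  : %.2e\n', relDiff);
```
&lt;br&gt;
If backslash takes the symmetric path, tBs should track tSym&lt;br&gt;
rather than tGen.&lt;br&gt;
&lt;br&gt;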
According to the release notes for 2007b, the new function&lt;br&gt;
ldl was added for decomposing symmetric indefinite linear&lt;br&gt;
systems.  I'm not sure if this function (or the&lt;br&gt;
corresponding LAPACK function) is what is causing the&lt;br&gt;
performance issue.  I previously had 7.1 (R14SP3, 32-bit)&lt;br&gt;
installed on this same machine and found that backslash&lt;br&gt;
with the symmetric matrix performed as well as backslash on&lt;br&gt;
a nonsymmetric matrix, although I don't have the exact&lt;br&gt;
results any more.&lt;br&gt;
&lt;br&gt;
I searched a bit on the MW website to see if this issue had&lt;br&gt;
been commented on, but found no previous posts.  Has anyone&lt;br&gt;
seen a similar performance problem on their systems, and does&lt;br&gt;
anyone know if MW is aware of this issue?&lt;br&gt;
&lt;br&gt;
-Grady</description>
    </item>
    <item>
      <pubDate>Fri, 19 Oct 2007 20:05:33 -0400</pubDate>
      <title>Re: Performance bug with solving symmetric linear systems with backslash?</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/157962#397575</link>
      <author>Bobby Cheng</author>
      <description>Here is the surprise (even to me).&lt;br&gt;
&lt;br&gt;
dsytrs.f in LAPACK uses only level 2 BLAS instead of the usual level 3 &lt;br&gt;
BLAS used in dgetrs.f. So with multiple RHS, the performance difference &lt;br&gt;
really shows.&lt;br&gt;
&lt;br&gt;
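The cost of handling one right-hand side at a time can be seen &lt;br&gt;
without touching LAPACK directly. The following is a minimal sketch, &lt;br&gt;
not what dsytrs.f does internally: it assumes a made-up, well-conditioned &lt;br&gt;
lower triangular matrix L and compares a loop of single-column solves &lt;br&gt;
(matrix-vector style work) with one solve for all columns (matrix-matrix &lt;br&gt;
style work). The sizes and variable names are illustrative assumptions, &lt;br&gt;
and part of the loop time is interpreter overhead.&lt;br&gt;
&lt;br&gt;
```matlab
% Sketch: column-by-column triangular solves versus one blocked solve.
% This only mimics the level 2 / level 3 distinction; it is not dsytrs.
n = 1000;
nrhs = 200;
L = tril(rand(n)) + n*eye(n);   % diagonally dominant lower triangular matrix
B = rand(n, nrhs);
opts.LT = true;                 % tell linsolve that L is lower triangular

% One right-hand side at a time: each solve works on a single vector.
tic;
Xloop = zeros(n, nrhs);
for j = 1:nrhs
    Xloop(:, j) = linsolve(L, B(:, j), opts);
end
tLoop = toc;

% All right-hand sides at once: a single solve on the whole block.
tic;
Xblock = linsolve(L, B, opts);
tBlock = toc;

fprintf('column by column: %8.3f s\n', tLoop);
fprintf('all columns     : %8.3f s\n', tBlock);

% The two approaches should give the same solution up to rounding error.
relDiff = norm(Xloop - Xblock, 'fro') / norm(Xblock, 'fro');
fprintf('relative difference: %.2e\n', relDiff);
```
&lt;br&gt;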
This is an implementation issue in LAPACK, so there is no quick fix for &lt;br&gt;
it.&lt;br&gt;
&lt;br&gt;
But I hope to address this at least in MATLAB in a future release.&lt;br&gt;
&lt;br&gt;
Good catch and thanks,&lt;br&gt;
---Bob.&lt;br&gt;
</description>
    </item>
    <item>
      <pubDate>Thu, 20 Dec 2007 09:37:33 -0500</pubDate>
      <title>Re: Performance bug with solving symmetric linear systems with backslash?</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/157962#406844</link>
      <author>Olaf Bousche</author>
      <description>I found something similar just a few days ago. We have some &lt;br&gt;
old code running under version 2006a. We ported the code to &lt;br&gt;
2007b and suddenly the program ran 4 times slower on a dual &lt;br&gt;
core machine than on the old single core machine. After &lt;br&gt;
profiling we were able to find the offending statement. The &lt;br&gt;
simplified code can be seen here:&lt;br&gt;
&lt;br&gt;
n = 1000;&lt;br&gt;
k = rand(n-1,1);&lt;br&gt;
a = diag(k,-1)+diag(k,1)+diag(-[0;k]-[k;k(end)]);&lt;br&gt;
f = rand(n)+1i*rand(n);&lt;br&gt;
tic; x = a\f; toc&lt;br&gt;
&lt;br&gt;
This runs slowly. The matrix a is symmetric and tridiagonal.&lt;br&gt;
The fix I had for Grady's code (yes, there is a quick fix!)&lt;br&gt;
&lt;br&gt;
opt.SYM = false;&lt;br&gt;
x = linsolve(a,f,opt);&lt;br&gt;
&lt;br&gt;
doesn't help here because this only works with a full matrix&lt;br&gt;
&lt;br&gt;
However in our case adding&lt;br&gt;
&lt;br&gt;
aa = sparse(a);&lt;br&gt;
x = aa\f;&lt;br&gt;
&lt;br&gt;
works in some cases more than 10 times faster!&lt;br&gt;
&lt;br&gt;
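A small script along the following lines can be used to compare the &lt;br&gt;
dense and sparse paths and to confirm that both return the same &lt;br&gt;
solution. It is a sketch under assumptions: k is shifted to lie &lt;br&gt;
in [1, 2] (unlike the code above) so that the system stays reasonably &lt;br&gt;
conditioned for the comparison, and the timing variable names are &lt;br&gt;
made up for illustration. Timings depend on machine and release.&lt;br&gt;
&lt;br&gt;
```matlab
% Sketch: the same tridiagonal system solved in dense and sparse storage.
n = 1000;
k = 1 + rand(n-1, 1);                     % assumption: shifted away from zero
d = -[0; k] - [k; k(end)];                % main diagonal, built as above
a = diag(k, -1) + diag(k, 1) + diag(d);   % symmetric tridiagonal, dense storage
f = rand(n) + 1i*rand(n);                 % complex right-hand sides

tic; xDense = a\f; tDense = toc;

aa = sparse(a);                           % same matrix, sparse storage
tic; xSparse = aa\f; tSparse = toc;

fprintf('dense  backslash: %8.3f s\n', tDense);
fprintf('sparse backslash: %8.3f s\n', tSparse);

% Both storage formats should give the same solution up to rounding error.
relDiff = norm(xDense - xSparse, 'fro') / norm(xDense, 'fro');
fprintf('relative difference: %.2e\n', relDiff);
```
&lt;br&gt;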
&lt;br&gt;
Instead of waiting for a full new release, wouldn't it be &lt;br&gt;
possible to write a quick and dirty mex file that calls the &lt;br&gt;
right parts of BLAS and LAPACK directly? Or just fix the &lt;br&gt;
LAPACK dll?&lt;br&gt;
&lt;br&gt;
Or does this problem run much deeper?&lt;br&gt;
&lt;br&gt;
Olaf</description>
    </item>
    <item>
      <pubDate>Sat, 19 Jan 2008 10:41:01 -0500</pubDate>
      <title>Re: Performance bug with solving symmetric linear systems with backslash?</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/157962#410153</link>
      <author>Derek O'Connor</author>
      <description>Dear Grady,&lt;br&gt;
&lt;br&gt;
I got exactly the same results as yours on my Dell Precision&lt;br&gt;
690 with 2x Intel (quad-core) Xeon E5345 CPUs @ 2.33 GHz, running&lt;br&gt;
64-bit Windows Vista with Matlab 7.5.0 R2007b (64-bit version).&lt;br&gt;
&lt;br&gt;
Here is a statement from a MathWorker that may have some&lt;br&gt;
bearing on the problem :&lt;br&gt;
&lt;br&gt;
&amp;gt;Subject: Re: MEX File built with MKL crashes&lt;br&gt;
&lt;br&gt;
&amp;gt;From: Duncan Po (MathWorker)&lt;br&gt;
&lt;br&gt;
&amp;gt;Date: 13 Dec, 2007 12:22:21&lt;br&gt;
&lt;br&gt;
&amp;gt;Is there any reason why you need to use the LAPACK in MKL?&lt;br&gt;
&amp;gt;In MATLAB, we only support using the LAPACK version 3.1&lt;br&gt;
&amp;gt;that can be downloaded from www.netlib.org. We have never&lt;br&gt;
&amp;gt;tested using the LAPACK from the MKL package, therefore we&lt;br&gt;
&amp;gt;cannot guarantee it works without crashing.&lt;br&gt;
&lt;br&gt;
It would be useful to know which parts of (Intel's MKL) BLAS&lt;br&gt;
and LAPACK MATLAB uses in its matrix functions. It would also&lt;br&gt;
be nice to have the VER command tell us which version of MKL is&lt;br&gt;
in use. The latest from Intel is MKL 10.1&lt;br&gt;
&lt;br&gt;
&lt;br&gt;
Derek O'Connor</description>
    </item>
    <item>
      <pubDate>Mon, 14 Apr 2008 13:01:03 -0400</pubDate>
      <title>Re: Performance bug with solving symmetric linear systems with backslash?</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/157962#426487</link>
      <author>Olaf Bousche</author>
      <description>Just checked with Matlab 2008a&lt;br&gt;
&lt;br&gt;
Unfortunately, the problem is still there&lt;br&gt;
&lt;br&gt;
Olaf</description>
    </item>
  </channel>
</rss>

