Thread Subject: Linear system

Subject: Linear system

From: Luigi Giaccari

Date: 6 Feb, 2009 10:17:01

Message: 1 of 13

What the fastest way to solve a 3x3 linear system of the form Ax=b?

A is not simmetric and sometimes ill-conditioned. In matlab of course the \ command can be used. But i would like to write a C-mex routine optimized for small sistems,
because I have a lot of small systems and solving them sequenzialy in a loop becomes quite slow.



Ps I have tried the LU factorization from numerical recipies. Turns out that a brute Cramer method is from 3 to 4 times faster. Maybe the LU is designed for big systems and has too much overhead time for small ones.

Subject: Linear system

From: John D'Errico

Date: 6 Feb, 2009 17:43:02

Message: 2 of 13

"Luigi Giaccari" <giaccariluigi@msn.com> wrote in message <gmh2mt$2hj$1@fred.mathworks.com>...
> What the fastest way to solve a 3x3 linear system of the form Ax=b?
>
> A is not simmetric and sometimes ill-conditioned. In matlab of course the \ command can be used. But i would like to write a C-mex routine optimized for small sistems,
> because I have a lot of small systems and solving them sequenzialy in a loop becomes quite slow.
>
>
>
> Ps I have tried the LU factorization from numerical recipies. Turns out that a brute Cramer method is from 3 to 4 times faster. Maybe the LU is designed for big systems and has too much overhead time for small ones.

Probably the reason is your code for the Cramer's
rule was more streamlined. The Numerical recipes
in C code is probably an f2c conversion of Fortran
code.

I would avoid Cramer's rule.

John

Subject: Linear system

From: Luigi Giaccari

Date: 6 Feb, 2009 18:15:03

Message: 3 of 13

"John D'Errico" <woodchips@rochester.rr.com> wrote in message <gmhsr6$7gl$1@fred.mathworks.com>...
> "Luigi Giaccari" <giaccariluigi@msn.com> wrote in message <gmh2mt$2hj$1@fred.mathworks.com>...
> > What the fastest way to solve a 3x3 linear system of the form Ax=b?
> >
> > A is not simmetric and sometimes ill-conditioned. In matlab of course the \ command can be used. But i would like to write a C-mex routine optimized for small sistems,
> > because I have a lot of small systems and solving them sequenzialy in a loop becomes quite slow.
> >
> >
> >
> > Ps I have tried the LU factorization from numerical recipies. Turns out that a brute Cramer method is from 3 to 4 times faster. Maybe the LU is designed for big systems and has too much overhead time for small ones.
>
> Probably the reason is your code for the Cramer's
> rule was more streamlined. The Numerical recipes
> in C code is probably an f2c conversion of Fortran
> code.
>
> I would avoid Cramer's rule.
>
> John


I thing this a tough question, Cramer rule allows to compute the solution with
4 straight multiplications. LU requires some loop, but maybe these can be avoided with an optimized version for 3x3 linear system.

At the moment i prefer to axcept little numerical inaccuracy to gain more speed.
On my 2.4 Ghz Pc Cramer's rule can solve 1e6 3x3 doubles systems in 0.35 seconds.
So the in last version Cramer version I coded the 3-4 factor becomes 20-30.

But of course in my numerical recipies version there are some classes that increase overhead times. My code is just (I don't know the english word I mean without classes just a linear flow o C coding) .

If somebody finds something faster please let me know.

Subject: Linear system

From: Peter Boettcher

Date: 6 Feb, 2009 18:48:32

Message: 4 of 13

"Luigi Giaccari" <giaccariluigi@msn.com> writes:

> What the fastest way to solve a 3x3 linear system of the form Ax=b?
>
> A is not simmetric and sometimes ill-conditioned. In matlab of course
> the \ command can be used. But i would like to write a C-mex routine
> optimized for small sistems, because I have a lot of small systems and
> solving them sequenzialy in a loop becomes quite slow.
>
>
>
> Ps I have tried the LU factorization from numerical recipies. Turns
> out that a brute Cramer method is from 3 to 4 times faster. Maybe the
> LU is designed for big systems and has too much overhead time for
> small ones.

See NDFUN, a mex file at http://www.mit.edu/~pwb/matlab/

It might do exactly what you need out-of-box. If not, it will give you
some ideas toward calling the LAPACK routines built into MATLAB that
will do the compute for you.

-Peter

Subject: Linear system

From: Luigi Giaccari

Date: 6 Feb, 2009 19:05:03

Message: 5 of 13

Peter Boettcher <boettcher@ll.mit.edu> wrote in message <muy7i43flkv.fsf@G99-Boettcher.llan.ll.mit.edu>...
> "Luigi Giaccari" <giaccariluigi@msn.com> writes:
>
> > What the fastest way to solve a 3x3 linear system of the form Ax=b?
> >
> > A is not simmetric and sometimes ill-conditioned. In matlab of course
> > the \ command can be used. But i would like to write a C-mex routine
> > optimized for small sistems, because I have a lot of small systems and
> > solving them sequenzialy in a loop becomes quite slow.
> >
> >
> >
> > Ps I have tried the LU factorization from numerical recipies. Turns
> > out that a brute Cramer method is from 3 to 4 times faster. Maybe the
> > LU is designed for big systems and has too much overhead time for
> > small ones.
>
> See NDFUN, a mex file at http://www.mit.edu/~pwb/matlab/
>
> It might do exactly what you need out-of-box. If not, it will give you
> some ideas toward calling the LAPACK routines built into MATLAB that
> will do the compute for you.
>
> -Peter


Thank you very much I will contact you in the next days, I am actually now going out of town and I will not have internet access.

Subject: Linear system

From: Matt

Date: 6 Feb, 2009 19:23:01

Message: 6 of 13

"Luigi Giaccari" <giaccariluigi@msn.com> wrote in message <gmh2mt$2hj$1@fred.mathworks.com>...

> But i would like to write a C-mex routine optimized for small sistems,
> because I have a lot of small systems and solving them sequenzialy in a loop becomes quite slow.

There are ways you could do this in MATLAB without a loop.

In particular, you could load your sequence of A matrices into a sparse block diagonal Atot and similarly concatenate your b matrix data into one long coumn vector btot.

The problem is then equivalent to the single equation Atot*xtot=btot

Subject: Linear system

From: Luigi Giaccari

Date: 8 Feb, 2009 11:41:01

Message: 7 of 13

"Matt " <xys@whatever.com> wrote in message <gmi2ml$7tr$1@fred.mathworks.com>...
> "Luigi Giaccari" <giaccariluigi@msn.com> wrote in message <gmh2mt$2hj$1@fred.mathworks.com>...
>
> > But i would like to write a C-mex routine optimized for small sistems,
> > because I have a lot of small systems and solving them sequenzialy in a loop becomes quite slow.
>
> There are ways you could do this in MATLAB without a loop.
>
> In particular, you could load your sequence of A matrices into a sparse block diagonal Atot and similarly concatenate your b matrix data into one long coumn vector btot.
>
> The problem is then equivalent to the single equation Atot*xtot=btot

I thought about that but i don't thing that is the solution, a loop a is still nedeed to assemble the system. This can be vectorized with the sparse command but there is another problem.
 Given two same size non sparse matrix, non sparse solver are generally faster than sparse ones.

In conclusion I think this not the fastest way.

Subject: Linear system

From: Matt

Date: 8 Feb, 2009 14:05:04

Message: 8 of 13

"Luigi Giaccari" <giaccariluigi@msn.com> wrote in message <gmmgcd$gja$1@fred.mathworks.com>...
>
>but there is another problem.
> Given two same size non sparse matrix, non sparse solver are generally faster than sparse ones.
--------------------------------------
 
Well, I don't know why, but it's not the case in MATLAB, apparently. See the following test

>> A=rand(3);Asp=sparse(A); b=rand(3,1);

>> tic; for ii=1:100000; A\b; end;toc
Elapsed time is 1.492591 seconds.

>> tic; for ii=1:100000; Asp\b; end;toc
Elapsed time is 0.672042 seconds.

Subject: Linear system

From: Luigi Giaccari

Date: 8 Feb, 2009 15:31:02

Message: 9 of 13

"Matt " <xys@whatever.com> wrote in message <gmmoqg$fub$1@fred.mathworks.com>...
> "Luigi Giaccari" <giaccariluigi@msn.com> wrote in message <gmmgcd$gja$1@fred.mathworks.com>...
> >
> >but there is another problem.
> > Given two same size non sparse matrix, non sparse solver are generally faster than sparse ones.
> --------------------------------------
>
> Well, I don't know why, but it's not the case in MATLAB, apparently. See the following test
>
> >> A=rand(3);Asp=sparse(A); b=rand(3,1);
>
> >> tic; for ii=1:100000; A\b; end;toc
> Elapsed time is 1.492591 seconds.
>
> >> tic; for ii=1:100000; Asp\b; end;toc
> Elapsed time is 0.672042 seconds.

Nice post!!!!
This very strange, even stranger is that it works for big systems like 100x100

A=rand(100);Asp=sparse(A); b=rand(100,1);
tic; for ii=1:1000; A\b; end;toc
tic; for ii=1:1000; Asp\b; end;toc
Elapsed time is 0.671274 seconds.
Elapsed time is 0.383006 seconds.

and 1000x1000

A=rand(1000);Asp=sparse(A); b=rand(1000,1);
tic; for ii=1:10; A\b; end;toc
tic; for ii=1:10; Asp\b; end;toc
Elapsed time is 1.463496 seconds.
Elapsed time is 1.411097 seconds.

At this point the question is what is the non sparse solver for? Since the sparce one is always faster. This could a new post message.

Subject: Linear system

From: Tim Davis

Date: 9 Feb, 2009 15:09:02

Message: 10 of 13

If you want to see what the sparse A\b solver is doing, just do

spparms('spumoni',2)

WIth A=sparse(rand(100)), the A\b solver sees that the matrix is "banded" with a bandwidth that takes up the whole matrix. It converts the matrix to banded form and uses the LAPACK band solver. full(A)\b does something a little different, using a different LAPACK solver (the general one, I forget the name).

So you're just comparing one LAPACK routine with another one, not comparing sparse vs full solvers.

Subject: Linear system

From: Luigi Giaccari

Date: 9 Feb, 2009 16:19:02

Message: 11 of 13

"Tim Davis" <davis@cise.ufl.edu> wrote in message <gmpgue$8ev$1@fred.mathworks.com>...
> If you want to see what the sparse A\b solver is doing, just do
>
> spparms('spumoni',2)
>
> WIth A=sparse(rand(100)), the A\b solver sees that the matrix is "banded" with a bandwidth that takes up the whole matrix. It converts the matrix to banded form and uses the LAPACK band solver. full(A)\b does something a little different, using a different LAPACK solver (the general one, I forget the name).
>
> So you're just comparing one LAPACK routine with another one, not comparing sparse vs full solvers.


Wow , I didn't imagine so many things behind the \ command, Thank you for your explanation.
 But actually we are going far away from the original post purpose that is:

Where to find the code that solve in fastest possibile way a 3x3 system?


I have heard about lapack routines, but they were too many and I simply got lost in them without finding the one that fits my problem. Moreover they are fortran coded and I am not confortable with fortran. Better with C++.


my purpose is to code a mex routine that solve a lots of small system in the minor possible time.

Subject: Linear system

From: Matt

Date: 11 Feb, 2009 02:47:01

Message: 12 of 13

"Luigi Giaccari" <giaccariluigi@msn.com> wrote in message <gmpl1m$em1$1@fred.mathworks.com>...
>
> But actually we are going far away from the original post purpose that is:
>
> Where to find the code that solve in fastest possibile way a 3x3 system?

You'll never be able to prove to yourself that you've found the "fastest possible way" and that there isn't something better out there. The best you can do is state a performance target and let us submit possibilities that might achieve it.

It might therefore be good for you to mention how many 3x3 systems you need to solve and within what time. Without that it's hard to see why non-MEX MATLAB tools are not going to be adequate.

In particular, the following is roughly like solving 10^6 such 3x3 systems in under 2 seconds. Still not good enough?

>> n=3e6; e=ones(n,1);A = spdiags(rand(n,5), -1:1, n, n);
>> tic; x=A\e; toc
Elapsed time is 1.449132 seconds.

Subject: Linear system

From: Luigi Giaccari

Date: 11 Feb, 2009 07:58:02

Message: 13 of 13

"Matt " <xys@whatever.com> wrote in message <gmte75$8cj$1@fred.mathworks.com>...
> "Luigi Giaccari" <giaccariluigi@msn.com> wrote in message <gmpl1m$em1$1@fred.mathworks.com>...
> >
> > But actually we are going far away from the original post purpose that is:
> >
> > Where to find the code that solve in fastest possibile way a 3x3 system?
>
> You'll never be able to prove to yourself that you've found the "fastest possible way" and that there isn't something better out there. The best you can do is state a performance target and let us submit possibilities that might achieve it.
>
> It might therefore be good for you to mention how many 3x3 systems you need to solve and within what time. Without that it's hard to see why non-MEX MATLAB tools are not going to be adequate.
>
> In particular, the following is roughly like solving 10^6 such 3x3 systems in under 2 seconds. Still not good enough?
>
> >> n=3e6; e=ones(n,1);A = spdiags(rand(n,5), -1:1, n, n);
> >> tic; x=A\e; toc
> Elapsed time is 1.449132 seconds.

ANSWER

Thank you for reply.
Actually you did not solve 3x3 systems but just a tridiagonal matrix, I used to process this way, maybe a little rough but it works:

clc
clearvars
close all

nsyst=1e6;%number of systems
n=nsyst*3;%size of the sparse matrix


elements=rand(3,n);%elemts of nsyst 3x3 systems
b=rand(n,1);%known vector

tic

%get the index of the elements
ind1=1:3:n;%one plus 3 at each system
ind2=2:3:n;%two plus 3 at each system
ind3=3:3:n;%trhee plus 3 at each system


%Get the coordinates of the elements
index=zeros(nsyst*9,2);%can not be converted to integer type to save memory since
                       %the sparse command requires doubles

index(1:9:end,1)=ind1;%(1,1)
index(1:9:end,2)=ind1;

index(2:9:end,1)=ind1;%(1,2)
index(2:9:end,2)=ind2;%(1,2)


index(3:9:end,1)=ind1;%(1,3)
index(3:9:end,2)=ind3;

index(4:9:end,1)=ind2;%(2,1)
index(4:9:end,2)=ind1;


index(5:9:end,1)=ind2;%(2,2)
index(5:9:end,2)=ind2;

index(6:9:end,1)=ind2;%(2,3)
index(6:9:end,2)=ind3;


index(7:9:end,1)=ind3;%(3,1)
index(7:9:end,2)=ind1;

index(8:9:end,1)=ind3;%(3,2)
index(8:9:end,2)=ind2;


index(9:9:end,1)=ind3;%(3,3)
index(9:9:end,2)=ind3;


A=sparse(index(:,1),index(:,2),elements(:),n,n);%assembling the system
clear index
% spy(A)%to check if well assembled

x=A\b;%solve the system

toc


Elapsed time is 4.019588 seconds.

So it handles a million systems in four seconds, this not bad at all but it can be much better.

I have a C solver based on cramer's rule that can handle the same size input in less than 0.3 seconds. So more than 10 times faster. If I have time in future I wil put it on Matlab central.

 I thought with an optimized LU or Gauss it could be even faster but this is not so easy. They requires if statements and zero values check: loss of time. But I want to say that I never wrote down a code about linear systems so, I thought maybe a professional can do much better than me.

Morover my problem is find circumcenters of tetraedrons, they can even be of the 1e7 order. Since the program purpose is speed, every seconds counts.
Also memory error is a big problem in the Matlab version but this I am not sure can be solved with mex implementation.Can it be?

In conclusion I thing if an improvement can be done, it can be done with an LU or Gauss specifically designed for 3x3 input. In the next days I will study the algorithms and try to find my own solution.

Thank you again

Tags for this Thread

Everyone's Tags:

Add a New Tag:

Separated by commas
Ex.: root locus, bode

What are tags?

A tag is like a keyword or category label associated with each thread. Tags make it easier for you to find threads of interest.

Anyone can tag a thread. Tags are public and visible to everyone.

Tag Activity for This Thread
Tag Applied By Date/Time
sparse Matt J 8 Feb, 2009 11:19:27
algebra Luigi Giaccari 6 Feb, 2009 05:20:03
linear Luigi Giaccari 6 Feb, 2009 05:20:03
rssFeed for this Thread

Contact us at files@mathworks.com