Discover MakerZone

MATLAB and Simulink resources for Arduino, LEGO, and Raspberry Pi

Learn more

Discover what MATLAB® can do for your career.

Opportunities for recent engineering grads.

Apply Today

Thread Subject:
Sparsity pattern problems solving x = A\b

Subject: Sparsity pattern problems solving x = A\b

From: Thomas Clark

Date: 22 Dec, 2008 15:20:05

Message: 1 of 6

Hi all,

I'm solving the Poisson equation (with simple boundary conditions) in both 2D and 3D, for relatively small domains. This boils down to constructing a highly sparse coefficients matrix, then I use the backslash operator to solve.

2D and 3D problems produce very similar sparsity patterns (tridiagonal with fringes). I've posted jpg images of these at:
www2.eng.cam.ac.uk/~thc29/sparsity_2d_4x4.jpg
and
www2.eng.cam.ac.uk/~thc29/sparsity_3d_3x3x3.jpg

With the 2D problem, I can solve a very large domain - the (square) coefficients matrix can have up to 14.3m nonzeros in 2.8m rows before I get an out-of-memory error from mldivide, which is fine.

However, mldivide with the 3D sparsity pattern requires a great deal more memory - with my max capacity I can only solve a coefficients matrix with 1.2m nonzeros in 0.19m rows.

Can anyone help me to understand why the slight change in sparsity pattern makes so much difference please?

If anyone can suggest a different method to backslash, which uses less memory during the solution, I'd be very grateful... I really don't want to have to implement a grid based method (trying to keep my code as simple as possible).

My codes (for 2D and 3D solutions) are at:
 www2.eng.cam.ac.uk/~thc29/poisson_2d.m
and
www2.eng.cam.ac.uk/~thc29/poisson_3d.m

Scripts to create artificial inputs and run the solvers* are at:
 www2.eng.cam.ac.uk/~thc29/test_poisson2d.m
and
www2.eng.cam.ac.uk/~thc29/test_poisson3d.m

* NB I have 4Gb of RAM in my machine... you may need to reduce the domain sizes (m,n and p in the test scripts) to run without getting out-of-memory errors.

Thank you in advance (and merry Christmas!)

Tom Clark

Subject: Sparsity pattern problems solving x = A\b

From: Pat Quillen

Date: 23 Dec, 2008 01:40:05

Message: 2 of 6

"Thomas Clark" <t.clark@remove.spamcantab.net> wrote in message <giob75$eei$1@fred.mathworks.com>...
<SNIP---for brevity>
> Can anyone help me to understand why the slight change in sparsity pattern makes so much difference please?
>
> If anyone can suggest a different method to backslash, which uses less memory during the solution, I'd be very grateful... I really don't want to have to implement a grid based method (trying to keep my code as simple as possible).
<SNIP---END>

Hi Tom.

For starters, it appears that your matrix is not s.p.d. Instead, it looks to be negative definite. So, if you try to solve Ax = b via backslash, MATLAB will not be using the Cholesky factorization as you might hope, but instead, it will use a symmetric indefinite solver which is just a shade slower. If instead, you try to solve -Ax = -b, then you can use the MATLAB function pcg with -A, as that matrix is s.p.d. pcg will use *much* less memory than backslash and you can get just as much accuracy as you like. Try this:

% Get coefficient matrix and rhs from poisson3d. Call them A, b respectively.
% No need to copy and negate A; use anonymous function to negate A*x.
>> f = @(x) -(A*x);
>> tic; x = pcg(f,-b,1e-11,1000); toc % See help pcg for more info.
pcg converged at iteration 265 to a solution with relative residual 9.8e-12
Elapsed time is 10.863882 seconds.
>> norm(A*x-b)

ans =

   2.5098e-09

>> tic; y = (-A)\(-b); toc % Compare to backslash...
 Elapsed time is 63.965740 seconds.

This is exactly the matrix which pcg loves... If you use a decent preconditioner, you can get the solution more quickly, but you have to construct the preconditioner. Under normal circumstances, I'd recommend the zero-fill incomplete Cholesky decomposition, but the current one in MATLAB is not as optimal as it could be. Instead, I point you to the ilu function (incomplete LU factorization). You can use this instead. This is sort of cheating because in theory, one should only use s.p.d. preconditioners with pcg, but in practice pcg tends to not notice.

>> tic; [L,U] = ilu(A,struct('type','nofill')); toc
Elapsed time is 0.059273 seconds.
>> % Since we're going to apply pcg to -A, need -L (or -U, but not both).
>> tic; x = pcg(f,-b,1e-11,1000,-L,U); toc
pcg converged at iteration 94 to a solution with relative residual 8.5e-12
Elapsed time is 6.227631 seconds.
>> % Notice, with this preconditioner, pcg took about 1/3 the iterations and
>> % about 2/3 the time. Try the modified ilu:
>> tic; [L,U] = ilu(A,struct('type','nofill','milu','row')); toc
Elapsed time is 0.194986 seconds.
>> tic; x = pcg(f,-b,1e-11,1000,-L, U); toc
pcg converged at iteration 64 to a solution with relative residual 7.7e-12
Elapsed time is 4.116877 seconds.

The MILU is a bit more expensive to cook up, but it pays off as pcg with the MILU preconditioner uses a lot fewer iterations and about 40% of the time. Of course, this is about 1/16 the time of \, but with slightly less accuracy. Try asking pcg for a smaller residual norm:
>> tic; x = pcg(f,-b,1e-13,1000,-L, U); toc
pcg stopped at iteration 91 without converging to the desired tolerance 1e-13
because the method stagnated.
The iterate returned (number 77) has relative residual 2.2e-13
Elapsed time is 5.829291 seconds.
>> norm(A*x - b)

ans =

   5.5067e-11

>> norm(A*y - b) % y = (-A)\(-b);

ans =

   4.2224e-11

These two are equivalently good in terms of residual norm, and x was much cheaper in terms of time and memory to come by. The bit about stagnation basically tells us that this is as good as pcg can do. If you would like to see the residual norm histories, try this:
>> [x,fl,rr,it,rv] = pcg(f,-b,1e-13,1000,-L, U);
>> semilogy(rv);

Now, to actually attempt an answer to your question, I believe that AMD, the default method for reordering sparse matrices within MATLAB doesn't give as high quality orderings for matrices arising from 3d geometries as it does for those arising from 2d geometries. Essentially the change in sparsity pattern caused much more fill to occur in the factors of your matrix. Tim Davis is the guy to really answer your question, and I'm sure he'll be along soon. Also, part of the issue is that the symmetric indefinite solver uses more memory than the Cholesky solver. Solving -Ax = -b required about 400MB less on my machine:

>> tic; A\b; toc
Elapsed time is 118.416031 seconds.
>> % Approximate peak memory usage ~2.05 GB
>> tic; (-A)\(-b); toc
Elapsed time is 64.046205 seconds.
>> % Approximate peak memory usage ~1.65 GB

Note that all of these timings are from MATLAB 7.7 (R2008b) on a 64-bit Linux machine.

Take care.
Pat.

Subject: Sparsity pattern problems solving x = A\b

From: Thomas Clark

Date: 23 Dec, 2008 10:51:02

Message: 3 of 6

Pat,

That really is something very special indeed! I had anticipated a decrease in speed in exchange for a memory reduction. Instead, you've given me a huge factor of improvement in speed; (58^3 domain compute time down to ~6s from ~170s on my machine) and substantially reduced the memory requirement.

Problem solved!

Although, of course, a new problem has arisen - my glaring lack of knowledge about this sort of thing. I'll keep an eye on the newsreader, and start learning about the topic from other people's posts.

Tim, if you do come across this, I'd appreciate your comments on the problem.

Thanks so much for your time, Pat, I hope you have a good Christmas (if Christmas is your thing!)

Kind Regards

Tom Clark

Subject: Sparsity pattern problems solving x = A\b

From: Tim Davis

Date: 24 Dec, 2008 14:17:04

Message: 4 of 6

A lengthy note with specific issues, so I'm in-posting, below:

"Thomas Clark" <t.clark@remove.spamcantab.net> wrote in message <giob75$eei$1@fred.mathworks.com>...
> Hi all,
>
> I'm solving the Poisson equation (with simple boundary conditions) in both 2D and 3D, for relatively small domains. This boils down to constructing a highly sparse coefficients matrix, then I use the backslash operator to solve.
>
> 2D and 3D problems produce very similar sparsity patterns (tridiagonal with fringes). I've
posted jpg images of these at:

No, the pictures may look similar but the structure is vastly different.
3D meshes are a LOT harder.

> www2.eng.cam.ac.uk/~thc29/sparsity_2d_4x4.jpg
> and
> www2.eng.cam.ac.uk/~thc29/sparsity_3d_3x3x3.jpg
>
> With the 2D problem, I can solve a very large domain - the (square) coefficients matrix can have up to 14.3m nonzeros in 2.8m rows before I get an out-of-memory error from mldivide, which is fine.

From my book, on the nested dissection ordering:

If applied to a 2D s-by-s mesh with node separators selected along a mesh
line that most evenly divides the graph, nested dissection leads to an asymptotically
optimal ordering, with 31(n log2 n)/8 + O(n) nonzeros in L, and requiring
829(n^(3/2))/84+O(n log n) floating-point operations to compute, where n = s^2 is the
dimension of the matrix. For a 3D s-by-s-by-s mesh the dimension of the matrix is
n = s^3. There are O(n^(4/3)) nonzeros in L, and O(n^2) floating-point operations are
required to compute L, which is also asymptotically optimal.
>
> However, mldivide with the 3D sparsity pattern requires a great deal more memory - with my max capacity I can only solve a coefficients matrix with 1.2m nonzeros in 0.19m rows.
>
> Can anyone help me to understand why the slight change in sparsity pattern makes so much difference please?

See above. The patterns are very different.

>
> If anyone can suggest a different method to backslash, which uses less memory during the solution, I'd be very grateful... I really don't want to have to implement a grid based method (trying to keep my code as simple as possible).

You need to use nested dissection, not AMD. AMD is fine for 2D problems, giving about the same results as nested dissection. But it's much worse for large 3D problems. Backslash does not have nested dissection, just AMD.

Download SuiteSparse, and use the cholmod solver. It's what's used in backslash, but it also includes an interface to METIS (you need to download it seperately). You'll find that it gives much better results than x=A\b. MATLAB really needs a good nested dissection ordering. I've got one in the works (for my own research, not for The MathWorks) but it's far from being ready.

>
> My codes (for 2D and 3D solutions) are at:
> www2.eng.cam.ac.uk/~thc29/poisson_2d.m
> and
> www2.eng.cam.ac.uk/~thc29/poisson_3d.m
>
> Scripts to create artificial inputs and run the solvers* are at:
> www2.eng.cam.ac.uk/~thc29/test_poisson2d.m
> and
> www2.eng.cam.ac.uk/~thc29/test_poisson3d.m
>
> * NB I have 4Gb of RAM in my machine... you may need to reduce the domain sizes (m,n and p in the test scripts) to run without getting out-of-memory errors.
>
> Thank you in advance (and merry Christmas!)
>
> Tom Clark

Subject: Sparsity pattern problems solving x = A\b

From: Tim Davis

Date: 24 Dec, 2008 14:27:02

Message: 5 of 6

Thanks for the details, Pat. So if you want to use backslash, x=(-A)\(-b) would be the thing to use. Or x=cholmod(-A,-b) if I remember the syntax (and be sure to compile CHOLMOD with METIS).

I'm travelling just now, so I might not be able to check the matrix, but if I can I'll let you know what CHOLMOD+METIS can do for it.

Comment to Tom: large 3D problems are poster-children for PCG and the like. It would be interesting to see what CHOLMOD+METIS can do, though. I'll give it a crack.
And may I add them to my collection (http://www.cise.ufl.edu/research/sparse)?

"Pat Quillen" <pquillen@mathworks.com> wrote in message <gipfhl$f60$1@fred.mathworks.com>...
> "Thomas Clark" <t.clark@remove.spamcantab.net> wrote in message <giob75$eei$1@fred.mathworks.com>...
> <SNIP---for brevity>
> > Can anyone help me to understand why the slight change in sparsity pattern makes so much difference please?
> >
> > If anyone can suggest a different method to backslash, which uses less memory during the solution, I'd be very grateful... I really don't want to have to implement a grid based method (trying to keep my code as simple as possible).
> <SNIP---END>
>
> Hi Tom.
>
> For starters, it appears that your matrix is not s.p.d. Instead, it looks to be negative definite. So, if you try to solve Ax = b via backslash, MATLAB will not be using the Cholesky factorization as you might hope, but instead, it will use a symmetric indefinite solver which is just a shade slower. If instead, you try to solve -Ax = -b, then you can use the MATLAB function pcg with -A, as that matrix is s.p.d. pcg will use *much* less memory than backslash and you can get just as much accuracy as you like. Try this:
>
> % Get coefficient matrix and rhs from poisson3d. Call them A, b respectively.
> % No need to copy and negate A; use anonymous function to negate A*x.
> >> f = @(x) -(A*x);
> >> tic; x = pcg(f,-b,1e-11,1000); toc % See help pcg for more info.
> pcg converged at iteration 265 to a solution with relative residual 9.8e-12
> Elapsed time is 10.863882 seconds.
> >> norm(A*x-b)
>
> ans =
>
> 2.5098e-09
>
> >> tic; y = (-A)\(-b); toc % Compare to backslash...
> Elapsed time is 63.965740 seconds.
>
> This is exactly the matrix which pcg loves... If you use a decent preconditioner, you can get the solution more quickly, but you have to construct the preconditioner. Under normal circumstances, I'd recommend the zero-fill incomplete Cholesky decomposition, but the current one in MATLAB is not as optimal as it could be. Instead, I point you to the ilu function (incomplete LU factorization). You can use this instead. This is sort of cheating because in theory, one should only use s.p.d. preconditioners with pcg, but in practice pcg tends to not notice.
>
> >> tic; [L,U] = ilu(A,struct('type','nofill')); toc
> Elapsed time is 0.059273 seconds.
> >> % Since we're going to apply pcg to -A, need -L (or -U, but not both).
> >> tic; x = pcg(f,-b,1e-11,1000,-L,U); toc
> pcg converged at iteration 94 to a solution with relative residual 8.5e-12
> Elapsed time is 6.227631 seconds.
> >> % Notice, with this preconditioner, pcg took about 1/3 the iterations and
> >> % about 2/3 the time. Try the modified ilu:
> >> tic; [L,U] = ilu(A,struct('type','nofill','milu','row')); toc
> Elapsed time is 0.194986 seconds.
> >> tic; x = pcg(f,-b,1e-11,1000,-L, U); toc
> pcg converged at iteration 64 to a solution with relative residual 7.7e-12
> Elapsed time is 4.116877 seconds.
>
> The MILU is a bit more expensive to cook up, but it pays off as pcg with the MILU preconditioner uses a lot fewer iterations and about 40% of the time. Of course, this is about 1/16 the time of \, but with slightly less accuracy. Try asking pcg for a smaller residual norm:
> >> tic; x = pcg(f,-b,1e-13,1000,-L, U); toc
> pcg stopped at iteration 91 without converging to the desired tolerance 1e-13
> because the method stagnated.
> The iterate returned (number 77) has relative residual 2.2e-13
> Elapsed time is 5.829291 seconds.
> >> norm(A*x - b)
>
> ans =
>
> 5.5067e-11
>
> >> norm(A*y - b) % y = (-A)\(-b);
>
> ans =
>
> 4.2224e-11
>
> These two are equivalently good in terms of residual norm, and x was much cheaper in terms of time and memory to come by. The bit about stagnation basically tells us that this is as good as pcg can do. If you would like to see the residual norm histories, try this:
> >> [x,fl,rr,it,rv] = pcg(f,-b,1e-13,1000,-L, U);
> >> semilogy(rv);
>
> Now, to actually attempt an answer to your question, I believe that AMD, the default method for reordering sparse matrices within MATLAB doesn't give as high quality orderings for matrices arising from 3d geometries as it does for those arising from 2d geometries. Essentially the change in sparsity pattern caused much more fill to occur in the factors of your matrix. Tim Davis is the guy to really answer your question, and I'm sure he'll be along soon. Also, part of the issue is that the symmetric indefinite solver uses more memory than the Cholesky solver. Solving -Ax = -b required about 400MB less on my machine:

> >> tic; A\b; toc
> Elapsed time is 118.416031 seconds.
> >> % Approximate peak memory usage ~2.05 GB
> >> tic; (-A)\(-b); toc
> Elapsed time is 64.046205 seconds.
> >> % Approximate peak memory usage ~1.65 GB
>
> Note that all of these timings are from MATLAB 7.7 (R2008b) on a 64-bit Linux machine.
>
> Take care.
> Pat.

Subject: Sparsity pattern problems solving x = A\b

From: Thomas Clark

Date: 24 Dec, 2008 15:11:02

Message: 6 of 6

Tim,

Thanks for all your comments - they've really helped me understand the problem a bit better; and I'll give cholmod+metis a try (I already tried suitesparse, but wasn't using the correct solver).

You're very welcome to add the matrices to your sparse collection; they're probably quite useful as reference, being generic poisson solvers.

Kind regards, and safe travelling

Tom Clark


"Tim Davis" <davis@cise.ufl.edu> wrote in message <gitgrl$lq5$1@fred.mathworks.com>...
> Thanks for the details, Pat. So if you want to use backslash, x=(-A)\(-b) would be the thing to use. Or x=cholmod(-A,-b) if I remember the syntax (and be sure to compile CHOLMOD with METIS).
>
> I'm travelling just now, so I might not be able to check the matrix, but if I can I'll let you know what CHOLMOD+METIS can do for it.
>
> Comment to Tom: large 3D problems are poster-children for PCG and the like. It would be interesting to see what CHOLMOD+METIS can do, though. I'll give it a crack.
> And may I add them to my collection (http://www.cise.ufl.edu/research/sparse)?
>
> "Pat Quillen" <pquillen@mathworks.com> wrote in message <gipfhl$f60$1@fred.mathworks.com>...
> > "Thomas Clark" <t.clark@remove.spamcantab.net> wrote in message <giob75$eei$1@fred.mathworks.com>...
> > <SNIP---for brevity>
> > > Can anyone help me to understand why the slight change in sparsity pattern makes so much difference please?
> > >
> > > If anyone can suggest a different method to backslash, which uses less memory during the solution, I'd be very grateful... I really don't want to have to implement a grid based method (trying to keep my code as simple as possible).
> > <SNIP---END>
> >
> > Hi Tom.
> >
> > For starters, it appears that your matrix is not s.p.d. Instead, it looks to be negative definite. So, if you try to solve Ax = b via backslash, MATLAB will not be using the Cholesky factorization as you might hope, but instead, it will use a symmetric indefinite solver which is just a shade slower. If instead, you try to solve -Ax = -b, then you can use the MATLAB function pcg with -A, as that matrix is s.p.d. pcg will use *much* less memory than backslash and you can get just as much accuracy as you like. Try this:
> >
> > % Get coefficient matrix and rhs from poisson3d. Call them A, b respectively.
> > % No need to copy and negate A; use anonymous function to negate A*x.
> > >> f = @(x) -(A*x);
> > >> tic; x = pcg(f,-b,1e-11,1000); toc % See help pcg for more info.
> > pcg converged at iteration 265 to a solution with relative residual 9.8e-12
> > Elapsed time is 10.863882 seconds.
> > >> norm(A*x-b)
> >
> > ans =
> >
> > 2.5098e-09
> >
> > >> tic; y = (-A)\(-b); toc % Compare to backslash...
> > Elapsed time is 63.965740 seconds.
> >
> > This is exactly the matrix which pcg loves... If you use a decent preconditioner, you can get the solution more quickly, but you have to construct the preconditioner. Under normal circumstances, I'd recommend the zero-fill incomplete Cholesky decomposition, but the current one in MATLAB is not as optimal as it could be. Instead, I point you to the ilu function (incomplete LU factorization). You can use this instead. This is sort of cheating because in theory, one should only use s.p.d. preconditioners with pcg, but in practice pcg tends to not notice.
> >
> > >> tic; [L,U] = ilu(A,struct('type','nofill')); toc
> > Elapsed time is 0.059273 seconds.
> > >> % Since we're going to apply pcg to -A, need -L (or -U, but not both).
> > >> tic; x = pcg(f,-b,1e-11,1000,-L,U); toc
> > pcg converged at iteration 94 to a solution with relative residual 8.5e-12
> > Elapsed time is 6.227631 seconds.
> > >> % Notice, with this preconditioner, pcg took about 1/3 the iterations and
> > >> % about 2/3 the time. Try the modified ilu:
> > >> tic; [L,U] = ilu(A,struct('type','nofill','milu','row')); toc
> > Elapsed time is 0.194986 seconds.
> > >> tic; x = pcg(f,-b,1e-11,1000,-L, U); toc
> > pcg converged at iteration 64 to a solution with relative residual 7.7e-12
> > Elapsed time is 4.116877 seconds.
> >
> > The MILU is a bit more expensive to cook up, but it pays off as pcg with the MILU preconditioner uses a lot fewer iterations and about 40% of the time. Of course, this is about 1/16 the time of \, but with slightly less accuracy. Try asking pcg for a smaller residual norm:
> > >> tic; x = pcg(f,-b,1e-13,1000,-L, U); toc
> > pcg stopped at iteration 91 without converging to the desired tolerance 1e-13
> > because the method stagnated.
> > The iterate returned (number 77) has relative residual 2.2e-13
> > Elapsed time is 5.829291 seconds.
> > >> norm(A*x - b)
> >
> > ans =
> >
> > 5.5067e-11
> >
> > >> norm(A*y - b) % y = (-A)\(-b);
> >
> > ans =
> >
> > 4.2224e-11
> >
> > These two are equivalently good in terms of residual norm, and x was much cheaper in terms of time and memory to come by. The bit about stagnation basically tells us that this is as good as pcg can do. If you would like to see the residual norm histories, try this:
> > >> [x,fl,rr,it,rv] = pcg(f,-b,1e-13,1000,-L, U);
> > >> semilogy(rv);
> >
> > Now, to actually attempt an answer to your question, I believe that AMD, the default method for reordering sparse matrices within MATLAB doesn't give as high quality orderings for matrices arising from 3d geometries as it does for those arising from 2d geometries. Essentially the change in sparsity pattern caused much more fill to occur in the factors of your matrix. Tim Davis is the guy to really answer your question, and I'm sure he'll be along soon. Also, part of the issue is that the symmetric indefinite solver uses more memory than the Cholesky solver. Solving -Ax = -b required about 400MB less on my machine:
>
> > >> tic; A\b; toc
> > Elapsed time is 118.416031 seconds.
> > >> % Approximate peak memory usage ~2.05 GB
> > >> tic; (-A)\(-b); toc
> > Elapsed time is 64.046205 seconds.
> > >> % Approximate peak memory usage ~1.65 GB
> >
> > Note that all of these timings are from MATLAB 7.7 (R2008b) on a 64-bit Linux machine.
> >
> > Take care.
> > Pat.

Tags for this Thread

What are tags?

A tag is like a keyword or category label associated with each thread. Tags make it easier for you to find threads of interest.

Anyone can tag a thread. Tags are public and visible to everyone.

Contact us