Sparse Matrix Operations

Efficiency of Operations

Computational Complexity

The computational complexity of sparse operations is proportional to nnz, the number of nonzero elements in the matrix. Computational complexity also depends linearly on the row size m and column size n of the matrix, but is independent of the product m*n, the total number of zero and nonzero elements.

The complexity of fairly complicated operations, such as the solution of sparse linear equations, involves factors like ordering and fill-in, which are discussed in the previous section. In general, however, the computer time required for a sparse matrix operation is proportional to the number of arithmetic operations on nonzero quantities.

Algorithmic Details

Sparse matrices propagate through computations according to these rules:

Permutations and Reordering

A permutation of the rows and columns of a sparse matrix S can be represented in two ways:

For example, the statements

p = [1 3 4 2 5]
I = eye(5,5);
P = I(p,:);
e = ones(4,1);
S = diag(11:11:55) + diag(e,1) + diag(e,-1)

produce:

p =

     1     3     4     2     5

P =

     1     0     0     0     0
     0     0     1     0     0
     0     0     0     1     0
     0     1     0     0     0
     0     0     0     0     1

S =

    11     1     0     0     0
     1    22     1     0     0
     0     1    33     1     0
     0     0     1    44     1
     0     0     0     1    55

You can now try some permutations using the permutation vector p and the permutation matrix P. For example, the statements S(p,:) and P*S produce

ans =

    11     1     0     0     0
     0     1    33     1     0
     0     0     1    44     1
     1    22     1     0     0
     0     0     0     1    55

Similarly, S(:,p) and S*P' produce

ans =

    11     0     0     1     0
     1     1     0    22     0
     0    33     1     1     0
     0     1    44     0     1
     0     0     1     0    55

If P is a sparse matrix, then both representations use storage proportional to n and you can apply either to S in time proportional to nnz(S). The vector representation is slightly more compact and efficient, so the various sparse matrix permutation routines all return full row vectors with the exception of the pivoting permutation in LU (triangular) factorization, which returns a matrix compatible with the full LU factorization.

To convert between the two representations, let I = speye(n) be an identity matrix of the appropriate size. Then,

P = I(p,:)
P' = I(:,p)
p = (1:n)*P'
p = (P*(1:n)')'

The inverse of P is simply R = P'. You can compute the inverse of p with r(p) = 1:n.

r(p) = 1:5

r =

     1     4     2     3     5

Reordering for Sparsity

Reordering the columns of a matrix can often make its LU or QR factors sparser. Reordering the rows and columns can often make its Cholesky factors sparser. The simplest such reordering is to sort the columns by nonzero count. This is sometimes a good reordering for matrices with very irregular structures, especially if there is great variation in the nonzero counts of rows or columns.

The function p = colperm(S) computes this column-count permutation. The colperm M-file has only a single line.

[ignore,p] = sort(sum(spones(S)));

This line performs these steps:

  1. The inner call to spones creates a sparse matrix with ones at the location of every nonzero element in S.

  2. The sum function sums down the columns of the matrix, producing a vector that contains the count of nonzeros in each column.

  3. full converts this vector to full storage format.

  4. sort sorts the values in ascending order. The second output argument from sort is the permutation that sorts this vector.

Reordering to Reduce Bandwidth

The reverse Cuthill-McKee ordering is intended to reduce the profile or bandwidth of the matrix. It is not guaranteed to find the smallest possible bandwidth, but it usually does. The function symrcm(A) actually operates on the nonzero structure of the symmetric matrix A + A', but the result is also useful for asymmetric matrices. This ordering is useful for matrices that come from one-dimensional problems or problems that are in some sense "long and thin."

Approximate Minimum Degree Ordering

The degree of a node in a graph is the number of connections to that node. This is the same as the number of off-diagonal nonzero elements in the corresponding row of the adjacency matrix. The approximate minimum degree algorithm generates an ordering based on how these degrees are altered during Gaussian elimination or Cholesky factorization. It is a complicated and powerful algorithm that usually leads to sparser factors than most other orderings, including column count and reverse Cuthill-McKee. Because the keeping track of the degree of each node is very time-consuming, the approximate minimum degree algorithm uses an approximation to the degree, rather than the exact degree.

The following MATLAB® functions implement the approximate minimum degree algorithm:

See Reordering and Factorization for an example using symamd.

You can change various parameters associated with details of the algorithms using the spparms function.

For details on the algorithms used by colamd and symamd, see [5]. The approximate degree the algorithms use is based on [1].

Factoring Sparse Matrices

LU Factorization

If S is a sparse matrix, the following command returns three sparse matrices L, U, and P such that P*S = L*U.

[L,U,P] = lu(S)

lu obtains the factors by Gaussian elimination with partial pivoting. The permutation matrix P has only n nonzero elements. As with dense matrices, the statement [L,U] = lu(S) returns a permuted unit lower triangular matrix and an upper triangular matrix whose product is S. By itself, lu(S) returns L and U in a single matrix without the pivot information.

The three-output syntax

[L,U,P] = lu(S)

selects P via numerical partial pivoting, but does not pivot to improve sparsity in the LU factors. On the other hand, the four-output syntax

[L,U,P,Q]=lu(S) 

selects P via threshold partial pivoting, and selects P and Q to improve sparsity in the LU factors.

You can control pivoting in sparse matrices using

lu(S,thresh)

where thresh is a pivot threshold in [0,1]. Pivoting occurs when the diagonal entry in a column has magnitude less than thresh times the magnitude of any sub-diagonal entry in that column. thresh = 0 forces diagonal pivoting. thresh = 1 is the default. (The default for thresh is 0.1 for the four-output syntax).

When you call lu with three or less outputs, MATLAB automatically allocates the memory necessary to hold the sparse L and U factors during the factorization. Except for the four-output syntax, MATLAB does not use any symbolic LU prefactorization to determine the memory requirements and set up the data structures in advance.

Reordering and Factorization.   If you obtain a good column permutation p that reduces fill-in, perhaps from symrcm or colamd, then computing lu(S(:,p)) takes less time and storage than computing lu(S). Two permutations are the symmetric reverse Cuthill-McKee ordering and the symmetric approximate minimum degree ordering.

r = symrcm(B);
m = symamd(B);

The three spy plots produced by the lines below show the three adjacency matrices of the Bucky Ball graph with these three different numberings. The local, pentagon-based structure of the original numbering is not evident in the other three.

spy(B)
spy(B(r,r))
spy(B(m,m))

three spy plots  show the adjacency matrices of the Bucky Ball graph with three different numberings

The reverse Cuthill-McKee ordering, r, reduces the bandwidth and concentrates all the nonzero elements near the diagonal. The approximate minimum degree ordering, m, produces a fractal-like structure with large blocks of zeros.

To see the fill-in generated in the LU factorization of the Bucky ball, use speye(n,n), the sparse identity matrix, to insert -3s on the diagonal of B.

B = B - 3*speye(n,n);

Since each row sum is now zero, this new B is actually singular, but it is still instructive to compute its LU factorization. When called with only one output argument, lu returns the two triangular factors, L and U, in a single sparse matrix. The number of nonzeros in that matrix is a measure of the time and storage required to solve linear systems involving B. Here are the nonzero counts for the three permutations being considered.

Original

lu(B)

1022

Reverse Cuthill-McKee

lu(B(r,r))

968

Approximate minimum degree

lu(B(m,m))

636

Even though this is a small example, the results are typical. The original numbering scheme leads to the most fill-in. The fill-in for the reverse Cuthill-McKee ordering is concentrated within the band, but it is almost as extensive as the first two orderings. For the approximate minimum degree ordering, the relatively large blocks of zeros are preserved during the elimination and the amount of fill-in is significantly less than that generated by the other orderings. The spy plots below reflect the characteristics of each reordering.

three spy plots showing the characteristics of each reordering

Cholesky Factorization

If S is a symmetric (or Hermitian), positive definite, sparse matrix, the statement below returns a sparse, upper triangular matrix R so that R'*R = S.

R = chol(S)

chol does not automatically pivot for sparsity, but you can compute approximate minimum degree and profile limiting permutations for use with chol(S(p,p)).

Since the Cholesky algorithm does not use pivoting for sparsity and does not require pivoting for numerical stability, chol does a quick calculation of the amount of memory required and allocates all the memory at the start of the factorization. You can use symbfact, which uses the same algorithm as chol, to calculate how much memory is allocated.

QR Factorization

MATLAB computes the complete QR factorization of a sparse matrix S with

 [Q,R] = qr(S)

but this is usually impractical. The orthogonal matrix Q often fails to have a high proportion of zero elements. A more practical alternative, sometimes known as "the Q-less QR factorization," is available.

With one sparse input argument and one output argument

R = qr(S)

returns just the upper triangular portion of the QR factorization. The matrix R provides a Cholesky factorization for the matrix associated with the normal equations:

R'*R = S'*S

However, the loss of numerical information inherent in the computation of S'*S is avoided.

With two input arguments having the same number of rows, and two output arguments, the statement

[C,R] = qr(S,B)

applies the orthogonal transformations to B, producing C = Q'*B without computing Q.

The Q-less QR factorization allows the solution of sparse least squares problems

minimize || A times x minus b ||

with two steps

[c,R] = qr(A,b)
x = R\c

If A is sparse, but not square, MATLAB uses these steps for the linear equation solving backslash operator:

x = A\b

Or, you can do the factorization yourself and examine R for rank deficiency.

It is also possible to solve a sequence of least squares linear systems with different right-hand sides, b, that are not necessarily known when R = qr(A) is computed. The approach solves the "semi-normal equations"

R'*R*x = A'*b

with

x = R\(R'\(A'*b))

and then employs one step of iterative refinement to reduce round off error:

r = b - A*x
e = R\(R'\(A'*r))
x = x + e

Incomplete Factorizations

The luinc, ilu, and cholinc functions provide approximate, incomplete factorizations, which are useful as preconditioners for sparse iterative methods.

The luinc function produces two different kinds of incomplete LU (ILU) factorizations, one involving a drop tolerance and one involving fill-in level. The ilu function produces three incomplete LU factorizations, the ILU with level 0 fill-in (ILU(0)), the Crout version of ILU (ILUC), and the ILU with threshold and pivoting (ILUTP).

For example:

A = gallery('neumann', 1600) + speye(1600);
nnz(A)
ans =
        7840

nnz(lu(A))
ans =
      126478

shows that A has 7840 nonzeros, and its complete LU factorization has 126478 nonzeros:

nnz(luinc(A,'0'))
ans =
        7840

setup.type = 'nofill';
nnz(ilu(A,setup))
ans =
        7840

nnz(luinc(A,1e-6))
ans =
       51541

setup.type = 'ilutp';
setup.droptol = 1e-6;
nnz(luinc(A,setup))
ans =
       51541

These calculations, with both the luinc and ilu functions, show that with level 0 fill-in it has 7840 nonzeros, and with a drop tolerance of 1e-6 it has 51541 nonzeros. See the luinc and ilu reference pages for more options and details.

The cholinc function provides drop tolerance and level 0 fill-in Cholesky factorizations of symmetric, positive definite sparse matrices. See the cholinc reference page for more information.

Systems of Linear Equations

There are two different classes of methods for solving systems of simultaneous linear equations:

Direct Methods

Direct methods are usually faster and more generally applicable than indirect methods, if there is enough storage available to carry them out. Iterative methods are usually applicable to restricted cases of equations and depend on properties like diagonal dominance or the existence of an underlying differential operator. Direct methods are implemented in the core of the MATLAB software and are made as efficient as possible for general classes of matrices. Iterative methods are usually implemented in MATLAB M-files and can use the direct solution of subproblems or preconditioners.

Using a Different Preordering.   If A is not diagonal, banded, triangular, or a permutation of a triangular matrix, backslash (\) reorders the indices of A to reduce the amount of fill-in—that is, the number of nonzero entries that are added to the sparse factorization matrices. The new ordering, called a preordering, is performed before the factorization of A. In some cases, you might be able to provide a better preordering than the one used by the backslash algorithm.

To use a different preordering, first turn off both of the automatic preorderings that backslash might perform by default, using the function spparms as follows:

spparms('autoamd',0);
spparms('autommd',0);

Now, assuming you have created a permutation vector p that specifies a preordering of the indices of A, apply backslash to the matrix A(:,p), whose columns are the columns of A, permuted according to the vector p.

x = A (:,p) \ b;
x(p) = x;
spparms('autoamd',1);
spparms('autommd',1);

The commands spparms('autoamd',1) and spparms('autommd',1) turns the automatic preordering back on, in case you use A\b later without specifying an appropriate preordering.

Iterative Methods

Nine functions are available that implement iterative methods for sparse systems of simultaneous linear systems.

Functions for Iterative Methods for Sparse Systems

Function

Method

bicg

Biconjugate gradient

bicgstab

Biconjugate gradient stabilized

cgs

Conjugate gradient squared

gmres

Generalized minimum residual

lsqr

Least squares

minres

Minimum residual

pcg

Preconditioned conjugate gradient

qmr

Quasiminimal residual

symmlq

Symmetric LQ

These methods are designed to solve Ax = b or minimize the norm of bAx. For the Preconditioned Conjugate Gradient method, pcg, A must be a symmetric, positive definite matrix. minres and symmlq can be used on symmetric indefinite matrices. For lsqr, the matrix need not be square. The other five can handle nonsymmetric, square matrices.

All nine methods can make use of preconditioners. The linear system

A times x = b

is replaced by the equivalent system

M to the minus 1 power times A times x = M to the minus 1 power times b

The preconditioner M is chosen to accelerate convergence of the iterative method. In many cases, the preconditioners occur naturally in the mathematical model. A partial differential equation with variable coefficients can be approximated by one with constant coefficients, for example. Incomplete matrix factorizations can be used in the absence of natural preconditioners.

The five-point finite difference approximation to Laplace's equation on a square, two-dimensional domain provides an example. The following statements use the preconditioned conjugate gradient method preconditioner M = R'*R, where R is the incomplete Cholesky factor of A.

A = delsq(numgrid('S',50));
b = ones(size(A,1),1);
tol = 1.e-3;
maxit = 10;
R = cholinc(A,tol);
[x,flag,err,iter,res] = pcg(A,b,tol,maxit,R',R);

Only four iterations are required to achieve the prescribed accuracy.

Background information on these iterative methods and incomplete factorizations is available in [2] and [7].

Eigenvalues and Singular Values

Two functions are available that compute a few specified eigenvalues or singular values. svds is based on eigs that uses ARPACK [6].

Functions to Compute a Few Eigenvalues or Singular Values

Function

Description

eigs

Few eigenvalues

svds

Few singular values

These functions are most frequently used with sparse matrices, but they can be used with full matrices or even with linear operators defined by M-files.

The statement

[V,lambda] = eigs(A,k,sigma)

finds the k eigenvalues and corresponding eigenvectors of the matrix A that are nearest the "shift" sigma. If sigma is omitted, the eigenvalues largest in magnitude are found. If sigma is zero, the eigenvalues smallest in magnitude are found. A second matrix, B, can be included for the generalized eigenvalue problem:

A times upsilon = lambda times B times upsilon

The statement

[U,S,V] = svds(A,k)

finds the k largest singular values of A and

[U,S,V] = svds(A,k,0)

finds the k smallest singular values.

For example, the statements

L = numgrid('L',65);
A = delsq(L);

set up the five-point Laplacian difference operator on a 65-by-65 grid in an L-shaped, two-dimensional domain. The statements

size(A)
nnz(A)

show that A is a matrix of order 2945 with 14,473 nonzero elements.

The statement

[v,d] = eigs(A,1,0);

computes the smallest eigenvalue and eigenvector. Finally,

L(L>0) = full(v(L(L>0)));
x = -1:1/32:1;
contour(x,x,L,15)
axis square

distributes the components of the eigenvector over the appropriate grid points and produces a contour plot of the result.

contour plot of the result

The numerical techniques used in eigs and svds are described in [6].

Performance Limitations

This section describes some limitations of the sparse matrix storage format and their impact on matrix creation, manipulation, and operations.

Creating Sparse Matrices

The best way to create a sparse matrix is to use the sparse function. If you do not have prior knowledge of the nonzero indices or their values, it is much more efficient to create the vectors containing these values, and then create the sparse matrix.

Preallocating the memory for a sparse matrix and filling it in an elementwise manner causes a significant amount of overhead in indexing into the sparse array:

S1 = spalloc(1000,1000,100000);
tic;
for n = 1:100000
    i = ceil(1000*rand(1,1));
    j = ceil(1000*rand(1,1));
    S1(i,j) = rand(1,1);
end
toc

Elapsed time is 26.281000 seconds.

Whereas constructing the vectors of indices and values eliminates the need to index into the sparse array, and thus is significantly faster:

i = ceil(1000*rand(100000,1));
j = ceil(1000*rand(100000,1));
v = zeros(size(i));
for n = 1:100000
    v(n) = rand(1,1);
end

tic;
S2 = sparse(i,j,v,1000,1000);
toc

Elapsed time is 0.078000 seconds.

Manipulating Sparse Matrices

Because sparse matrices are stored in a column-major format, accessing the matrix by columns is more efficient than by rows. Compare the time required for adding rows of a matrix 1000 times

S = sparse(10000,10000,1);
tic; 
for n = 1:1000
   A = S(100,:) + S(200,:);
end; 
toc

Elapsed time is 1.208162 seconds.

versus the time required for adding columns

S = sparse(10000,10000,1);
tic;
for n = 1:1000
   B = S(:,100) + S(:,200);
end;
toc

Elapsed time is 0.088747 seconds.

When possible, you can transpose the matrix, perform operations on the columns, and then retranspose the result:

S = sparse(10000,10000,1);
tic; 
for n = 1:1000
   A = S(100,:)' + S(200,:)';
   A = A';
end; 
toc

Elapsed time is 0.597142 seconds.

The time required to transpose the matrix is negligible. Note that the sparse matrix memory requirements could prevent you from transposing a sparse matrix having a large number of rows. This might occur even when the number of nonzero values is small.

Using linear indexing to access or assign an element in a large sparse matrix will fail if the linear index exceeds intmax. To access an element whose linear index is greater than intmax, use array indexing:

S = spalloc(216^2, 216^2, 2)
S(1) = 1
S(end) = 1
S(216^2,216^2) = 1
  


 © 1984-2008- The MathWorks, Inc.    -   Site Help   -   Patents   -   Trademarks   -   Privacy Policy   -   Preventing Piracy   -   RSS