
Thread Subject:
size(sparse matrix) > size(full matrix)

Subject: size(sparse matrix) > size(full matrix)

From: arun

Date: 17 Sep, 2009 10:15:09

Message: 1 of 17

Hi,
I just discovered that my data set, which is about 1200 * 39000 and
contains only the values 0, 0.5 and 1, has a size of 760 MB when
stored as sparse and about 380 MB when stored as double.
I agree the number of zeros may not be large, which might suggest
that using sparse is not advantageous. However, why should it result
in this *explosion* of size???
If someone could clarify, it would be great.

best, arun.

Subject: size(sparse matrix) > size(full matrix)

From: Rune Allnor

Date: 17 Sep, 2009 10:31:36

Message: 2 of 17

On 17 Sep, 12:15, arun <aragorn1...@gmail.com> wrote:
> Hi,
> I just discovered that my data set which is about 1200 * 39000 and
> contains only values 0, 0.5 and 1, when stored as sparse has size of
> 760 MB and when stored as double has about 380 MB.
> I agree, the number of zeros may not be a lot, which again might
> suggest that using sparse might not be advantageous. However, why
> should it result in this *explosion* of size???

Sparse matrices store the indexes of where each non-zero element
is located along with the element's actual value. So there
is roughly three times as much information stored per element in a
sparse matrix. This means that you need a certain large fraction
of zero elements (the exact numbers depend on implementation
details) before the total memory footprint of the sparse matrix
becomes smaller than the footprint of the same matrix stored
in dense format.

Rune

Subject: size(sparse matrix) > size(full matrix)

From: arun

Date: 17 Sep, 2009 10:49:56

Message: 3 of 17

On Sep 17, 12:31 pm, Rune Allnor <all...@tele.ntnu.no> wrote:
> On 17 Sep, 12:15, arun <aragorn1...@gmail.com> wrote:
>
> > Hi,
> > I just discovered that my data set which is about 1200 * 39000 and
> > contains only values 0, 0.5 and 1, when stored as sparse has size of
> > 760 MB and when stored as double has about 380 MB.
> > I agree, the number of zeros may not be a lot, which again might
> > suggest that using sparse might not be advantageous. However, why
> > should it result in this *explosion* of size???
>
> Sparse matrices store indexes of where a non-zero elemented
> is located along with the actual value of the value. So there
> is three times as much information stored per element in a
> sparse matrix. This means that you need a certain large fraction
> of zero elements (the exact numbers depend on implementation
> details) before the total memory footprint of the sparse matrix
> becomes smaller than the footprint of the same matrix stored
> on dense format.
>
> Rune

Understood, thank you very much.
I changed the data back to full matrices.
The code is even faster now.
best, arun.

Subject: size(sparse matrix) > size(full matrix)

From: Bruno Luong

Date: 17 Sep, 2009 10:53:02

Message: 4 of 17

arun <aragorn168b@gmail.com> wrote in message <8e83f86c-df5e-413d-9943-da36c83a66d2@g1g2000vbr.googlegroups.com>...
> Hi,
> I just discovered that my data set which is about 1200 * 39000 and
> contains only values 0, 0.5 and 1, when stored as sparse has size of
> 760 MB and when stored as double has about 380 MB.
> I agree, the number of zeros may not be a lot, which again might
> suggest that using sparse might not be advantageous. However, why
> should it result in this *explosion* of size???
> if someone could clarify, it would be great.
>
> best, arun.

Roughly speaking, a *non-zero* sparse element needs
- double, 32-bit platform: 12 bytes
- double, 64-bit platform: 16 bytes
- logical, 32-bit platform: 5 bytes
- logical, 64-bit platform: 9 bytes

A full matrix needs per element
- double, 8 bytes (both platforms)
- logical, 1 byte (both platforms)

Do the math.

Bruno
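Bruno's arithmetic can be checked directly. This is an illustration in Python/SciPy rather than MATLAB (SciPy's CSC format mirrors MATLAB's sparse layout); the matrix here is a scaled-down stand-in for the poster's data set, with the assumption that essentially every element is non-zero:

```python
import numpy as np
from scipy import sparse

# Scaled-down stand-in for the 1200 x 39000 data set (assumption:
# every element is 0.5, i.e. essentially nothing is zero).
rows, cols = 120, 3900
A = np.full((rows, cols), 0.5)
S = sparse.csc_matrix(A)

dense_bytes = A.nbytes  # 8 bytes per double
# CSC keeps, per non-zero, an 8-byte value plus a row index,
# plus one column-pointer array of length cols + 1.
sparse_bytes = S.data.nbytes + S.indices.nbytes + S.indptr.nbytes

# With almost no zeros, the sparse copy is strictly larger.
# (SciPy's 4-byte indices give ~1.5x here; MATLAB's 8-byte indices on
# a 64-bit platform give the ~2x blow-up reported in the thread.)
print(dense_bytes, sparse_bytes)
```

The exact ratio depends on the index width of the implementation, but the structure of the overhead is the same: value plus index per stored element, plus a column-pointer array.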

Subject: size(sparse matrix) > size(full matrix)

From: Rune Allnor

Date: 17 Sep, 2009 10:55:00

Message: 5 of 17

On 17 Sep, 12:49, arun <aragorn1...@gmail.com> wrote:
> On Sep 17, 12:31 pm, Rune Allnor <all...@tele.ntnu.no> wrote:
> > On 17 Sep, 12:15, arun <aragorn1...@gmail.com> wrote:
>
> > > Hi,
> > > I just discovered that my data set which is about 1200 * 39000 and
> > > contains only values 0, 0.5 and 1, when stored as sparse has size of
> > > 760 MB and when stored as double has about 380 MB.
> > > I agree, the number of zeros may not be a lot, which again might
> > > suggest that using sparse might not be advantageous. However, why
> > > should it result in this *explosion* of size???
>
> > Sparse matrices store indexes of where a non-zero elemented
> > is located along with the actual value of the value. So there
> > is three times as much information stored per element in a
> > sparse matrix. This means that you need a certain large fraction
> > of zero elements (the exact numbers depend on implementation
> > details) before the total memory footprint of the sparse matrix
> > becomes smaller than the footprint of the same matrix stored
> > on dense format.
>
> > Rune
>
> Understood. thank you very much.
> I changed the data back to full matrices.
> The code is even more faster now.

Sparse matrices are not used to gain speed, but to save
space. The individual steps needed to access any element
of a sparse matrix might take several times as long as
accessing elements of a dense matrix. However, for
large systems the sparse format might be necessary to be
able to get the job done at all.

Rune

Subject: size(sparse matrix) > size(full matrix)

From: Bruno Luong

Date: 17 Sep, 2009 11:10:03

Message: 6 of 17

Rune Allnor <allnor@tele.ntnu.no> wrote in message <a56978d9-99f2-4ff6-8073-a77ab61bb6d6@o21g2000vbl.googlegroups.com>...

>
> Sparse matrices are not used to gain speed, but to save
> space.

Wrong, many linear algebra algorithms are designed to gain speed with sparse matrices, since a matrix-vector product is much faster than for a full matrix. That leads to the class of iterative methods (conjugate gradient, GMRES, Lanczos, etc.) that is suitable when working with sparse matrices. The full matrix is not even competitive, even for speed.

Bruno
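Bruno's point can be sketched outside MATLAB as well (a Python/SciPy illustration; the tridiagonal test matrix is invented for the example): an iterative method such as conjugate gradient touches the matrix only through matrix-vector products, each costing O(nnz) rather than O(n^2).

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import cg

# A sparse symmetric positive definite system: a shifted 1-D Laplacian.
n = 2000
main = 2.1 * np.ones(n)
off = -1.0 * np.ones(n - 1)
A = sparse.diags([off, main, off], [-1, 0, 1], format='csr')
b = np.ones(n)

# CG only ever computes A @ v, which costs O(nnz) per iteration --
# this is where sparsity buys speed, not just memory.
x, info = cg(A, b)

print(info, np.linalg.norm(A @ x - b))
```

For a matrix like this, forming the dense n-by-n array would waste both the memory and every multiplication by its zeros.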

Subject: size(sparse matrix) > size(full matrix)

From: arun

Date: 17 Sep, 2009 11:17:21

Message: 7 of 17

On Sep 17, 1:10 pm, "Bruno Luong" <b.lu...@fogale.findmycountry>
wrote:
> Rune Allnor <all...@tele.ntnu.no> wrote in message <a56978d9-99f2-4ff6-8073-a77ab61bb...@o21g2000vbl.googlegroups.com>...
>
> > Sparse matrices are not used to gain speed, but to save
> > space.
>
> Wrong, many linear algebra algorithms are designed to gain speed with sparse matrix since Matrix-vector product is much faster than for full matrix. That lead to the class of iterative method (gradient conjugate, GMRES, Lanczos, etc...) that is suitable when working with sparse matrix. The full matrix is not even competitive, even for speed.
>
> Bruno

Bruno,
Yes, it's true. However, I see that when operations involving sparse
matrices result in a more-or-less full matrix (as in my case at some
intermediate point), it becomes slow. But as Rune pointed out, my data
set is quite huge, 1200 * 312000. So I had to split it, and I thought
I could also save more space by storing sparse, as my calculations are
faster already and shouldn't take much time *even if* sparse makes
them slow.

thank you,
best, arun.

Subject: size(sparse matrix) > size(full matrix)

From: Bruno Luong

Date: 17 Sep, 2009 12:20:21

Message: 8 of 17

You may consider storing 2*A as int8/uint8 arrays, but many operations will not be supported. If that's a problem, use SINGLE instead of DOUBLE.

Bruno
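Bruno's trick works because the data take only the values 0, 0.5 and 1, so doubling them gives exact small integers that fit in one byte. A sketch in Python/NumPy terms (the matrix size and value mix are invented stand-ins):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.choice([0.0, 0.5, 1.0], size=(1222, 3892))  # small stand-in

A2 = (2 * A).astype(np.uint8)  # exact: values become 0, 1, 2

# One byte per element instead of eight.
print(A.nbytes // A2.nbytes)

# Recover the doubles exactly whenever an unsupported operation
# needs the original values.
back = A2.astype(np.float64) / 2
print(np.array_equal(back, A))
```

The conversion is lossless because 0, 1 and 2 are exactly representable both as uint8 and as double.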

Subject: size(sparse matrix) > size(full matrix)

From: Steven Lord

Date: 17 Sep, 2009 13:22:01

Message: 9 of 17


"arun" <aragorn168b@gmail.com> wrote in message
news:3f28c1c1-af9b-458a-a802-25e66f1bc1bb@p15g2000vbl.googlegroups.com...
> On Sep 17, 1:10 pm, "Bruno Luong" <b.lu...@fogale.findmycountry>
wrote:
> > Rune Allnor <all...@tele.ntnu.no> wrote in message
> > <a56978d9-99f2-4ff6-8073-a77ab61bb...@o21g2000vbl.googlegroups.com>...
> >
> > > Sparse matrices are not used to gain speed, but to save
> > > space.
> >
> > Wrong, many linear algebra algorithms are designed to gain speed with
> > sparse matrix since Matrix-vector product is much faster than for full
> > matrix. That lead to the class of iterative method (gradient conjugate,
> > GMRES, Lanczos, etc...) that is suitable when working with sparse
> > matrix. The full matrix is not even competitive, even for speed.
> >
> > Bruno
>
> Bruno,
> Yes, its true, however I see that when operations involving sparse-
> matrices result in a more-or-less full matrix (as in my case at some
> intermediate point), it becomes slow. But as Rune pointed out, my data
> set is quite huge 1200 * 312000. So I had to split them and I thought
> I could also save more space by storing sparse as my calculations are
> faster already and shouldn't take much *even if* sparse makes it
> slow.

What types of operations are you doing on this sparse matrix? If you're
working along the rows, I recommend you store your matrix transposed and
work down the columns. The size in memory of a sparse matrix depends on the
number of columns in the matrix, not the number of rows, and due to the way
it's stored in memory retrieving a column of a sparse matrix is likely to be
faster and more efficient than retrieving a row.

--
Steve Lord
slord@mathworks.com
comp.soft-sys.matlab (CSSM) FAQ: http://matlabwiki.mathworks.com/MATLAB_FAQ
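Steve's column-versus-row point follows from the compressed-column layout. An illustration with SciPy, which uses the same compressed-column scheme as MATLAB (the matrix here is a random example): a column is one contiguous slice of the stored arrays, while a row is scattered across every column.

```python
import numpy as np
from scipy import sparse

A = sparse.random(1000, 1000, density=0.01, format='csc', random_state=0)

# Column j occupies the contiguous range indptr[j]:indptr[j+1] of the
# stored data array -- cheap to slice out.
j = 5
col = A.data[A.indptr[j]:A.indptr[j + 1]]

# A row, by contrast, is scattered across all columns; getrow() must
# search each one, so row access is the slow direction for CSC.
row = A.getrow(5).toarray().ravel()

print(col.size, row.size)
```

This is why storing the transpose and working down the columns pays off when the access pattern is row-wise.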

Subject: size(sparse matrix) > size(full matrix)

From: Sebastiaan

Date: 17 Sep, 2009 14:05:24

Message: 10 of 17

Dear Arun,

Another issue can be the dimensions of your matrix. But first I question the sparsity of your current matrix.

This is going to be a long reply, so try to keep up.


1) To be more precise, let me explain how matrices are stored exactly.

Matlab stores the matrix in the compressed column storage (CCS) format. This consists of 3 vectors:

IA: column pointer
length: #columns + 1
type: unsigned long int (8 bytes) (or int on 32-bit machines, 4 bytes)

JA: row indices
length: #nonzero elements
type: unsigned long int (or int on 32-bit)

NA: numerical values
length: #nonzero elements
type: double (8 bytes)

So, assume you have 100 000 nonzero's, the memory usage in your case would be:
(39000 + 1)* 8 + 100000 * 8 + 100000 * 8 = 1.8MiB

Whereas stored as a full matrix:
 1200 * 39000 * 8 = 357 MiB.

2) It looks like your matrix is effectively 100% full (i.e. nearly every element is stored as a nonzero). You say your sparse matrix uses 760MiB? Using the formula above backwards:
( 760MiB/8 - 39001 )/2 ~ 50e6 elements, equal to a full storage (1200*39000).

Are you sure that the 0-elements are not counted? What is:
nnz(A)
nnz(A~=0)
(nnz(A) for a sparse matrix directly returns the number of stored places, even if the elements are 0.)

3) Later, you mention a matrix size of 1200 * 312000. This is an inconvenient size considering the CCS scheme of Matlab. If it is possible, store it as 312000*1200.

Assume again 100 000 nonzeros.
1200*312000 uses (312 000 + 1)* 8 + 100000 * 8 + 100000 * 8 = 3.9MiB
312000*1200 uses (1200 + 1)* 8 + 100000 * 8 + 100000 * 8 = 1.5MiB

So, if there is a huge difference in dimensions of a matrix, make sure you have more rows than columns (as far as storage is concerned, that is).

Sebastiaan
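Sebastiaan's memory formula can be checked against SciPy's CSC arrays, which use the same layout (SciPy defaults to 4-byte indices where 64-bit MATLAB uses 8-byte ones, so the constants differ but the structure is identical; the dimensions below are scaled-down stand-ins):

```python
import numpy as np
from scipy import sparse

rows, cols, nnz = 1200, 3900, 10000  # scaled-down stand-in
rng = np.random.default_rng(1)
i = rng.integers(0, rows, nnz)
j = rng.integers(0, cols, nnz)
S = sparse.csc_matrix((np.ones(nnz), (i, j)), shape=(rows, cols))

# memory = (#columns + 1) * ptr + nnz * index + nnz * value
idx = S.indices.itemsize
mem = (cols + 1) * S.indptr.itemsize + S.nnz * idx + S.nnz * 8
assert mem == S.indptr.nbytes + S.indices.nbytes + S.data.nbytes

# Swapping the dimensions shrinks only the column-pointer term.
T = sparse.csc_matrix((np.ones(nnz), (j, i)), shape=(cols, rows))
mem_t = (rows + 1) * T.indptr.itemsize + T.nnz * idx + T.nnz * 8
print(mem, mem_t)
```

With few columns the pointer array is negligible, which is exactly why the 312000*1200 orientation is the cheaper one.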

Subject: size(sparse matrix) > size(full matrix)

From: arun

Date: 17 Sep, 2009 15:25:39

Message: 11 of 17

On Sep 17, 3:22 pm, "Steven Lord" <sl...@mathworks.com> wrote:
> "arun" <aragorn1...@gmail.com> wrote in message
>
> news:3f28c1c1-af9b-458a-a802-25e66f1bc1bb@p15g2000vbl.googlegroups.com...
> > On Sep 17, 1:10 pm, "Bruno Luong" <b.lu...@fogale.findmycountry>
> wrote:
> > > Rune Allnor <all...@tele.ntnu.no> wrote in message
> > > <a56978d9-99f2-4ff6-8073-a77ab61bb...@o21g2000vbl.googlegroups.com>...
>
> > > > Sparse matrices are not used to gain speed, but to save
> > > > space.
>
> > > Wrong, many linear algebra algorithms are designed to gain speed with
> > > sparse matrix since Matrix-vector product is much faster than for full
> > > matrix. That lead to the class of iterative method (gradient conjugate,
> > > GMRES, Lanczos, etc...) that is suitable when working with sparse
> > > matrix. The full matrix is not even competitive, even for speed.
>
> > > Bruno
>
> > Bruno,
> > Yes, its true, however I see that when operations involving sparse-
> > matrices result in a more-or-less full matrix (as in my case at some
> > intermediate point), it becomes slow. But as Rune pointed out, my data
> > set is quite huge 1200 * 312000. So I had to split them and I thought
> > I could also save more space by storing sparse as my calculations are
> > faster already and shouldn't take much *even if* sparse makes it
> > slow.
>
> What types of operations are you doing on this sparse matrix?  If you're
> working along the rows, I recommend you store your matrix transposed and
> work down the columns.  The size in memory of a sparse matrix depends on the
> number of columns in the matrix, not the number of rows, and due to the way
> it's stored in memory retrieving a column of a sparse matrix is likely to be
> faster and more efficient than retrieving a row.
>
> --
> Steve Lord
> sl...@mathworks.com
> comp.soft-sys.matlab (CSSM) FAQ:http://matlabwiki.mathworks.com/MATLAB_FAQ

Hi Steve,

1) My calculations:
Initially, from each column (n*1) I calculate a matrix A (n*n) and
then take trace(A*B), where B is a constant pre-computed n*n matrix.

However, with help from CSSM user Matt, I managed to vectorize the
code so that I calculate the trace for all columns without computing
each separately in for-loops.

2) I am already working on columns. That is why I had to have 1200*x
as the size. I understand that if I transpose, the size of the matrix
will be much smaller when stored sparse, but I will have to transpose
it again to perform my trace computations!!

Hi Sebastiaan,

Thank you for your reply. It was extremely informative. I understand
the disadvantages of storing with a huge number of columns. However,
as I explained above in (2), my operations are along the columns. So
the best I could do is to store it as a full matrix... I guess.
Regarding the confusion between 1200*39000 and 1200*312000: my whole
data set is 1222*311398 to be precise. And I split it into 8 parts,
7 of 1222*38924 and the last of 1222*38930.
I load each of these from mat-files and do my computations. Just to
try, I transposed my 1222*38924 matrix and then created a sparse
array, but it didn't help much. Which suggests that it is not wise to
store it as a sparse array, as it is *not really sparse*.

  Name    Size          Bytes        Class     Attributes

  gt      1222x38924    380521024    double
  gt1     1222x38924    761343080    double    sparse
  gt2     38924x1222    761041464    double    sparse

And this seems to be true. The problem is that my data set has 0, 0.5
and 1. There are only 648 0's but 38423871 0.5's and 9140609 1's. I
guess sparse doesn't look for the most frequently occurring element
and store the rest... it just stores the non-zero values. Yes, that
is its definition.

thank you very much.
best, arun.

Subject: size(sparse matrix) > size(full matrix)

From: Tim Davis

Date: 18 Sep, 2009 02:24:03

Message: 12 of 17

"Bruno Luong" <b.luong@fogale.findmycountry> wrote in message

> > Sparse matrices are not used to gain speed, but to save
> > space.
>
> Wrong, many linear algebra algorithms are designed to gain speed with sparse matrix since Matrix-vector product is much faster than for full matrix. That lead to the class of iterative method (gradient conjugate, GMRES, Lanczos, etc...) that is suitable when working with sparse matrix. The full matrix is not even competitive, even for speed.
>
> Bruno

Bruno is correct. Sparse is for performance, both time and memory. Sure, there are many matrices with too many nonzeros to store as sparse. Jim Wilkinson's definition of a sparse matrix is one in which it pays to exploit the zeros to save time and memory (he may have said "time and/or memory", but I paraphrase anyway). Exploiting sparsity can reduce the time needed by many orders of magnitude, depending on the problem.

And in some cases, sparse is even faster than full on a completely nonzero matrix. Give this a try, and ponder the results. I ran each twice, to warm up MATLAB (dynamic loading of code). You need to use R2009b, the latest version, to get these results.

A = rand (2000,1000);
b = rand (2000,1);
tic ; x=A\b ; toc

Elapsed time is 3.069423 seconds.

tic ; x=A\b ; toc

Elapsed time is 2.724363 seconds.

tic ; x=sparse(A)\b ; toc

Elapsed time is 2.120395 seconds.

tic ; x=sparse(A)\b ; toc
Elapsed time is 2.047998 seconds.

sparse(A)\b gets even faster if you turn off its fill-reducing ordering, with

spparms('autoamd',0)
tic ; x=sparse(A)\b ; toc

Elapsed time is 1.910281 seconds.

And you still get the right answer:

x1=A\b ;
x2=sparse(A)\b ;
norm(x1-x2)/norm(x1)

ans =
   5.2512e-15

In fact, sparse is a teeny bit more accurate in this case, but the difference is small enough to ignore:

>> norm(A'*A*x1-A'*b)
ans =
   2.9924e-11
>> norm(A'*A*x2-A'*b)
ans =
   2.0417e-11

Subject: size(sparse matrix) > size(full matrix)

From: Sebastiaan

Date: 18 Sep, 2009 08:05:06

Message: 13 of 17

Hi,
>
> And this seems to be true. The problem is that, my data set has 0, 0.5
> and 1. There are only 648 0's but 38423871 0.5's and 9140609 1's. I
> guess sparse doesn't look for most-frequently occurring element and
> stores the remaining... but just stores the non-zero values.. yes, it
> is by definition.
>

You can try to scale/translate your data, i.e. subtract 0.5 from it. Depending on what you are doing, you might have to rescale the other data in your algorithm as well (and undo it afterwards of course).

The simplest alternative is Bruno's suggestion to store 2*A as uint8. This will reduce the memory to 1/8th. However, your operations must allow it.

What operations are you trying to perform on the matrix? So far, you have only told us how you build it.
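Since 0.5 is by far the most common value in this data set, shifting makes it the stored zero. A SciPy sketch of the idea (the size and value proportions are invented stand-ins for the thread's counts):

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(3)
# Small stand-in with roughly the thread's value mix: mostly 0.5.
A = rng.choice([0.0, 0.5, 1.0], size=(500, 500), p=[0.0001, 0.8, 0.1999])

S_raw = sparse.csc_matrix(A)          # stores every 0.5 and every 1
S_shift = sparse.csc_matrix(A - 0.5)  # stores only the 0s and 1s

print(S_raw.nnz, S_shift.nnz)

# Linear operations can be undone afterwards: since A = (A - 0.5) + 0.5,
# A @ v equals S_shift @ v plus 0.5 * sum(v) in every row.
v = rng.standard_normal(500)
y = S_shift @ v + 0.5 * v.sum()
```

The rescaling step is where "your operations should allow it" bites: every downstream computation has to be expressible in terms of the shifted matrix.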

Subject: size(sparse matrix) > size(full matrix)

From: Derek O'Connor

Date: 18 Sep, 2009 13:24:01

Message: 14 of 17

"Tim Davis" <davis@cise.ufl.edu> wrote in message
 --- snip ---
> And in some cases, sparse is even faster than full on a completely nonzero matrix. Give this a try, and ponder the results. I ran each twice, to warm up MATLAB (dynamic loading of code). You need to use R2009b, the latest version, to get these results.
>
> A = rand (2000,1000);
> b = rand (2000,1);
> tic ; x=A\b ; toc
>
> Elapsed time is 3.069423 seconds.
>
> tic ; x=A\b ; toc
>
> Elapsed time is 2.724363 seconds.
>
> tic ; x=sparse(A)\b ; toc
>
> Elapsed time is 2.120395 seconds.
>
> tic ; x=sparse(A)\b ; toc
> Elapsed time is 2.047998 seconds.
>
> sparse(A)\b gets even faster if you turn off its fill-reducing ordering, with
>
> spparms('autoamd',0)
> tic ; x=sparse(A)\b ; toc
>
> Elapsed time is 1.910281 seconds.
>
> And you still get the right answer:
>
> x1=A\b ;
> x2=sparse(A)\b ;
> norm(x1-x2)/norm(x1)
>
> ans =
> 5.2512e-15
>
> in fact, sparse is a teeny bit more accurate in this case, but the difference is small enough to ignore:
>
> >> norm(A'*A*x1-A'*b)
> ans =
> 2.9924e-11
> >> norm(A'*A*x2-A'*b)
> ans =
> 2.0417e-11


Tim,

Why is R2009b so fast on x=sparse(A)\b? Has the built-in sparse matrix package been changed from previous versions? Can you tell us how this speed-up has been achieved?

When I use R2008a, A\b is three times faster than sparse(A)\b.

Subject: size(sparse matrix) > size(full matrix)

From: Steven Lord

Date: 18 Sep, 2009 13:30:43

Message: 15 of 17


"Derek O'Connor" <derekroconnor@eircom.net> wrote in message
news:h901lh$as$1@fred.mathworks.com...

*snip*

> Why is R2009b so fast on x=sparse(A)\b? Has the built-in sparse matrix
> package been changed from previous versions? Can you tell us how this
> speed-up has been achieved?

We've made some improvements to sparse rectangular MLDIVIDE (including
multithreading for sufficiently large matrices) as described in the release
notes:

http://www.mathworks.com/access/helpdesk/help/techdoc/rn/br5k34y-1.html

--
Steve Lord
slord@mathworks.com
comp.soft-sys.matlab (CSSM) FAQ: http://matlabwiki.mathworks.com/MATLAB_FAQ

Subject: size(sparse matrix) > size(full matrix)

From: Tim Davis

Date: 21 Sep, 2009 16:59:06

Message: 16 of 17

"Derek O'Connor" <derekroconnor@eircom.net> wrote in message

> Tim,
>
> Why is R2009b so fast on x=sparse(A)\b? Has the built-in sparse matrix package been changed from previous versions? Can you tell us how this speed-up has been achieved?
>
> When I use R2008a, A\b is three times faster than sparse(A)\b.

The new sparse QR uses a multifrontal method (SuiteSparseQR). For a sparse matrix that's all nonzero, it creates a single dense frontal matrix. I then use my own dense QR on this, based on LAPACK, but slightly faster than LAPACK. You can read the details here:

http://www.cise.ufl.edu/~davis/techreports/SPQR/spqr.pdf
http://www.cise.ufl.edu/~davis/techreports/SPQR/algo_spqr.pdf

Subject: size(sparse matrix) > size(full matrix)

From: Matt

Date: 21 Sep, 2009 19:49:04

Message: 17 of 17

arun <aragorn168b@gmail.com> wrote in message <c6a3a485-e8b4-459b-8a67-5ed0e5b16a22@h30g2000vbr.googlegroups.com>...

> > What types of operations are you doing on this sparse matrix? If you're
> > working along the rows, I recommend you store your matrix transposed and
> > work down the columns. The size in memory of a sparse matrix depends on the
> > number of columns in the matrix, not the number of rows, and due to the way
> > it's stored in memory retrieving a column of a sparse matrix is likely to be
> > faster and more efficient than retrieving a row.
> >
> > --
> > Steve Lord
> > sl...@mathworks.com
> > comp.soft-sys.matlab (CSSM) FAQ:http://matlabwiki.mathworks.com/MATLAB_FAQ
>
> Hi Steve,
>
> 1) My calculations:
> Initially, from each of the column n*1 , I calculate a matrix A (n*n)
> and then take the trace(A*B) where B is a constant pre-computed n*n
> matrix.
>
> However, with help from CSSM user Matt, I managed to vectorize the
> code so that I calculate the trace of all columns without computing
> for each separately using for-loops.
>
> 2) I already am working on columns. That is why I had to have 1200*x
> as the size. I understand if I transpose the size of the matrix will
> be very less when stored sparse, but I will have to transpose it again
> to perform my trace computations!!

I don't think that's true, arun, from what I know of your problem. The method that we came up with boils down to this, essentially:

result = sum(V.*(L*V));

where V is sparse and L is symmetric. There's no reason it couldn't be implemented as

result = sum(V.' .* (V.' * L), 2).';

It will be a bit slower, because pre-multiplying by sparse matrices is slower than post-multiplying, but probably not too bad.
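In NumPy terms, Matt's one-liner computes the quadratic form v'*L*v for every column v of V at once, and the transposed variant gives the same numbers. A small check (the matrix sizes are invented for the example):

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 50, 30
L = rng.standard_normal((n, n))
L = L + L.T                      # symmetric, as in the thread
V = rng.standard_normal((n, m))

# MATLAB's sum(V.*(L*V)): one quadratic form V(:,i)' * L * V(:,i)
# per column of V.
r1 = np.sum(V * (L @ V), axis=0)

# The transposed form sum(V.' .* (V.' * L), 2).' from the thread.
r2 = np.sum(V.T * (V.T @ L), axis=1)

# Both match an explicit loop over the columns.
r_loop = np.array([V[:, i] @ L @ V[:, i] for i in range(m)])
print(np.allclose(r1, r_loop), np.allclose(r2, r_loop))
```

So the orientation of V can be chosen for storage convenience; only the axis of the sum changes.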
