Path: news.mathworks.com!not-for-mail
From: <HIDDEN>
Newsgroups: comp.soft-sys.matlab
Subject: Re: size(sparse matrix) > size(full matrix)
Date: Thu, 17 Sep 2009 14:05:24 +0000 (UTC)
Organization: ErasmusMC
Lines: 48
Message-ID: <h8tfn4$1n0$1@fred.mathworks.com>
References: <8e83f86c-df5e-413d-9943-da36c83a66d2@g1g2000vbr.googlegroups.com>


Dear Arun, 

Another issue can be the dimensions of your matrix. But first I question the sparsity of your current matrix.

This is going to be a long reply, so try to keep up.


1) To be more precise, let me explain exactly how sparse matrices are stored.

Matlab stores a sparse matrix in the compressed column storage (CCS) format. This consists of 3 vectors:
IA:
column pointers (where each column starts in JA and NA)
length: #columns + 1
type: unsigned long int (8 bytes) (or int on 32-bit machines, 4 bytes)

JA:
row indices
length: #nonzero elements
type: unsigned long int (or int on 32-bit machines)

NA:
numerical values
length: #nonzero elements
type: double (8 bytes)
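
To make the three vectors concrete, here is a small sketch that rebuilds them by hand for a 3x3 example. The matrix and the variable names are only illustrative; Matlab does not hand you IA and JA directly at the command line, and internally the indices are zero-based (shown here one-based for JA, as find returns them).

```matlab
% Toy matrix: column 2 is empty on purpose.
A = [10  0  0;
      0  0 30;
     20  0 40];

% find walks the matrix column by column, so the outputs are already
% in CCS order: row index, column index and value of each nonzero.
[JA, col, NA] = find(A);

% Count the nonzeros per column, then accumulate to get the column pointers.
counts = accumarray(col, 1, [size(A, 2) 1]);
IA = [0; cumsum(counts)];   % length is #columns + 1

disp(IA.');   % column pointers
disp(JA.');   % row indices
disp(NA.');   % numerical values
```

Note that the empty column still costs one entry in IA, which is why the number of columns matters for storage.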

So, assuming you have 100 000 nonzeros, the memory usage in your case would be:
(39000 + 1) * 8 + 100000 * 8 + 100000 * 8 bytes = 1.8 MiB

Whereas stored as a full matrix:
 1200 * 39000 * 8 = 357 MiB.
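
The same arithmetic as a few lines of Matlab, in case you want to plug in your own numbers. This is only an estimate under the assumptions above (8-byte indices on a 64-bit machine, 8-byte doubles); the variable names are mine.

```matlab
nRows = 1200;
nCols = 39000;
nz    = 100000;   % assumed number of nonzeros

idxBytes = 8;     % index size, assuming a 64-bit machine
valBytes = 8;     % double

% IA (nCols + 1 entries) + JA (nz entries) + NA (nz entries)
sparseBytes = (nCols + 1) * idxBytes + nz * idxBytes + nz * valBytes;

% Full storage: every element is a double
fullBytes = nRows * nCols * valBytes;

fprintf('sparse: %.1f MiB, full: %.0f MiB\n', ...
        sparseBytes / 2^20, fullBytes / 2^20);
```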

2) It looks like your sparse matrix is not sparse at all (i.e. practically every element is stored explicitly). You say your sparse matrix uses 760 MiB? Using the formula above backwards:
( 760 MiB / 8 - 39001 ) / 2 ~ 50e6 stored elements, about as many as the full matrix has (1200 * 39000 = 46.8e6).

Are you sure that the 0-elements are not counted? What is:
nnz(A)
nnz(A~=0)
(For a sparse matrix, nnz(A) directly returns the number of stored entries, even if some of those entries hold the value 0.)
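
A rough way to run these checks in one go is sketched below. The matrix A here is just a tiny stand-in so the snippet runs on its own; use your own matrix instead. Besides the two nnz calls, it also looks at nzmax (the storage allocated for nonzeros, which can be larger than nnz) and at the byte count reported by whos.

```matlab
% Stand-in for your matrix: 3 nonzeros in a 1200 x 39000 sparse matrix.
A = sparse([1 2 3], [1 2 3], [5 6 7], 1200, 39000);

nStored = nnz(A);            % entries counted by nnz
nTrue   = nnz(A ~= 0);       % entries that are really nonzero
nAlloc  = nzmax(A);          % storage allocated for nonzeros
density = nStored / numel(A);

info = whos('A');            % info.bytes is the actual memory use

fprintf('nnz: %d, true nonzeros: %d, nzmax: %d\n', nStored, nTrue, nAlloc);
fprintf('density: %.2g, bytes: %d\n', density, info.bytes);
```

If the density comes out close to 1, or nzmax is far above nnz, sparse storage is costing you more than it saves.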

3) Later, you mention a matrix size of 1200 * 312000. This is an inconvenient size considering the CCS scheme of Matlab. If possible, store its transpose instead, i.e. as 312000 * 1200.

Assume again 100 000 nonzeros.
1200*312000 uses (312 000 + 1)* 8 + 100000 * 8 + 100000 * 8 = 3.9MiB
312000*1200 uses (1200 + 1)* 8 + 100000 * 8 + 100000 * 8 = 1.5MiB

So, if there is a huge difference in dimensions of a matrix, make sure you have more rows than columns (as far as storage is concerned, that is).
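
You can see the effect directly by building a wide sparse matrix and comparing it with its transpose. The example below is only a sketch with three made-up nonzeros; the exact byte counts depend on your Matlab version and on whether you run 32 or 64 bit, so only the difference between the two matters.

```matlab
% Three arbitrary nonzeros in a 1200 x 312000 matrix (many columns).
rows = [1 600 1200];
cols = [1 156000 312000];
wide = sparse(rows, cols, [1 2 3], 1200, 312000);

% Same data, stored as 312000 x 1200 (few columns).
tall = wide.';

w = whos('wide');
t = whos('tall');
fprintf('wide: %d bytes, tall: %d bytes\n', w.bytes, t.bytes);
```

The tall version should come out much smaller, because its column pointer vector has only 1201 entries instead of 312001.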

Sebastiaan