sparse function cannot get integer arrays for the indices

4 views (last 30 days)
Hello, we are working with some pretty large sparse matrices and we found this strange behaviour of the sparse function:
>> sparse(uint32([1 2]),uint32([1 2]),[3 4],2,2)
Undefined function 'sparse' for input arguments of type 'uint32'.
>> sparse(int32([1 2]),int32([1 2]),[3 4],2,2)
Undefined function 'sparse' for input arguments of type 'int32'.
>> sparse(double([1 2]),double([1 2]),[3 4],2,2)
ans =
(1,1) 3
(2,2) 4
Is it not possible to index with integers? Any reason for that?
Thanks, Sergio

Answers (2)

James Tursa
James Tursa on 11 Nov 2016
The sparse matrix storage has always been the following internally as far as I know (I think back to R2006a is all I have checked ... I have never had access to earlier versions of MATLAB):
Row indexes: An integer type (e.g., size_t)
Column indexes: An integer type (e.g., size_t)*
Data: Either double or byte (for logical)
* Note that the "column indexes" are actually cumulative column number totals, not indexes per se
The fact that the "sparse" function in MATLAB only accepts double indexes is maybe because these indexes (and the associated data) need to get sorted and converted internally in order to build the sparse matrix, and TMW didn't want to write these routines at the low level for all possible combinations of numeric input types. This is all a guess on my part. Regardless, these indexes will get converted to an integer type internally ... they are not stored as double and to my knowledge never have been.
Of course, MATLAB could simply cast the input indexes of any numeric type to double internally (with the user none the wiser) for the sorting etc, but that would necessitate potentially large temporary copies of the input indexes (pre-sorted versions). So rather than doing this and having a hidden additional memory hit in the background, TMW forces the user to do this conversion manually. Again, this is simply conjecture on my part.
The minimum data storage requirement formula for a double m x n sparse matrix with nnz non-zero elements, including the index data, is as follows on a 32-bit system:
bytes = max(nnz,1) * (4 + 8) + (n+1)*4
Which breaks down as follows:
nnz * 4 = Storing the row index of the non-zero elements
nnz * 8 = Storing the non-zero double element values themselves
(n+1)*4 = Storing the cumulative number of non-zero elements thru column
For 64-bit systems using 8-byte integers for the indexing you can replace the 4's above with 8's. This is just the minimum requirements. A sparse matrix can have excess memory allocated beyond the minimum if desired. And for logical sparse matrices, replace the nnz * 8 with just nnz of course (only 1 byte per value).
The only thing that has changed over the years that I am aware of, is that shared data copies of sparse matrices used to only share the data memory, not the index memory. But several years ago (I don't recall the version) that was changed so that the index memory is now also shared.
  1 Comment
Sergio Zlotnik
Sergio Zlotnik on 12 Nov 2016
Thanks for your detailed answer James. I know the internal storage of sparse matrices and have coded some mex dealing with it in the past. It is still strange to me that sparse does not accept integers, I cannot think in any operation between integers that is better to do with doubles.

Sign in to comment.


Guillaume
Guillaume on 11 Nov 2016
The documentation of sparse does state that the only valid type for the indices is double.
Since double can represent exactly all uint32 integers, why is it a problem anyway? You will only start having problems if your indices are greater than flintmax (= 9007199254740992) which is well outside the range of uint32 (but not out of the range of uint64).
  4 Comments
Matt J
Matt J on 11 Nov 2016
Edited: Matt J on 11 Nov 2016
Since sparse arrays accept indices above this value ... they would have to be stored either as double or as uint64 which use the same amount of space.
Yes, but it would be nice if the user could have some freedom to use a smaller type if the matrix size allowed.
I believe internally, matlab use an integer type (size_t ?) to store the integers. That's what the blurb about sparse matrices on this page seems to imply.
That would be a fairly recent thing. I tested this some releases ago with a comparison like the following
>> Af=rand(5e6,3); As=sparse(Af);
>> Whos Af As
Name Size Kilobytes Class Attributes
Af 5000000x3 117188 double
As 5000000x3 234376 double sparse
The memory consumption of the sparse version does seem to have dropped a bit from a factor of 3 difference, so perhaps you're right.
Sergio Zlotnik
Sergio Zlotnik on 12 Nov 2016
Thanks for the discussion.
As Matt pointed out, my concern is about memory use. For example, I don´t see any value of having negative indices in the sparse function (diag is a different case).

Sign in to comment.

Categories

Find more on Sparse Matrices in Help Center and File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!