Entry-wise multiplication of a sparse matrix on GPU

Using a GPU, I need to multiply sparse matrices using the entry-wise 'times' (i.e. .*) operation. As far as I can tell, this is not currently supported for gpuArrays.
For example,
>> rand(5).*sparse(rand(5))
and
>> rand(5,'gpuArray').*rand(5,'gpuArray')
both work, but
>> rand(5,'gpuArray').*sparse(rand(5,'gpuArray'))
Error using .*
Sparse gpuArrays are not supported for this function.
Is there any way I can get around this without converting the matrices back to full (which would negate most/all of the advantage of using the GPU)?
Thanks

Answers (1)

Is the sparsity the same for both matrices?
[I, J, VA] = find(Asparse);
[~, ~, VB] = find(Bsparse);
C = sparse(I, J, VA.*VB);
If you just want to multiply it by itself you can use SPFUN:
Asqr = spfun(@(x)x.^2, Asparse);
If the sparsity is different then you have to merge the indices unfortunately. It's a horrible sequence of operations involving sorting and indexing, and not very efficient which is why it hasn't been implemented. I'm curious as to what your application is - maybe there's another solution that doesn't involve element-wise operations.
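To make that merge concrete, here is an illustrative CPU-only sketch (my own, not an efficient implementation, and not supported on gpuArrays). It relies on the fact that A.*B can only be nonzero at positions present in both sparsity patterns, so it intersects the two index lists:

```matlab
% Element-wise product of two same-sized sparse matrices A and B
% whose sparsity patterns differ (CPU sketch).
[IA, JA, VA] = find(A);
[IB, JB, VB] = find(B);
% Keep only the (row, col) pairs common to both patterns
[IJ, ka, kb] = intersect([IA JA], [IB JB], 'rows');
C = sparse(IJ(:,1), IJ(:,2), VA(ka).*VB(kb), size(A,1), size(A,2));
```

The sort inside intersect is the expensive part alluded to above.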

9 Comments

Thanks,
Unfortunately the matrices may be of different sparsity.
Any work-around I can think of seems to fall apart because of the inability to index sparse arrays on the gpu.
i.e.
>> Asparse(1,1)
Error using gpuArray/bsxfun
Sparse gpuArrays are not supported for this function.
If you will excuse a very naive question: why is this not possible for sparse arrays on a GPU?
---------------------------------------------------------
I am working on an ecosystem model. The main state variable matrix P is [ns x np], where np is the number of populations, and ns is the number of points in a spatial grid.
The main overhead is multiplication of P by a trophic interaction matrix G [np x np] that describes who is eating whom. This is the main cost of the model, but I can save a lot of time because P is sparse.
However, there are several other functions that require entry-wise multiplication.
For example, I need to multiply the growth rate mu by a temperature dependence function gamma by the resource affinity alpha by the resource concentration N by the population size P. (N and P are time-dependent state variables)
(mu*gamma).*(alpha*N).*P
with dimensions
mu:    np x 1
alpha: np x 1
gamma: 1 x ns
N:     1 x ns
P:     np x ns
I cannot see a way of avoiding the entry-wise product here. It is also worth noting that this is the simplest, linear form of the problem for which I could not find a way to avoid times. The actual equation is non-linear, but if you have any suggestions for this case, it might set me in the right direction.
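For what it's worth, this particular linear example factorizes: (mu*gamma).*(alpha*N) has entries mu(i)*alpha(i)*gamma(j)*N(j), i.e. it is the rank-1 outer product (mu.*alpha)*(gamma.*N). Since the result is then masked by sparse P, it can be evaluated at P's nonzero positions only. A CPU sketch (my own, assuming mu and alpha are np-by-1 columns and gamma and N are 1-by-ns rows):

```matlab
% C = (mu*gamma).*(alpha*N).*P evaluated only on the pattern of sparse P
[I, J, VP] = find(P);          % nonzero pattern and values of P
u = mu .* alpha;               % np x 1 row factor
v = (gamma .* N).';            % ns x 1 column factor
% C(i,j) = u(i) * v(j) * P(i,j), nonzero only where P is
C = sparse(I, J, u(I) .* v(J) .* VP, size(P,1), size(P,2));
```

This avoids forming any dense np-by-ns intermediate, though as noted above the indexing steps are not available for sparse gpuArrays.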
@Ben: The GPU libraries and the hardware are optimized for processing full matrices. Working with sparse matrices is a completely different job, because for full arrays the arrangement of the elements is determined by their position in the memory, while for sparse arrays there is an additional layer, which contains the indices. There are several different ways to implement sparse arrays, and not all of them might be implemented in the GPU libraries.
Because the GPU hardware is not optimized for sparse arrays, it is not certain that processing them there is faster than on the CPU at all.
For arrays of different sparsity, the following seems to work (in R2017a)...
[IA, JA, VA] = find(A);
[IB, JB, VB] = find(B);
[Ci, I, J] = intersect([IA JA], [IB JB], 'rows');
C = sparse(Ci(:,1), Ci(:,2), VA(I).*VB(J));
But it can be quite inefficient in some cases.
What about:
K = find(A);
[I, J, VA] = find(A);
C = sparse(I, J, VA .* B(K));
I would not expect a C-mex to be faster than the built-in libraries for sparse matrices. But the problem remains: the suggested methods do not work on a GPU. Even Asparse(1,1) fails, so B(K) is not an option either.
Hi Jan,
"Because the GPU hardware is not optimized for sparse arrays, it is not certain that processing them there is faster than on the CPU at all."...
This is what I was afraid of.
Interestingly, the GPU (with full matrices) does become more efficient relative to the CPUs (with sparse matrices) as the ecosystem grows and the P matrix becomes less sparse.
Being optimistic, NVIDIA do provide a sparse matrix library (cuSPARSE), so perhaps more functionality can be incorporated into future releases?
Thanks
@Ben: Ask Matlab directly for a feature enhancement. This is sometimes more useful than hoping :-)
Thanks for your answer Jan - I should have noticed that intersect already did all the things I needed!
However, Jan's answer says it all I think - it gives the right answer, but on my Kepler card it's 15x slower. It's not efficient on the GPU and so the workaround is just to do it on the CPU. Any native implementation we wrote would have to do much of what intersect and the three-input form of sparse are having to do, and it's none too efficient on a GPU. Nevertheless, I'm sure something better could be done.
I will take it as an enhancement request (TBH we already have one). When we added support for sparse gpuArrays we just put in the most useful functionality first. The most common use case for sparse arrays is solving linear systems, which is all matrix-matrix and matrix-vector multiplication. But it's great to have a real use case for this. times is actually one of the easier element-wise functions because the output sparsity is the intersection rather than the union.
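A tiny CPU illustration of that last point, comparing the output patterns of times and plus:

```matlab
% The pattern of A.*B is the intersection of the two patterns;
% the pattern of A+B is their union.
A = sparse([1 0; 0 2]);
B = sparse([3 0; 4 0]);
nnz(A.*B)   % 1: only (1,1) is nonzero in both A and B
nnz(A+B)    % 3: (1,1), (2,1) and (2,2) are nonzero in A or B
```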
You refer to "inability to extract individual elements from a sparse array on the gpu" - but of course this is possible using find. It's indexing that isn't supported.
It was hard to tell from your example equation but it looks like a lot of those operations were multiplication by a scalar - that is supported.
Hi Joss and Jan, Thanks both for your replies.
Joss, I've edited my initial reply to make it clearer for future readers. For reference, 'a real use case' can be found here...
https://en.wikipedia.org/wiki/Matrix_population_models
The core of these models is the mtimes operation, but entry-wise operations will frequently be essential as well. Adding this functionality would therefore be very useful in supporting this kind of research.
thanks again.
Ben
Thanks Ben. Just to reiterate, there's no guarantee a GPU implementation would be faster than the CPU. Don't convert your sparse matrices to dense to work around this issue; gather them to the CPU instead.



Asked: 8 Aug 2017
Commented: 12 Aug 2017
