Entry-wise multiplication of a sparse matrix on GPU
Show older comments
Using a GPU, I need to multiply sparse matrices using the entrywise ' times' (i.e. .*) operation. As far as I can tell, this is not currently supported on a gpuArray.
For example,
>> rand(5).*sparse(rand(5))
and
>> rand(5,'gpuArray').*rand(5,'gpuArray')
both work, but
>> rand(5,'gpuArray').*sparse(rand(5,'gpuArray'))
Error using .*
Sparse gpuArrays are not supported for this function.
Is there any way I can get around this without converting matrices back to full (which would negate most/all of the advantage of using the GPU).
Thanks
Answers (1)
Joss Knight
on 8 Aug 2017
Is the sparsity the same for both matrices?
[I, J, VA] = find(Asparse);
[~, ~, VB] = find(Bsparse);
C = sparse(I, J, VA.*VB);
If you just want to multiply it by itself you can use SPFUN:
Asqr = spfun(@(x)x.^2, Asparse);
If the sparsity is different then you have to merge the indices unfortunately. It's a horrible sequence of operations involving sorting and indexing, and not very efficient which is why it hasn't been implemented. I'm curious as to what your application is - maybe there's another solution that doesn't involve element-wise operations.
9 Comments
Jan
on 9 Aug 2017
@Ben: The GPU libraries and the hardware are optimized for processing full matrices. Working with sparse matrices is a completely different job, because for full arrays the arrangement of the elements is determined by their position in the memory, while for sparse arrays there is an additional layer, which contains the indices. There are several different ways to implement sparse arrays, and not all of them might be implemented in the GPU libraries.
Because the GPU hardware is not optimized for sparse arrays, it is not sure, if processing them there is faster as on the CPU at all.
Ben Ward
on 9 Aug 2017
Jan
on 9 Aug 2017
What about:
K = find(A);
[I, J, VA] = find(A);
C = sparse(I, J, VA .* B(K));
I would not expect that a C-Mex is faster than the builtin libraries for sparse matrices. But the problem remains: The suggested methods do not work on a GPU. Even Asparse(1,1) fails, that B(K) is not an option also.
Jan
on 9 Aug 2017
@Ben: Ask Matlab directly for a feature enhancement. This is sometimes more useful than hoping :-)
Joss Knight
on 9 Aug 2017
Edited: Joss Knight
on 9 Aug 2017
Thanks for your answer Jan - I should have noticed that intersect already did all the things I needed!
However, Jan's answer says it all I think - it gives the right answer, but on my Kepler card it's 15x slower. It's not efficient on the GPU and so the workaround is just to do it on the CPU. Any native implementation we wrote would have to do much of what intersect and the three-input form of sparse are having to do, and it's none too efficient on a GPU. Nevertheless, I'm sure something better could be done.
I will take it as an enhancement request (TBH we already have one). When we added support for sparse gpuArrays we just put in the most useful functionality first. The most common use case for sparse arrays is solving linear systems, which is all matrix-matrix and matrix-vector multiplication. But it's great to have a real use case for this. times is actually one of the easier element-wise functions because the output sparsity is the intersection rather than the union.
You refer to "inability to extract individual elements from a sparse array on the gpu" - but of course this is possible using find. It's indexing that isn't supported.
It was hard to tell from your example equation but it looks like a lot of those operations were multiplication by a scalar - that is supported.
Joss Knight
on 12 Aug 2017
Thanks Ben. Just to reiterate, there's no guarantee a GPU implementation will be faster than the CPU. Don't convert your sparse matrices to dense to work around this issue, gather them to the CPU.
Categories
Find more on Sparse Matrices in Help Center and File Exchange
Products
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!