How can I perform large sparse matrix multiplication efficiently when one sparse matrix is block diagonal?

Background:
I have a 3d matrix V of size [J,I,K], where J and I are large, say J=1e5 and I=1e4. I would like to get two matrices:
V_ik = sum(V,1);
V_jk = sum(V,2);
The 3d matrix V is actually very sparse (around 90% of its entries are zero), but the full [J,I,K] array takes up too much memory.
To work around this, I encode V as a 2d sparse matrix V_sp = [V_1;V_2;...;V_I], where each block V_i is a matrix of size [J,K]. The problem arises when I calculate V_ik and V_jk. Viewing the sums as matrix multiplications, I write the following code:
V_ik = kron(speye(I),ones(1,J)) * V_sp;
V_jk = repmat(speye(J),1,I) * V_sp;
However, the sparse matrix kron(speye(I),ones(1,J)) has size [I,J*I], and repmat(speye(J),1,I) has size [J,J*I]. Either one takes up about 22.4 GB of storage for J=1e5 and I=1e4.
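(For reference, a rough back-of-the-envelope check of that figure, assuming MATLAB's usual compressed-column sparse layout with 8-byte values, 8-byte row indices and 8-byte column pointers:)
nnzA   = I*J;                   % 1e9 non-zeros in kron(speye(I),ones(1,J))
bytesA = 16*nnzA + 8*(J*I+1);   % values + row indices + column pointers
% bytesA is about 2.4e10 bytes, i.e. roughly 22.4 GiB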
My Question:
Is there an efficient way to perform this kind of matrix multiplication without excessive memory usage? I tried using a for-loop to sum along the desired dimension; the loop version worked, but at the cost of around 5x more running time.
If there is no such efficient solution, can I store the 3d sparse matrix V in some other smart format that strikes a balance between memory usage and running time? (I found the ndSparse class, which can store n-dimensional sparse arrays, but it stores the data internally as an ordinary 2D sparse array, and the sum function applied to it is not that fast.)
Thanks in advance for your help!
  7 Comments
Bruno Luong on 28 Oct 2021
The blockdiag idea seems to work well
I=1e4;
J=1e4;
K=10;
n=I*J*K;
density=0.1;
m=round(n*density); % number of non-zero elements
% Storage [i,j,k,V] as sparse 3D array, (i,j,k) are subindex, V are corresponding values
i=randi(I,[m,1],'uint32');
j=randi(J,[m,1],'uint32');
k=randi(K,[m,1],'uint32');
V=rand([m,1]);
% Sparse storage
V_sp=sparse(j+(i-1)*J,k,V,I*J,K);
V_blkdiag=sparse(i+(k-1)*I,j+(k-1)*J,V,I*K,J*K);
% Your method using V_sp
tic
V_ik = kron(speye(I),ones(1,J)) * V_sp;
V_jk = repmat(speye(J),1,I) * V_sp;
toc
Elapsed time is 7.864562 seconds.
% James's method using V_blkdiag
tic
V_ik = reshape(sum(V_blkdiag,2),[I K]);
V_jk = reshape(sum(V_blkdiag,1),[J K]);
toc
Elapsed time is 0.292978 seconds.
% Method using [i,j,k,V]
tic
V_ik=accumarray([i k],V,[I K]);
V_jk=accumarray([j k],V,[J K]);
toc
Elapsed time is 2.402071 seconds.
Wenyu Zhang on 28 Oct 2021
Thank you for the comparison @Bruno Luong!
Although the blockdiag idea works well, what I actually have at hand are the non-zero values and the subscripts [i,j,k], so I don't need to go so far as to build the block-diagonal sparse matrix first; accumarray is faster in that case. Perhaps the storage format of my data is what I should reconsider as the highest priority.
I=1e4;
J=1e4;
K=10;
n=I*J*K;
density=0.1;
m=round(n*density); % number of non-zero elements
% Storage [i,j,k,V] as sparse 3D array, (i,j,k) are subindex, V are corresponding values
i=randi(I,[m,1],'uint32');
j=randi(J,[m,1],'uint32');
k=randi(K,[m,1],'uint32');
V=rand([m,1]);
% James's method using V_blkdiag
tic
V_blkdiag=sparse(i+(k-1)*I,j+(k-1)*J,V,I*K,J*K);
V_ik = reshape(sum(V_blkdiag,2),[I K]);
V_jk = reshape(sum(V_blkdiag,1),[J K]);
toc
Elapsed time is 11.238796 seconds.
% Method using [i,j,k,V]
tic
V_ik=accumarray([i k],V,[I K]);
V_jk=accumarray([j k],V,[J K]);
toc
Elapsed time is 2.329572 seconds.


Accepted Answer

Bruno Luong on 27 Oct 2021
Edited: Bruno Luong on 28 Oct 2021
You might rethink the storage of your data, for example like this (tic/toc results obtained from a run on the TMW online server):
I=1e4;
J=1e4;
K=10;
n=I*J*K;
density=0.1;
m=round(n*density); % number of non-zero elements
% Storage [i,j,k,V] as sparse 3D array, (i,j,k) are subindex, V are corresponding values
i=randi(I,[m,1],'uint32');
j=randi(J,[m,1],'uint32');
k=randi(K,[m,1],'uint32');
V=rand([m,1]);
% Sparse storage
V_sp=sparse(j+(i-1)*J,k,V,I*J,K);
% Your method using V_sp
tic
V_ik = kron(speye(I),ones(1,J)) * V_sp;
V_jk = repmat(speye(J),1,I) * V_sp;
toc
Elapsed time is 7.273694 seconds.
% Method using [i,j,k,V]
tic
V_ik=accumarray([i k],V,[I K]);
V_jk=accumarray([j k],V,[J K]);
toc
Elapsed time is 2.346202 seconds.
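If the data is already sitting in the stacked form V_sp from the question (rows ordered as j+(i-1)*J), one possible variation (just a sketch, not benchmarked here) is to recover the subscripts once with find and reuse the same accumarray calls:
[r, kk, v] = find(V_sp);            % row index r = j + (i-1)*J, column index kk = k
ii = ceil(r/J);                     % recover the block index i
jj = r - (ii-1)*J;                  % recover the within-block row index j
V_ik = accumarray([ii kk], v, [I K]);
V_jk = accumarray([jj kk], v, [J K]);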

More Answers (3)

Matt J on 27 Oct 2021
Edited: Matt J on 27 Oct 2021
With ndSparse, did you try the summl() method? It's more memory intensive, but it can also be faster. Here's what I see in a simple test comparing ndSparse.sum() and ndSparse.summl().
>> A=ndSparse.sprand([1e4,1e4,10],0.1);
>> tic; sum(A,1);toc
Elapsed time is 10.697812 seconds.
>> tic; summl(A,1);toc
Elapsed time is 0.927524 seconds.
  2 Comments
Wenyu Zhang on 28 Oct 2021
Thank you so much. I haven't tried this before. I will try it and see whether it improves things significantly for my own matrix.
Wenyu Zhang on 29 Oct 2021
I have tried this and found that summl is significantly faster than sum. However, it takes some time to construct the ndSparse object in the first place. Anyway, thanks for your help!
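If it helps, a minimal sketch of building the object directly from the (i,j,k,V) triplets discussed above, assuming the ndSparse.build constructor described in the File Exchange submission:
A = ndSparse.build(double([i j k]), V, [I J K]);   % subscripts supplied as an m-by-3 matrix
V_ik = summl(A, 2);                                % sum over j (then squeeze/reshape to I-by-K)
V_jk = summl(A, 1);                                % sum over i (then squeeze/reshape to J-by-K)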



Bjorn Gustavsson on 27 Oct 2021
Shouldn't straight summation be preferable for calculating these sums, rather than doing it by way of matrix multiplication? Perhaps something like this might work:
v_ik = zeros(I,K);
v_jk = zeros(J,K);
for i1 = 1:I                                       % sum block V_i1 over its rows (over j)
    v_ik(i1,:) = sum(v_sp((1+J*(i1-1)):(J*i1),:), 1);
end
for j1 = 1:J                                       % sum row j1 of every block (over i)
    v_jk(j1,:) = sum(v_sp(j1:J:end,:), 1);
end
HTH
  1 Comment
Wenyu Zhang on 27 Oct 2021
The for-loop code did work, but at the cost of around 5x more running time.
I think a possible reason is that extracting submatrices from a sparse matrix takes time.
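That matches how MATLAB stores sparse matrices (compressed sparse column), so picking out scattered rows like v_sp(j1:J:end,:) is the expensive part. One possible mitigation (a sketch, not benchmarked on the real data) is to transpose once and slice columns instead:
v_spT = v_sp.';                                    % one-off transpose: [K, I*J]
v_jk  = zeros(J,K);
for j1 = 1:J
    v_jk(j1,:) = sum(v_spT(:, j1:J:end), 2).';     % column extraction is cheap in CSC storage
end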



Matt J on 28 Oct 2021
Edited: Matt J on 28 Oct 2021
The typical value of K is 50.
Since the third dimension is so small, it's probably worthwhile just storing the data as a length-K cell array of JxI matrices. You can then do the summations with:
cellfun(@(v) sum(v,1), V,'uni',0);   % per k-slice: sum over j, giving a 1-by-I row
cellfun(@(v) sum(v,2), V,'uni',0);   % per k-slice: sum over i, giving a J-by-1 column
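For completeness, a minimal sketch of building that cell array from the (i,j,k,V) triplets used elsewhere in this thread and collecting the results back into I-by-K and J-by-K matrices (the per-slice matrices are assumed sparse here; the variable names are only placeholders):
Vcell = arrayfun(@(kk) sparse(double(j(k==kk)), double(i(k==kk)), ...
                              V(k==kk), J, I), 1:K, 'uni', 0);   % one J-by-I slice per k
cols_ik = cellfun(@(v) sum(v,1).', Vcell, 'uni', 0);             % K cells, each I-by-1
cols_jk = cellfun(@(v) sum(v,2),   Vcell, 'uni', 0);             % K cells, each J-by-1
V_ik = full([cols_ik{:}]);                                       % I-by-K
V_jk = full([cols_jk{:}]);                                       % J-by-K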
  1 Comment
Wenyu Zhang on 29 Oct 2021
Actually my V consists of J*I length-K vectors, each generated from a multinomial distribution using mnrnd. Reshaping those data into a length-K cell array of J-by-I matrices may bring some overhead. Thanks for your suggestion!
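In that case, one option (a hypothetical sketch; n_trials and p stand in for the real multinomial parameters) might be to keep only the non-zero counts as (i,j,k,value) triplets while generating the draws, and skip the reshaping entirely:
iC = cell(I,1); jC = cell(I,1); kC = cell(I,1); vC = cell(I,1);
for ii = 1:I
    C = mnrnd(n_trials, p, J);          % J-by-K block of multinomial counts for this i
    [jj, kk, vv] = find(C);             % keep the non-zero counts only
    iC{ii} = repmat(ii, numel(jj), 1);
    jC{ii} = jj;  kC{ii} = kk;  vC{ii} = vv;
end
i = vertcat(iC{:});  j = vertcat(jC{:});  k = vertcat(kC{:});  V = vertcat(vC{:});
V_ik = accumarray([i k], V, [I K]);     % then the same accumarray sums as above
V_jk = accumarray([j k], V, [J K]);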


Release: R2019a