How can I perform large sparse matrix multiplication efficiently when one sparse matrix is block diagonal?
Background:
I have a 3-D array V of size [J,I,K], where J and I are large, e.g. J=1e5 and I=1e4. I would like to compute two matrices, V_ik of size [I,K] and V_jk of size [J,K]:
V_ik = squeeze(sum(V,1)); % I-by-K
V_jk = squeeze(sum(V,2)); % J-by-K
V is actually very sparse (about 90% of its entries are zero), but storing the full [J,I,K] array takes too much memory.
To work around this, I store V as a 2-D sparse matrix V_sp = [V_1;V_2;...;V_I], where each block V_i = squeeze(V(:,i,:)) is of size [J,K]. The problem arises when I compute V_ik and V_jk. Writing both sums as matrix products, I use the following code:
V_ik = kron(speye(I),ones(1,J)) * V_sp;
V_jk = repmat(speye(J),1,I) * V_sp;
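As a small sanity check (an editor's sketch at toy sizes, not part of the original post), the two products can be compared against direct summation of a dense V. It assumes V_sp is built by stacking the blocks V_i = squeeze(V(:,i,:)); because MATLAB stores arrays in column-major order, that stacking is exactly reshape(V, J*I, K).

```matlab
% Toy sizes so that a dense reference V fits easily in memory
J = 4; I = 3; K = 2;
V = rand(J, I, K);
V(V < 0.7) = 0;                          % zero out most entries

% Column-major order puts element (j,i,k) at row j+(i-1)*J, column k,
% which is the stacked layout [V_1; V_2; ...; V_I]
V_sp = sparse(reshape(V, J*I, K));

% Operator formulation from the question
V_ik = kron(speye(I), ones(1,J)) * V_sp;   % I-by-K
V_jk = repmat(speye(J), 1, I) * V_sp;      % J-by-K

% Dense reference: squeeze drops the singleton summed dimension
ref_ik = squeeze(sum(V, 1));               % I-by-K
ref_jk = squeeze(sum(V, 2));               % J-by-K

err_ik = max(abs(full(V_ik(:)) - ref_ik(:)));
err_jk = max(abs(full(V_jk(:)) - ref_jk(:)));
```

Both errors should be at the level of floating-point rounding, since only the order of the additions differs.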
However, the sparse matrix kron(speye(I),ones(1,J)) is of size [I,J*I] and repmat(speye(J),1,I) is of size [J,J*I]. Each of them has J*I nonzeros and takes up about 22.4 GB of storage for J=1e5 and I=1e4.
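The 22.4 GB figure is consistent with the usual compressed-sparse-column cost of a real double sparse matrix: about 16 bytes per nonzero (value plus row index) plus 8 bytes per column pointer. The sketch below assumes 64-bit MATLAB with 8-byte indices and that nzmax equals the number of nonzeros; sparse_bytes is a hypothetical helper, not a MATLAB function.

```matlab
% Rough memory estimate for a real double sparse matrix in
% compressed-sparse-column form (assumes 8-byte values and 8-byte indices)
sparse_bytes = @(nz, ncols) nz*(8 + 8) + (ncols + 1)*8;

J = 1e5; I = 1e4;
% Both operator matrices have J*I columns and exactly one 1 per column
nz    = J*I;
ncols = J*I;
bytes = sparse_bytes(nz, ncols);
fprintf('%.1f GiB\n', bytes / 2^30);   % prints 22.4 GiB
```

Because each operator has as many columns as nonzeros, the column-pointer array alone accounts for about a third of the total.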
My Question:
Is there an efficient way to perform this kind of multiplication without using so much memory? I tried summing along the desired dimension with a for-loop; it worked, but took about 5x longer to run.
If there is no such efficient solution, can I store the 3-D sparse array V in some other format that strikes a balance between memory usage and running time? (I found the ndSparse class, which can store n-dimensional sparse arrays, but it stores them internally as an ordinary 2-D sparse array, and its sum method is not very fast.)
Thanks in advance for your help!
Comments
Bruno Luong
on 28 Oct 2021
The blockdiag idea seems to work well
I=1e4;
J=1e4;
K=10;
n=I*J*K;
density=0.1;
m=round(n*density); % number of non-zero elements
% Storage [i,j,k,V] as sparse 3D array, (i,j,k) are subindex, V are corresponding values
i=randi(I,[m,1],'uint32');
j=randi(J,[m,1],'uint32');
k=randi(K,[m,1],'uint32');
V=rand([m,1]);
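% Sparse storage: stacked layout from the question (row j+(i-1)*J, column k)
V_sp=sparse(j+(i-1)*J,k,V,I*J,K);
% Block-diagonal layout: K blocks of size I-by-J along the diagonal
V_blkdiag=sparse(i+(k-1)*I,j+(k-1)*J,V,I*K,J*K);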
% Your method using V_sp
tic
V_ik = kron(speye(I),ones(1,J)) * V_sp;
V_jk = repmat(speye(J),1,I) * V_sp;
toc
% James's method using V_blkdiag
tic
V_ik = reshape(sum(V_blkdiag,2),[I K]);
V_jk = reshape(sum(V_blkdiag,1),[J K]);
toc
% Method using [i,j,k,V]
tic
V_ik=accumarray([i k],V,[I K]);
V_jk=accumarray([j k],V,[J K]);
toc
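To confirm that the three layouts give identical results, the same comparison can be run at toy sizes (an editor's sketch; the sizes and the tolerance are arbitrary). Both sparse() and accumarray() add up values whose subscripts coincide, so repeated (i,j,k) triplets produced by randi are handled consistently.

```matlab
% Toy sizes so the three results can be compared directly
I = 5; J = 7; K = 3;
m = 40;                                  % number of stored triplets
i = randi(I, [m,1]);
j = randi(J, [m,1]);
k = randi(K, [m,1]);
V = rand(m, 1);

% Block-diagonal layout: block k sits at rows (k-1)*I+(1:I) and
% columns (k-1)*J+(1:J); duplicate triplets are summed by sparse()
V_blkdiag = sparse(i+(k-1)*I, j+(k-1)*J, V, I*K, J*K);
blk_ik = reshape(full(sum(V_blkdiag, 2)), [I K]);
blk_jk = reshape(full(sum(V_blkdiag, 1)), [J K]);

% accumarray also sums values that share the same subscripts
acc_ik = accumarray([i k], V, [I K]);
acc_jk = accumarray([j k], V, [J K]);

% Stacked layout and operator products from the question
V_sp  = sparse(j+(i-1)*J, k, V, I*J, K);
sp_ik = full(kron(speye(I), ones(1,J)) * V_sp);
sp_jk = full(repmat(speye(J), 1, I) * V_sp);

% Largest disagreement between any pair of methods
max_diff = max([abs(blk_ik(:) - acc_ik(:)); abs(sp_ik(:) - acc_ik(:)); ...
                abs(blk_jk(:) - acc_jk(:)); abs(sp_jk(:) - acc_jk(:))]);
```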
Accepted Answer
Bruno Luong
on 27 Oct 2021
Edited: Bruno Luong
on 28 Oct 2021
You might rethink how you store your data, for example like this (tic/toc results were obtained on the TMW online server):
I=1e4;
J=1e4;
K=10;
n=I*J*K;
density=0.1;
m=round(n*density); % number of non-zero elements
% Storage [i,j,k,V] as sparse 3D array, (i,j,k) are subindex, V are corresponding values
i=randi(I,[m,1],'uint32');
j=randi(J,[m,1],'uint32');
k=randi(K,[m,1],'uint32');
V=rand([m,1]);
% Sparse storage
V_sp=sparse(j+(i-1)*J,k,V,I*J,K);
% Your method using V_sp
tic
V_ik = kron(speye(I),ones(1,J)) * V_sp;
V_jk = repmat(speye(J),1,I) * V_sp;
toc
% Method using [i,j,k,V]
tic
V_ik=accumarray([i k],V,[I K]);
V_jk=accumarray([j k],V,[J K]);
toc
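If the data already exist as V_sp in the question's stacked layout, the triplets needed by accumarray can be recovered with find. A minimal sketch at toy sizes, assuming row r of V_sp corresponds to r = j+(i-1)*J (the sprand input is only a stand-in for real data):

```matlab
% Stand-in V_sp in the question's stacked layout: row j+(i-1)*J, column k
J = 4; I = 3; K = 2;
V_sp = sprand(J*I, K, 0.5);

% Recover (i,j,k,value) triplets from the stored nonzeros
[r, k, vals] = find(V_sp);
j = mod(r-1, J) + 1;          % row inside block V_i
i = floor((r-1) / J) + 1;     % block index

% Sums via accumarray, as in the accepted answer
V_ik = accumarray([i k], vals, [I K]);
V_jk = accumarray([j k], vals, [J K]);
```

Memory use is then proportional to nnz(V_sp) rather than to J*I.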
More Answers (3)
Matt J
on 27 Oct 2021
Edited: Matt J
on 27 Oct 2021
With ndSparse, did you try the summl() method? It's more memory-intensive, but it can also be faster. Here's what I see in a simple test comparing ndSparse.sum() and ndSparse.summl():
>> A=ndSparse.sprand([1e4,1e4,10],0.1);
>> tic; sum(A,1);toc
Elapsed time is 10.697812 seconds.
>> tic; summl(A,1);toc
Elapsed time is 0.927524 seconds.
Bjorn Gustavsson
on 27 Oct 2021
Shouldn't direct summation be preferable to matrix multiplication for computing sums? Perhaps something like this might work:
V_ik = zeros(I,K);
V_jk = zeros(J,K);
% Block i1 of V_sp occupies rows (i1-1)*J+1 : i1*J
for i1 = 1:I
    V_ik(i1,:) = sum(V_sp((1+J*(i1-1)):(J*i1),:), 1);
end
% Row j1 of every block sits at rows j1, j1+J, j1+2*J, ...
for j1 = 1:J
    V_jk(j1,:) = sum(V_sp(j1:J:end,:), 1);
end
HTH
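The same direct summation can also be written without a loop by reshaping V_sp, which avoids both the per-block indexing and the J*I-column operator matrices. This is an editor's sketch, not a method proposed in the thread, and it has not been timed at the full problem size; it assumes V_sp uses the question's layout (row j+(i-1)*J, column k).

```matlab
% Stand-in V_sp in the question's stacked layout
J = 4; I = 3; K = 2;
V_sp = sprand(J*I, K, 0.5);

% Reinterpret V_sp as J-by-(I*K): column i+(k-1)*I holds block i, slice k
% (column-major order makes this a pure reshape, with no data reordering)
R = reshape(V_sp, J, I*K);

% Sum over j: one row of length I*K, regrouped as I-by-K
V_ik = reshape(full(sum(R, 1)), I, K);

% Sum over i: a sparse (I*K)-by-K matrix of ones (only I*K nonzeros)
% adds up the I columns that belong to each k
G = kron(speye(K), ones(I, 1));
V_jk = full(R * G);
```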
Matt J
on 28 Oct 2021
Edited: Matt J
on 28 Oct 2021
The typical value of K is 50.
Since the third dimension is so small, it's probably worth simply storing the data as a length-K cell array of J-by-I sparse matrices. You can then do the summations with:
cellfun(@(v) sum(v,1), V,'uni',0);
cellfun(@(v) sum(v,2), V,'uni',0);
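To turn the per-slice sums into the I-by-K and J-by-K matrices from the question, the cell outputs can be concatenated. A minimal sketch at toy sizes, assuming V is a 1-by-K cell array of J-by-I sparse matrices; 'UniformOutput' is the full name of the 'uni' option used above.

```matlab
% Stand-in data: a 1-by-K cell array of J-by-I sparse slices
J = 4; I = 3; K = 2;
V = cell(1, K);
for kk = 1:K
    V{kk} = sprand(J, I, 0.5);
end

% Sum each slice; transpose the row sums so every cell holds a column
col_ik = cellfun(@(v) full(sum(v, 1)).', V, 'UniformOutput', false);  % I-by-1 each
col_jk = cellfun(@(v) full(sum(v, 2)),   V, 'UniformOutput', false);  % J-by-1 each

% Place the K columns side by side
V_ik = [col_ik{:}];   % I-by-K
V_jk = [col_jk{:}];   % J-by-K
```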