Path: news.mathworks.com!newsfeed-00.mathworks.com!fu-berlin.de!uni-berlin.de!news.dfncis.de!not-for-mail
From: Joachim Selke <j.selke@tu-bs.de>
Newsgroups: comp.soft-sys.matlab
Subject: Sparse matrix product again
Date: Wed, 30 Apr 2008 17:03:28 +0200
Organization: Technische =?ISO-8859-1?Q?Universit=E4t_Braunschweig?=
Lines: 52
Message-ID: <67rg20F2pn19mU1@mid.dfncis.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
X-Trace: news.dfncis.de D+s8zJCUsTGd3EKFrfq5qwsybz68rVGvjeFBnax+euYbKJ
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.8.1.12) Gecko/20080226 Fedora/2.0.0.12-1.fc8 Thunderbird/2.0.0.12 Mnenhy/0.7.5.0
Xref: news.mathworks.com comp.soft-sys.matlab:465952


Hi,

in <news:frk8n2$dqp$1@newsserver.rrzn.uni-hannover.de> I asked how to
efficiently compute a "sparse matrix product":

Given three matrices

  A: m x k   (full)
  B: k x n   (full)
  X: m x n   (sparse, around 2% density)

where m and n are very large numbers and k is small, I want to compute
the matrix product of A and B but I only need the entries of A * B at
the places where X is non-zero.

Thanks to Tim Davis and Roger Stafford I got two nice solutions that I
combined and optimized a bit further. My current best solution is the
following ("best" means fastest and memory-efficient enough):

% Basic idea: Construct the sparse product matrix row by row.

Atr = A';
[rows, cols, vals] = find(X);
lX = logical(X);
nnz_bycol = full(sum(lX));
cumnnz_bycol = [0 cumsum(nnz_bycol)];

for j = 1:n
    t_start = cumnnz_bycol(j) + 1;
    t_end = cumnnz_bycol(j + 1);
    rows_j = rows(t_start:t_end);
    vals(t_start:t_end) = Atr(:, rows_j)' * B(:, j);
end

AB = sparse(rows, cols, vals, m, n);


Now I did some experiments using random matrices (m = 50000, n = 10000,
k = 15). The total time needed for constructing the product was around
2.3 seconds on my computer. When looking at the details, I found that
almost the complete computation time is used by the statement
"Atr(:, rows_j)." So, obviously this algorithm is extremely
memory-bound. I tried a lot but was not able to solve this problem.

Are there any known tricks what can I do to speed-up the memory accesses?

(I should note that I want to compute this product for many different
matrices A and B while keeping X fixed. Maybe this fact can be used in
optimization ...)


Joachim