in <news:frk8n2$dqp$1@newsserver.rrzn.uni-hannover.de> I asked how to
efficiently compute a "sparse matrix product":
Given three matrices
A: m x k (full)
B: k x n (full)
X: m x n (sparse, around 2% density)
where m and n are very large numbers and k is small, I want to compute
the matrix product of A and B but I only need the entries of A * B at
the places where X is non-zero.
Thanks to Tim Davis and Roger Stafford I got two nice solutions that I
combined and optimized a bit further. My current best solution is the
following ("best" means fastest and memory-efficient enough):
% Basic idea: Construct the sparse product matrix row by row.
for j = 1:n
t_start = cumnnz_bycol(j) + 1;
t_end = cumnnz_bycol(j + 1);
rows_j = rows(t_start:t_end);
vals(t_start:t_end) = Atr(:, rows_j)' * B(:, j);
end
AB = sparse(rows, cols, vals, m, n);
Now I did some experiments using random matrices (m = 50000, n = 10000,
k = 15). The total time needed for constructing the product was around
2.3 seconds on my computer. When looking at the details, I found that
almost the complete computation time is used by the statement
"Atr(:, rows_j)." So, obviously this algorithm is extremely
memory-bound. I tried a lot but was not able to solve this problem.
Are there any known tricks what can I do to speed-up the memory accesses?
(I should note that I want to compute this product for many different
matrices A and B while keeping X fixed. Maybe this fact can be used in
optimization ...)
Joachim Selke <j.selke@tu-bs.de> wrote in message
<67rg20F2pn19mU1@mid.dfncis.de>...
> Hi,
>
> in <news:frk8n2$dqp$1@newsserver.rrzn.uni-hannover.de> I
asked how to
> efficiently compute a "sparse matrix product":
>
> Given three matrices
>
> A: m x k (full)
> B: k x n (full)
> X: m x n (sparse, around 2% density)
>
> where m and n are very large numbers and k is small, I
want to compute
> the matrix product of A and B but I only need the entries
of A * B at
> the places where X is non-zero.
>
> Thanks to Tim Davis and Roger Stafford I got two nice
solutions that I
> combined and optimized a bit further. My current best
solution is the
> following ("best" means fastest and memory-efficient enough):
>
> % Basic idea: Construct the sparse product matrix row by row.
>
> Atr = A';
> [rows, cols, vals] = find(X);
> lX = logical(X);
> nnz_bycol = full(sum(lX));
> cumnnz_bycol = [0 cumsum(nnz_bycol)];
>
> for j = 1:n
> t_start = cumnnz_bycol(j) + 1;
> t_end = cumnnz_bycol(j + 1);
> rows_j = rows(t_start:t_end);
> vals(t_start:t_end) = Atr(:, rows_j)' * B(:, j);
> end
>
> AB = sparse(rows, cols, vals, m, n);
>
>
> Now I did some experiments using random matrices (m =
50000, n = 10000,
> k = 15). The total time needed for constructing the
product was around
> 2.3 seconds on my computer. When looking at the details, I
found that
> almost the complete computation time is used by the statement
> "Atr(:, rows_j)." So, obviously this algorithm is extremely
> memory-bound. I tried a lot but was not able to solve this
problem.
>
> Are there any known tricks what can I do to speed-up the
memory accesses?
>
> (I should note that I want to compute this product for
many different
> matrices A and B while keeping X fixed. Maybe this fact
can be used in
> optimization ...)
>
>
> Joachim
That looks about as good as you could get, to me.
The Atr(:,rows_j)'*B(:j) is a neat trick; using a
matrix-vector multiply to do lots of entries in AB at once.
I can't think of a better way ... except to go to a
mexFunction of course.
The reason I say that is because a mexFunction could do
Atr(:,rows_j)'*B(:j) without forming the Atr(:,rows_j)
matrix as an intermediate step.
The subsref and subsasgn operations in MATLAB aren't speed
demons, unfortunately, particularly for the sparse case
(this is the dense case, though, since Atr is dense).
Tim Davis wrote:
> I can't think of a better way ... except to go to a
> mexFunction of course.
>
> The reason I say that is because a mexFunction could do
> Atr(:,rows_j)'*B(:j) without forming the Atr(:,rows_j)
> matrix as an intermediate step.
Wow, that really speeds it up. Thanks! :-)
Again, I did some experiments using random matrices (m = 50000,
n = 10000, k = 15). The total time needed for constructing the product
was around 1.3 seconds on my computer with a MEX function written in C
(compared to 2.3 seconds using my previous implementation).
Since this is my first MEX function (and since I did not even program in
C before) it would help me a lot if you please could have a quick look
at my implementation. It is straight-forward and I am quite sure that
there is still room for improvement (maybe by doing some computations in
parallel?). Here is the code:
Joachim Selke <j.selke@tu-bs.de> wrote in message
<688agnF2rcjvjU1@mid.dfncis.de>...
> Tim Davis wrote:
> > I can't think of a better way ... except to go to a
> > mexFunction of course.
> >
> > The reason I say that is because a mexFunction could do
> > Atr(:,rows_j)'*B(:j) without forming the Atr(:,rows_j)
> > matrix as an intermediate step.
>
> Wow, that really speeds it up. Thanks! :-)
>
> Again, I did some experiments using random matrices (m =
50000,
> n = 10000, k = 15). The total time needed for constructing
the product
> was around 1.3 seconds on my computer with a MEX function
written in C
> (compared to 2.3 seconds using my previous implementation).
>
> Since this is my first MEX function (and since I did not
even program in
> C before) it would help me a lot if you please could have
a quick look
> at my implementation. It is straight-forward and I am
quite sure that
> there is still room for improvement (maybe by doing some
computations in
> parallel?). Here is the code:
>
>
>
-------------------------------------------------------------------------
>
> #include "mex.h"
>
> void mexFunction(int nlhs, mxArray *plhs[], int nrhs,
const mxArray
> *prhs[]) {
>
> const mxArray *a, *b, *x;
> a = prhs[0];
> b = prhs[1];
> x = prhs[2];
>
> mwSize m, n, k;
> m = mxGetM(x);
> n = mxGetN(x);
> k = mxGetN(a);
>
> mxArray *ab;
> ab = mxDuplicateArray(x);
>
> mwIndex *rows_ab, *cols_ab;
> rows_ab = mxGetIr(ab);
> cols_ab = mxGetJc(ab);
>
> double *data_a, *data_b, *data_ab;
> data_a = mxGetPr(a);
> data_b = mxGetPr(b);
> data_ab = mxGetPr(ab);
>
> int i, j, r;
> int ind_a, ind_b, ind_ab, ind_ab_next;
> ind_ab = 0;
> for (j = 0; j < n; j++) {
> ind_ab_next = ind_ab + cols_ab[j + 1] - cols_ab[j];
> while (ind_ab < ind_ab_next) {
> i = rows_ab[ind_ab];
> data_ab[ind_ab] = 0;
> for (r = 0; r < k; r++) {
> ind_a = r * m + i;
> ind_b = j * k + r;
> data_ab[ind_ab] += data_a[ind_a] *
data_b[ind_b];
> }
> ind_ab++;
> }
> }
>
> plhs[0] = ab;
> }
>
>
-------------------------------------------------------------------------
>
>
> Joachim
Looks fine to me. I could suggest only one thing -
mxDuplicateArray is copying the numerical values of x into
the output ab, then later you overwrite them. It would be
faster to create ab with mxCreateSparse, then fill the
matrix with both the pattern and the values (the Jc, Ir, and
data arrays). But the improvement would be minor.
Tim Davis wrote:
> Looks fine to me. I could suggest only one thing -
> mxDuplicateArray is copying the numerical values of x into
> the output ab, then later you overwrite them. It would be
> faster to create ab with mxCreateSparse, then fill the
> matrix with both the pattern and the values (the Jc, Ir, and
> data arrays). But the improvement would be minor.
I tried that and it really seems to be minor. In fact, my implementation
of the copying (mxCreateSparse followed by memcopy) was slightly slower
than using mxDuplicateArray.
But I found two additional ways to speed things up:
(1) Replace the k write accesses to data_ab[ind_ab] in the inner loop
by write accesses to a temporary variable z and do a single write
operation to data_ab[ind_ab] after the loop has finished
(2) Linearize accesses to matrix A by using A's transpose instead of A
The new code only needs 0.7 s in my example scenario, computing of A' in
MATLAB included (old code: 1.3 s, M code: 2.3 s).
Instead of "int" you should use mwSignedIndex. It's safer,
if you ever go to a 64-bit platform. I also recommend not
using mwIndex at all, but mwSignedIndex in its place. The
problemis that mwIndex is unsigned, so codes like this:
Your code might not compile with some C compilers as you
intermix declarations and statements. I have replaced
separate declarations and assignments with 'definitions'.
I don't have a MEX compiler system handy so this is untested!
Tim Davis wrote:
> Instead of "int" you should use mwSignedIndex. It's safer,
> if you ever go to a 64-bit platform. I also recommend not
> using mwIndex at all, but mwSignedIndex in its place.
Andy Robb wrote:
> Your code might not compile with some C compilers as you
> intermix declarations and statements. I have replaced
> separate declarations and assignments with 'definitions'.
Thanks. I also changed the loop order of the inner loop as you suggested
in your code. Testing for zero makes it slightly faster. Here is my
updated code:
"Tim Davis" <davis@cise.ufl.edu> wrote in message
<fvsiih$m48$1@fred.mathworks.com>...
> Instead of "int" you should use mwSignedIndex. It's
safer,
> if you ever go to a 64-bit platform. I also recommend
not
> using mwIndex at all, but mwSignedIndex in its place.
The
> problemis that mwIndex is unsigned, so codes like this:
>
> mwIndex i, n ;
> ...
>
> for (i = n-1 ; i >= 0 ; i--) { do something }
>
> fail in an infinite loop, since i is always >= 0.
Tim,
I also avoid unsigned integer types whenever possible for
these types of concerns, but I think your advice is
incomplete. For example, if a function returns an mwIndex
and you stuff the result it into an mwSignedIndex variable
without any overflow checking, you may get a negative
number that will cause downstream problems.
I prefer to get the return value of a function into the
exact same typed variable and *then* deal with it. If you
want to use only signed types in your code after that then
go ahead and convert it but check the result first. e.g.,
suppose you know that mwIndex is unsigned, then consider
the following:
mwIndex ui;
mwSignedIndex si;
ui = (some function that returns an mwIndex type);
si = (mwSignedIndex) ui;
if( si < 0 )
overflow ... deal with the error;
In the above code you will detect the overflow when the
return value is very large and you can deal with it before
causing downstream problems. But even the above code has a
flaw if mwIndex and mwSignedIndex are not the same size.
So to be extra sure you might want to put in even more
checks using sizeof, etc.
"Tim Davis" <davis@cise.ufl.edu> wrote in message
<fvsiih$m48$1@fred.mathworks.com>...
> Instead of "int" you should use mwSignedIndex. It's
safer,
> if you ever go to a 64-bit platform. I also recommend
not
> using mwIndex at all, but mwSignedIndex in its place.
The
> problemis that mwIndex is unsigned, so codes like this:
>
> mwIndex i, n ;
> ...
>
> for (i = n-1 ; i >= 0 ; i--) { do something }
>
> fail in an infinite loop, since i is always >= 0.
Tim,
I also avoid unsigned integer types whenever possible for
these types of concerns, but I think your advice is
incomplete. For example, if a function returns an mwIndex
and you stuff the result it into an mwSignedIndex variable
without any overflow checking, you may get a negative
number that will cause downstream problems.
I prefer to get the return value of a function into the
exact same typed variable and *then* deal with it. If you
want to use only signed types in your code after that then
go ahead and convert it but check the result first. e.g.,
suppose you know that mwIndex is unsigned, then consider
the following:
mwIndex ui;
mwSignedIndex si;
ui = (some function that returns an mwIndex type);
si = (mwSignedIndex) ui;
if( si < 0 )
overflow ... deal with the error;
In the above code you will detect the overflow when the
return value is very large and you can deal with it before
causing downstream problems. But even the above code has a
flaw if mwIndex and mwSignedIndex are not the same size.
So to be extra sure you might want to put in even more
checks using sizeof, etc.
"Tim Davis" <davis@cise.ufl.edu> wrote in message
<fvsiih$m48$1@fred.mathworks.com>...
> Instead of "int" you should use mwSignedIndex. It's
safer,
> if you ever go to a 64-bit platform. I also recommend
not
> using mwIndex at all, but mwSignedIndex in its place.
The
> problemis that mwIndex is unsigned, so codes like this:
>
> mwIndex i, n ;
> ...
>
> for (i = n-1 ; i >= 0 ; i--) { do something }
>
> fail in an infinite loop, since i is always >= 0.
Tim,
I also avoid unsigned integer types whenever possible for
these types of concerns, but I think your advice is
incomplete. For example, if a function returns an mwIndex
and you stuff the result it into an mwSignedIndex variable
without any overflow checking, you may get a negative
number that will cause downstream problems.
I prefer to get the return value of a function into the
exact same typed variable and *then* deal with it. If you
want to use only signed types in your code after that then
go ahead and convert it but check the result first. e.g.,
suppose you know that mwIndex is unsigned, then consider
the following:
mwIndex ui;
mwSignedIndex si;
ui = (some function that returns an mwIndex type);
si = (mwSignedIndex) ui;
if( si < 0 )
overflow ... deal with the error;
In the above code you will detect the overflow when the
return value is very large and you can deal with it before
causing downstream problems. But even the above code has a
flaw if mwIndex and mwSignedIndex are not the same size.
So to be extra sure you might want to put in even more
checks using sizeof, etc.
"Tim Davis" <davis@cise.ufl.edu> wrote in message
<fvsiih$m48$1@fred.mathworks.com>...
> Instead of "int" you should use mwSignedIndex. It's
safer,
> if you ever go to a 64-bit platform. I also recommend
not
> using mwIndex at all, but mwSignedIndex in its place.
The
> problemis that mwIndex is unsigned, so codes like this:
>
> mwIndex i, n ;
> ...
>
> for (i = n-1 ; i >= 0 ; i--) { do something }
>
> fail in an infinite loop, since i is always >= 0.
Tim,
I also avoid unsigned integer types whenever possible for
these types of concerns, but I think your advice is
incomplete. For example, if a function returns an mwIndex
and you stuff the result it into an mwSignedIndex variable
without any overflow checking, you may get a negative
number that will cause downstream problems.
I prefer to get the return value of a function into the
exact same typed variable and *then* deal with it. If you
want to use only signed types in your code after that then
go ahead and convert it but check the result first. e.g.,
suppose you know that mwIndex is unsigned, then consider
the following:
mwIndex ui;
mwSignedIndex si;
ui = (some function that returns an mwIndex type);
si = (mwSignedIndex) ui;
if( si < 0 )
overflow ... deal with the error;
In the above code you will detect the overflow when the
return value is very large and you can deal with it before
causing downstream problems. But even the above code has a
flaw if mwIndex and mwSignedIndex are not the same size.
So to be extra sure you might want to put in even more
checks using sizeof, etc.
Public Submission Policy
NOTICE: Any content you submit to MATLAB Central, including personal information, is not subject to the protections which may be afforded information collected under other sections of The MathWorks, Inc. Web site. You are entirely responsible for
all content that you upload, post, e-mail, transmit or otherwise make available via MATLAB Central. The MathWorks does not control the content posted by visitors to MATLAB Central and, does not guarantee the accuracy, integrity, or quality of such content.
Under no circumstances will The MathWorks be liable in any way for any content not authored by The MathWorks, or any loss or damage of any kind incurred as a result of the use of any content posted, e-mailed, transmitted or otherwise made available
via MATLAB Central. Read the complete Disclaimer prior to use.