Path: news.mathworks.com!newsfeed-00.mathworks.com!news.tele.dk!feed118.news.tele.dk!news.tele.dk!small.news.tele.dk!fu-berlin.de!uni-berlin.de!news.dfncis.de!not-for-mail
From: Joachim Selke <j.selke@tu-bs.de>
Newsgroups: comp.soft-sys.matlab
Subject: Re: Sparse matrix product again
Date: Wed, 07 May 2008 15:02:21 +0200
Organization: Technische =?ISO-8859-1?Q?Universit=E4t_Braunschweig?=
Lines: 83
Message-ID: <68dnitF2soa18U1@mid.dfncis.de>
References: <67rg20F2pn19mU1@mid.dfncis.de> <fvfd1n$1dc$1@fred.mathworks.com> <688agnF2rcjvjU1@mid.dfncis.de> <fvmtmc$k80$1@fred.mathworks.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
X-Trace: news.dfncis.de 9EIIcble3rTV7PKWoZV13wu5OWZ2+O8wboBPefRvZ+HhzT
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.8.1.12) Gecko/20080226 Fedora/2.0.0.12-1.fc8 Thunderbird/2.0.0.12 Mnenhy/0.7.5.0
In-Reply-To: <fvmtmc$k80$1@fred.mathworks.com>
Xref: news.mathworks.com comp.soft-sys.matlab:467155


Tim Davis wrote:
> Looks fine to me.  I could suggest only one thing -
> mxDuplicateArray is copying the numerical values of x into
> the output ab, then later you overwrite them.  It would be
> faster to create ab with mxCreateSparse, then fill the
> matrix with both the pattern and the values (the Jc, Ir, and
> data arrays).  But the improvement would be minor.

I tried that and it really seems to be minor. In fact, my implementation
of the copying (mxCreateSparse followed by memcopy) was slightly slower
than using mxDuplicateArray.

But I found two additional ways to speed things up:
 (1) Replace the k write accesses to data_ab[ind_ab] in the inner loop
     by write accesses to a temporary variable z and do a single write
     operation to data_ab[ind_ab] after the loop has finished
 (2) Linearize accesses to matrix A by using A's transpose instead of A

The new code only needs 0.7 s in my example scenario, computing of A' in
MATLAB included (old code: 1.3 s, M code: 2.3 s).

Here is the code:

---------------------------------------------------------------------

#include "mex.h"

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray
*prhs[]) {

    const mxArray *atr, *b, *x;
    atr = prhs[0];
    b = prhs[1];
    x = prhs[2];

    mwSize m, n, k;
    m = mxGetM(x);
    n = mxGetN(x);
    k = mxGetM(atr);

    mxArray *ab;
    ab = mxDuplicateArray(x);

    mwIndex *rows_ab, *cols_ab;
    rows_ab = mxGetIr(ab);
    cols_ab = mxGetJc(ab);

    double *data_atr, *data_b, *data_ab;
    data_atr = mxGetPr(atr);
    data_b = mxGetPr(b);
    data_ab = mxGetPr(ab);

    double z;
    int i, j, r;
    int ind_atr, ind_b, ind_ab, rows_left;
    ind_ab = 0;
    for (j = 0; j < n; j++) {
        rows_left = cols_ab[j + 1] - cols_ab[j];
        while (rows_left != 0) {
            i = rows_ab[ind_ab];
            z = 0;
            ind_atr = i * k;
            ind_b = j * k;
            for (r = 0; r < k; r++) {
                z += data_atr[ind_atr] * data_b[ind_b];
                ind_atr++;
                ind_b++;
            }
            data_ab[ind_ab] = z;
            ind_ab++;
            rows_left--;
        }
    }

    plhs[0] = ab;
}

---------------------------------------------------------------------


Now I think that this is all one can do to speed it up. Am I right? :-)

Joachim