Path: news.mathworks.com!not-for-mail
From: <HIDDEN>
Newsgroups: comp.soft-sys.matlab
Subject: Re: New challenge! Minimizing the amount of 'unnecessary memory' !
Date: Sun, 13 Mar 2011 01:43:04 +0000 (UTC)
Organization: Boeing Co
Lines: 82
Message-ID: <ilh7f8$l2l$1@fred.mathworks.com>
References: <ilbujg$rbl$1@fred.mathworks.com>
Reply-To: <HIDDEN>
NNTP-Posting-Host: www-00-blr.mathworks.com
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
X-Trace: fred.mathworks.com 1299980584 21589 172.30.248.45 (13 Mar 2011 01:43:04 GMT)
X-Complaints-To: news@mathworks.com
NNTP-Posting-Date: Sun, 13 Mar 2011 01:43:04 +0000 (UTC)
X-Newsreader: MATLAB Central Newsreader 756104
Xref: news.mathworks.com comp.soft-sys.matlab:715453

"Nike Dattani" <dattani.nike@gmail.com> wrote in message <ilbujg$rbl$1@fred.mathworks.com>...
> 
> I didn't have much luck the only other time I tried to use this newsgroup, but 
> I'm having a problem with getting the line below to compute without using 'unnecessary' memory.
> ===============
> A is a 4^13 by 1 , complex valued, double precision, vector (1.07GB)
> 
> indices is a 4^13 by 13, unsigned 8 bit integer matrix (0.87GB)
> 
> indices=
> [ 1 1 1 1 1 1 1 1 1 1 1 1 1;
>   1 1 1 1 1 1 1 1 1 1 1 1 2;
>   1 1 1 1 1 1 1 1 1 1 1 1 3;
>   1 1 1 1 1 1 1 1 1 1 1 1 4;
>   1 1 1 1 1 1 1 1 1 1 1 1 2;
> ...
>   4 4 4 4 4 4 4 4 4 4 4 4 4]
> 
> which is easily computed using Matt Fig's function from the File Exchange: npermutek(1:4,13);
> 
> K is a 16 by 13 complex valued, double precision, matrix. (negligibile memory)
> ===============
> 
> Now I am running the line below in a loop over j from 0 to 12:
> 
> A=A.*K((indices(:,end-j)-1)*4+indices(:,end),j+1); 
> 
> ===============

Here is an example of a mex routine that does the above calculation. It works in-place and takes the A, K, and indices arrays as inputs. This is bare bones with no argument checking, but it does illustrate how simply the above can be done without creating a lot of intermediate arrays and unnecessary data movement.

---------------------------------------------------

// File: AKindices.c
// Performs the following calculation inplace:
//   for j=0:12
//     A=A.*K((indices(:,end-j)-1)*4+indices(:,end),j+1); 
//   end
//   A = 4^13 x 1 complex double
//   K = 16 x 13 complex double
//   indices = 4^13 x 13 uint8
// Inplace calculation!
// Building:  mex AKindices.c
// Calling Syntax:  AKindices(A,K,indices);
// Programmer: James Tursa
#include "mex.h"
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    double Apr_k;
    double *Apr, *Api, *Kpr, *Kpi;
    unsigned char *Ipr, *Ipr_end, *Ipr_start;
    mwSize i, j, k, n;
    
    n = mxGetNumberOfElements(prhs[0]);
    Apr = mxGetPr(prhs[0]);
    Api = mxGetPi(prhs[0]);
    Kpr = mxGetPr(prhs[1]);
    Kpi = mxGetPi(prhs[1]);
    Ipr = mxGetData(prhs[2]);
    Ipr_end = Ipr + n * 12;
    
    for( j=0; j<13; j++ ) {
        Ipr_start = Ipr + n*(12-j);
        for( k=0; k<n; k++ ) {
            i = (Ipr_start[k] - 1) * 4 + Ipr_end[k] + j * 16 - 1;
            Apr_k  = Apr[k] * Kpr[i] - Api[k] * Kpi[i];
            Api[k] = Apr[k] * Kpi[i] + Api[k] * Kpr[i];
            Apr[k] = Apr_k;
        }
    }
}

--------------------------------------------------------------------

The above is not necessarily optimized for memory access, but it should run quite a bit faster than what you are doing now.

The next level of optimization would be to do as Roger suggested and calculate indices in the mex routine on the fly (I haven't done that in the above code).

And to get even fancier it would not be hard at all to multi-thread the C code above using OpenMP.


James Tursa