Path: news.mathworks.com!not-for-mail
From: <HIDDEN>
Newsgroups: comp.soft-sys.matlab
Subject: Re: New challenge! Minimizing the amount of 'unnecessary memory' !
Date: Sun, 13 Mar 2011 19:04:04 +0000 (UTC)
Organization: Boeing Co
Lines: 73
Message-ID: <ilj4f4$757$1@fred.mathworks.com>
References: <ilbujg$rbl$1@fred.mathworks.com> <ilh7f8$l2l$1@fred.mathworks.com>
Reply-To: <HIDDEN>
NNTP-Posting-Host: www-03-blr.mathworks.com
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
X-Trace: fred.mathworks.com 1300043044 7335 172.30.248.48 (13 Mar 2011 19:04:04 GMT)
X-Complaints-To: news@mathworks.com
NNTP-Posting-Date: Sun, 13 Mar 2011 19:04:04 +0000 (UTC)
X-Newsreader: MATLAB Central Newsreader 756104
Xref: news.mathworks.com comp.soft-sys.matlab:715554

"James Tursa" wrote in message <ilh7f8$l2l$1@fred.mathworks.com>...
> "Nike Dattani" <dattani.nike@gmail.com> wrote in message <ilbujg$rbl$1@fred.mathworks.com>...
> > 
> > I didn't have much luck the only other time I tried to use this newsgroup, but 
> > I'm having a problem with getting the line below to compute without using 'unnecessary' memory.
> > ===============
> > A is a 4^13 by 1 , complex valued, double precision, vector (1.07GB)
> > 
> > indices is a 4^13 by 13, unsigned 8 bit integer matrix (0.87GB)
> > 
> > indices=
> > [ 1 1 1 1 1 1 1 1 1 1 1 1 1;
> >   1 1 1 1 1 1 1 1 1 1 1 1 2;
> >   1 1 1 1 1 1 1 1 1 1 1 1 3;
> >   1 1 1 1 1 1 1 1 1 1 1 1 4;
> >   1 1 1 1 1 1 1 1 1 1 1 1 2;
> > ...
> >   4 4 4 4 4 4 4 4 4 4 4 4 4]
> > 
> > which is easily computed using Matt Fig's function from the File Exchange: npermutek(1:4,13);
> > 
> > K is a 16 by 13 complex valued, double precision, matrix. (negligibile memory)
> > ===============
> > 
> > Now I am running the line below in a loop over j from 0 to 12:
> > 
> > A=A.*K((indices(:,end-j)-1)*4+indices(:,end),j+1); 
> > 
> > ===============

Another slightly different mex routine with the for loops switched, so A is dragged through the high level cache only once. But the access for indices may be worse. Not sure how much this will affect timing, but you can give it a shot.

-----------------------------------------------------------------------------------------------------------------

// File: AKindices2.c
// Performs the following calculation inplace:
//   for j=0:12
//     A=A.*K((indices(:,end-j)-1)*4+indices(:,end),j+1); 
//   end
//   A = 4^13 x 1 complex double
//   K = 16 x 13 complex double
//   indices = 4^13 x 13 uint8
// Inplace calculation!
// Building:  mex AKindices2.c
// Calling Syntax:  AKindices2(A,K,indices);
// Programmer: James Tursa
#include "mex.h"
#define mwSize int
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    double Apr_k;
    double *Apr, *Api, *Kpr, *Kpi;
    unsigned char *Ipr, *Ipr_end, *Ipr_start;
    mwSize i, j, k, n;
    
    n = mxGetNumberOfElements(prhs[0]);
    Apr = mxGetPr(prhs[0]);
    Api = mxGetPi(prhs[0]);
    Kpr = mxGetPr(prhs[1]);
    Kpi = mxGetPi(prhs[1]);
    Ipr = mxGetData(prhs[2]);
    Ipr_end = Ipr + n * 12;
    
    for( k=0; k<n; k++ ) {
        for( j=0; j<13; j++ ) {
            Ipr_start = Ipr + n*(12-j);
            i = (Ipr_start[k] - 1) * 4 + Ipr_end[k] + j * 16 - 1;
            Apr_k  = Apr[k] * Kpr[i] - Api[k] * Kpi[i];
            Api[k] = Apr[k] * Kpi[i] + Api[k] * Kpr[i];
            Apr[k] = Apr_k;
        }
    }
}