Discover MakerZone

MATLAB and Simulink resources for Arduino, LEGO, and Raspberry Pi

Learn more

Discover what MATLAB® can do for your career.

Opportunities for recent engineering grads.

Apply Today

Thread Subject:
New challenge! Minimizing the amount of 'unnecessary memory' !

Subject: New challenge! Minimizing the amount of 'unnecessary memory' !

From: Nike Dattani

Date: 11 Mar, 2011 01:41:04

Message: 1 of 15

Hello everyone !

I didn't have much luck the only other time I tried to use this newsgroup, but
I'm having a problem with getting the line below to compute without using 'unnecessary' memory.
===============
A is a 4^13 by 1 , complex valued, double precision, vector (1.07GB)

indices is a 4^13 by 13, unsigned 8 bit integer matrix (0.87GB)

indices=
[ 1 1 1 1 1 1 1 1 1 1 1 1 1;
  1 1 1 1 1 1 1 1 1 1 1 1 2;
  1 1 1 1 1 1 1 1 1 1 1 1 3;
  1 1 1 1 1 1 1 1 1 1 1 1 4;
  1 1 1 1 1 1 1 1 1 1 1 1 2;
...
  4 4 4 4 4 4 4 4 4 4 4 4 4]

which is easily computed using Matt Fig's function from the File Exchange: npermutek(1:4,13);

K is a 16 by 13 complex valued, double precision, matrix. (negligibile memory)
===============

Now I am running the line below in a loop over j from 0 to 12:

A=A.*K((indices(:,end-j)-1)*4+indices(:,end),j+1);

===============

This line is using up much more memory than I would expect for it to!

indices(:,end-j) and indices(:,end) are both 4^13 by 1 unsigned 8 bit intiger vectors (134MB in total).

I understand that these two vectors have to be created internally by matlab in order for the addition operation to happen, even if they aren't saved in the workspace. But once that addition is completed, those temporary arrays should be deleted and we should have the vector: (indices(:,end-j)-1)*4+indices(:,end) , which is only 67MB , stored temporarily. Then K((indices(:,end-j)-1)*4+indices(:,end),j+1) should be stored temprarily in memory, which would be another 1.07GB in addition to that 67MB.

So in total my matlab session should be using, in addition to what it was already storing, around 1.07GB [size of 'A'] + 0.87GB [size of 'indices'] + 67MB [size of temporary vector(indices(:,end-j)-1)*4+indices(:,end) ] + 1.07GB [size of temporary vector K((indices(:,end-j)-1)*4+indices(:,end))] which should only take up about 3.08GB more than what it was originally using.

What doesn't make sense to me is that about 4.8GB , much more than 1GB more than the expected 3.08GB, is being used during the computation of that line (which I can see by watching the 'memory allocated to matlab' increase in Windows Task manager, and then decrease back to around 2GB [the size of A plus the size of indices] after this is done).

What's even stranger is that even if I define a new temprary variable:
temp=((indices(:,end-j)-1)*4+indices(:,end),j+1);

and now write down the total memory allocated to matlab, and then apply the command:
A.*temp;

the amount of memory used by matlab shoots up by more than 1GB during this computation, then goes back down. what it started as. Why would A.*temp require SO much more memory, which appears to be the amount of memory that A and temp themselves take ??? I thought the whole point of having them in memory is so that we can do basic operations like multiplication very easily without having to dubplicate them !


Okay, I've said enough about the memory problem, but ultimately I'd be very, very happy if I could get the line A=A.*K((indices(:,end-j)-1)*4+indices(:,end),j+1); replaced by something that doesn't use much more memory than the amount of space taken by A and by K. Hopefully this would be possible, for example, with BSXFUN, without slowing down the multiplcation speed very much, but at this point ... if there's a way to do it that takes a long time but saves on memory, I'd love it !

Any thoughts ?
Thanks !!!

Subject: New challenge! Minimizing the amount of 'unnecessary memory' !

From: James Tursa

Date: 11 Mar, 2011 02:51:05

Message: 2 of 15

"Nike Dattani" <dattani.nike@gmail.com> wrote in message <ilbujg$rbl$1@fred.mathworks.com>...
>
> A is a 4^13 by 1 , complex valued, double precision, vector (1.07GB)
>
> indices is a 4^13 by 13, unsigned 8 bit integer matrix (0.87GB)
>
(snip)
>
> Now I am running the line below in a loop over j from 0 to 12:
>
> A=A.*K((indices(:,end-j)-1)*4+indices(:,end),j+1);
>
(snip)
>
> Okay, I've said enough about the memory problem, but ultimately I'd be very, very happy if I could get the line A=A.*K((indices(:,end-j)-1)*4+indices(:,end),j+1); replaced by something that doesn't use much more memory than the amount of space taken by A and by K. Hopefully this would be possible, for example, with BSXFUN, without slowing down the multiplcation speed very much, but at this point ... if there's a way to do it that takes a long time but saves on memory, I'd love it !
>
> Any thoughts ?
> Thanks !!!

One of the things MATLAB is not very good at is working with array slices, since that typically will cause data copies to take place. You can often get around that with a custom mex routine. Is the line above the *only* such memory problem in your code, or are there others? And would you be open to a mex solution?

James Tursa

Subject: New challenge! Minimizing the amount of 'unnecessary memory' !

From: Roger Stafford

Date: 11 Mar, 2011 06:37:04

Message: 3 of 15

"Nike Dattani" <dattani.nike@gmail.com> wrote in message <ilbujg$rbl$1@fred.mathworks.com>...
> ... if there's a way to do it that takes a long time but saves on memory, I'd love it !

  If I were you, Nike, I would seriously consider doing without the 'indices' array. Its structure is comparatively simple and you should be able to carry out the equivalent of the A = A.*K(..j..) computation using a for-loop to generate the necessary indices in K, one index at a time. That way A would be the only large array involved. Of course you would suffer a serious increase in computation time, but you seem resigned to that possibility.

  As to how the necessary indices could be generated, for example, for j equal to 0, the K row indices have a repeating pattern with a period of four elements: 1,6,11,16,1,6,11,16, .... or for j equal to 1 it would be 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,1,2,3,... repeating every 16 elements. At the other extreme with j equal to 12, it is a repeating pattern: 1,2,3,4,1,2,3,4,... for 4^12 elements, followed by 5,6,7,8,5,6,7,8,.... for another 4^12, etc. Such things are comparatively easy to compute on the fly rather than storing such monstrosities in arrays. It just takes longer but requires a lot less memory.

  This possibility seems even more attractive if you follow James' advice and prepare a good mex file to do this.

Roger Stafford

Subject: New challenge! Minimizing the amount of 'unnecessary memory' !

From: Roger Stafford

Date: 11 Mar, 2011 08:33:05

Message: 4 of 15

"Roger Stafford" wrote in message <ilcfug$30o$1@fred.mathworks.com>...
> "Nike Dattani" <dattani.nike@gmail.com> wrote in message <ilbujg$rbl$1@fred.mathworks.com>...
> > ... if there's a way to do it that takes a long time but saves on memory, I'd love it !
>
> If I were you, Nike, I would seriously consider doing without the 'indices' array. Its structure is comparatively simple and you should be able to carry out the equivalent of the A = A.*K(..j..) computation using a for-loop to generate the necessary indices in K, one index at a time. That way A would be the only large array involved. Of course you would suffer a serious increase in computation time, but you seem resigned to that possibility.
>
> As to how the necessary indices could be generated, for example, for j equal to 0, the K row indices have a repeating pattern with a period of four elements: 1,6,11,16,1,6,11,16, .... or for j equal to 1 it would be 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,1,2,3,... repeating every 16 elements. At the other extreme with j equal to 12, it is a repeating pattern: 1,2,3,4,1,2,3,4,... for 4^12 elements, followed by 5,6,7,8,5,6,7,8,.... for another 4^12, etc. Such things are comparatively easy to compute on the fly rather than storing such monstrosities in arrays. It just takes longer but requires a lot less memory.
>
> This possibility seems even more attractive if you follow James' advice and prepare a good mex file to do this.
>
> Roger Stafford
- - - - - - - - - -
  To illustrate what I meant by the above, you could do this:

 for k = 1:4^13
  A(k) = A(k)*K(4*mod(floor((k-1)/4^j),4)+mod(k-1,4)+1,j+1);
 end

to replace

 A=A.*K((indices(:,end-j)-1)*4+indices(:,end),j+1);

and there would then be no need for 'indices'. (There is no problem of round off errors in the division by 4^j above since this is an exact power of two.)

Roger Stafford

Subject: New challenge! Minimizing the amount of 'unnecessary memory' !

From: Nike Dattani

Date: 12 Mar, 2011 17:53:06

Message: 5 of 15

Thanks very much James and Roger for being so helpful in assisting me with this sticky point !!

"James Tursa" wrote in message <ilc2mo$k41$1@fred.mathworks.com>...
> One of the things MATLAB is not very good at is working with array slices, since that typically will cause data copies to take place.

Thanks very much James !

I understand that the command: K(indices(:,1))
 temporarily creates a copy of the array: indices(:,1)

But in my analysis I accounted for all of these 'data copies' and was still very dissatisfied with the amount of memory being used.

Particularly,
When I apply the command:
temp=((indices(:,end-j)-1)*4+indices(:,end),j+1);

Matlab is now using 5GB of memory in total. Then if I watch the 'memory used by matlab' in Windows Task Manager while doing:

A.*temp;

The memory used shoots up to 6GB during that multiplication, then goes back to 5GB.

It's this short-lived jump that's making my program unusable for larger test cases where we're the short-lived jump is between something like 60GB and 70GB (I have 64GB of RAM). Does it even make any sense that more memory needs to be used in order to multiply two numbers that are already stored in memory ?? !!

Even if somehow using more memory allows for a more efficient multiplication algorithm, I don't see why we can't force matlab to do the multiplication without using any more memory .. even if it's at the sacrifice of speed !

==========

>You can often get around that with a custom mex routine. Is the line above the *only* such memory problem in your code, or are there others? And would you be open to a mex solution?

This line, and other lines which resemble it very closely, are certainly the main sources of the memory problem. I am certainly open to a mex solution, although I've never made a MEX before since I'm not well versed in FORTRAN or C. I was hoping the multiplication could be done without array slicing (hence without making extra copies) by using the BSXFUN or something, since all the data is already there, but certain elements are multiplied repeatedly in a regular pattern.

==========

"Roger Stafford" wrote in message <ilcmo1$j2t$1@fred.mathworks.com>...
> To illustrate what I meant by the above, you could do this:
>
> for k = 1:4^13
> A(k) = A(k)*K(4*mod(floor((k-1)/4^j),4)+mod(k-1,4)+1,j+1);
> end
>
> to replace
>
> A=A.*K((indices(:,end-j)-1)*4+indices(:,end),j+1);
>
> and there would then be no need for 'indices'. (There is no problem of round off errors in the division by 4^j above since this is an exact power of two.)
>


That is a Brilliant idea Roger !

There's a reason why I didn't want to resort to remove the indices array:
In one version of my code I like to remove the following elements A(abs(A) < 1e-8)=[] ; indices(abs(A) < 1e-8))=[] ;
at each iteration of the algorithm, to make the A array smaller and to keep 'A' and 'indices' the same length. But I suppose such 'filtering' of indices can be done in your for loop by defining an array that contains integers between 1 and 4^13 specifying which indices to include in the forloop, and then looping over those integers.

However, my intuition is that implementing such a forloop in FORTRAN, even after compiling it to machine code, would be slower than doing array-wise multiplication in matlab, or using built-in matlab functions like BSXFUN to do repeated multiplications. Does your intuition agree with mine here ??

I only want to sacrifice speed as a last resort! And I've heard that BSXFUN can do repeated multiplications like this without requiring the equivalent of my 'indices' array.

Subject: New challenge! Minimizing the amount of 'unnecessary memory' !

From: James Tursa

Date: 13 Mar, 2011 01:43:04

Message: 6 of 15

"Nike Dattani" <dattani.nike@gmail.com> wrote in message <ilbujg$rbl$1@fred.mathworks.com>...
>
> I didn't have much luck the only other time I tried to use this newsgroup, but
> I'm having a problem with getting the line below to compute without using 'unnecessary' memory.
> ===============
> A is a 4^13 by 1 , complex valued, double precision, vector (1.07GB)
>
> indices is a 4^13 by 13, unsigned 8 bit integer matrix (0.87GB)
>
> indices=
> [ 1 1 1 1 1 1 1 1 1 1 1 1 1;
> 1 1 1 1 1 1 1 1 1 1 1 1 2;
> 1 1 1 1 1 1 1 1 1 1 1 1 3;
> 1 1 1 1 1 1 1 1 1 1 1 1 4;
> 1 1 1 1 1 1 1 1 1 1 1 1 2;
> ...
> 4 4 4 4 4 4 4 4 4 4 4 4 4]
>
> which is easily computed using Matt Fig's function from the File Exchange: npermutek(1:4,13);
>
> K is a 16 by 13 complex valued, double precision, matrix. (negligibile memory)
> ===============
>
> Now I am running the line below in a loop over j from 0 to 12:
>
> A=A.*K((indices(:,end-j)-1)*4+indices(:,end),j+1);
>
> ===============

Here is an example of a mex routine that does the above calculation. It works in-place and takes the A, K, and indices arrays as inputs. This is bare bones with no argument checking, but it does illustrate how simply the above can be done without creating a lot of intermediate arrays and unnecessary data movement.

---------------------------------------------------

// File: AKindices.c
// Performs the following calculation inplace:
// for j=0:12
// A=A.*K((indices(:,end-j)-1)*4+indices(:,end),j+1);
// end
// A = 4^13 x 1 complex double
// K = 16 x 13 complex double
// indices = 4^13 x 13 uint8
// Inplace calculation!
// Building: mex AKindices.c
// Calling Syntax: AKindices(A,K,indices);
// Programmer: James Tursa
#include "mex.h"
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    double Apr_k;
    double *Apr, *Api, *Kpr, *Kpi;
    unsigned char *Ipr, *Ipr_end, *Ipr_start;
    mwSize i, j, k, n;
    
    n = mxGetNumberOfElements(prhs[0]);
    Apr = mxGetPr(prhs[0]);
    Api = mxGetPi(prhs[0]);
    Kpr = mxGetPr(prhs[1]);
    Kpi = mxGetPi(prhs[1]);
    Ipr = mxGetData(prhs[2]);
    Ipr_end = Ipr + n * 12;
    
    for( j=0; j<13; j++ ) {
        Ipr_start = Ipr + n*(12-j);
        for( k=0; k<n; k++ ) {
            i = (Ipr_start[k] - 1) * 4 + Ipr_end[k] + j * 16 - 1;
            Apr_k = Apr[k] * Kpr[i] - Api[k] * Kpi[i];
            Api[k] = Apr[k] * Kpi[i] + Api[k] * Kpr[i];
            Apr[k] = Apr_k;
        }
    }
}

--------------------------------------------------------------------

The above is not necessarily optimized for memory access, but it should run quite a bit faster than what you are doing now.

The next level of optimization would be to do as Roger suggested and calculate indices in the mex routine on the fly (I haven't done that in the above code).

And to get even fancier it would not be hard at all to multi-thread the C code above using OpenMP.


James Tursa

Subject: New challenge! Minimizing the amount of 'unnecessary memory' !

From: James Tursa

Date: 13 Mar, 2011 19:04:04

Message: 7 of 15

"James Tursa" wrote in message <ilh7f8$l2l$1@fred.mathworks.com>...
> "Nike Dattani" <dattani.nike@gmail.com> wrote in message <ilbujg$rbl$1@fred.mathworks.com>...
> >
> > I didn't have much luck the only other time I tried to use this newsgroup, but
> > I'm having a problem with getting the line below to compute without using 'unnecessary' memory.
> > ===============
> > A is a 4^13 by 1 , complex valued, double precision, vector (1.07GB)
> >
> > indices is a 4^13 by 13, unsigned 8 bit integer matrix (0.87GB)
> >
> > indices=
> > [ 1 1 1 1 1 1 1 1 1 1 1 1 1;
> > 1 1 1 1 1 1 1 1 1 1 1 1 2;
> > 1 1 1 1 1 1 1 1 1 1 1 1 3;
> > 1 1 1 1 1 1 1 1 1 1 1 1 4;
> > 1 1 1 1 1 1 1 1 1 1 1 1 2;
> > ...
> > 4 4 4 4 4 4 4 4 4 4 4 4 4]
> >
> > which is easily computed using Matt Fig's function from the File Exchange: npermutek(1:4,13);
> >
> > K is a 16 by 13 complex valued, double precision, matrix. (negligibile memory)
> > ===============
> >
> > Now I am running the line below in a loop over j from 0 to 12:
> >
> > A=A.*K((indices(:,end-j)-1)*4+indices(:,end),j+1);
> >
> > ===============

Another slightly different mex routine with the for loops switched, so A is dragged through the high level cache only once. But the access for indices may be worse. Not sure how much this will affect timing, but you can give it a shot.

-----------------------------------------------------------------------------------------------------------------

// File: AKindices2.c
// Performs the following calculation inplace:
// for j=0:12
// A=A.*K((indices(:,end-j)-1)*4+indices(:,end),j+1);
// end
// A = 4^13 x 1 complex double
// K = 16 x 13 complex double
// indices = 4^13 x 13 uint8
// Inplace calculation!
// Building: mex AKindices2.c
// Calling Syntax: AKindices2(A,K,indices);
// Programmer: James Tursa
#include "mex.h"
#define mwSize int
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    double Apr_k;
    double *Apr, *Api, *Kpr, *Kpi;
    unsigned char *Ipr, *Ipr_end, *Ipr_start;
    mwSize i, j, k, n;
    
    n = mxGetNumberOfElements(prhs[0]);
    Apr = mxGetPr(prhs[0]);
    Api = mxGetPi(prhs[0]);
    Kpr = mxGetPr(prhs[1]);
    Kpi = mxGetPi(prhs[1]);
    Ipr = mxGetData(prhs[2]);
    Ipr_end = Ipr + n * 12;
    
    for( k=0; k<n; k++ ) {
        for( j=0; j<13; j++ ) {
            Ipr_start = Ipr + n*(12-j);
            i = (Ipr_start[k] - 1) * 4 + Ipr_end[k] + j * 16 - 1;
            Apr_k = Apr[k] * Kpr[i] - Api[k] * Kpi[i];
            Api[k] = Apr[k] * Kpi[i] + Api[k] * Kpr[i];
            Apr[k] = Apr_k;
        }
    }
}

Subject: New challenge! Minimizing the amount of 'unnecessary memory' !

From: James Tursa

Date: 13 Mar, 2011 19:32:05

Message: 8 of 15

"Nike Dattani" <dattani.nike@gmail.com> wrote in message <ilbujg$rbl$1@fred.mathworks.com>...
>
> What's even stranger is that even if I define a new temprary variable:
> temp=((indices(:,end-j)-1)*4+indices(:,end),j+1);
>
> and now write down the total memory allocated to matlab, and then apply the command:
> A.*temp;
>
> the amount of memory used by matlab shoots up by more than 1GB during this computation, then goes back down. what it started as. Why would A.*temp require SO much more memory, which appears to be the amount of memory that A and temp themselves take ??? I thought the whole point of having them in memory is so that we can do basic operations like multiplication very easily without having to dubplicate them !

FYI, the A.*temp calculation requires that temp first be converted from uint8 to double, hence the big memory increase necessary for the calculation.

James Tursa

Subject: New challenge! Minimizing the amount of 'unnecessary memory' !

From: Nike Dattani

Date: 14 Mar, 2011 02:39:04

Message: 9 of 15

"James Tursa" wrote in message <ilj63l$kr6$1@fred.mathworks.com>...
> "Nike Dattani" <dattani.nike@gmail.com> wrote in message <ilbujg$rbl$1@fred.mathworks.com>...
> >
> > What's even stranger is that even if I define a new temprary variable:
> > temp=((indices(:,end-j)-1)*4+indices(:,end),j+1);
> >
> > and now write down the total memory allocated to matlab, and then apply the command:
> > A.*temp;
> >
> > the amount of memory used by matlab shoots up by more than 1GB during this computation, then goes back down. what it started as. Why would A.*temp require SO much more memory, which appears to be the amount of memory that A and temp themselves take ??? I thought the whole point of having them in memory is so that we can do basic operations like multiplication very easily without having to dubplicate them !
>
> FYI, the A.*temp calculation requires that temp first be converted from uint8 to double, hence the big memory increase necessary for the calculation.
>
> James Tursa

Hi James !
Wow !! Thanks so much for taking the time to write those MEX routines !
That's incredibly kind, and I really appreciate it !

I was actually about to try writing a MEX routine in FORTRAN since array slicing can be done in FORTRAN , and that way I could hopefully avoid the for loops and do something that more closesly resembles: A.*K((indices(:,end-j)-1)*4+indices(:,end),j+1)

My intuition is that doing array multiplications is faster than doing forloops .. but I'm often wrong about these things.
==============
Also,
I made a crucial typo in both of my posts:
temp=((indices(:,end-j)-1)*4+indices(:,end),j+1);

should actually read:
temp=K((indices(:,end-j)-1)*4+indices(:,end),j+1);

so
A.*temp is exactly what I'm calculating, which is:
A.*K((indices(:,end-j)-1)*4+indices(:,end),j+1);

So everything is already double precision.
But now I'm not able to reproduce the behaviour I noticed before ..
Maybe Matlab isn't using up so much more RAM than necessary after all!

Subject: New challenge! Minimizing the amount of 'unnecessary memory' !

From: James Tursa

Date: 14 Mar, 2011 05:49:04

Message: 10 of 15

"Nike Dattani" <dattani.nike@gmail.com> wrote in message <iljv48$pkf$1@fred.mathworks.com>...
>
> I was actually about to try writing a MEX routine in FORTRAN since array slicing can be done in FORTRAN , and that way I could hopefully avoid the for loops ...

Array slicing in Fortran simply means that the compiler will write the for loops (or do loops) in the background for you, and will most likely do it in a manner that is efficient for memory access. But it doesn't necessarily mean that it will run any faster than if you write the for loops (or do loops) manually.

James Tursa

Subject: New challenge! Minimizing the amount of 'unnecessary memory' !

From: James Tursa

Date: 14 Mar, 2011 14:56:05

Message: 11 of 15

"James Tursa" wrote in message <ilj4f4$757$1@fred.mathworks.com>...
>
> Another slightly different mex routine with the for loops switched ...
(snip)
> #define mwSize int

Delete the above define.

James Tursa

Subject: New challenge! Minimizing the amount of 'unnecessary memory' !

From: Nike Dattani

Date: 24 Mar, 2011 01:50:19

Message: 12 of 15

Hello!
Thanks once again for your very useful MEX routines!

> Array slicing in Fortran simply means that the compiler will write the for loops (or do loops) in the background for you, and will most likely do it in a manner that is efficient for memory access. But it doesn't necessarily mean that it will run any faster than if you write the for loops (or do loops) manually.
>
> James Tursa

Thanks very much for your insight!
If that's true I should probably just have done the whole thing using for loops (like the one Roger Stafford suggested in his 2nd post) right from the beginning. The reason why I was working with arrays (which of course take up much more memory than Roger Stafford's for loop) was that I always thought that:

A=A.*B

Is much faster than:

for i from 1:4^13
A(i)=A(i)*B(i)
end

since the internal funciton for point-wise multiplication of arrays would hopefully have been highly optimized for memory management, multi-threading, multiplying several rows in parallel rather than one by one, etc ..
In principle a compiler could try to optimize the for loop similarly, but since every for loop is slightly different, I would be surprised if doing array-wise multiplication wouldn't be faster.

Has anyone done a study with a benchmark comparison of these two approaches ?
 

Subject: New challenge! Minimizing the amount of 'unnecessary memory' !

From: James Tursa

Date: 24 Mar, 2011 03:22:05

Message: 13 of 15

"Nike Dattani" <dattani.nike@gmail.com> wrote in message <ime80q$ndv$1@fred.mathworks.com>...
> Hello!
> Thanks once again for your very useful MEX routines!
>
> > Array slicing in Fortran simply means that the compiler will write the for loops (or do loops) in the background for you, and will most likely do it in a manner that is efficient for memory access. But it doesn't necessarily mean that it will run any faster than if you write the for loops (or do loops) manually.
> >
> > James Tursa
>
> Thanks very much for your insight!
> If that's true I should probably just have done the whole thing using for loops (like the one Roger Stafford suggested in his 2nd post) right from the beginning. The reason why I was working with arrays (which of course take up much more memory than Roger Stafford's for loop) was that I always thought that:
>
> A=A.*B
>
> Is much faster than:
>
> for i from 1:4^13
> A(i)=A(i)*B(i)
> end
>
> since the internal funciton for point-wise multiplication of arrays would hopefully have been highly optimized for memory management, multi-threading, multiplying several rows in parallel rather than one by one, etc ..
> In principle a compiler could try to optimize the for loop similarly, but since every for loop is slightly different, I would be surprised if doing array-wise multiplication wouldn't be faster.
>
> Has anyone done a study with a benchmark comparison of these two approaches ?
>

Well, now I am not sure where you are going with this. Whole array operations in MATLAB can be pretty fast. Array *slice* operations in MATLAB can be pretty slow, especially in a for loop. Doing array operations in C or Fortran using loops can be pretty fast. Doing array operations in Fortran using whole array or array slice notation can also be pretty fast depending on the compiler and how you are doing the slices. I have given you the C code for what you want, and I would expect this will beat anything you can come up with on the MATLAB side, potentially by quite a bit. What more do you want at this point? Do you have other code that has the same issues, or what?

James Tursa

Subject: New challenge! Minimizing the amount of 'unnecessary memory' !

From: Nike Dattani

Date: 26 Mar, 2011 03:58:04

Message: 14 of 15

> Well, now I am not sure where you are going with this. Whole array operations in MATLAB can be pretty fast. Array *slice* operations in MATLAB can be pretty slow, especially in a for loop. Doing array operations in C or Fortran using loops can be pretty fast. Doing array operations in Fortran using whole array or array slice notation can also be pretty fast depending on the compiler and how you are doing the slices. I have given you the C code for what you want, and I would expect this will beat anything you can come up with on the MATLAB side, potentially by quite a bit. What more do you want at this point? Do you have other code that has the same issues, or what?
>
> James Tursa

Thanks very muh for your C code! I agree wtih you that it is likely to beat my current matlab code by quite a large margin (I'll let you know how it works! I just have to modify it a few times because a few different variants of this line appear in my program). Thanks also for informing me that slice operations in MATLAB can be pretty slow. That's very useful to know. In my last post I was just mentioning why I was surprised to hear that using FORTRAN's array slicing wouldn't be much faster than writing the for loop manually and then compiling it, and I was just wondering if anyone on the newsgroup had seen some benchmark tests comparing the two.

But at this point, your C code is a very good solution and I'm not asking for more =)

Subject: New challenge! Minimizing the amount of 'unnecessary memory' !

From: James Tursa

Date: 26 Mar, 2011 05:23:04

Message: 15 of 15

"Nike Dattani" <dattani.nike@gmail.com> wrote in message <imjo8c$b56$1@fred.mathworks.com>...
> > Well, now I am not sure where you are going with this. Whole array operations in MATLAB can be pretty fast. Array *slice* operations in MATLAB can be pretty slow, especially in a for loop. Doing array operations in C or Fortran using loops can be pretty fast. Doing array operations in Fortran using whole array or array slice notation can also be pretty fast depending on the compiler and how you are doing the slices. I have given you the C code for what you want, and I would expect this will beat anything you can come up with on the MATLAB side, potentially by quite a bit. What more do you want at this point? Do you have other code that has the same issues, or what?
> >
> > James Tursa
>
> Thanks very much for your C code! I agree wtih you that it is likely to beat my current matlab code by quite a large margin (I'll let you know how it works! I just have to modify it a few times because a few different variants of this line appear in my program). Thanks also for informing me that slice operations in MATLAB can be pretty slow. That's very useful to know. In my last post I was just mentioning why I was surprised to hear that using FORTRAN's array slicing wouldn't be much faster than writing the for loop manually and then compiling it, and I was just wondering if anyone on the newsgroup had seen some benchmark tests comparing the two.
>
> But at this point, your C code is a very good solution and I'm not asking for more =)

In C or Fortran, if you pay attention to memory access patterns and write your loops accordingly, a good optimizing compiler can multi-thread the loops behind the scenes and you can get very good performance, often the same as an equivalent array slice operation. E.g., Intel Fortran and Microsoft Visual C are excellent optimizing compilers that will do this for you. I have seen cases where MSVC beat the LCC compiler that ships with MATLAB by a factor of > 10x. If you write crappy loops that have lousy memory access patterns then no optimizing compiler is going to be able to save you ... you will get crappy timing performance. The advantage of doing array slice operations in Fortran is that you can count on a good compiler doing efficient memory access patterns in the background for you, but you have to be careful because in some cases it will create large temporary variables in the
background (often off of the stack) and you can blow the memory and bomb the program. That is where it is handy to write your own loops instead, but only if you know what you are doing. When you do a whole array operation in MATLAB it essentially can call background C/C++ or Fortran code to do the work ... and the people who wrote that code knew what there were doing and used efficient memory access patterns and multi-threading. But when you do an array slice operation in MATLAB it has to first create the array slice off to the side, meaning a potentially large data copy, and then do the operation, and then probably another data copy to get it back into the result variable slice. That takes a lot of extra time, especially if it is in a loop. Always avoid large array slice operations in loops in MATLAB if you can avoid it because you can easily spend 99% of your time moving data around
and only 1% of your time doing the actual work. That is where C/C++ or Fortran can save you in a mex routine.

James Tursa

Tags for this Thread

No tags are associated with this thread.

What are tags?

A tag is like a keyword or category label associated with each thread. Tags make it easier for you to find threads of interest.

Anyone can tag a thread. Tags are public and visible to everyone.

Contact us