Discover MakerZone

MATLAB and Simulink resources for Arduino, LEGO, and Raspberry Pi

Learn more

Discover what MATLAB® can do for your career.

Opportunities for recent engineering grads.

Apply Today

Thread Subject:
growing array in Matlab?

Subject: growing array in Matlab?

From: Stephen Vavasis

Date: 17 Aug, 2005 11:16:22

Message: 1 of 32

A well-known issue with Matlab is that it performs badly if you repeatedly
add elements to the end of an array like this:

a = [];
while <condition>
  a = [a; sin(x)];
end

 (run time is quadratic -- see the timings at the end of this email). If
you know in advance how large a should be, then you can simply declare it
before the loop: a = zeros(total_a_size,1). But if you don't know the size,
you can use the following technique to implement arrays that grow
exponentially:

a = [];
a_size = 0;
while <condition>
  a_size = a_size + 1;
  if a_size > length(a)
    a = [a;zeros(floor(0.4*a_size) + 4,1)];
  end
  a(a_size) = sin(x);
end
a = a(1:a_size);

This performs much better (see timings below).

The question is how to do this cleanly. Is it possible to write a class
that implements exponentially growing arrays efficiently? I have an
application that needs about ten such arrays.

Thanks,
Steve Vavasis


function a = testappend(n)
numtrial = 4;
avetime = 0;
for j = 1 : numtrial
  tic
  a = [];
  for j = 1 : n
    a = [a;sin(j)];
  end
  avetime = avetime + toc;
end
avetime = avetime / numtrial


function a = testappend2(n)
numtrial = 4;
avetime = 0;
for j = 1 : numtrial
  tic
  a = [];
  a_size = 0;
  for j = 1 : n
    a_size = j;
    if length(a) < a_size
      a = [a;zeros(max(a_size-length(a),floor(length(a)*.4))+4,1)];
      a(j) = sin(j);
    end
  end
  a = a(1:a_size);
  avetime = avetime + toc;
end
avetime = avetime / numtrial


>> testappend(4000);
avetime =
  1.2900e-001
>> testappend(8000);
avetime =
  6.9950e-001
>> testappend(16000);
avetime =
  2.3725e+000
>> testappend(32000);
avetime =
  1.1506e+001


>> testappend2(4000);
avetime =
  3.7500e-003
>> testappend2(8000);
avetime =
  7.7500e-003
>> testappend2(16000);
avetime =
  1.5500e-002
>> testappend2(32000);
avetime =
  5.4750e-002

Subject: growing array in Matlab?

From: Steve Amphlett

Date: 17 Aug, 2005 11:33:13

Message: 2 of 32

Stephen Vavasis wrote:
>
>
> A well-known issue with Matlab is that it performs badly if you
> repeatedly
> add elements to the end of an array like this:
>
> a = [];
> while <condition>
> a = [a; sin(x)];
> end
>
> (run time is quadratic -- see the timings at the end of this
> email). If
> you know in advance how large a should be, then you can simply
> declare it
> before the loop: a = zeros(total_a_size,1). But if you don't know
> the size,
> you can use the following technique to implement arrays that grow
> exponentially:
>
> a = [];
> a_size = 0;
> while <condition>
> a_size = a_size + 1;
> if a_size > length(a)
> a = [a;zeros(floor(0.4*a_size) + 4,1)];
> end
> a(a_size) = sin(x);
> end
> a = a(1:a_size);
>
> This performs much better (see timings below).
>
> The question is how to do this cleanly. Is it possible to write a
> class
> that implements exponentially growing arrays efficiently? I have
> an
> application that needs about ten such arrays.
>
> Thanks,
> Steve Vavasis

Have you tried growing a cell array and then when it's full,
cell2mat() the data out? I seem to remember someone saying here that
cell arrays grow well.

Subject: growing array in Matlab?

From: Steve Amphlett

Date: 17 Aug, 2005 11:42:11

Message: 3 of 32

Steve Amphlett wrote:
>
>
<snip, ways to grow arrays...
 
> Have you tried growing a cell array and then when it's full,
> cell2mat() the data out? I seem to remember someone saying here
> that
> cell arrays grow well.

I just slapped togetha test snippet and I think it's a winner:

a = [];
x=1;
tic
for idx=1:16000
  a = [a; sin(x)];
end
toc

clear b
tic
for idx=1:16000
  b(idx) = {sin(x)};
end
c=cell2mat(b);
toc

Elapsed time is 4.422000 seconds.
Elapsed time is 0.922000 seconds.

And if I make x a vecor, the results are even more convincing:

a = [];
x=(1:100)';
tic
for idx=1:1000
  a = [a; sin(x)];
end
toc

clear b
tic
for idx=1:1000
  b(idx) = {sin(x)};
end
c=cell2mat(b);
toc

Elapsed time is 3.843000 seconds.
Elapsed time is 0.078000 seconds.

Subject: growing array in Matlab?

From: Brett Shoelson

Date: 17 Aug, 2005 11:54:08

Message: 4 of 32


"Steve Amphlett" <Firstname.Lastname@where_I_work.com> wrote in message
news:ef11420.1@webx.raydaftYaTP...
> Steve Amphlett wrote:
>>
>>
> <snip, ways to grow arrays...
>
>> Have you tried growing a cell array and then when it's full,
>> cell2mat() the data out? I seem to remember someone saying here
>> that
>> cell arrays grow well.
>
> I just slapped togetha test snippet and I think it's a winner:
>
> a = [];
> x=1;
> tic
> for idx=1:16000
> a = [a; sin(x)];
> end
> toc
>
> clear b
> tic
> for idx=1:16000
> b(idx) = {sin(x)};
> end
> c=cell2mat(b);
> toc
>
> Elapsed time is 4.422000 seconds.
> Elapsed time is 0.922000 seconds.
>
> And if I make x a vecor, the results are even more convincing:
>
> a = [];
> x=(1:100)';
> tic
> for idx=1:1000
> a = [a; sin(x)];
> end
> toc
>
> clear b
> tic
> for idx=1:1000
> b(idx) = {sin(x)};
> end
> c=cell2mat(b);
> toc
>
> Elapsed time is 3.843000 seconds.
> Elapsed time is 0.078000 seconds.


This is interesting. I wouldn't have guessed that growing cell arrays would
have been so different from growing matrices. I'm curious what's behind
this...how are cell arrays stored that make them so much more efficient for
growing???
Any thoughts?
Brett
--
char(cumsum(...
[32 83 -11 7 -10 7 7 -4 -1 -46 40 -3 7 -3 15 -74 64 -5 -1 -58 57 8 7]))

Subject: growing array in Matlab?

From: Steve Amphlett

Date: 17 Aug, 2005 12:02:26

Message: 5 of 32

Brett Shoelson wrote:
>
>
>>
>> a = [];
>> x=(1:100)';
>> tic
>> for idx=1:1000
>> a = [a; sin(x)];
>> end
>> toc
>>
>> clear b
>> tic
>> for idx=1:1000
>> b(idx) = {sin(x)};
>> end
>> c=cell2mat(b);
>> toc
>>
>> Elapsed time is 3.843000 seconds.
>> Elapsed time is 0.078000 seconds.
>
>
> This is interesting. I wouldn't have guessed that growing cell
> arrays would
> have been so different from growing matrices. I'm curious what's
> behind
> this...how are cell arrays stored that make them so much more
> efficient for
> growing???

This is probably complete rubbish, but my best guess is this:

A cell array is a container for a load of pointers to the real data
members, which are scattered randomly around memory. When you grow a
cell array, you are growing it by one pointer rather than by the size
of the new member.

As I said, probably completely wrong, but you never know, it could be
a reasonable description. I'm waiting for one of "The Big Boys" to
pop by and tell us.

Subject: growing array in Matlab?

From: John D'Errico

Date: 17 Aug, 2005 16:53:20

Message: 6 of 32

In article <ddvkd9$7ar$1@ruby.cit.cornell.edu>,
 "Stephen Vavasis" <vavasis@cs.cornell.edu> wrote:

> A well-known issue with Matlab is that it performs badly if you repeatedly
> add elements to the end of an array like this:
>
> a = [];
> while <condition>
> a = [a; sin(x)];
> end
>
> (run time is quadratic -- see the timings at the end of this email). If
> you know in advance how large a should be, then you can simply declare it
> before the loop: a = zeros(total_a_size,1). But if you don't know the size,
> you can use the following technique to implement arrays that grow
> exponentially:

(snip)


> function a = testappend2(n)
> numtrial = 4;
> avetime = 0;
> for j = 1 : numtrial
> tic
> a = [];
> a_size = 0;
> for j = 1 : n
> a_size = j;
> if length(a) < a_size
> a = [a;zeros(max(a_size-length(a),floor(length(a)*.4))+4,1)];
> a(j) = sin(j);
> end
> end
> a = a(1:a_size);
> avetime = avetime + toc;
> end
> avetime = avetime / numtrial

I will note that testappend2 has a couple of serious bugs
in it. You should never use the same loop variable in
nested loops! You would see the problem by observing that
the two functions (testappend and testappend2) have
remarkably different outputs.

This might be a better test code:

function a = testappend2(n)
numtrial = 4;
avetime = 0;
for i = 1 : numtrial
  tic
  a = [];
  a_size = 0;
  for j = 1 : n
    a_size = j;
    if length(a) < a_size
      a = [a;zeros(max(a_size-length(a),floor(length(a)*.4))+4,1)];
    end
    a(j) = sin(j);
  end
  a = a(1:a_size);
  avetime = avetime + toc;
end
avetime = avetime / numtrial



The timings that you provide are not valid due to
those bugs. You might look at the function grow_array.
I wrote it some time ago in an attempt to speed up just
this process, allowing the user to append new chunks
more efficiently. I had also used a block allocation
scheme much like yours for a while, but since some of
my codes had to append many hundreds of thousands of
times I found it also inefficient.

http://www.mathworks.com/matlabcentral/fileexchange/loadFile.do?objectId=
7490&objectType=file

I used cell arrays in an attempt to speed up the
concatenation process. Note the reviewer's comment
however. Simple use of a cell array followed by
concatenation works better by far. Clearly there was
too much overhead in my code compared to the utter
simplicity of a cell array. Note that the code below
also works quite well, as long as n is not too large.


function a = testappend3(n)
numtrial = 4;
avetime = 0;
for j = 1 : numtrial
  tic
  a = {};
  for j = 1 : n
    a{j} = j;
  end
  a = cat(1,a{:});
  avetime = avetime + toc;
end
avetime = avetime / numtrial

 
Try testappend3(4000) - it is fairly efficient. Now try
testappend3(40000). I assume that concatenation of 40K
cells is the problem here.

As far as the block allocation scheme, the problem here
is handling millions of append operations. You need to
watch out for your array growing so large enough that
it begins to eat up a serious amount of ram. Once you
start to move into virtual memory, all bets are off.
Every time that you append a slew of zeros, the block
append approach needs to re-allocate the entire array.
In all seriousness, this can happen. It did to me, which
is why, for a while, I dropped the block append of zeros
in favor of the approach that grow_array uses.) Grow_array
also has its problems, I'd like to revisit it in order
to find a code that is optimal for any problem size.

John


--
The best material model of a cat is another, or
preferably the same, cat.
A. Rosenblueth, Philosophy of Science, 1945

Subject: growing array in Matlab?

From: Doug Schwarz

Date: 17 Aug, 2005 16:57:53

Message: 7 of 32

In article <ef11420.3@webx.raydaftYaTP>,
 "Steve Amphlett" <Firstname.Lastname@where_I_work.com> wrote:

> Brett Shoelson wrote:
> >
> > This is interesting. I wouldn't have guessed that growing cell
> > arrays would
> > have been so different from growing matrices. I'm curious what's
> > behind
> > this...how are cell arrays stored that make them so much more
> > efficient for
> > growing???
>
> This is probably complete rubbish, but my best guess is this:
>
> A cell array is a container for a load of pointers to the real data
> members, which are scattered randomly around memory. When you grow a
> cell array, you are growing it by one pointer rather than by the size
> of the new member.
>
> As I said, probably completely wrong, but you never know, it could be
> a reasonable description. I'm waiting for one of "The Big Boys" to
> pop by and tell us.

I don't know if I qualify as a "Big Boy", but your guess is almost
certainly 100% accurate.

Obviously, in addition to adding a pointer to the list you have to
allocate some memory for the cell contents, but that memory need not be
contiguous with any other block of memory. When you grow a matrix you
have to allocate space for the new larger matrix *and then copy all the
old matrix into the new location* before adding the new data. Finally,
you deallocate the old matrix.

Also, the array of pointers to the cells might be a linked list so you
don't have to reallocate an array of cell pointers when you grow the
cell array. Or not.

--
Doug Schwarz
dmschwarz&urgrad,rochester,edu
Make obvious changes to get real email address.

Subject: growing array in Matlab?

From: John D'Errico

Date: 17 Aug, 2005 17:04:53

Message: 8 of 32

In article <ef11420.3@webx.raydaftYaTP>,
 "Steve Amphlett" <Firstname.Lastname@where_I_work.com> wrote:

> Brett Shoelson wrote:
> >
> >
> >>
> >> a = [];
> >> x=(1:100)';
> >> tic
> >> for idx=1:1000
> >> a = [a; sin(x)];
> >> end
> >> toc
> >>
> >> clear b
> >> tic
> >> for idx=1:1000
> >> b(idx) = {sin(x)};
> >> end
> >> c=cell2mat(b);
> >> toc
> >>
> >> Elapsed time is 3.843000 seconds.
> >> Elapsed time is 0.078000 seconds.
> >
> >
> > This is interesting. I wouldn't have guessed that growing cell
> > arrays would
> > have been so different from growing matrices. I'm curious what's
> > behind
> > this...how are cell arrays stored that make them so much more
> > efficient for
> > growing???
>
> This is probably complete rubbish, but my best guess is this:
>
> A cell array is a container for a load of pointers to the real data
> members, which are scattered randomly around memory. When you grow a
> cell array, you are growing it by one pointer rather than by the size
> of the new member.
>
> As I said, probably completely wrong, but you never know, it could be
> a reasonable description. I'm waiting for one of "The Big Boys" to
> pop by and tell us.

I've played with this a bit at times. See my other
post in this thread. A problem is with a huge number
of appends. In some of my codes, I may need to append
potentially millions of times. When you try the above
codes for large numbers of appends then things get
interesting.

x=2;
clear b
tic
for idx=1:20000
 b(idx) = {sin(x)};
end
c=cell2mat(b);
toc


clear b
tic
for idx=1:40000
 b(idx) = {sin(x)};
end
c=cell2mat(b);
toc


Elapsed time is 4.547330 seconds.
Elapsed time is 17.345010 seconds.


Note that doubling the loop index does not double
the time.

The cell array approach works very well for small
enough numbers of cells. The block zero appends work
well as long as you don't need to re-allocate that
exponentially growing array too often. This can
cause your machine to wander into virtual memory -
a very bad thing for speed.

John


--
The best material model of a cat is another, or
preferably the same, cat.
A. Rosenblueth, Philosophy of Science, 1945

Subject: growing array in Matlab?

From: Brett Shoelson

Date: 17 Aug, 2005 13:22:32

Message: 9 of 32


<SNIP discussion of growing matrices, cell arrays>

> This is interesting. I wouldn't have guessed that growing cell arrays
> would have been so different from growing matrices. I'm curious what's
> behind this...how are cell arrays stored that make them so much more
> efficient for growing???
> Any thoughts?
> Brett

Thanks for the discussion...illuminating!
Brett
--
char(cumsum(...
[32 83 -11 7 -10 7 7 -4 -1 -46 40 -3 7 -3 15 -74 64 -5 -1 -58 57 8 7]))

Subject: growing array in Matlab?

From: Steve Amphlett

Date: 17 Aug, 2005 15:12:40

Message: 10 of 32

John D'Errico wrote:
>
>
> I've played with this a bit at times. See my other
> post in this thread. A problem is with a huge number
> of appends. In some of my codes, I may need to append
> potentially millions of times. When you try the above
> codes for large numbers of appends then things get
> interesting.
>
> x=2;
> clear b
> tic
> for idx=1:20000
> b(idx) = {sin(x)};
> end
> c=cell2mat(b);
> toc
>
>
> clear b
> tic
> for idx=1:40000
> b(idx) = {sin(x)};
> end
> c=cell2mat(b);
> toc
>
>
> Elapsed time is 4.547330 seconds.
> Elapsed time is 17.345010 seconds.
>
>
> Note that doubling the loop index does not double
> the time.
>
> The cell array approach works very well for small
> enough numbers of cells. The block zero appends work
> well as long as you don't need to re-allocate that
> exponentially growing array too often. This can
> cause your machine to wander into virtual memory -
> a very bad thing for speed.

My observation was that the bigger the appended items get, the better
the cell approach is. You're always going to be reallocating (new
memory followed by copying data) in any extension mode. Unless TMW
can do a linked-list cell array implementation. That would kick!
(1D easy, but what about more than that?)

Subject: growing array in Matlab?

From: Stephen Vavasis

Date: 17 Aug, 2005 15:16:16

Message: 11 of 32

John,

First, thanks for pointing out the obvious bug in my test case. I renamed
the outer loop variable, and I got similar asymptotic results (naive
algorithm runs quadratically, block-add runs linearly).

With regard to a function like "grow_array"-- I don't see how such an
approach could ever be subquadratic in the case of appending one element at
a time. The reason is that the idiom
     a = fun(a,b,c,d);
always makes a complete new copy of a, even if you change only one element,
because Matlab guarantees the "call-by-value" semantics. See below for some
more timings to illustrate this; you can tell that copying is taking place
because the running time is growing proportionally to m*n.

This is why in my original post I asked how this could be written in an
object-oriented manner; I don't see how to get around the call-by-value
limitation of Matlab (except by writing mex files).

With regard to the people who suggested I should use cell arrays -- it does
seem faster, but it also still seems to show quadratic growth, so this also
does not seem to solve the problem.

Thanks,
Steve


%%% Timing test to show that a=func(a) causes a to be recopied, even if only
one element changes.

function a = testwrite(a,i)
a(i) = 1;

function testwrite1(m,n)
a = zeros(m,1);
for j = 1 : n
  testwrite(a,1);
end

>> tic,testwrite1(1000,1000);,toc
Elapsed time is 0.047000 seconds.
>> tic,testwrite1(2000,2000);,toc
Elapsed time is 0.031000 seconds.
>> tic,testwrite1(4000,4000);,toc
Elapsed time is 0.345000 seconds.
>> tic,testwrite1(8000,8000);,toc
Elapsed time is 2.022000 seconds.
>> tic,testwrite1(16000,16000);,toc
Elapsed time is 7.743000 seconds.
>> tic,testwrite1(32000,32000);,toc
Elapsed time is 31.317000 seconds.

%%%% Timing test for cell arrays -- still quadratic although faster

function a = testappend4(n);
tic
a = {};
for j = 1 : n
  a(j) = {sin(j)};
end
toc

>> testappend4(4000);
Elapsed time is 0.047000 seconds.
>> testappend4(8000);
Elapsed time is 0.282000 seconds.
>> testappend4(8000);
Elapsed time is 0.250000 seconds.
>> testappend4(16000);
Elapsed time is 0.736000 seconds.
>> testappend4(16000);
Elapsed time is 0.611000 seconds.
>> testappend4(32000);
Elapsed time is 4.024000 seconds.
>> testappend4(32000);
Elapsed time is 4.416000 seconds.




"John D'Errico" <woodchips@rochester.rr.com> wrote in message
news:woodchips-EBB412.12532017082005@syrcnyrdrs-02-ge0.nyroc.rr.com...
> In article <ddvkd9$7ar$1@ruby.cit.cornell.edu>,
> "Stephen Vavasis" <vavasis@cs.cornell.edu> wrote:
>
>> A well-known issue with Matlab is that it performs badly if you
>> repeatedly
>> add elements to the end of an array like this:
>>
>> a = [];
>> while <condition>
>> a = [a; sin(x)];
>> end

[snip]

> The timings that you provide are not valid due to
> those bugs. You might look at the function grow_array.
> I wrote it some time ago in an attempt to speed up just
> this process, allowing the user to append new chunks
> more efficiently.

[snip]

Subject: growing array in Matlab?

From: Steve Amphlett

Date: 17 Aug, 2005 15:26:36

Message: 12 of 32

Stephen Vavasis wrote:
>
> With regard to the people who suggested I should use cell arrays --
> it does
> seem faster, but it also still seems to show quadratic growth, so
> this also
> does not seem to solve the problem.

Err, obviously. Were you expecting magic? Why not allocate the
entire available memory and then copy what you end up with back into
something suitably sized?

As I hinted, if you use cell arrays make your cells big.

Subject: growing array in Matlab?

From: W.Whiten@mailbox.uq.edu.au (Bill Whiten)

Date: 18 Aug, 2005 08:24:09

Message: 13 of 32

In article <ddvkd9$7ar$1@ruby.cit.cornell.edu>, "Stephen Vavasis"
<vavasis@cs.cornell.edu> wrote:

> A well-known issue with Matlab is that it performs badly if you repeatedly
> add elements to the end of an array like this:
>
> a = [];
> while <condition>
> a = [a; sin(x)];
> end
>


Some examples:

Using doubles:
>> clear a;tic;for i=1:10000;a(i)=i;end;toc
Elapsed time is 3.539078 seconds.
>> clear a;tic;for i=1:20000;a(i)=i;end;toc
Elapsed time is 12.559012 seconds.
>> clear a;tic;for i=1:30000;a(i)=i;end;toc
Elapsed time is 27.864958 seconds.
Time is roughly proportional to square of number of elements

>> clear a;tic;a=zeros(30000,1);for i=1:30000;a(i)=i;end;toc
Elapsed time is 0.263042 seconds.
Preallocation wims


Using cells
>> clear a;tic;for i=1:10000;a{i}=i;end;toc
Elapsed time is 2.539111 seconds.
>> clear a;tic;for i=1:20000;a{i}=i;end;toc
Elapsed time is 8.680755 seconds.
>> clear a;tic;for i=1:30000;a{i}=i;end;toc
Elapsed time is 18.302259 seconds.
Time is less but still proportional to square of number of elements

>> clear a;tic;a=cell(30000,1);for i=1:30000;a{i}=i;end;toc
Elapsed time is 0.395682 seconds.
Preallocation wims


Using a progressive preallocation:
>> clear a;tic;a=zeros(10,1);for i=1:10000;if(i>length(a))
a(round(i*1.6))=0;end;a(i)=i;end;toc
Elapsed time is 0.326124 seconds.
>> clear a;tic;a=zeros(10,1);for i=1:20000;if(i>length(a))
a(round(i*1.6))=0;end;a(i)=i;end;toc
Elapsed time is 0.609575 seconds.
>> clear a;tic;a=zeros(10,1);for i=1:30000;if(i>length(a))
a(round(i*1.6))=0;end;a(i)=i;end;toc
Elapsed time is 0.987948 seconds.

Time here is much less and increasing approximately linearly in the number
of elements. In theory time should be approximately proportion to
n*log(n).
Still slower than preallocation.

Regards,
-----------
Bill Whiten, W.Whiten@mailbox.uq.oz.au
Julius Kruttschnitt Mineral Research Centre,
The University Of Queensland, Tel: int +61 7 3365 5888
Isles Rd, Indooroopilly, Fax: int +61 7 3365 5999
Brisbane Qld 4068, AUSTRALIA.

Subject: growing array in Matlab?

From: John D'Errico

Date: 18 Aug, 2005 01:27:50

Message: 14 of 32

In article <de02f3$k3c$1@ruby.cit.cornell.edu>,
 "Stephen Vavasis" <vavasis@cs.cornell.edu> wrote:

> First, thanks for pointing out the obvious bug in my test case. I renamed
> the outer loop variable, and I got similar asymptotic results (naive
> algorithm runs quadratically, block-add runs linearly).
>
> With regard to a function like "grow_array"-- I don't see how such an
> approach could ever be subquadratic in the case of appending one element at
> a time. The reason is that the idiom
> a = fun(a,b,c,d);
> always makes a complete new copy of a, even if you change only one element,
> because Matlab guarantees the "call-by-value" semantics. See below for some
> more timings to illustrate this; you can tell that copying is taking place
> because the running time is growing proportionally to m*n.


Ok, I have a new version, written as an m-file, that is
truly linear in the time used.

It also has the advantage that it will not fail if
millions of rows are appended, as the simple cell array
solution does.

The solution that I chose was simple. I never return an
argument until the end. I store everything of interest
in persistent variables. Mainly one cell array, with big
cells to minimize the concatenate problems at the end.
Each individual cell is big enough that most of the time
its just a direct insertion into an existing preallocated
array. I do grow the cell size when large sets of data
are appended.

The flaws?

1. It cannot be used to append to two different arrays
at once, because of the persistent array implementation.

2. Its considerably faster than grow_array, but still
not as fast for small sets of data as direct use of a
cell array, or directly appending rows of zeros coupled
with direct indexing.

The reason it is slower than the direct solutions is
probably due to the function call overhead.

These flaws are not that bad, since I only would use
it in cases where I am forced to append potentially
millions of times and I can't predict the final array
size to preallocate.

Any ideas to improve on this?

John


========================================================
function A=growdata(newdata)
% incremental growth of an array by appending rows
%
% usage always involves 3 steps.
% (initial call): growdata
% (growth call): growdata(newdata)
% (final call): A = growdata;
%
%
% Example usage:
%
% growdata
% for i = 1:100000
% growdata(rand(1,3))
% end
% A = growdata;

% recall the stored data
persistent celldata n columncount rowcount totalrows
 
% which mode are we in?
if (nargout==0) && (nargin>0)
  % its a growth step
  
  % size of the appending data
  [r,c]=size(newdata);
  
  % was this the first call after initialization?
  if isnan(columncount)
    % this is actually an initialization step.
    % set the column count
    columncount = c;
    
    defaultrows = 20000;
    rowcount = max(defaultrows,r);
    
    celldata = {[newdata;zeros(rowcount - r,columncount)]};
    n = r;
    totalrows = n;
    
    if n==rowcount;
      celldata{end+1} = zeros(rowcount,columncount);
      n = 0;
    end
    
  else
    % its an appending call
    if c ~= columncount
      error 'Columns not compatible for appending'
    end
    
    % stuff into the last cell
    celldata{end}(n+(1:r),:) = newdata;
    n = n+r;
    totalrows = totalrows + r;
    
    % do we need to append another cell?
    if n>=rowcount
      rowcount = max(n,10*r);
      celldata{end+1} = zeros(rowcount,columncount);
      n = 0;
    end
    
  end
  
elseif (nargout==0) && (nargin==0)
  % its an initialization call, with no data
  columncount = NaN;
 
  defaultrows = 20000;
  rowcount = defaultrows;
    
  celldata = {zeros(rowcount,0)};
  n = 0;
  totalrows = 0;
  
elseif (nargout>0) && (nargin==0)
  % its an unpacking step
  
  % first drop any extraneous rows in the last cell
  celldata{end} = celldata{end}(1:n,:);
  
  % concatenate
  A = cat(1,celldata{:});
  
  % and clear the data
  clear celldata n columncount rowcount totalrows
  
else
  % cannot both append and unpack in the same step.
  error 'Cannot append more data and unpack in the same call'
end

Subject: growing array in Matlab?

From: John Creighton

Date: 17 Aug, 2005 21:42:22

Message: 15 of 32

John D'Errico wrote:
>
>
> In article <de02f3$k3c$1@ruby.cit.cornell.edu>,
> "Stephen Vavasis" <vavasis@cs.cornell.edu> wrote:
>
>> First, thanks for pointing out the obvious bug in my test case.
> I renamed
>> the outer loop variable, and I got similar asymptotic results
> (naive
>> algorithm runs quadratically, block-add runs linearly).
>>
>> With regard to a function like "grow_array"-- I don't see how
> such an
>> approach could ever be subquadratic in the case of appending
one
> element at
>> a time. The reason is that the idiom
>> a = fun(a,b,c,d);
>> always makes a complete new copy of a, even if you change only
> one element,
>> because Matlab guarantees the "call-by-value" semantics. See
> below for some
>> more timings to illustrate this; you can tell that copying is
> taking place
>> because the running time is growing proportionally to m*n.
>
>
> Ok, I have a new version, written as an m-file, that is
> truly linear in the time used.
>
> It also has the advantage that it will not fail if
> millions of rows are appended, as the simple cell array
> solution does.
>
> The solution that I chose was simple. I never return an
> argument until the end. I store everything of interest
> in persistent variables. Mainly one cell array, with big
> cells to minimize the concatenate problems at the end.
> Each individual cell is big enough that most of the time
> its just a direct insertion into an existing preallocated
> array. I do grow the cell size when large sets of data
> are appended.
>
> The flaws?
>
> 1. It cannot be used to append to two different arrays
> at once, because of the persistent array implementation.
>
> 2. Its considerably faster than grow_array, but still
> not as fast for small sets of data as direct use of a
> cell array, or directly appending rows of zeros coupled
> with direct indexing.
>
> The reason it is slower than the direct solutions is
> probably due to the function call overhead.
>
> These flaws are not that bad, since I only would use
> it in cases where I am forced to append potentially
> millions of times and I can't predict the final array
> size to preallocate.
>
> Any ideas to improve on this?
>
> John
>
>
> ========================================================
> function A=growdata(newdata)
> % incremental growth of an array by appending rows
> %
> % usage always involves 3 steps.
> % (initial call): growdata
> % (growth call): growdata(newdata)
> % (final call): A = growdata;
> %
> %
> % Example usage:
> %
> % growdata
> % for i = 1:100000
> % growdata(rand(1,3))
> % end
> % A = growdata;
>
> % recall the stored data
> persistent celldata n columncount rowcount totalrows
>
> % which mode are we in?
> if (nargout==0) && (nargin>0)
> % its a growth step
>
> % size of the appending data
> [r,c]=size(newdata);
>
> % was this the first call after initialization?
> if isnan(columncount)
> % this is actually an initialization step.
> % set the column count
> columncount = c;
>
> defaultrows = 20000;
> rowcount = max(defaultrows,r);
>
> celldata = {[newdata;zeros(rowcount - r,columncount)]};
> n = r;
> totalrows = n;
>
> if n==rowcount;
> celldata{end+1} = zeros(rowcount,columncount);
> n = 0;
> end
>
> else
> % its an appending call
> if c ~= columncount
> error 'Columns not compatible for appending'
> end
>
> % stuff into the last cell
> celldata{end}(n+(1:r),:) = newdata;
> n = n+r;
> totalrows = totalrows + r;
>
> % do we need to append another cell?
> if n>=rowcount
> rowcount = max(n,10*r);
> celldata{end+1} = zeros(rowcount,columncount);
> n = 0;
> end
>
> end
>
> elseif (nargout==0) && (nargin==0)
> % its an initialization call, with no data
> columncount = NaN;
>
> defaultrows = 20000;
> rowcount = defaultrows;
>
> celldata = {zeros(rowcount,0)};
> n = 0;
> totalrows = 0;
>
> elseif (nargout>0) && (nargin==0)
> % its an unpacking step
>
> % first drop any extraneous rows in the last cell
> celldata{end} = celldata{end}(1:n,:);
>
> % concatenate
> A = cat(1,celldata{:});
>
> % and clear the data
> clear celldata n columncount rowcount totalrows
>
> else
> % cannot both append and unpack in the same step.
> error 'Cannot append more data and unpack in the same call'
> end
>

There are enough people that want true pass by reference and
efficient features for data structures that it might be worth while
working together to write a data structure library for MATLAB that
stores everything in a global cell array. A lot would have to be
added including growing memory and garbage collection but I think it
would be a worth while project to take on if mathworks doesn’t want
to do it.

Subject: growing array in Matlab?

From: Loren Shure

Date: 18 Aug, 2005 07:24:56

Message: 16 of 32

In article <woodchips-EDAAE9.21275017082005@syrcnyrdrs-01-
ge0.nyroc.rr.com>, woodchips@rochester.rr.com says...
> In article <de02f3$k3c$1@ruby.cit.cornell.edu>,
> "Stephen Vavasis" <vavasis@cs.cornell.edu> wrote:
>
> > First, thanks for pointing out the obvious bug in my test case. I renamed
> > the outer loop variable, and I got similar asymptotic results (naive
> > algorithm runs quadratically, block-add runs linearly).
> >
> > With regard to a function like "grow_array"-- I don't see how such an
> > approach could ever be subquadratic in the case of appending one element at
> > a time. The reason is that the idiom
> > a = fun(a,b,c,d);
> > always makes a complete new copy of a, even if you change only one element,
> > because Matlab guarantees the "call-by-value" semantics. See below for some
> > more timings to illustrate this; you can tell that copying is taking place
> > because the running time is growing proportionally to m*n.
>
>
> Ok, I have a new version, written as an m-file, that is
> truly linear in the time used.
>
> It also has the advantage that it will not fail if
> millions of rows are appended, as the simple cell array
> solution does.
>
> The solution that I chose was simple. I never return an
> argument until the end. I store everything of interest
> in persistent variables. Mainly one cell array, with big
> cells to minimize the concatenate problems at the end.
> Each individual cell is big enough that most of the time
> its just a direct insertion into an existing preallocated
> array. I do grow the cell size when large sets of data
> are appended.
>
> The flaws?
>
> 1. It cannot be used to append to two different arrays
> at once, because of the persistent array implementation.
>
> 2. Its considerably faster than grow_array, but still
> not as fast for small sets of data as direct use of a
> cell array, or directly appending rows of zeros coupled
> with direct indexing.
>
> The reason it is slower than the direct solutions is
> probably due to the function call overhead.
>
> These flaws are not that bad, since I only would use
> it in cases where I am forced to append potentially
> millions of times and I can't predict the final array
> size to preallocate.
>
> Any ideas to improve on this?
>
> John
>
>
> ========================================================
> function A=growdata(newdata)
> % incremental growth of an array by appending rows
> %
> % usage always involves 3 steps.
> % (initial call): growdata
> % (growth call): growdata(newdata)
> % (final call): A = growdata;
> %
> %
> % Example usage:
> %
> % growdata
> % for i = 1:100000
> % growdata(rand(1,3))
> % end
> % A = growdata;
>
> % recall the stored data
> persistent celldata n columncount rowcount totalrows
>
> % which mode are we in?
> if (nargout==0) && (nargin>0)
> % its a growth step
>
> % size of the appending data
> [r,c]=size(newdata);
>
> % was this the first call after initialization?
> if isnan(columncount)
> % this is actually an initialization step.
> % set the column count
> columncount = c;
>
> defaultrows = 20000;
> rowcount = max(defaultrows,r);
>
> celldata = {[newdata;zeros(rowcount - r,columncount)]};
> n = r;
> totalrows = n;
>
> if n==rowcount;
> celldata{end+1} = zeros(rowcount,columncount);
> n = 0;
> end
>
> else
> % its an appending call
> if c ~= columncount
> error 'Columns not compatible for appending'
> end
>
> % stuff into the last cell
> celldata{end}(n+(1:r),:) = newdata;
> n = n+r;
> totalrows = totalrows + r;
>
> % do we need to append another cell?
> if n>=rowcount
> rowcount = max(n,10*r);
> celldata{end+1} = zeros(rowcount,columncount);
> n = 0;
> end
>
> end
>
> elseif (nargout==0) && (nargin==0)
> % its an initialization call, with no data
> columncount = NaN;
>
> defaultrows = 20000;
> rowcount = defaultrows;
>
> celldata = {zeros(rowcount,0)};
> n = 0;
> totalrows = 0;
>
> elseif (nargout>0) && (nargin==0)
> % its an unpacking step
>
> % first drop any extraneous rows in the last cell
> celldata{end} = celldata{end}(1:n,:);
>
> % concatenate
> A = cat(1,celldata{:});
>
> % and clear the data
> clear celldata n columncount rowcount totalrows
>
> else
> % cannot both append and unpack in the same step.
> error 'Cannot append more data and unpack in the same call'
> end
>

I have not tried anything here and don't know the performance
implications, but instead of persistent, what if you made your grow
array function return a handle to a nested function which stored the
data there? That way, you could have more than instance. Just a
thought...

--Loren

Subject: growing array in Matlab?

From: John D'Errico

Date: 18 Aug, 2005 12:28:50

Message: 17 of 32

In article <MPG.1d6e3a12f3c423c98992c@news.mathworks.com>,
 Loren Shure <loren.shure@mathworks.com> wrote:

> I have not tried anything here and don't know the performance
> implications, but instead of persistent, what if you made your grow
> array function return a handle to a nested function which stored the
> data there? That way, you could have more than instance. Just a
> thought...

Hi Loren,

Cute idea. I don't have enough experience with
function handles that I'd have thought of it myself.
Its like those anomalous functions. I'm still learning
to use them properly.

Thanks,
John

--
The best material model of a cat is another, or
preferably the same, cat.
A. Rosenblueth, Philosophy of Science, 1945

Subject: growing array in Matlab?

From: John D'Errico

Date: 18 Aug, 2005 13:18:20

Message: 18 of 32

In article <ef11420.13@webx.raydaftYaTP>,
 "John Creighton" <JohnCreighton_@hotmail.com> wrote:

> There are enough people that want true pass by reference and
> efficient features for data structures that it might be worth while
> working together to write a data structure library for MATLAB that
> stores everything in a global cell array. A lot would have to be
> added including growing memory and garbage collection but I think it
> would be a worth while project to take on if mathworks doesn’t want
> to do it.

I was thinking if this was the right way to go, but
I avoid it whenever possible. A mexed solution
requires compilation for different systems and must
often be recompiled with new releases of matlab.
Were TMW to provide such a tool, then they are the
ones who worry about those issues.

I'll stick with the m-file solution, even if it is
slower.

John

--
The best material model of a cat is another, or
preferably the same, cat.
A. Rosenblueth, Philosophy of Science, 1945

Subject: growing array in Matlab? (new code)

From: John D'Errico

Date: 18 Aug, 2005 15:38:00

Message: 19 of 32

In article
<woodchips-E55FA6.08284918082005@syrcnyrdrs-02-ge0.nyroc.rr.com>,
 John D'Errico <woodchips@rochester.rr.com> wrote:

> In article <MPG.1d6e3a12f3c423c98992c@news.mathworks.com>,
> Loren Shure <loren.shure@mathworks.com> wrote:
>
> > I have not tried anything here and don't know the performance
> > implications, but instead of persistent, what if you made your grow
> > array function return a handle to a nested function which stored the
> > data there? That way, you could have more than instance. Just a
> > thought...
>
> Hi Loren,
>
> Cute idea. I don't have enough experience with
> function handles that I'd have thought of it myself.
> Its like those anomalous functions. I'm still learning
> to use them properly.
>
> Thanks,
> John

It is an interesting idea. The code is below. It also
runs in linear time, as did my last attempt. But it is
also sadly twice as slow as was that last code. The
profile tool suggests that 47% of the time was spent
in function overhead. This explains where the doubled
time came from.

John


function funH=growdata2(appendmode)
% incremental growth of an array by appending rows (or columns)
%
% usage always involves 3 steps
% (initial call): fun1=growdata;
% (growth call): fun1(newdata)
% (final call): A = fun1();
%
% arguments: (input)
% appendmode - optional argument, which specifies how to append.
% appendmode == 'rows' --> append new data as rows
% of the final array.
%
% appendmode == 'columns' --> append new data as
% columns of the final array.
%
% DEFAULT: 'rows'
%
% arguments: (output)
% funH - function handle which will be used for the growth
% and terminal calls to unpack the grown array of
% data.
%
%
% Example usage 1:
%
% fun1 = growdata;
% for i = 1:100000
% fun1(i)
% end
% A = fun1();
%
% A will have size [100000,1]
%
%
% Example usage 2:
%
% Note: growdata uses a function handle to a nested function,
% so multiple arrays may be grown at the same time, simply by
% assigning a different function name on the first call to
% growdata. We can also change the appending style for each
% instance.
%
% fun1 = growdata; % append as new rows by default
% fun2 = growdata('columns'); % append as new columns
% fun3 = growdata('rows'); % append as new rows
% for i = 1:100000
% fun1(rand(2,3))
% fun2(sin(rand(2,1)))
% fun3(round(rand(1)*2),2)
% end
% A = fun1();
% B = fun2();
% C = fun3();
%
% In this case, A will have final size [200000,3], B will
% have size [2, 100000], and C will have a ramdom number of
% rows, with 2 columns.
 
 
% when growdata is called, a function handle to a nested
% function is returned.
% first, set up the variables we will need to remember
 
% check for appendmode
if (nargin<1)|isempty(appendmode)
  appendmode = 'rows';
end
valid = {'rows','columns'};
ind = strmatch(appendmode,valid);
if isempty(ind)
  error 'Appendmode must be ''rows'' or ''columns'' or any contraction'
end
% appendmode == 0 --> append as rows
% appendmode == 1 --> append as columns
appendmode = ind - 1;
 
% default number of rows in each cell
rowcount = 20000;
 
% we don't know how many columns
columncount = NaN;
celldata = {zeros(rowcount,0)};
n = 0;
totalrows = 0;
 
% Return a handle to the function which will actually
% do all the work
funH = @growthfun;
 
% ====================================================
% nested function definition
function A = growthfun(newdata)
 
% is this a growth call, or a final unpacking call?
if (nargout==0) && (nargin>0)
  % its a growth step
  
  % if appendmode says to use columns, then transpose the
  % data, append as rows. we will re-transpose at the very
  % end to undo this.
  if appendmode
    newdata = newdata';
  end
  
  % size of the appending data
  [r,c]=size(newdata);
  
  % was this the first call after initialization?
  if isnan(columncount)
    % first time called, we need to know the number
    % of columns to expect
    columncount = c;
    
    if r < rowcount
      % rowcount is large enough for now
      celldata = {[newdata;zeros(rowcount - r,columncount)]};
      n = r;
      totalrows = r;
    else
      % the first call overwhelmed rowcount, so make it larger
      n = 0
      totalrows = r;
      rowcount = 2*r;
      celldata = {newdata, zeros(rowcount,columncount)};
    end
  
  else
    % its an appending call after we have seen some data
    if c ~= columncount
      error '# of columns are incompatible for appending to this data'
    end
    
    % stuff into the last cell
    if (n+r)<rowcount
      celldata{end}(n+(1:r),:) = newdata;
      n = n+r;
      totalrows = totalrows + r;
    elseif (n+r)==rowcount
      % exactly filled that last cell, so add a new
      % (empty) cell on to the end
      celldata{end}(n+(1:r),:) = newdata;
      celldata{end+1} = zeros(rowcount,columncount);
      totalrows = totalrows + r;
      n = 0;
    else
      % we will overfill the last cell
      s = rowcount - n;
      celldata{end}(n+(1:s),:) = newdata;
      celldata{end+1} = newdata((s+1):end,:);
      totalrows = totalrows + r;
      n = size(celldata{end},1);
      
      if n>=rowcount
        % enlarge the cell size
        rowcount = n;
        celldata{end+1} = zeros(rowcount,columncount);
        n = 0;
      end
      
    end
  end
  
elseif (nargin==0)
  % its an unpacking step
  
  % first drop any extraneous rows in the last cell
  celldata{end} = celldata{end}(1:n,:);
  
  % concatenate
  if ~appendmode
    % as rows
    A = cat(1,celldata{:});
  else
    % as columns, but we did it as rows, so transpose
    A = cat(1,celldata{:})';
  end
  
  % and clear the data to return some memory
  clear celldata
  
elseif (nargout>0) && (nargin>0)
  % cannot both append and unpack in the same step.
  error 'Cannot append more data and unpack in the same call'
end
 
end % end nested function
end % end main fun

Subject: growing array in Matlab? (new code)

From: Stephen Vavasis

Date: 18 Aug, 2005 13:52:43

Message: 20 of 32

Dear Loren and John,

This is a feature of Matlab that is new to me and is very neat! It seems to
me that this technique (storing data in a function handle of a nested
function) may provide a workaround better than global variables to bypass
the "call by value" limitation of Matlab. Perhaps this trick will make it
possible to write objects that are efficient implementations of data
structures like heaps, in which some of the methods are supposed to update
the underlying data structure.

-- Steve


"John D'Errico" <woodchips@rochester.rr.com> wrote in message
news:woodchips-7262EC.11380018082005@syrcnyrdrs-03-ge0.nyroc.rr.com...
> In article
> <woodchips-E55FA6.08284918082005@syrcnyrdrs-02-ge0.nyroc.rr.com>,
> John D'Errico <woodchips@rochester.rr.com> wrote:
>
>> In article <MPG.1d6e3a12f3c423c98992c@news.mathworks.com>,
>> Loren Shure <loren.shure@mathworks.com> wrote:
>>
>> > I have not tried anything here and don't know the performance
>> > implications, but instead of persistent, what if you made your grow
>> > array function return a handle to a nested function which stored the
>> > data there? That way, you could have more than instance. Just a
>> > thought...
>> >
> It is an interesting idea. The code is below. It also
> runs in linear time, as did my last attempt. But it is
> also sadly twice as slow as was that last code. The
> profile tool suggests that 47% of the time was spent
> in function overhead. This explains where the doubled
> time came from.
>
> John
>

Subject: growing array in Matlab? (new code)

From: John Creighton

Date: 19 Aug, 2005 02:18:32

Message: 21 of 32

Stephen Vavasis wrote:
>
>
> Dear Loren and John,
>
> This is a feature of Matlab that is new to me and is very neat! It
> seems to
> me that this technique (storing data in a function handle of a
> nested
> function) may provide a workaround better than global variables to
> bypass
> the "call by value" limitation of Matlab. Perhaps this trick will
> make it
> possible to write objects that are efficient implementations of
> data
> structures like heaps, in which some of the methods are supposed to
> update
> the underlying data structure.
>
> -- Steve
>
>
> "John D'Errico" <woodchips@rochester.rr.com> wrote in message
>
news:woodchips-7262EC.11380018082005@syrcnyrdrs-03-ge0.nyroc.rr.com.
> ..
>> In article
>>
<woodchips-E55FA6.08284918082005@syrcnyrdrs-02-ge0.nyroc.rr.com>
,
>> John D'Errico <woodchips@rochester.rr.com> wrote:
>>
>>> In article
<MPG.1d6e3a12f3c423c98992c@news.mathworks.com>,
>>> Loren Shure <loren.shure@mathworks.com> wrote:
>>>
>>> > I have not tried anything here and don't know the
performance
>>> > implications, but instead of persistent, what if you
made your
> grow
>>> > array function return a handle to a nested function
which
> stored the
>>> > data there? That way, you could have more than
instance.
> Just a
>>> > thought...
>>> >
>> It is an interesting idea. The code is below. It also
>> runs in linear time, as did my last attempt. But it is
>> also sadly twice as slow as was that last code. The
>> profile tool suggests that 47% of the time was spent
>> in function overhead. This explains where the doubled
>> time came from.
>>
>> John
>>
>
>
I haven't read everything in this thread by what I gather is the data
is stored in a persistent variable and not a function handle. The
function handle just provides a key to that data. I'll read more to
see if I understand this right. I don't see anything wrong with the
global variable approach I will provide more information about this
in my next post.

Subject: growing array in Matlab?

From: John Creighton

Date: 19 Aug, 2005 02:26:00

Message: 22 of 32

John D'Errico wrote:
>
>
> In article <ef11420.13@webx.raydaftYaTP>,
> "John Creighton" <JohnCreighton_@hotmail.com> wrote:
>
>> There are enough people that want true pass by reference and
>> efficient features for data structures that it might be worth
> while
>> working together to write a data structure library for MATLAB
> that
>> stores everything in a global cell array. A lot would have to
be
>> added including growing memory and garbage collection but I
think
> it
>> would be a worth while project to take on if mathworks doesn’t
> want
>> to do it.
>
> I was thinking if this was the right way to go, but
> I avoid it whenever possible. A mexed solution
> requires compilation for different systems and must
> often be recompiled with new releases of matlab.
> Were TMW to provide such a tool, then they are the
> ones who worry about those issues.
>
> I'll stick with the m-file solution, even if it is
> slower.
>
> John
>
> --
> The best material model of a cat is another, or
> preferably the same, cat.
> A. Rosenblueth, Philosophy of Science, 1945
>
I didn't say anything about a mexed solution although that would add
speed.

Subject: growing array in Matlab? (new code)

From: John Creighton

Date: 19 Aug, 2005 03:03:27

Message: 23 of 32

Stephen Vavasis wrote:
>
>
> Dear Loren and John,
>
> This is a feature of Matlab that is new to me and is very neat! It
> seems to
> me that this technique (storing data in a function handle of a
> nested
> function) may provide a workaround better than global variables to
> bypass
> the "call by value" limitation of Matlab. Perhaps this trick will
> make it
> possible to write objects that are efficient implementations of
> data
> structures like heaps, in which some of the methods are supposed to
> update
> the underlying data structure.
>
> -- Steve
>
>
> "John D'Errico" <woodchips@rochester.rr.com> wrote in message
>
news:woodchips-7262EC.11380018082005@syrcnyrdrs-03-ge0.nyroc.rr.com.
> ..
>> In article
>>
<woodchips-E55FA6.08284918082005@syrcnyrdrs-02-ge0.nyroc.rr.com>
,
>> John D'Errico <woodchips@rochester.rr.com> wrote:
>>
>>> In article
<MPG.1d6e3a12f3c423c98992c@news.mathworks.com>,
>>> Loren Shure <loren.shure@mathworks.com> wrote:
>>>
>>> > I have not tried anything here and don't know the
performance
>>> > implications, but instead of persistent, what if you
made your
> grow
>>> > array function return a handle to a nested function
which
> stored the
>>> > data there? That way, you could have more than
instance.
> Just a
>>> > thought...
>>> >
>> It is an interesting idea. The code is below. It also
>> runs in linear time, as did my last attempt. But it is
>> also sadly twice as slow as was that last code. The
>> profile tool suggests that 47% of the time was spent
>> in function overhead. This explains where the doubled
>> time came from.
>>
>> John
>>
>
>
I was doing some thinking about function handles and while I think
the name space issue is not significant as we reserve names for
things anyway like keywords I think there could be some security
advantages of using persistent variables. If we used a persistent
variable in a function of a class I think ways could be devised to
limit access to that variable. However if a global variable is used
people could just look at the global data structure and see what is
in it. Well If people didn't know the name of the variable they
couldn't find the global variable. At least not without searching for
a while.

Subject: growing array in Matlab? (new code)

From: us

Date: 19 Aug, 2005 03:15:40

Message: 24 of 32

John Creighton:
<SNIP down to potential misunderstanding

> Well If people didn't know the name of the variable they couldn't
find the global variable. At least not without searching for a
while...

i wouldn't call

     whos global

searching for a while...

us

Subject: growing array in Matlab? (new code)

From: John Creighton

Date: 19 Aug, 2005 03:26:45

Message: 25 of 32

us wrote:
>
>
> John Creighton:
> <SNIP down to potential misunderstanding
>
>> Well If people didn't know the name of the variable they
couldn't
> find the global variable. At least not without searching for a
> while...
>
> i wouldn't call
>
> whos global
>
> searching for a while...
>
> us

Oh, that's good to know if I ever wanted to consider variable
security in MATLAB.

Subject: growing array in Matlab?

From: Jos

Date: 19 Aug, 2005 03:41:54

Message: 26 of 32

Steve Amphlett wrote:
>
>
> Steve Amphlett wrote:
>>
>>
> <snip, ways to grow arrays...
>
>> Have you tried growing a cell array and then when it's full,
>> cell2mat() the data out? I seem to remember someone saying
here
>> that
>> cell arrays grow well.
>
> I just slapped togetha test snippet and I think it's a winner:
>
> a = [];
> x=1;
> tic
> for idx=1:16000
> a = [a; sin(x)];
> end
> toc
>
> clear b
> tic
> for idx=1:16000
> b(idx) = {sin(x)};
> end
> c=cell2mat(b);
> toc
>
> Elapsed time is 4.422000 seconds.
> Elapsed time is 0.922000 seconds.
>
> And if I make x a vecor, the results are even more convincing:
>
> a = [];
> x=(1:100)';
> tic
> for idx=1:1000
> a = [a; sin(x)];
> end
> toc
>
> clear b
> tic
> for idx=1:1000
> b(idx) = {sin(x)};
> end
> c=cell2mat(b);
> toc
>
> Elapsed time is 3.843000 seconds.
> Elapsed time is 0.078000 seconds.

% Why is this
b = [] ; tic ; for idx = 1:20000, b(idx) = 2 ; end ; toc ;
% (slightly) faster than
b = [] ; tic ; for idx = 1:20000, b = [b ; 2] ; end ; toc
% ?

Jos

Subject: growing array in Matlab?

From: Steve Amphlett

Date: 19 Aug, 2005 04:24:52

Message: 27 of 32

Jos wrote:
>
>
> % Why is this
> b = [] ; tic ; for idx = 1:20000, b(idx) = 2 ; end ; toc ;
> % (slightly) faster than
> b = [] ; tic ; for idx = 1:20000, b = [b ; 2] ; end ; toc

Maybe it's because concatenation [b ; 2] needs to compare the sizes
of the concatenated arrays to work out if it's legal before doing it,
whereas b(idx) is always legal. Maybe concatenation and reassignment
has more intermediate steps.

BTW, I fount that "feature accel off" made no diffeence.

Subject: growing array in Matlab? (new code)

From: Loren Shure

Date: 19 Aug, 2005 07:40:11

Message: 28 of 32

In article <ef11420.19@webx.raydaftYaTP>, JohnCreighton_@hotmail.com
says...
> Stephen Vavasis wrote:
> >
> >
> > Dear Loren and John,
> >
> > This is a feature of Matlab that is new to me and is very neat! It
> > seems to
> > me that this technique (storing data in a function handle of a
> > nested
> > function) may provide a workaround better than global variables to
> > bypass
> > the "call by value" limitation of Matlab. Perhaps this trick will
> > make it
> > possible to write objects that are efficient implementations of
> > data
> > structures like heaps, in which some of the methods are supposed to
> > update
> > the underlying data structure.
> >
> > -- Steve
> >
> >
> > "John D'Errico" <woodchips@rochester.rr.com> wrote in message
> >
> news:woodchips-7262EC.11380018082005@syrcnyrdrs-03-ge0.nyroc.rr.com.
> > ..
> >> In article
> >>
> <woodchips-E55FA6.08284918082005@syrcnyrdrs-02-ge0.nyroc.rr.com>
> ,
> >> John D'Errico <woodchips@rochester.rr.com> wrote:
> >>
> >>> In article
> <MPG.1d6e3a12f3c423c98992c@news.mathworks.com>,
> >>> Loren Shure <loren.shure@mathworks.com> wrote:
> >>>
> >>> > I have not tried anything here and don't know the
> performance
> >>> > implications, but instead of persistent, what if you
> made your
> > grow
> >>> > array function return a handle to a nested function
> which
> > stored the
> >>> > data there? That way, you could have more than
> instance.
> > Just a
> >>> > thought...
> >>> >
> >> It is an interesting idea. The code is below. It also
> >> runs in linear time, as did my last attempt. But it is
> >> also sadly twice as slow as was that last code. The
> >> profile tool suggests that 47% of the time was spent
> >> in function overhead. This explains where the doubled
> >> time came from.
> >>
> >> John
> >>
> >
> >
> I haven't read everything in this thread by what I gather is the data
> is stored in a persistent variable and not a function handle. The
> function handle just provides a key to that data. I'll read more to
> see if I understand this right. I don't see anything wrong with the
> global variable approach I will provide more information about this
> in my next post.
>

Not true with nested functions that data is persistent, at least via the
persistent keyword in MATLAB and meaning only one instance. The
relevant workspace accompanies the handle to the nested function.
That's why you can have multiple instances.

--Loren

Subject: growing array in Matlab? (new code)

From: Loren Shure

Date: 19 Aug, 2005 07:42:41

Message: 29 of 32

In article <de2hup$qgu$1@ruby.cit.cornell.edu>, vavasis@cs.cornell.edu
says...
> Dear Loren and John,
>
> This is a feature of Matlab that is new to me and is very neat! It seems to
> me that this technique (storing data in a function handle of a nested
> function) may provide a workaround better than global variables to bypass
> the "call by value" limitation of Matlab. Perhaps this trick will make it
> possible to write objects that are efficient implementations of data
> structures like heaps, in which some of the methods are supposed to update
> the underlying data structure.
>
> -- Steve
>
[[[ stuff deleted ]]]
> >> >
> > It is an interesting idea. The code is below. It also
> > runs in linear time, as did my last attempt. But it is
> > also sadly twice as slow as was that last code. The
> > profile tool suggests that 47% of the time was spent
> > in function overhead. This explains where the doubled
> > time came from.
> >
> > John
> >
>
>


There was a post on this from a user sometime during the spring this
year. I can't find it now -- but he wrote an M-file to make a linked
list, I think, using nested functions.

--Loren

Subject: growing array in Matlab? (new code)

From: Steve Amphlett

Date: 19 Aug, 2005 07:51:36

Message: 30 of 32

Loren Shure wrote:
>
>
> In article <de2hup$qgu$1@ruby.cit.cornell.edu>,
> vavasis@cs.cornell.edu
> says...
>> Dear Loren and John,
>>
>> This is a feature of Matlab that is new to me and is very neat!
> It seems to
>> me that this technique (storing data in a function handle of a
> nested
>> function) may provide a workaround better than global variables
> to bypass
>> the "call by value" limitation of Matlab. Perhaps this trick
> will make it
>> possible to write objects that are efficient implementations of
> data
>> structures like heaps, in which some of the methods are
supposed
> to update
>> the underlying data structure.
>>
>> -- Steve
>>
> [[[ stuff deleted ]]]
>> >> >
>> > It is an interesting idea. The code is below. It also
>> > runs in linear time, as did my last attempt. But it is
>> > also sadly twice as slow as was that last code. The
>> > profile tool suggests that 47% of the time was spent
>> > in function overhead. This explains where the doubled
>> > time came from.
>> >
>> > John
>> >
>>
>>
>
>
> There was a post on this from a user sometime during the spring
> this
> year. I can't find it now -- but he wrote an M-file to make a
> linked
> list, I think, using nested functions.

I think this may be the post:

sturla.molden, "Programming data structures in Matlab 14 (HOWTO)" #, 23 Oct 2004 10:11 pm </WebX?50@@.eeedfcb>

Subject: growing array in Matlab? (new code)

From: Jim Simos

Date: 20 Aug, 2005 09:43:56

Message: 31 of 32

Dear John D'Errico,

                 Firstly,many thanks for your growdata function.I
have an algorithm tha need to create a large matrix by continously
appending vectos as rows to the final matrix.My first implementation
involving vertcat with cell arrays was rather slow in comparison to
yours.Some relults for the elapsed time:
Size of matrix A created--> 2114882 rows x 16 columns
My implementation gave about 3600 sec.
When I used your growdata for appending data (with a slight change to
the newrows=200000 instead of 20000 which you had),the elapsed time
was 310 sec.Of course this change is faster for numel(A)>2000000
in my case.
A suggestion,is tha you could integrate an if condition when the
maximum limit of the integer representation is reached,and pack the
memory.

Friendly,

Jim

Subject: growing array in Matlab? (new code)

From: John D'Errico

Date: 20 Aug, 2005 17:08:05

Message: 32 of 32

In article <ef11420.29@webx.raydaftYaTP>,
 "Jim Simos" <jsimos@math.uoa.gr> wrote:

> Firstly,many thanks for your growdata function.I
> have an algorithm tha need to create a large matrix by continously
> appending vectos as rows to the final matrix.My first implementation
> involving vertcat with cell arrays was rather slow in comparison to
> yours.Some relults for the elapsed time:
> Size of matrix A created--> 2114882 rows x 16 columns
> My implementation gave about 3600 sec.
> When I used your growdata for appending data (with a slight change to
> the newrows=200000 instead of 20000 which you had),the elapsed time
> was 310 sec.Of course this change is faster for numel(A)>2000000
> in my case.
> A suggestion,is tha you could integrate an if condition when the
> maximum limit of the integer representation is reached,and pack the
> memory.

Thanks for the feedback. This is the class of problem
I'm aiming at, where the user needs to concatenate
potentially millions of elements of possibly random
numbers of rows.

I will clean these codes up a bit and put them on
matlab central. I'll see about adding a user
controllable block size, since some users may not
want to grab such large chunks of memory at a time.

John


--
The best material model of a cat is another, or
preferably the same, cat.
A. Rosenblueth, Philosophy of Science, 1945

Tags for this Thread

No tags are associated with this thread.

What are tags?

A tag is like a keyword or category label associated with each thread. Tags make it easier for you to find threads of interest.

Anyone can tag a thread. Tags are public and visible to everyone.

Contact us