
Thread Subject: Appending to arrays in a loop efficiently

Subject: Appending to arrays in a loop efficiently

From: Szabolcs

Date: 21 Apr, 2008 08:19:10

Message: 1 of 6


When appending to arrays repeatedly in a loop, the Matlab editor warns
about the performance impact:

"'v' might be growing inside a loop. Consider preallocating for
speed."

The problem is that I cannot preallocate because I do not know the
final size of the array. How could I solve this problem in this case?

Does Matlab have any data structures that are optimized for appending
elements (e.g. linked lists)?

Another possibility is to double the size of the array manually each
time I run out of space, and truncate it again when the computation is
finished. But this is a lot of work and extra code. I was hoping
that Matlab could do this automatically for me. (For those familiar
with Mathematica, does Matlab offer anything similar to Sow and Reap?)
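To make the doubling idea concrete, here is a rough, untested sketch of what I mean (variable names are just for illustration):

```matlab
% Manual doubling: keep a buffer, double its size when full, trim at the end.
buf = zeros(1, 16);   % initial capacity
n = 0;                % number of elements actually stored
for k = 1:1000
    x = rand;                      % next value to append
    if n == numel(buf)
        buf(2*numel(buf)) = 0;     % indexing past the end doubles the buffer
    end
    n = n + 1;
    buf(n) = x;
end
v = buf(1:n);                      % truncate to the real size
```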

Thanks for any suggestions in advance,
Szabolcs Horvát

P.S. I am new to Matlab. Sorry if this is a very basic question. I
haven't been able to find an answer by searching the web.

Subject: Appending to arrays in a loop efficiently

From: John D'Errico

Date: 21 Apr, 2008 10:30:05

Message: 2 of 6

Szabolcs <szhorvat@gmail.com> wrote in message <93ba9462-f7d1-4b18-
9424-69a615127dc0@a1g2000hsb.googlegroups.com>...

*snip*

The simplest solution is to use a cell array.

Put each element in a new cell as you add
it to the list. At the very end, you use cell2mat
to combine all into one flat array. This works
nicely as long as you do not get too many cells.
Millions of cells will cause problems, because
Matlab still does need to allocate space for
each cell. I've found that a few thousand cells
will generally not be an issue.
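A minimal sketch of the idea (here with scalar appends, just as an
illustration):

```matlab
% Cell append trick: grow a cell array in the loop, flatten once at the end.
c = {};
for k = 1:1000
    c{end+1} = k^2;      % each new element goes into its own cell
end
v = cell2mat(c);         % one flat numeric vector, built in a single pass
```

Growing the cell array is cheap compared to growing a numeric
array, since only the cell pointers get copied, not the data.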

If you want some code that does it all for
you, I wrote a couple of functions from the
conclusions of a previous thread where this
question was asked. You can find it here:

http://www.mathworks.com/matlabcentral/fileexchange/loadFile.do?objectId=8334&objectType=file

These functions work with the cell array trick,
with an extra twist. They try not to append
millions of new cells. This helps when you
have a huge number of elements to append.

John

Subject: Appending to arrays in a loop efficiently

From: Szabolcs

Date: 21 Apr, 2008 11:41:58

Message: 3 of 6

On Apr 21, 12:30 pm, "John D'Errico" <woodch...@rochester.rr.com>
wrote:

*snip*

> The simplest solution is to use a cell array.
>
> Put each element in a new cell as you add
> it to the list. At the very end, you use cell2mat
> to combine all into one flat array. This works
> nicely as long as you do not get too many cells.
> Millions of cells will cause problems, because
> Matlab still does need to allocate space for
> each cell. I've found that a few thousand cells
> will generally not be an issue.

Hi John,

Thanks for the reply, but unfortunately I am not sure I understand it
completely.

This is my first encounter with cell arrays. If I understand it
correctly, the only difference between normal arrays and cell arrays
is that the latter can hold inhomogeneous data types. So using
something like

a = {}
for k=1:10
  a = [a 1];
end

is still slow (and the editor still warns about the need for
reallocation).

Using

a = {}
for k=1:10
  a = {a 1};
end

appears to build a linked list (so this should be fast), but I could
not figure out how to flatten it in the end (cell2mat does not work on
this data structure). Is there a built-in way to flatten this data
structure, or did I misunderstand this completely?

I wrote a function to flatten lists (attached at the end of the
message), and benchmarked the following:

tic
a = {};
for k=1:10000, a = {a, k}; end
toc
flatten_cell(a);
toc

tic
a = [];
for k=1:10000, a = [a k]; end
toc

It appears that the time taken by the second approach grows roughly as
n^2 where n is the length of the list, while the first one is linear
(as expected). But the difference in absolute timings is not a great
one (i.e. n has to be large for the first approach to be really
faster). So this does not seem like a good solution.

Also, as I said I am a newbie, so any comments on my code are most
welcome (what is it that I'm doing right, where do I take an awkward
approach, and what is completely wrong, etc.)

Szabolcs

function a = flatten_cell(c)

cc = c; % I'm not sure whether this triggers a deep copy ...
count = 0;
while ~isempty(cc)
% I'm not sure about the following either,
% but if it really did a deep copy, then the complexity
% would not be linear ... So most probably it doesn't do it.
    cc = cc{1};
    count = count+1;
end

cc = c;
a = zeros(1,count);
for k=count:-1:1
    a(k) = cc{2};
    cc = cc{1};
end

end

Subject: Appending to arrays in a loop efficiently

From: John D'Errico

Date: 21 Apr, 2008 12:27:01

Message: 4 of 6

Szabolcs <szhorvat@gmail.com> wrote in message <b2447fe0-eb20-4ecd-
bf76-fbc6024fdf7c@f24g2000prh.googlegroups.com>...

*snip*

The basic cell trick is simple. I'll fix it so I
don't know how many elements I'll end up
with. It should be roughly 15000 elements.

% simple concatenated growth (quadratic)
tic
C = [];
for i = 1:10000
  C = [C;rand(round(rand(1))+1,2)];
end
toc
Elapsed time is 9.992871 seconds.

% cell growth
tic
C = {};
for i = 1:10000
  C{i,1} = rand(round(rand(1))+1,2);
end
C = cell2mat(C);
toc
Elapsed time is 2.167827 seconds.

The simple cell growing is faster, but it is still
quadratic in its behavior. Note that doubling the
number of appends (and the size of C) multiplies
the time by more than 2. It should be roughly
a 4 to 1 increase.

% cell growth
tic
C = {};
for i = 1:20000
  C{i,1} = rand(round(rand(1))+1,2);
end
C = cell2mat(C);
toc
Elapsed time is 10.120017 seconds.

Growdata is better behaved. It works by
preallocating a moderately large numeric array
inside each cell, then it uses direct inserts into
these flat arrays. When it runs out of room in
a flat array, it does not grow that array, it
starts a new cell, with a newly preallocated
array.
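
The chunking idea can be sketched like this (this is just the
strategy, not growdata's real interface; names are made up):

```matlab
% Chunked preallocation: fill preallocated blocks, start a new block when
% one fills up, and flatten everything only once at the very end.
chunk  = 1000;               % elements preallocated per chunk
chunks = {zeros(chunk, 1)};  % cell array of preallocated blocks
used   = 0;                  % elements used in the current (last) chunk
for k = 1:25000
    if used == chunk
        chunks{end+1} = zeros(chunk, 1);  % fresh preallocated chunk
        used = 0;
    end
    used = used + 1;
    chunks{end}(used) = rand;             % direct insert, no reallocation
end
chunks{end} = chunks{end}(1:used);        % trim the partially filled chunk
v = vertcat(chunks{:});                   % flatten once at the end
```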

% use of growdata
tic
growdata
for i = 1:10000
  growdata(rand(round(rand(1))+1,2));
end
C = growdata;
toc
Elapsed time is 1.896589 seconds.

But, note that growdata is now much more
linear in its time behavior.

% growdata
tic
growdata
for i = 1:20000
  growdata(rand(round(rand(1))+1,2));
end
C = growdata;
toc
Elapsed time is 3.748545 seconds.

Doubling the number of appends only
doubled the time required.

Because growdata uses a persistent array
to store its work, a downside of it is you
cannot grow two different arrays in
parallel. For this reason, I wrote growdata2.
Growdata2 uses a nested function strategy,
returning a function handle to work with.
This allows you to grow multiple arrays
at once.

In the older release of Matlab that I wrote
growdata2 to run in, the nested-function
implementation was slower than the
persistent version in growdata. This
appears to have been remedied in my
newer release of Matlab.

% Use of growdata2
tic
H = growdata2;
for i = 1:10000
  H(rand(round(rand(1))+1,2));
end
C = H();
toc
Elapsed time is 1.687099 seconds.

% Growdata2 is also roughly linear
tic
H = growdata2;
for i = 1:20000
  H(rand(round(rand(1))+1,2));
end
C = H();
toc
Elapsed time is 2.911826 seconds.

John

Subject: Appending to arrays in a loop efficiently

From: Steven Lord

Date: 21 Apr, 2008 14:58:16

Message: 5 of 6


"John D'Errico" <woodchips@rochester.rr.com> wrote in message
news:fui16l$2cr$1@fred.mathworks.com...
> Szabolcs <szhorvat@gmail.com> wrote in message <b2447fe0-eb20-4ecd-
> bf76-fbc6024fdf7c@f24g2000prh.googlegroups.com>...
>
> *snip*
>
> The basic cell trick is simple. I'll fix it so I
> don't know how many elements I'll end up
> with. It should be roughly 15000 elements.
>
> % simple concatenated growth (quadratic)
> tic
> C = [];
> for i = 1:10000
> C = [C;rand(round(rand(1))+1,2)];
> end
> toc
> Elapsed time is 9.992871 seconds.

Another solution to this type of problem, where you're not sure how long
your array will be to start out with, is to use an upper bound on the number
of elements as the size to which you preallocate and trim the excess at the
end. If you don't overestimate by too much, this can be pretty quick.


clear all
% simple concatenated growth (quadratic)
rand('twister', 5409);
tic
C1 = [];
for i = 1:10000
  a = rand(round(rand(1))+1,2);
  C1 = [C1; a];
end
toc

% "Overestimate" preallocation
rand('twister', 5409);
tic
% Since each a below can have at most 2 rows, and there are 10000 of them,
% we know that the combination of all of them can have no more than 20000 rows
C2 = zeros(20000, 2);
counter = 0;
for k = 1:10000
  a = rand(round(rand)+1, 2);
  % Assign a to certain rows of C2
  C2(counter +(1:size(a, 1)), :) = a;
  counter = counter + size(a, 1);
end
C2 = C2(1:counter, :);
toc
isequal(C1, C2)


> % cell growth
> tic
> C = {};
> for i = 1:10000
> C{i,1} = rand(round(rand(1))+1,2);
> end
> C = cell2mat(C);
> toc
> Elapsed time is 2.167827 seconds.

The "overestimate" preallocation was quicker than both simple concatenated
growth and cell growth in R2008a on my Windows XP machine. Of course, in
order to use this, you have to have some knowledge of how your function
behaves to know how much to preallocate ... but ideally you'll have that
knowledge of how you want your function to behave, since you're writing it.

*snip*

--
Steve Lord
slord@mathworks.com

Subject: Appending to arrays in a loop efficiently

From: Szabolcs

Date: 21 Apr, 2008 18:16:28

Message: 6 of 6


John, Steve,

Thanks for the detailed replies!
