From: "John D'Errico" <woodchips@rochester.rr.com>
Path: news.mathworks.com!newsfeed-00.mathworks.com!webcrossing
Newsgroups: comp.soft-sys.matlab
Subject: Re: decrease in speed due to appending a row vector
Message-ID: <ef580c1.5@webcrossing.raydaftYaTP>
Date: Thu, 24 May 2007 22:50:37 -0400
References: <ef580c1.-1@webcrossing.raydaftYaTP> <ef580c1.1@webcrossing.raydaftYaTP> <ef580c1.2@webcrossing.raydaftYaTP> <1180055867.161190.184610@u36g2000prd.googlegroups.com>
Lines: 164
NNTP-Posting-Host: 66.66.16.32
MIME-Version: 1.0
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 8bit
Xref: news.mathworks.com comp.soft-sys.matlab:410979



NZTideMan wrote:
>
>
> On May 25, 12:35 pm, "John D'Errico"
<woodch...@rochester.rr.com>
> wrote:
>> Brian wrote:
>>
>> > I don't know how big will the vector grow each time. The
space
>> > depends on the input data which will change most of the
time
> and I
>> > have no way of know the final size.
>>
>> Take a look at my growdata or growdata2
>> codes. They came from discussions here
>> on c.s-s.m, and were written to allow
>> arrays to grow in size with a minimum
>> penalty in time. My goal was to allow
>> arrays to grow in size to potentially
>> millions of elements, without advance
>> knowledge of the final count. These codes
>> use cell arrays, each cell of which is
>> preallocated to use moderately large
>> but still manageable chunks of memory.
>> Arrays with huge numbers of cells can
>> have problems too.
>>
>> <http://www.mathworks.com/matlabcentral/fileexchange/loadFile.do?objec...>
>>
>> Of course, its always better to just
>> preallocate your large arrays whenever
>> you can.
>>
>> HTh,
>> John
>
> I didn't know about your growdata routines, John, so I have devised
> the following way for accumulating results of a Monte Carlo
> simulation:
> I generate a structure:
> s(irun).ymx=a;
> where irun is the Monte Carlo run number, and the result of the
> simulation a(1,n) is a row vector where n is the number of events
> (drawn from a Poisson distribution). n varies from run to run.
> I understand that when you do it this way, each array in s uses
> contiguous memory, but they are not contiguous within the
> structure.
> Thus, the memory allocation problem does not arise.
> I'd appreciate your comments on this.

This is also a reassonable way to solve the
problem.

Use of a single cell for each new appended
element is another way that is very similar
to the structure approach that you used.
Use of pointers here is more efficient
than growing and re-allocating the entire
array. I'm not sure about use of structures,
as I thought that structures were not
efficient users of memory as compared to
cell arrays. This is probably changing
with each matlab release of course. (See
below for a comparison.)

An issue is how many total elements that
you may expect to see. In some earlier
testing, I found that cell arrays with
millions of cells caused some serious
problems, however a few thousand, or even
ten thousand or so cells was not a problem.

I am starting to wonder if its time to
reassess those growdata codes as Matlab
evolves. In some recent tests it seemed
that the problem I had seen with creating
large cell arrays with millions of elements
may no longer be as much of an issue in
recent releases. The underlying matlab
codes must have been improved. I'll want
to compare how use of a structure compares
to a cell array for memory and speed.

Some quick testing shows a structure and
a cell array to be similar in time and
memory here.

tic
for i = 1:1000
  a(i).X =rand(1,3);
end,
whos a
toc

  Name Size Bytes Class Attributes
  a 1x1000 84064 struct

Elapsed time is 0.023403 seconds.

tic
for i = 1:1000
  b{i} =rand(1,3);
end
whos b
toc

  Name Size Bytes Class Attributes
  b 1x1000 84000 cell

Elapsed time is 0.022017 seconds.

The question is, do these scale well with
larger numbers of append operations?

tic
for i = 1:100000
  a(i).X =rand(1,3);
end
whos a
toc

  Name Size Bytes Class Attributes
  a 1x100000 8400064 struct

Elapsed time is 118.482410 seconds.

This has grown by more than 100x. In fact,
it grew in time by a factor of roughly
10000, or 100^2. It is this quadratic
growth in time that I was trying to avoid
with the growdata codes.

How about growdata2 here?

tic
g=growdata2;
for i = 1:100000
  g(rand(1,3))
end
g=g();
whos g
toc

  Name Size Bytes Class Attributes
  g 100000x3 2400000 double

Elapsed time is 12.672389 seconds.

So if you are appending only a few thousand
elements, then either the structure or the
cell array will work quite well. If you
expect to append hundreds of thousands or
more chunks, then the hybrid approach in
the growdata tools is still more reasonable.

Your mileage may vary with a newer release
yet than R2006b. I've not gotten the latest
release installed until I install a new OS
version here.

John