Path: news.mathworks.com!newsfeed-00.mathworks.com!newsfeed2.dallas1.level3.net!news.level3.com!postnews.google.com!r3g2000prh.googlegroups.com!not-for-mail
From: NZTideMan <mulgor@gmail.com>
Newsgroups: comp.soft-sys.matlab
Subject: Re: decrease in speed due to appending a row vector
Date: 24 May 2007 20:35:02 -0700
Organization: http://groups.google.com
Lines: 181
Message-ID: <1180064102.055006.108000@r3g2000prh.googlegroups.com>
References: <ef580c1.-1@webcrossing.raydaftYaTP>
NNTP-Posting-Host: 202.78.152.105
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
X-Trace: posting.google.com 1180064153 25486 127.0.0.1 (25 May 2007 03:35:53 GMT)
X-Complaints-To: groups-abuse@google.com
NNTP-Posting-Date: Fri, 25 May 2007 03:35:53 +0000 (UTC)
In-Reply-To: <ef580c1.5@webcrossing.raydaftYaTP>
User-Agent: G2/1.0
X-HTTP-UserAgent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Maxthon; .NET CLR 1.1.4322),gzip(gfe),gzip(gfe)
Complaints-To: groups-abuse@google.com
Injection-Info: r3g2000prh.googlegroups.com; posting-host=202.78.152.105;
Xref: news.mathworks.com comp.soft-sys.matlab:410984



On May 25, 2:50 pm, "John D'Errico" <woodch...@rochester.rr.com>
wrote:
> NZTideMan wrote:
>
> > On May 25, 12:35 pm, "John D'Errico"
>
> <woodch...@rochester.rr.com>
>
>
>
>
>
> > wrote:
> >> Brian wrote:
>
> >> > I don't know how big will the vector grow each time. The
> space
> >> > depends on the input data which will change most of the
> time
> > and I
> >> > have no way of know the final size.
>
> >> Take a look at my growdata or growdata2
> >> codes. They came from discussions here
> >> on c.s-s.m, and were written to allow
> >> arrays to grow in size with a minimum
> >> penalty in time. My goal was to allow
> >> arrays to grow in size to potentially
> >> millions of elements, without advance
> >> knowledge of the final count. These codes
> >> use cell arrays, each cell of which is
> >> preallocated to use moderately large
> >> but still manageable chunks of memory.
> >> Arrays with huge numbers of cells can
> >> have problems too.
>
> >> <http://www.mathworks.com/matlabcentral/fileexchange/loadFile.do?objec...>
>
> >> Of course, its always better to just
> >> preallocate your large arrays whenever
> >> you can.
>
> >> HTh,
> >> John
>
> > I didn't know about your growdata routines, John, so I have devised
> > the following way for accumulating results of a Monte Carlo
> > simulation:
> > I generate a structure:
> > s(irun).ymx=a;
> > where irun is the Monte Carlo run number, and the result of the
> > simulation a(1,n) is a row vector where n is the number of events
> > (drawn from a Poisson distribution). n varies from run to run.
> > I understand that when you do it this way, each array in s uses
> > contiguous memory, but they are not contiguous within the
> > structure.
> > Thus, the memory allocation problem does not arise.
> > I'd appreciate your comments on this.
>
> This is also a reassonable way to solve the
> problem.
>
> Use of a single cell for each new appended
> element is another way that is very similar
> to the structure approach that you used.
> Use of pointers here is more efficient
> than growing and re-allocating the entire
> array. I'm not sure about use of structures,
> as I thought that structures were not
> efficient users of memory as compared to
> cell arrays. This is probably changing
> with each matlab release of course. (See
> below for a comparison.)
>
> An issue is how many total elements that
> you may expect to see. In some earlier
> testing, I found that cell arrays with
> millions of cells caused some serious
> problems, however a few thousand, or even
> ten thousand or so cells was not a problem.
>
> I am starting to wonder if its time to
> reassess those growdata codes as Matlab
> evolves. In some recent tests it seemed
> that the problem I had seen with creating
> large cell arrays with millions of elements
> may no longer be as much of an issue in
> recent releases. The underlying matlab
> codes must have been improved. I'll want
> to compare how use of a structure compares
> to a cell array for memory and speed.
>
> Some quick testing shows a structure and
> a cell array to be similar in time and
> memory here.
>
> tic
> for i = 1:1000
>   a(i).X =rand(1,3);
> end,
> whos a
> toc
>
>   Name Size Bytes Class Attributes
>   a 1x1000 84064 struct
>
> Elapsed time is 0.023403 seconds.
>
> tic
> for i = 1:1000
>   b{i} =rand(1,3);
> end
> whos b
> toc
>
>   Name Size Bytes Class Attributes
>   b 1x1000 84000 cell
>
> Elapsed time is 0.022017 seconds.
>
> The question is, do these scale well with
> larger numbers of append operations?
>
> tic
> for i = 1:100000
>   a(i).X =rand(1,3);
> end
> whos a
> toc
>
>   Name Size Bytes Class Attributes
>   a 1x100000 8400064 struct
>
> Elapsed time is 118.482410 seconds.
>
> This has grown by more than 100x. In fact,
> it grew in time by a factor of roughly
> 10000, or 100^2. It is this quadratic
> growth in time that I was trying to avoid
> with the growdata codes.
>
> How about growdata2 here?
>
> tic
> g=growdata2;
> for i = 1:100000
>   g(rand(1,3))
> end
> g=g();
> whos g
> toc
>
>   Name Size Bytes Class Attributes
>   g 100000x3 2400000 double
>
> Elapsed time is 12.672389 seconds.
>
> So if you are appending only a few thousand
> elements, then either the structure or the
> cell array will work quite well. If you
> expect to append hundreds of thousands or
> more chunks, then the hybrid approach in
> the growdata tools is still more reasonable.
>
> Your mileage may vary with a newer release
> yet than R2006b. I've not gotten the latest
> release installed until I install a new OS
> version here.
>
> John- Hide quoted text -
>
> - Show quoted text -

That's pretty convincing!
I guess the lesson is that if you have toy problems, using a structure
works fine, but for real-world problems growdata2 is the way to go.
But an advantage of the structure approach is that at the end I have
all the runs separated.
Of course, I can concatenate them if I choose using square brackets.