Path: news.mathworks.com!not-for-mail
From: Loren Shure <loren@mathworks.com>
Newsgroups: comp.soft-sys.matlab
Subject: Re: decrease in speed due to appending a row vector
Date: Mon, 28 May 2007 01:23:09 -0400
Organization: The MathWorks
Lines: 174
Message-ID: <MPG.20c4314b49ef68e7989746@news.mathworks.com>
References: <ef580c1.-1@webcrossing.raydaftYaTP> <ef580c1.1@webcrossing.raydaftYaTP> <ef580c1.2@webcrossing.raydaftYaTP> <1180055867.161190.184610@u36g2000prd.googlegroups.com> <ef580c1.5@webcrossing.raydaftYaTP>
NNTP-Posting-Host: shurel.mathworks.co.kr
Mime-Version: 1.0
Content-Type: text/plain; charset="iso-8859-15"
Content-Transfer-Encoding: 7bit
X-Trace: fred.mathworks.com 1180329794 2861 172.18.27.116 (28 May 2007 05:23:14 GMT)
X-Complaints-To: news@mathworks.com
NNTP-Posting-Date: Mon, 28 May 2007 05:23:14 +0000 (UTC)
User-Agent: MicroPlanet-Gravity/2.70.2067
Xref: news.mathworks.com comp.soft-sys.matlab:411392



In article <ef580c1.5@webcrossing.raydaftYaTP>, 
woodchips@rochester.rr.com says...
> NZTideMan wrote:
> >
> >
> > On May 25, 12:35 pm, "John D'Errico"
> <woodch...@rochester.rr.com>
> > wrote:
> >> Brian wrote:
> >>
> >> > I don't know how big will the vector grow each time. The
> space
> >> > depends on the input data which will change most of the
> time
> > and I
> >> > have no way of know the final size.
> >>
> >> Take a look at my growdata or growdata2
> >> codes. They came from discussions here
> >> on c.s-s.m, and were written to allow
> >> arrays to grow in size with a minimum
> >> penalty in time. My goal was to allow
> >> arrays to grow in size to potentially
> >> millions of elements, without advance
> >> knowledge of the final count. These codes
> >> use cell arrays, each cell of which is
> >> preallocated to use moderately large
> >> but still manageable chunks of memory.
> >> Arrays with huge numbers of cells can
> >> have problems too.
> >>
> >> <http://www.mathworks.com/matlabcentral/fileexchange/loadFile.do?objec...>
> >>
> >> Of course, its always better to just
> >> preallocate your large arrays whenever
> >> you can.
> >>
> >> HTh,
> >> John
> >
> > I didn't know about your growdata routines, John, so I have devised
> > the following way for accumulating results of a Monte Carlo
> > simulation:
> > I generate a structure:
> > s(irun).ymx=a;
> > where irun is the Monte Carlo run number, and the result of the
> > simulation a(1,n) is a row vector where n is the number of events
> > (drawn from a Poisson distribution). n varies from run to run.
> > I understand that when you do it this way, each array in s uses
> > contiguous memory, but they are not contiguous within the
> > structure.
> > Thus, the memory allocation problem does not arise.
> > I'd appreciate your comments on this.
> 
> This is also a reassonable way to solve the
> problem.
> 
> Use of a single cell for each new appended
> element is another way that is very similar
> to the structure approach that you used.
> Use of pointers here is more efficient
> than growing and re-allocating the entire
> array. I'm not sure about use of structures,
> as I thought that structures were not
> efficient users of memory as compared to
> cell arrays. This is probably changing
> with each matlab release of course. (See
> below for a comparison.)
> 
> An issue is how many total elements that
> you may expect to see. In some earlier
> testing, I found that cell arrays with
> millions of cells caused some serious
> problems, however a few thousand, or even
> ten thousand or so cells was not a problem.
> 
> I am starting to wonder if its time to
> reassess those growdata codes as Matlab
> evolves. In some recent tests it seemed
> that the problem I had seen with creating
> large cell arrays with millions of elements
> may no longer be as much of an issue in
> recent releases. The underlying matlab
> codes must have been improved. I'll want
> to compare how use of a structure compares
> to a cell array for memory and speed.
> 
> Some quick testing shows a structure and
> a cell array to be similar in time and
> memory here.
> 
> tic
> for i = 1:1000
>   a(i).X =rand(1,3);
> end,
> whos a
> toc
> 
>   Name Size Bytes Class Attributes
>   a 1x1000 84064 struct
> 
> Elapsed time is 0.023403 seconds.
> 
> tic
> for i = 1:1000
>   b{i} =rand(1,3);
> end
> whos b
> toc
> 
>   Name Size Bytes Class Attributes
>   b 1x1000 84000 cell
> 
> Elapsed time is 0.022017 seconds.
> 
> The question is, do these scale well with
> larger numbers of append operations?
> 
> tic
> for i = 1:100000
>   a(i).X =rand(1,3);
> end
> whos a
> toc
> 
>   Name Size Bytes Class Attributes
>   a 1x100000 8400064 struct
> 
> Elapsed time is 118.482410 seconds.
> 
> This has grown by more than 100x. In fact,
> it grew in time by a factor of roughly
> 10000, or 100^2. It is this quadratic
> growth in time that I was trying to avoid
> with the growdata codes.
> 
> How about growdata2 here?
> 
> tic
> g=growdata2;
> for i = 1:100000
>   g(rand(1,3))
> end
> g=g();
> whos g
> toc
> 
>   Name Size Bytes Class Attributes
>   g 100000x3 2400000 double
> 
> Elapsed time is 12.672389 seconds.
> 
> So if you are appending only a few thousand
> elements, then either the structure or the
> cell array will work quite well. If you
> expect to append hundreds of thousands or
> more chunks, then the hybrid approach in
> the growdata tools is still more reasonable.
> 
> Your mileage may vary with a newer release
> yet than R2006b. I've not gotten the latest
> release installed until I install a new OS
> version here.
> 
> John
> 

structs and cells should generally perform about the same.  Depends on 
what you are doing (accessing, appending, overwriting), where you use 
them (command line, function, script), but structs have a slight to 
larger advantage sometimes.

-- Loren
http://blogs.mathworks.com/loren/