Path: news.mathworks.com!newsfeed-00.mathworks.com!oleane.net!oleane!news.maxwell.syr.edu!news-rtr.nyroc.rr.com!news-out.nyroc.rr.com!twister.nyroc.rr.com.POSTED!53ab2750!not-for-mail
From: John D'Errico <woodchips@rochester.rr.com>
Newsgroups: comp.soft-sys.matlab
Mail-Copies-To: nobody
Subject: Re: growing array in Matlab?
Organization: I'm not really very organized
References: <ddvkd9$7ar$1@ruby.cit.cornell.edu> <woodchips-EBB412.12532017082005@syrcnyrdrs-02-ge0.nyroc.rr.com> <de02f3$k3c$1@ruby.cit.cornell.edu>
User-Agent: MT-NewsWatcher/3.4 (PPC Mac OS X)
Message-ID: <woodchips-EDAAE9.21275017082005@syrcnyrdrs-01-ge0.nyroc.rr.com>
Lines: 149
Date: Thu, 18 Aug 2005 01:27:50 GMT
NNTP-Posting-Host: 66.66.154.144
X-Complaints-To: abuse@rr.com
X-Trace: twister.nyroc.rr.com 1124328470 66.66.154.144 (Wed, 17 Aug 2005 21:27:50 EDT)
NNTP-Posting-Date: Wed, 17 Aug 2005 21:27:50 EDT
Xref: news.mathworks.com comp.soft-sys.matlab:296576


In article <de02f3$k3c$1@ruby.cit.cornell.edu>,
 "Stephen Vavasis" <vavasis@cs.cornell.edu> wrote:

> First, thanks for pointing out the obvious bug in my test case.  I renamed 
> the outer loop variable, and I got similar asymptotic results (naive 
> algorithm runs quadratically, block-add runs linearly).
> 
> With regard to a function like "grow_array"-- I don't see how such an 
> approach could ever be subquadratic in the case of appending one element at 
> a time.  The reason is that the idiom
>      a = fun(a,b,c,d);
> always makes a complete new copy of a, even if you change only one element, 
> because Matlab guarantees the "call-by-value" semantics.  See below for some 
> more timings to illustrate this; you can tell that copying is taking place 
> because the running time is growing proportionally to m*n.


Ok, I have a new version, written as an m-file, that is
truly linear in the time used. 

It also has the advantage that it will not fail if
millions of rows are appended, as the simple cell array
solution does.

The solution that I chose was simple. I never return an
argument until the end. I store everything of interest
in persistent variables. Mainly one cell array, with big
cells to minimize the concatenate problems at the end.
Each individual cell is big enough that most of the time
its just a direct insertion into an existing preallocated
array. I do grow the cell size when large sets of data
are appended.

The flaws?

1. It cannot be used to append to two different arrays
at once, because of the persistent array implementation.

2. Its considerably faster than grow_array, but still
not as fast for small sets of data as direct use of a
cell array, or directly appending rows of zeros coupled
with direct indexing.

The reason it is slower than the direct solutions is
probably due to the function call overhead.

These flaws are not that bad, since I only would use
it in cases where I am forced to append potentially
millions of times and I can't predict the final array
size to preallocate.

Any ideas to improve on this?

John


========================================================
function A=growdata(newdata)
% incremental growth of an array by appending rows
%
% usage always involves 3 steps.
%  (initial call):   growdata
%  (growth call):    growdata(newdata)
%  (final call):     A = growdata;
% 
%
% Example usage:
%  
%  growdata
%  for i = 1:100000
%    growdata(rand(1,3))
%  end
%  A = growdata;

% recall the stored data
persistent celldata n columncount rowcount totalrows
 
% which mode are we in?
if (nargout==0) && (nargin>0)
  % its a growth step
  
  % size of the appending data
  [r,c]=size(newdata);
  
  % was this the first call after initialization?
  if isnan(columncount)
    % this is actually an initialization step.
    % set the column count
    columncount = c;
    
    defaultrows = 20000;
    rowcount = max(defaultrows,r);
    
    celldata = {[newdata;zeros(rowcount - r,columncount)]};
    n = r;
    totalrows = n;
    
    if n==rowcount;
      celldata{end+1} = zeros(rowcount,columncount);
      n = 0;
    end
    
  else
    % its an appending call
    if c ~= columncount
      error 'Columns not compatible for appending'
    end
    
    % stuff into the last cell
    celldata{end}(n+(1:r),:) = newdata;
    n = n+r;
    totalrows = totalrows + r;
    
    % do we need to append another cell?
    if n>=rowcount
      rowcount = max(n,10*r);
      celldata{end+1} = zeros(rowcount,columncount);
      n = 0;
    end
    
  end
  
elseif (nargout==0) && (nargin==0)
  % its an initialization call, with no data
  columncount = NaN;
 
  defaultrows = 20000;
  rowcount = defaultrows;
    
  celldata = {zeros(rowcount,0)};
  n = 0;
  totalrows = 0;
  
elseif (nargout>0) && (nargin==0)
  % its an unpacking step
  
  % first drop any extraneous rows in the last cell
  celldata{end} = celldata{end}(1:n,:);
  
  % concatenate
  A = cat(1,celldata{:});
  
  % and clear the data
  clear celldata n columncount rowcount totalrows
  
else
  % cannot both append and unpack in the same step.
  error 'Cannot append more data and unpack in the same call'
end