Path: news.mathworks.com!newsfeed-00.mathworks.com!oleane.net!oleane!news.maxwell.syr.edu!news-rtr.nyroc.rr.com!news-out.nyroc.rr.com!twister.nyroc.rr.com.POSTED!53ab2750!not-for-mail From: John D'Errico <woodchips@rochester.rr.com> Newsgroups: comp.soft-sys.matlab Mail-Copies-To: nobody Subject: Re: growing array in Matlab? Organization: I'm not really very organized References: <ddvkd9$7ar$1@ruby.cit.cornell.edu> <woodchips-EBB412.12532017082005@syrcnyrdrs-02-ge0.nyroc.rr.com> <de02f3$k3c$1@ruby.cit.cornell.edu> User-Agent: MT-NewsWatcher/3.4 (PPC Mac OS X) Message-ID: <woodchips-EDAAE9.21275017082005@syrcnyrdrs-01-ge0.nyroc.rr.com> Lines: 149 Date: Thu, 18 Aug 2005 01:27:50 GMT NNTP-Posting-Host: 66.66.154.144 X-Complaints-To: abuse@rr.com X-Trace: twister.nyroc.rr.com 1124328470 66.66.154.144 (Wed, 17 Aug 2005 21:27:50 EDT) NNTP-Posting-Date: Wed, 17 Aug 2005 21:27:50 EDT Xref: news.mathworks.com comp.soft-sys.matlab:296576 In article <de02f3$k3c$1@ruby.cit.cornell.edu>, "Stephen Vavasis" <vavasis@cs.cornell.edu> wrote: > First, thanks for pointing out the obvious bug in my test case. I renamed > the outer loop variable, and I got similar asymptotic results (naive > algorithm runs quadratically, block-add runs linearly). > > With regard to a function like "grow_array"-- I don't see how such an > approach could ever be subquadratic in the case of appending one element at > a time. The reason is that the idiom > a = fun(a,b,c,d); > always makes a complete new copy of a, even if you change only one element, > because Matlab guarantees the "call-by-value" semantics. See below for some > more timings to illustrate this; you can tell that copying is taking place > because the running time is growing proportionally to m*n. Ok, I have a new version, written as an m-file, that is truly linear in the time used. It also has the advantage that it will not fail if millions of rows are appended, as the simple cell array solution does. The solution that I chose was simple. I never return an argument until the end. I store everything of interest in persistent variables. Mainly one cell array, with big cells to minimize the concatenate problems at the end. Each individual cell is big enough that most of the time its just a direct insertion into an existing preallocated array. I do grow the cell size when large sets of data are appended. The flaws? 1. It cannot be used to append to two different arrays at once, because of the persistent array implementation. 2. Its considerably faster than grow_array, but still not as fast for small sets of data as direct use of a cell array, or directly appending rows of zeros coupled with direct indexing. The reason it is slower than the direct solutions is probably due to the function call overhead. These flaws are not that bad, since I only would use it in cases where I am forced to append potentially millions of times and I can't predict the final array size to preallocate. Any ideas to improve on this? John ======================================================== function A=growdata(newdata) % incremental growth of an array by appending rows % % usage always involves 3 steps. % (initial call): growdata % (growth call): growdata(newdata) % (final call): A = growdata; % % % Example usage: % % growdata % for i = 1:100000 % growdata(rand(1,3)) % end % A = growdata; % recall the stored data persistent celldata n columncount rowcount totalrows % which mode are we in? if (nargout==0) && (nargin>0) % its a growth step % size of the appending data [r,c]=size(newdata); % was this the first call after initialization? if isnan(columncount) % this is actually an initialization step. % set the column count columncount = c; defaultrows = 20000; rowcount = max(defaultrows,r); celldata = {[newdata;zeros(rowcount - r,columncount)]}; n = r; totalrows = n; if n==rowcount; celldata{end+1} = zeros(rowcount,columncount); n = 0; end else % its an appending call if c ~= columncount error 'Columns not compatible for appending' end % stuff into the last cell celldata{end}(n+(1:r),:) = newdata; n = n+r; totalrows = totalrows + r; % do we need to append another cell? if n>=rowcount rowcount = max(n,10*r); celldata{end+1} = zeros(rowcount,columncount); n = 0; end end elseif (nargout==0) && (nargin==0) % its an initialization call, with no data columncount = NaN; defaultrows = 20000; rowcount = defaultrows; celldata = {zeros(rowcount,0)}; n = 0; totalrows = 0; elseif (nargout>0) && (nargin==0) % its an unpacking step % first drop any extraneous rows in the last cell celldata{end} = celldata{end}(1:n,:); % concatenate A = cat(1,celldata{:}); % and clear the data clear celldata n columncount rowcount totalrows else % cannot both append and unpack in the same step. error 'Cannot append more data and unpack in the same call' end