Incremental growth of an array by concatenation (use of horzcat, vertcat, cat, or []) forces MATLAB to dynamically reallocate memory each time the array grows in size. For large arrays this can be very consumptive of memory (since all arrays must be contiguous in RAM) and of time.
The best solution, whenever possible, is to pre-allocate any arrays which are to be grown to a known size. Then use direct matrix indexing to insert new elements. For arrays which have an unknown final size, incremental growth is necessary. When that growth takes many thousands or even millions of increments, then dynamic concatenation becomes a misery. If you know the maximum size an array may ever be, then pre-allocate the array with zeros, then throw away the extraneous rows at the end. Sometimes the maximum size is not known however.
My old code - grow_array, was one such solution, but it too suffered from a slowdown for very large problems. (Grow_array sufferes from quadratic growth - doubling the number of concatenations will multiply the total time by 4.) It was also pointed out that use of a simple cell array, followed by concatenation of those cells into a matrix was faster. However concatenation of cells also fails when hundreds of thousands of cells must be concatenated.
The files growdata and growdata2 are an attempt to fix those problems. They use cell concatenation, but in large blocks to avoid problems with too many cells. They also avoid the problems that grow_array had, by avoiding variable copies. The two files use two subtly diferent strategies, each of which has its own subtle flaws.
Growdata uses internal persistent variables. For this reason, one can only grow at most ONE array at a time using that utility.
Growdata2 allows the user to grow multiple arrays in parallel, using nested arrays and function handles, a facility that only became available with release 14 of MATLAB. Also, growdata2 seems to be consistently twice as slow as growdata. This appears to be due to the overhead associated with nested functions in my release of MATLAB.
Both growdata and growdata2 are linear in the time required as a function of the number of increments made. It is still true that both of these solutions will be slower than direct indexing into a pre-allocated array, but there are times when pre-allocation just won't work.
Example usage of growdata:
growdata
for i = 1:100000
growdata(rand(1,3))
end
A = growdata;
Example usage of growdata2:
fun = growdata;
for i = 1:100000
fun(rand(1,3))
end
A = fun();
Example (parallel) usage of growdata2:
% append as new rows (by default)
fun1 = growdata2;
% append as new columns
fun2 = growdata2('columns');
% append as new rows
fun3 = growdata2('rows');
for i = 1:10000
fun1(rand(2,3))
fun2(sin(rand(2,1)))
fun3(rand(round(rand(1)*2),2))
end
A = fun1();
B = fun2();
C = fun3();
In this case, A will have final size [20000,3], B will have size [2, 10000], and C will have a random number of rows, with 2 columns.
These files were inspired by a c.s-s.m thread on efficient incremental growth of an array. I would appreciate any new ideas on this subject.
Contributors: Stephen Vavasis, Loren Shure, John Creighton, Steve Amphlett, John D'Errico |