Thread Subject:
faster way to concatenate arrays?

Subject: faster way to concatenate arrays?

From: jarobert

Date: 14 Aug, 2010 23:42:39

Message: 1 of 6

I apologize if this has been addressed before. I'm wondering if there are any data concatenation techniques that could speed up piecewise concatenation of a bunch of multi-dimensional arrays without preallocating the arrays.

I'm doing a rolling fit of a model - I only want to keep some of the data in memory at a time, fit my model, drop some of the (older) data from one end and add some (newer) data to the other end and fit the next month, etc.

It's fastest to just concatenate all data at once and work on only the part of the array I need, but it consumes a lot of memory. I'm willing to give up some speed for efficiency, but right now the trade off is pretty extreme.



This illustrates my problem on a 3-d array, concatenated along the 1st dimension:


% Build some fake data. Assume dimensions are [days (30), predictors (10), minutes (1440) ]. This is 50 months of fits, with 50 months of data for each fit, so 100 months of data altogether. (OK, I could use 99 months...)


data = cell(100,1);

for ii=1:numel(data)
    data{ii} = randn(30,10,1440);
end


% Concatenate across all months in the dates dimension.


tic;
x = cat(1,data{:});
toc


>> Elapsed time is 0.435406 seconds.


% Do a rolling concatenation, dropping the oldest month, adding the newest.

tic;
x = cat(1,data{1:50}); % build the initial matrix.
for ii=51:numel(data)
    x = cat(1,x(31:end,:,:),data{ii}); % update it on subsequent months.
end
toc


>> Elapsed time is 18.168127 seconds.



I suspect I'm out of luck, but is there any way to speed up the concatenation? It's ~ 40x as slow in this example.

Thanks,
John

Subject: faster way to concatenate arrays?

From: Roger Stafford

Date: 15 Aug, 2010 04:39:04

Message: 2 of 6

  I think if I were faced with your kind of problem I would seriously consider using a "virtual" access scheme on your arrays rather than doing numerous concatenations. As you have shown, concatenation on a large array requires an inordinate amount of displacement of data and is therefore profligate of cpu time.

  When you would ordinarily wish to use cat to extend an array for placing further data in it and at the same time an earlier portion of that same array is no longer in use, why not compute an appropriate offset to the index you would usually use so as to utilize the unused portion instead of creating a new one? Subtraction of offsets from indices, while adding to the complexity of programming, is very much faster than the concatenation process. You could use for example a for-loop with an index referring to a non-existent part of an array provided that you were careful to always replace it with a corrected version before actually indexing into the array. (Note: You can't correct a for-loop index itself - you need to use a separate altered index variable.)

  In case of desperation you could even handle a couple of non-contiguous intervals within an array, say at the end and wrapping around to the beginning, as if they were contiguous and beyond the limits of the actual array by making clever use of the mod function with your indices.
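
A minimal sketch of the wrap-around idea above, not anyone's actual implementation. It assumes the window is stored as fixed "slots" of one month each along the first dimension, and it uses small stand-in sizes so it runs quickly. The names buf, slot, order and firstVals are made up for illustration. Each new month overwrites the slot of the oldest month, and mod recovers the chronological order of the slots.

```matlab
% Small stand-in sizes (the thread uses 30 x 10 x 1440 per month).
nDays = 3; nPred = 2; nMin = 4;
nMonths = 8; nWin = 4;                 % rolling window of nWin months

data = cell(nMonths, 1);
for ii = 1:nMonths
    data{ii} = ii * ones(nDays, nPred, nMin);   % month ii is filled with the value ii
end

% Allocate the buffer once by loading the first window.
buf = cat(1, data{1:nWin});

for ii = nWin+1:nMonths
    % Slot (1..nWin) that currently holds the oldest month.
    slot = mod(ii - 1, nWin) + 1;
    rows = (slot - 1) * nDays + (1:nDays);
    buf(rows, :, :) = data{ii};        % overwrite the oldest month with the newest

    % Slot numbers in chronological order, oldest to newest, wrapping via mod.
    order = mod(slot + (0:nWin-1), nWin) + 1;

    % Example read: the first-day value of each month, in chronological order.
    firstVals = buf((order - 1) * nDays + 1, 1, 1).';
end
```

After the last pass the four slots hold months 5 to 8. The cost is that the rows of the window are no longer in chronological order in memory, so any code that depends on the order has to go through an index vector such as order.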

Roger Stafford

Subject: faster way to concatenate arrays?

From: Walter Roberson

Date: 15 Aug, 2010 16:28:50

Message: 3 of 6

jarobert wrote:

> I'm doing a rolling fit of a model - I only want to keep some of the data in memory at a time, fit my model, drop some of the (older) data from one end and add some (newer) data to the other end and fit the next month, etc.
>
> It's fastest to just concatenate all data at once and work on only the part of the array I need, but it consumes a lot of memory. I'm willing to give up some speed for efficiency, but right now the trade off is pretty extreme.

Do some buffering: concatenate as much as you can afford to at one time,
process subsections of that, retain the last chunk of the buffer and
concatenate on as much more as you can afford, process subsections of
that, and so on.
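
One way the buffering scheme above might look, as a rough sketch under assumptions: chunk (the number of new months loaded per refill) is a hypothetical tuning parameter, the sizes are small stand-ins, and lastMonths exists only to record which windows were visited. The buffer is refilled with one cat per chunk rather than one per month, and the last nWin-1 months are carried over so that no window is skipped.

```matlab
% Small stand-in sizes (the thread uses 30 x 10 x 1440 per month).
nDays = 3; nPred = 2; nMin = 4;
nMonths = 10; nWin = 4;
chunk = 3;                             % hypothetical: new months added per refill

data = cell(nMonths, 1);
for ii = 1:nMonths
    data{ii} = ii * ones(nDays, nPred, nMin);   % month ii is filled with the value ii
end

lastMonths = zeros(1, 0);              % newest month of each window processed
buf = cat(1, data{1:nWin-1});          % carry-over part: nWin-1 months
next = nWin;                           % next month to load

while next <= nMonths
    last = min(next + chunk - 1, nMonths);
    buf = cat(1, buf, data{next:last});         % one concatenation per refill
    nInBuf = size(buf, 1) / nDays;              % months currently in the buffer

    for w = 1:(nInBuf - nWin + 1)
        rows = (w - 1) * nDays + (1:nWin * nDays);
        xw = buf(rows, :, :);                   % window to fit the model on
        lastMonths(end+1) = xw(end, 1, 1);      %#ok<AGROW>
    end

    % Retain the last nWin-1 months for the next refill.
    buf = buf(end - (nWin - 1) * nDays + 1:end, :, :);
    next = last + 1;
end
```

With these settings the loop visits the windows ending in months 4 through 10 using three concatenations instead of seven. A larger chunk means fewer concatenations but more memory held at once.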

Subject: faster way to concatenate arrays?

From: jarobert

Date: 15 Aug, 2010 17:24:14

Message: 4 of 6

Thanks to both of you who have replied so far. I like both of these ideas. I will try them out and see which performs better. Neither one should be very hard to implement.

Thanks again.

John

Subject: faster way to concatenate arrays?

From: Rune Allnor

Date: 16 Aug, 2010 03:57:54

Message: 5 of 6

On 15 Aug, 01:42, jarobert <jarobert....@gmail.com> wrote:
> I apologize if this has been addressed before.  I'm wondering if there are any data concatenation techniques that could speed up piecewise concatenation of a bunch of multi-dimensional arrays without preallocating the arrays.
>
> I'm doing a rolling fit of a model - I only want to keep some of the data in memory at a time, fit my model, drop some of the (older) data from one end and add some (newer) data to the other end and fit the next month, etc.
>
> It's fastest to just concatenate all data at once and work on only the part of the array I need, but it consumes a lot of memory.  I'm willing to give up some speed for efficiency, but right now the trade off is pretty extreme.

Welcome to the real world.

That's the trade-off programmers have been addressing since the
dawn of time; since people started doing computations.

I am assuming you get the data from files. If so, the two-step
combo that might reduce run time significantly is to

1) Reduce the number of file read operations
2) Reduce the number of memory allocations

Doing only one of the two doesn't help much; you need both.

The problem with matlab is that each new concat requires two expensive
operations:

1) Allocate new memory that can hold *both* what you have already
   loaded *and* what is to be loaded in the present iteration
2) Copy *everything*, both old and new data, into this new array.

This repeated copying is expensive, and the main reason why
your program is slow. A programming language with more flexible
memory management mechanisms would allow you to allocate only
the memory needed to store the data that was recently loaded,
leaving all the old data alone, with the obvious time savings
since no old data are copied.
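
To put rough numbers on the copying described above, here is a back-of-the-envelope calculation using the sizes from the original post. It assumes double precision (8 bytes per element, which is what randn returns) and counts only the bytes written into each newly allocated array. Temporaries such as x(31:end,:,:) are not counted, so the real traffic is probably higher.

```matlab
bytesPerDouble = 8;
monthElems  = 30 * 10 * 1440;                 % elements in one month
windowElems = 50 * monthElems;                % elements in one 50-month window

% Each rolling update allocates and fills a complete new window.
bytesPerCat  = windowElems * bytesPerDouble;  % 172,800,000 bytes (about 173 MB)
nUpdates     = 50;                            % months 51 to 100
totalRolling = nUpdates * bytesPerCat;        % 8,640,000,000 bytes (about 8.6 GB)

% The one-shot concatenation writes all 100 months exactly once.
totalOneShot = 100 * monthElems * bytesPerDouble;   % 345,600,000 bytes

ratio = totalRolling / totalOneShot;          % 25
```

So the rolling version writes roughly 25 times as much data as the single cat, which is the same order of magnitude as the ~40x slowdown reported at the top of the thread.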

Rune

Subject: faster way to concatenate arrays?

From: Bruno Luong

Date: 16 Aug, 2010 08:39:05

Message: 6 of 6

jarobert <jarobert.nyc@gmail.com> wrote in message <1196785851.5729.1281829389441.JavaMail.root@gallium.mathforum.org>...
> I apologize if this has been addressed before. I'm wondering if there are any data concatenation techniques that could speed up piecewise concatenation of a bunch of multi-dimensional arrays without preallocating the arrays.
>
> I'm doing a rolling fit of a model - I only want to keep some of the data in memory at a time, fit my model, drop some of the (older) data from one end and add some (newer) data to the other end and fit the next month, etc.

If you are willing to keep all the data in memory, there is a way to do rolling access to the array without copying the memory around. For this you need to use my FEX submission called InplaceArray here: http://www.mathworks.com/matlabcentral/fileexchange/24576

To use this package, it is also important that you reorganize the data so that the rolling dimension (day) is the last one (3). I provide below a code to illustrate based on the example you gave earlier.

If you can't afford the memory to concatenate all of them, you might process them in chunks.

%%

data = cell(100,1);

for ii=1:numel(data)
    data{ii} = randn(10,1440,30);
end

tic;
x = cat(3,data{:});

szwin = size(x);
szwin(3) = 50;
for ii=50:numel(data)
    % xi is the same as rolling windows x(:,:,ii-50+(1:50))
    % for example, for ii=60, xi is x(:,:,11:60)
    offset = (ii-50)*szwin(1)*szwin(2);
    xi = inplacearray(x, offset, szwin); % FEX
    % do the work with xi
    % ...
    
    if ~isequal(xi,x(:,:,ii-50+(1:50)))
        fprintf('never get here\n')
    end
    
    % important instruction
    releaseinplace(xi); % FEX
end
toc

% Bruno
