## Allocating data into a structure faster

on 9 Sep 2011

Jan

I'm trying to improve the efficiency of my code. I import data into MATLAB every couple of seconds. The data arrives as a cell array with three columns per data point, [time, name, data], like this:
A = [7.3468e+005] 'M001' [0.5000]
[7.3468e+005] 'M016' [0.0861]
[7.3468e+005] 'M002' [0.2693]
[7.3468e+005] 'M009' [0.7381]
[7.3468e+005] 'M004' [ 10]
And I want to organize it based on the data names. Currently I'm using a for loop and dynamic field names to put the data into a structure like this:
for xx = 1:1:A_length
data.(A{xx,2}).time = [data.(A{xx,2}).time([2:2500]); A{xx,1}];
data.(A{xx,2}).data = [data.(A{xx,2}).data([2:2500]); A{xx,3}];
end
So the newest data is always at the end and the structure is always preallocated to 2500 data points.
I'm looking for ideas to speed this up. I was hoping to avoid the for loop altogether, like this:
data.(A{:,2}).time = [data.(A{:,2}).time([2:2500]); A{:,1}];
data.(A{:,2}).data = [data.(A{:,2}).data([2:2500]); A{:,3}];
But I've been told that's a no go. Any ideas?

Fangjun Jiang

on 9 Sep 2011
Do you have freedom to re-design your structure? I would use A.M016=[7.3468e+005,0.0861]; for example to reduce the level of structure.
Chris

on 15 Sep 2011
I might be able to; I'd have to look into it. But even if I could put A into a better format, I would still have to sort it into the data structure using a for loop.
I'm not sure that
data.([data_name]).time = A.([data_name])(2);
is any faster than
data.([data_name]).time = A{xx,3};

Jan

on 9 Sep 2011

Under MATLAB R2009a, time([2:2500]) is 10 times slower than time(2:2500). Actually, [a:b] creates the vector a:b and then concatenates it into a vector again, which by itself does not add much overhead. But internally, time(a:b) checks only a and b against the array dimensions, whereas for time([a:b]) this check is performed for each element of the index vector.
Another explanation: in time([a:b]) the vector a:b is created explicitly, but in time(a:b) it is not.
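The difference between the two indexing forms can be measured directly. The following is only a rough sketch: the buffer contents and the repetition count nRuns are arbitrary assumptions, and the ratio you see depends heavily on the MATLAB release, so it may not match the factor quoted for R2009a.

```matlab
% Compare the two indexing forms on a 2500-element buffer.
time  = rand(2500, 1);       % stand-in for one preallocated time vector
nRuns = 1e4;                 % arbitrary repetition count for the timing

tic;
for k = 1:nRuns
    a = time([2:2500]);      % index vector is built explicitly
end
tBracket = toc;

tic;
for k = 1:nRuns
    b = time(2:2500);        % colon range passed directly to the indexing
end
tColon = toc;

fprintf('[2:2500]: %.4f s,  2:2500: %.4f s\n', tBracket, tColon);
```

Both forms return the same 2499 elements; only the time spent on indexing differs.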
Getting A{xx,2} repeatedly wastes time. It is better to store it in a temporary variable:
for xx = 1:A_length
tmp = A{xx,2};
data.(tmp).time = [data.(tmp).time(2:2500); A{xx,1}];
data.(tmp).data = [data.(tmp).data(2:2500); A{xx,3}];
end
Nevertheless, using a different data structure would be much faster, e.g. collecting time and data in matrices and the names 'M001' etc. in a cell string. Vectorization would then be possible and easy.
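One way the matrix layout might look is sketched below. This is not taken from the original code: the variable names names, timeBuf, dataBuf and nHist are hypothetical, and the sketch assumes that the full list of signal names is known in advance and that each name occurs at most once per incoming batch.

```matlab
% Hypothetical layout: one column per signal name, 2500 rows of history.
names   = {'M001', 'M002', 'M004', 'M009', 'M016'};  % assumed known in advance
nHist   = 2500;
timeBuf = zeros(nHist, numel(names));
dataBuf = zeros(nHist, numel(names));

% One incoming batch in the original {time, name, data} format.
A = {7.3468e+005, 'M001', 0.5000; ...
     7.3468e+005, 'M016', 0.0861; ...
     7.3468e+005, 'M002', 0.2693};

% Map each incoming name to its column (assumes each name occurs at most
% once per batch; names missing from the list are skipped).
[found, col] = ismember(A(:, 2), names);
col = col(found);

% Shift the affected columns up by one row and append the new samples,
% so the newest value is always in the last row.
newTime = [A{found, 1}];     % 1-by-k row vector
newData = [A{found, 3}];
timeBuf(:, col) = [timeBuf(2:nHist, col); newTime];
dataBuf(:, col) = [dataBuf(2:nHist, col); newData];
```

The history of a single signal is then one column, e.g. dataBuf(:, strcmp(names, 'M016')). If a name could appear more than once in a batch, the batch would need to be split or processed in a loop instead.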

Chris

on 13 Sep 2011
Both ideas helped. Looks like the script is running 30-50% faster now. Thanks.