Bad performance when setting first column element of a matrix

I have a loop in which I read a lot of data and store it in a matrix, like this:
l = numel(c_d); % c_d is data to be stored
typeData.data{j}(row, 1:l) = c_d; % row and j have been determined earlier
The performance of this seemed bad, so I tried pre-allocating the matrix stored in typeData.data{j}. That did not seem to matter, so I tried the following (profiling results are in the comments):
l = numel(c_d); % c_d is data to be stored
c_d = double(c_d); % c_d might be a uint8, so cast to double to make sure that doesn't matter
temp = typeData.data{j};
[r, c] = size(temp);
if cnt > r || l > c
    disp('Not allocated'); % Check on allocation. Never hit during profiling!
end
% c_d is a 4 element vector
if l == 4
    temp(row, 1) = c_d(1); % Takes 1 to 1.5 s in profiler
    temp(row, 2) = c_d(2); % Takes < 0.01 s
    temp(row, 3) = c_d(3); % Takes < 0.01 s
    temp(row, 4) = c_d(4); % Takes < 0.01 s
    local = NaN(r, c); % Pre-allocate a local matrix
    local(row, 1) = c_d(1); % Takes < 0.01 s
    local(row, 2) = c_d(2); % Takes < 0.01 s
    local(row, 3) = c_d(3); % Takes < 0.01 s
    local(row, 4) = c_d(4); % Takes < 0.01 s
else
    typeData.data{j}(row, 1:l) = c_d;
end
typeData.data{j} = temp;
This loop runs about 13000 times in the above example, and setting the first column element takes the most time.
I'm wondering why setting the first column element takes so much time. It seems something in the structure of the temp matrix is different from the local matrix, but I have no idea what it can be. According to the debugger, both local and temp are equal-sized matrices of doubles. Can someone shed some light on this?
Remarks:
  • The matrix stored in typeData.data{j} is pre-allocated using NaNs, but using something else, like zeros, does not matter.
  • Not pre-allocating the matrix stored in typeData.data{j} does not make a (significant) difference in performance. Pre-allocating a much larger matrix degrades performance.
  • Obviously there is more code around this, but I tried to keep this post small by not pasting all of it ;) Please ask about it!

Accepted Answer

Jan
Jan on 11 Oct 2011
This is the expected behaviour.
temp = typeData.data{j};
Now temp is a shared data copy of the j-th cell element. This means that temp has its own header, but shares the data with the cell element.
temp(row, 1) = c_d(1); % Takes 1 to 1.5 s in profiler
If you modify temp, a deep data copy is created first: the data is duplicated, and the modification is applied to the new array. This is, of course, time consuming.
temp(row, 2) = c_d(2); % Takes < 0.01 s
Now only an element of temp is changed, which is fast.
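The effect can be reproduced with a small experiment; this is only a sketch, and the matrix size is invented for illustration (the absolute timings are machine dependent):

```matlab
% Demonstrate copy-on-write when modifying a shared data copy.
c = {NaN(5000)};       % a large matrix stored in a cell
t = c{1};              % shared data copy: cheap, no data copied yet
tic; t(1, 1) = 1; toc  % first write: MATLAB deep-copies the whole matrix
tic; t(1, 2) = 2; toc  % later writes go in place and are much faster
```

The first toc should report a noticeably larger time than the second, mirroring the profiler results in the question.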
As you see, pre-allocating the elements of a cell does not help here; it only wastes memory. Pre-allocate the cell itself, and create the data in one step instead of copying it.
This should be fast even with pre-allocation, because it writes directly to the already reserved memory:
typeData.data{j}(row, 1:l) = c_d;
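A minimal sketch of that pattern, where the loop bounds nRows/nCols and the readChunk function are made-up placeholders for the surrounding code: build the data in a plain local matrix, then store it in the cell once, so no shared data copy is ever modified.

```matlab
local = NaN(nRows, nCols);            % pre-allocate a plain local matrix
for row = 1:nRows
    c_d = double(readChunk(row));     % hypothetical data source
    local(row, 1:numel(c_d)) = c_d;   % in-place write, no copy-on-write hit
end
typeData.data{j} = local;             % a single assignment into the cell
```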
You can use this for further investigations:
format debug
Now you see the data pointer pr: for a shared data copy the pointer stays the same, while for a deep copy you get a new pointer.
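For example (the pointer values displayed by format debug are machine dependent; only whether pr changes matters):

```matlab
format debug     % variable display now includes the data pointer pr
c = {magic(4)};
t = c{1}         % shared data copy: t reports the same pr as the cell's data
t(1, 1) = 99;    % first write forces a deep copy ...
t                % ... so t now reports a different pr
```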
  1 Comment
Jan
Jan on 12 Oct 2011
Not sure I wanted to know that ;), but it clearly explains the situation. It also explains why |typeData.data{j}(row, 1:l) = c_d| is slow, because typeData is itself a shared copy. Thanks a lot!


