Cell Array Size and Saving

Sven on 11 Jul 2018
Commented: Sven on 12 Jul 2018
Hi,
I wanted to save my quite complex and large class to a file and got a much larger file size than I expected. So I examined which parts were driving the size, and was quite surprised how much space simple cell arrays of strings consumed. Is there any easy way to avoid this?
Here is my MWE:
Names50 = cell(50,1);
Names2 = cell(2,1);
for i = 1:length(Names2)
    Names2{i} = 'a';
end % for i
for i = 1:length(Names50)
    Names50{i} = 'b';
end % for i
When I check the size with a small routine I found (code below), I get quite confusing results:
getSize(Names2) --> 228
getSize(Names50) --> 5700
getSize(Names2{1}) --> 2
A single element is just 2 bytes, yet a cell array holding two such elements (2*2 = 4 bytes of data) takes 228 bytes, and 5700 bytes with 50 rows. Is the overhead in cell arrays really this disproportionately large? Can it somehow be avoided when saving?
Thanks in advance
Best
Sven
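P.S.: Code for getSize:
function [ bytes ] = getSize( variable )
% Return the bytes that whos reports for VARIABLE. For objects with
% properties, print and sum the size of each property instead.
props = properties(variable);
if isempty(props)
    % whos needs a variable *name*; varname recovers the local name 'variable'
    s = whos(varname(variable));
    bytes = s.bytes;
else % code of Dmitry
    bytes = 0;
    for ii = 1:numel(props)
        currentProperty = variable.(props{ii});   % same as getfield(variable, props{ii})
        s = whos(varname(currentProperty));
        fprintf('Property: %s : %d bytes\n', props{ii}, s.bytes)
        bytes = bytes + s.bytes;
    end
end
end

function [ name ] = varname( ~ )
% Return the caller's name for the first input argument.
name = inputname(1);
end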

Accepted Answer

Guillaume on 11 Jul 2018
Yes, there is a necessary overhead for cell arrays. Note that whos (which your getSize uses) does not actually show all the memory used by variables.
By necessity, a cell array cannot just store the content of the data (the 2 bytes consumed by your 'a'). For each cell it also needs to store:
  • where that content is actually stored in memory (since the content of a cell can be anything, it is not stored inside the cell array itself; the cell array only holds a pointer to it)
  • the matrix header for that content, which includes:
      • the type of the content
      • how many dimensions the content has
      • the length of each dimension of the content
This results in an overhead of 112 bytes per non-empty cell (empty cells only need 8 bytes, to store a null pointer).
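As a quick check against the numbers in the question, assume each 'a' or 'b' occupies 2 bytes (one character, matching the question's getSize(Names2{1}) result) and apply the 112-byte per-cell figure from this answer. The reported sizes then come out exactly; the last line is the same accounting applied to an all-empty 50x1 cell, using the 8-byte figure above:

```latex
\begin{aligned}
\texttt{Names2}\ (2\ \text{cells}):\quad & 2 \times (112 + 2) = 228\ \text{bytes} \\
\texttt{Names50}\ (50\ \text{cells}):\quad & 50 \times (112 + 2) = 5700\ \text{bytes} \\
\texttt{cell(50,1)}\ (\text{all empty}):\quad & 50 \times 8 = 400\ \text{bytes}
\end{aligned}
```

So almost all of the 228 and 5700 bytes is per-cell bookkeeping, not string data.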
On top of that come more bytes, not shown by whos, that are required for every variable in MATLAB:
  • the type of the variable (i.e. that it is a cell array)
  • how many dimensions the variable has
  • the length of each dimension
Sven on 12 Jul 2018
Thank you very much for this detailed answer. I feared there was some overhead, but did not expect it to be that large, and for every cell. So I guess there is no workaround.


Release

R2016a
