performance when copying data from object array to cell

Hi, I am running into performance issues when copying data from an object property into a cell array. My test class looks as follows:
classdef Class < handle
    %CLASS Summary of this class goes here
    %   Detailed explanation goes here
    properties
        data = [1 2 3 4 5];
    end
    methods
    end
end
My test script is the following:
% settings
N = 20000; % try 1000/10000
% create N objects
cls(N) = Class();
% collect data
tic;
data = {cls.data};
toc;
% result:
% N = 1000: T = 0.008s
% N = 10000: T = 0.6s
% N = 20000: T = 2.4s
I'd expect the computation time to scale linearly with the array size, but it does not. Can someone give a hint on how to speed up the copy in this example? Is there a reason why it does not scale linearly?
Thanks, Daniel

 Accepted Answer

Darim,
Since you are not pre-allocating the data cell, Matlab is probably expanding the size of data iteratively. With pre-allocation the timing is linear. Try for yourself.
% settings
N = 20000; % try 1000/10000
% create N objects
cls(N) = Class();
% collect data
tic;
data = cell(1,N);
for i = 1:numel(cls)
    data{i} = cls(i).data;
end
toc;

8 Comments

Kirby,
this solves my problem, thanks. I had assumed that avoiding for-loops would speed up Matlab in most cases, but in this case it evidently does not.
Is there any solution without the for-loop?
Thanks again, Daniel
Kirby,
I've set up an extended version of the problem: I'd like to access the property dynamically. This is my test script and the result on the machine I am currently working on:
%%initialize
clear all
%%create peers
% settings
N = 30000; % try 1000/10000/20000/30000
% create N objects
cls(N) = Class();
%%Original solution
clear('data')
% collect data
tic;
data = {cls.data};
toc;
% result:
% N = 1000: T = 0.003s
% N = 10000: T = 0.32s
% N = 20000: T = 1.11s
% N = 30000: T = 2.44s
%%Kirby solution
clear('data');
% collect data
tic;
data = cell(1,N);
for i = 1:numel(cls)
    data{i} = cls(i).data;
end
toc;
% result:
% N = 1000: T = 0.011s
% N = 10000: T = 0.063s
% N = 20000: T = 0.12s
% N = 30000: T = 0.18s
%%Enhanced problem
clear('data');
% Now we'd like to access the property dynamically
pName = 'data';
tic;
data = cell(1,N);
for i = 1:numel(cls)
    data{i} = cls(i).(pName);
end
toc;
% result:
% N = 1000: T = 0.031s
% N = 10000: T = 0.33s
% N = 20000: T = 0.52s
% N = 30000: T = 0.77s
It is slower than your solution because the property data is accessed dynamically via the string pName. Matlab's JIT very likely is not able to optimize this.
Can you suggest a solution for the enhanced problem, which has performance similar to your previous solution?
Thanks again, Daniel
data = { cls.( pName ) };
runs fast on my PC. I imagine that in your loop, resolving the dynamic string 30,000 times is what causes the slowdown, whereas returning to the single-line approach removes it.
Your original solution is fastest of all in my run, using R2016b, even if I increase N to 100,000. This option is not much slower, though tic-toc is not a reliable way to measure speed in general.
e.g. for N = 100,000 I get:
Elapsed time is 0.076481 seconds.
Elapsed time is 0.185008 seconds.
Elapsed time is 1.158645 seconds.
Elapsed time is 0.097908 seconds.
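Since tic-toc is not a reliable way to measure speed, here is a rough sketch of how the one-line variants could be timed with timeit instead, which calls the function several times and returns a typical run time. It assumes the Class definition from the question is on the MATLAB path; N and the variable names tStatic and tDynamic are just illustrative choices.

```matlab
% Sketch: time the one-line gather with timeit instead of tic/toc.
% Assumption: Class.m from the question is on the MATLAB path.
N = 20000;                 % illustrative size, same as in the question
cls(N) = Class();          % N objects, as in the original test script
pName = 'data';

% timeit expects a function handle that takes no inputs.
tStatic  = timeit(@() {cls.data});      % hard-coded property name
tDynamic = timeit(@() {cls.(pName)});   % dynamic property name

fprintf('static:  %.4f s\ndynamic: %.4f s\n', tStatic, tDynamic);
```

The absolute numbers will still depend on the release and the machine, so they are only comparable within one setup.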
Adam,
your solution corresponds to my original approach. I consider this to be the neatest version, as we don't have to think about pre-allocating memory.
I am using R2015b, though, and I think some major JIT compiler improvements were made for R2016.
I'd like to stay compatible with earlier Matlab versions, and therefore I am looking for the best solution for R2015b.
Can you suggest a workaround, which provides similar performance?
Thanks, Daniel
Being compatible with earlier versions of Matlab from a performance point of view is very difficult given the continuous improvement in performance of many areas of Matlab. Backward compatibility of function names is relatively easy, but as you know, with different versions and improvements different solutions become more performant.
What time does
data = { cls.( pName ) };
give on your machine as this wasn't included in your timings, esp. compared to your other dynamic string approach?
It may be that there isn't a fast way to do this in R2015 which is why it was improved for R2016
Timing is the same as for
data = {cls.data}
i.e.
% result:
% N = 1000: T = 0.003s
% N = 10000: T = 0.32s
% N = 20000: T = 1.11s
% N = 30000: T = 2.44s
Which makes sense since the dynamic property access has to be resolved only once.
In addition, I've quickly installed 2016b and can confirm your numbers:
%%Adam solution (use 2016b)
tic;
data = { cls.( pName ) };
toc;
% result:
% N = 1000: T = 0.0009s
% N = 10000: T = 0.007s
% N = 20000: T = 0.014s
% N = 30000: T = 0.018s
So, from this perspective, the neat solution looks great.
Thanks, Daniel
Adam's approach is definitely best for the latest Matlab release. As for 2015b, I don't see a direct way to dynamically access a single property from the class without suffering the string resolution time.
If you know all the properties you might want to extract from your class, and if the contents of those properties are not too large, you could extract all properties during the loop (using hard coded names) into a MxN cell array with corresponding collection of property name strings like {'prop1','prop2',...,'propN'}.
After the loop, you can extract a specific property from the MxN cell array as needed. The performance of this approach relies entirely on the class you're working with. It might be faster than dynamic property access in your case.
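A minimal sketch of that idea, assuming the Class from the question, which has only the single property data. The names propNames and allProps are made up for illustration, and the layout chosen here is one row per property name and one column per object; a class with more properties would need one more row and one more hard-coded line in the loop for each of them.

```matlab
% Sketch: gather properties with hard-coded names, select by name afterwards.
% Assumption: Class.m from the question is on the MATLAB path.
N = 20000;
cls(N) = Class();

propNames = {'data'};                    % hypothetical list of known properties
allProps  = cell(numel(propNames), N);   % pre-allocated: properties x objects

for i = 1:N
    allProps{1, i} = cls(i).data;        % hard-coded access, no dynamic lookup
end

% The dynamic name is resolved once, after the loop.
pName = 'data';
row   = strcmp(propNames, pName);        % logical index of the requested row
data  = allProps(row, :);                % 1-by-N cell, like {cls.data}
```

Whether this beats the dynamic access in R2015b would have to be measured, since every known property is copied even if only one is needed.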
Kirby, Adam,
since you were so supportive regarding my problem, I'd like to share the information I've received from Matlab support.
[Quote:] "The test case creates N number of objects with data that are assigned to exactly the same array. This causes MATLAB to create a long list of arrays sharing the same data to avoid making data copies. However, when creating the cell array, MATLAB ends up traversing this long list for each element. An optimization to handle this case better was introduced in 2016a.
In a real-world scenario, would thousands of the objects really still have the default value? You can see that the timing is linear as expected when using random data in each object instance:"
They enhanced the class:
%File: TestClass1.m
classdef TestClass1
    %CLASS Summary of this class goes here
    %   Detailed explanation goes here
    properties
        data %= [1 2 3 4 5];
    end
    methods
        function obj = TestClass1
            obj.data = rand(1,5);
        end
    end
end
They tested:
%File: runTestClass1.m
function runTestClass1
    runOneTest(1000);
    runOneTest(2000);
    runOneTest(10000);
end
function runOneTest(N)
    % create N objects
    for k = N:-1:1
        cls(k) = TestClass1;
    end
    % collect data
    tic;
    data = {cls.data};
    toc;
end
And they received:
>> runTestClass1
Elapsed time is 0.000493 seconds.
Elapsed time is 0.000863 seconds.
Elapsed time is 0.004480 seconds.
I ran it as well and can confirm these results.
Cool!
Thanks again, Daniel

Asked: on 9 Jan 2017
Edited: on 12 Jan 2017
