Matlab efficiency - Pass by reference

AJ on 22 Jun 2015
Commented: James Tursa on 5 Oct 2018
My Matlab project uses a main class object to manipulate data. The class object is fairly complex from a data structure perspective, with data trees, tables, structures of multiple depths, etc. Some of the methods call external functions to process the data. Many are, I'll say, "pass by reference" e.g.
[R.Struct1] = my_external_function(R.Struct1,var1,R.struct2)
R is the class object, and Struct1 is a property of the object. my_external_function modifies R.Struct1 with tight restrictions: Struct1 is not allowed to change size, and, in general, assignments are made using indexing, e.g. (within the my_external_function),
Struct1.node_tree.count(1) = Struct1.node_tree.count(1) + 1;
The problem I have is performance when running in the Matlab environment. The Matlab Profiler says that, say, 7.586 seconds (569 calls) are spent in child function "my_external_function". Yet when I dive into the child function, it reports that only 0.286 seconds (569 calls) were spent there. Is the unaccounted-for time due to unintended pass-by-value/overhead?
This is with R2014b. R2015b is about 20% worse. :(
I'll note here that when the my_external_function is converted to C (using Matlab coder), Struct1 is correctly passed as a single pointer (pass by reference). All my code must remain Coder compliant.
Thanks for any insight.
aj
  6 Comments
James Tursa on 23 Jun 2015
The new classdef style classes are certainly not mex friendly as you note. Mainly because the only routines for accessing them, mxGetProperty and mxSetProperty, use deep data copies.
In contrast, the old-style classes, which use only the @classname directory structure, are very mex friendly because internally the variables are stored as structs. All of the struct API functions such as mxGetField, which work with pointers to the field elements (rather than deep data copies of the field elements), can be used on these objects.
If you want to work with pointers to the properties of classdef objects you have to use hacks. E.g., this one will get a pointer to a classdef property:
The reverse task of putting a shared data copy of a variable into a classdef property instead of a deep data copy is not yet posted to the FEX, although I have a working prototype.
James Tursa on 5 Oct 2018
Update: The FEX submission above now contains the code for putting shared data copies of variables into classdef properties.


Answers (2)

James Tursa on 22 Jun 2015
Edited: James Tursa on 22 Jun 2015
I don't know if the in-place parsing is smart enough to deal with struct fields or not. E.g., see this Blog by Loren:
And this related post:
  6 Comments
James Tursa on 23 Jun 2015
Edited: James Tursa on 23 Jun 2015
I will add a few comments, which basically demonstrate that I don't understand what is going on yet. Philip's attempt at avoiding the deep data copy is a good one, but the timings just don't make sense to me.
Temp = R.Struct1;
The above line of code should have taken a very small fraction of the time, since it only results in a shared data copy of R.Struct1 into Temp. I have no idea why the profiler is showing a 0.42 fraction.
R.Struct1=[]; % Make sure there are no other references to Temp
The above line does not necessarily ensure there are no other shared data copies of Temp. It only ensures that R.Struct1 is not one of them. If there are other shared data copies (or reference copies, etc.) of R.Struct1 floating around, they will still be shared with Temp, so any subsequent change to Temp would cause a deep data copy of Temp. MATLAB gives the user no tools to determine the shared status of a variable (shared data copy, reference copy, shared parent copy, etc.), so the only way to ensure there is no sharing is to be very careful how you use the variable. (There is a way to determine some of this in a mex routine, but that is beyond the scope of this thread.)
R.Struct1 = Temp;
Again, this should only produce a shared data copy of Temp in R.Struct1 which should only take a small fraction of time. I have no idea how it is getting a 0.36 fraction of time.
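Put together, the three lines discussed above form the following pattern. This is only a minimal sketch, not AJ's actual code: R is a plain struct standing in for the classdef object, and the body of my_external_function, the field sizes, and the values of var1 and R.struct2 are made up for illustration. Whether MATLAB really updates Temp in place depends on the release and on how the call is made, so treat it as something to time rather than a guarantee. Local functions in a script need R2016b or later; on older releases, put the function in its own file.

```matlab
% Stand-in data (assumption): a plain struct instead of the classdef object
R.Struct1.node_tree.count = zeros(1, 4);
R.struct2 = struct('step', 1);
var1 = 1;

Temp = R.Struct1;        % shared data copy, no deep copy yet
R.Struct1 = [];          % R no longer refers to the data held by Temp
Temp = my_external_function(Temp, var1, R.struct2);
R.Struct1 = Temp;        % shared data copy back into R
clear Temp               % drop the extra reference

disp(R.Struct1.node_tree.count)

function Struct1 = my_external_function(Struct1, var1, struct2)
% Hypothetical body: same name for input and output, indexed update only
Struct1.node_tree.count(var1) = Struct1.node_tree.count(var1) + struct2.step;
end
```

As noted above, clearing R.Struct1 only removes that one reference; any other shared copy of the same data would still force a deep copy on the first write.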
Maybe Jan is on the right track and the profiler can't really be trusted to show which lines are taking the bulk of the time in this case.
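One way to cross-check the profiler is to time each step by hand with tic and toc. The sketch below assumes a plain struct in place of the classdef object, an arbitrary array size, and an inline increment standing in for my_external_function; only the call count of 569 comes from the question. Absolute numbers will differ by machine and release, so only the relative split between the three steps is of interest.

```matlab
R.Struct1.node_tree.count = zeros(1, 1e5);   % arbitrary payload size (assumption)
nCalls = 569;                                % call count reported in the question
tGet = 0; tWork = 0; tSet = 0;

for k = 1:nCalls
    t = tic;
    Temp = R.Struct1;                        % extract from R
    R.Struct1 = [];
    tGet = tGet + toc(t);

    t = tic;                                 % stand-in for my_external_function
    Temp.node_tree.count(1) = Temp.node_tree.count(1) + 1;
    tWork = tWork + toc(t);

    t = tic;
    R.Struct1 = Temp;                        % insert back into R
    tSet = tSet + toc(t);
end
clear Temp

fprintf('extract %.4f s, work %.4f s, insert %.4f s\n', tGet, tWork, tSet);
```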



Philip Borghesani on 23 Jun 2015
Edited: Philip Borghesani on 23 Jun 2015
Does R.Struct1 contain objects (especially handle objects) or data that could be in other objects? Of particular concern is any back references to R.
Structures must be searched when added to and removed from an object via property access to determine if there are circular references and keep track of them properly. With large data structures and classes this can represent significant overhead.
If only some fields of Struct1 are accessed by my_external_function, you might be better off passing them separately, or passing the entire object if many fields are modified.
[R.Struct1.fld1,R.Struct1.node_tree.count,...]=my_external_function(R.Struct1.fld1,R.Struct1.node_tree.count,...)
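As a rough illustration of passing only the fields that are touched, the sketch below uses a hypothetical helper, update_fields, and made-up field contents; R is a plain struct here rather than the classdef object. It only shows the calling pattern and says nothing about how much time it would save in AJ's case.

```matlab
% Stand-in data (assumption): two fields that the function actually modifies
R.Struct1.fld1 = 0;
R.Struct1.node_tree.count = zeros(1, 4);

% Pass and return only those fields instead of the whole struct
[R.Struct1.fld1, R.Struct1.node_tree.count] = ...
    update_fields(R.Struct1.fld1, R.Struct1.node_tree.count);

function [fld1, count] = update_fields(fld1, count)
% Hypothetical helper: same names in and out, indexed updates only
fld1 = fld1 + 1;
count(1) = count(1) + 1;
end
```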
  5 Comments
Philip Borghesani on 24 Jun 2015
James, you are correct: there is currently no in-place optimization for
s.a=foo(s.a)
So the "temp trick" would be needed to get the in-place optimization.
That does not seem to be AJ's issue, though: the time appears to be spent extracting the struct from the object and putting it back in. Because R is a class (I still don't know whether it is a handle class, or whether that makes any difference), extracting and inserting a large structure can have high overhead. How many nodes are in node_tree?
AJ on 24 Jun 2015
OK, I did an analysis on the class structure. R is a classdef object that has 178040 mxArray pointers within it (in a recursive sense). None are handles or classes, although there are some enumerations. If I look at the mxGetData() pointers for each, there is a HUGE number of duplicate values, for three reasons:
1. When I initialize fields of a structure to, say, 0.0, all similar fields get the same data pointer. Interesting.
2. When I have arrays of structures (all fixed size), I used repmat to create them. This, of course, creates references, not copies.
3. During processing, relationships are established among the data structures, and I'll bet there is a lot of "cross-pollinating" of data references.
It will not be possible (within practicality) to make my structures reference-free. So I guess I'll have to suck it up for now. Thanks for everyone's contributions.
aj
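The sharing AJ describes cannot be inspected from plain MATLAB code, but its visible consequence, copy-on-write, can. In this small sketch (field names and sizes are made up), repmat builds a struct array from one template; writing to one element leaves the template and the other elements untouched, so whatever was shared has to be duplicated at that point.

```matlab
% Template struct (hypothetical fields), replicated into a 1x3 struct array
node = struct('count', zeros(1, 4), 'weight', 0.0);
S = repmat(node, 1, 3);

S(1).count(1) = 1;      % write to one element only

disp(S(1).count)        % modified element
disp(S(2).count)        % other elements keep the original values
disp(node.count)        % so does the template
```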
