Thread Subject: copy-on-write with MEX files

Subject: copy-on-write with MEX files

From: Matt

Date: 23 Jul, 2008 22:25:05

Message: 1 of 12


I'm trying to understand if there is a difference in the
way copy-on-write is implemented for MEX functions as
compared to Mfiles.

My understanding is as follows. As part of the copy-on-
write system, MATLAB will pass input data by reference to
an Mfile function if the function performs read-only
operations on that data. Otherwise, it will pass the data
by value.

However, when it comes to MEX functions, the MATLAB
interpreter obviously cannot know the contents of the
function and cannot determine whether the operations it
will perform are read-only or otherwise. If MATLAB wanted
to have all the same safety nets as for Mfiles, it would
have to always pre-duplicate the input data and send that
to the MEX function instead (effectively passing the data
by value).

One reason for doing it this way would be to ensure that if
the MEX file crashes, the input data would remain unaltered
in the caller workspace. This will be true for Mfiles, and
the MATLAB software designers may have wanted it to be true
for MEX files as well.

On the other hand, if data is always passed by value, it
would obviously reduce the possibilities for computational
efficiency that people are usually interested in using MEX
files to obtain.

In any case, I have several colleagues who believe MEX
files always receive input by value. Can anyone resolve
this?

Subject: copy-on-write with MEX files

From: James Tursa

Date: 24 Jul, 2008 03:19:02

Message: 2 of 12

"Matt " <mjacobson.removethis@xorantech.com> wrote in
message <g68b41$cu8$1@fred.mathworks.com>...
>
> I'm trying to understand if there is a difference in the
> way copy-on-write is implemented for MEX functions as
> compared to Mfiles.
>
> My understanding is as follows. As part of the copy-on-
> write system, MATLAB will pass input data by reference to
> an Mfile function if the function performs read-only
> operations on that data. Otherwise, it will pass the data
> by value.
>

No. MATLAB does not pre-parse the m-file to determine if an
input variable is changed. MATLAB *always* passes the
variables by reference. Once inside the m-file, if the input
variable is changed, *then* MATLAB makes a copy. That is the
copy-on-write behavior.

> However, when it comes to MEX functions, the MATLAB
> interpreter obviously cannot know the contents of the
> function and cannot determine whether the operations it
> will perform are read-only or otherwise. If MATLAB wanted
> to have all the same safety nets as for Mfiles, it would
> have to always pre-duplicate the input data and send that
> to the MEX function instead (effectively passing the data
> by value).
>
> One reason for doing it this way would be to ensure that if
> the MEX file crashes, the input data would remain unaltered
> in the caller workspace. This will be true for Mfiles, and
> the MATLAB software designers may have wanted it to be true
> for MEX files as well.
>

There is no safety net for MEX files. There is no
copy-on-write behavior for MEX files. The variables are
passed by reference to the MEX file, same as an m-file. If
you change an input variable in a MEX file, then the
original is changed (as well as all other variables that
share the same data memory). This can mess up the workspace
of course, which is why you see warnings to never do this in
the doc. However, it *is* safe to do if you *know* that the
data memory is not shared with another variable. This can be
tricky and is not recommended unless you really need to
(e.g., very large variables that you don't want duplicated).
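
To make the hazard concrete, here is a minimal C sketch of two variables that share one data block. It is a toy model and not the MEX API: ToyArray, toy_share and toy_fill_in_place are hypothetical names, and the variable header is reduced to a length and a data pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a MATLAB variable: a small header pointing at the data. */
typedef struct {
    size_t  n;   /* number of elements */
    double *pr;  /* pointer to the (possibly shared) data block */
} ToyArray;

/* Shared-data assignment: the header is copied, the data block is not. */
ToyArray toy_share(ToyArray src)
{
    return src;
}

/* In-place write with no copy-on-write check, as a MEX routine could do. */
void toy_fill_in_place(ToyArray a, double value)
{
    assert(a.pr != NULL);
    for (size_t i = 0; i < a.n; ++i) {
        a.pr[i] = value;  /* every header that shares a.pr sees this change */
    }
}
```

With X and Y = toy_share(X), a call to toy_fill_in_place(X, 2.0) also changes what Y reads, because both headers hold the same pr. In a real MEX file the usual safe route is to copy the input with mxDuplicateArray and modify the copy.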

> On the other hand, if data is always passed by value, it
> would obviously reduce the possibilities for computational
> efficiency that people are usually interested in using MEX
> files to obtain.
>
> In any case, I have several colleagues who believe MEX
> files always receive input by value. Can anyone resolve
> this?
>

Variables are never passed by value to m-files or MEX files.

James Tursa

Subject: copy-on-write with MEX files

From: Peter Boettcher

Date: 24 Jul, 2008 12:24:44

Message: 3 of 12

"James Tursa" <aclassyguywithaknotac@hotmail.com> writes:

> "Matt " <mjacobson.removethis@xorantech.com> wrote in
> message <g68b41$cu8$1@fred.mathworks.com>...
>>
>> I'm trying to understand if there is a difference in the
>> way copy-on-write is implemented for MEX functions as
>> compared to Mfiles.
>>
>> My understanding is as follows. As part of the copy-on-
>> write system, MATLAB will pass input data by reference to
>> an Mfile function if the function performs read-only
>> operations on that data. Otherwise, it will pass the data
>> by value.
>>
>
> No. MATLAB does not pre-parse the m-file to determine if an
> input variable is changed. MATLAB *always* passes the
> variables by reference. Once inside the m-file, if the input
> variable is changed, *then* MATLAB makes a copy. That is the
> copy-on-write behavior.

To expand on this a little, there is a C-level function in MATLAB that
"unshares" a variable. Internal functions in MATLAB that write to
variables (subsasgn, etc) call this function before writing to the
memory. Function parameters are exactly like assignments in the same
workspace. Nothing happens at function call time except a shared-data
assignment.
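
The "unshare before write" idea can be sketched in plain C as follows. This is only an illustration under stated assumptions: the reference count is a stand-in chosen for this sketch (how MATLAB actually tracks sharing is not documented here), and every toy_ name is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy data block with a reference count (an assumption of this sketch). */
typedef struct {
    size_t n;
    int    refs;
    double data[];  /* C99 flexible array member */
} ToyData;

/* Toy variable: just a pointer to a possibly shared data block. */
typedef struct {
    ToyData *d;
} ToyVar;

ToyVar toy_new(size_t n, double fill)
{
    ToyVar v;
    v.d = malloc(sizeof(ToyData) + n * sizeof(double));
    if (v.d == NULL) abort();
    v.d->n = n;
    v.d->refs = 1;
    for (size_t i = 0; i < n; ++i) v.d->data[i] = fill;
    return v;
}

/* Assignment or function call: share the block, copy nothing. */
ToyVar toy_assign(ToyVar src)
{
    src.d->refs++;
    return src;
}

/* Give v a private block if it currently shares one. */
void toy_unshare(ToyVar *v)
{
    if (v->d->refs > 1) {
        size_t bytes = sizeof(ToyData) + v->d->n * sizeof(double);
        ToyData *copy = malloc(bytes);
        if (copy == NULL) abort();
        memcpy(copy, v->d, bytes);
        copy->refs = 1;
        v->d->refs--;   /* the old block loses one sharer */
        v->d = copy;    /* the writer moves to the fresh block */
    }
}

/* Indexed write: unshare first, then write. */
void toy_set(ToyVar *v, size_t i, double x)
{
    assert(i < v->d->n);
    toy_unshare(v);
    v->d->data[i] = x;
}

/* Drop one reference; free the block when the last one goes. */
void toy_release(ToyVar *v)
{
    if (--v->d->refs == 0) free(v->d);
    v->d = NULL;
}
```

In this model nothing is copied by toy_assign; the copy happens only inside toy_set, and only when the block is shared. The variable being written gets the fresh block while the other sharers keep the original one.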

>> However, when it comes to MEX functions, the MATLAB interpreter
>> obviously cannot know the contents of the function and cannot
>> determine whether the operations it will perform are read-only or
>> otherwise. If MATLAB wanted to have all the same safety nets as for
>> Mfiles, it would have to always pre-duplicate the input data and send
>> that to the MEX function instead (effectively passing the data by
>> value).
>>
>> One reason for doing it this way would be to ensure that if
>> the MEX file crashes, the input data would remain unaltered
>> in the caller workspace. This will be true for Mfiles, and
>> the MATLAB software designers may have wanted it to be true
>> for MEX files as well.
>>
>
> There is no safety net for MEX files. There is no
> copy-on-write behavior for MEX files. The variables are
> passed by reference to the MEX file, same as an m-file. If
> you change an input variable in a MEX file, then the
> original is changed (as well as all other variables that
> share the same data memory). This can mess up the workspace
> of course, which is why you see warnings to never do this in
> the doc. However, it *is* safe to do if you *know* that the
> data memory is not shared with another variable. This can be
> tricky and is not recommended unless you really need to
> (e.g., very large variables that you don't want duplicated).

The MEX API clearly defines the input variables as "const". You can
defeat this, of course, but you shouldn't unless you're willing to
really dig into what happens at this level, and need that extra 5%
efficiency for some in-place computation.
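
A short C sketch of why "const" on the inputs is weaker than it looks. The const in the mexFunction signature (const mxArray *prhs[]) applies to the array header, and as far as I recall from matrix.h, mxGetPr takes a const mxArray * and returns a plain double *. The toy types and toy_ functions below are hypothetical and only mirror that shape; they are not the MEX API.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    size_t  n;
    double *pr;
} ToyArray;

/* Accessor shaped like mxGetPr: const header in, writable data pointer out. */
double *toy_get_pr(const ToyArray *a)
{
    assert(a != NULL);
    return a->pr;
}

/* Respects the read-only intent of the const input. */
double toy_sum(const ToyArray *in)
{
    const double *p = toy_get_pr(in);
    double s = 0.0;
    for (size_t i = 0; i < in->n; ++i) s += p[i];
    return s;
}

/* Defeats it: compiles without a cast, yet overwrites the caller's data. */
void toy_negate_in_place(const ToyArray *in)
{
    double *p = toy_get_pr(in);
    for (size_t i = 0; i < in->n; ++i) p[i] = -p[i];
}
```

The compiler accepts both functions, so the const qualifier documents intent but does not protect the data block. Keeping inputs read-only is a discipline the MEX author has to follow.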

-Peter

Subject: copy-on-write with MEX files

From: Matt

Date: 24 Jul, 2008 15:59:02

Message: 4 of 12



OK. That does clear it all up!

Much obliged to you both.

Subject: copy-on-write with MEX files

From: Ryan Ollos

Date: 28 Nov, 2008 08:34:04

Message: 5 of 12

Peter Boettcher <boettcher@ll.mit.edu> wrote in message <muyd4l35v1v.fsf@G99-Boettcher.llan.ll.mit.edu>...

> To expand on this a little, there is a C-level function in MATLAB that
> "unshares" a variable. Internal functions in MATLAB that write to
> variables (subsasgn, etc) call this function before writing to the
> memory.

Maybe you can help me clarify how this works. Suppose the following:

X = ones(10, 1);
Y = X;
Z = X;
X(1) = 2;

Based on my understanding and the tests I have done, it seems like the following happens:

Memory is allocated and X is set to point to address a1.
Y is set to point to a1.
Z is set to point to a1.
A deep copy of the memory pointed to by X, Y, and Z is made (at address a2), Y and Z are changed to point to a2, and the new value is written to the location of X(1).

However, I suppose it's possible that the situation is reversed, with X pointing to a2 while Y and Z continue to point to a1. That seems like it would introduce a number of problems with sharing data between the MATLAB and MEX workspaces.

Can you clarify if the behaviour I have described is correct?

Thanks!

Subject: copy-on-write with MEX files

From: Ryan Ollos

Date: 28 Nov, 2008 09:04:02

Message: 6 of 12

"Ryan Ollos" <ryano@physiosonics.com> wrote in message <ggoads$sqr$1@fred.mathworks.com>...
> Peter Boettcher <boettcher@ll.mit.edu> wrote in message <muyd4l35v1v.fsf@G99-Boettcher.llan.ll.mit.edu>...
>
> > To expand on this a little, there is a C-level function in MATLAB that
> > "unshares" a variable. Internal functions in MATLAB that write to
> > variables (subsasgn, etc) call this function before writing to the
> > memory.
>
> Maybe you can help me clarify how this works. Suppose the following:
>
> X = ones(10, 1);
> Y = X;
> Z = X;
> X(1) = 2;
>
> Based on my understanding and the tests I have done, it seems like the following happens:
>
> Memory is allocated and X is set to point to address a1.
> Y is set to point to a1.
> Z is set to point to a1.
> A deep copy of the memory pointed to by X, Y, and Z is made (at address a2), Y and Z are changed to point to a2, and the new value is written to the location of X(1).
>
> However, I suppose it's possible that the situation is reversed, with X pointing to a2 while Y and Z continue to point to a1. That seems like it would introduce a number of problems with sharing data between the MATLAB and MEX workspaces.
>
> Can you clarify if the behaviour I have described is correct?
>
> Thanks!

I found some info in another newsgroup post, where someone noted that you can type 'format debug' and get the memory address location printed to the terminal.

It seems that in my example Y and Z continue to point to a1, while X points to a2. I suppose this is computationally cheaper, because fewer pointers have to be changed.

Subject: copy-on-write with MEX files

From: Jan Simon

Date: 29 Nov, 2008 01:20:19

Message: 7 of 12

Dear Matt!

> I'm trying to understand if there is a difference in the
> way copy-on-write is implemented for MEX functions as
> compared to Mfiles.

Look into the file matrix.h (in <matlabroot>\extern\include\). There you will find the code for array_access_inlining, which shows the implementation of a MATLAB array.
Besides the obvious fields, such as the pointers to the real and imaginary parts of the data, the number of dimensions, and the size of an element, you will find a bunch of flags and several void pointers called "reserved", "reserved1", etc.
It might be easy to find out which field holds the dimension vector, the class, and perhaps the name of the array. But reverse-engineering the flags will be impossible, and as far as I know it might conflict with the terms of use.

Sometimes one really needs to pass variables by reference to a MEX file and change the original memory, e.g. if your data occupy 50% of your available RAM. In that case, just create a new class that works with memory stored persistently in a MEX file: see mexMakeMemoryPersistent. The object then does not contain the data itself, only a pointer to it, and passing by reference happens implicitly. A drawback is that all processing must be done in MEX functions.

Kind regards, Jan

Subject: copy-on-write with MEX files

From: Matt J

Date: 27 Dec, 2009 16:25:05

Message: 8 of 12

"Ryan Ollos" <ryano@physiosonics.com> wrote in message <ggoc62$8p1$1@fred.mathworks.com>...

> I found some info in another newsgroup post, where someone noted that you can type 'format debug' and get the memory address location printed to the terminal.
=========

For some reason, this does not seem to be reflecting copy-on-write behavior. When I do the following, the "format debug" output seems to say that X and XX are pointing to different data locations, even though we know from copy-on-write rules that they should be pointing to the same one:

>> format debug
>> X=27

X =


Structure address = 25cdc08
m = 1
n = 1
pr = 193319b0
pi = 0
    27

>> XX=X

XX =


Structure address = 2579b98
m = 1
n = 1
pr = 193319f0
pi = 0
    27

Subject: copy-on-write with MEX files

From: James Tursa

Date: 27 Dec, 2009 16:46:02

Message: 9 of 12

"Matt J " <mattjacREMOVE@THISieee.spam> wrote in message <hh81p0$qdk$1@fred.mathworks.com>...
> "Ryan Ollos" <ryano@physiosonics.com> wrote in message <ggoc62$8p1$1@fred.mathworks.com>...
>
> > I found some info in another newsgroup post, where someone noted that you can type 'format debug' and get the memory address location printed to the terminal.
> =========
>
> For some reason, this does not seem to be reflecting copy-on-write behavior. When I do the following, the "format debug" output seems to say that X and XX are pointing to different data locations, even though we know from copy-on-write rules that they should be pointing to the same one:
>
> >> format debug
> >> X=27
>
> X =
>
>
> Structure address = 25cdc08
> m = 1
> n = 1
> pr = 193319b0
> pi = 0
> 27
>
> >> XX=X
>
> XX =
>
>
> Structure address = 2579b98
> m = 1
> n = 1
> pr = 193319f0
> pi = 0
> 27

Here is what I get:

>> format debug
>> X = 27
X =
Structure address = 2d8d6a0
m = 1
n = 1
pr = 1cda1a0
pi = 0
    27
>> XX = X
XX =
Structure address = 31b62e0
m = 1
n = 1
pr = 1cda1a0
pi = 0
    27

Shared memory. What version of MATLAB are you running? On what machine? Maybe your version does scalars differently (just guessing here). Try this to see what you get:

format debug
X = rand(3)
XX = X

James Tursa

Subject: copy-on-write with MEX files

From: Matt J

Date: 27 Dec, 2009 16:50:20

Message: 10 of 12

Peter Boettcher <boettcher@ll.mit.edu> wrote in message <muyd4l35v1v.fsf@G99-Boettcher.llan.ll.mit.edu>...

> > No. MATLAB does not pre-parse the m-file to determine if an
> > input variable is changed. MATLAB *always* passes the
> > variables by reference. Once inside the m-file, if the input
> > variable is changed, *then* MATLAB makes a copy. That is the
> > copy-on-write behavior.
>
> To expand on this a little, there is a C-level function in MATLAB that
> "unshares" a variable. Internal functions in MATLAB that write to
> variables (subsasgn, etc) call this function before writing to the
> memory. Function parameters are exactly like assignments in the same
> workspace. Nothing happens at function call time except a shared-data
> assignment.
>

Somewhat related to my last post, I'm seeing things that contradict this. As a test, I have the following two very similar functions:

function A=tst(A) %READ-ONLY
    A,
end

function A=tst2(A) %NOT READ-ONLY
    A,
    A=A+1;
end


The following output seems to show that the non-read-only version tst2() assigns a different pointer value pr to the input data as soon as it is passed to the workspace of tst2(). This does not occur with the read-only version, suggesting that some sort of pre-parsing for non-read operations does indeed occur...

>> format debug
>> X=27

X =


Structure address = 2579578
m = 1
n = 1
pr = 19332230
pi = 0
    27

>> tst(X);

A =


Structure address = 25795b0
m = 1
n = 1
pr = 19332230
pi = 0
    27

>> tst2(X);

A =


Structure address = 2579658
m = 1
n = 1
pr = 193338f0
pi = 0
    27

Subject: copy-on-write with MEX files

From: Matt J

Date: 27 Dec, 2009 17:16:04

Message: 11 of 12

"James Tursa" <aclassyguy_with_a_k_not_a_c@hotmail.com> wrote in message <hh830a$e7q$1@fred.mathworks.com>...

> Shared memory. What version of MATLAB are you running? On what machine?
========================

R2009b under Windows XP Pro SP3.


> Maybe your version does scalars differently (just guessing here). Try this to see what you get:
>
> format debug
> X = rand(3)
> XX = X
>
>> X=rand(3), XX=X
================================

Yep, I am indeed seeing the expected memory sharing for non-scalars. I've also repeated my function pre-parsing experiment below, and it appears that no pre-parsing is done for write operations on a non-scalar.

It all still seems a little strange though. I can see why maybe memory sharing for scalar objects is counter-productive, but clearly pre-parsing of a function is done for scalar write-operations. Isn't that counter-productive as well? And since MATLAB doesn't know in advance whether a function argument is going to be a scalar or not, it would have to repeat this parsing every time the function is called, no?


X =


Structure address = 2579578
m = 3
n = 3
pr = 1e9fe150
pi = 0
    0.3629 0.6046 0.3735
    0.2629 0.7566 0.1733
    0.7640 0.9488 0.4833


XX =


Structure address = 25795e8
m = 3
n = 3
pr = 1e9fe150
pi = 0
    0.3629 0.6046 0.3735
    0.2629 0.7566 0.1733
    0.7640 0.9488 0.4833

>> tst(X);

A =


Structure address = 25796c8
m = 3
n = 3
pr = 1e9fe150
pi = 0
    0.3629 0.6046 0.3735
    0.2629 0.7566 0.1733
    0.7640 0.9488 0.4833

>> tst2(X);

A =


Structure address = 25795b0
m = 3
n = 3
pr = 1e9fe150
pi = 0
    0.3629 0.6046 0.3735
    0.2629 0.7566 0.1733
    0.7640 0.9488 0.4833

Subject: copy-on-write with MEX files

From: James Tursa

Date: 27 Dec, 2009 19:17:02

Message: 12 of 12

"Matt J " <mattjacREMOVE@THISieee.spam> wrote in message <hh84ok$51h$1@fred.mathworks.com>...
>
> It all still seems a little strange though. I can see why maybe memory sharing for scalar objects is counter-productive, but clearly pre-parsing of a function is done for scalar write-operations. Isn't that counter-productive as well? And since MATLAB doesn't know in advance whether a function argument is going to be a scalar or not, it would have to repeat this parsing every time the function is called, no?

Seems that way. I have replicated your results with R2008b. All m-files are, of course, pre-parsed, and various optimizations are performed before they are run even once. It appears that a deep copy of a scalar is passed in cases where there might be a data change, but deep copies of non-scalars are not made at call time. E.g., try this file:

function A=tst3(A) %NOT READ-ONLY
    A,
    if( any(A == 29) )
        A=A+1;
    end
end

In this case a deep copy of a scalar is passed, even though it might not be necessary. But a shared data copy of a non-scalar is passed. This is one of those optimizations that may or may not be consistent from version to version of MATLAB, since TMW is always changing the JIT Accelerator.

Since the original title of this thread concerned mex files, I thought I would throw in the following. For m-files, a shared data copy is passed to the function (new structure header but same data area) except for the scalar stuff mentioned above. But for mex routines, the actual variable is passed (same structure header). i.e., the prhs pointers point directly at the inputs in a mex routine, not at a shared data copy of the inputs.
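
The header-versus-data distinction can be illustrated with a small C sketch. It is a toy model of the behavior described above, not MATLAB's implementation: ToyArray, ToySeen and the two toy_call_ functions are hypothetical names, and the caller-provided callee_slot stands in for the callee's workspace variable.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    size_t  n;
    double *pr;
} ToyArray;

/* What the callee can observe about its input. */
typedef struct {
    const ToyArray *header;  /* address of the header it was handed */
    const double   *data;    /* address of the data block */
} ToySeen;

/* M-file style: the callee gets a fresh header that shares the caller's data. */
ToySeen toy_call_mfile_style(const ToyArray *caller_var, ToyArray *callee_slot)
{
    assert(caller_var != NULL && callee_slot != NULL);
    *callee_slot = *caller_var;  /* new header, same pr */
    ToySeen seen = { callee_slot, callee_slot->pr };
    return seen;
}

/* MEX style: the callee is handed the caller's header itself. */
ToySeen toy_call_mex_style(const ToyArray *caller_var)
{
    assert(caller_var != NULL);
    ToySeen seen = { caller_var, caller_var->pr };
    return seen;
}
```

In both styles the data address matches the caller's pr. Only the MEX-style call also sees the caller's own header address, which corresponds to the "Structure address" line that format debug prints.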

James Tursa
