MATLAB Answers

thread safety of mx mex functions

Asked by Jim Hokanson on 13 Sep 2018
Latest activity Commented on by Jim Hokanson on 14 Sep 2018
Although I'm aware that mx mex functions are generally considered not to be thread safe, it is unclear to me if this applies to all of them. I would think this would mostly be an issue for functions that manipulate anything in memory that might cause the garbage collector to get out of sync OR if simultaneous changes are made to the same data in different threads.
I'd like to adjust the size of various mxArrays in a multi-threaded loop (after some processing, using mxSetN) rather than holding onto the sizes in an array and making the size changes in a separate single-threaded loop. In my mind this involves changing a single header attribute and seems like it should be thread-safe.
I can try this code but my question is whether or not my line of thinking is correct or if I am missing some details.

  4 Comments

Correct, I'm suggesting:

  1. properties of a single mxArray are not changed in different threads, each thread touches unique mxArrays
  2. no memory allocation is done in multiple threads, just property changing

In my use case the call is only to mxSetN to shrink a string, never to grow it.

Correct, setting the dimensions is not time-critical. However it would require additional memory allocation to hold onto the shrink size for all strings so that I could set them outside of the parallel loop. This also isn't that big of a deal, although perhaps not ideal, but in general I want to understand the thread safety issues of working with mx functions.

It seems to me that if you avoid issues with touching the same data in multiple threads and you don't mess with the memory manager that you might be ok .... Is that correct?

Not sure. MATLAB uses data sharing internally. If you set the size to 0, it changes the Pr pointer to NULL and changes the sharing chain.
You had better not mess with it unless your name is James Tursa (and in the latest API, the undocumented memory-sharing functions have simply been removed).
Hi Bruno, your comments seem to me like good clarifications to my points but it's not clear to me that they invalidate my hypothesis?
In those cases you're changing aspects of memory linkage since your data are not linked with the header, which in my mind falls into the memory management category to avoid.
Thoughts?

1 Answer

Answer by James Tursa
on 13 Sep 2018
Edited by James Tursa
on 13 Sep 2018
 Accepted Answer

Here is my short novel on this topic ...
To understand how dimensions are stored in mxArray variables, it will help to review the dimensions portion of the mxArray header itself (my understanding, not official):
Dimension storage in mxArray
struct mxArray {
    :
    size_t ndim;
    unsigned int RefCount;
    unsigned int flags;
    union {
        size_t M;      /* Row size for 2D matrices, or */
        size_t *dims;  /* Pointer to dims array for nD > 2 arrays */
    } Mdims;
    size_t N;          /* Column size for 2D matrices */
    :
};
The number of dimensions is stored directly in the mxArray struct itself (ndim). For 2D variables, the dimensions M and N are stored directly in the mxArray struct itself. But for nD arrays with n>2, the M spot in the mxArray actually contains a pointer to a separate area of memory that contains the dimension values (kind of like how the pr and pi spots point to the real and imaginary data areas of the mxArray).
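To make the union arrangement concrete, here is a minimal sketch in plain C. MockHeader, mock_dim, and mock_get_n are hypothetical names made up for illustration. They are not part of the MATLAB API, and the layout only mirrors the unofficial description above, not the real (undocumented) mxArray header.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the dimension fields described above.
   NOT the real mxArray layout, which is undocumented. */
typedef struct {
    size_t ndim;
    union {
        size_t M;      /* row count when ndim == 2 */
        size_t *dims;  /* separately stored dims array when ndim > 2 */
    } Mdims;
    size_t N;          /* column count when ndim == 2 */
} MockHeader;

/* Read dimension i (0-based) from wherever it is stored. */
size_t mock_dim(const MockHeader *h, size_t i)
{
    assert(h != NULL && h->ndim >= 2 && i < h->ndim);
    if (h->ndim == 2) {
        return (i == 0) ? h->Mdims.M : h->N;  /* values live in the header */
    }
    return h->Mdims.dims[i];                  /* values live behind the pointer */
}

/* Product of dimensions 2..end, which is what mxGetN is described as returning. */
size_t mock_get_n(const MockHeader *h)
{
    size_t n = 1;
    assert(h != NULL && h->ndim >= 2);
    for (size_t i = 1; i < h->ndim; ++i) {
        n *= mock_dim(h, i);
    }
    return n;
}
```

In this model, reading a 2D size never leaves the header, while reading an nD size follows a pointer to memory that something had to allocate and will eventually have to free.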
Which API functions are thread-safe?
It is my belief that API functions that invoke the MATLAB Memory Manager for anything are not thread-safe. That would include all of the mxCreateEtc functions, the allocation functions mxMalloc and friends, and mxArrayToString, etc. Also on this list would be mxDestroyArray and mxFree.
API functions that simply get values from the mxArray should be thread-safe. So functions like mxGetPr, mxGetData, mxGetClassID, mxGetCell, etc should be OK. I would put mxGetScalar in this category also.
Setting values in an mxArray becomes a gray area. Obviously different threads writing to the same memory can walk on each other, so that practice is not thread-safe. But what about writing into the mxArray struct itself? That is your real question, and unfortunately the dimensions are a gray area. Certain actions can invoke the MATLAB Memory Manager while other actions will not. What follows is my understanding of what happens with the API calls involving the dimensions:
mxGetDimensions
This one can be tricky.
On 64-bit MATLAB compiled with 64-bit mwSize (or 32-bit MATLAB with 32-bit mwSize), this simply gets the pointer to the dimensions and should be thread-safe. The pointer to the dimensions will point directly into the mxArray for 2D variables (will point at the M spot), but for nD arrays with n>2 this will simply be the pointer that is stored on top of the M spot (via the union above). Note that in this case, if you write into the memory behind the pointer returned by mxGetDimensions, you will be altering the dimensions of the mxArray directly.
On 64-bit MATLAB compiled in compatibility mode with 32-bit mwSize there is a mismatch (mxArray dimensions are 64-bit size_t but mxGetDimensions returns a pointer to 32-bit mwSize). The underlying mxArray always has 64-bit size_t values stored in the dimensions spots regardless of how you compile the mex routine, but the result of the mxGetDimensions call is a pointer to 32-bit integers. How the heck is that supposed to work? Well, what MATLAB does is create a copy of the 64-bit dimensions array in a separate area of memory holding 32-bit integers, and the result of the mxGetDimensions call points to that. Since there is a copy involved, that could potentially invoke the MATLAB Memory Manager and thus might not be thread-safe. I don't know this for a fact (i.e., maybe this memory is already set aside somewhere when the variable got created and is not allocated at the time of the mxGetDimensions call), but it is worth noting for caution. Note that in this case, if you write into the memory behind the pointer returned by mxGetDimensions, you will be altering the dimensions of a copy and will not be affecting the original mxArray at all.
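The copy behavior in compatibility mode can be pictured with a small plain C sketch. copy_dims_32 is a hypothetical helper, and malloc stands in for whatever allocator MATLAB may actually use; whether MATLAB allocates at call time at all is, as noted, unknown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical model of the compatibility-mode copy: 64-bit dimension
   values are narrowed into a separate buffer of 32-bit integers.
   Returns NULL if the allocation fails; the caller frees the result. */
int32_t *copy_dims_32(const uint64_t *dims64, size_t ndim)
{
    int32_t *copy;
    assert(dims64 != NULL && ndim > 0);
    copy = malloc(ndim * sizeof *copy);  /* the step that may not be thread-safe */
    if (copy == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < ndim; ++i) {
        copy[i] = (int32_t)dims64[i];    /* assumes each value fits in 32 bits */
    }
    return copy;
}
```

Writing through the returned pointer changes only the 32-bit copy, which matches the observation that the original dimensions are left untouched in this mode.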
mxSetDimensions
This will almost certainly invoke the MATLAB Memory Manager and is not thread-safe in general, since MATLAB makes a copy of the dimensions input into newly allocated memory and stores that copy as part of the mxArray. If you are going from a 3D array to a 2D matrix, then maybe the current dimensions array gets free'd, etc. If you are going from a 2D matrix to a 2D matrix, then no memory allocation functionality is needed and this particular case is probably thread-safe.
mxGetM
Simply gets the value of M so should be thread-safe.
mxGetN
Simply gets the product of the dimensions 2-end so should be thread-safe.
mxSetM
Simply overwrites the 1st dimension value without changing the number of dimensions, so should be thread-safe (as long as another thread is not writing into the same spot of course).
mxSetN
If the variable is currently a 2D variable, then mxSetN simply overwrites the 2nd dimension with a new value without changing the number of dimensions, so should be thread-safe (as long as another thread is not writing into the same spot of course).
If the variable is nD with n>2, then mxSetN will cause the number of dimensions to be changed from n to 2 and then will overwrite the 2nd dimension with a new value. In this case, I am guessing that the MATLAB Memory Manager may be invoked to free the memory behind the current nD dimensions array. Thus this is probably not a thread-safe practice.
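One cautious way to act on this distinction is to take the header-only path inside the threaded loop only when the variable is already 2D and the change is a shrink, and to defer everything else to single-threaded code. The sketch below models that guard in plain C. MockArray and shrink_cols_if_2d are hypothetical; in a real mex routine the check would presumably use mxGetNumberOfDimensions before calling mxSetN, and this has not been verified against MATLAB itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified view of an array header for a 2D-or-nD variable. */
typedef struct {
    size_t ndim;  /* number of dimensions */
    size_t m;     /* rows (meaningful when ndim == 2) */
    size_t n;     /* columns (meaningful when ndim == 2) */
} MockArray;

/* Shrink the column count only in the case argued above to be header-only:
   the variable is 2D and the new size does not exceed the current one.
   Returns 1 if the change was applied, 0 if the caller should defer it
   to single-threaded code. */
int shrink_cols_if_2d(MockArray *a, size_t new_n)
{
    assert(a != NULL);
    if (a->ndim != 2 || new_n > a->n) {
        return 0;  /* nD or growing: may involve the memory manager, so defer */
    }
    a->n = new_n;  /* plain overwrite of one header field */
    return 1;
}
```

Each thread would call this only on arrays that no other thread touches; the deferred cases still need the recorded size applied after the parallel section.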
In particular, it is my observation that setting one of the dimensions to 0 does not invalidate any of the data pointers (pr, pi, ir, jc). The doc for mxSetDimensions specifically states this to be a fact, and I would expect that mxSetM and mxSetN behave this way as well although it is not stated explicitly in the doc for these functions.
mxSetPr, mxSetPi, mxSetIr, mxSetJc, mxSetDoubles, mxSetCell, etc.
All of these functions that set pointer fields in the mxArray struct should be considered not thread-safe, since they all invoke the MATLAB Memory Manager to remove the pointer involved from the garbage collection list.
R2018a and later
Lots of cautions here since the mxArray struct header definition changed (pi is no longer present). R2018a and later allow mex routines compiled in earlier versions (and compiled in R2018a with the -R2017b option) to run. However, there could be copying going on in the background for certain API function calls that invoke the MATLAB Memory Manager. So there might be cases that seem OK at first but turn out to be problematic from a thread-safety standpoint. I'm still learning about this myself ...
Shared Data Copies, Reference Copies, Shared Parent Copies, and Handle Classdef Objects
If X and Y are shared data copies of each other, they have separate mxArray header structs but the data pointers are the same. You can safely change the dimensions of one in a mex routine without affecting the other. E.g.,
X = rand(3,4);
Y = reshape(X,2,6); % <-- Y is a shared data copy of X
If X and Y are reference copies of each other, they are essentially the same variable sharing both the mxArray header struct and the data pointers. Changing the dimensions of one in a mex routine will change the dimensions of the other.
X = rand(4,4);
Y = cell(1,2);
Y(1:2) = {X}; % Y{1} is a reference copy of Y{2}
If X and Y are shared parent copies of each other, they are essentially the same as reference copies for the purposes of this discussion. Changing the dimensions of one in a mex routine will change the dimensions of the other.
X.f = rand(5,5);
Y = X; % <-- Y.f is a shared parent copy of X.f
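The difference between a shared data copy and a reference copy can be modeled in plain C. MockVar, shared_data_copy, and reference_copy are hypothetical names used only for illustration; real mxArray headers carry more state (such as the CrossLink and RefCount fields) that this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal variable: a header (m, n) plus a data pointer. */
typedef struct {
    size_t m;
    size_t n;
    double *data;
} MockVar;

/* Shared data copy: a separate header that points at the same data.
   Changing the copy's dimensions leaves the source header alone. */
MockVar shared_data_copy(const MockVar *src, size_t m, size_t n)
{
    MockVar copy;
    assert(src != NULL && m * n == src->m * src->n);
    copy.m = m;
    copy.n = n;
    copy.data = src->data;
    return copy;
}

/* Reference copy: the very same header under another name.
   Changing dimensions through it changes the source as well. */
MockVar *reference_copy(MockVar *src)
{
    assert(src != NULL);
    return src;
}
```

In this model a dimension change made through the result of reference_copy is visible in the source, which is the hazard described for cell elements, struct fields, and handle objects.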
There are no API functions available that tell you the sharing status of a variable. You can hack into the mxArray to detect shared data copies (CrossLink) and reference copies (RefCount), but it is extremely difficult and sometimes impossible to detect shared parent copies. If the variable is part of a cell or struct variable or is a property of a classdef object, you may have no way to tell for certain if dimension changes you make in a mex routine will affect another variable.
Handle Classdef Objects
These are designed to behave like reference copies, so changing the dimensions of one in a mex routine will change the dimensions of the other.

  5 Comments

Regarding the speed, the speedup comment is meant definitely as more of a "neat" thing, rather than being life changing, but still interesting for me! Far and away the biggest time consumer for my project is memory allocation. On my TODO list is to implement reference count incrementing based on our previous discussions for repeated values rather than creating new mxArrays.
"... On my TODO list is to implement reference count incrementing ..."
Be sure to read this in case you haven't already:
Yikes! Thanks.
