You are now following this question
- You will see updates in your followed content feed.
- You may receive emails, depending on your communication preferences.
An empty do loop giving seg fault with OpenMP while mexing with fortran
5 views (last 30 days)
Show older comments
Hello,
I have a code which uses a 'parallel do' construct. When I run the code in serial, it works w/o any problem. But when I parallelize it, it gives seg fault. I tried omitting some parts of the code out for testing purposes. At the end, I endded up with the below which still gave the seg fault error.
!$OMP PARALLEL DO DEFAULT(SHARED) PRIVATE(i,adjDist,j,k)
do i = 1,nNods
adjDist = zero
do j = 1,nDim
do k = 1,nAdj
if (nodAdj(i,k).eq.0) cycle
adjDist(k,j) = XYZ(nodAdj(i,k),j) - XYZ(nods(i),j)
enddo
enddo
enddo
!OMP END PARALLEL DO
Here 'adjDist' is a (nAdj x nDim) array and all the other variables are properly defined shared variables. Do you have any idea about what causes this to give seg fault? Before I was using the following statement;
adjDist(:,j) = XYZ(nodAdj(i,:),j) - XYZ(nods(i),j)
not to have the third do-loop with index 'k', but then I even changed it not to allocate/deallocate any memory, although I am not sure if Fortran allocates any extra memory in such a case. But in any case, apparently that is not the reason.
Thanks!
Ugur
Answers (1)
James Tursa
on 25 Aug 2016
Why is adjDist private? This means each thread will work with its own individual copy of adjDist. Is this what you are really trying to do for this test case? How large is adjDist?
Some combinations of Fortran/OpenMP/MATLAB are not compatible. What versions are you running? Can you code up a very simple test case with one simple parallel loop and see if that runs OK?
18 Comments
Ugur
on 25 Aug 2016
Edited: Ugur
on 25 Aug 2016
adjDist is intentionally private as I normally use it in the rest of the code which I omitted in this test run. It is 6x3, so not big at all. And yes, I had managed to use openmp for other programs before. I use 2014a and gcc7.3 as the MATLAB and Fortran versions. The other matrices are big in size but I already checked to see if they were properly transferred from MATLAB workspace.
James Tursa
on 25 Aug 2016
Edited: James Tursa
on 25 Aug 2016
What happens if you take the array adjDist off of the private list and make it shared? (And maybe move the adjDist = zero line to outside the parallel section)
Ugur
on 25 Aug 2016
By the way, just to give further information, it can start threads in parallel, but I guess it faces a problem when killing them at the end of the code. I see an unreasonable increase in memory usage while cpu usage goes down. And then it crashes.
James Tursa
on 25 Aug 2016
I'm just wondering if it is messing up allocating/deallocating those private arrays for adjDist, so I'd like to see what a run does without it.
James Tursa
on 25 Aug 2016
Hmmm ... how about something even simpler. Get rid of all of the SHARED and PRIVATE meta commands (so the default will be only the index variable i is private) and see what happens. E.g.
!$OMP PARALLEL DO
Ugur
on 25 Aug 2016
Crashed again. Here is the gateway function, I am using. I do not know if I do a mistake here, but it seems that it is not what is inside the do loop what causes the problem. Somehow the threads occupy illegal memory, if such a thing is possible...
#include "fintrf.h"
!
SUBROUTINE MEXFUNCTION(NLHS, PLHS, NRHS, PRHS)
use mexf90
implicit none
integer*8, intent(in) :: PRHS(*) ! Pointers carrying the input data
integer*8, intent(out) :: PLHS(*) ! Pointers carrying the output data
integer*4, intent(in) :: NLHS,NRHS ! REMAINS THE SAME FOR 64BIT system
!-----------------------------------------------------------------------
integer*8 :: err, nNods, ...
integer*8, pointer :: XYZ, elmConn, ...
real*8 :: dvType, zeroLim
integer*4 :: classId
! ERROR CHECKING FOR INPUT
IF (NRHS .NE. 8) THEN
CALL MEXERRMSGTXT('MultMexError: 8 INPUT ARGUMENT IS REQUIRED')
ENDIF
IF (NLHS .NE. 1) THEN
CALL MEXERRMSGTXT('MultMexError: 1 OUTPUT ARGUMENT IS REQUIRED')
ENDIF
! ASSIGN POINTERS TO THE VARIOUS PARAMETERS
! Input matrix
XYZ =>MXGETPR(PRHS(1))
nPts = MXGETM(PRHS(1))
nDim = MXGETN(PRHS(1))
...
classId = mxClassIDFromClassName('double')
plhs(1) = mxCreateNumericArray(int(3,8),[nNods, nAdj, nDim],classId,0)
returnVar =>mxGetPr(plhs(1))
CALL myFunction(XYZ, nPts, nDim, ...)
END SUBROUTINE MEXFUNCTION
James Tursa
on 25 Aug 2016
Edited: James Tursa
on 25 Aug 2016
What is in the mexf90 module that you are using? The exact same stuff from this thread?
https://www.mathworks.com/matlabcentral/newsreader/view_thread/307087
Or something different?
Also, I am confused as to what data types you are using. The XYZ declaration implies that the data is INTEGER*8, but then the returnVar points to plhs(1) data area, which you created as 'double'. What exactly are the data types of your inputs and outputs? And what does the signature of myFunction look like?
Also, I advise to never use literal arguments to the mex API functions in Fortran. Always use variables to ensure that the data types match up. (In C you can get away with type mismatches since everything is passed by value and argument types automatically get promoted ... not so in Fortran). So in your mxCreateNumericArray routine call, create variables with the exact types you want and pass those in instead of the literals you are currently using. E.g., the signature is this:
mwPointer mxCreateNumericArray(ndim, dims, classid, ComplexFlag)
mwSize ndim
mwSize dims(ndim)
integer*4 classid, ComplexFlag
So create the appropriate mwSize ndim and dims variables, and the integer*4 classid and ComplexFlag variables, and pass those in as arguments. That way you don't have to worry about whether mwSize is integer*4 or integer*8 for your mex compile settings, and you don't have to worry about whether literal integers are integer*4 or integer*8, etc.
Ugur
on 25 Aug 2016
It is the module where mex function interfaces are defined. The one related to mxCreateNumericArray is as follows;
function mxCreateNumericArray(ndim, dims, classid, ComplexFlag)
integer*4 :: classid, ComplexFlag
integer*8 :: ndim, dims(ndim), mxCreateNumericArray
end function mxCreateNumericArray
What you ask is a trick I use to avoid copying input and output variables using mxCopy functions. I define all the inputs and outputs as an integer pointer in the gateway function pointing the memory location where the input/output array is stored. No matter of what type an array is, as far as I know, Fortran uses an int*4 pointer in 32 bit systems and int*8 pointer in 64 bit systems to pass the array to a subroutine. This trick I used in many other programs with or without OpenMP, so either I was very lucky untill today or what I do wrong is sth else. The part of myFunction that I define the inputs is as follows;
subroutine myFunction(XYZ, nPts, nDim, ...
use omp_lib
use mexf90
implicit none
integer*4, intent(in) :: nPts, nDim, ...
real*8, intent(in) :: XYZ(nPts,nDim), ...
real*8, intent(out) :: returnVar(nNods,nAdj,nDim)
real*8 :: zero, one, adjDist(nAdj,nDim), ...
integer*4 :: i, j, k, ...
James Tursa
on 26 Aug 2016
I will post a more lengthy reply later. But first, what does your interface for mxGetPr look like?
James Tursa
on 8 Sep 2016
Edited: James Tursa
on 9 Sep 2016
Sorry for the delay. Finally getting back to this. First, some comments about the interface for mxGetPr (applies to mxGetPi also).
Fortran pointers, in general, are not like C pointers. In C, a pointer is basically just an address where the compiler knows what type is being pointed to (so it knows how to dereference and do pointer arithmetic). In Fortran, pointers have all of that, plus they contain information on the shape and stepping size of what is being pointed to. How that information is stored in this pointer is up to the compiler vendor, but it is some sort of descriptor (i.e., structure). So Fortran pointers and C pointers are not the same. But you are using a Fortran pointer like a C pointer in your interface code above. How is this expected to work? Well, as it happens, Fortran pointers to scalars are a special case. Since they can't change shape (always 1 element exactly) and there is nothing to "step" over, this information is meaningless for a pointer to scalar. So compiler vendors typically just implement a Fortran pointer-to-scalar just like a simple C pointer ... simply an address with nothing else attached. That is why the above interface works, because the compiler vendor chose to implement a Fortran pointer-to-scalar this way (they weren't required to) and because you are passing this pointer through an implicit interface to myFunction.
That being said, your interface above is for an integer*8 data type, not a real*8 double type. So your interface should probably be this since you are using it with real*8 data:
real*8,pointer :: mxGetPr ! <-- real*8 is the data type being pointed to, not the pointer itself
mwPointer :: pm
However, I don't think this is the cause of the problem. Since the interface to myFunction looks like it is implicit, even though you have declared XYZ to point to integer*8 (even though it is really pointing to real*8), what will likely happen is that the address will simply get put on the argument stack for the function call and myFunction will end up using this memory properly as real*8 data.
If you get rid of the OpenMP stuff and just do your mxGetPr stuff, do you still have a crash problem?
Ugur
on 1 Oct 2016
Edited: Ugur
on 1 Oct 2016
Hi James,
I am so sorry that I gave up checking if there was a new answer on the issue. Thanks for your explanations about the pointers. I now have a deeper understanding. But I cannot really change the type of the pointer for mxgetpr, because I also use it to get integer arrays from the matlab workspace in the same function. In that case, I would require two api functions: one for real, and one for integer arrays.
Coming to your question, no, I don't get any crash when I do not use OpenMP. That is the point what I don't understand in deed. What makes it more sensitive about the memory usage when OpenMP is used so that it gives seg fault?
And a funny thing is that, if I run the code once switching off OpenMP, then I can turn OpenMP on and run the code in parallel. But if I restart Matlab and directly try to run the code in parallel, then it crashes.
Ugur
on 2 Oct 2016
Edited: Ugur
on 2 Oct 2016
I finally figured out what the problem is. The error is not due to the mex function itself. I just realized that I only receive the segmentation fault when I call the mex function in another matlab function, but it works without any problem being called in the the main program. I do not know what the difference is in terms of the memory management. Do you have an idea about any possible reason?
James Tursa
on 4 Oct 2016
No, not without seeing more details. Is it being called from an m-file, or from another mex routine?
Ugur
on 4 Oct 2016
The main program calling the m-file and the m-file calling the mex function. And this is causing the seg fault. If I call the mex function directly from the main program then it works.
James Tursa
on 4 Oct 2016
Off hand, the first thing I would look at is the argument list for each of those calls. Check to see that you are passing in the same stuff (same number of inputs, same sizes, same classes, etc). It could be that there is a difference that the mex routine doesn't check for.
Ugur
on 4 Oct 2016
Well, I think I checked this, and indeed, what I did is to split the m-file from the line it calls the mex function, and write the rest into the main program. So the arguments remain the same but it crashes in the former while it works in the latter case.
See Also
Categories
Find more on Fortran with MATLAB in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!An Error Occurred
Unable to complete the action because of changes made to the page. Reload the page to see its updated state.
Select a Web Site
Choose a web site to get translated content where available and see local events and offers. Based on your location, we recommend that you select: .
You can also select a web site from the following list
How to Get Best Site Performance
Select the China site (in Chinese or English) for best site performance. Other MathWorks country sites are not optimized for visits from your location.
Americas
- América Latina (Español)
- Canada (English)
- United States (English)
Europe
- Belgium (English)
- Denmark (English)
- Deutschland (Deutsch)
- España (Español)
- Finland (English)
- France (Français)
- Ireland (English)
- Italia (Italiano)
- Luxembourg (English)
- Netherlands (English)
- Norway (English)
- Österreich (Deutsch)
- Portugal (English)
- Sweden (English)
- Switzerland
- United Kingdom (English)
Asia Pacific
- Australia (English)
- India (English)
- New Zealand (English)
- 中国
- 日本Japanese (日本語)
- 한국Korean (한국어)