segfault when using "matmul" function in FORTRAN mex code

When I try to multiply two matrices of complex type using "matmul" function, I receive a segfault with the following code;
matD = matmul(transpose(matB),matC*matB)
while this runs without an error
matC = matC*matB
matD = matmul(transpose(matB),matC)
I tried to run a similar code directly in Fortran and it did not give any error (as expected). What may cause such an instability in mex?

 Accepted Answer

How large are your variables? I would hazard a guess that they are large, in the 10's or 100's MB or larger.
One of the problems with Fortran is that temporary variables typically get memory from the stack (relatively small memory available), rather than the heap (relatively large memory available). So calculations that immediately get used as arguments or in other calculations are temporary variables that take stack memory, whereas calculations that immediately get assigned to a variable do not ... such calculations are done in-place in the left-hand-side variable. So look at your two formulations:
matD = matmul(transpose(matB),matC*matB)
Here transpose(matB) is a temporary result that is only used as an argument, so the memory for it will typically be taken from the stack. Likewise, matC*matB is a temporary result that is only used for an argument, so its memory will also be taken from the stack. If they are too large, you will blow the stack memory (a fixed amount of memory much smaller than the heap) and the program will crash.
Now look at your second formulation:
matC = matC*matB
matD = matmul(transpose(matB),matC)
The matC*matB calculation is assigned to a variable. Thus, the result does not require additional stack memory ... the results can be done "in-place" in the matC variable (since the * is "element-wise multiplication" in Fortran). There is still an issue with the transpose(matB) calculation taking stack memory, but it appears that was not a tipping point for the crash.
As to why your Fortran stand-alone program worked fine but the mex routine using the same code failed, it is likely simply the result of having different amounts of stack memory available for the two different implementations. E.g., had you started MATLAB with some type of command that would increase its stack space, the mex routine may have worked fine.
That is one reason I tend to avoid certain constructs like using "any" on a temporary calculation result. E.g.,
if( any( A == B ) ) then
! etc
endif
If A and B are large, then the result A == B will be large and come from the stack, potentially crashing the program. To guard against this, I often fall back to F77 style programming because it is protected against this. E.g., the above could be coded as follows:
flag = .false.
do i=1,n
if( A(i) == B(i) ) then
flag = .true.
exit
endif
enddo
if( flag ) then
! etc
endif
Seems kind of backwards to have to resort to F77 style code to protect against stack overflows, but there it is. There might be compiler directives to tell the compiler to create temporary and local variables from the heap, but doing so ties your code to non-standard compiler directives. Rather than doing this, I tend to just fall back on F77 style coding because I know it will be safe for large variables. I don't want to have to rely on compiler directives, or increasing the stack size when starting the program, etc. I would rather write code from the outset that I know will work regardless of stack size (unless of course the heap runs out).
Another potential issue is the matB, matC, and matC variables themselves. If they are non-allocated variables then their memory will also typically come from the stack. So you should never do something like this in Fortran (actually you can have this exact problem in other languages such as C or C++ as well):
subroutine whatever(large_number)
integer large_number
real*8 X(large_number)
The reason is that X will take stack memory, and potentially crash the program. Local large variables should always be allocated since that memory will come from the heap. E.g.,
subroutine whatever(large_number)
integer large_number, status
real*8, allocatable :: X(:)
allocate( X(large_number), stat = status )
if( status /= 0 ) then
! handle the memory allocation error
endif
! code to use X
deallocate(X)

8 Comments

The variables are really small indeed, double complex 19x11 arrays which is not more than a couple of kBs. But it seems like I receive stack overflow when I try to embed some matrix operations as the arguments of a function.
I'll have another question based on your answer. I have compiled the mex files using the -largeArrayDims option. What does that option do actually? Does it steal from the stack memory to leave more space on the heap such that it is easier to have stack overflow?
Hi again James,
What you said solved the problem at that part but now I do receive another segfault when I try to dynamically allocate a matrix inside a function as you mentioned above. The function to take the inverse of a matrix using lapack (actually you already answered a question about that and it worked perfectly in the test case - http://www.mathworks.com/matlabcentral/answers/224034-segfault-in-matrix-inversion-using-lapack-in-mex). If I do a static allocation as follows;
function inv(A) result(Ainv)
use mexf90
complex*16, dimension(:,:), intent(in) :: A
complex*16, dimension(size(A,1),size(A,2)) :: Ainv
complex*16, dimension(size(A,1)) :: work ! work array for LAPACK
integer*8, dimension(size(A,1)) :: ipiv ! pivot indices
integer*8 :: n, info
! External procedures defined in LAPACK
external ZGETRF
external ZGETRI
! Store A in Ainv to prevent it from being overwritten by LAPACK
Ainv = A
n = size(A,1)
! ZGETRF computes an LU factorization of a general M-by-N matrix A
! using partial pivoting with row interchanges.
call ZGETRF(n, n, Ainv, n, ipiv, info)
if (info /= 0) then
CALL MEXERRMSGTXT('Matrix is numerically singular!')
end if
! ZGETRI computes the inverse of a matrix using the LU factorization
! computed by DGETRF.
call ZGETRI(n, Ainv, n, ipiv, work, n, info)
if (info /= 0) then
CALL MEXERRMSGTXT('Matrix inversion failed!')
end if
end function inv
I receive a segfault while calling ZGETRF. If I use dynamic allocation instead;
function inv(A) result(Ainv)
use mexf90
complex*16, dimension(:,:), intent(in) :: A
complex*16, allocatable :: Ainv(:,:)
complex*16, allocatable :: work(:) ! work array for LAPACK
integer*8, allocatable :: ipiv(:) ! pivot indices
integer*8 :: n, info
! External procedures defined in LAPACK
external ZGETRF
external ZGETRI
n = size(A,1)
allocate(Ainv(n,n),work(n),ipiv(n))
...
end function
I receive the segfault on allocation. What may cause such an error?
PS: It allocates some other matrices w/o any problem before it reaches to that point.
Hmmm ... if the variables are small then I wouldn't expect a stack overflow. What is the exact text of the seg fault message you are getting?
The -largeArrayDims option forces the use of API library function interfaces that use 8-byte integers instead of 4-byte integers for the macro defined integers. I.e., it would force macros such as mwSize to translate as integer*8, and it would force the mex file to link with a MATLAB API library that used 8-byte integers for API function arguments. If you had local variable arrays defined as mwSize, then I could see that potentially affecting how easily you could overflow the stack. But again, the fix is to not do that in the first place ... always use allocatable variables for large local variable arrays since that memory will come from the heap.
It is for sure that I'm having a memory problem but each time a switch to another type of declaration of variables, the error changes as well. When I write the code in this way;
#include "fintrf.h"
!
SUBROUTINE MEXFUNCTION(NLHS, PLHS, NRHS, PRHS)
use mexf90
implicit none
integer*8, intent(in) :: PRHS(*) ! Pointers carrying the input data
integer*8, intent(out) :: PLHS(*) ! Pointers carrying the output data
integer*4, intent(in) :: NLHS,NRHS ! REMAINS THE SAME FOR 64BIT system
!-----------------------------------------------------------------------
integer*8 :: err, mI1, nI1, mO1, nO1, ...
real*8 :: sca1, ...
integer*8, pointer :: input1_re, input1_im, input2, ..., output
character(200) :: errMsg
integer*4 :: txt, classId
integer*4, external :: mexprintf
! ERROR CHECKING FOR INPUT
IF (NRHS .NE. 9) THEN
CALL MEXERRMSGTXT('MultMexError: 9 INPUT ARGUMENT IS REQUIRED')
ENDIF
IF (NLHS .NE. 1) THEN
CALL MEXERRMSGTXT('MultMexError: 1 OUTPUT ARGUMENT IS REQUIRED')
ENDIF
! ASSIGN POINTERS TO THE VARIOUS PARAMETERS
! Input matrix
input1_re =>MXGETPR(PRHS(1))
input1_im =>MXGETPI(PRHS(1))
mI1 = MXGETM(PRHS(1))
nI1 = MXGETN(PRHS(1))
input2 =>MXGETPR(PRHS(2))
...
sca1 = MXGETSCALAR(PRHS(5))
...
! Do something meaningful...
classId = mxClassIDFromClassName('double')
plhs(1) = mxCreateDoubleMatrix(mO1,nO1,0)
output =>mxGetPr(plhs(1))
call functionWithMatrices(input1_re, input1_im, mI1, nI1, ..., sca1, ..., mO1, nO1, output)
! Send the result to the return argument
!(For an alternative way of sending the results to the return arguments - see referenceF90.f90)
END SUBROUTINE MEXFUNCTION
subroutine functionWithMatrices(input1_re, input1_im, mI1, nI1, ..., sca1, ..., mO1, nO1, output)
use mexf90
!use omp_lib
interface
function inv(A) result(Ainv)
use mexf90
complex*16, dimension(:,:), intent(in) :: A
complex*16 :: Ainv(size(A,1),size(A,2))
end function inv
end interface
integer*8, intent(in) :: mI1, nI1, mO1, nO1, ...
real*8, intent(in) :: input1_re(mI1,nI1), input1_im(mI1,nI1), sca1, ...
real*8, intent(out) :: output(mO1,nO1)
...
complex*16 :: mat2Invert(mI1,nI1), invMat(mI1,nI1)
...
do i = 1,loop1
do ii = 1,loop2
...
invMat = inv(mat2Invert)
...
enddo
do ii = 1,loop3
output(i,ii) = ...
enddo
enddo
end subroutine functionWithMatrices
function inv(A) result(Ainv)
use mexf90
complex*16, dimension(:,:), intent(in) :: A
complex*16 :: Ainv(size(A,1),size(A,2))
complex*16 :: work(size(A,1)) ! work array for LAPACK
integer*8 :: ipiv(size(A,1)) ! pivot indices
integer*8 :: n,info
! External procedures defined in LAPACK
external ZGETRF
external ZGETRI
! Store A in Ainv to prevent it from being overwritten by LAPACK
Ainv = A
n = size(A,1)
! ZGETRF computes an LU factorization of a general M-by-N matrix A
! using partial pivoting with row interchanges.
call ZGETRF(n, n, Ainv, n, ipiv, info)
if (info /= 0) then
CALL MEXERRMSGTXT('Matrix is numerically singular!')
end if
! ZGETRI computes the inverse of a matrix using the LU factorization
! computed by DGETRF.
call ZGETRI(n, Ainv, n, ipiv, work, n, info)
if (info /= 0) then
CALL MEXERRMSGTXT('Matrix inversion failed!')
end if
end function inv
where everything is in stack memory, I receive a "corrupted double-linked list error" while calling ZGETRF. If I use allocatable variables instead of static ones in inv function as follows;
function inv(A) result(Ainv)
use mexf90
complex*16, dimension(:,:), intent(in) :: A
complex*16, allocatable :: Ainv(:,:)
complex*16, allocatable :: work(:) ! work array for LAPACK
integer*8, allocatable :: ipiv(:) ! pivot indices
then I receive this error: "Program received signal SIGSEGV, segmentation fault. 0x0000003f33a7e296 in _int_malloc () from /lib64/libc.so.6" when I try to allocate the variables
n = size(A,1)
allocate(Ainv(n,n),work(n),ipiv(n))
Ugur
Ugur on 22 Jun 2015
Edited: Ugur on 22 Jun 2015
Hi James,
I just tried writing 3 different allocate statements for the last case, then it manage to allocate the memory but still it gives a "corrupted double-linked list error" while calling ZGETRF.
I haven't forgotten about this, but don't have a solution for you as of yet. Maybe you could simplify things to start with. Get rid of the mexf90 and passed-in-variables stuff and instead just code up some fixed size local variables to use, and then pass those into the ZGETRF and ZGERTI routines to see if that part in and of itself actually works. Then build the code up from there piece by piece to see which part causes the segfault.
Many thanks for your interest on the issue! I am doing now exactly what you suggested, indeed. When I try a standalone application that just inverts a matrix, it works. But when I try to introduce a bunch of matrix operations in advance, there the problem begins. Maybe I can ask one last question about the issue: When I debug the code using Valgrind-memcheck, it indicates an "illegal write" whenever I allocate a matrix. I think the way Fortran allocates a block in heap memory is considered illegal in C environment (and so in MATLAB) but MATLAB can compensate this up to a certain extent. And when too much allocation/deallocation takes place, it crashes. (I know it sounds silly but that's the only explanation I can give after this many trials) Can you suggest me a better way to allocate some memory, which is valid in Fortran and not illegal in MATLAB?
Hi James,
Please ignore all the things I wrote above. I found where the memory corruption comes from! It was my mistake, indeed. At a certain point the code was writing outside an array (stupid me!) But gdb was indicating the segmentation fault error when I try to launch another function, or allocate another array. And this was changing everytime I changed the code since different memory blocks were put next to each other I guess. Anyway, many many thanks for all your interest in the issue :)

Sign in to comment.

More Answers (0)

Categories

Asked:

on 18 Jun 2015

Commented:

on 24 Jun 2015

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!