segfault when using "matmul" function in FORTRAN mex code

Question

0 votes

When I try to multiply two matrices of complex type using "matmul" function, I receive a segfault with the following code;

matD = matmul(transpose(matB),matC*matB)

while this runs without an error

matC  = matC*matB
matD  = matmul(transpose(matB),matC)

I tried to run a similar code directly in Fortran and it did not give any error (as expected). What may cause such an instability in mex?

0 Comments
Show -2 older comments Hide -2 older comments

Sign in to comment.

Sign in to answer this question.

Follow Question

Answer 1

James Tursa on 18 Jun 2015

Edited: James Tursa on 18 Jun 2015

Open in MATLAB Online

0 votes

How large are your variables? I would hazard a guess that they are large, in the 10's or 100's MB or larger.

One of the problems with Fortran is that temporary variables typically get memory from the stack (relatively small memory available), rather than the heap (relatively large memory available). So calculations that immediately get used as arguments or in other calculations are temporary variables that take stack memory, whereas calculations that immediately get assigned to a variable do not ... such calculations are done in-place in the left-hand-side variable. So look at your two formulations:

matD = matmul(transpose(matB),matC*matB)

Here transpose(matB) is a temporary result that is only used as an argument, so the memory for it will typically be taken from the stack. Likewise, matC*matB is a temporary result that is only used for an argument, so its memory will also be taken from the stack. If they are too large, you will blow the stack memory (a fixed amount of memory much smaller than the heap) and the program will crash.

Now look at your second formulation:

matC  = matC*matB
matD  = matmul(transpose(matB),matC)

The matC*matB calculation is assigned to a variable. Thus, the result does not require additional stack memory ... the results can be done "in-place" in the matC variable (since the * is "element-wise multiplication" in Fortran). There is still an issue with the transpose(matB) calculation taking stack memory, but it appears that was not a tipping point for the crash.

As to why your Fortran stand-alone program worked fine but the mex routine using the same code failed, it is likely simply the result of having different amounts of stack memory available for the two different implementations. E.g., had you started MATLAB with some type of command that would increase its stack space, the mex routine may have worked fine.

That is one reason I tend to avoid certain constructs like using "any" on a temporary calculation result. E.g.,

if( any( A == B ) ) then
    ! etc
endif

If A and B are large, then the result A == B will be large and come from the stack, potentially crashing the program. To guard against this, I often fall back to F77 style programming because it is protected against this. E.g., the above could be coded as follows:

flag = .false.
do i=1,n
    if( A(i) == B(i) ) then
        flag = .true.
        exit
    endif
enddo
if( flag ) then
    ! etc
endif

Seems kind of backwards to have to resort to F77 style code to protect against stack overflows, but there it is. There might be compiler directives to tell the compiler to create temporary and local variables from the heap, but doing so ties your code to non-standard compiler directives. Rather than doing this, I tend to just fall back on F77 style coding because I know it will be safe for large variables. I don't want to have to rely on compiler directives, or increasing the stack size when starting the program, etc. I would rather write code from the outset that I know will work regardless of stack size (unless of course the heap runs out).

Another potential issue is the matB, matC, and matC variables themselves. If they are non-allocated variables then their memory will also typically come from the stack. So you should never do something like this in Fortran (actually you can have this exact problem in other languages such as C or C++ as well):

subroutine whatever(large_number)
integer large_number
real*8 X(large_number)

The reason is that X will take stack memory, and potentially crash the program. Local large variables should always be allocated since that memory will come from the heap. E.g.,

subroutine whatever(large_number)
integer large_number, status
real*8, allocatable :: X(:)
allocate( X(large_number), stat = status )
if( status /= 0 ) then
    ! handle the memory allocation error
endif
! code to use X
deallocate(X)

8 Comments
Show 6 older comments Hide 6 older comments

Ugur on 19 Jun 2015

Open in MATLAB Online

Hi again James,

What you said solved the problem at that part but now I do receive another segfault when I try to dynamically allocate a matrix inside a function as you mentioned above. The function to take the inverse of a matrix using lapack (actually you already answered a question about that and it worked perfectly in the test case - http://www.mathworks.com/matlabcentral/answers/224034-segfault-in-matrix-inversion-using-lapack-in-mex). If I do a static allocation as follows;

function inv(A) result(Ainv)
  use mexf90
  complex*16, dimension(:,:), intent(in)    :: A
  complex*16, dimension(size(A,1),size(A,2))  :: Ainv
  complex*16, dimension(size(A,1)) :: work  ! work array for LAPACK
  integer*8, dimension(size(A,1)) :: ipiv   ! pivot indices
  integer*8 :: n, info
  ! External procedures defined in LAPACK
  external ZGETRF
  external ZGETRI
  ! Store A in Ainv to prevent it from being overwritten by LAPACK
  Ainv = A
  n = size(A,1)
  ! ZGETRF computes an LU factorization of a general M-by-N matrix A
  ! using partial pivoting with row interchanges.
  call ZGETRF(n, n, Ainv, n, ipiv, info)
  if (info /= 0) then
   CALL MEXERRMSGTXT('Matrix is numerically singular!')
  end if
  ! ZGETRI computes the inverse of a matrix using the LU factorization
  ! computed by DGETRF.
  call ZGETRI(n, Ainv, n, ipiv, work, n, info)
  if (info /= 0) then
   CALL MEXERRMSGTXT('Matrix inversion failed!')
  end if
end function inv

I receive a segfault while calling ZGETRF. If I use dynamic allocation instead;

function inv(A) result(Ainv)
  use mexf90
  complex*16, dimension(:,:), intent(in)  :: A
  complex*16, allocatable          :: Ainv(:,:)
  complex*16, allocatable          :: work(:)    ! work array for LAPACK
  integer*8, allocatable          :: ipiv(:)    ! pivot indices
integer*8                 :: n, info
  ! External procedures defined in LAPACK
  external ZGETRF
  external ZGETRI
  n = size(A,1)
  allocate(Ainv(n,n),work(n),ipiv(n))
      ...
  end function

I receive the segfault on allocation. What may cause such an error?

PS: It allocates some other matrices w/o any problem before it reaches to that point.

Ugur on 22 Jun 2015

Edited: Ugur on 22 Jun 2015

Open in MATLAB Online

It is for sure that I'm having a memory problem but each time a switch to another type of declaration of variables, the error changes as well. When I write the code in this way;

#include "fintrf.h"
!
SUBROUTINE MEXFUNCTION(NLHS, PLHS, NRHS, PRHS)
use mexf90
implicit none
integer*8, intent(in)  :: PRHS(*)  ! Pointers carrying the input data
integer*8, intent(out)  :: PLHS(*)  ! Pointers carrying the output data
integer*4, intent(in)  :: NLHS,NRHS  ! REMAINS THE SAME FOR 64BIT system
!-----------------------------------------------------------------------
integer*8     :: err, mI1, nI1, mO1, nO1, ...
real*8      :: sca1, ...
integer*8, pointer  :: input1_re, input1_im, input2, ..., output
character(200)    :: errMsg
integer*4    :: txt, classId
integer*4, external  :: mexprintf
! ERROR CHECKING FOR INPUT
IF (NRHS .NE. 9) THEN
  CALL MEXERRMSGTXT('MultMexError: 9 INPUT ARGUMENT IS REQUIRED')
ENDIF
IF (NLHS .NE. 1) THEN
  CALL MEXERRMSGTXT('MultMexError: 1 OUTPUT ARGUMENT IS REQUIRED')
ENDIF
! ASSIGN POINTERS TO THE VARIOUS PARAMETERS
! Input matrix
input1_re  =>MXGETPR(PRHS(1))
input1_im  =>MXGETPI(PRHS(1))
mI1    = MXGETM(PRHS(1))
nI1    = MXGETN(PRHS(1))
input2    =>MXGETPR(PRHS(2))
...
sca1    = MXGETSCALAR(PRHS(5))
...
! Do something meaningful...
classId  = mxClassIDFromClassName('double')
plhs(1)  = mxCreateDoubleMatrix(mO1,nO1,0)
output  =>mxGetPr(plhs(1))
call functionWithMatrices(input1_re, input1_im, mI1, nI1, ..., sca1, ..., mO1, nO1, output)
! Send the result to the return argument
!(For an alternative way of sending the results to the return arguments - see referenceF90.f90)
END SUBROUTINE MEXFUNCTION
subroutine functionWithMatrices(input1_re, input1_im, mI1, nI1, ..., sca1, ..., mO1, nO1, output)
use mexf90
!use omp_lib
interface
  function inv(A) result(Ainv)
    use mexf90
    complex*16, dimension(:,:), intent(in)  :: A
    complex*16        :: Ainv(size(A,1),size(A,2))
    end function inv
end interface
integer*8, intent(in)   :: mI1, nI1, mO1, nO1, ...
real*8, intent(in)   :: input1_re(mI1,nI1), input1_im(mI1,nI1), sca1, ...
real*8, intent(out)   :: output(mO1,nO1)
...
complex*16    :: mat2Invert(mI1,nI1), invMat(mI1,nI1)
...
do i = 1,loop1
  do ii = 1,loop2
    ...
    invMat    = inv(mat2Invert)
    ...
  enddo
  do ii = 1,loop3
    output(i,ii)  = ...
  enddo
enddo
end subroutine functionWithMatrices
function inv(A) result(Ainv)
  use mexf90
  complex*16, dimension(:,:), intent(in)  :: A
  complex*16        :: Ainv(size(A,1),size(A,2))
  complex*16        :: work(size(A,1))  ! work array for LAPACK
  integer*8        :: ipiv(size(A,1))  ! pivot indices
  integer*8         :: n,info
  ! External procedures defined in LAPACK
  external ZGETRF
  external ZGETRI
  ! Store A in Ainv to prevent it from being overwritten by LAPACK
  Ainv   = A
  n  = size(A,1)
  ! ZGETRF computes an LU factorization of a general M-by-N matrix A
  ! using partial pivoting with row interchanges.
  call ZGETRF(n, n, Ainv, n, ipiv, info)
  if (info /= 0) then
   CALL MEXERRMSGTXT('Matrix is numerically singular!')
  end if
  ! ZGETRI computes the inverse of a matrix using the LU factorization
  ! computed by DGETRF.
  call ZGETRI(n, Ainv, n, ipiv, work, n, info)
  if (info /= 0) then
   CALL MEXERRMSGTXT('Matrix inversion failed!')
  end if
end function inv

where everything is in stack memory, I receive a "corrupted double-linked list error" while calling ZGETRF. If I use allocatable variables instead of static ones in inv function as follows;

function inv(A) result(Ainv)
  use mexf90
  complex*16, dimension(:,:), intent(in)  :: A
  complex*16, allocatable      :: Ainv(:,:)  
  complex*16, allocatable      :: work(:)  ! work array for LAPACK
  integer*8, allocatable      :: ipiv(:)  ! pivot indices

then I receive this error: "Program received signal SIGSEGV, segmentation fault. 0x0000003f33a7e296 in _int_malloc () from /lib64/libc.so.6" when I try to allocate the variables

n = size(A,1)
allocate(Ainv(n,n),work(n),ipiv(n))

Ugur on 24 Jun 2015

Many thanks for your interest on the issue! I am doing now exactly what you suggested, indeed. When I try a standalone application that just inverts a matrix, it works. But when I try to introduce a bunch of matrix operations in advance, there the problem begins. Maybe I can ask one last question about the issue: When I debug the code using Valgrind-memcheck, it indicates an "illegal write" whenever I allocate a matrix. I think the way Fortran allocates a block in heap memory is considered illegal in C environment (and so in MATLAB) but MATLAB can compensate this up to a certain extent. And when too much allocation/deallocation takes place, it crashes. (I know it sounds silly but that's the only explanation I can give after this many trials) Can you suggest me a better way to allocate some memory, which is valid in Fortran and not illegal in MATLAB?

Ugur on 24 Jun 2015

Hi James,

Please ignore all the things I wrote above. I found where the memory corruption comes from! It was my mistake, indeed. At a certain point the code was writing outside an array (stupid me!) But gdb was indicating the segmentation fault error when I try to launch another function, or allocate another array. And this was changing everytime I changed the code since different memory blocks were put next to each other I guess. Anyway, many many thanks for all your interest in the issue :)

Sign in to comment.

segfault when using "matmul" function in FORTRAN mex code

0 Comments
Show -2 older comments Hide -2 older comments

Accepted Answer

8 Comments
Show 6 older comments Hide 6 older comments

More Answers (0)

Categories

Tags

Community Treasure Hunt

segfault when using "matmul" function in FORTRAN mex code

0 Comments Show -2 older comments Hide -2 older comments

Accepted Answer

8 Comments Show 6 older comments Hide 6 older comments

More Answers (0)

Categories

Tags

See Also

Community Treasure Hunt

0 Comments
Show -2 older comments Hide -2 older comments

8 Comments
Show 6 older comments Hide 6 older comments