Linux CUDA-based Shared Library Crashes MATLAB with Segfault on Kernel Call
Hey all,
I'm running into an issue with a CUDA-based shared library I've written to solve a system of PDEs, which I load through loadlibrary. It's written in fairly generic CUDA with no outside libraries. When compiled to the shared library and executed from MATLAB, it crashes with a segfault as soon as execution reaches the kernel calls (there are 5 distinct kernels, each called about twice). I've commented out the kernels one by one and each of them triggers the segfault, which leads me to believe there may be some issue with the kernel-calling mechanism.
I believe the kernels themselves are sound: I've written a C++ caller for the .so, and it works fine (and passes cuda-memcheck). The same code, exactly as is, also works on Windows when called from MATLAB. So I believe this is a MATLAB-specific issue, or possibly a compile-flags issue. The odd thing is that a quick, trivial kernel does work within MATLAB (compiled with the same flags as below) - the kernel executes correctly.
So, I understand that you don't have my code, so I'm not asking for code debugging. My questions are more about MATLAB's requirements for compiling shared libraries. I use the following flags to compile through nvcc:
-std=c++14 -shared -x cu -cudart static -O2 -gencode=arch=compute_50,code=\"sm_50,compute_50\" -m64 -Xcompiler "-fPIC -Wno-narrowing" -w -Wno-deprecated-gpu-targets
and the following to link:
-shared -w
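Assembled into a single nvcc invocation (file names `solver.cu` / `libsolver.so` are placeholders, not from the original post; the escaped quotes around the `-gencode` value become plain quotes when typed directly in a shell):

```shell
# One-step build of the shared library with the flags from the post
# (solver.cu / libsolver.so are placeholder names).
nvcc -std=c++14 -shared -x cu -cudart static -O2 \
     -gencode=arch=compute_50,code="sm_50,compute_50" -m64 \
     -Xcompiler "-fPIC -Wno-narrowing" -w -Wno-deprecated-gpu-targets \
     solver.cu -o libsolver.so
```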
Do you see any issues with that? I mark the functions as extern "C" when compiling with g++, with a clause so that when MATLAB compiles the thunk library (which it does with gcc) plain extern is used instead:
#ifdef __linux__
#ifdef __cplusplus
#define EXTC extern "C"
#else
#define EXTC extern
#endif
...
#endif  /* __linux__ */
Any issues there? I don't think so - as mentioned, other function calls work fine.
I'm at a bit of a loss as to how to move forward here. Does anyone have any insight?
Thanks.
Joss Knight on 12 Dec 2020 (edited)
There could be a million issues. Are you passing GPU data to your kernel that you allocated in MATLAB? Is your library compiled with the same CUDA version as MATLAB?
It's going to be pretty hard to advise without seeing at least some basic reproduction code. Have you thought about writing a minimal bit of code that reproduces the issue that you can post here?
Tom Gade
on 12 Dec 2020
Joss Knight
on 13 Dec 2020
If you don't have Parallel Computing Toolbox and/or you are not creating any gpuArray data in MATLAB then no, it doesn't matter what toolkit MATLAB is using.
It's so hard to say what might be wrong, since you seem to be claiming there are no bugs in your CUDA code. One guess is that there are bugs, but it's only when your library is running in the MATLAB process that reading or writing off the end of an array is causing a crash. Sometimes cuda-memcheck doesn't notice these things, especially an illegal read, unless you compile with device debugging.
Try running cuda-memcheck with MATLAB. It's simple enough to launch MATLAB with the -r flag to run some code and then exit; with any luck the segfault will be triggered and cuda-memcheck will tell you where the problem is.
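The launch Joss describes might look something like this (`runRepro` is a placeholder for a script that loads the library and triggers the kernel calls):

```shell
# Run MATLAB headless under cuda-memcheck; runRepro.m is a placeholder
# script that loadlibrary's the .so and calls into it, then exits.
cuda-memcheck --log-file memcheck.log \
    matlab -nodisplay -nosplash -r "runRepro; exit"
```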
You could have an alignment issue, or you could be getting your datatype wrong. Try taking the data you copied to device and copying it back to a new host array. Display the array contents and check they're the same as before. Try copying the data to a newly allocated host array and then copy that data to device. That could fix an alignment issue.
Finally, use the NVIDIA Nsight debugger to step through your CUDA code.
Tom Gade
on 13 Dec 2020
Joss Knight
on 16 Dec 2020
Try launching MATLAB with -softwareopengl and see whether there's some sort of graphics issue here.