I have two identical .cu files: one declares its variables as float and the other declares them as double.
The double-precision version works perfectly when I launch the kernel, whereas the float version does not. I get an error when I gather() the output variables:
Error using gpuArray/gather An unexpected error occurred during CUDA execution. The CUDA error was: CUDA_ERROR_UNKNOWN
From the documentation, it appears that feval() automatically casts my input arrays to the correct type. I have also tried converting each input/output array to float individually with single(), but I get a similar error.
Here are the two kernel prototypes:
__global__ void SegForceNBodyCUDA(double const *SoA, double const a, double const MU, double const NU, int const S, double *f0x, double *f0y, double *f0z, double *f1x, double *f1y, double *f1z);
__global__ void SegForceNBodyCUDA(float const *SoA, float const a, float const MU, float const NU, int const S, float *f0x, float *f0y, float *f0z, float *f1x, float *f1y, float *f1z);
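Since the two files are meant to differ only in the element type, one way to guarantee that they really stay identical is to keep a single templated body and expose two entry points. This is a hypothetical sketch: `segForceOne`, `SegForceNBodyCUDA_float`, `SegForceNBodyCUDA_double` and the shortened parameter list are illustrative stand-ins, not the poster's code, and the arithmetic is only a placeholder for the real force calculation. It also shows a common pitfall when porting to float: an unsuffixed literal such as `0.5` is a double, so `0.5 * x` quietly promotes a float expression to double.

```cuda
#include <cuda_runtime.h>

// Hypothetical single-source body shared by the float and double kernels.
// __host__ __device__ lets the same code also be exercised on the CPU.
template <typename T>
__host__ __device__ inline void segForceOne(const T* SoA, T a, int S, int i, T* f0x)
{
    if (i < 0 || i >= S) return;  // surplus threads in the last block do nothing

    // Placeholder arithmetic standing in for the real force calculation.
    // T(0.5) keeps the constant in type T; a bare 0.5 is a double literal and
    // would promote the whole float expression to double precision.
    f0x[i] = T(0.5) * a * SoA[i];
}

// extern "C" gives each entry point a plain, unmangled symbol name.
extern "C" __global__ void SegForceNBodyCUDA_float(const float* SoA, float a, int S, float* f0x)
{
    segForceOne<float>(SoA, a, S, blockIdx.x * blockDim.x + threadIdx.x, f0x);
}

extern "C" __global__ void SegForceNBodyCUDA_double(const double* SoA, double a, int S, double* f0x)
{
    segForceOne<double>(SoA, a, S, blockIdx.x * blockDim.x + threadIdx.x, f0x);
}
```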
Both .cu files compile correctly without errors/warnings.
Thanks for sending the code.
I’ve done some initial investigation and it looks like you have an illegal memory access somewhere. Here is what cuda-memcheck reports:
Running CUDA Single Precision, Optimised...
warning: Cuda API error detected: cuModuleGetGlobal_v2 returned (0x1f4)
warning: Cuda API error detected: cuModuleGetGlobal_v2 returned (0x1f4)
[Launch of CUDA Kernel 102 (SegForceNBodyCUDA<<<(4,1,1),(256,1,1)>>>) on Device 0]
Memcheck detected an illegal access to address (@local)0xfff830

Program received signal CUDA_EXCEPTION_1, Lane Illegal Address.
[Switching focus to CUDA kernel 102, grid 103, block (0,0,0), thread (5,0,0), device 0, sm 12, warp 2, lane 5]
0x0000000010052d98 in SegForceNBodyCUDA(float const*, float, float, float, int, float*, float*, float*, float*, float*, float*) ()
I couldn't see anything obviously wrong in the kernel, but it's quite a lot of code. The most likely culprit is reading/writing past the end of an input/output array. However, the fact that the illegal memory address includes "@local" may indicate a problem with how data is being passed around internally to the kernel (i.e. in thread-local memory). I don't think there is any problem with the way the kernel is being called by MATLAB.
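The log shows a launch of 4 blocks of 256 threads (1024 threads in total), so unless S happens to fill that grid exactly, the surplus threads must be kept away from the arrays. The hypothetical sketch below (`processOne` and `guardedKernel` are illustrative names, not from the original kernel) shows the two patterns worth checking: an explicit bounds guard on the global thread index, and per-thread arrays whose loop bounds match their declared size. Arrays that the compiler cannot keep in registers are placed in thread-local memory, so indexing one out of range is one plausible way to end up with an illegal access at an "@local" address.

```cuda
#include <cuda_runtime.h>

// Hypothetical per-thread work: item i owns three consecutive floats of a
// flattened input of length 3*n and writes their sum to out[i].
// __host__ __device__ so the same logic can be checked on the CPU.
__host__ __device__ inline bool processOne(const float* in, int n, int i, float* out)
{
    // Bounds guard: a 4 x 256 launch has 1024 threads, but n may be smaller.
    if (i < 0 || i >= n) return false;

    // Small per-thread array. If the compiler cannot keep it in registers it
    // lives in thread-local memory, and writing past v[2] (e.g. a loop written
    // as k <= 3) is the kind of bug memcheck reports at an "@local" address.
    float v[3];
    for (int k = 0; k < 3; ++k) v[k] = in[3 * i + k];

    float s = 0.0f;
    for (int k = 0; k < 3; ++k) s += v[k];
    out[i] = s;
    return true;
}

extern "C" __global__ void guardedKernel(const float* in, int n, float* out)
{
    processOne(in, n, blockIdx.x * blockDim.x + threadIdx.x, out);
}
```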
Best of luck debugging this. These types of problems can be tricky to isolate.