I'm offloading computation to a GPU using the "CUDAKernel" object from Parallel Computing Toolbox (PCT). This involves compiling CUDA C (".cu" files) as PTX code, and passing both files to the "feval" function.
When I use the "restrict" keyword in my CUDA C program, it compiles fine as PTX. But then when I try to create the CUDAKernel object, I get the following error:
kernel = parallel.gpu.CUDAKernel('CUDAKernel_test.ptx','CUDAKernel_test.cu','CUDAKernel_test');
Error using parallel.internal.gpu.handleKernelArgs>iParseToken (line 329)
Unable to parse declaration: const double * __restrict weights
My function declaration looks like:
__global__ void CUDAKernel_test(double* arg1, const double* __restrict arg2)
The restrict keyword is a message to the compiler that the associated pointer is not aliased. This allows the compiler to make certain optimizations it otherwise couldn't.
If anyone knows why this wouldn't be supported, please let me know.