MATLAB Answers


GPU CUDA kernel malloc error

Asked by Gaszton
on 10 May 2011
Hello, I have a GeForce 425M card with compute capability 2.1. I wrote a kernel that calls malloc inside the kernel. At first the .cu file did not compile to PTX. Then I set the nvcc parameter -arch=sm_21 ( nvcc -I "D:\...VC\include" -arch=sm_21 -use_fast_math -ptx SR2.cu ). With this it compiled successfully, though I wonder why I need to specify that. After that I tried to create the kernel in MATLAB:
ckernel=parallel.gpu.CUDAKernel('SR2.ptx', 'SR2.cu');
But I get the error:
??? Error using ==> parallel.gpu.CUDAKernel
An error occurred during PTX compilation of <image>.
The information log was:
: Considering profile 'compute_20' for gpu='sm_21' in
'cuModuleLoadDataEx_2a9
The error log was:
The CUDA error code was: CUDA_ERROR_INVALID_IMAGE.
Before modifying the kernel to use malloc, and without passing -arch=sm_21 to nvcc, I was able to run my kernel from MATLAB without any problem.
I think there is some configuration problem with CUDA. I hope someone has an idea how to solve this.
Thanks,
Gaszton

  1 Comment

Gaszton
on 10 May 2011
It seems that cuModuleLoadDataEx has no JIT target option for compute capability 2.1:
CUjit_target_enum; possible values are:
CU_TARGET_COMPUTE_10
CU_TARGET_COMPUTE_11
CU_TARGET_COMPUTE_12
CU_TARGET_COMPUTE_13
CU_TARGET_COMPUTE_20
http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/docs/online/group__CUDA__MODULE_g9e8047e9dbf725f0cd7cafd18bfd4d12.html#g9e8047e9dbf725f0cd7cafd18bfd4d12
But in the CUDA Toolkit 3.2 release notes I found:
Added CU_TARGET_COMPUTE_21 to JIT options.


1 Answer

Answer by Edric Ellis
on 11 May 2011
 Accepted Answer

You can get that error message if you have a mismatch between the CUDA runtime in use by Parallel Computing Toolbox and the version of nvcc that you're using. If you're using R2010b, you need to use CUDA-3.1; for R2011a, you can use CUDA-3.2. I was able to compile and use the following trivial kernel:
// simple.cu
__global__ void fcn( double * out ) {
    int * x = (int *) malloc( 1024 );
    out[0] = x[0];
    free( x );
}
By compiling like so:
$ /usr/local/cuda32/cuda/bin/nvcc -arch compute_20 -ptx simple.cu
and then using within MATLAB R2011a like so:
>> k = parallel.gpu.CUDAKernel( 'simple.ptx' );
>> gather(k.feval(0))
ans =
1.768515945000000e+09
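The value returned above is whatever happened to be in the freshly allocated memory, since x[0] is read before anything is written to it. A slightly more defensive variant might look like the sketch below. It is only an illustration: the file name, the kernel name safeMallocDemo, the sentinel -1.0 and the value 42 are made up for this example and are not part of the original answer. It assumes a device of compute capability 2.0 or higher and a toolkit that supports in-kernel malloc.

```cuda
// safe_malloc.cu (hypothetical file name, not from the thread)
// Compile for compute capability 2.0 or higher, e.g. with -arch compute_20.
__global__ void safeMallocDemo( double * out ) {
    // In-kernel malloc allocates from the device heap and returns NULL
    // when the request cannot be satisfied.
    int * x = (int *) malloc( 1024 );
    if ( x == NULL ) {
        out[0] = -1.0;   // illustrative sentinel: allocation failed
        return;
    }
    x[0] = 42;           // initialise the memory before reading it
    out[0] = x[0];
    free( x );
}
```

Launched with a single thread, this should write 42 to out[0] when the allocation succeeds and -1 otherwise.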

  2 Comments

Gaszton
on 11 May 2011
Thank you for your help,
I have R2010b and CUDA Toolkit 3.2.
Everything worked until I specified the -arch option to nvcc.
If I don't specify it, what is the default? I wonder why it is not 2.1, given that my card has compute capability 2.1.
If I compile my .cu file with -arch compute_20 or sm_20, I still get the error from MATLAB.
Should I install CUDA Toolkit 3.1 and see whether it works?
With CUDA 3.1, am I able to use kernel malloc?
Thank you,
Gaszton
Gaszton
on 11 May 2011
It seems that CUDA 3.1 does not support kernel malloc.
Otherwise, with 3.1 I am able to use sm_21 code in MATLAB.
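When chasing a toolkit mismatch like the one discussed in this thread, it can help to see which CUDA versions and compute capability a standalone program observes. The following is a rough sketch using the CUDA runtime API; the helper names deviceSupportsKernelMalloc and printCudaVersions are hypothetical. It reports the runtime that this program is linked against, not the one bundled with MATLAB, so treat it as a sanity check only.

```cuda
// check_cuda_setup.cu (hypothetical helper, not part of the thread)
#include <cstdio>
#include <cuda_runtime.h>

// Returns true when device `dev` reports compute capability 2.0 or
// higher, which in-kernel malloc/free require.
bool deviceSupportsKernelMalloc( int dev ) {
    cudaDeviceProp prop;
    if ( cudaGetDeviceProperties( &prop, dev ) != cudaSuccess ) {
        return false;   // no usable device or query failed
    }
    return prop.major >= 2;
}

// Prints the CUDA version supported by the installed driver and the
// version of the runtime library. Both are encoded as
// 1000 * major + 10 * minor, so CUDA 3.2 is reported as 3020.
void printCudaVersions() {
    int driver = 0, runtime = 0;
    cudaDriverGetVersion( &driver );
    cudaRuntimeGetVersion( &runtime );
    printf( "driver supports CUDA %d.%d, runtime is CUDA %d.%d\n",
            driver / 1000, ( driver % 1000 ) / 10,
            runtime / 1000, ( runtime % 1000 ) / 10 );
}
```

Calling printCudaVersions() and deviceSupportsKernelMalloc(0) from a small main shows whether the nvcc toolkit in use is newer than expected and whether device 0 can run kernels that call malloc.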
