Create GPU CUDA kernel object from PTX and CU code


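KERN = parallel.gpu.CUDAKernel(PTXFILE,CPROTO)
KERN = parallel.gpu.CUDAKernel(PTXFILE,CPROTO,FUNC)
KERN = parallel.gpu.CUDAKernel(PTXFILE,CUFILE)
KERN = parallel.gpu.CUDAKernel(PTXFILE,CUFILE,FUNC)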


KERN = parallel.gpu.CUDAKernel(PTXFILE,CPROTO) and KERN = parallel.gpu.CUDAKernel(PTXFILE,CPROTO,FUNC) create a CUDAKernel object that you can use to call a CUDA kernel on the GPU. PTXFILE is the name of the file that contains the PTX code, or the contents of a PTX file as a character vector; and CPROTO is the C prototype for the kernel call that KERN represents. If specified, FUNC must be a character vector that unambiguously defines the appropriate kernel entry name in the PTX file. If FUNC is omitted, the PTX file must contain only a single entry point.

KERN = parallel.gpu.CUDAKernel(PTXFILE,CUFILE) and KERN = parallel.gpu.CUDAKernel(PTXFILE,CUFILE,FUNC) create a kernel object that you can use to call a CUDA kernel on the GPU. In addition, they read the CUDA source file CUFILE, and look for a kernel definition starting with '__global__' to find the function prototype for the CUDA kernel that is defined in PTXFILE.
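When a PTX file holds more than one entry point, the FUNC argument selects the kernel. The following is a minimal sketch, not a definitive recipe: the file name myKernels.ptx is hypothetical, and the kernel name addToVector is assumed to identify exactly one entry in that file.

```matlab
% Sketch only: 'myKernels.ptx' is a hypothetical PTX file assumed to contain
% several entry points, exactly one of which matches 'addToVector'.
kern = parallel.gpu.CUDAKernel('myKernels.ptx', ...
                               'float *,float,int', ...
                               'addToVector');

% EntryPoint reports the entry name in the PTX that FUNC resolved to.
disp(kern.EntryPoint)
```

If FUNC matched more than one entry, the call would be ambiguous, which is why the description above requires FUNC to identify the entry name unambiguously.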

For information on executing your kernel object, see Run a CUDAKernel.


If the CUDA source file (CUFILE) contains the following:

/*
 * Add a constant to a vector.
 */
__global__ void addToVector(float * pi, float c, int vecLen)  {
   int idx = blockIdx.x * blockDim.x + threadIdx.x;
   if (idx < vecLen) {
       pi[idx] += c;
   }
}

and simpleEx.ptx contains the PTX code that results from compiling that source file, both of the following statements return a kernel object that you can use to call the addToVector CUDA kernel.

kern = parallel.gpu.CUDAKernel('simpleEx.ptx', CUFILE);  % CUFILE: name of the CUDA source file shown above
kern = parallel.gpu.CUDAKernel('simpleEx.ptx', ...
                                     'float *,float,int');
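
Once the kernel object exists, a typical next step is to set its launch dimensions and evaluate it with feval. The sketch below is illustrative rather than authoritative: it assumes simpleEx.ptx is on the MATLAB path and that a supported NVIDIA GPU is available. The vector length of 1000, the block size cap of 256, and the constant 5 are arbitrary choices.

```matlab
% Sketch only: assumes simpleEx.ptx is on the path and a supported GPU is present.
kern = parallel.gpu.CUDAKernel('simpleEx.ptx', 'float *,float,int');

vecLen = 1000;                                    % illustrative vector length

% One thread per element: choose a block size, then enough blocks to cover vecLen.
kern.ThreadBlockSize = min(kern.MaxThreadsPerBlock, 256);
kern.GridSize = ceil(vecLen / kern.ThreadBlockSize(1));

in  = ones(vecLen, 1, 'single', 'gpuArray');      % float * corresponds to single data
out = feval(kern, in, single(5), int32(vecLen));  % the non-const pointer argument is returned as an output
result = gather(out);                             % copy the result back to host memory
```

Because the kernel guards with idx < vecLen, rounding the grid size up is safe: surplus threads in the last block do nothing. With these inputs, every element of result should equal 6.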