CUDAKernel: Kernel executable on GPU


A CUDAKernel object represents a CUDA kernel that can execute on a GPU. You create the kernel object from PTX or CU code, as described in Run CUDA or PTX Code on GPU.


The following functions operate on CUDAKernel objects:

existsOnGPU: Determine whether a gpuArray or CUDAKernel is available on the GPU
feval: Evaluate the kernel on the GPU
setConstantMemory: Set constant memory on the GPU


A CUDAKernel object has the following properties:

ThreadBlockSize: Size of the block of threads for the kernel. This can be an integer vector of length 1, 2, or 3 (thread blocks can have up to three dimensions). The product of the elements of ThreadBlockSize must not exceed the MaxThreadsPerBlock value for this kernel, and no element of ThreadBlockSize can exceed the corresponding element of the GPUDevice property MaxThreadBlockSize.
MaxThreadsPerBlock: Maximum number of threads permissible in a single block for this CUDA kernel. The product of the elements of ThreadBlockSize must not exceed this value.
GridSize: Size of the grid (effectively, the number of thread blocks that the GPU launches independently). This is an integer vector of length 3. No element of this vector can exceed the corresponding element of the MaxGridSize property of the GPUDevice object.
SharedMemorySize: The amount of dynamic shared memory (in bytes) that each thread block can use. Each thread block has an available shared memory region. The size of this region is limited by the hardware (to about 16 kB on the cards that were current when CUDAKernel was introduced), and the memory is a per-multiprocessor resource, so blocks that request more of it can reduce how many blocks run on a multiprocessor at once. As with all memory, this region must be allocated before the kernel is launched. The size of the shared memory region is often tied to the size of the thread block. Setting this value on the kernel ensures that each thread in a block can access the shared memory region.
EntryPoint (read-only): A character vector containing the actual entry-point name in the PTX code that this kernel calls. Because kernels compiled as C++ have mangled names, an entry point might look like '_Z13returnPointerPKfPy'.
MaxNumLHSArguments (read-only): The maximum number of left-hand-side (output) arguments that this kernel supports. It cannot be greater than the number of right-hand-side arguments, and it is smaller if any inputs are constant or scalar.
NumRHSArguments (read-only): The number of right-hand-side arguments required to call this kernel. Each input must define either the scalar value of an input, the elements of a vector input/output, or the size of an output argument.
ArgumentTypes (read-only): Cell array of character vectors, the same length as NumRHSArguments. Each character vector indicates the expected MATLAB type for that input: a numeric type such as uint8, single, or double, followed by the word scalar or vector to indicate whether the argument is passed by value or by reference. In addition, an argument that is only an input to the kernel is prefixed by in, and an input/output argument is prefixed by inout. This lets you decide how to call the kernel efficiently with both MATLAB arrays and gpuArray objects, and see which kernel inputs are treated as outputs.
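The size rules for ThreadBlockSize, MaxThreadsPerBlock, and GridSize, and the common practice of sizing SharedMemorySize from the thread block, can be checked before a launch. The following Python sketch models those rules outside MATLAB; it is not part of the MATLAB API, and the function names (pad3, check_launch, blocks_to_cover, shared_memory_bytes) are hypothetical. The limits passed in stand for the kernel's MaxThreadsPerBlock and the GPUDevice properties MaxThreadBlockSize and MaxGridSize; the numbers in the usage lines are assumed, typical values, not those of any specific card.

```python
from math import prod


def pad3(size_vector):
    """Pad a size vector of length 1, 2, or 3 with trailing 1s to length 3."""
    if not 1 <= len(size_vector) <= 3:
        raise ValueError("size vector must have length 1, 2, or 3")
    return list(size_vector) + [1] * (3 - len(size_vector))


def check_launch(thread_block_size, grid_size,
                 max_threads_per_block, max_thread_block_size, max_grid_size):
    """Return a list of violated rules; an empty list means the sizes are valid."""
    problems = []
    block = pad3(thread_block_size)
    grid = pad3(grid_size)
    # Total threads per block must not exceed the kernel's MaxThreadsPerBlock.
    if prod(block) > max_threads_per_block:
        problems.append(
            f"{prod(block)} threads per block exceeds {max_threads_per_block}")
    # Each block dimension must not exceed the device's MaxThreadBlockSize.
    for dim, (size, limit) in enumerate(zip(block, max_thread_block_size)):
        if size > limit:
            problems.append(f"block dimension {dim + 1} is {size}, limit {limit}")
    # Each grid dimension must not exceed the device's MaxGridSize.
    for dim, (size, limit) in enumerate(zip(grid, max_grid_size)):
        if size > limit:
            problems.append(f"grid dimension {dim + 1} is {size}, limit {limit}")
    return problems


def blocks_to_cover(n_elements, threads_per_block):
    """Smallest number of blocks whose threads cover n_elements (ceiling division)."""
    return (n_elements + threads_per_block - 1) // threads_per_block


def shared_memory_bytes(thread_block_size, bytes_per_element):
    """Dynamic shared memory for one element per thread (4 bytes for single)."""
    return prod(pad3(thread_block_size)) * bytes_per_element


# Assumed, typical limits; read the real values from your kernel and GPUDevice.
MAX_THREADS = 1024
MAX_BLOCK = [1024, 1024, 64]
MAX_GRID = [2147483647, 65535, 65535]

block = [256]
grid = [blocks_to_cover(1_000_000, 256)]
print(grid)                                                         # [3907]
print(check_launch(block, grid, MAX_THREADS, MAX_BLOCK, MAX_GRID))  # []
print(shared_memory_bytes(block, 4))                                # 1024
```

The ceiling division rounds the grid up so that every element gets a thread; the kernel itself is then expected to skip the surplus threads in the last block.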

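The in and inout prefixes in ArgumentTypes also explain MaxNumLHSArguments: only inout arguments can come back as outputs, which is why constant or scalar inputs reduce that count. The Python sketch below is a hypothetical illustration, not MATLAB code. It assumes each ArgumentTypes entry has the form "direction class scalar|vector" (for example 'inout single vector'), as the description above suggests, and it assumes the mapping that by-value parameters become scalar inputs, const pointers become in vectors, and other pointers become inout vectors. The C-to-MATLAB type table covers only a few types. The usage lines decode the example EntryPoint: '_Z13returnPointerPKfPy' is the C++-mangled name of returnPointer(const float *, unsigned long long *).

```python
# Hypothetical mapping from a few CUDA C base types to MATLAB class names.
C_TO_MATLAB = {
    "float": "single",
    "double": "double",
    "unsigned char": "uint8",
    "int": "int32",
    "unsigned long long": "uint64",
}


def argument_type(c_param_type):
    """Describe one kernel parameter, e.g. 'const float *' -> 'in single vector'."""
    text = c_param_type.strip()
    is_pointer = text.endswith("*")
    if is_pointer:
        text = text[:-1].strip()
    is_const = text.startswith("const ")
    if is_const:
        text = text[len("const "):].strip()
    matlab_class = C_TO_MATLAB[text]
    if not is_pointer:
        # Passed by value: always a scalar input.
        return f"in {matlab_class} scalar"
    # Passed by reference: const data is input-only, anything else is input/output.
    direction = "in" if is_const else "inout"
    return f"{direction} {matlab_class} vector"


def max_outputs(argument_types):
    """Count the 'inout' entries, i.e. the arguments the kernel can return."""
    return sum(1 for entry in argument_types if entry.split()[0] == "inout")


# Parameters of returnPointer(const float *, unsigned long long *).
types = [argument_type(p) for p in ["const float *", "unsigned long long *"]]
print(types)               # ['in single vector', 'inout uint64 vector']
print(len(types))          # 2 (corresponds to NumRHSArguments)
print(max_outputs(types))  # 1 (corresponds to MaxNumLHSArguments)
```

Marking read-only kernel parameters as const is therefore worthwhile: those arguments are then treated as inputs only, rather than as outputs to be returned.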
Introduced in R2011b