CUDAKernel
Kernel executable on GPU
Constructor
parallel.gpu.CUDAKernel
Description
A CUDAKernel object represents a CUDA kernel that can execute on a GPU. You create the kernel from CU and PTX code. For an example of how to create and use a CUDAKernel object, see Run CUDA or PTX Code on GPU.
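A minimal sketch of constructing a kernel object, assuming a compiled PTX file myKernel.ptx and its source myKernel.cu; these file names are placeholders, not part of this reference page:

```matlab
% Construct the kernel object from hypothetical PTX and CU files.
% An optional third argument can select a specific entry point when the
% PTX file contains more than one kernel.
kern = parallel.gpu.CUDAKernel('myKernel.ptx', 'myKernel.cu');
```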
Methods
Method | Description |
---|---|
existsOnGPU | Determine if gpuArray or CUDAKernel is available on GPU |
feval | Evaluate kernel on GPU |
setConstantMemory | Set some constant memory on GPU |
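A hedged sketch of calling these methods on the hypothetical kern object from the Description section; the input sizes, the kernel's argument list, and the constant-memory symbol name scaleFactor are assumptions for illustration:

```matlab
% Verify that the kernel object is available on the GPU.
existsOnGPU(kern)

% Set a constant-memory symbol assumed to be declared in the CU code as
% __constant__ float scaleFactor;
setConstantMemory(kern, 'scaleFactor', single(2));

% Launch the kernel; inputs can be MATLAB arrays or gpuArray objects,
% and outputs are returned as gpuArray objects.
in1 = gpuArray(single(rand(1, 1024)));
in2 = gpuArray(single(rand(1, 1024)));
out = feval(kern, in1, in2, zeros(1, 1024, 'single'));
```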
Properties
A CUDAKernel object has the following properties:
Property Name | Description |
---|---|
ThreadBlockSize | Size of the block of threads on the kernel. This can be an integer vector of length 1, 2, or 3 (since thread blocks can be up to 3-dimensional). The product of the elements of ThreadBlockSize must not exceed MaxThreadsPerBlock for this kernel, and no element of ThreadBlockSize can exceed the corresponding element of the MaxThreadBlockSize property. See the example after this table. |
MaxThreadsPerBlock | Maximum number of threads permissible in a single block for this CUDA kernel. The product of the elements of ThreadBlockSize must not exceed this value. |
GridSize | Size of the grid (effectively the number of thread blocks that the GPU launches independently). This is an integer vector of length 3. No element of this vector can exceed the corresponding element of the MaxGridSize property of the GPUDevice object. |
SharedMemorySize | The amount of dynamic shared memory (in bytes) that each thread block can use. Each thread block has an available shared memory region. The size of this region is limited in current cards to ~16 kB, and is shared with registers on the multiprocessors. As with all memory, this needs to be allocated before the kernel is launched. It is also common for the size of this shared memory region to be tied to the size of the thread block. Setting this value on the kernel ensures that each thread in a block can access this available shared memory region. |
EntryPoint | (read-only) A character vector containing the actual entry point name in the PTX code that this kernel calls, for example '_Z13returnPointerPKfPy'. |
MaxNumLHSArguments | (read-only) The maximum number of left-hand side arguments that this kernel supports. It cannot be greater than the number of right-hand side arguments, and it is smaller if any inputs are constant or scalar. |
NumRHSArguments | (read-only) The required number of right-hand side arguments needed to call this kernel. Every input must define either the scalar value of an input, the elements of a vector input/output, or the size of an output argument. |
ArgumentTypes | (read-only) Cell array of character vectors, the same length as NumRHSArguments. Each character vector indicates the expected MATLAB type for that input: a numeric type such as uint8, single, or double, followed by the word scalar or vector to indicate whether the argument is passed by value or by reference. In addition, if the argument is only an input to the kernel, it is prefixed by in; if it is an input/output, it is prefixed by inout. This lets you decide how to call the kernel efficiently with both MATLAB arrays and gpuArray objects, and shows which kernel inputs are treated as outputs. |
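As referenced in the ThreadBlockSize row, here is a hedged sketch of using these properties to configure a launch. The element count, the 256-thread block, the per-thread shared-memory budget, and the kern object from the earlier examples are assumptions for illustration:

```matlab
% Configure a 1-D launch that covers N elements with the hypothetical
% kernel object kern created earlier.
N = 1e6;
blockSize = min(256, kern.MaxThreadsPerBlock);    % threads per block
kern.ThreadBlockSize = [blockSize, 1, 1];
kern.GridSize        = [ceil(N / blockSize), 1, 1];

% Reserve dynamic shared memory: one single-precision value (4 bytes)
% per thread in the block.
kern.SharedMemorySize = 4 * blockSize;

% Inspect the kernel interface before calling feval.
kern.EntryPoint           % mangled entry point name from the PTX code
kern.NumRHSArguments      % how many inputs feval expects
kern.MaxNumLHSArguments   % how many outputs feval can return
kern.ArgumentTypes        % e.g. {'in single vector','in single vector','inout single vector'}
```

The ArgumentTypes value shown in the comment is only one possible result; the actual contents depend on the prototype of the kernel in the CU code.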
See Also
gpuArray | GPUDevice | parallel.gpu.CUDAKernel