When you build an application that uses the generated CUDA® C++ code, you must provide a CUDA C++ main function that calls the generated code. By default, when you use the codegen command to generate source code, static libraries, dynamic libraries, or executables, GPU Coder™ generates example CUDA C++ main files (the main.cu source file and the main.h header file) in the examples subfolder of the build folder. This example main file is a template that helps you incorporate generated CUDA code into your application. The example main function declares and initializes data, including dynamically allocated data. It calls the entry-point functions but does not use the values that the entry-point functions return.
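The shape of such a main function can be sketched in plain C++. This is only an illustration under stated assumptions, not the file that GPU Coder generates: `entry_point`, `make_input`, and `run_example` are hypothetical names, and the stub body of `entry_point` exists only so that the sketch compiles on its own. In a real application, the body of `run_example` would sit in your main function and call the generated entry-point function instead of the stub.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a generated entry-point function.
// A real build would link against the generated code instead of this stub.
void entry_point(const float* in, float* out, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        out[i] = 2.0f * in[i];
    }
}

// Hypothetical helper: creates and initializes dynamically allocated input data.
std::vector<float> make_input(std::size_t count) {
    return std::vector<float>(count, 1.0f);
}

// Follows the steps of an example main: initialize the data, then call the
// entry point. The output is returned here only so that it can be inspected;
// an example main would leave it unused.
std::vector<float> run_example(std::size_t count) {
    std::vector<float> in = make_input(count);
    std::vector<float> out(count, 0.0f);
    entry_point(in.data(), out.data(), count);
    return out;
}
```

When adapting a template like this, the main changes are usually replacing the placeholder initialization with real input data and consuming the outputs after the call.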
When generating code for deep convolutional neural networks (CNNs), the code generator takes advantage of the NVIDIA® cuDNN or TensorRT libraries for NVIDIA GPUs, or the ARM® Compute Library for ARM Mali GPUs. These libraries have specific data layout requirements for the input tensor that holds images, video, or any other data. When you author a custom main function for your application, you must create input buffers that provide data to the generated entry-point functions in the format that these libraries expect.
For deep convolutional neural networks (CNN), a 4-D tensor descriptor is used to define the format for batches of 2-D images with the following letters:
N – the batch size
C – the number of feature maps (number of channels)
H – the height
W – the width
The most commonly used 4-D tensor formats are NCHW, NHWC, and CHWN, where the letters are sorted in decreasing order of the strides.
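To make "decreasing order of the strides" concrete, the following sketch computes the element strides of each axis for three such formats (NCHW, NHWC, and CHWN), assuming a densely packed buffer with no padding. The `Dims` and `Strides` types and the function names are hypothetical helpers introduced for illustration; they are not part of GPU Coder, cuDNN, or TensorRT.

```cpp
#include <array>
#include <cstddef>

// Dimensions of a 4-D batch of images.
struct Dims {
    std::size_t n, c, h, w;
};

// Element strides of each axis, always stored in {N, C, H, W} order.
using Strides = std::array<std::size_t, 4>;

// NCHW: W varies fastest, then H, then C, then N.
Strides strides_nchw(const Dims& d) {
    return {d.c * d.h * d.w, d.h * d.w, d.w, 1};
}

// NHWC: C varies fastest, then W, then H, then N.
Strides strides_nhwc(const Dims& d) {
    return {d.h * d.w * d.c, 1, d.w * d.c, d.c};
}

// CHWN: N varies fastest, then W, then H, then C.
Strides strides_chwn(const Dims& d) {
    return {1, d.h * d.w * d.n, d.w * d.n, d.n};
}
```

In each format, the leftmost letter has the largest stride and the rightmost letter has a stride of 1, so the format name reads from the slowest-varying axis to the fastest-varying one.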
Of these, GPU Coder uses the NCHW format (column-major layout by default). To use row-major layout, pass the -rowmajor option to the codegen command. Alternatively, configure your code for row-major layout by modifying the cfg.RowMajor parameter in the code generation configuration object.
For example, consider a batch of images with the following dimensions:
W=4. If the image pixel elements are represented by a sequence of
integers, the input images can be pictorially represented as follows.
When creating the input buffer in the main function, the 4-D image is laid out in memory in the NCHW format as follows:
Beginning with the first channel (C=0), the elements are arranged contiguously in row-major order.
Continue with the second and subsequent channels until the elements of all the channels are laid out.
Proceed to the next batch (if N > 1).
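The layout steps above can be sketched as an index calculation. The following is a minimal illustration, assuming a densely packed buffer; `nchw_offset` and `make_nchw_buffer` are hypothetical helper names, not functions that GPU Coder provides.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Linear offset of element (n, c, h, w) in a densely packed NCHW buffer
// with C channels, H rows, and W columns per image.
std::size_t nchw_offset(std::size_t n, std::size_t c, std::size_t h, std::size_t w,
                        std::size_t C, std::size_t H, std::size_t W) {
    assert(c < C && h < H && w < W);
    return ((n * C + c) * H + h) * W + w;
}

// Fills a buffer with consecutive integers, visiting batches, then channels,
// then rows, then columns, so that the memory order is easy to inspect.
std::vector<int> make_nchw_buffer(std::size_t N, std::size_t C,
                                  std::size_t H, std::size_t W) {
    std::vector<int> buf(N * C * H * W);
    int value = 0;
    for (std::size_t n = 0; n < N; ++n) {
        for (std::size_t c = 0; c < C; ++c) {
            for (std::size_t h = 0; h < H; ++h) {
                for (std::size_t w = 0; w < W; ++w) {
                    buf[nchw_offset(n, c, h, w, C, H, W)] = value++;
                }
            }
        }
    }
    return buf;
}
```

As a usage example with assumed dimensions N=1, C=3, H=5, W=4 (only W=4 is taken from the text; the other values are chosen for illustration), the first element of the second channel sits at offset 20, immediately after the 5×4 elements of the first channel. In an application, the loop body would copy pixel values from the source image rather than write a running counter.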
A long short-term memory (LSTM) network is a type of recurrent neural network (RNN) that can learn long-term dependencies between time steps of sequence data. For LSTM, the data layout format can be described with the following letters:
N – the batch size
S – the sequence length (number of time steps)
d – the number of units in one input sequence
For LSTM, GPU Coder uses the SNd format by default.
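As a rough sketch of how an input buffer in SNd format could be indexed, the following helper assumes that, as with the 4-D image formats, the letters are sorted in decreasing order of the strides, so that the d values of one time step for one batch element are contiguous. Both that assumption and the `snd_offset` name are illustrative; verify the ordering against the generated code before relying on it.

```cpp
#include <cassert>
#include <cstddef>

// Linear offset of unit k, for batch element n at time step s, in a densely
// packed SNd buffer with batch size N and d units per time step.
// Assumes S has the largest stride and d has a stride of 1.
std::size_t snd_offset(std::size_t s, std::size_t n, std::size_t k,
                       std::size_t N, std::size_t d) {
    assert(n < N && k < d);
    return (s * N + n) * d + k;
}
```

Under this assumption, all batch elements for time step 0 are stored first, followed by all batch elements for time step 1, and so on.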