GPU Coder™ relies on functionality provided by MATLAB® Coder™, so the first step in the troubleshooting process is to ensure that your code is compatible with MATLAB Coder. For programming requirements and best practices for MATLAB Coder, see MATLAB Programming for Code Generation.
GPU Coder has varying levels of support for functions that are compatible with MATLAB Coder and Image Processing Toolbox™. A list of the functions that have been tested with GPU Coder is provided in MATLAB Algorithm Design for GPU. These functions fall into three categories: fully supported, unsupported, and supported under certain conditions. For example, some functions work in vector-based operations but not when used within a loop body. Where possible, however, it is recommended to rewrite toolbox functions in pure MATLAB.
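For instance, a toolbox call can often be replaced by its pure-MATLAB equivalent. A minimal sketch (the function name myRGB2Gray is hypothetical; the weights are the standard BT.601 luma coefficients that rgb2gray uses):

```matlab
% Hypothetical pure-MATLAB replacement for the Image Processing
% Toolbox function rgb2gray, using the standard BT.601 weights.
function gray = myRGB2Gray(rgb) %#codegen
rgb = single(rgb);
gray = 0.2989*rgb(:,:,1) + 0.5870*rgb(:,:,2) + 0.1140*rgb(:,:,3);
end
```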
GPU Coder uses program parallelism analysis to detect parallel for-loops. Traditional serial algorithms vary significantly in how parallelizable they are. Some problems are embarrassingly parallel and easy to divide into independent pieces; other algorithms require some refactoring to expose their inherent parallelism. The parallel analysis that GPU Coder performs is conservative. As a result, there are cases where loops are truly parallel, but the dependence analysis fails to detect the parallelism.
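As an illustration, a loop with no loop-carried dependence is straightforward for the analysis to parallelize. A sketch (the function name scaleVector is hypothetical):

```matlab
function y = scaleVector(x) %#codegen
% Each iteration writes a distinct element of y and reads no other
% iteration's result, so the loop is trivially parallel.
coder.gpu.kernelfun;
y = zeros(size(x), 'like', x);
for i = 1:numel(x)
    y(i) = 2 * x(i);
end
end
```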
Loops must be statically bound for GPU Coder to determine kernel dimensions. For example, while-loops, loops with break statements, and loops whose iteration range cannot be statically determined do not map easily to CUDA® kernels and must be rewritten. Refer to the section on kernel analysis for more information.
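For example, a while-loop with a data-dependent trip count can often be rewritten as a statically bounded for-loop with a runtime guard. A sketch, where MAX_N is an assumed compile-time constant:

```matlab
% Original form, not mappable to a kernel: the trip count depends on
% the runtime value of n.
%   k = 1;
%   while k <= n
%       y(k) = 2*x(k);
%       k = k + 1;
%   end

% Rewritten form: the bound MAX_N is known at compile time, and a
% guard preserves the original behavior for the first n elements.
for k = 1:MAX_N
    if k <= n
        y(k) = 2*x(k);
    end
end
```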
After considering and rectifying these issues, you are ready to generate CUDA code. The easiest way to accomplish code generation is to place the coder.gpu.kernelfun pragma in the entry-point function. You can then follow the steps described in Get Started with GPU Coder to generate CUDA code from either the command line or by using the GPU Coder app.
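As a sketch, assuming a hypothetical entry-point function myAdd saved in myAdd.m:

```matlab
% myAdd.m -- hypothetical entry-point function
function out = myAdd(a, b) %#codegen
coder.gpu.kernelfun;   % ask GPU Coder to create CUDA kernels for this function
out = a + b;
end
```

From the command line, generating a CUDA MEX target might then look like `cfg = coder.gpuConfig('mex'); codegen -config cfg -args {ones(1024,1,'single'), ones(1024,1,'single')} myAdd`, with the -args values adjusted to match your input types and sizes.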
To assess the performance of the generated CUDA code, you can use the MATLAB tic and toc functions to determine execution time. If the resulting GPU acceleration is not satisfactory, you can perform advanced diagnostics such as:

- Memory bottleneck analysis
- Analysis with the NVIDIA Visual Profiler
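For instance, timing the generated MEX with tic and toc might look like the following (myAdd_mex is an assumed name for a previously generated CUDA MEX function):

```matlab
a = ones(1024, 1, 'single');
b = ones(1024, 1, 'single');

tic;
out = myAdd_mex(a, b);   % call the generated CUDA MEX (assumed name)
t = toc;

fprintf('Generated code execution time: %f seconds\n', t);
```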