GPU Performance Analyzer
Description
The GPU Performance Analyzer displays information about GPU and CPU activities, events, and performance metrics from the generated CUDA® code. Use the GPU Performance Analyzer to find performance bottlenecks in a design.
The GPU Performance Analyzer displays the profiling data in chronological order in the Profiling Timeline pane. The timeline contains four rows:
The Functions row, which shows function calls in the generated code
The Loops row, which shows loops that execute on the CPU
The CPU Overhead row, which shows memory transfers and GPU kernel launches
The GPU Activities row, which shows memory transfers and GPU kernel execution
The Diagnostics pane reports potential performance bottlenecks, such as long CPU loops, repetitive memory transfers, or GPU kernels with few threads, and it suggests ways to address them. The analyzer also contains:
A Call Tree pane that shows the function call hierarchy
A Profiling Summary pane that contains overview statistics for the generated code
An Event Statistics pane that displays detailed statistics for the selected event
A Code pane that you can use to trace events and generated code to the source MATLAB® code
If the entry-point function that you generate code from contains a deep learning network, you can use the Open Deep Learning Dashboard button to open a dashboard that contains the statistics for the network. For more information, see Analyzing Network Performance Using the Deep Learning Dashboard.
The GPU Performance Analyzer saves the profiling data in a gpuProfiler.mldatx file in the html subfolder of the code generation folder. To reopen the profiling data from a profiling session, open the MLDATX file.
Open the GPU Performance Analyzer
MATLAB command prompt:
Use the gpuPerformanceAnalyzer function.
Use the codegen function with both the -gpuprofile and -test options.
Use the gpuprofile function with the viewer option.
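As a minimal sketch of the first two approaches, assume a hypothetical entry-point function myEntryPoint.m that takes one single-precision matrix and a hypothetical test script myEntryPointTest.m that calls it. Neither file nor the input size comes from this page, and the argument lists shown are assumptions to check against the gpuPerformanceAnalyzer and codegen reference pages. Running the code requires GPU Coder and a supported NVIDIA GPU.

```matlab
% Hypothetical input for the hypothetical entry-point function myEntryPoint.
x = rand(1024, 'single');

% Approach 1: call gpuPerformanceAnalyzer directly.
% Assumed call form: entry-point name followed by a cell array of inputs.
gpuPerformanceAnalyzer('myEntryPoint', {x});

% Approach 2: generate a CUDA MEX function with codegen, run the
% hypothetical test script myEntryPointTest against it, and profile the run.
cfg = coder.gpuConfig('mex');
codegen -config cfg -gpuprofile -test myEntryPointTest myEntryPoint -args {x}
```

Either call is expected to finish by opening the analyzer with the collected profiling data.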
Limitations
On the Functions and Loops rows, you can navigate between caller and callee functions and loops using the up and down arrows on the right side of the event bar. For short events, it might not be possible to navigate back to the calling function or loop by using the up and down arrows. In such cases, use the call tree to navigate back to the caller function or loop.
At low zoom levels, the GPU Performance Analyzer represents a densely populated area of short events separated by short distances as a single event. At higher zoom levels, the analyzer displays the individual events. However, if an event is extremely short, the timeline might not render it, even at high zoom levels.
The GPU Performance Analyzer displays all GPU events in a single row. If the generated code uses multiple CUDA streams, as deep learning libraries such as cuDNN might, the GPU Activities row might contain overlapping events, and the calculations in the Profiling Summary pane might be inaccurate.
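To see why overlapping events can skew a total, consider a small sketch with made-up event times. It assumes, purely for illustration, that a summary adds up individual event durations. This page does not say how the Profiling Summary values are computed. When two events overlap, the summed durations exceed the time during which the GPU was actually busy.

```matlab
% Hypothetical GPU events as [start, stop] times in milliseconds.
% The first two events overlap, as they could on two CUDA streams.
events = [0 4; 2 6; 8 10];

% Naive total: add up every event duration.
naiveBusy = sum(events(:,2) - events(:,1));

% Busy time: merge overlapping intervals, then add the merged lengths.
events = sortrows(events, 1);
mergedBusy = 0;
curStart = events(1,1);
curStop = events(1,2);
for k = 2:size(events,1)
    if events(k,1) <= curStop
        % Event overlaps the current interval, so extend the interval.
        curStop = max(curStop, events(k,2));
    else
        % Gap found: close the current interval and start a new one.
        mergedBusy = mergedBusy + (curStop - curStart);
        curStart = events(k,1);
        curStop = events(k,2);
    end
end
mergedBusy = mergedBusy + (curStop - curStart);

fprintf('Summed durations: %g ms, merged busy time: %g ms\n', naiveBusy, mergedBusy);
```

With these made-up times, the summed durations are 10 ms while the merged busy time is 8 ms, so a total built from individual durations would overstate GPU activity by the 2 ms of overlap.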