gpucoder.profile

Create an execution profile report for generated CUDA code

Description

gpucoder.profile(func_name,codegen_inputs) generates an execution profiling report of the CUDA code generated for the design file func_name. The codegen_inputs argument specifies the inputs to the design file. You must install the Embedded Coder® product to generate the profiling report.

Note

The profiling workflow depends on the nvprof tool from NVIDIA®. In CUDA® toolkit v10.1, NVIDIA restricts access to performance counters to admin users. To enable GPU performance counters for all user accounts, see the instructions in https://developer.nvidia.com/nvidia-development-tools-solutions-ERR_NVGPUCTRPERM-permission-issue-performance-counters.
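Because the workflow relies on nvprof, it can help to confirm that the tool is reachable before profiling. The following is a minimal sketch, assuming that nvprof is expected on the system path; it checks only that the executable can be launched and does not verify the performance counter permissions described in the note.

```matlab
% Ask the operating system to run nvprof and report its version.
% A nonzero status usually means the executable is not on the system path.
[status, output] = system('nvprof --version');

if status == 0
    fprintf('nvprof found:\n%s\n', strtrim(output));
else
    warning('nvprof could not be launched. Check the CUDA toolkit installation and the system path.');
end
```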

gpucoder.profile(___,Name,Value) generates an execution profiling report using one or more profiling options specified as name-value pair arguments.

Examples

Perform fine-grain analysis for a MATLAB algorithm and its generated CUDA code through software-in-the-loop (SIL) execution profiling. You must install the Embedded Coder product to generate the execution profiling report.

Write an entry-point function that performs an N-D fast Fourier transform. To map the FFT to the GPU, use the coder.gpu.kernelfun pragma. By default, the EnableCUFFT property is enabled, so the code generator uses the cuFFT library to perform the FFT operation.

function [Y] = gpu_fftn(X)
  coder.gpu.kernelfun();
  Y = fftn(X);
end

To generate the execution profiling report, use the gpucoder.profile function.

cfg = coder.gpuConfig('exe');
cfg.GpuConfig.MallocMode = 'discrete';
gpucoder.profile('gpu_fftn',{rand(2,4500,4)},'CodegenConfig',cfg,...
    'CodegenArguments','-d profilingdir','Threshold',0.001);

The code execution profiling report provides metrics based on data collected from a SIL execution. Execution times are calculated from data recorded by instrumentation probes added to the SIL test harness or inside the code generated for each component. For more information, see View Execution Times (Embedded Coder).

Input Arguments

func_name

Name of the entry-point function or design file.

Example: gpucoder.profile('xdot',{1000,rand(1000,1),1,1,rand(1000,1),1,1})

codegen_inputs

Compile-time inputs to the entry-point function or design file.

Example: gpucoder.profile('xdot',{1000,rand(1000,1),1,1,rand(1000,1),1,1})

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: gpucoder.profile('xdot', {1000,rand(1000,1),1,1,rand(1000,1),1,1},'NumCalls',2,'CodegenConfig',cfg,'CodegenArguments','-d discrete','Threshold',0.01)

'NumCalls'

Specify the number of times the profiled section of the code is run. The default is 6. The first run is excluded from the report because it is generally an outlier.
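The effect of excluding the first run can be sketched with a few lines of MATLAB. This is an illustration only: the timing values are made up, and the variable names are hypothetical rather than part of the gpucoder.profile interface.

```matlab
% Hypothetical execution times (seconds) for the default of 6 runs.
% The first run is slower, as is typical when one-time setup costs are included.
numCalls = 6;
runTimes = [0.0125 0.0031 0.0030 0.0032 0.0029 0.0031];

% Drop the first run, mirroring the documented behavior of the report.
reportedRuns = runTimes(2:end);

fprintf('Runs executed: %d, runs reported: %d\n', numCalls, numel(reportedRuns));
```

With the default setting, the report is therefore based on five of the six runs.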

'CodegenConfig'

Specify the code generation configuration object used to generate CUDA code and the profiling report. When you do not specify this value, a default coder.EmbeddedCodeConfig object is used.

'CodegenArguments'

Specify any additional codegen arguments as a string, for example '-d profilingdir'. The default value is an empty string.

'Threshold'

To control which GPU calls are displayed in the report, use the threshold value. If the maximum execution time across the executions is x seconds, the software reports all GPU calls whose execution time exceeds x * threshold seconds.
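The threshold rule can be worked through with a short MATLAB sketch. The call names and timings below are hypothetical, and the sketch assumes that x is the largest execution time among the listed calls; it illustrates the arithmetic and is not how the report is implemented.

```matlab
% Hypothetical GPU calls and their execution times in seconds.
callNames = {'cufftExecZ2Z', 'cudaMemcpy', 'gpu_fftn_kernel1', 'cudaFree'};
callTimes = [4.0e-3, 1.2e-3, 2.5e-6, 3.0e-6];

threshold = 0.001;          % same value as in the gpu_fftn example
x = max(callTimes);         % maximum execution time, x
cutoff = x * threshold;     % calls must exceed this time to be reported

% Keep only the calls whose execution time exceeds the cutoff.
isReported = callTimes > cutoff;
reported = callNames(isReported);

fprintf('Cutoff: %g s\n', cutoff);
fprintf('Reported: %s\n', strjoin(reported, ', '));
```

Here the cutoff is about 4 microseconds, so the two calls that take only a few microseconds fall below it and are omitted. Raising the threshold shortens the report; lowering it includes more of the short calls.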

Introduced in R2018b