Why does my GPU not outperform my CPU/another GPU?

MathWorks Support Team on 6 Jul 2017

Direct link to this question

https://www.mathworks.com/matlabcentral/answers/347648-why-does-my-gpu-not-outperform-my-cpu-another-gpu

Edited: MathWorks Support Team on 23 Dec 2024

Accepted Answer: MathWorks Support Team

Why does my GPU not outperform my CPU / another GPU?

Accepted Answer

MathWorks Support Team on 13 Dec 2024

Edited: MathWorks Support Team on 23 Dec 2024

Several factors determine a GPU's performance. The headline number of cores alone is not enough to gauge performance accurately.

To determine whether the code or the GPU is the primary issue, try the following steps (details for each are below):

Run a standard benchmark test.
If the benchmark shows different behavior between the precision types, calculate the expected ratio between single and double precision performance for your GPU card.
If the card is a laptop (mobile) card, lower your performance expectations accordingly.
If performance in the benchmarks is good, the problem most likely originates in your code.

Standard GPU Benchmark:

gpuBench is a benchmark written by the MathWorks Parallel Computing Team and available on the File Exchange.

It runs a variety of memory-intensive and compute-intensive tasks in both single and double precision. It also compares your results against a relatively typical display card and a reasonable compute card. The comparison results are matched to the version of MATLAB being used.

>> gpuBench

Comparing GPU Devices:

To answer this question you will need the GPU device specifications; for completeness, the CPU specifications can help as well. There are then four key topics to consider when making the comparison.

1. Double vs Single Precision

Double precision and single precision performance can differ widely between graphics cards with the same total number of cores. The variation comes from whether the cores are mostly FP32 (single precision) or FP64 (double precision). Most GPUs are designed mainly for single precision performance, since this is what graphics display demands. In comparison, CPUs do not show a comparable drop in performance for double precision. The wiki page listing all of NVIDIA's GPUs with their hardware statistics is a useful reference for information about any specific graphics card.

If NVIDIA has published a card's double precision performance, it will be listed there. If double precision performance is not listed, then even though the compute capability may be 1.3 or higher (the minimum needed for double precision), expect double precision to be significantly slower: on the order of 32x to 64x slower than single precision.

To calculate the ratio between Single Precision and Double Precision:

Find the GPU on the wiki page above.
Get the stated single precision and double precision performance values from the table. (If there is no double precision GFLOPS value, assume double precision is 32x to 64x slower.)
Divide the stated single precision GFLOPS by the double precision GFLOPS to get the ratio of how much slower double precision is than single precision.

At the time of writing, a high-end compute card can get this ratio as low as 2x.
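
As a rough sketch of the calculation above, the ratio can be worked out in a few lines of MATLAB. The GFLOPS figures here are hypothetical placeholders, not the specifications of any real card; substitute the values from the table for your own GPU.

```matlab
% Hypothetical figures for illustration only; replace them with the values
% listed for your card. Use NaN if no double precision figure is listed.
singleGflops = 8000;
doubleGflops = 250;

if isnan(doubleGflops)
    % No published figure: fall back to the 32x to 64x rule of thumb.
    ratioRange = [32 64];
else
    % Published figure: the ratio is a single value.
    ratioRange = [1 1] * (singleGflops / doubleGflops);
end

fprintf('Expect double precision to be about %gx to %gx slower than single.\n', ...
    ratioRange(1), ratioRange(2));
```

If the single versus double precision timings from gpuBench fall roughly within this range, the card is behaving as its specifications suggest.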

2. Mobile vs Desktop

Is the graphics card inside a laptop? If yes, then it is highly likely that the card is a mobile graphics card. In many cases the name of such a card is suffixed with an M. Not every M means mobile, so cross-reference with the wiki page above to check definitively.

Mobile graphics cards are smaller and less powerful due to the heat and power restrictions their environment imposes on them. If using a mobile GPU for computation, speed expectations should be adjusted down.

3. Display vs Compute

Is the graphics card also acting as the display card? If there is only one graphics card in the machine and no on-board graphics, then this is likely the case. In this situation the operating system will commonly impose a kernel execution timeout. This is shown in the "gpuDevice" output as:

KernelExecutionTimeout: 1

The purpose of a kernel execution timeout is to make sure the OS is always able to draw updates to the screen. If a computation on the GPU takes too long, the operation will be killed. This tends to disrupt the CUDA environment for MATLAB, and further use of the GPU by MATLAB (for either OpenGL or GPU computation) will then require a restart of MATLAB. A separate article has instructions on how to extend or disable this timeout period.
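
The same property can be checked programmatically. The following is a minimal sketch that assumes Parallel Computing Toolbox is installed; the messages are illustrative only.

```matlab
% Check whether the currently selected GPU has a kernel execution timeout.
if gpuDeviceCount > 0
    gpu = gpuDevice;   % query the currently selected GPU device
    if gpu.KernelExecutionTimeout
        fprintf('%s: timeout is active; long-running kernels may be killed.\n', gpu.Name);
    else
        fprintf('%s: no kernel execution timeout is imposed.\n', gpu.Name);
    end
else
    disp('No GPU device detected.');
end
```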

However, note that performance may be lowered even without hitting this timeout due to the need to share the resource with other programs. Where possible, do not use a GPU for both computation and display. NVIDIA's "nvidia-smi" utility can be used to monitor all the processes that are using each GPU card on your machine.

For Windows machines only:

It is possible to set a card to WDDM mode (for both display and compute) or TCC mode (compute only) using NVIDIA's "nvidia-smi" utility. Changing between these two modes requires the machine to be restarted. Note that TCC mode is not available on all GPU cards (for example, the GeForce family of cards does not have TCC mode).
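
As a sketch, "nvidia-smi" can be called from within MATLAB using the system function. This assumes the utility is installed with the NVIDIA driver and is on the system path. The driver_model.current query field is expected to report WDDM or TCC on Windows only; treat its availability on your driver version as an assumption and confirm it with "nvidia-smi --help-query-gpu".

```matlab
% List the GPUs and the processes currently using them.
[status, out] = system('nvidia-smi');
if status == 0
    disp(out);
else
    disp('nvidia-smi could not be run; check that it is on the system path.');
end

% Query the current driver model of each GPU (meaningful on Windows only).
cmd = 'nvidia-smi --query-gpu=index,name,driver_model.current --format=csv';
[status, out] = system(cmd);
if status == 0
    disp(out);
end
```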

4. Obtain GPU execution time measurements

For specific parts of a larger workflow that performs GPU computations, you can measure the execution time of particular functions using gputimeit, which runs a function multiple times to average out variation and compensate for overhead. The gputimeit function also ensures that all operations on the GPU are complete before recording the time. Execution time measurements obtained using gputimeit among different GPU cards are comparable only if obtained from the same machine and, ideally, from the same MATLAB process.
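
A minimal sketch of such a measurement follows, using matrix multiplication as a stand-in workload. The matrix size n is an arbitrary choice for illustration and should be adjusted to fit the memory of your GPU.

```matlab
% Time the same operation on the GPU (single and double) and on the CPU.
n = 2000;                               % arbitrary size for illustration

if gpuDeviceCount > 0
    As = rand(n, 'single', 'gpuArray'); % single precision data on the GPU
    Ad = double(As);                    % the same data in double precision

    tSingle = gputimeit(@() As * As);   % GPU, single precision
    tDouble = gputimeit(@() Ad * Ad);   % GPU, double precision

    Ac = gather(Ad);                    % copy the data back to host memory
    tCpu = timeit(@() Ac * Ac);         % CPU, double precision

    fprintf('GPU single: %.4f s\nGPU double: %.4f s\nCPU double: %.4f s\n', ...
        tSingle, tDouble, tCpu);
    fprintf('Measured double/single ratio on the GPU: %.1fx\n', tDouble / tSingle);
else
    disp('No GPU device detected.');
end
```

The measured double to single ratio can then be compared with the ratio expected from the card's specifications.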

Refer to the "Measure and Improve GPU Performance" documentation for more information.
