Why does my GTX Titan Black GPU underperform in double precision calculations in MATLAB R2015a?

I experience unexpectedly slow performance of the GPU in double precision benchmarks.
I have a fast PC (Intel i7-4790 3.6GHz, 16GB of 1600MHz memory, Windows 7 64-bit) with an NVIDIA GeForce GTX Titan Black GPU card in a PCIe 3.0 x16 slot and an 850W power supply. I have downloaded the video drivers and CUDA toolkit and installed the MATLAB Parallel Computing Toolbox:
>> gpuDevice
ans =
CUDADevice with
Name: 'GeForce GTX TITAN Black'
Index: 1
ComputeCapability: '3.5'
SupportsDouble: 1
DriverVersion: 7
ToolkitVersion: 6.5000
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 6.4425e+09
AvailableMemory: 6.2105e+09
MultiprocessorCount: 15
ClockRateKHz: 980000
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 1
CanMapHostMemory: 1
DeviceSupported: 1
DeviceSelected: 1
I then downloaded the GPU benchmarking tool by the MathWorks Parallel Computing Toolbox Team (version updated 05 Jan 2015) from http://www.mathworks.com/matlabcentral/fileexchange/34080-gpubench
and ran “gpuBench”.
The results show that my GPU performs similarly to the Quadro K6000 in the single precision benchmarks (with deviations of up to 40%, as expected: both cards have the same number of CUDA cores, but the memory bandwidth is higher for my Titan Black and the amount of memory is higher for the K6000).
However, the GeForce GTX Titan Black is 4 times (!) slower than the Quadro K6000 in the double precision benchmarks! This is unexpected for several reasons.
A) Both cards are fairly similar:
Specification: K6000 / Titan Black
CUDA cores: 2880 / 2880
Clock: 902MHz / 889MHz
Memory clock: 6Gbps / 7Gbps
Memory bandwidth: 288GB/s / 336GB/s
B) The MathWorks Parallel Computing Toolbox Team has published benchmark results, shown in the attached file “Older benchmarks for GPUs”. In those results, a GPU very similar to mine, the GeForce GTX Titan (an older GPU with 2688 CUDA cores, 837MHz clock, 6Gbps memory clock and 288GB/s memory bandwidth), performs very similarly to the Quadro K6000:
Card        DOUBLE                      SINGLE
            MTimes  Backslash  FFT      MTimes  Backslash  FFT
K6000       1092    421        160      3017    831        334
GTX Titan   1106    352        150      2933    582        298
My GPU       252    163        110      4221    994        409
These results suggest that my GPU card (GeForce GTX Titan Black) should be similar to or faster than the Quadro K6000. However, its double precision performance is terrible (about 4x slower in MTimes).
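The size of the gap can be read off the table directly. The following is a minimal sketch that divides the K6000 figures by the Titan Black figures; the variable names are chosen here for illustration, and the numbers are copied from the table above.

```matlab
% Benchmark figures copied from the table in the question.
% Columns: MTimes, Backslash, FFT (double), then MTimes, Backslash, FFT (single).
k6000      = [1092 421 160 3017 831 334];
titanBlack = [ 252 163 110 4221 994 409];

% Ratio > 1 means the K6000 scored higher; ratio < 1 means the Titan Black did.
ratio = k6000 ./ titanBlack;

names     = {'MTimes', 'Backslash', 'FFT', 'MTimes', 'Backslash', 'FFT'};
precision = {'double', 'double', 'double', 'single', 'single', 'single'};
for k = 1:numel(ratio)
    fprintf('%-6s %-9s K6000 / Titan Black = %.2f\n', ...
            precision{k}, names{k}, ratio(k));
end
```

With these numbers the double precision MTimes ratio is about 4.3, while Backslash and FFT come out at roughly 2.6 and 1.5. All three single precision ratios are below 1, so the Titan Black is ahead there.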

Accepted Answer

MathWorks Support Team on 22 Apr 2015
In this particular case, double precision computing needs to be enabled, which can be done in the NVIDIA Control Panel. The external article below shows how this may be done.
In general, double precision can be much slower than single precision on many GPUs, because some of them are designed and optimized for single precision computation rather than for scientific calculations involving double precision numbers.
As we are unable to provide recommendations for GPU hardware, please contact NVIDIA directly for further information on this disparity in performance.
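A quick way to check whether a driver setting has changed double precision throughput is to time a matrix multiply in both precisions, before and after the change. This is only a rough sketch and not a replacement for gpuBench: the matrix size N is an arbitrary assumption, and the 2*N^3 operation count is the usual estimate for a dense N-by-N multiply. It requires Parallel Computing Toolbox and a supported GPU.

```matlab
% Rough single vs. double precision matrix-multiply timing on the GPU.
N = 4096;                      % assumed size; reduce if GPU memory is tight
flopCount = 2 * N^3;           % usual operation estimate for an N-by-N multiply

% Single precision
A = rand(N, 'single', 'gpuArray');
B = rand(N, 'single', 'gpuArray');
tSingle = gputimeit(@() A * B);

% Double precision
A = rand(N, 'double', 'gpuArray');
B = rand(N, 'double', 'gpuArray');
tDouble = gputimeit(@() A * B);

gflopsSingle = flopCount / tSingle / 1e9;
gflopsDouble = flopCount / tDouble / 1e9;

fprintf('single: %.0f GFLOPS\n', gflopsSingle);
fprintf('double: %.0f GFLOPS\n', gflopsDouble);
fprintf('single/double ratio: %.1f\n', gflopsSingle / gflopsDouble);
```

Running it once before and once after changing the setting shows whether the double precision figure moved; the single precision figure should stay roughly the same.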
