Why is Titan V training performance so poor?

I wanted to speed up my neural network training, so I upgraded from a GTX 1080 to a Titan V, expecting a large increase in performance due to the improved architecture, memory speed, etc.
Well, the 1080 is crushing the Titan V.
Transfer learning on AlexNet, training on the same pool of images with identical settings:
opts = trainingOptions('sgdm','InitialLearnRate',0.001, 'Plots', 'training-progress', 'MiniBatchSize', 512)
The Titan V moves at approximately 164 seconds per iteration, while the 1080 is cruising at 62 seconds per iteration.
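For context, the training script looks roughly like the following. This is a minimal sketch; the image folder name and variable names are illustrative rather than the exact code used:
% Illustrative transfer-learning setup: one subfolder per class, images already 227x227x3
imds = imageDatastore('myImages', 'IncludeSubfolders', true, 'LabelSource', 'foldernames');
net = alexnet;
layers = net.Layers;
numClasses = numel(categories(imds.Labels));
layers(23) = fullyConnectedLayer(numClasses);   % replace the final fully connected layer
layers(25) = classificationLayer;               % replace the classification output layer
opts = trainingOptions('sgdm', 'InitialLearnRate', 0.001, 'Plots', 'training-progress', 'MiniBatchSize', 512);
trainedNet = trainNetwork(imds, layers, opts);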
I'm flabbergasted that a GPU that is outclassed in every way somehow manages to win by such a large margin.
Does anyone have a similar experience or any explanation for why this might be happening?
Thanks in advance.
L.

Accepted Answer

Joss Knight on 15 Jan 2018
Edited: Joss Knight on 16 Jan 2018
I am posting here the same information with which I responded to your tech support request. Perhaps others will find this useful.
In my tests of transfer learning with AlexNet, the Titan V was 5x faster than the K20c, 2x faster than the GTX 1080 (same series but faster than the 970) and 1.3x faster than the Titan Xp. This was running R2017b Updates 2 and 4.
GeForce cards on Windows in WDDM mode are significantly affected by the OS's supervisory interference, particularly when it comes to the speed of memory allocation. This makes them much slower than they are on Linux for functionality that requires a lot of memory allocation. The Titan V, which is very new and does not yet have fully optimised drivers, seems to be particularly affected by this.
The solution is to put the Titan V into TCC mode. You will need to drive your graphics from another GPU or on-board graphics. Go to C:\Program Files\NVIDIA Corporation\NVSMI and run
nvidia-smi
to find out which GPU is your Titan V. Let us say it is GPU 1. Then type
nvidia-smi -i 1 -dm 1
and reboot.
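After rebooting, it is worth confirming from MATLAB that the Titan V is the selected device. A minimal sketch; note that gpuDevice uses 1-based indices whereas nvidia-smi uses 0-based ones, so the index below is only an example:
g = gpuDevice(2);   % select the Titan V by its MATLAB device index (example value)
disp(g.Name)        % check that the correct card is reported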
In my own experiments I found that the Titan V was still slower on Windows for transfer learning of AlexNet than on Linux, but I do have a much slower CPU in my Windows machine, so it's probably just because of that. It may also be, as I say, that the Windows driver is not yet fully optimised - it is early days for the Titan V drivers.
An alternative workaround is to reduce the amount of raw allocation happening during training. You can either reduce the MiniBatchSize, or use a special feature command to increase the amount of memory MATLAB is allowed to reserve:
>> feature('GpuAllocPoolSizeKb', intmax('int32'))
This has the side-effect of making MATLAB more likely to conflict with other applications using the GPU, but you can experiment with different pool sizes to find a balance. In WDDM mode you should see a considerable increase in performance due to the reduction in raw memory allocations, although in my experiments it didn't quite reach the performance of using TCC mode instead.
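A minimal sketch of how such an experiment might look; the candidate pool sizes are arbitrary, and imds, layers and opts stand in for your own datastore, network layers and training options:
% Time one training run for each candidate pool size (illustrative values only)
poolSizesKb = [2^20, 2^22, intmax('int32')];
for k = 1:numel(poolSizesKb)
    feature('GpuAllocPoolSizeKb', poolSizesKb(k));
    tic
    trainNetwork(imds, layers, opts);
    fprintf('Pool size %d KB: %.1f seconds\n', poolSizesKb(k), toc);
end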
It's worth elaborating: you cannot judge how well a card will perform based entirely on raw computing power. All GPU algorithms require a combination of GPU compute, memory I/O and CPU compute to function. GPUBench gives a reasonable indication of the expected FLOPS for different kinds of algorithm, and deep learning is another kind of algorithm again.
MathWorks does not generally give hardware advice, so it is up to the customer to decide whether the Titan V is cost effective. Some things to take into consideration are:
  1. The Titan cards (V and XP) can be put into TCC mode, whereas the 970 and 1080 cannot.
  2. The Titan cards support Remote Desktop when the card is not driving the display; the 970 and 1080 do not.
  3. The Titan V has Tensor Cores, which means that when MATLAB supports half-precision deep learning, its performance will greatly increase over the Pascal and Maxwell architectures.
  4. The Titan V has excellent double-precision performance, unlike any other GeForce card. This means you can use it for other MATLAB functionality, such as system modelling, that requires the accuracy of double precision (a quick way to check this is sketched below).
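As a rough way to see point 4 in practice, here is a minimal sketch that times a large matrix multiply in single and double precision on the selected GPU; the matrix size is an arbitrary choice:
% Compare single- and double-precision matrix multiply throughput on the current GPU
N = 4096;
As = rand(N, 'single', 'gpuArray');    % N-by-N single-precision matrix on the GPU
Ad = rand(N, 'double', 'gpuArray');    % N-by-N double-precision matrix on the GPU
tS = gputimeit(@() As*As);
tD = gputimeit(@() Ad*Ad);
flops = 2*N^3;                         % approximate operation count for one N-by-N multiply
fprintf('single: %.0f GFLOPS, double: %.0f GFLOPS\n', flops/tS/1e9, flops/tD/1e9)
On a card like the 1080 the double-precision figure will be a small fraction of the single-precision one; on the Titan V the two are much closer.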
Hope this helps.
  6 Comments
Joss Knight on 31 Jul 2018
It means you are not running as administrator.
Could you post this as a new question?
gycsu on 29 Nov 2018
I got this message on CentOS 7:
Changing driver models is not supported for GPU 00000000:65:00.0 on this platform.
Treating as warning and moving on.
All done.
I was using root to do it. What is the solution for this? Also, my Titan V is very slow when using MATLAB R2018b. Appreciate your response.


More Answers (2)

Mert Su on 23 Mar 2020
Edited: Mert Su on 23 Mar 2020
I am also perplexed that a GTX 1660 has a compute capability of 7.5 compared to the Titan V's 7.0.
I have two machines: one for work ($5,000) and one for home ($850).
Both machines have Win 10 x64.
The Titan V is in the Intel i7-8700K machine with 32 GB RAM and a Samsung 860 512 GB NVMe SSD.
The GTX 1660 is in the Ryzen 5 machine with 16 GB RAM and an Intel 660p 512 GB NVMe SSD.
I believe this has nothing to do with MATLAB, because NVIDIA does not list the compute capabilities of the GeForce 16 series on their website. A $170 GPU crushes a $3,000 GPU...
  1 Comment
Joss Knight on 24 Mar 2020
NVIDIA takes care to keep the CUDA article on Wikipedia up to date, and you can find the GTX 16 series listed there.
NVIDIA's bizarre naming and numbering conventions aside, the compute capability has to do with the underlying chipset and instruction-set support, not with the performance capabilities of the card. In every compute capability category there are weaker, lower-powered chips and more powerful ones.
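For reference, a minimal sketch of how to query what MATLAB reports for the selected GPU; the compute capability it shows describes the chip generation, not its speed:
g = gpuDevice;
fprintf('%s: compute capability %s\n', g.Name, g.ComputeCapability)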



Louis Vaickus on 24 Mar 2020
All,
It's been awhile since I made this initial post and I have a few updates.
In all our applications using fp32, the Titan V is ~25% faster than a GTX 970.
Where these cards really shine is in fp16 or mixed-precision tasks.
Using NVIDIA's APEX mixed-precision library with our Titan Vs in Windows, we get, at a minimum, halved memory usage, e.g. we can double the batch size and run larger models. With certain batch and filter sizes in Windows we get a 125% increase in speed (the Tensor Cores seem to like batch and filter sizes that are multiples of 8).
In Linux, we can achieve halved memory usage AND a 500% increase in speed. You read that right: a 500% increase in speed.
Linux seems necessary for this speed-up, as APEX needs access to resources that are not available in Windows (I can't remember exactly what is missing, but if you run anything with APEX it will tell you in an error message; I think it is some CUDA library).
Of course, all of the above is in PyTorch; I don't know whether half or mixed precision is implemented in MATLAB yet.
Lou.
  1 Comment
Joss Knight on 24 Mar 2020
In MATLAB, you can generate code to run models in half or mixed precision using cuDNN or TensorRT, via the GPU Coder product.
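A minimal sketch of what that configuration might look like for the TensorRT target; the entry-point function name and input size here are assumptions, and property names may vary between releases:
% Illustrative GPU Coder configuration for half-precision (fp16) inference with TensorRT
cfg = coder.gpuConfig('mex');
cfg.DeepLearningConfig = coder.DeepLearningConfig('tensorrt');
cfg.DeepLearningConfig.DataType = 'fp16';   % run the generated network in half precision
% 'myPredict' is a hypothetical entry-point function that calls predict on a loaded network
codegen -config cfg myPredict -args {ones(227,227,3,'single')}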

