Why is Titan V training performance so poor?

I wanted to speed up my neural network training, so I upgraded from a GTX 1080 to a Titan V, expecting a large increase in performance due to the improved architecture, memory speed, etc.
Well, the 1080 is crushing the Titan V.
Transfer learning on AlexNet, training on the same pool of images with identical settings:
opts = trainingOptions('sgdm','InitialLearnRate',0.001, 'Plots', 'training-progress', 'MiniBatchSize', 512)
The Titan V moves at approximately 164 seconds per iteration, while the 1080 is cruising at 62 seconds per iteration.
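For context, the training script looks roughly like the following. This is a minimal sketch; the image folder name and variable names are illustrative rather than the exact code used:
% Illustrative transfer-learning setup: one subfolder per class, images already 227x227x3
imds = imageDatastore('myImages', 'IncludeSubfolders', true, 'LabelSource', 'foldernames');
net = alexnet;
layers = net.Layers;
numClasses = numel(categories(imds.Labels));
layers(23) = fullyConnectedLayer(numClasses);   % replace the final fully connected layer
layers(25) = classificationLayer;               % replace the classification output layer
opts = trainingOptions('sgdm', 'InitialLearnRate', 0.001, 'Plots', 'training-progress', 'MiniBatchSize', 512);
trainedNet = trainNetwork(imds, layers, opts);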
I'm flabbergasted that a GPU that is outclassed in every way somehow manages to win by such a large margin.
Does anyone have a similar experience or any explanation for why this might be happening?
Thanks in advance.
L.

Accepted Answer

Joss Knight on 15 Jan 2018
Edited: Joss Knight on 16 Jan 2018
I am posting here the same information with which I responded to your tech support request. Perhaps others will find this useful.
In my tests of transfer learning with AlexNet, the Titan V was 5x faster than the K20c, 2x faster than the GTX 1080 (same series but faster than the 970) and 1.3x faster than the Titan Xp. This was running R2017b Updates 2 and 4.
GeForce cards on Windows in WDDM mode are significantly affected by the OS's supervisory interference, particularly when it comes to the speed of memory allocation. This makes them much slower than they are on Linux for functionality that requires a lot of memory allocation. The Titan V, which is very new and does not yet have fully optimised drivers, seems to be particularly affected by this.
The solution is to put the Titan V into TCC mode. You will need to drive your graphics from another GPU or on-board graphics. Go to C:\Program Files\NVIDIA Corporation\NVSMI and run
nvidia-smi
to find out which GPU is your Titan V. Let us say it is GPU 1. Then type
nvidia-smi -i 1 -dm 1
and reboot.
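After rebooting, it is worth confirming from MATLAB that the Titan V is the selected device. A minimal sketch; note that gpuDevice uses 1-based indices whereas nvidia-smi uses 0-based ones, so the index below is only an example:
g = gpuDevice(2);   % select the Titan V by its MATLAB device index (example value)
disp(g.Name)        % check that the correct card is reported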
In my own experiments I found that the Titan V was still slower on Windows for transfer learning of AlexNet than on Linux, but I do have a much slower CPU in my Windows machine, so it's probably just because of that. It may also be, as I say, that the Windows driver is not yet fully optimised - it is early days for the Titan V drivers.
An alternative workaround is to reduce the amount of raw allocation happening during training. You can either reduce the MiniBatchSize, or use a special feature command to increase the amount of memory MATLAB is allowed to reserve:
>> feature('GpuAllocPoolSizeKb', intmax('int32'))
This has the side-effect of making MATLAB more likely to conflict with other applications using the GPU, but you can experiment with different pool sizes to find a balance. In WDDM mode you should see a considerable increase in performance due to the reduction in raw memory allocations, although in my experiments it didn't quite reach the performance of using TCC mode instead.
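A minimal sketch of how such an experiment might look; the candidate pool sizes are arbitrary, and imds, layers and opts stand in for your own datastore, network layers and training options:
% Time one training run for each candidate pool size (illustrative values only)
poolSizesKb = [2^20, 2^22, intmax('int32')];
for k = 1:numel(poolSizesKb)
    feature('GpuAllocPoolSizeKb', poolSizesKb(k));
    tic
    trainNetwork(imds, layers, opts);
    fprintf('Pool size %d KB: %.1f seconds\n', poolSizesKb(k), toc);
end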
It's worth elaborating: you cannot judge how well a card will perform based entirely on raw computing power. All GPU algorithms require a combination of GPU compute, memory I/O and CPU compute to function. GPUBench gives a reasonable indication of the expected FLOPS for different kinds of algorithm, and deep learning is another kind of algorithm again.
MathWorks does not generally give hardware advice, so it is up to the customer to decide whether the Titan V is cost effective. Some things to take into consideration are:
  1. The Titan cards (V and XP) can be put into TCC mode, whereas the 970 and 1080 cannot.
  2. The Titan cards support Remote Desktop when the card is not driving the display; the 970 and 1080 do not.
  3. The Titan V has Tensor Cores, which means that when MATLAB supports half-precision deep learning, its performance will greatly increase over the Pascal and Maxwell architectures.
  4. The Titan V has excellent double-precision performance, unlike any other GeForce card. This means you can use it for other MATLAB functionality, such as system modelling, that requires the accuracy of double precision (a quick way to check this is sketched below).
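As a rough way to see point 4 in practice, here is a minimal sketch that times a large matrix multiply in single and double precision on the selected GPU; the matrix size is an arbitrary choice:
% Compare single- and double-precision matrix multiply throughput on the current GPU
N = 4096;
As = rand(N, 'single', 'gpuArray');    % N-by-N single-precision matrix on the GPU
Ad = rand(N, 'double', 'gpuArray');    % N-by-N double-precision matrix on the GPU
tS = gputimeit(@() As*As);
tD = gputimeit(@() Ad*Ad);
flops = 2*N^3;                         % approximate operation count for one N-by-N multiply
fprintf('single: %.0f GFLOPS, double: %.0f GFLOPS\n', flops/tS/1e9, flops/tD/1e9)
On a card like the 1080 the double-precision figure will be a small fraction of the single-precision one; on the Titan V the two are much closer.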
Hope this helps.
  6 Comments
Joss Knight on 31 Jul 2018
It means you are not running as administrator.
Could you post this as a new question?
gycsu on 29 Nov 2018
I got this message on CentOS 7:
Changing driver models is not supported for GPU 00000000:65:00.0 on this platform.
Treating as warning and moving on.
All done.
I was using root to do it. What is the solution for this? Also, my Titan V is very slow when using MATLAB R2018b. Appreciate your response.


More Answers (2)

Mert Su on 23 Mar 2020
Edited: Mert Su on 23 Mar 2020
I am also perplexed that a GTX 1660 has a compute capability of 7.5 compared to the Titan V's 7.0.
I have two machines: one for work ($5,000) and one for home ($850).
Both machines have Win 10 x64.
The Titan V is in the Intel i7-8700K machine with 32 GB RAM and a Samsung 860 512 GB NVMe SSD.
The GTX 1660 is in the Ryzen 5 machine with 16 GB RAM and an Intel 660p 512 GB NVMe SSD.
I believe this has nothing to do with MATLAB, because NVIDIA does not list the compute capabilities of the GeForce 16 series on their website. A $170 GPU crushes a $3,000 GPU...
  1 Comment
Joss Knight on 24 Mar 2020
NVIDIA takes care to keep the CUDA article on Wikipedia up to date, and you can find the GTX 16 series listed there.
NVIDIA's bizarre naming and numbering conventions aside, the compute capability has to do with the underlying chipset and instruction-set support, not with the performance capabilities of the card. In every compute capability category there are weaker, lower-powered chips and more powerful ones.
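For reference, a minimal sketch of how to query what MATLAB reports for the selected GPU; the compute capability it shows describes the chip generation, not its speed:
g = gpuDevice;
fprintf('%s: compute capability %s\n', g.Name, g.ComputeCapability)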



Louis Vaickus on 24 Mar 2020
All,
It's been awhile since I made this initial post and I have a few updates.
In all our applications using fp32, the Titan V is ~25% faster than a GTX 970.
Where these cards really shine is in fp16 or mixed-precision tasks.
Using NVIDIA's APEX mixed-precision library with our Titan Vs in Windows, we get, at a minimum, halved memory usage, e.g. we can double the batch size and run larger models. With certain batch and filter sizes in Windows we get a 125% increase in speed (the Tensor Cores seem to like batch and filter sizes that are multiples of 8).
In Linux, we can achieve halved memory usage AND a 500% increase in speed. You read that right: a 500% increase in speed.
Linux seems necessary for this speed-up, as APEX needs access to resources that are not available in Windows (I can't remember exactly what is missing, but if you run anything with APEX it will tell you in an error message; I think it is some CUDA library).
Of course, all of the above is in PyTorch; I don't know whether half or mixed precision is implemented in MATLAB yet.
Lou.
  1 Comment
Joss Knight on 24 Mar 2020
In MATLAB, you can generate code to run models in half or mixed precision using cuDNN or TensorRT, via the GPU Coder product.
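A minimal sketch of what that configuration might look like for the TensorRT target; the entry-point function name and input size here are assumptions, and property names may vary between releases:
% Illustrative GPU Coder configuration for half-precision (fp16) inference with TensorRT
cfg = coder.gpuConfig('mex');
cfg.DeepLearningConfig = coder.DeepLearningConfig('tensorrt');
cfg.DeepLearningConfig.DataType = 'fp16';   % run the generated network in half precision
% 'myPredict' is a hypothetical entry-point function that calls predict on a loaded network
codegen -config cfg myPredict -args {ones(227,227,3,'single')}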

