GPU utilization is not 100%.
40 views (last 30 days)
Show older comments
DONGHYUN KIM on 22 May 2019
Commented: yan gao on 25 Sep 2021
The GPU usage is only 40% allocated for running the deep learning network.
Sometimes go up to 80% for a while but usually stay at 40%.
I want to know why.
Walter Roberson on 22 May 2019
GPU can only run at full speed if the entire problem fits into memory. That is seldom the case for deep learning: those networks are updated incrementally, so transferring images in from disc and memory uses a fair bit of time.
Joss Knight on 31 May 2019
Your question is very hard to answer in it's current form. You want to know why GPU utilisation is not 100%? The answer is, because the GPU isn't running kernels 100% of the time. Why? I don't know, because you haven't provided any information about what you're doing. Maybe, as Walter says, a lot of time is being spent doing file I/O, perhaps because you have a very slow disk or slow network file access. Maybe you have a transformed datastore, or an imageDatastore with a custom ReadFcn, and the data processing is very complex and takes place on the CPU, blocking GPU execution while it is carried out. Maybe you have a very small network, or a low resolution network, or you don't have a high enough mini-batch size, and so you are not successfully occupying all the cores on the GPU. Maybe your network is so small that the amount of time spent running the MATLAB interpreter in order to generate the GPU kernels to do the computation outweighs the amount of time it takes to run those kernels.
If you want to know more, run the MATLAB profiler and find out where time is being spent during training.
yan gao on 25 Sep 2021
I kindly invite you to help me by giving some advice on my question at
Abolfazl Nejatian on 15 Dec 2019
Thank you for the information you provided.
the strange thing is when i was testing my code on Linux and building my network with Python the GPU utilization grew up to around 100 percent but in windows with Matlab, it stays around 45 percent.
Joss Knight on 15 Dec 2019
It's not strange. Windows is a different operating system, different file system, and completely different (and considerably slower at allocating memory) GPU driver. Do you have a different card in your Windows machine too? All could be a problem.
Plus, if you start with a model defined in a Python framework and optimized for that, and then adapt it, we've no idea how good a job you did. If you took a MATLAB example and then converted it to Python you might have the same problem with Python. Maybe you're not successfully prefetching your data from the file system. Maybe you're not using MEX acceleration when you should be. Maybe your GPU could be put in TCC mode. That's why it's so difficult to answer your question when you're not telling us what you're doing.
Abolfazl Nejatian on 16 Dec 2019
well, i know these are the different OS, but the vague point is, with the same resource(both of them use Tesla V100, actually i install both of OS on one machin), why they can't use GPU in a similar percent!
yes, absolutely i used MEX code on my Matlab.
then i try to train a Resnet with Python. ( and all of the initial value were same, input size, network layers and etc).
there is no code conversion between Matlab and Python i used Matlab function and Pretrained Net for this work and for Python use Keras and Pycharm.
but in a windows environment with Matlab, my GPU utilization goes around 45% and in Python, at Linux OS it was around 90%!
now the question is, do you recommend me reInstall my Matlab on Linux OS and then i can use more from my hardware resource?
Joss Knight on 16 Dec 2019
Hi Abolfazi. I can't really recommend anything until I've seen your code. It may be as simple as changing the way you access your data; it may be that you should move to Linux; or it may be that there's nothing you can do. Maybe your Python code is grotesquely inefficient with GPU resources or spins up a lot of worthless kernels during spare cycles! It's just impossible to say. Give us your code, and run the MATLAB profiler and show us the profile report.
Lamya Mohammad on 29 Feb 2020
Did you solve the problem? My utilization is 29% and I wish to increase it
Find more on Image Data Workflows in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!Start Hunting!