MATLAB is able to utilize only part of the actually available VRAM.

It is known that Windows 10 with WDDM 2.2 reports less VRAM than is actually present. For instance, a GTX 1080 Ti with 11GB of memory is reported as having only 9GB free. However, Microsoft claims that as soon as an application requests more memory, additional VRAM becomes available, so the entire 11GB should actually be usable.
Just to note: the GTX 1080 Ti is the second GPU; the monitors are attached to another, low-profile GPU that is dedicated to display (set from the NVIDIA Control Panel).
Now, MATLAB reports 9.02GB of available VRAM via the gpuDevice command. However, this becomes a hard limit within MATLAB, meaning I cannot declare a GPU variable larger than the available 9.02GB.
What can be done to use the additional 2GB of physical memory that is present on the GPU but not accessible to MATLAB? This is a very expensive and very useful resource, and I would like to use it. Can MATLAB not claim the memory from the OS (Windows 10) dynamically? I am using MATLAB R2017a; the system has 128GB of main memory and two Xeon CPUs, each with 6 cores running at 2.6GHz.

 Accepted Answer

NVIDIA have responded to confirm that this is expected behaviour. In summary:
  • WDDM2 releases 90% of available memory to CUDA.
  • A single application is only allowed to allocate 90% of that 90%, i.e. 81% of total memory.
  • NVIDIA are working with Microsoft to tighten this bound, and/or remove it on non-display cards.
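Taken at face value, the two 90% factors above roughly reproduce the figure MATLAB reports on an 11GB card. A back-of-envelope sketch (the percentages are from NVIDIA's summary; the small gap to the observed 9.02GB is not accounted for here):

```python
# Rough check of the bound NVIDIA describes: WDDM2 gives CUDA ~90% of
# VRAM, and a single process may allocate ~90% of that pool, i.e. ~81%
# of the total. This is arithmetic only, not an official formula.
total_gb = 11.0                        # GTX 1080 Ti
cuda_pool_gb = 0.90 * total_gb         # memory released to CUDA
per_process_gb = 0.90 * cuda_pool_gb   # single-process allocation bound
print(f"CUDA pool:         {cuda_pool_gb:.2f} GB")    # 9.90 GB
print(f"Per-process bound: {per_process_gb:.2f} GB")  # 8.91 GB, close to the 9.02 GB MATLAB reports
```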

11 Comments

Wow, Joss, thanks so much for resolving that! Really appreciate it.
Hi Joss, thanks a lot for shedding some light on this issue; it gives us a much better picture of what is happening. I observe a difference when I compare this with another OS and driver set: on Linux (CentOS 7) I have access to almost the full GPU memory, so I understand Windows is artificially imposing these restrictions. You mentioned that WDDM2 releases 90% of the memory, as it would for a GPU responsible for regular display. When I dedicate a GPU to compute, it should release 100% instead of 90%. Can this not be changed manually by a user? Is there something that could be done, some workaround? Also, a single application is allowed only 90% of that; can this not be changed either? The other problem is that the allocation is by percentage rather than by an absolute value, meaning if I buy an expensive GPU with 2x the GPU memory, WDDM2 would be blocking even more memory.
Pavel, just for clarification: virtually the entire GPU memory is available *across all processes*, just not to any individual process. Maybe there's some way to write the software using multiple processes so you can access the entire memory.
Yes, that is what I meant: availability in terms of a single process. This is clearly an issue with WDDM2. There should be a way for an advanced user to use the complete available GPU memory for a single process. The easiest fix would be for Microsoft to come up with a patch for WDDM2, allowing 100% allocation to CUDA when that particular GPU is not being used for display (even though it may have the capability), and allowing 100% of the memory to a single GPU process when the card is not driving a display. Until then, can we create a hack? I read somewhere that WDDM1 on Windows 7 did not have this issue; I haven't verified that, though. If true, could it be used instead of WDDM2?
Well, I'm an outsider, but from my perspective it doesn't look like Microsoft are especially focussed on support for compute within the WDDM driver. I think what you're expected to do is buy a card that supports the TCC driver, like the Titan series. The GeForce cards are primarily intended for graphics.
I agree it would be great if it were user-configurable. But I doubt it's to allow for display purposes, since that shouldn't be a percentage. I think it's to make sure no single process takes so much VRAM that you can't start up other software that uses the GPU. And since I'm assuming a lot of users like to jump from software app to game and back again, maybe they have a good reason.
The Titan series or the Tesla series is super expensive (orders of magnitude more) compared to the GTX 1080 Ti, or the expected GTX 1180 Ti, which they claim will have more than double the memory of its predecessor. So, clearly there is a need for compute even on GeForce cards. Could there be a possible hack, like patching in the WDDM1 driver instead of WDDM2, or something like that? I am not an expert in this field, else I would have tried doing it myself! I am running 3 GTX 1080 Ti cards and getting good results on deep learning AI tasks, but I am forced to switch to Linux. This is preventing me from building a budget hardware solution with good performance on the Windows 10 platform.
Or an alternative solution would be for NVIDIA to come up with a device driver for GTX-1080-Ti that is TCC compatible!
If I were NVIDIA, I'd be looking for ways to encourage compute users to use the devices with a suitable spec to handle compute workloads, such as having ECC memory and sufficient robustness to operate continuously for several days. This way they can focus on creating devices that are optimal for graphics users without having to continually deal with users who are using hardware for a job it isn't designed for and continually hitting issues. (For an example of the pain it causes, try searching for 1080 Ti on this forum.)
Unfortunately, there isn't really a good way to do that; what they seem unlikely to do is actively encourage people to use the GTX series for compute, for instance by supporting TCC mode.
This, of course, is completely just my own speculation.
It would benefit other users of the forum if you could accept this answer in both your posts. Thanks.


More Answers (5)

The division of GPU memory into compute and graphics is dictated by the driver - it isn't something that a CUDA application has any control over. I'm not really sure what you're referring to with regard to more memory becoming available once requested. By the time you call gpuDevice, the CUDA runtime has been initialized and has taken a large chunk of memory, and MATLAB has preallocated a number of slabs of GPU memory; so maybe you've lost about 1GB to graphics and about 1GB to MATLAB and MATLAB's CUDA context. So 9GB sounds about right.
I suppose you could start MATLAB using -softwareopengl - maybe that would save some video memory? Probably not though, given you're not actually driving the display with this card.

5 Comments

Thanks for the reply. In my system, I have a graphics card that is separate from the GTX 1080 Ti. I also used the NVIDIA Control Panel to make sure the graphics card is dedicated to display and that the GTX 1080 Ti is free from any application using it other than MATLAB.
This phenomenon of either MATLAB or Python reporting less memory than available has been reported by many people in the past. Initially it was assumed to be a Microsoft Windows issue. But later Microsoft clarified that though Windows 10 WDDM 2.x reports a large chunk (~20%) as being used by the driver, it frees up memory as soon as a request is made. (I am not familiar with the details of how this actually works.) If Microsoft is correct, then MATLAB should be able to utilize the entire 11GB of memory minus the space MATLAB itself needs and the minimum amount the Windows driver occupies. It is impossible that these combined would come to 2GB; if they did, nobody could use a 2GB graphics card for compute purposes, yet such cards are used. So, my rough estimate is that around 1GB is being wasted here, and my quest is to retrieve it, as it is a precious resource. If you happen to have a gaming NVIDIA GPU as a second GPU on a Windows 10 system, you could try it out: roughly 20% will not be accessible to the user though nothing is running on the GPU.
You also mentioned that I could start MATLAB using softwareopengl. Could you please elaborate? I am not familiar with it and would be glad to try it.
Why not start by googling "MATLAB softwareopengl"? ;)
I don't know how to help you. MATLAB just reports what memory the driver says is free via the public driver API. And it allocates memory using the driver's API. There are no other mechanisms available for reporting or allocating memory. Is it possible that Microsoft were referring to video memory allocated via OpenGL being released, rather than compute memory?
The Windows WDDM driver is pathological and not only hogs memory for the OS (even when the card is not being used for graphics), it also imposes incredible bandwidth restrictions on memory access and extremely disruptive kernel timeout interrupts. Then the CUDA runtime takes a large chunk of memory for its page table, which is related to the total amount of memory. On my Titan V this amounts to 500MB.
In other words, this is really a question for NVIDIA. I can happily raise your issue with them and see what they have to say.
I had googled "MATLAB -softwareopengl" and found that from the command line I could open MATLAB with that option. I get exactly the same amount of available VRAM with or without it, so I was not sure whether I was doing the correct thing; it looks like MATLAB sees the same amount of available VRAM with or without '-softwareopengl'. As you mentioned, I would really appreciate it if you could raise the question with NVIDIA. I am also trying to run on a Linux machine, but there are some systems we need to run on Windows because of interaction with other software that is Windows-only. My understanding was that Windows does hog memory, as you mentioned; however, upon request it makes it available, as pointed out by Microsoft (they didn't distinguish between OpenGL and compute memory). I am not familiar with the details, but if Microsoft's claim is correct, then somehow MATLAB is not able to obtain the full memory from the OS, maybe, as you said, because of the NVIDIA drivers.
Okay, I've put your question to NVIDIA for you. It would be nice if there were indeed some magic we could do to get Windows to give us back some memory; but to be honest, if there were, I would have expected to have heard of it by now.
Pavel, I've been getting exactly the same numbers as you with my 1080 Ti, and similar numbers with my 1070. And Windows 10 Task Manager seems to agree that the maximum utilization I can get on my 1080 Ti is 9 GB with MATLAB, plus another 0.6 GB of other stuff on the system. However, when I go into Visual Studio and call the CUDA function cudaMemGetInfo to obtain free and total GPU memory bytes right after I fill the memory with a 9 GB gpuArray, it shows used memory is 10.7 GB, not 9.6 GB like Task Manager and MATLAB show. cudaMemGetInfo reports free and total, and I take the difference to obtain "used". Below is the link to my posts on the subject:


Hi, that's an interesting piece of information. I did not check from Visual Studio. What I learnt is that Microsoft claims it holds a large chunk of VRAM, but as an application requests more memory, it frees it up. MATLAB says that it uses a standard API for the GPU, pointing towards an NVIDIA Windows 10 driver issue. NVIDIA hasn't said anything useful about this so far, that I know of. The Tesla series of GPUs from NVIDIA, which is compute-only (and super expensive) with no display adapter, doesn't have this issue; I have verified this on both K20 and K40 cards. So the bottom line is that if you are using Windows 10 and your GPU supports a display adapter, you will lose roughly 20% of your VRAM, and none of Microsoft, NVIDIA, or MathWorks claims to be occupying that 20%. At face value it seems more like a Microsoft and NVIDIA issue to me. I switched to Linux (CentOS 7) and it's running fine, with a little less than 0.5GB of VRAM shown as occupied.

1 Comment

Yeah, the strange thing is this: when you make a gpuArray in MATLAB, you KNOW exactly the amount of VRAM it requires, and you KNOW the starting amount of VRAM used by the other processes (from GPUs in Task Manager), and that adds up to 9.6 GB; and if you increase the array size by a tiny amount, it crashes, so you're pretty sure that the amount of memory usable by all the processes is 9.6 GB. Maybe it holds 1 GB back for a different process to come along and request, and won't give it out to existing processes that are already using VRAM. But at the same time CUDA tells you you're actually using over 1 GB more than that (i.e., 10.7 GB). I have a hard time believing that Microsoft or NVIDIA would deliberately hold out VRAM just to be nasty, or that they'd block existing processes from accessing VRAM that nobody else has asked for. It's all pretty strange, IMO. By the way, here's the CUDA code I used in Visual Studio:
#include <iostream>
#include <cuda_runtime.h>

int main()
{
    cudaSetDevice(0);
    // Show memory usage of the GPU
    size_t free_byte;
    size_t total_byte;
    cudaError_t cuda_status = cudaMemGetInfo(&free_byte, &total_byte);
    if (cudaSuccess != cuda_status) {
        std::cout << "Error: cudaMemGetInfo fails, " << cudaGetErrorString(cuda_status) << std::endl;
        exit(1);
    }
    double free_db = (double)free_byte;
    double total_db = (double)total_byte;
    double used_db = total_db - free_db;  // "used" = total minus free
    std::cout << "GPU memory usage: used = " << used_db / 1024.0 / 1024.0 / 1024.0 <<
        " GB, free = " << free_db / 1024.0 / 1024.0 / 1024.0 <<
        " GB, total = " << total_db / 1024.0 / 1024.0 / 1024.0 << " GB" << std::endl;
    return 0;
}


I tried this in MATLAB: I reset the GPU memory and tried to assign the largest array possible. I couldn't exceed marginally more than 9GB. So that 2GB is definitely being occupied by something, or MATLAB is just not able to see it or not allowed to access it.
It is possible that the NVIDIA driver is not efficient at requesting memory from the OS dynamically (if that is how it works). With the same version of MATLAB but a different OS and device driver, on the same GPU, I am able to see slightly more than 10.5GB of available VRAM from MATLAB. I also noticed that GPU computation time is much lower, around 50% less, on Linux than on Windows 10. That is why I feel the combination of OS and device driver is the culprit here, at least as of now.

2 Comments

Looks like I was able to utilize 10.8 GB out of my 1080 Ti's 11 GB. I started with my MATLAB code that loaded the gpuArray up to what it saw as the max available, 9 GB. At the time there was also some system stuff using 0.6 GB. But then I opened a 3D rendering app which used about 2 GB of GPU memory, and it loaded fine. In fact, Task Manager showed 10.8 GB out of 11 GB of GPU memory "dedicated". So maybe what Microsoft is saying actually IS true: maybe any one individual process can't access more than 9 GB, but if another process comes along and asks for VRAM, it will get it. When I look in Task Manager under the Details tab, where you can see the dedicated and shared GPU memory for each process, it shows MATLAB taking 8.6 GB and the 3D rendering process taking 2.1 GB, which indeed totals 10.7 GB. BTW, both are also taking some "shared" GPU memory, which I believe is system RAM. There are also some other system processes taking GPU memory at the same time, amounting to maybe 0.2 GB or less. Interesting... so maybe that's the story: any one process can't access more than 9 GB on a 1080 Ti, but other processes can grab whatever is remaining, so that the total utilization is just under 11 GB.
I note that the second application only requires graphics memory, not compute. Have you tried starting a second MATLAB and seeing how much memory you can allocate between them?
Ultimately WDDM is a video driver and I suspect that any memory held back is only released for graphics.


According to Microsoft, not just a different application but even a single application would be given more memory if it requested it. Your experiment proves that a second application is able to utilize the full memory. So I would think MATLAB, using the NVIDIA device driver, should be able to utilize the full memory in a Windows 10 environment. Very likely it's an NVIDIA issue, as MATLAB says it uses the standard API provided by NVIDIA; unless MATLAB is missing out on requesting more memory from the NVIDIA driver. Joss Knight from MathWorks pointed out that he would ask NVIDIA about this issue. I would love to hear back on that.

1 Comment

Pavel, that's interesting that even the same application can request more and get it. Like I said, I tried to increase the MATLAB gpuArray size slightly above the available 9 GB and it crashed with "out of memory". Maybe it needs to be a separate process or request or something, or, like Joss suggests, another instance of MATLAB.


So I tried what Joss suggested and ran two instances of MATLAB. Both instances had the same code, which checks available GPU memory and then fills a gpuArray with just enough elements to consume that amount. Below is an image of Task Manager/Details, which shows dedicated and shared GPU memory for both processes. The first one (on top) grabbed about 8 GB, and the second one took 2 GB of GPU memory and almost 7 GB of shared memory. I believe that means it grabbed some system RAM in order to make both arrays?

2 Comments

I can't really explain that; MATLAB doesn't deliberately allocate any unified memory (which is what the Task Manager could be showing), but the CUDA runtime and CUDA libraries might, so perhaps that's what that is, depending on what your script is doing; it could also be the behaviour of the WDDM driver which is allocating resources out of CPU memory.
If you look at dedicated GPU memory that now adds up to 10.4 GB, which is much closer to 11 GB than before; maybe this issue is resolved?
Given that MATLAB's way of measuring available memory depends on the driver, you may find a more reliable way of packing memory is to allocate arrays of larger and larger size until MATLAB throws an out-of-memory error.
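Joss's suggestion amounts to searching for the allocation boundary by trial. A minimal sketch of that idea (shown in Python for brevity; `try_alloc` is a hypothetical stand-in for the real GPU allocation call, which in MATLAB would be a gpuArray allocation wrapped in try/catch):

```python
def largest_alloc(try_alloc, hi_bytes):
    """Binary-search the largest size (in bytes) that try_alloc accepts.

    try_alloc(n) should return True if an allocation of n bytes succeeds.
    """
    lo, hi = 0, hi_bytes
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if try_alloc(mid):
            lo = mid        # mid bytes fit; search upward
        else:
            hi = mid - 1    # mid bytes failed; search downward
    return lo

# Example with a fake 9.02 GB cap standing in for the driver's limit:
LIMIT = int(9.02 * 1024**3)
print(largest_alloc(lambda n: n <= LIMIT, 16 * 1024**3))  # prints LIMIT
```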
FWIW, here's the code I'm using to fill a gpuArray just under what is returned as "available":
g = gpuDevice(1);
bytestoGB = 1073741824.0;   % bytes per GB (1024^3)
fprintf(1, 'Device %s has Total Installed Memory %.2f GB \n', g.Name, g.TotalMemory/bytestoGB)
fprintf(1, 'Device %s has Available Memory %.2f GB \n', g.Name, g.AvailableMemory/bytestoGB)
% Determine the side length of the square array of 8-byte ints needed to
% fill the available GPU VRAM. For the 1080 Ti that is a 34809 x 34809
% array, and for the 1070 it's 29693 x 29693.
elements = uint64(sqrt(g.AvailableMemory/8) - 1);
% Create a square array of int64s on the GPU
Array1 = gpuArray(ones(elements, 'int64'));
% Work out how much memory is used by the array
el = cast(elements, 'double');
arrayMemory = (el*el)*8.0/bytestoGB;
fprintf(1, 'Array has %i x %i 8-byte (int64) elements \n', elements, elements)
fprintf(1, 'Array consumes: %.3f GB \n', arrayMemory)
fprintf(1, 'Device %s has Available Memory %.4f MB after array fills GPU \n', g.Name, g.AvailableMemory/1024.0/1024.0)
fprintf(1, 'Total GPU Memory minus Array Memory = %.2f GB \n', (g.TotalMemory/bytestoGB - arrayMemory))
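As a cross-check of the array sizes quoted in the comments of that script (a quick arithmetic sketch, using the same bytes-per-GB constant; an N x N int64 array consumes N*N*8 bytes):

```python
GB = 1024**3  # bytes per GB, same constant as bytestoGB above

def array_gb(n):
    """Memory in GB consumed by an n-by-n array of 8-byte elements."""
    return n * n * 8 / GB

print(f"1080 Ti, 34809^2 int64: {array_gb(34809):.2f} GB")  # 9.03 GB, matching the reported available VRAM
print(f"1070,    29693^2 int64: {array_gb(29693):.2f} GB")  # 6.57 GB
```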

