# How do I estimate the memory usage of a calculation on the GPU?

Petter Stefansson on 15 Sep 2016
Answered: Joss Knight on 19 Sep 2016
Hi.
I have a parallel calculation that I’m doing on the GPU to speed up things compared to doing it on the CPU. However, the calculations are so large that I have to do them in batches in order to not run out of memory. So to do that I’m trying to estimate how much memory the calculation will require before I do it and then design the batch sizes so that the memory limit is not exceeded.
The way I’m doing it right now is simply looking at the dimensions of every variable that will be on the GPU during the calculation and multiplying the number of elements in each variable by 8, since I’m using double precision.
Once I know the size of every variable that will be in the calculation, I also multiply the whole thing by a safety factor, 1.5 for example, to be on the safe side. However, even though the safety factor overestimates the size by 50%, I still get this error every now and then:
Error using gpuArray/pagefun
Out of memory on device. To view more detail about available memory on the GPU, use 'gpuDevice()'. If the problem persists, reset the GPU by calling 'gpuDevice(1)'.
And the line that seems to cause the error is:
pse = pagefun(@mldivide, A, B);
So am I estimating the size wrong by only looking at the dimensions of each variable and multiplying by 8? If I use a function like mldivide, does it allocate temporary variables on the GPU internally that I’m not accounting for?

Joss Knight on 19 Sep 2016
Hi. mldivide is a very complicated calculation, so it uses a considerable amount of working memory. The code is complicated and the implementation is via a third-party library, but you can imagine that:
1. It needs to take a copy of A to work with.
2. It performs an LU decomposition (if A is square), which will require another array the size of A.
3. It needs to do two backsubstitutions. Each one probably needs an array the size of A of working memory. The output of each one needs to go into an array the size of B.
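So, maybe you need (5*numel(A) + 3*numel(B))*8 bytes per calculation: the inputs A and B themselves, plus four working arrays the size of A and two the size of B from the steps above. Hopefully not, but certainly 1.5x isn't going to be enough.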
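As a rough sketch of how that estimate could drive the batch size: the snippet below applies the (5*numel(A) + 3*numel(B))*8 figure to one page and divides it into the memory currently free on the device. The sizes n, m and numPages, the headroom factor, and the 2 GB fallback are all made up for illustration, and it assumes that memory use scales linearly with the number of pages in a batch, which the third-party library may not guarantee.

```matlab
% Hypothetical problem size: each page of A is n-by-n, each page of B is n-by-m.
n = 200;
m = 10;
numPages = 5000;
bytesPerElement = 8;                      % double precision

% Per-page estimate from the answer above: (5*numel(A) + 3*numel(B))*8 bytes.
bytesPerPage = (5*n*n + 3*n*m) * bytesPerElement;

% AvailableMemory is a property of the gpuDevice object (free bytes on the device).
% The fallback value is an assumption so the arithmetic still runs without a GPU.
try
    g = gpuDevice();
    availableBytes = g.AvailableMemory;
catch
    availableBytes = 2e9;
end

headroom = 0.8;                           % hypothetical: leave 20% of free memory unused
pagesPerBatch = max(1, floor(headroom * availableBytes / bytesPerPage));
numBatches = ceil(numPages / pagesPerBatch);

for first = 1:pagesPerBatch:numPages
    last = min(first + pagesPerBatch - 1, numPages);
    % Here the pages first:last of A and B would be sent to the GPU and solved,
    % e.g. with pagefun(@mldivide, ...) on the corresponding gpuArray slices.
end
```

If the out-of-memory error still appears, lowering headroom is the simplest knob to turn, since the true working-memory requirement of mldivide is only guessed at here.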