MATLAB Answers


gpuArray and memory management

Asked by Gunnar Läthén on 7 May 2012
I have a loop in which I create a number of gpuArrays. To keep within memory limits I clear some gpuArrays that hold intermediate results. In MATLAB R2011b everything was cleared nicely, but in R2012a the loop crashes with an out-of-memory exception (running the exact same code). I understand that I cannot completely trust the FreeMemory value reported by gpuDevice, but I can see that memory is freed in R2011b whereas in R2012a it is not. Is there some way to force R2012a to release the memory (without a reset)?


It sounds like you're probably being hit by GPU memory fragmentation. Do you have any reproduction steps you could post?
Returning again to this issue, I run into problems when running the full set of code (as opposed to the example in the comment below). In general, it feels like the memory management in R2012a is quite flaky compared to R2011b. I have put the complete code at if you are willing to try it out.


1 Answer

Answer by Ben Tordoff on 8 May 2012

Hi Gunnar,
this is more a work-around than an answer, but try inserting a "wait(gpu)" after freeing the memory. For example:
gpu = gpuDevice();
bigData = parallel.gpu.GPUArray.rand(2000);
% do lots of computations
clear bigData;
wait(gpu);   % block until the GPU has finished, so the memory can be released
In R2012a and above the GPU might still be running when you get to the "clear" command so it may need to hold onto the memory. Using "wait" to ensure all computations have completed allows the memory to be released safely.
However, this shouldn't be necessary. If memory runs low, MATLAB should wait and free up some memory automatically. Could you post a snippet of code that shows how to hit the problem so that I can see why this isn't happening for you? In particular, which function runs out of memory - is it a creation function (zeros, ones, rand etc) or an operation (fft, multiply etc)?
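To help answer the question of which call runs out of memory, a minimal diagnostic sketch along the following lines might be useful. It is only an illustration: the fft call is a stand-in for the real computation, and the variable names result and err are assumptions, not part of the original code. It uses the R2012a-era syntax from this thread; newer releases may name some of these calls and properties differently.

```matlab
% Hypothetical diagnostic sketch: report which call raises the GPU error.
gpu = gpuDevice();                                 % query the current GPU
fprintf('Free memory before: %g bytes\n', gpu.FreeMemory);
try
    bigData = parallel.gpu.GPUArray.rand(2000);    % a creation function
    result  = fft(bigData);                        % an operation (stand-in)
    clear bigData;
    wait(gpu);                                     % let queued GPU work finish
catch err
    % The identifier and message say what failed; the stack says where.
    fprintf('Error: %s\n%s\n', err.identifier, err.message);
    for k = 1:numel(err.stack)
        fprintf('  in %s (line %d)\n', err.stack(k).name, err.stack(k).line);
    end
end
fprintf('Free memory after: %g bytes\n', gpu.FreeMemory);
```

Posting the printed identifier and stack along with the code snippet should make it clear whether the failure comes from an allocation or from an operation.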

  1 Comment

I've tried to reduce the code to something manageable. I removed the kernel executions and replaced them with pure memory allocations and some bogus calculations. The code doesn't make sense, but it reproduces the problem on my machine at least. It seems that adding a wait() at the end of the loop fixes things, but maybe the example can be of use to you!
In R2012a I get the output (without the wait()):
In R2011b I get the output:
...and so on...
g = gpuDevice;
dim = [288 320 256];
data = parallel.gpu.GPUArray.zeros(dim, 'single');
V = parallel.gpu.GPUArray.zeros(size(data), 'single');
eig1 = parallel.gpu.GPUArray.zeros(size(data), 'single');
eig2 = parallel.gpu.GPUArray.zeros(size(data), 'single');
eig3 = parallel.gpu.GPUArray.zeros(size(data), 'single');
for ind = 1:10
    % Allocate six temporaries (stand-ins for the removed kernel outputs)
    fxx = parallel.gpu.GPUArray.zeros(size(data), 'single');
    fxy = parallel.gpu.GPUArray.zeros(size(data), 'single');
    fxz = parallel.gpu.GPUArray.zeros(size(data), 'single');
    fyy = parallel.gpu.GPUArray.zeros(size(data), 'single');
    fyz = parallel.gpu.GPUArray.zeros(size(data), 'single');
    fzz = parallel.gpu.GPUArray.zeros(size(data), 'single');
    % Bogus calculations that only exercise the GPU
    eig1 = fxx + fyy;
    eig2 = fxy.*fyz;
    eig3 = fxz - fzz;
    % Release the temporaries
    clear fxx;
    clear fxy;
    clear fxz;
    clear fyy;
    clear fyz;
    clear fzz;
    v = parallel.gpu.GPUArray.zeros(size(data), 'single');
    v = eig1.*eig2.*eig3;
    V = v;
    clear v;
    % wait(g);   % adding a wait here, at the end of the loop, avoids the error
end
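
The comment above reports that adding a wait() at the end of the loop avoids the out-of-memory error. A reduced sketch of that pattern, with the free memory printed on every iteration, could look like the following. It is an illustration rather than the original code: the loop body is shortened to two temporaries, and the fprintf reporting is an assumption added here so that the two releases can be compared.

```matlab
% Reduced sketch: clear temporaries, then wait, then report free memory.
g = gpuDevice;
dim = [288 320 256];
for ind = 1:10
    % Temporaries standing in for intermediate results
    fxx = parallel.gpu.GPUArray.zeros(dim, 'single');
    fyy = parallel.gpu.GPUArray.zeros(dim, 'single');
    eig1 = fxx + fyy;
    clear fxx fyy;
    wait(g);   % block until the GPU is idle so cleared memory can be released
    fprintf('Iteration %d: %.0f MB free\n', ind, g.FreeMemory / 2^20);
end
```

If the reported free memory stays roughly constant across iterations with the wait in place but shrinks without it, that would be consistent with the explanation given in the answer.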
