MATLAB 2013a GPU memory leak

Asked by Michael on 27 May 2013

I have been running some very long loops (millions of iterations) where, in each iteration, I call a few CUDA kernels via feval using pre-allocated arrays of fixed size. I noticed that host memory grows linearly with the number of iterations, and in the end MATLAB crashes. While trying to isolate the problem, I found the following:

- When calling a CUDA kernel via feval, all of the function's arguments have to be cast as gpuArrays already, even scalar variables. This also applies to functions like gpuArray.rand or randn:

n = 1e4;
for i = 1:1e6   
out = gpuArray.rand(n,1,'single');
end

The above code causes host memory to grow for the duration of the execution (about 100 MB per 250K iterations). If instead of n=1e4; you write n=gpuArray(1e4); the subsequent loop does not cause the memory to grow. I also found that the above loop executes much faster (about 3 times) when n is in host memory than when n is a gpuArray.

- Even more puzzling is the following example:

x = gpuArray.rand(1e4,1,'single');
for i = 1:1e6
out = sqrt(x);
end

The above loop does not cause MATLAB's memory footprint to grow. However, if we replace sqrt(x) with sqrt(1./x), we get the memory blowup again. I am using MATLAB 2013a 64-bit on Windows 7 Professional. My video card is a GTX 650 with 2 GB. Thanks in advance for any insights.
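The growth described above can also be logged from inside MATLAB rather than read off an external tool. The following is a minimal sketch, assuming a Windows machine (the `memory` function is Windows-only) and a supported GPU; the reporting interval `reportEvery` and the variable names are illustrative choices, not part of the original question.

```matlab
% Sketch: log MATLAB's host memory use while the first loop runs.
% Assumes Windows, because the MEMORY function is only available there.
n = 1e4;                 % host-side scalar, as in the first example
numIter = 1e6;
reportEvery = 2.5e5;     % illustrative reporting interval
m0 = memory;             % user view; MemUsedMATLAB is in bytes
for i = 1:numIter
    out = gpuArray.rand(n, 1, 'single');
    if mod(i, reportEvery) == 0
        m = memory;
        growthMB = (m.MemUsedMATLAB - m0.MemUsedMATLAB) / 2^20;
        fprintf('%d iterations: %+.1f MB\n', i, growthMB);
    end
end
```

Swapping the loop body for `out = sqrt(x);` or `out = sqrt(1./x);` would let the second example be compared in the same way.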

3 Comments

Ben Tordoff on 28 May 2013

Hi Michael, could you explain a bit more about your comment "you have to pass all the arguments as gpuArray's, even if you pass scalars"? This shouldn't be the case.

In your first example you are passing "n" as a non-GPU scalar and this should work both for functions and CUDAKernels. I have tried several CUDAKernel calls and passing non-GPU data (either scalar or array) seems to be fine. I'm obviously not trying the right thing - could you provide an example of it not working so that we can investigate what is going wrong?

I've also been trying to reproduce the memory leak you're hitting and am not getting very far. To help narrow down the differences between what I am trying and what you are trying, can you let me know the graphics driver version that you are using? Also, could you describe the error that appears when MATLAB crashes just in case that reveals something?

Thanks

Ben

Michael on 28 May 2013

Hi Ben, I tried the following code

n = 1e3;
for i = 1:1e8
out = gpuArray.rand(n,1,'single');
disp(i)
end

After about 30 million iterations, MATLAB's memory footprint grew to about 14 GB (my PC has 16 GB of RAM). MATLAB started to pause at times because of hard drive activity. I didn't manage to crash it: memory usage stayed around 13.9-14.5 GB, but the system was almost unresponsive due to hard drive activity. I still need to find the code that produced the crash. I remember, though, that the error message was something about Java heap space.

If I try

    n = gpuArray(1e3);
    for i = 1:1e8
    out = gpuArray.rand(n,1,'single');
    disp(i)
    end

then MATLAB's memory does not grow! However, it is quite a bit slower than the first loop (about 2-3 times)! That's what I meant when I said that I needed to cast scalar input variables as gpuArrays.

I used Windows' task manager to monitor MATLAB's memory usage. My driver version is 314.07. I installed the latest version (320.18); it didn't make a difference. Thanks for your help, Michael
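The speed difference between the two loops above can be checked with a rough timing sketch like the one below. It assumes a supported GPU; the iteration count `numIter` and the variable names are illustrative. Because GPU calls may return before the work has finished, each timed section ends with `wait(gpu)` so that `toc` includes the outstanding work.

```matlab
% Sketch: compare a host-side size argument with a gpuArray size argument.
gpu = gpuDevice();
numIter = 1e5;           % illustrative; far fewer iterations than in the thread

nHost = 1e3;             % size kept in host memory
tic;
for i = 1:numIter
    out = gpuArray.rand(nHost, 1, 'single');
end
wait(gpu);               % make sure all queued GPU work has finished
tHost = toc;

nGpu = gpuArray(1e3);    % size stored on the GPU
tic;
for i = 1:numIter
    out = gpuArray.rand(nGpu, 1, 'single');
end
wait(gpu);
tGpu = toc;

fprintf('host n: %.2f s, gpuArray n: %.2f s, ratio: %.2f\n', ...
        tHost, tGpu, tGpu / tHost);
```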

Ben Tordoff on 4 Jun 2013

Thanks Michael, you are indeed right and this appears to be a bug introduced in R2013a. There is no realistic work-around I can provide right now, but I will post an update here once I have some more helpful suggestions.

The reason why the memory does not leak with certain calls is that they force a synchronisation event (in your first example, SQRT can error so has to wait to see if the error was hit; in the second the scalar parameter "n" has to be transferred back to host memory, which also causes a sync). You could achieve the same by inserting a "wait(gpu)" after every call:

gpu = gpuDevice();
for ii=1:1e8
  out = gpuArray.rand(1e3,1,'single');
  wait(gpu);
  disp(ii)
end

but that will also slow things down a lot and is hardly a practical solution.
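One variation that might reduce the cost is to synchronise only every so often instead of after every call. This is a sketch of that idea, not something suggested or tested in the thread: whether a periodic `wait(gpu)` is enough to stop the memory growth is an assumption, and `syncEvery` is a hypothetical tuning parameter.

```matlab
% Sketch: synchronise with the GPU every syncEvery iterations.
% Assumption: a periodic sync is enough to release the accumulated
% host memory; this has not been verified against the R2013a bug.
gpu = gpuDevice();
numIter = 1e6;           % illustrative iteration count
syncEvery = 1000;        % hypothetical interval; tune for speed vs. memory
for ii = 1:numIter
    out = gpuArray.rand(1e3, 1, 'single');
    if mod(ii, syncEvery) == 0
        wait(gpu);       % block until all queued GPU work has completed
    end
end
```

A smaller `syncEvery` moves the behaviour closer to the per-call `wait(gpu)` loop, and a larger one moves it closer to the original unsynchronised loop.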


1 Answer

Answer by Ben Tordoff on 18 Jun 2013
Accepted answer

Hi Michael, could you read the following bug-report and try the workaround it contains (being careful about the backing-up step!):

http://www.mathworks.com/support/bugreports/954239

If this does not fix the problem, please let me know as soon as possible.

Ben

1 Comment

Michael on 21 Jun 2013

Hi Ben, I tried a piece of code that I have been working on. Without the patch, MATLAB ends up consuming 7 GB of RAM (after 4 million iterations). With the patch, MATLAB ends up consuming 1.3 GB. At the beginning of the loop MATLAB was using 0.5 GB. For the simpler pieces of code I have (like the ones I posted in my question) there is no memory growth. Thanks for the patch, it basically solves the problem completely for me.

Michael
