I have been running some very long loops (millions of iterations) where, in each iteration, I call a few CUDA kernels via feval using pre-allocated arrays of fixed size. I noticed that host memory grows linearly with the number of iterations, and in the end MATLAB crashes. While trying to isolate the problem, I found the following:

- When using feval to call a CUDA kernel, all the arguments of the function have to be cast as gpuArrays already, even scalar variables; otherwise host memory grows. This also applies to functions like gpuArray.rand or randn:
n = 1e4;
for i = 1:1e6
    out = gpuArray.rand(n,1,'single');
end
The above code causes host memory to grow for the duration of the execution (about 100 MB per 250,000 iterations). If you write n = gpuArray(1e4); instead of n = 1e4; the subsequent loop does not cause the memory to grow. I also found that the above loop executes much faster (about 3 times) when n is in host memory than when n is a gpuArray.
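A minimal sketch for reproducing this comparison, assuming a Windows machine (the memory function is Windows-only) and a supported GPU. The iteration count and the variable names nIter, sizes, and labels are chosen for illustration and are not part of the original test; the figures it prints will depend on the machine.

```matlab
% Sketch: compare host-memory growth and run time when the size argument
% is a host scalar versus a gpuArray scalar.
% Assumption: Windows only, because `memory` is not available elsewhere.
nIter  = 2.5e5;                     % illustrative iteration count
sizes  = {1e4, gpuArray(1e4)};      % host scalar, then gpuArray scalar
labels = {'host n', 'gpuArray n'};

for k = 1:numel(sizes)
    n = sizes{k};
    before = memory;                % snapshot of MATLAB's host memory use
    tic;
    for i = 1:nIter
        out = gpuArray.rand(n, 1, 'single');
    end
    wait(gpuDevice);                % let queued GPU work finish before timing
    elapsed = toc;
    after = memory;

    % Difference in memory used by the MATLAB process, in MB
    growthMB = (after.MemUsedMATLAB - before.MemUsedMATLAB) / 2^20;
    fprintf('%-12s: %8.1f MB growth, %6.1f s\n', labels{k}, growthMB, elapsed);
end
```

Running both variants in the same session keeps the comparison simple, but memory from the first variant may not be released before the second starts, so restarting MATLAB between runs should give cleaner numbers.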
- Even more puzzling is the following example:
x = gpuArray.rand(1e4,1,'single');
for i = 1:1e6
    out = sqrt(x);
end
The above loop does not cause MATLAB's memory footprint to grow. However, if we replace sqrt(x) with sqrt(1./x), we get the memory blowup again.

I am using MATLAB R2013a 64-bit on Windows 7 Professional. My video card is a GTX 650 with 2 GB of memory.

Thanks in advance for any insights.
Hi Michael, could you read the following bug report and try the workaround it contains (being careful about the backing-up step!):
If this does not fix the problem, please let me know as soon as possible.