I have been running some very long loops (millions of iterations) where , in each iteration, I call a few CUDA kernels via feval using pre-allocated arrays of fixed size. I noticed that the host memory grows linearly with the number of iterations and in the end matlab crashes. While I was trying to isolate the problem I found out the following: - Using feval to call a CUDA kernel , you have to have all the arguments of the function already cast as gpuArray's, even if you pass scalar variables. This also applies to functions like gpuArray.rand or randn:
n = 1e4;
for i = 1:1e6
out = gpuArray.rand(n,1,'single');
The above code causes the host memory to grow for the duration of the execution (about 100Mb per 250K iterations) If instead of n=1e4; you write n=gpuArray(1e4); the subsequent loop does not cause the memory to grow. I also found out the the above loop executes much faster when n is in the host memory vs. when n is a gpuArray (about 3 times faster).
-Even more puzzling is the following example:
x = gpuArray.rand(1e4,1,'single');
for i = 1:1e6
out = sqrt(x);
The above loop does not cause MATLAB's memory footprint to grow. However, if we change sqrt(x) with sqrt(1./x) then we get the memory blowup again. I am using MATLAB 2013a 64-bit on windows 7 professional. My video card is a gtx 650 2gb. Thanks in advance for any insights.