I have some MATLAB code that consists of 5 CUDA kernels followed by further processing with MATLAB functions (FFT, etc.). Kernels 3 and 4 are executed on the order of 10 times inside a MATLAB for loop (the algorithm is inherently sequential), and the CUDA kernels produce a 100 MB gpuArray.

On a GTX 560 Ti with 1 GB of memory, I was getting out-of-memory errors after kernel execution, despite clearing every gpuArray except the one needed for further processing. The "solution" was to also clear the parallel.gpu.CUDAKernel variables: this freed hundreds of MB on the GPU and allowed processing to continue on the GTX 560 Ti. I now have to re-create the CUDAKernel objects on each iteration, but that doesn't seem to take much time.
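For reference, the clear-and-recreate workaround looks roughly like this; the kernel file names (`kernel3.ptx`, etc.) and argument lists are hypothetical placeholders for my actual ones:

```matlab
% Create the kernel objects for the two kernels run inside the loop.
k3 = parallel.gpu.CUDAKernel('kernel3.ptx', 'kernel3.cu');
k4 = parallel.gpu.CUDAKernel('kernel4.ptx', 'kernel4.cu');

for iter = 1:10
    % Launch the kernels on gpuArray data (argument lists simplified).
    out = feval(k3, out);
    out = feval(k4, out);

    % Clearing the CUDAKernel variables frees the GPU memory they hold;
    % without this, hundreds of MB stayed allocated on the 1 GB card.
    clear k3 k4;

    % Re-create the kernels for the next iteration (cheap in practice).
    if iter < 10
        k3 = parallel.gpu.CUDAKernel('kernel3.ptx', 'kernel3.cu');
        k4 = parallel.gpu.CUDAKernel('kernel4.ptx', 'kernel4.cu');
    end
end
```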
Is there any other way to release GPU memory associated with parallel.gpu.CUDAKernel objects?
P.S. The problem has already been decomposed into smaller pieces to limit memory consumption: roughly 1 GB of raw data is passed through the GPU in 100 MB chunks.

P.P.S. The code ran fine on a GTX 660M with 2 GB of memory.