Hello all, I have a question about memory management in MATLAB's gpuArray fft. I have a large NxM matrix (roughly N = 10E3, M = 20E3) and I want to take an FFT along the M dimension, i.e. along each row. For CPU work I would normally permute the matrix so that the FFT acts along the 1st (column) dimension, for speed.
On the GPU, if I run the FFT along the 1st dimension (on the permuted matrix), I hit my GPU's memory ceiling. If I apply it along the row dimension instead, I do not. I assume this comes down to whether MATLAB is doing N asynchronous FFTs in the row direction versus a single massive matrix operation in the column direction.
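For scale, here is a rough count of the device arrays involved at the stated sizes. It assumes double precision and a real input matrix, and it deliberately leaves out cuFFT's own internal workspace, whose size I don't know:

```matlab
% Back-of-the-envelope GPU footprint for N = 10E3 rows, M = 20E3 columns.
% Assumes double precision; cuFFT's internal workspace is NOT included.
N = 10E3;
M = 20E3;
bytesReal    = 8  * N * M;   % real double input x
bytesTransp  = bytesReal;    % xp = x.' is a second full copy while x still exists
bytesComplex = 16 * N * M;   % fft of real double input returns complex double
fprintf('input x      : %.2f GB\n', bytesReal / 1e9);
fprintf('transposed xp: %.2f GB\n', bytesTransp / 1e9);
fprintf('fft output   : %.2f GB\n', bytesComplex / 1e9);
fprintf('total        : %.2f GB\n', (bytesReal + bytesTransp + bytesComplex) / 1e9);
```

That comes to about 6.4 GB before any cuFFT workspace. Without the transposed copy the total is about 4.8 GB, which might account for part of the difference between the two directions.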
So, 4 questions:
- Is my assumption true?
- Are GPU operations still faster in the column direction? (I've sort of answered this myself: I got a 3x speed advantage with the snippet below.)
- Is there a way to know how much GPU memory the FFT will need? If so, I can try chunking the FFT based on the GPU memory available.
- Is there another implementation that gives the speed of the column operation without the memory issues? I am going to try doing this with arrayfun just to see.
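x = gpuArray.rand(10000,10000);   % test matrix created directly on the GPU
xp = x.';                         % non-conjugate transpose: rows become columns (a second full copy)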
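Chunking along rows, as suggested in the third question, could look roughly like the sketch below. chunkedRowFFT is a hypothetical helper name, not a MATLAB function, and the 32 bytes per element it budgets (8 for the real input, 16 for the complex output, 8 of slack for cuFFT workspace) is an assumption, not a documented cuFFT figure. Because an FFT along dimension 2 treats every row independently, splitting the matrix into blocks of whole rows should give the same result as one big call:

```matlab
function Y = chunkedRowFFT(x, maxChunkBytes, useGpu)
% chunkedRowFFT (hypothetical helper, not a MATLAB built-in):
% FFT along dimension 2 (every row) of a host matrix x, processed in blocks
% of whole rows so each block's device working set stays under maxChunkBytes.
% Set useGpu = false to run the same code on the CPU, e.g. for testing.
[N, M] = size(x);
% Assumed cost per element: 8 B real input + 16 B complex output + 8 B slack
% for cuFFT workspace. The slack is a guess, not a documented figure.
bytesPerRow  = 32 * M;
rowsPerChunk = max(1, floor(maxChunkBytes / bytesPerRow));
Y = complex(zeros(N, M));                 % complex output kept in host memory
for r0 = 1:rowsPerChunk:N
    r1 = min(r0 + rowsPerChunk - 1, N);
    block = x(r0:r1, :);
    if useGpu
        block = gpuArray(block);          % copy only this block to the device
    end
    F = fft(block, [], 2);                % FFT of each row in the block
    if useGpu
        F = gather(F);                    % bring the block's result back to the host
    end
    Y(r0:r1, :) = F;
end
end
```

On the GPU the budget could be taken from the device, e.g. `g = gpuDevice; Y = chunkedRowFFT(xHost, 0.5 * g.AvailableMemory, true);`, where xHost is the matrix in host memory and the 0.5 factor is an arbitrary safety margin. Note that the full complex result (about 3.2 GB at the stated sizes) then lives in host RAM instead of on the device.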