## GPU memory overhead dependent on fft dimension.

### D. Plotnick

on 2 Jul 2018
Latest activity Answered by Joss Knight

on 2 Jul 2018


Hello all, I have a question regarding memory management during MATLAB's gpuArray/fft operation. I have a large NxM matrix (approximately N = 10E3, M = 20E3) where I wish to take an FFT along the M dimension. For CPU operations I would normally permute the matrix so that the FFT acts along the 1st (column) dimension, for speed.
On the GPU, if I run the FFT along the 1st dimension, I slam into the memory ceiling of my GPU. However, if I apply it along the row dimension I do not. I assume this has to do with whether MATLAB is doing N asynchronous FFTs in the row direction vs. a single massive matrix operation in the column direction.
So, four questions:
• Is my assumption true?
• Are GPU operations still faster in the column direction? (I sort of answered this myself: I got a 3x speed advantage with the snippet below.)
• Is there a way to know how much GPU memory the FFT will need? If so, I can try chunking up the FFT based on the GPU memory available.
• Is there another implementation that has the speed of the column operation without the memory issues? I am going to try doing this with arrayfun just to see.
Code snippet:
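x = gpuArray.rand(10000,10000);   % 10000-by-10000 test matrix on the GPU
xp = x.';                         % transposed copy, so the same FFT runs along rows
gputimeit(@() fft(x,[],1))        % FFT along dimension 1 (columns)
gputimeit(@() fft(xp,[],2))       % FFT along dimension 2 (rows)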
Thanks all.

D. Plotnick

### D. Plotnick

on 2 Jul 2018
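As I suspected, arrayfun (at least the way I am using it) is much slower.
g = gpuDevice;                       % handle to the current GPU, needed by wait(g)
f = @(i) fft(x(:,i),[],1);           % FFT of a single column
tic
y = arrayfun(f,1:size(x,2),'UniformOutput',false);   % one fft call per column
wait(g);                             % block until the GPU has finished
y = cat(2,y{:});                     % reassemble the columns into one matrix
toc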
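A middle ground between one large call and one call per column is the chunking idea from the question: run the FFT along dimension 1 on blocks of columns. The following is only a sketch. The matrix size is a smaller stand-in, and the factor of 4 used to budget memory per block (input, output, FFT workspace, slack) is a guess for illustration, not a documented figure for gpuArray/fft. The variable names `bytesPerColumn`, `budget`, and `colsPerChunk` are hypothetical.

```matlab
% Sketch only: FFT along dimension 1, processed in blocks of columns.
x = gpuArray.rand(4096, 4096);            % stand-in test matrix on the GPU
g = gpuDevice;                            % handle to the current GPU

y = complex(zeros(size(x), 'like', x));   % preallocate complex output on the GPU

% ASSUMPTION: allow roughly 4 complex copies of each block and use at most
% half of the memory that is free after preallocating y.
bytesPerColumn = size(x, 1) * 16;         % complex double = 16 bytes per element
budget = 0.5 * g.AvailableMemory;         % bytes we allow the blocks to use
colsPerChunk = max(1, floor(budget / (4 * bytesPerColumn)));

for c0 = 1:colsPerChunk:size(x, 2)
    cols = c0:min(c0 + colsPerChunk - 1, size(x, 2));   % columns in this block
    y(:, cols) = fft(x(:, cols), [], 1);                % column-wise FFT of the block
end
wait(g);                                  % block until the GPU has finished
```

If the multiplier is too optimistic for a given card, lowering `colsPerChunk` by hand trades some speed for a smaller peak footprint.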

on 2 Jul 2018