## Unexpected speed decrease of 2D Fourier Transform on GPU when iFFTed

### Tutu (view profile)

on 3 Jun 2019
Latest activity Answered by Joss Knight

on 8 Jun 2019

### Joss Knight (view profile)

I am applying a first FFT2 on a stack of images, croping a part of it, and iFFT2 this part:
For example on GPU: FFT2(1920*1240*30 (single) ) -> crop to 320*207*30 (single) -> iFFT2(320*207*30 (single) )
1920/6=320
1240/6=207
Here you may observe the time of execution, normalized to the number of single data processed, for each function: Note that the yellow line (FFT2+crop1/6+iFFT2) is more than an order of magnitude slower than the purple line which has 36 more data to process with iFFT2 !
Any idea on what is happening here?
Here is the script I have used:
clear
n=10;
cx=1920;
cy=1240;
FPT=2:5:50;
fpt=size(FPT,2);
b=zeros(1,fpt);
for kk=1:8
for ii=1:fpt
ii
I=gpuArray(single(rand(cy,cx,FPT(1,ii))));
Ia=gpuArray(single(rand(round(cy/6),round(cx/6),FPT(1,ii))+1i.*rand(round(cy/6),round(cx/6),FPT(1,ii))));
=(ones(size(round(cy/2)-round(cy/12):round(cy/2)+round(cy/12),2),size(round(cx/2)-round(cx/12):round(cx/2)+round(cx/12),2)));
tic
for jj=1:n
switch kk
case 1
tic
B=fft2(I);
case 2
tic
B=fft2(I);
C=B(((cy/2)-round(cy/12)):((cy/2)+round(cy/12)),...
((cx/2)-round(cx/12)):((cx/2)+round(cx/12)),:);
case 3
tic
B=fft2(I);
C=B(((cy/2)-round(cy/12)):((cy/2)+round(cy/12)),...
((cx/2)-round(cx/12)):((cx/2)+round(cx/12)),:);
D=ifft2(C);
case 4
tic
B=fft2(I);
C=ifft2(B);
case 5
tic
B=fft2(I);
D=ifft2(C);
case 6
tic
B=fft2(I);
D=ifft2(C);
E1=imresize(abs(D),1/6);
E2=imresize(angle(D),1/6);
case 7
tic
C=fft2(I);
B=ifft2(Ia);
case 8
tic
B=ifft2(Ia);
end
end
b(1,ii)=toc/n; % b is the time of execution normalized to
%the amount of data and the number of time a case has been evaluated
end
hold on
plot(b)
clear A B C D I E1 E2
end
b is the variable plotted in the above graphic.
My graphic card is the GeForce RTX 2080 Ti.
Any help will be appreciated.
Thanks,
Tual

### Joss Knight (view profile)

on 8 Jun 2019

I modified your code inserting wait(gpuDevice) before each tic and toc and got a much more sensible graph: The GPU runs asynchronously so tic and toc often don't work very well. See the documentation here.

### Bruno Luong (view profile)

on 3 Jun 2019

If you want a fast FFT, make your data length power of 2, or product of small integers.
166 is bad since the prime factor is 2 * 83..

Tutu

### Tutu (view profile)

3 Jun 2019
Thank you for the answer, I knew this though. But, besides, on GPU this doesn't make a big difference if you execute an important amount of data: whether or not you use data with 2^n dimension size.
Tutu

### Tutu (view profile)

on 3 Jun 2019
Bruno Luong see, Note that the "y" is in hertz