# From CPU code to GPU

MartinM on 19 Aug 2021
Answered: Joss Knight on 16 Sep 2021
Hello everybody,
I have a heavy code: a split-step Fourier program that calls several external functions.
The main vector (the propagating one) has 2^16 to 2^18 points. It currently works as a classic CPU program.
My computer has a CUDADevice with properties: Name: 'Quadro RTX 4000', so I am trying to translate my program to run on the GPU.
Before that, I tested the speed of the GPU against the CPU with some code I found here, and the GPU was faster. Perfect.
To translate my program, I made this modification:
A=gpuArray(A);
B=gpuArray(B);
I tried to do this for most of my variables, which is quite long and painful.
I also did it inside the functions I need to use, to make sure that most of the variables are gpuArrays.
BUT it's slower...
I have never used a GPU before, so I guess I am doing something wrong.
Martin

MartinM on 19 Aug 2021
For example:
clc,clear all
close all
%% CPU
tic
num.n=1*2^18;
num.tspan = 2e-09;
num.dt=num.tspan/(num.n-1);
T = zeros(1, num.n);
for k=1:1:num.n
T(k)=(k-1)*num.dt-num.tspan/2;
end
toc
%% GPU
clear all
close all
tic
num.n=gpuArray(1*2^18);
num.tspan =gpuArray(2e-09);
num.dt=gpuArray(num.tspan/(num.n-1));
T = gpuArray(zeros(1, num.n));
for k=1:1:num.n
T(k)=(k-1)*num.dt-num.tspan/2;
end
toc
The result is (CPU first, then GPU):
Elapsed time is 0.027964 seconds.
Elapsed time is 47.296786 seconds.

Joss Knight on 16 Sep 2021
You need to vectorize your code. The GPU is not intended for this kind of loop, which performs a series of operations on scalar variables one element at a time. For instance, use
k = gpuArray(1:num.n);
T=(k-1)*num.dt-num.tspan/2;
to compute every element of T at once.
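As a rough illustration, the example from the question could be rewritten along the following lines. This is only a sketch, assuming Parallel Computing Toolbox and a supported GPU are available; the local names n, tspan and dt simply stand in for the num fields. The wait(gpuDevice) call is an addition, included because GPU operations run asynchronously and tic/toc can otherwise stop the timer before the work has finished.

```matlab
% Sketch: vectorized time grid on CPU and GPU
% (assumes Parallel Computing Toolbox and a supported GPU)
n     = 2^18;
tspan = 2e-09;
dt    = tspan/(n-1);

% CPU: compute every element at once, no loop
tic
T_cpu = (0:n-1)*dt - tspan/2;
toc

% GPU: only the index vector is created as a gpuArray
dev = gpuDevice;
tic
k     = gpuArray(1:n);
T_gpu = (k-1)*dt - tspan/2;   % the result is a gpuArray automatically
wait(dev);                    % wait for the GPU to finish before stopping the timer
toc

% Bring the result back to the CPU only when it is needed there
maxDiff = max(abs(gather(T_gpu) - T_cpu));
```

For a computation this small, transfer and launch overhead may still make the GPU version no faster than the CPU one; the benefit is more likely to show up in the heavier steps of the program, such as repeated FFTs on the full vector.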
You do not need to convert every variable to a gpuArray, just your inputs.
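To make that last point concrete for a split-step Fourier program, here is a hypothetical sketch. It is not the original program: the equation form (a nonlinear Schrödinger equation with the sign convention dA/dz = -i*beta2/2*A_tt + i*gamma*|A|^2*A), all parameter values, and the names beta2, gamma, dz, nz and halfLin are assumptions chosen for illustration. Only the initial field and the precomputed linear operator are sent to the GPU; everything computed from them stays there.

```matlab
% Hypothetical split-step Fourier loop (NOT the original program).
% All parameter values are placeholders chosen for illustration.
n     = 2^16;
tspan = 2e-09;
dt    = tspan/(n-1);
t     = (0:n-1)*dt - tspan/2;
w     = 2*pi*[0:n/2-1, -n/2:-1]/(n*dt);   % angular frequency grid in FFT order

beta2 = -2e-26;   % dispersion, s^2/m (assumed value)
gamma = 1e-3;     % nonlinearity, 1/(W m) (assumed value)
dz    = 1e-3;     % step size, m (assumed value)
nz    = 100;      % number of steps (assumed value)

A0 = exp(-(t/1e-11).^2);                  % initial field, built on the CPU

% Transfer only the large inputs; scalars can stay on the CPU
A       = gpuArray(complex(A0));
halfLin = gpuArray(exp(1i*beta2/2*w.^2*dz/2));   % linear half-step operator

% Looping over propagation steps is fine: each iteration works on whole vectors
for s = 1:nz
    A = ifft(fft(A).*halfLin);            % linear half step
    A = A.*exp(1i*gamma*abs(A).^2*dz);    % nonlinear full step
    A = ifft(fft(A).*halfLin);            % linear half step
end

A_out = gather(A);                        % copy back once, at the end
```

The loop over z remains an ordinary for loop; what matters is that there is no loop over the individual elements of A, and that gather is called once at the end rather than inside the loop.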