Parallel Processing with Multiple GPUs - fftn

Hello, I was wondering if it is possible to use multiple GPUs to perform several FFTs, so that each individual FFT is handed to its own GPU and processed in parallel, i.e. two separate function calls computed simultaneously.
For instance, something like this.
clear all
Nx = 256;
Ny = 256;
Nz = 512;
A = rand(Nx,Ny,Nz)+1i*rand(Nx,Ny,Nz);
A = gpuArray(A);
B = rand(Nx,Ny,Nz)+1i*rand(Nx,Ny,Nz);
B = gpuArray(B);
% GPU 1 will process this
A = fftn(A);
% GPU 2 will process this while GPU 1 is still processing the above
B = fftn(B);
% program waits for GPUs to finish then proceeds
Is this possible in MATLAB?

Answers (1)

Matt J on 15 May 2020
Edited: Matt J on 15 May 2020
You can use gpuDevice() to select different GPUs for different calculations. gpuArray commands run asynchronously, without blocking MATLAB execution on your host CPU, so the tasks below will run essentially in parallel, with GPU 1 having only a slight head start.
% GPU 1 will process this
gpuDevice(1);
A = gpuArray.rand(Nx,Ny,Nz)+1i*gpuArray.rand(Nx,Ny,Nz);
A = fftn(A);
% GPU 2 will process this
gpuDevice(2);
B = gpuArray.rand(Nx,Ny,Nz)+1i*gpuArray.rand(Nx,Ny,Nz);
B = fftn(B);

5 Comments

I've tried this and it doesn't work.
Your workspace will still list matrix 'A', but its data will no longer be there. Once 'gpuDevice()' is called, all gpuArray variables are erased, both on the other GPUs and on the one being selected. You can see this by running your code with just one GPU.
clear all
Nx = 256;
Ny = 256;
Nz = 512;
% GPU 1 will process this
gpuDevice(1);
A = gpuArray.rand(Nx,Ny,Nz)+1i*gpuArray.rand(Nx,Ny,Nz);
A = fftn(A);
% GPU 1 erases 'A' above and processes this
gpuDevice(1);
B = gpuArray.rand(Nx,Ny,Nz)+1i*gpuArray.rand(Nx,Ny,Nz);
B = fftn(B);
% GPU 1 erases 'B' above
gpuDevice(1);
Additionally, the function 'gpuDevice()' takes a long time to execute, so it isn't practical to use in my application.
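A minimal sketch for checking this behavior on your own machine, assuming at least one supported GPU. The array size is arbitrary. It relies on existsOnGPU() to report whether a gpuArray's data is still valid, and on gpuDevice called with no index, which queries the current device rather than selecting one.

```matlab
% Sketch only: assumes Parallel Computing Toolbox and one supported GPU.
gpuDevice(1);                             % select (and reset) GPU 1
A = fftn(rand(64, 64, 64, 'gpuArray'));   % small test array on the GPU

d = gpuDevice;                            % no index: query only, no reset
stillValid = existsOnGPU(A);              % true: A's data is intact

gpuDevice(1);                             % selecting by index resets the device
invalidated = ~existsOnGPU(A);            % true: A's data is gone
```

If both flags come back true, then querying the device is safe and only selecting by index clears the data.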
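Maybe if you do it with parfeval:
p = parpool;
results(1:2) = parallel.FevalFuture;     % preallocate the array of futures
for i = 1:2
    % Pass the sizes explicitly; the worker cannot see the client workspace
    results(i) = parfeval(p, @fcn, 1, i, Nx, Ny, Nz);
end
function A = fcn(i, Nx, Ny, Nz)
    gpuDevice(i);                        % select GPU i on this worker
    A = gpuArray.rand(Nx,Ny,Nz) + 1i*gpuArray.rand(Nx,Ny,Nz);
    A = gather(fftn(A));                 % FFT on the GPU, then copy back to host memory
end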
Before I even attempt this, can we walk through your thought process on why you think this would work?
1.) gpuDevice() takes a considerable amount of time each time it is called. Please try it out. You can do something simple like this.
tic
for i = 1:10
gpuDevice(1)
end
toc
2.) Again, using 'gpuDevice()' clears variables.
Matt J on 15 May 2020
Edited: Matt J on 15 May 2020
1) For me it only takes 0.7 sec per call to gpuDevice(). It doesn't make much sense to be parallelizing if that's your bottleneck. The rest of your tasks would have to be trivially fast.
2) With parfeval, you are executing gpuDevice on different parpool workers, each acting like a separate MATLAB session with its own workspace. I would not expect the workspaces of the two parpool workers to interact at all.
If you use gpuDevice() to select a different device than the one currently selected for that process, then the existing GPU will be reset, losing information.
When you use parfeval() with a pool size that is no larger than the number of GPUs that you have, each worker will run in its own process and will be granted a GPU of its own. If I recall correctly, GPU selection is automatic in this situation.
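The one-worker-per-GPU idea can be sketched as follows, using spmd instead of parfeval so that every worker is guaranteed to run the body once. This is an illustration under assumptions, not a tested recipe: it assumes Parallel Computing Toolbox, at least two supported GPUs, no pool already open, and that each worker is given its own GPU by default, as the comment above suggests. The names X, Y and devIndex are made up for the example.

```matlab
% Sketch only: one pool worker per GPU, no gpuDevice(idx) calls.
Nx = 256; Ny = 256; Nz = 512;
nGPU = gpuDeviceCount;            % GPUs visible to this MATLAB session
pool = parpool(nGPU);             % one worker process per GPU

spmd
    % Each worker keeps whichever GPU it was given, so nothing is reset.
    X = rand(Nx, Ny, Nz, 'gpuArray') + 1i*rand(Nx, Ny, Nz, 'gpuArray');
    Y = gather(fftn(X));          % FFT on this worker's GPU, copy to host
    d = gpuDevice;                % query only
    devIndex = d.Index;           % record which GPU this worker used
end

A = Y{1};                         % result computed by worker 1
B = Y{2};                         % result computed by worker 2
delete(pool);
```

Comparing devIndex{1} and devIndex{2} on the client confirms whether the two workers really used different GPUs. The gather inside spmd adds a device-to-host copy, so for small arrays the transfer may outweigh the gain from running in parallel.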


Release: R2019b

Asked: 15 May 2020

Commented: 16 May 2020
