The parallel pool that SPMD was using has been shut down.
Hi, I want to use multiple GPUs in parallel with MatConvNet (the code for multi-GPU training already exists), but I get a MATLAB error from the Parallel Computing Toolbox.
The error occurs at the spmd statement: "The parallel pool that SPMD was using has been shut down."
Code from MatConvNet:
function prepareGPUs(opts, cold)
numGpus = numel(opts.gpus) ;
if numGpus > 1
  % check parallel pool integrity as it could have timed out
  pool = gcp('nocreate') ;
  if ~isempty(pool) && pool.NumWorkers ~= numGpus
    delete(pool) ;
  end
  pool = gcp('nocreate') ;
  if isempty(pool)
    parpool('local', numGpus) ;
    cold = true ;
  end
end
if numGpus >= 1 && cold
  fprintf('%s: resetting GPU\n', mfilename)
  clearMex() ;
  if numGpus == 1
    gpuDevice(opts.gpus)
  else
    spmd
      clearMex() ;
      gpuDevice(opts.gpus(labindex))
    end
  end
end
To narrow this down, I tested the Parallel Computing Toolbox on its own and got the same error (see attached image).

An excerpt from the error log file (the complete log file is attached):
job aborted:
rank: node: exit code[: error message]
0: 127.0.0.1: -2
1: 127.0.0.1: 0: process 1 exited without calling finalize
2: 127.0.0.1: -2
3: 127.0.0.1: -2
I am using Ubuntu 14.04 with four TITAN Xp GPUs.
Any ideas what I can do? Thanks for your help! Regards, André
3 Comments
Joss Knight
on 10 Aug 2017
Something running inside the pool is crashing your MATLAB, probably one of MatConvNet's mex functions. Can you run the MatConvNet code on your machine in serial, without a pool? If it crashes you'll get a stack trace which will inform us as to the misbehaving code.
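As a rough illustration of the serial check suggested above, the following sketch selects each GPU in turn from the client session, without opening a pool, and runs a small computation on it. It uses only built-in Parallel Computing Toolbox functions and is not MatConvNet code; the matrix size is an arbitrary choice, and a hard crash in a MEX function cannot be caught by try/catch (it would instead produce the crash dump mentioned above).

```matlab
% Sketch: exercise each GPU from the client session, with no parallel pool.
% Assumes Parallel Computing Toolbox is installed and the GPUs are visible.
numGpus = gpuDeviceCount ;
for i = 1:numGpus
  fprintf('selecting GPU %d of %d\n', i, numGpus) ;
  try
    g = gpuDevice(i) ;                      % selects and resets device i
    x = rand(1000, 'single', 'gpuArray') ;  % arbitrary test size
    y = gather(x * x) ;                     % force a computation and a copy back to the host
    fprintf('  %s: OK\n', g.Name) ;
  catch err
    % ordinary MATLAB errors land here; a hard crash does not
    fprintf('  failed: %s\n', err.message) ;
  end
end
```

If every device passes, the next step would be to run the MatConvNet code itself with opts.gpus set to a single device index, which takes the numGpus == 1 branch of prepareGPUs and so never enters spmd.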
André Peter Kelm
on 10 Oct 2017
Brian Derstine
on 7 Jul 2021
I am getting the following error in R2021a when running trainNetwork with the multi-gpu execution environment. It happens at the very end of training, when training has completed and the final performance stats are about to be generated (ugh):
Error using trainNetwork (line 184)
The parallel pool that SPMD was using has been shut down.
...
Caused by:
Error using nnet.internal.cnn.ParallelTrainer/finalizeNetwork (line 122)
The parallel pool that SPMD was using has been shut down.
I cannot work out why it is doing this. I ran the Cluster Profile Manager validation routine and it passed all tests with 10 workers. I also went into the parallel preferences and unchecked the option "shut down and delete parallel pool after it has been idle for X (default 30) minutes", then restarted MATLAB.
The training uses both system GPUs:
Starting parallel pool (parpool) using the 'local' profile ...
Connected to the parallel pool (number of workers: 2).
and I have done the same in the past but for some reason this time it keeps erroring out.
Any ideas?
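One way to confirm whether the idle-timeout preference mentioned above actually took effect is to inspect the pool object directly. This is only a sketch that reads documented properties of the pool returned by gcp; it does not explain why a pool shuts down, and a pool can also disappear because a worker crashed, in which case gcp('nocreate') returns empty.

```matlab
% Sketch: report the state of the current pool, if there is one.
pool = gcp('nocreate') ;   % returns [] instead of starting a new pool
if isempty(pool)
  disp('No parallel pool is running.') ;
else
  fprintf('workers: %d, connected: %d, idle timeout: %g min\n', ...
          pool.NumWorkers, pool.Connected, pool.IdleTimeout) ;
end
```

Running this before training and again after the error shows whether the pool object was still there and what timeout it was using.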
Accepted Answer
More Answers (0)