The parallel pool that SPMD was using has been shut down.

Hi, I want to use parallel GPU computing for MatConvNet (code for multiple GPUs already exists), but I get a MATLAB error from the Parallel Computing Toolbox.
The error occurs at the spmd statement: "The parallel pool that SPMD was using has been shut down."
Code from MatConvNet:
function prepareGPUs(opts, cold)
  numGpus = numel(opts.gpus) ;
  if numGpus > 1
    % check parallel pool integrity as it could have timed out
    pool = gcp('nocreate') ;
    if ~isempty(pool) && pool.NumWorkers ~= numGpus
      delete(pool) ;
    end
    pool = gcp('nocreate') ;
    if isempty(pool)
      parpool('local', numGpus) ;
      cold = true ;
    end
  end
  if numGpus >= 1 && cold
    fprintf('%s: resetting GPU\n', mfilename)
    clearMex() ;
    if numGpus == 1
      gpuDevice(opts.gpus)
    else
      spmd
        clearMex() ;
        gpuDevice(opts.gpus(labindex))
      end
    end
  end
Because of this, I tried testing the Parallel Computing Toolbox on its own and got the same error. See the attached image...
An excerpt from the error log (the complete log file is attached):
job aborted:
rank: node: exit code[: error message]
0: 127.0.0.1: -2
1: 127.0.0.1: 0: process 1 exited without calling finalize
2: 127.0.0.1: -2
3: 127.0.0.1: -2
I am using Ubuntu 14.04 and four TITAN Xp GPUs.
Any ideas, what I can do? Thanks for help! Regards, André

3 Comments

Something running inside the pool is crashing your MATLAB, probably one of MatConvNet's mex functions. Can you run the MatConvNet code on your machine in serial, without a pool? If it crashes you'll get a stack trace which will inform us as to the misbehaving code.
Thanks for your answer! The code works in serial and does not crash. We upgraded to MATLAB 2017b and can now use 2 GPUs, but not 4. We have opened a MATLAB support request...
I am getting the following error in 2021a after running trainNetwork with multi-gpu execution environment at the very end of training when it has completed and is about to generate final performance stats (ugh):
Error using trainNetwork (line 184)
The parallel pool that SPMD was using has been shut down.
...
Caused by:
Error using nnet.internal.cnn.ParallelTrainer/finalizeNetwork (line 122)
The parallel pool that SPMD was using has been shut down.
I cannot work out why it is doing this. I ran the Cluster Profile Manager validation routine and it passed all tests with 10 workers. I also went into the parallel preferences and unchecked the option "shut down and delete parallel pool after it has been idle for X (default 30) minutes", then restarted MATLAB.
The training uses both system GPUs:
Starting parallel pool (parpool) using the 'local' profile ...
Connected to the parallel pool (number of workers: 2).
and I have done the same in the past but for some reason this time it keeps erroring out.
Any ideas?
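To confirm whether the pool is still alive and which idle timeout it is actually using, the pool object can be inspected directly. The following is only a minimal sketch, assuming Parallel Computing Toolbox is installed; it reads the pool's properties and does not change any setting.

```matlab
% Sketch: inspect the current parallel pool without creating a new one.
pool = gcp('nocreate');
if isempty(pool)
    disp('No parallel pool is currently open.');
else
    % NumWorkers, Connected and IdleTimeout are properties of the pool object.
    fprintf('Workers: %d, connected: %d, idle timeout: %g min\n', ...
        pool.NumWorkers, pool.Connected, pool.IdleTimeout);
end
```

Running this just before and just after the failing step would show whether the pool disappeared or merely lost its connection.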


 Accepted Answer

Hello, André. I ran into the same problem yesterday. The cause was that a function in my MATLAB environment had the same name as a MATLAB built-in function (MATLAB warned that function xxx has the same name as a built-in), so the parallel tools could not be used. I changed the search path so that only one function with that name is visible to MATLAB, and it finally worked. I hope this helps.

1 Comment

Thanks for this information. Now it is running quite nicely.
Details: We are using MatConvNet and there are functions called parallel! It was a name conflict in combination with the search path.
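A name conflict of this kind can be spotted with `which ... -all`, which lists every definition of a name on the search path in order of precedence. The sketch below is an illustration only: the list of names to check is an assumption, chosen from the functions mentioned in this thread, and should be adapted to whatever your own code calls.

```matlab
% Sketch: list every definition of each name on the MATLAB search path.
% The names below are examples taken from this thread, not a complete list.
names = {'parpool', 'gcp', 'labindex', 'parallel'};
for k = 1:numel(names)
    hits = which(names{k}, '-all');   % cell array; the first entry takes precedence
    fprintf('%s: %d definition(s)\n', names{k}, numel(hits));
    for j = 1:numel(hits)
        fprintf('    %s\n', hits{j});
    end
end
```

If a name shows more than one entry and the first one lies outside the MATLAB installation folder, that file is shadowing the toolbox version; removing its folder from the path with `rmpath`, or renaming the file, resolves the conflict.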
