How to shut down all running workers of parpools?

How can I find and shut down all workers of all parpools that might currently be running?
During debugging I frequently run into crashes and out-of-memory errors. Often, some worker processes keep running, and I would like to know how best to close all of them before starting another script.

Answers (3)

Raymond Norris on 6 Mar 2023
Hi @Felix. If even a single worker crashes, all workers will terminate. Can you elaborate a bit more on a couple of things?
  1. Are you using a local pool or a cluster? If cluster, MJS or your own scheduler (and if so, which)?
  2. Which parallel constructs are you using (parfor, parfeval, etc.)? Can you give a simple example of what might crash? I'm not interested in the details (I'm sure the worker(s) are crashing); I'm more interested in how you're running the code.
  1 Comment
Edric Ellis on 7 Mar 2023
Note that on "local" and MJS clusters, the parallel pool will not necessarily immediately terminate when a single worker crashes. On those clusters, pools that have not yet used spmd can survive losing workers.



Edric Ellis on 7 Mar 2023
You can shut down all remaining workers of the currently running pool by executing:
delete(gcp('nocreate'))
There should be no running workers other than in the current pool.
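As a minimal sketch of the same idea, the pool handle can be checked before it is deleted, so that the script reports what it is doing. The variable name `pool` and the messages are illustrative only; `gcp('nocreate')` returns an empty array when no pool exists, and `delete` on an empty array does nothing, so the one-liner above is already safe on its own.

```matlab
% Look up the current pool without starting a new one.
pool = gcp('nocreate');

if isempty(pool)
    % Nothing to shut down.
    disp('No parallel pool is running.');
else
    % Report the pool size, then shut down the pool and its workers.
    fprintf('Shutting down pool with %d workers.\n', pool.NumWorkers);
    delete(pool);
end
```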

Felix on 8 Mar 2023
  1. I'm using local pools on my machine with default settings. On my machine this defaults to 12 workers.
  2. So far, I'm using parfor and the run command with MultiStart problems. I'll sometimes start a pool before running a script via parpool to reduce runtime of that script.
A simple, somewhat pseudocode example of my Monte Carlo stuff might be:
relevant_input = randn(1000, 1);
relevant_output = nan(height(relevant_input), 1);
param = 10;
parpool;
% Bind the parameter; the handle takes one input sample at a time.
my_fun = @(x) elaborate_function(param, x);
parfor h = 1:height(relevant_input)
    relevant_output(h,1) = my_fun(relevant_input(h));
end

function y = elaborate_function(param, x)
y = param*x.*sin(x);
end
Another use case is the MultiStart object, which I use with run:
ms = MultiStart('UseParallel', true, 'Display','iter');
My scripts sometimes crash and I have trouble restarting them, because some workers do not seem to clear their memory when they crash. When I try to restart I get warnings such as:
Starting parallel pool (parpool) using the 'Processes' profile ...
Preserving jobs with IDs: 10 12 13 because they contain crash dump files.
You can use 'delete(myCluster.Jobs)' to remove all jobs created with profile Processes. To create 'myCluster' use 'myCluster = parcluster('Processes')'.
However, these crash dump files and the preserved jobs hog way too much memory on my machine. I am looking for a couple of lines of code to put at the start of my scripts that search for running jobs, such as the ones containing crash dump files, and terminate them if they exist, so I don't have to type delete(myCluster.Jobs) myself every time.
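Following the hint in the warning above, a start-of-script cleanup might look like the sketch below. It assumes the 'Processes' profile named in the warning and that none of the jobs stored under that profile need to be kept, since it deletes all of them, not only the ones holding crash dump files. The variable name `c` is illustrative.

```matlab
% Close the current pool first (does nothing if no pool is running),
% so that no live pool is still tied to a job on the cluster.
delete(gcp('nocreate'));

% Cluster object for the profile named in the warning message.
c = parcluster('Processes');

% Remove every job stored for this profile, including the preserved ones.
if ~isempty(c.Jobs)
    fprintf('Deleting %d leftover job(s).\n', numel(c.Jobs));
    delete(c.Jobs);
end
```

Deleting the pool before the jobs is a deliberate ordering choice here: the jobs are only removed once no pool is active.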
  1 Comment
Raymond Norris on 14 Mar 2023
I'm confused about how the crash dump files and preserved jobs hog up too much memory. Do you mean disk space?
If a job is running, I'm not sure there would be a crash dump file (until the end). And do you want to delete the crash file or the job? If you're running a parallel pool and the pool crashes, there's no job to delete.

