How to restart a worker in parpool?

I want to run a parfor loop which looks like this:
parfor i=1:10000
    my_func_output = my_function;
end
The problem with my_function is that it sometimes terminates, but sometimes it keeps running indefinitely. If the function does not terminate within one minute, chances are it won't terminate at all. The function is from third-party software which I cannot edit.
I need to run the function 500 times and I have 10 workers at my disposal. However, when I start the loop, eventually all the workers end up stuck on calls that do not terminate, and I never get my results.
My idea was to run the parfor loop above, i.e. run the function on each of my 10 workers in parallel. If the worker should need more than one minute for the function, it should restart and start running the function again. Or in this case, just execute continue and move onto the next iteration of the parfor loop.
Is there a way to do this in MATLAB?

Accepted Answer

Edric Ellis on 11 Feb 2020
You can't easily do this with parfor, but you can do something like this with parfeval. I haven't tried too hard here to make things efficient, but it might be OK. (To make things more efficient, you would need to be careful about which futures you were checking the State of - that can be slow if you have 10000 of them).
The pattern is this:
  1. Create a "future" using parfeval for each iteration of your function
  2. While waiting for all the futures to complete, check the State of the futures
  3. For the futures in State 'running', check they haven't been running too long
  4. If they have, cancel them (and track the fact that they've been cancelled)
  5. Finally, retrieve the results from the non-cancelled futures.
%% Create parallel pool. My machine has only 6 cores.
parpool('local', 6);
%% Set up 100 iterations of 'randomPause'
% Each pauses for anything from 0 to 20 seconds.
for idx = 100:-1:1
    futures(idx) = parfeval(@randomPause, 1);
end
%% Loop waiting for completion.
% Look for running futures, cancel any that have been running for
% more than 10 seconds.
isDone = false;
isCancelled = false(1, numel(futures));
while ~isDone
    % Look for running futures
    runningIdx = find(strcmp({futures.State}, 'running'));
    for idx = 1:numel(runningIdx)
        % For each running future, work out how long it has been running
        thisFuture = futures(runningIdx(idx));
        tnow = datetime; tnow.TimeZone = 'local';
        runningFor = seconds(tnow - thisFuture.StartDateTime);
        % If running too long, cancel, and remember that we cancelled
        if runningFor > 10
            fprintf('Cancelling: %d\n', runningIdx(idx));
            cancel(thisFuture);
            isCancelled(runningIdx(idx)) = true;
        end
    end
    % Check for overall completion using 'wait'
    isDone = wait(futures, 'finished', 0);
end
%% Extract results from *only* non-cancelled futures.
results = NaN(numel(futures), 1);
results(~isCancelled) = fetchOutputs(futures(~isCancelled));
%% A pause between 0 and 20 seconds, returns the amount paused.
function out = randomPause()
    out = randi([0,20]);
    pause(out);
end
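On the efficiency point mentioned at the start: one possible refinement is to query State only for futures that have not yet completed, and to pause briefly between polls so the client does not busy-wait. The following is only a sketch of that idea, not a tested or tuned implementation. The name runWithTimeout, its arguments, and the 0.5 second polling interval are my own assumptions for illustration. It also assumes the function takes no inputs, returns one scalar numeric output, and does not throw errors (fetchOutputs would rethrow them).

```matlab
function [results, isCancelled] = runWithTimeout(fcn, numTasks, timeoutSeconds)
% Hypothetical helper (not part of any toolbox): run fcn numTasks times
% with parfeval and cancel any call running longer than timeoutSeconds.
% Assumes fcn takes no inputs and returns one scalar numeric output.
for idx = numTasks:-1:1
    futures(idx) = parfeval(fcn, 1);
end
isCancelled = false(numTasks, 1);
pending = 1:numTasks;   % indices of futures not yet completed
while ~isempty(pending)
    % Only query the State of futures that are still outstanding
    states = {futures(pending).State};
    for k = find(strcmp(states, 'running'))
        thisFuture = futures(pending(k));
        tnow = datetime('now', 'TimeZone', 'local');
        if seconds(tnow - thisFuture.StartDateTime) > timeoutSeconds
            cancel(thisFuture);
            isCancelled(pending(k)) = true;
        end
    end
    % Drop completed futures from the watch list. Cancelled futures
    % report 'finished'; 'failed' is also treated as done so that the
    % loop cannot spin forever.
    states = {futures(pending).State};
    pending = pending(~ismember(states, {'finished', 'failed'}));
    pause(0.5);   % assumed polling interval, avoids a busy-wait
end
% Collect results from the non-cancelled futures only; the rest stay NaN
results = NaN(numTasks, 1);
if any(~isCancelled)
    results(~isCancelled) = fetchOutputs(futures(~isCancelled));
end
end
```

For the case in the question this might be called as [results, isCancelled] = runWithTimeout(@my_function, 500, 60); the entries of results for cancelled calls remain NaN, and isCancelled tells you which ones to retry if you want to.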
