How do I reset a MATLAB Distributed Computing Engine worker session that appears to be hung?

There are two situations in which an MDCE worker session can hang:
- When I start a worker with the startworker command and the command never returns a prompt, the worker may be hung. (In this case, skip step 1 of the solution.)
- If a task (and therefore its job) appears to be stuck in the running state, or if a task times out, it could be because of a hung worker session.

Accepted Answer

MathWorks Support Team on 27 Jun 2009
The solutions for clearing a hung worker are listed below from safest to most drastic. Try them in this order, testing after each step to see whether the problem is resolved.
1. If a job is stuck in the running state because one of its tasks is stuck running on a hung worker, you can try destroying the job from the client MATLAB session by using the destroy function. Submit another job and see if the worker in question now properly evaluates its tasks.
2. Use the stopworker command on the worker node to end the worker session, then restart it with startworker -clean, which deletes the checkpoint information stored for that worker before it starts.
3. Shut down all MDCE services on the worker node with the command mdce stop. Note that this shuts down every worker and job manager session on the node. Restart the service with mdce start, then restart those worker and job manager sessions.
4. If mdce stop does not return a prompt, then as a last resort delete the worker's checkpoint directories and reboot the node to restart its MDCE sessions.
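Steps 2 and 3 can be sketched as two shell functions run on the worker node. This is a minimal sketch under several assumptions: the worker name `worker1`, job manager name `MyJobManager`, and host `jmhost` are hypothetical placeholders, and the MDCE scripts are assumed to live in `$MATLABROOT/toolbox/distcomp/bin` (the location may differ in your release). Step 1 runs in the client MATLAB session and step 4 involves a reboot, so neither is scripted here.

```shell
#!/bin/sh
# Hypothetical names: replace with the values used on your cluster.
WORKER_NAME="worker1"
JM_NAME="MyJobManager"
JM_HOST="jmhost"

# Assumed location of the MDCE command scripts; adjust for your installation.
MDCE_BIN="${MDCE_BIN:-$MATLABROOT/toolbox/distcomp/bin}"

# Step 2: stop the hung worker session, then start it again with -clean,
# which discards the checkpoint information stored for this worker name.
restart_worker() {
    "$MDCE_BIN/stopworker" -name "$WORKER_NAME"
    "$MDCE_BIN/startworker" -jobmanager "$JM_NAME" \
        -jobmanagerhost "$JM_HOST" -name "$WORKER_NAME" -clean
}

# Step 3: stop every MDCE session on this node (workers and job managers),
# restart the mdce service, and start the worker again.
# If mdce stop fails, stop here: step 4 is the remaining option.
# Any job manager that ran on this node must also be restarted
# (for example with startjobmanager); that is not done here.
restart_mdce() {
    "$MDCE_BIN/mdce" stop || return 1
    "$MDCE_BIN/mdce" start
    "$MDCE_BIN/startworker" -jobmanager "$JM_NAME" \
        -jobmanagerhost "$JM_HOST" -name "$WORKER_NAME" -clean
}
```

To use it, source the file, run `restart_worker`, and submit a test job; move on to `restart_mdce` only if the worker still appears hung. The mdce command may need administrator (root) privileges on the node.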
