Crash/infinite wait in parallel computing (parpool)

Hi,
I have some problems with parallel computing (MATLAB R2015b, 64-bit, Windows). The code is the same in both cases.
  1. On an 8-CPU machine (Win10, 96 GB RAM), extended to 32 virtualized CPUs: if I run the script with more than 8 workers, MATLAB seems lost: it waits indefinitely and I have to stop the run (Ctrl+C). If I run the script with 8 workers or fewer, it seems to work (though it randomly crashes, see the log below), but the RAM is not released after the workers finish. Even after the run ends, I have to reboot the machine to free the RAM.
  2. On a 32-CPU machine (Windows Server 2012 R2, 128 GB RAM), it seems OK but fails randomly. MATLAB doesn't wait forever as in #1; it crashes instead, and the log is below. There is no RAM problem.
Any ideas?
Thanks in advance for any advice.
MainParallel line #257 is the standard call: poolobj = parpool('local',NbCores); % the value of NbCores has no effect on the crash
--- Log ---
Starting parallel pool (parpool) using the 'local' profile ... connected to 4 workers.
Parallel pool using the 'local' profile is shutting down.
Elapsed time is 60.285457 seconds.
Starting parallel pool (parpool) using the 'local' profile ...
Error using parpool (line 103)
Failed to start a parallel pool.
Error in MainParallel (line 257)
Caused by:
    Error using parallel.internal.pool.InteractiveClient>iThrowWithCause (line 667)
    Failed to initialize the interactive session.
    Error using parallel.internal.pool.InteractiveClient>iThrowIfBadParallelJobStatus (line 768)
    The interactive communicating job failed with no message.
parallel:cluster:PoolRunValidation
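
For reference, a more defensive way to start the pool might look like the sketch below. It is not the code in MainParallel and is not a confirmed fix for this error. It only illustrates three precautions: capping the request at the profile's NumWorkers, using a fresh JobStorageLocation, and retrying a failed start. The values of NbCores and maxAttempts, the 5-second pause, and the error identifier are assumptions made for illustration.

```matlab
% Sketch only: NbCores, maxAttempts and the retry policy are assumptions.
NbCores = 8;                 % hypothetical requested worker count
maxAttempts = 3;             % hypothetical retry limit

c = parcluster('local');     % cluster object for the 'local' profile
nWorkers = min(NbCores, c.NumWorkers);  % do not request more than the profile allows

% Use a fresh job storage folder for this session so that job files left
% behind by an earlier run or another MATLAB session are not reused.
storageDir = tempname;
mkdir(storageDir);
c.JobStorageLocation = storageDir;

delete(gcp('nocreate'));     % close any pool that is still open

poolobj = [];
for attempt = 1:maxAttempts
    try
        poolobj = parpool(c, nWorkers);
        break;               % pool started, stop retrying
    catch err
        fprintf('parpool attempt %d failed: %s\n', attempt, err.message);
        delete(gcp('nocreate'));  % clean up a partially started pool, if any
        pause(5);            % short delay before the next attempt
    end
end

if isempty(poolobj)
    error('MainParallel:poolStart', ...
        'Could not start a pool after %d attempts.', maxAttempts);
end
```

If the pool still fails to start with a fresh storage folder, the message printed for each attempt at least records whether the failure is the same every time.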

1 Comment

Does the code run consistently without crashes in a normal for loop?

Answers (1)

As far as I know, we have not encountered a crash in a normal loop (on part of the full data set). The calculation takes a long time (a full run takes more than 12 hours on 20 workers), so there have not been many trials of the full job in a single process.
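
One way to repeat the serial check on a small subset without rewriting the loop is to give parfor a maximum worker count of 0. To my understanding of the parfor documentation, this makes the loop body run in the client session instead of on workers. The sketch below is a stand-in, not the real computation: nItems and the loop body are hypothetical placeholders.

```matlab
% Sketch only: nItems and the loop body are hypothetical placeholders.
nItems = 10;                 % small subset of the full data set
results = zeros(1, nItems);
maxWorkers = 0;              % 0 workers: intended to run the loop body in the client

parfor (k = 1:nItems, maxWorkers)
    results(k) = k^2;        % placeholder for the real per-item computation
end

disp(results);
```

Changing maxWorkers back to the desired worker count runs the same loop on the pool, which makes it easier to compare serial and parallel behaviour on identical code.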

Asked:

on 20 Jul 2016

Answered:

on 20 Jul 2016
