Path: news.mathworks.com!not-for-mail
From: <HIDDEN>
Newsgroups: comp.soft-sys.matlab
Subject: Re: parfor error message
Date: Mon, 9 Nov 2009 13:33:01 +0000 (UTC)
Organization: RMIT
Lines: 34
Message-ID: <hd95md$3ta$1@fred.mathworks.com>
References: <hd0qcm$168$1@fred.mathworks.com> <ytwaayzg4tl.fsf@uk-eellis-deb5-64.mathworks.co.uk>
Reply-To: <HIDDEN>
NNTP-Posting-Host: webapp-02-blr.mathworks.com
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 8bit
X-Trace: fred.mathworks.com 1257773581 4010 172.30.248.37 (9 Nov 2009 13:33:01 GMT)
X-Complaints-To: news@mathworks.com
NNTP-Posting-Date: Mon, 9 Nov 2009 13:33:01 +0000 (UTC)
X-Newsreader: MATLAB Central Newsreader 1372013
Xref: news.mathworks.com comp.soft-sys.matlab:583535


Edric M Ellis <eellis@mathworks.com> wrote in message <ytwaayzg4tl.fsf@uk-eellis-deb5-64.mathworks.co.uk>...
> "Mr. CFD" <s2108860@student.rmit.edu.au> writes:
> 
> > I have a simulation running on a cluster using the parfor command. The
> > simulation has previously run successfully with no problems, but today it was
> > terminated mid-way with the following error:
> >
> > "The session that parfor is using has shut down"
> >
> > Upon further inspection, I have traced the parfor statement within the code
> > which the error is referring to. I'm at a complete loss to explain the cause
> > of this error. Especially, since the code has been used successfully in
> > previous occasions, I have no idea how to deal with this problem. What can
> > cause the parfor command to 'shut-down'? Any advice please.
> 
> Are you using an interactive MATLABPOOL (i.e. calling "matlabpool open ..." in
> your desktop MATLAB session)?
> 
> That error message literally means that the pool has been closed unexpectedly -
> the connection to the workers simply disappeared. This could happen if a worker
> crashed for example.
> 
> What sort of cluster are you running on? (Many clusters have various resource
> usage limits after which they terminate jobs - maybe you're hitting one of
> those?)
> 
> Cheers,
> 
> Edric.

Hi Edric,
The job runs on an external supercomputing cluster. The code uses parfor loops, and the job is created with 'createMatlabPoolJob', so the parfor iterations are distributed across 'n' CPUs to accelerate the computation.
I have run this same simulation on many occasions with no problems, so it is strange for this error to appear now, and I find it difficult to explain. Do you have any suggestions as to what could be happening and how it can be avoided, please?
Thanks