Path: news.mathworks.com!not-for-mail
From: <HIDDEN>
Newsgroups: comp.soft-sys.matlab
Subject: Re: parfor error message
Date: Wed, 11 Nov 2009 07:52:04 +0000 (UTC)
Organization: RMIT
Lines: 112
Message-ID: <hddqf4$or3$1@fred.mathworks.com>
References: <hd0qcm$168$1@fred.mathworks.com> <ytwaayzg4tl.fsf@uk-eellis-deb5-64.mathworks.co.uk> <hd95md$3ta$1@fred.mathworks.com> <ytwbpjbepxb.fsf@uk-eellis-deb5-64.mathworks.co.uk> <hdbohl$95q$1@fred.mathworks.com> <ytwy6mecvlu.fsf@uk-eellis-deb5-64.mathworks.co.uk>
Reply-To: <HIDDEN>
NNTP-Posting-Host: webapp-05-blr.mathworks.com
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 8bit
X-Trace: fred.mathworks.com 1257925924 25443 172.30.248.35 (11 Nov 2009 07:52:04 GMT)
X-Complaints-To: news@mathworks.com
NNTP-Posting-Date: Wed, 11 Nov 2009 07:52:04 +0000 (UTC)
X-Newsreader: MATLAB Central Newsreader 1372013
Xref: news.mathworks.com comp.soft-sys.matlab:584090


Edric M Ellis <eellis@mathworks.com> wrote in message <ytwy6mecvlu.fsf@uk-eellis-deb5-64.mathworks.co.uk>...
> "Mr. CFD" <s2108860@student.rmit.edu.au> writes:
> 
> > b) The most common error which I have noticed and this most definitely indicates
> > an issue within the code itself is:
> >
> > Error in ==> mysimulation>(parfor body factory) at 183 Undefined variable "out"
> > or class "out.x".
> >
> > The line 183, in simple terms is defined as follows: parfor ii=1:n
> > [a,b,c]=dosomething(myoutputs.x) ....  ....  end
> >
> > This error is the most frustrating! Can you please advise how this is fixed?
> > Also this error appears at random, therefore I'm finding it hard to get a
> > fix on this.
> 
> Is the "myoutputs" structure defined inside or outside the parfor loop? 
> 
> I'm struggling to see how the error refers to "out" when the code refers to
> "myoutputs". I'm also somewhat confused as to how that error message could show
> up only sometimes. I'm even more confused why an "undefined variable" message
> could appear sporadically. The only variability that one might expect when
> running a PARFOR loop is in the way the loop iterations are divided among the
> workers. 
> 
> I wonder if perhaps the workers are running out of memory, and that's causing
> weirdness. Is there any facility to track worker memory usage while you're
> running this stuff? (Resource exhaustion just might also explain the hard
> crashes as well as the strange errors that you're seeing).
> 
> Cheers,
> 
> Edric.


Hi Edric,
First of all, many thanks for your feedback. I appreciate your thoughts on this rather annoying error. I have been digging around for more information that could explain why we are getting this error:

Error in ==> mysimulation>(parfor body factory) at 183
Undefined variable "out" or class "out.x".

Line 183, in simplified form, is:
parfor ii=1:n
    [a,b,c]=dosomething(myoutputs.x(ii,:));
    % ...
end

> Is the "myoutputs" structure defined inside or outside the parfor loop?
myoutputs.x is defined outside the parfor loop.
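A side note on how that loop is classified, not a confirmed cause of the error: as I understand the parfor rules for sliced variables, the first level of indexing must be parentheses or braces. In myoutputs.x(ii,:) the first level is a dot, so the whole myoutputs structure is likely treated as a broadcast variable and copied to every worker, which could matter if memory is a concern. The sketch below copies the field into a plain array first so each iteration only receives its own row. The data sizes and the stand-in for dosomething are hypothetical.

```matlab
% Hypothetical stand-in data: the real myoutputs.x is built elsewhere in
% mysimulation, and dosomething is replaced by a trivial calculation.
n = 8;
myoutputs.x = rand(n, 3);

% Copy the field into a plain array before the loop. Indexing x(ii,:)
% with the loop variable lets parfor treat x as a sliced variable
% instead of broadcasting the whole structure to every worker.
x = myoutputs.x;

a = zeros(n, 1);
b = zeros(n, 1);
parfor ii = 1:n
    row = x(ii, :);      % each iteration works on its own row only
    a(ii) = sum(row);    % stand-in for [a,b,c] = dosomething(...)
    b(ii) = max(row);
end
```

If no pool of workers is open, parfor simply runs the iterations serially, so the sketch can be tried on a single machine.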

The error stack (erroutputs.errstack, an 8-by-1 struct array) captured in the catch statement provides some vital information:

erroutputs.errstack(1,1):
file: '/usr/local/matlab/R2008b/toolbox/matlab/lang/parallel_function.m'
name: 'parallel_function'
line: 587

erroutputs.errstack(2,1):
Reports the actual error [Error in ==> mysimulation>(parfor body factory) at 183]

erroutputs.errstack(3,1):
file: '/usr/local/matlab/R2008b/toolbox/distcomp/private/dctEvaluateFunction.m'
name: 'iEvaluateWithNoErrors'
line: 21

erroutputs.errstack(4,1):
file: '/usr/local/matlab/R2008b/toolbox/distcomp/private/dctEvaluateFunction.m'
name: 'dctEvaluateFunction'
line: 7

erroutputs.errstack(5,1):
file: '/usr/local/matlab/R2008b/toolbox/distcomp/private/dctEvaluateTask.m'
name: 'iEvaluateTask'
line: 95

erroutputs.errstack(6,1):
file: '/usr/local/matlab/R2008b/toolbox/distcomp/private/dctEvaluateTask.m'
name: 'dctEvaluateTask'
line: 18

erroutputs.errstack(7,1):
file: '/usr/local/matlab/R2008b/toolbox/distcomp/distcomp_evaluate_filetask.m'
name: 'iDoTask'
line: 106

Interrogating this error further, we can see that line 106 lies within the following code:
=====================================================
try
    % If dctEvaluateTask throws an error then something went wrong in DCT
    % code not user code - and we need to exit the worker, not continue
    [output, errOutput, textOutput] = dctEvaluateTask(job, task, runprop);
    % Package up the output into a structure to pass around easily
    out = struct('output', {output}, 'errOutput', {errOutput}, 'textOutput', {textOutput});
catch e
    handlers.errorFcn(e, 'Unexpected error while evaluating task - MATLAB will now exit.');
end
=====================================================
The comment in this code states: "If dctEvaluateTask throws an error then something went wrong in DCT code not user code - and we need to exit the worker, not continue".
This could further support your initial guess that the problem lies in the cluster itself rather than in my code.

erroutputs.errstack(8,1):
file: '/usr/local/matlab/R2008b/toolbox/distcomp/distcomp_evaluate_filetask.m'
name: 'distcomp_evaluate_filetask'
line: 32
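For anyone wanting to collect the same information, a minimal sketch of dumping the stack from a caught error is below. It relies on the MException object's stack property (a struct array with fields file, name and line). The error identifier 'mysim:demo' and the deliberately thrown error are hypothetical placeholders for the call that actually runs the parfor loop.

```matlab
% Hypothetical: throw an error on purpose so there is an MException to
% inspect; in the real code the try block would wrap the parfor loop.
try
    error('mysim:demo', 'Demonstration error');
catch err
    errstack = err.stack;    % struct array with fields file, name, line
    for k = 1:numel(errstack)
        % Print one entry per stack frame, innermost frame first
        fprintf('errstack(%d): %s (line %d)\n  %s\n', k, ...
            errstack(k).name, errstack(k).line, errstack(k).file);
    end
end
```

When run from the command line rather than from a file, the stack can be empty, in which case nothing is printed.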

That is a lot of information. Like you, I'm confused as to why the error refers to "out" when the code refers to "myoutputs", and why the "undefined variable" message appears at random.

I have been in touch with the administrators running the cluster. The memory used by this failed simulation was within the allocated resources, so they don't think memory is the problem. In any case, I can track worker memory usage for future runs, but I haven't restarted the simulation, since we don't have a fix for this error yet.

I hope this information helps provide some answers.

Thanks