Edric M Ellis <eellis@mathworks.com> wrote in message <ytwy6mecvlu.fsf@uk-eellis-deb5-64.mathworks.co.uk>...
> "Mr. CFD" <s2108860@student.rmit.edu.au> writes:
>
> > b) The most common error which I have noticed and this most definitely indicates
> > an issue within the code itself is:
> >
> > Error in ==> mysimulation>(parfor body factory) at 183 Undefined variable "out"
> > or class "out.x".
> >
> > The line 183, in simple terms is defined as follows:
> > parfor ii=1:n
> >     [a,b,c]=dosomething(myoutputs.x) .... ....
> > end
> >
> > This error is the most frustrating! Can you please advise how this is fixed?
> > Also this error appears at random, therefore I’m finding it hard to get a
> > fix on this.
>
> Is the "myoutputs" structure defined inside or outside the parfor loop?
>
> I'm struggling to see how the error refers to "out" when the code refers to
> "myoutputs". I'm also somewhat confused as to how that error message could show
> up only sometimes. I'm even more confused why an "undefined variable" message
> could appear sporadically. The only variability that one might expect when
> running a PARFOR loop is in the way the loop iterations are divided among the
> workers.
>
> I wonder if perhaps the workers are running out of memory, and that's causing
> weirdness. Is there any facility to track worker memory usage while you're
> running this stuff? (Resource exhaustion just might also explain the hard
> crashes as well as the strange errors that you're seeing).
>
> Cheers,
>
> Edric.
Hi Edric,
First of all, many thanks for your feedback. I appreciate your thoughts on this rather annoying error. I have been digging around for more information that could explain why we are seeing this issue:
Error in ==> mysimulation>(parfor body factory) at 183
Undefined variable "out" or class "out.x".
Line 183, in simple terms, is defined as follows:
parfor ii=1:n
    [a,b,c]=dosomething(myoutputs.x(ii,:));
    % ... (rest of the loop body omitted)
end
> Is the "myoutputs" structure defined inside or outside the parfor loop?
myoutputs.x is defined outside the parfor loop.
The errstack (an 8-by-1 struct array) from the catch statement provides some useful information:
erroutputs.errstack(1,1):
file: '/usr/local/matlab/R2008b/toolbox/matlab/lang/parallel_function.m'
name: 'parallel_function'
line: 587
erroutputs.errstack(2,1):
Reports the actual error [Error in ==> mysimulation>(parfor body factory) at 183]
erroutputs.errstack(3,1):
file: '/usr/local/matlab/R2008b/toolbox/distcomp/private/dctEvaluateFunction.m'
name: 'iEvaluateWithNoErrors'
line: 21
erroutputs.errstack(4,1):
file: '/usr/local/matlab/R2008b/toolbox/distcomp/private/dctEvaluateFunction.m'
name: 'dctEvaluateFunction'
line: 7
erroutputs.errstack(5,1):
file: '/usr/local/matlab/R2008b/toolbox/distcomp/private/dctEvaluateTask.m'
name: 'iEvaluateTask'
line: 95
erroutputs.errstack(6,1):
file: '/usr/local/matlab/R2008b/toolbox/distcomp/private/dctEvaluateTask.m'
name: 'dctEvaluateTask'
line: 18
erroutputs.errstack(7,1):
file: '/usr/local/matlab/R2008b/toolbox/distcomp/distcomp_evaluate_filetask.m'
name: 'iDoTask'
line: 106
Looking further into this frame, line 106 of distcomp_evaluate_filetask.m falls within the following block:
=====================================================
try
% If dctEvaluateTask throws an error then something went wrong in DCT
% code not user code - and we need to exit the worker, not continue
[output, errOutput, textOutput] = dctEvaluateTask(job, task, runprop);
% Package up the output into a structure to pass around easily
out = struct('output', {output}, 'errOutput', {errOutput}, 'textOutput', {textOutput});
catch e
handlers.errorFcn(e, 'Unexpected error while evaluating task - MATLAB will now exit.');
end
=====================================================
Here we see the comment "If dctEvaluateTask throws an error then something went wrong in DCT code not user code - and we need to exit the worker, not continue".
This could further support your initial guess: perhaps the problem lies in the cluster itself rather than in my code.
erroutputs.errstack(8,1):
file: '/usr/local/matlab/R2008b/toolbox/distcomp/distcomp_evaluate_filetask.m'
name: 'distcomp_evaluate_filetask'
line: 32
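In case it helps anyone reproduce a listing like the one above: assuming errstack is filled from the stack field of the MException caught in the catch block (each frame is a struct with fields file, name and line), a minimal sketch of dumping it looks like this. The identifier 'demo:undefined' and the message are placeholders, not taken from the real run; in practice the try block would wrap the parfor loop.

```matlab
% Sketch: format each frame of a caught error's stack.
% 'demo:undefined' is a placeholder identifier; in the real simulation
% the try block would wrap the parfor loop instead of this error() call.
try
    error('demo:undefined', 'Undefined variable "out".');
catch err
    frames = cell(numel(err.stack), 1);
    for k = 1:numel(err.stack)
        s = err.stack(k);   % each frame has fields file, name and line
        frames{k} = sprintf('errstack(%d,1): %s in %s at line %d', ...
                            k, s.name, s.file, s.line);
    end
    fprintf('%s\n', frames{:});
end
```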
That is a lot of information. Like you, I'm confused as to why the error refers to "out" when the code refers to "myoutputs", and why the "undefined variable" message appears at random.
I have been in touch with the administrators running the cluster. The memory used by this failed simulation was within the allocated resources, so they don't think memory is the problem. Even so, I can track worker memory usage for future runs, but I haven't restarted the simulation because we don't yet have a fix for this error.
I hope this information helps provide some answers.
Thanks