MATLAB Answers


System call fails from MATLAB worker

Asked by Kevin on 9 Feb 2018
Latest activity: commented on by Kevin on 12 Feb 2018
I am encountering some difficulty with MATLAB workers in R2017a.
Jobs run perfectly unless the MATLAB code attempts to make a system call, in which case the task fails with "Unexpected system error: bang: poll [4] Interrupted system call".
For example, if I submit the following:
>> job = sge_cluster.createJob();
>> job.createTask(@() system(''), 0);
>> job.submit();
The job runs, but the task fails with the following error:
>> disp(job.Tasks(1).Error)
  ParallelException with properties:

     identifier: 'MATLAB:bang:SystemError'
        message: 'Unexpected system error: bang: poll [4] Interrupted system call'
          cause: {}
    remotecause: {[1×1 MException]}
          stack: [1×1 struct]
The cluster and job submission work fine as long as I don't make any system calls. Everything also works fine with an older version of MATLAB (R2014b). The cluster nodes run mostly RHEL 6.9 (some run RHEL 7.4).
EDIT: I should clarify that sge_cluster is a parallel.cluster.Generic that submits jobs to a Sun Grid Engine scheduler. If I run the same job on parcluster('local'), the system call works just fine.
I'm apparently not the only one encountering this problem (see the MATLAB Answers thread 359992-system-call-bizarre-behavior), but it's not clear to me how to apply the answer given there.


1 Answer

Answer by Shashank on 12 Feb 2018

Hi Kevin,
The solution mentioned in the System call bizarre behavior link should work for you.
Sourcing the bash_profile file means executing the following command in the terminal before starting MATLAB:
source ~/.bash_profile
Alternatively, as shown in the example in that thread, you can specify the shell name explicitly when connecting via ssh.
Hope this helps.
-Shashank
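A minimal sketch of the sourcing step for a wrapper that begins with "#!/bin/sh", such as the independentJobWrapper.sh posted below. "source" is a bash builtin; the POSIX spelling is ".", which also works when /bin/sh is bash (as it is on RHEL). The helper name load_profile and its skip-if-unreadable behavior are illustrative assumptions, not part of the linked answer, and the profile itself must only use syntax that /bin/sh understands. Starting the command through a bash login shell (bash -l -c ...) is another way to get ~/.bash_profile read, because bash reads that file when it starts as a login shell.

```shell
#!/bin/sh
# Sketch: source a shell profile from a POSIX sh wrapper before exec'ing MATLAB.
# load_profile is a hypothetical helper name, not part of any MATLAB tooling.

load_profile() {
    # Default to ~/.bash_profile; an explicit path may be given as $1
    profile="${1:-$HOME/.bash_profile}"
    if [ -r "$profile" ]; then
        # '.' is the POSIX equivalent of bash's 'source': it runs the file in
        # the current shell, so variables it exports reach the exec'd process
        . "$profile"
        echo "Sourced $profile"
    else
        # Report on stderr and continue without the profile
        echo "No readable $profile; skipping" >&2
    fi
}
```

In a wrapper, load_profile would be called once, just before the exec line, so that anything the profile exports is inherited by the MATLAB worker.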

Comment by Kevin on 12 Feb 2018

Hi Shashank,
Sourcing ~/.bash_profile didn't change anything for me. I also tried adding the "-nodisplay -nodesktop -noFigureWindows" flags, as mentioned in that answer.
Here is my updated independentJobWrapper.sh:
#!/bin/sh
# This wrapper script is intended to support independent execution.
#
# This script uses the following environment variables set by the submit MATLAB code:
# MDCE_MATLAB_EXE - the MATLAB executable to use
# MDCE_MATLAB_ARGS - the MATLAB args to use
#
# Copyright 2010-2011 The MathWorks, Inc.
echo "Sourcing the bash profile"
source ~/.bash_profile
echo "Executing: ${MDCE_MATLAB_EXE} ${MDCE_MATLAB_ARGS}"
exec "${MDCE_MATLAB_EXE}" ${MDCE_MATLAB_ARGS}
And here is the log output:
Sourcing the bash profile
Executing: /sw/matlab/R2017a/bin/worker -nodisplay -nodesktop -noFigureWindows
< M A T L A B (R) >
Copyright 1984-2017 The MathWorks, Inc.
R2017a (9.2.0.538062) 64-bit (glnxa64)
February 23, 2017
To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.
2018-02-12 10:53:42 | About to evaluate task with DistcompEvaluateFileTask
2018-02-12 10:53:42 | Enter distcomp_evaluate_filetask_core
2018-02-12 10:53:42 | Enter distcomp_evaluate_filetask_core/iSetup
2018-02-12 10:53:42 | This process will exit on any fault.
2018-02-12 10:53:42 | This process will exit when its parent process dies.
2018-02-12 10:53:42 | About to call decode function.
2018-02-12 10:53:42 | In parallel.cluster.generic.independentDecodeFcn
2018-02-12 10:53:44 | Setting the desktop client to a new client with username
2018-02-12 10:53:44 | About to construct the storage object using constructor "makeFileStorageObject" and location "PC{}:UNIX{/matlabjobs}:"
2018-02-12 10:53:47 | About to find job and task using locations "Job2" and "Job2/Task1"
2018-02-12 10:53:49 | Setting the TaskEvaluator to the NullEvaluator
2018-02-12 10:53:49 | Setting number of computational threads to 1.
2018-02-12 10:53:49 | MATLAB Drive Enabled 0
2018-02-12 10:53:49 | Completed pre-execution phase
2018-02-12 10:53:49 | About to pPreJobEvaluate
2018-02-12 10:53:51 | About to pPreTaskEvaluate
2018-02-12 10:53:51 | About to add job dependencies
2018-02-12 10:53:51 | > JobPathHelper.addAdditionalPaths
2018-02-12 10:53:51 | > JobPathHelper.getPathsToAdd
2018-02-12 10:53:51 | < JobPathHelper.getPathsToAdd ~isMATLABDriveEnabledOnWorker
2018-02-12 10:53:51 | Not adding path dependencies as there is no change required to the path.
2018-02-12 10:53:51 | < JobPathHelper.addAdditionalPaths
2018-02-12 10:53:51 | Calling clear('functions'), and closing simulink models
2018-02-12 10:53:52 | About to call jobStartup
2018-02-12 10:53:52 | About to call taskStartup
2018-02-12 10:53:52 | About to get evaluation data
2018-02-12 10:53:52 | Begin task function
2018-02-12 10:53:53 | End task function
2018-02-12 10:53:53 | dctEvaluateFunctionArray calling: @()taskPostFcn(runprop.TaskEvaluator) with args
2018-02-12 10:53:53 | About to call taskFinish
2018-02-12 10:53:53 | dctEvaluateFunctionArray done.
2018-02-12 10:53:53 | dctEvaluateFunctionArray calling: iFinishTask with args
2018-02-12 10:53:53 | About to call pPostTaskEvaluate
2018-02-12 10:53:54 | About to call pPostJobEvaluate
2018-02-12 10:53:54 | dctEvaluateFunctionArray done.
2018-02-12 10:53:54 | dctEvaluateFunctionArray calling: removeDirectory with args
2018-02-12 10:53:54 | dctEvaluateFunctionArray done.
2018-02-12 10:53:54 | dctEvaluateFunctionArray calling: removeDirectory with args
2018-02-12 10:53:54 | dctEvaluateFunctionArray done.
2018-02-12 10:53:54 | dctEvaluateFunctionArray calling: iExitFunction with args
2018-02-12 10:53:54 | About to exit MATLAB normally
2018-02-12 10:53:54 | About to exit with code: 0
Perhaps you could explain what the "poll [4] Interrupted system call" error means, or how this behavior changed in R2017a compared with previous releases? That might help me debug things on my end.
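On the question of what the message means: on Linux, errno 4 is EINTR, whose standard text is "Interrupted system call", so the error suggests that poll() returned early because a signal was caught while it was waiting. One hypothesis worth checking is whether the SGE environment hands the worker a different signal state than an interactive shell does, since signals that are ignored or blocked in the wrapper shell stay ignored or blocked across exec. The sketch below could be pasted into independentJobWrapper.sh before the exec line to log that state. It assumes Linux (/proc/<pid>/status with hex SigIgn and SigBlk fields), and the function names sig_in_mask and list_signals are hypothetical.

```shell
#!/bin/sh
# Hypothetical diagnostic for a Linux wrapper script: report which standard
# signals are ignored or blocked in this shell. Ignored and blocked signals
# are inherited across exec, so the MATLAB worker starts with the same state.

# Succeed if signal number $2 is set in hex mask $1 (as printed in
# /proc/<pid>/status, without a 0x prefix); bit (n-1) stands for signal n
sig_in_mask() {
    [ $(( (0x$1 >> ($2 - 1)) & 1 )) -eq 1 ]
}

# Print the standard signal numbers (1-31) set in a status field,
# e.g. "list_signals SigIgn" prints "SigIgn: none" or "SigIgn: 13 17"
list_signals() {
    # $$ is this shell's PID, i.e. the process that will exec MATLAB
    mask=$(awk -v f="$1:" '$1 == f { print $2 }' "/proc/$$/status")
    found=""
    n=1
    while [ "$n" -le 31 ]; do
        if sig_in_mask "$mask" "$n"; then
            found="$found $n"
        fi
        n=$((n + 1))
    done
    echo "$1:${found:- none}"
}

list_signals SigIgn
list_signals SigBlk
```

Comparing these two lines in the SGE worker log with the output of the same script run from an interactive shell on the same node would show whether the scheduler changes the inherited signal state; on its own it does not identify the cause.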
