Parallel Computing Toolbox™
The size of data transfers among the parallel computing objects is limited by the Java™ Virtual Machine (JVM™) memory allocation. This limit applies to single transfers of data between the client and workers in any job that uses a job manager as its scheduler, and in any parfor-loop. The approximate limit depends on your system architecture:
| System Architecture | Maximum Data Size Per Transfer (approx.) |
|---|---|
| 64-bit | 2.0 GB |
| 32-bit | 600 MB |
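Before you send a large variable to workers or reference it in a parfor-loop, you can check its size in bytes with the built-in whos function. This is an illustrative sketch only: the variable name A and the 600 MB threshold (the approximate 32-bit figure from the table, interpreted here as 600 × 2^20 bytes) are assumptions to adapt to your own data and architecture.

```matlab
% Illustrative variable; replace with the data you plan to transfer.
A = rand(1000);              % 1000-by-1000 double array

% whos returns a struct whose 'bytes' field is the variable's memory size.
info = whos('A');
bytesPerTransfer = info.bytes;

% Assumed threshold: the approximate 32-bit limit, taken as 600 * 2^20 bytes.
limitBytes = 600 * 2^20;

if bytesPerTransfer > limitBytes
    warning('A is %.1f MB, which exceeds the assumed per-transfer limit.', ...
        bytesPerTransfer / 2^20);
else
    fprintf('A is %.1f MB, within the assumed per-transfer limit.\n', ...
        bytesPerTransfer / 2^20);
end
```

Because the limits are approximate, treat a variable close to the threshold as a candidate for splitting into smaller pieces rather than as safely below it.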
By default, a worker on a Windows® operating system is installed as a service running as LocalSystem, so it does not have access to mapped network drives.
Networks are often configured so that services running as LocalSystem cannot access UNC paths or mapped network shares. In this case, you must run the mdce service under a different user with rights to log on as a service. See the section Setting the User in the MATLAB® Distributed Computing Server™ System Administrator's Guide.
If a worker cannot find the task function, it returns the error message
Error using ==> feval
Undefined command/function 'function_name'.
The worker that ran the task did not have access to the function function_name. One solution is to make sure the location of the function's file, function_name.m, is included in the job's PathDependencies property. Another solution is to transfer the function file to the worker by adding function_name.m to the FileDependencies property of the job.
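The following sketch shows both approaches using job manager functions from this toolbox (findResource, createJob, createTask, submit). The task function function_name, its input value 10, and the shared folder path are hypothetical placeholders; only one of the two dependency settings is needed, and function_name.m is assumed to be in the client's current directory.

```matlab
% Locate a job manager (assumes one is reachable from this client).
jm = findResource('scheduler', 'type', 'jobmanager');
job = createJob(jm(1));

% Option 1: send the function file to the workers along with the job.
% function_name.m is a hypothetical file in the client's current directory.
set(job, 'FileDependencies', {'function_name.m'});

% Option 2 (alternative): add a shared folder to the workers' path instead.
% The UNC path is hypothetical; the folder must be readable by the account
% that the workers run as.
% set(job, 'PathDependencies', {'\\fileserver\share\mycode'});

% One task that calls function_name with one input and one output argument.
createTask(job, @function_name, 1, {10});

submit(job);
waitForState(job, 'finished');
results = getAllOutputArguments(job);
```

FileDependencies copies the file with every job, which is convenient for a few small files; PathDependencies avoids the copy but relies on every worker being able to reach the shared location.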
If a worker cannot save or load a file, you might see the error messages
??? Error using ==> save
Unable to write file myfile.mat: permission denied.
??? Error using ==> load
Unable to read file myfile.mat: No such file or directory.
In determining the cause of this error, consider the following questions:
What is the worker's current directory?
Can the worker find the file or directory?
What user is the worker running as?
Does the worker have permission to read or write the file in question?
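One way to answer these questions is to run a small diagnostic function as a task and inspect what it returns. The function checkWorkerFile below is a hypothetical helper written for illustration; it uses only core MATLAB functions (pwd, exist, getenv, fileattrib). It assumes the user name is in the USERNAME environment variable on Windows or USER on most UNIX systems. Note that fileattrib reports the file's attribute flags, which may not reflect every access-control rule on a network share.

```matlab
function report = checkWorkerFile(filename)
% checkWorkerFile  Hypothetical helper: report what a worker can see of a file.
%   report = checkWorkerFile('myfile.mat') returns a struct with the worker's
%   current directory, whether the file or directory is found, the user the
%   worker runs as, and the read/write flags reported by fileattrib.

report.currentDir = pwd;                       % worker's current directory
report.found = exist(filename, 'file') > 0;    % file (2) or directory (7) found?

% User name: USERNAME on Windows, USER on most UNIX systems (assumption).
user = getenv('USERNAME');
if isempty(user)
    user = getenv('USER');
end
report.user = user;

% Read/write flags for the current user, only if the file exists.
report.canRead = false;
report.canWrite = false;
if report.found
    [ok, attrs] = fileattrib(filename);
    if ok
        report.canRead = logical(attrs.UserRead);
        report.canWrite = logical(attrs.UserWrite);
    end
end
end
```

To run it on a worker, you could add checkWorkerFile.m to the job's FileDependencies and create a task such as createTask(job, @checkWorkerFile, 1, {'myfile.mat'}); the returned struct then describes the worker's view of the file rather than the client's.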
A job or task might get stuck in the queued state. To investigate the cause of this problem, look for the scheduler's logs:
Platform LSF® schedulers might send e-mails with error messages.
Windows Compute Cluster Server (CCS), LSF®, PBS Pro®, TORQUE, and mpiexec save output messages in a debug log. See the getDebugLog reference page.
If using a generic scheduler, make sure the submit function redirects error messages to a log file.
Possible causes of the problem are:
The MATLAB® worker failed to start because of licensing errors, because the executable is not on the default path on the worker machine, or because it is not installed in the location where the scheduler expects it to be.
MATLAB could not read/write the job input/output files in the scheduler's data location. The data location may not be accessible to all the worker nodes, or the user that MATLAB runs as does not have permission to read/write the job files.
If using a generic scheduler
The environment variable MDCE_DECODE_FUNCTION was not defined before the MATLAB worker started.
The decode function was not on the worker's path.
If using mpiexec
The passphrase to smpd was incorrect or missing.
The smpd daemon was not running on all the specified machines.
If your job returned no results (i.e., getAllOutputArguments(job) returns an empty cell array), it is probable that the job failed and some of its tasks have their ErrorMessage and ErrorIdentifier properties set.
You can use the following code to identify tasks with error messages:
errmsgs = get(yourjob.Tasks, {'ErrorMessage'});
nonempty = ~cellfun(@isempty, errmsgs);
celldisp(errmsgs(nonempty));
This code displays the nonempty error messages of the tasks found in the job object yourjob.
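If you also want to know which task failed and with which error identifier, you can loop over the tasks instead. A minimal sketch, assuming yourjob is a job object whose tasks have finished running, and using the task properties ID, ErrorMessage, and ErrorIdentifier:

```matlab
% Assumes yourjob is a job object whose tasks have finished running.
tasks = get(yourjob, 'Tasks');
for k = 1:numel(tasks)
    msg = get(tasks(k), 'ErrorMessage');
    if ~isempty(msg)
        % Report the task ID, error identifier, and message for failed tasks.
        fprintf('Task %d (%s): %s\n', get(tasks(k), 'ID'), ...
            get(tasks(k), 'ErrorIdentifier'), msg);
    end
end
```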
If you are using a supported third-party scheduler, you can use the getDebugLog function to read the debug log from the scheduler for a particular job or task.
For example, find the failed job on your LSF scheduler, and read its debug log.
sched = findResource('scheduler', 'type', 'lsf')
failedjob = findJob(sched, 'State', 'failed');
message = getDebugLog(sched, failedjob(1))
Detailed instructions for diagnosing connection problems between the client and job manager can be found in some of the Bug Reports listed on the MathWorks Web site. The following sections can help you identify the general nature of some connection problems.
If you cannot locate your job manager with
findResource('scheduler','type','jobmanager')
the most likely reasons for this failure are:
The client cannot contact the job manager host via multicast. Try to fully specify where to look for the job manager by using the LookupURL property in your call to findResource:
findResource('scheduler','type','jobmanager', ...
    'LookupURL','JobMgrHostName')
The job manager is currently not running.
Firewalls do not allow traffic from the client to the job manager.
The client and the job manager are not running the same version of the software.
The client and the job manager cannot resolve each other's short hostnames.
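To test name resolution from the client, you can ask the Java runtime that MATLAB runs on to resolve a host name. This sketch assumes MATLAB was started with Java enabled (the default); JobMgrHostName is the same placeholder used in the findResource call and must be replaced with your job manager's short host name. Running the equivalent check on the job manager host, with the client's short host name, tests the reverse direction.

```matlab
% Short host name of this client, as reported by Java.
clientName = char(java.net.InetAddress.getLocalHost.getHostName);
fprintf('Client host name: %s\n', clientName);

% Placeholder: replace with the job manager host's short name.
jmHost = 'JobMgrHostName';

try
    % getByName throws an UnknownHostException if the name cannot be resolved.
    addr = java.net.InetAddress.getByName(jmHost);
    fprintf('%s resolves to %s\n', jmHost, char(addr.getHostAddress()));
catch err
    fprintf('Could not resolve %s: %s\n', jmHost, err.message);
end
```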
If findResource displays a warning message that the job manager cannot open a TCP connection to the client computer, the most likely reasons for this are
Firewalls do not allow traffic from the job manager to the client.
The job manager cannot resolve the short hostname of the client computer. Use pctconfig to change the hostname that the job manager will use for contacting the client.
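For example, if the job manager can reach the client only by its fully qualified name, you could set that name at the start of the client session. The name client.example.com is a placeholder, and this sketch assumes that calling pctconfig before any other Parallel Computing Toolbox function is the safest way to make sure the setting is in effect when the job manager contacts the client.

```matlab
% Placeholder: the name the job manager should use to reach this client.
config = pctconfig('hostname', 'client.example.com');

% pctconfig returns the current settings; confirm the host name took effect.
disp(config.hostname);

% Then locate the job manager as before.
jm = findResource('scheduler', 'type', 'jobmanager', ...
    'LookupURL', 'JobMgrHostName');
```

The setting applies to the current MATLAB session, so a startup script is a convenient place for it if you need it every time.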
© 1984-2008 The MathWorks, Inc.