These connection errors may occur when the machine running the job manager does not have a resolvable hostname. This most often happens when the MATLAB Parallel Server machines are not part of the domain and therefore do not have a fully qualified domain name, so the Distributed Computing Toolbox cannot resolve the cluster. All machines (most importantly, the machine that hosts the job manager) must be domain members and have resolvable names recorded in the network's DNS.
The following basic troubleshooting steps can help determine the cause of this connection problem:
1. Verify that you are using the correct command to find the jobmanager.
a. Find any job manager available:
jm = findResource('jobmanager')
b. Find a specific named job manager:
jm = findResource('jobmanager', 'name', 'MyJobManager')
c. Use a UNICAST call to the job manager machine:
jm = findResource('jobmanager', 'LookupURL', 'hostname or IPaddress')
- If there is a connection error between the client and the job manager machine, you will likely see a "handle 0 by 1" message returned, meaning that no job manager was found. If this occurs, check the job manager name and hostname.
2. Check basic networking connections between the client and job manager node.
- From the client machine, verify that you can ping, traceroute, and nslookup the job manager node, and vice versa. The client machine must be able to resolve the name of the job manager machine via DNS or some other lookup service. If any of these checks fail, there is a networking problem that must be resolved by your IT staff.
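The name-resolution part of this check can also be scripted. The following is a minimal sketch in Python (chosen only because the check is OS-level, not MATLAB-specific); the function name `check_name_resolution` and the placeholder host `localhost` are illustrative assumptions, not part of MATLAB Parallel Server. It performs a forward lookup and then a reverse lookup, roughly what nslookup reports, and records any error instead of raising it.

```python
import socket


def check_name_resolution(hostname):
    """Forward- and reverse-resolve a hostname and return a small report."""
    report = {
        "hostname": hostname,
        "address": None,       # result of the forward lookup (name -> IPv4)
        "reverse_name": None,  # result of the reverse lookup (IPv4 -> name)
        "error": None,         # text of the first lookup error, if any
    }
    try:
        # Forward lookup: fails if the name is not known to DNS or the hosts file.
        report["address"] = socket.gethostbyname(hostname)
        # Reverse lookup: the primary name registered for that address.
        report["reverse_name"] = socket.gethostbyaddr(report["address"])[0]
    except OSError as exc:
        # socket.gaierror and socket.herror are both subclasses of OSError.
        report["error"] = str(exc)
    return report


if __name__ == "__main__":
    # "localhost" is a stand-in; substitute the job manager's hostname.
    print(check_name_resolution("localhost"))
```

Run it on the client against the job manager's hostname, and on the job manager against the client's hostname. A populated `error` field, or a `reverse_name` that differs from the name you expected, suggests the DNS records need attention.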
3. Check the hostnames of the job manager and client machines using a MATLAB session. If any of these tests fail, ask your system administrator to resolve the network issue.
- On both the client and the job manager machine, check what the following MATLAB command returns:
This should be the short name, but it might not be (some PCs think their name is different from the DNS name). For example:
4. Check to make sure that a firewall program is not blocking the ports MATLAB Parallel Server uses. By default, MATLAB Parallel Server uses ports 27350 through 27350 + N (where N is the number of worker nodes in your cluster). If a firewall is blocking these ports, refer to the attached solution below.
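As a rough illustration of the port range described above, the sketch below (Python, with hypothetical helper names `parallel_server_ports` and `port_is_reachable`) lists ports 27350 through 27350 + N and attempts a plain TCP connection to each one. A failed connection does not prove that a firewall is responsible, since it may simply mean that nothing is listening on that port, so treat the result as a hint rather than a diagnosis.

```python
import socket

BASE_PORT = 27350  # default base port stated in the text above


def parallel_server_ports(num_workers, base_port=BASE_PORT):
    """Return ports base_port through base_port + num_workers, inclusive."""
    return list(range(base_port, base_port + num_workers + 1))


def port_is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    host = "localhost"  # stand-in; substitute the job manager's hostname
    for port in parallel_server_ports(num_workers=4):
        state = "reachable" if port_is_reachable(host, port) else "not reachable"
        print(f"{host}:{port} is {state}")
```

For a cluster with 4 worker nodes, this checks ports 27350 through 27354.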
Also, check for other firewalled sockets on the jobmanager.
- Run the following command as root:
At this point, if you need further assistance, send the following information to
- Operating System information
- A brief explanation of the issue and how to reproduce it.
- Log files from the jobmanager machine. By default these are located in /var/log/mdce on UNIX/MAC machines or in $TEMP/MDCE/Log on Windows machines.
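To gather the log files, a small helper such as the following may be convenient. It is only a sketch in Python: the function names are hypothetical, and the paths are the defaults quoted above, which may differ if the installation was configured with another log directory or if a newer release uses different names.

```python
import os
import sys
from pathlib import Path


def default_log_dir(platform=sys.platform, env=os.environ):
    """Return the default log directory quoted in the text for this platform."""
    if platform.startswith("win"):
        # Windows default from the text: $TEMP/MDCE/Log
        return Path(env.get("TEMP", "")) / "MDCE" / "Log"
    # UNIX and Mac default from the text: /var/log/mdce
    return Path("/var/log/mdce")


def list_log_files(log_dir):
    """Return the sorted paths of regular files in log_dir ([] if it is missing)."""
    directory = Path(log_dir)
    if not directory.is_dir():
        return []
    return sorted(str(entry) for entry in directory.iterdir() if entry.is_file())


if __name__ == "__main__":
    for name in list_log_files(default_log_dir()):
        print(name)
```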
NOTE: Starting in R2019a the following name changes occurred:
- MATLAB Distributed Computing Server was renamed to MATLAB Parallel Server
- mdce_def was renamed to mjs_def
- mdce binary was renamed to mjs