Resolve Communication Issues in MATLAB Job Scheduler Cluster
Issue
To maintain a functional cluster setup, the MATLAB® Job Scheduler job manager must resolve the host name that the MATLAB worker nodes advertise. Similarly, all MATLAB workers and clients must resolve the host name that the job manager node advertises.
If a worker cannot connect to its job manager, or a client session cannot validate a profile that uses that job manager, the cluster nodes likely have a communication problem.
Possible Solutions
Investigate With Command-Line Interface
Make sure that all nodes agree on how each host name resolves to an IP address. Verify that a node's host name is consistent from both its own perspective and another node's perspective. For example, if a process on nodeB cannot connect to one on nodeA, check nodeA's host name both locally and from nodeB. The host names must match.
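The check described above can be scripted so that you run the same lookup on each node and compare the results. The following is a minimal Python sketch, not part of MATLAB Parallel Server. The function names `resolve_ipv4` and `same_resolution` are hypothetical, and the sketch assumes that comparing IPv4 addresses is enough for your network.

```python
import socket


def resolve_ipv4(host_name):
    """Return the sorted IPv4 addresses that this machine resolves host_name to.

    Returns an empty list if the name does not resolve at all.
    """
    try:
        infos = socket.getaddrinfo(host_name, None, family=socket.AF_INET)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for AF_INET, sockaddr is (address, port).
    return sorted({info[4][0] for info in infos})


def same_resolution(local_view, remote_view):
    """True if both nodes resolved the name to the same non-empty address set."""
    return bool(local_view) and set(local_view) == set(remote_view)
```

Run `resolve_ipv4("nodeA")` on nodeA and again on nodeB, then pass the two results to `same_resolution`. A common mismatch is a node that resolves its own name to a loopback address (for example, through a local hosts file) while other nodes resolve it to a routable address.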
If the nodes can identify each other, then you can diagnose problems between their processes by using the nodestatus command. Use this command to determine which MATLAB Parallel Server™ processes are running on the local host, and which are accessible from remote hosts.
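When you run nodestatus from more than one node, comparing the outputs by eye is error-prone. The following Python sketch is one possible helper, not part of MATLAB Parallel Server. It builds a nodestatus command line from the `-remotehost` and `-infolevel` options and reports the lines that differ between two captured outputs. The function names are hypothetical, the executable file name is an assumption (on Windows it may need an extension), and the sketch makes no assumption about the format of the nodestatus output.

```python
import difflib
import os


def nodestatus_command(matlabroot, remote_host, infolevel=None):
    """Build the argument list for a nodestatus call against remote_host."""
    # Assumption: the executable is named "nodestatus" in this folder; on
    # Windows you may need to append an extension such as ".bat".
    exe = os.path.join(matlabroot, "toolbox", "parallel", "bin", "nodestatus")
    cmd = [exe, "-remotehost", remote_host]
    if infolevel is not None:
        cmd += ["-infolevel", str(infolevel)]
    return cmd


def diff_reports(report_a, report_b, name_a="nodeA", name_b="nodeB"):
    """Return unified-diff lines between two captured nodestatus outputs.

    An empty list means the two outputs are identical.
    """
    return list(
        difflib.unified_diff(
            report_a.splitlines(),
            report_b.splitlines(),
            fromfile=name_a,
            tofile=name_b,
            lineterm="",
        )
    )
```

To capture the output on each node, you could pass the command to `subprocess.run(cmd, capture_output=True, text=True)` and save the `stdout` field to a file, then feed both saved texts to `diff_reports`.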
For example, if a worker on nodeA cannot register with its job manager on nodeB, run the same nodestatus command on both nodes to verify that nodeA can correctly identify nodeB and all of the MATLAB Parallel Server processes on it.
On nodeB, in a Linux® or Windows® command window, run the following commands. matlabroot is the MATLAB installation folder. On Linux, use forward slashes in the path.
cd matlabroot\toolbox\parallel\bin
nodestatus -remotehost nodeB
On nodeA, run the same commands.
cd matlabroot\toolbox\parallel\bin
nodestatus -remotehost nodeB
If the outputs differ, run the nodestatus command again on nodeA with a higher information level to receive more detailed information about the processes running on nodeB from nodeA's perspective.
nodestatus -remotehost nodeB -infolevel 3
Investigate With Admin Center Tool
You can diagnose some communication problems using the Admin Center tool. To learn more about the Admin Center tool, see Start Admin Center.
If you cannot add a node to the Admin Center listing by specifying its host name, you can use its IP address instead. For more information, see Add Hosts.
To test communications between your MATLAB Job Scheduler node, your worker nodes, and the node where Admin Center runs, select Test Connectivity in the Admin Center tool to open the Connectivity Testing dialog box. These tests verify that nodes can identify each other and allow their processes to communicate. If any of the communication tests fail, Admin Center provides information that you can use to diagnose the problem. For more information, see Test MATLAB Job Scheduler Cluster Connectivity in Admin Center.
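If Admin Center is not available on a node, a rough manual check of whether one node can open a TCP connection to another can be sketched in Python. This is not what Admin Center runs internally, and it covers only basic reachability. The function name `can_connect` is hypothetical, and the port is a placeholder that you must replace with a port your cluster is actually configured to use.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

A result of `False` does not say why the connection failed. Use the host name checks and the nodestatus command described earlier to narrow down the cause.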