Resolve Communication Issues in MATLAB Job Scheduler Cluster

Issue

To maintain a functional cluster setup, the MATLAB® Job Scheduler job manager must resolve the host name that the MATLAB worker nodes advertise. Similarly, all MATLAB workers and clients must resolve the host name that the job manager node advertises.

If a worker cannot connect to its job manager, or if a client session cannot validate a cluster profile that uses that scheduler, the cluster nodes most likely have a communication problem.

Possible Solutions

Investigate With Command-Line Interface

Make sure that all nodes agree on their IP resolutions. Verify that a node's host name is consistent from both its own perspective and another node's perspective. For example, if a process on nodeB cannot connect to one on nodeA, check nodeA's host name both locally and from nodeB. The host names must match.
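One way to make this comparison concrete is a small script that you run on each node with the same host name as its argument. The following is a minimal sketch, not part of MATLAB Parallel Server: it assumes Python 3 is available on the nodes, and the names `resolve_ipv4` and `same_resolution` are hypothetical helpers introduced only for illustration. It prints the fully qualified name and the IPv4 addresses that the current machine resolves for the given host, so you can compare the output from nodeA and nodeB side by side.

```python
import socket
import sys


def resolve_ipv4(host_name):
    """Return the sorted IPv4 addresses this machine resolves host_name to.

    Returns an empty list if the name cannot be resolved at all.
    """
    try:
        infos = socket.getaddrinfo(host_name, None, family=socket.AF_INET)
    except socket.gaierror:
        return []
    # Each entry ends with a (address, port) tuple; keep unique addresses.
    return sorted({info[4][0] for info in infos})


def same_resolution(local_view, remote_view):
    """Hypothetical helper: True only if both views are non-empty and equal."""
    return bool(local_view) and set(local_view) == set(remote_view)


if __name__ == "__main__":
    # Default to this machine's own host name when no argument is given.
    name = sys.argv[1] if len(sys.argv) > 1 else socket.gethostname()
    print("host name: ", name)
    print("fully qualified:", socket.getfqdn(name))
    print("IPv4 addresses:", resolve_ipv4(name))
```

For example, run the script with the argument nodeA on both nodeA and nodeB. If the printed names or addresses differ, or if one node reports a loopback address while the other reports a network address, the nodes disagree about nodeA's identity.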

If the nodes can identify each other, then you can diagnose problems between their processes by using the nodestatus command. Use this command to determine what MATLAB Parallel Server™ processes are running on the local host, and which are accessible from remote hosts.

For example, if a worker on nodeA cannot register with its job manager on nodeB, run the same nodestatus command on both nodes to verify that nodeA can correctly identify nodeB and all of the MATLAB Parallel Server processes on it. On nodeB, in a Linux® or Windows® command window, run the following commands, where matlabroot is the MATLAB installation folder. The path is shown with Windows separators; on Linux, use forward slashes instead.

cd matlabroot\toolbox\parallel\bin
nodestatus -remotehost nodeB

Then, on nodeA, run the same commands.

cd matlabroot\toolbox\parallel\bin
nodestatus -remotehost nodeB

The outputs of the two sets of commands should match, listing the same job managers and workers. If the outputs do not match, run the nodestatus command again on nodeA with a higher information level to receive more detailed information about the processes running on nodeB from nodeA's perspective.

nodestatus -remotehost nodeB -infolevel 3
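
If the outputs are long, comparing them by eye can be error prone. The following is a minimal sketch of one way to compare them, assuming you have redirected the nodestatus output from each node to a text file. The file names and the `diff_outputs` helper are hypothetical. The sketch makes no assumption about the nodestatus output format; it is a plain line-by-line text comparison, so any lines that legitimately differ between runs will also be reported.

```python
import difflib
import sys


def diff_outputs(local_text, remote_text, local_label="nodeB", remote_label="nodeA"):
    """Return unified-diff lines between two saved command outputs.

    Trailing whitespace is ignored. An empty list means the outputs match.
    """
    local_lines = [line.rstrip() for line in local_text.splitlines()]
    remote_lines = [line.rstrip() for line in remote_text.splitlines()]
    return list(
        difflib.unified_diff(
            local_lines,
            remote_lines,
            fromfile=local_label,
            tofile=remote_label,
            lineterm="",
        )
    )


if __name__ == "__main__":
    # Hypothetical usage: python compare_status.py nodeB_status.txt nodeA_status.txt
    with open(sys.argv[1]) as f_local, open(sys.argv[2]) as f_remote:
        differences = diff_outputs(f_local.read(), f_remote.read())
    print("\n".join(differences) if differences else "Outputs match.")
```

Lines prefixed with `-` appear only in the output captured on nodeB, and lines prefixed with `+` appear only in the output captured on nodeA.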

Investigate With Admin Center Tool

You can diagnose some communication problems using the Admin Center tool. To learn more about the Admin Center tool, see Start Admin Center.

If you cannot add a node to the Admin Center listing by specifying its host name, you can use its IP address instead. For more information, see Add Hosts.

To test communications between your MATLAB Job Scheduler node, your worker nodes, and the node where Admin Center runs, select Test Connectivity in the Admin Center tool to open the Connectivity Testing dialog box. These tests verify that the nodes can identify each other and that their processes can communicate. If any of the communication tests fail, Admin Center provides information that you can use to diagnose the problem. For more information, see Test MATLAB Job Scheduler Cluster Connectivity in Admin Center.
