Environment: Red Hat Enterprise Linux 6.0
I am trying to get the mdce service on worker nodes to communicate with the mdce service on the head node. When running the admincenter on ANY node, I can connect to every node, but can only see the number of cores on the node I am currently on. Further, the MDCE Status reads "running" on the current node and "unavailable" on all other nodes. When I attempt to start the MDCE Service I receive:
"Error on machine cerebro: The MATLAB Distributed Computing Server is already running. Use nodestatus to obtain more information."
This is expected, since the service is still running from a previous attempt. Stopping and restarting the services does not help.
When I run
nodestatus -remotehost <currentnode>
everything looks fine. When I run
nodestatus -remotehost <anyothernode>
I receive a series of Java exceptions that ends with
"java.net.NoRouteToHostException: No route to host"
The lack of connectivity with nodestatus and the GUI occurs whether I use the host aliases, the local IP addresses, or the remote IP addresses.
I have confirmed that all nodes can communicate with each other using ping and traceroute. In addition, I have confirmed that ports 27350 through 27355 are open on all nodes.
All services are being run as root.
In the Admin Center, there is a "Test Connectivity" test under the "Hosts" menu. Does that come back clean?
It sounds like there is something missing in name resolution.
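One way to check name resolution from each node is a small script like the one below. This is only a sketch: the function name check_name_resolution is made up for illustration, it only looks at IPv4, and it uses the standard Python socket module rather than anything from MDCE. It resolves a host name to its addresses and then reverse-resolves each address, so you can see whether every node gets the same answers.

```python
import socket


def check_name_resolution(hostname):
    """Resolve hostname to IPv4 addresses, then reverse-resolve each address.

    Hypothetical helper for illustration; not part of MDCE.
    """
    result = {"hostname": hostname, "addresses": [], "reverse": {}, "error": None}
    try:
        infos = socket.getaddrinfo(hostname, None, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror as exc:
        # Forward lookup failed: the name is unknown on this node.
        result["error"] = str(exc)
        return result

    # Each entry is (family, type, proto, canonname, (address, port)).
    result["addresses"] = sorted({info[4][0] for info in infos})

    for addr in result["addresses"]:
        try:
            result["reverse"][addr] = socket.gethostbyaddr(addr)[0]
        except OSError:
            # No reverse entry for this address.
            result["reverse"][addr] = None
    return result
```

Run it on every node for every host name in the cluster, for example print(check_name_resolution("cerebro")), and compare the output. A name that resolves to 127.0.0.1 on its own node, or to different addresses on different nodes, would be worth fixing in /etc/hosts or DNS.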
You can also pass the "-infolevel 2" flag to the nodestatus command. It will tell you the ports that the job manager and workers are using.
You might want to try turning off the firewalls temporarily to see if the port range is too restrictive or something else is "off". I've definitely encountered a fat-fingered iptables rule that caused symptoms similar to what you are seeing.
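Keep in mind that ping and traceroute do not prove that a TCP port is reachable. One common cause of "No route to host" when ping works is an iptables REJECT rule using icmp-host-prohibited, which is how the stock RHEL ruleset ends. A rough way to test the ports directly from another node is sketched below. The helper names probe_port and probe_ports are made up for illustration, and the range 27350 through 27355 is simply the one mentioned in the question; the ports reported by "-infolevel 2" may differ.

```python
import errno
import socket


def probe_port(host, port, timeout=3.0):
    """Try a TCP connection and classify the outcome (illustrative helper)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Host reachable, but nothing is listening on this port.
        return "refused"
    except socket.timeout:
        # Packets are probably being silently dropped.
        return "timeout"
    except OSError as exc:
        if exc.errno == errno.EHOSTUNREACH:
            # Same condition Java reports as NoRouteToHostException.
            return "no route to host"
        return "error: %s" % exc


def probe_ports(host, ports, timeout=3.0):
    """Probe several ports on one host and return {port: status}."""
    return {port: probe_port(host, port, timeout) for port in ports}
```

For example, probe_ports("<anyothernode>", range(27350, 27356)) run from the head node shows which of those ports answer. A "no route to host" result for a host that responds to ping points at the firewall on the target node rather than at routing.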