Connecting to MDCE Services on Worker Nodes from Head Node with MDCS

Environment: Red Hat Enterprise Linux 6.0
I am trying to get the mdce service on worker nodes to communicate with the mdce service on the head node. When running the admincenter on ANY node, I can connect to everyone, but can only see the number of cores on the node which I am currently on. Further, the MDCE Status reads "running" on the current node and "unavailable" on all other nodes. When I attempt to start the MDCE Service I receive
"Error on machine cerebro:
The MATLAB Distributed Computing Server is already running.
Use nodestatus to obtain more information."
This is expected, since the service is still running from a previous attempt. Stopping and restarting the services does not help.
When I run
nodestatus -remotehost <currentnode>
everything looks fine. When I run
nodestatus -remotehost <anyothernode>
I receive a series of java exceptions that ends with
"java.net.NoRouteToHostException: No route to host"
The lack of connectivity with nodestatus and the GUI occurs whether I use the host aliases, the local IP addresses, or the remote IP addresses.
I have confirmed that all nodes can communicate with each other using ping and traceroute. In addition, I have confirmed that ports 27350 through 27355 are open on all nodes.
All services are being run as root.
  3 Comments
Jason Ross on 12 Apr 2012
On the workers, block all ports and punch holes on ports 27350-27357 from the jobmanager.
On the clients, block all ports and punch holes on ports 27370-27375 from the jobmanager.
On the jobmanager, block all ports and punch holes on ports 27350-27355 from all workers and from the clients.
Generally the iptables command looks something like this:
iptables -A INPUT -p tcp --source source.hostname.here ! --dport 27370:27375 --syn -j REJECT --reject-with icmp-host-prohibited
Keep in mind that the above is only an example -- you'll likely need to tailor this to your own environment.
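To make the "punch holes" part more concrete, here is a minimal dry-run sketch that only prints ACCEPT rules for the port ranges listed above. The helper `print_accept_rule` and the host name `headnode.example.com` are hypothetical placeholders, and the sketch assumes the ACCEPT rule should sit ahead of any REJECT rule already in the INPUT chain; adapt it to your own rule set before running anything.

```shell
#!/bin/sh
# Dry-run sketch: prints iptables commands instead of executing them.
# print_accept_rule is a hypothetical helper, not part of MDCS or iptables.
print_accept_rule() {
    src="$1"; first="$2"; last="$3"
    # -I inserts at the top of the INPUT chain, so the ACCEPT is
    # evaluated before any REJECT rule that is already in the chain.
    echo "iptables -I INPUT -p tcp -s $src --dport $first:$last -j ACCEPT"
}

# Example for a worker node: allow the jobmanager host (placeholder name)
# to reach ports 27350-27357.
print_accept_rule headnode.example.com 27350 27357
```

Once the printed commands look right, they can be run as root. On RHEL 6, `service iptables save` writes the running rules to /etc/sysconfig/iptables so they survive a restart.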
Alberto on 13 Mar 2014
Hi,
Could you be a bit more specific about what I should modify in iptables? I get the same error.


Accepted Answer

Jason Ross on 10 Apr 2012
In the Admin Center, there is a "Test Connectivity" test under the "Hosts" menu. Does that come back clean?
It sounds like there is something missing in name resolution:
  • host resolving its own name
  • forward lookup
  • reverse lookup
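The three lookups above can be checked from a shell on each node. This is a rough sketch rather than an official MDCS diagnostic: `check_resolution` is a hypothetical helper, and it assumes `getent` and `awk` are available, as they are on a stock RHEL install.

```shell
#!/bin/sh
# check_resolution is a hypothetical helper, not part of MDCS.
# It does a forward lookup (name -> IP) and then a reverse lookup (IP -> name)
# through the system resolver (/etc/hosts and DNS, per nsswitch.conf).
check_resolution() {
    name="$1"
    # Forward lookup: the first field of the getent output is the address.
    ip=$(getent hosts "$name" | awk '{print $1; exit}')
    if [ -z "$ip" ]; then
        echo "FAIL forward: $name does not resolve"
        return 1
    fi
    # Reverse lookup: the second field is the canonical host name.
    back=$(getent hosts "$ip" | awk '{print $2; exit}')
    if [ -z "$back" ]; then
        echo "FAIL reverse: $ip does not resolve to a name"
        return 1
    fi
    echo "OK $name -> $ip -> $back"
}

# Usage (run on every node, for itself and for every other node):
#   check_resolution "$(hostname)"
#   check_resolution <anyothernode>
```

If the name printed after the second arrow differs from one node to the next for the same host, that inconsistency is worth chasing before looking at the firewall.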
You can also pass the "-infolevel 2" flag to the nodestatus command. It will tell you the ports that the job manager and workers are using.
You might want to try turning off the firewalls temporarily to see if the port range is too restrictive or something else is "off". I've definitely encountered a fat-finger issue with iptables that caused issues similar to what you are seeing.
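A successful ping only shows that ICMP gets through; it says nothing about the TCP ports MDCS uses. A firewall rule that rejects with icmp-host-prohibited commonly shows up on the connecting side as "No route to host" even though ping works. The sketch below probes a port range over TCP. `port_open` and `check_ports` are hypothetical helpers, and it assumes `bash` and the coreutils `timeout` command are installed.

```shell
#!/bin/sh
# port_open and check_ports are hypothetical helpers, not part of MDCS.

# Succeeds if a TCP connection to host:port can be opened within 3 seconds.
# /dev/tcp is a bash feature, so bash is invoked explicitly.
port_open() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# check_ports HOST FIRST LAST: print one line per port in the range.
check_ports() {
    host="$1"; port="$2"; last="$3"
    while [ "$port" -le "$last" ]; do
        if port_open "$host" "$port"; then
            echo "$host:$port reachable"
        else
            # A failure is ambiguous: the port may be filtered by a firewall,
            # or nothing may be listening on it.
            echo "$host:$port not reachable (filtered, or nothing listening)"
        fi
        port=$((port + 1))
    done
}

# Usage, from the head node against a worker:
#   check_ports <anyothernode> 27350 27357
```

Running it once with the firewall up and once after `service iptables stop` on the target node (RHEL 6; remember `service iptables start` afterwards) helps separate a firewall problem from a service that is simply not listening.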
