MATLAB Answers


Connecting to MDCE Services on Worker Nodes from Head Node with MDCS

Asked by Jonathan Yoke on 10 Apr 2012
Latest activity Commented on by Alberto
on 13 Mar 2014

Environment: Red Hat Enterprise Linux 6.0

I am trying to get the mdce service on worker nodes to communicate with the mdce service on the head node. When running the admincenter on ANY node, I can connect to everyone, but can only see the number of cores on the node which I am currently on. Further, the MDCE Status reads "running" on the current node and "unavailable" on all other nodes. When I attempt to start the MDCE Service I receive

"Error on machine cerebro:
The MATLAB Distributed Computing Server is already running.
Use nodestatus to obtain more information."  

This is expected, since the service is still running from a previous attempt. Stopping and restarting the services does not help.

When I run

nodestatus -remotehost <currentnode>

everything looks fine. When I run

nodestatus -remotehost <anyothernode>

I receive a series of Java exceptions that ends with

"java.net.NoRouteToHostException: No route to host"

The lack of connectivity with nodestatus and the GUI occurs whether I use the computer aliases or the local IP addresses or the remote IP addresses.

I have confirmed that all nodes can communicate with each other using ping and traceroute. In addition, I have confirmed that ports 27350 through 27355 are open on all nodes.
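A note on this check: ping and traceroute do not exercise the TCP ports that mdce uses, and a firewall rule that rejects with icmp-host-prohibited typically surfaces in Java as "No route to host". The following is a minimal sketch for testing the TCP ports directly from one node to another. The function names and the host name `worker01` are made up for illustration, and it assumes `bash` and the coreutils `timeout` command are available.

```shell
#!/usr/bin/env bash
# check_port HOST PORT: succeeds if a TCP connection to HOST:PORT can be opened.
check_port() {
    local host=$1 port=$2
    # /dev/tcp is a bash feature; timeout guards against silently dropped packets.
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# check_range HOST FIRST LAST: print one status line per port in the range.
check_range() {
    local host=$1 port=$2 last=$3
    while [ "$port" -le "$last" ]; do
        if check_port "$host" "$port"; then
            echo "$host:$port open"
        else
            echo "$host:$port unreachable"
        fi
        port=$((port + 1))
    done
}

# Example (hypothetical host name), run from the head node:
#   check_range worker01 27350 27355
```

Running it in both directions (head node to worker, and worker to head node) helps show which side's firewall is rejecting the connection.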

All services are being run as root.

  3 Comments

Hey, I solved the above problem (by editing iptables to allow all traffic between nodes on the cluster), but now I am unable to get a client MATLAB session to connect to the MJS. I am trying to connect from a Windows box to the server described above.

nodestatus -remotehost <headnode> comes back with all the necessary information. Ports 27350-27354 are open for TCP and port 137 is open for UDP. I can use s = java.net.Socket('<IPaddress>', 27350) while in MATLAB to connect. Ports 2735[1-3] also work, but 27354 and 137 do not.

jm = findResource('jobmanager', 'lookupurl', '<IPaddress>') does NOT work, nor does "Validate" in the Cluster Profile Manager. They return the same error.

Warning: Could not contact an MJS lookup service
on host '<IPaddress>'. etc etc...

Likely this has something to do with the firewall on the head node. How do I configure iptables to allow the PCT to connect to the MDCS?

On the workers, block all ports and punch holes on ports 27350-27357 from the jobmanager.

On the clients, block all ports and punch holes on ports 27370-27375 from the jobmanager.

On the jobmanager, block all ports and punch holes on ports 27350-27355 from all workers and from the clients.

Generally the iptables command looks something like this:

iptables -A INPUT -p tcp --source source.hostname.here ! --dport 27370:27375 --syn -j REJECT --reject-with icmp-host-prohibited

Keep in mind that the above is only an example -- you'll likely need to tailor this to your own environment.
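As an alternative way to "punch holes", here is a rough sketch that prints ACCEPT rules for the port ranges mentioned above instead of applying them, so they can be reviewed first. The helper name `emit_accept_rule` and the host names `headnode` and `worker01` are placeholders, not anything defined by MDCS. Using `-I INPUT` assumes a default REJECT rule already sits at the end of the INPUT chain, which may not match your setup.

```shell
#!/usr/bin/env bash
# Dry run: prints iptables commands instead of executing them.

# emit_accept_rule SOURCE FIRST_PORT LAST_PORT
emit_accept_rule() {
    echo "iptables -I INPUT -p tcp -s $1 --dport $2:$3 -j ACCEPT"
}

# Rule for a worker node; "headnode" is a placeholder for the jobmanager host.
emit_accept_rule headnode 27350 27357

# Rule for the jobmanager; "worker01" is a placeholder worker host.
emit_accept_rule worker01 27350 27355
```

Once the printed rules look right, they can be run as root on the matching machine. On Red Hat Enterprise Linux 6, `service iptables save` should persist them across restarts.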

Hi

Can you be more specific about what I should modify in iptables? I get the same error.


1 Answer

Answer by Jason Ross
on 10 Apr 2012
 Accepted Answer

In the Admin Center, there is a "Test Connectivity" test under the "Hosts" menu. Does that come back clean?

It sounds like there is something missing in name resolution:

  • host resolving its own name
  • forward lookup
  • reverse lookup
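The three checks above could be scripted roughly as follows. This is only a sketch: the function name `check_resolution` is made up, and it assumes `getent` and `awk` are available on the nodes (they normally are on Red Hat Enterprise Linux).

```shell
#!/usr/bin/env bash
# check_resolution NAME: forward-resolve NAME, then reverse-resolve the address.
check_resolution() {
    local name=$1 ip back
    # Forward lookup: the first field of the getent output is the address.
    ip=$(getent hosts "$name" | awk 'NR==1 {print $1}')
    if [ -z "$ip" ]; then
        echo "$name: forward lookup failed"
        return 1
    fi
    # Reverse lookup: the second field is the canonical name for that address.
    back=$(getent hosts "$ip" | awk 'NR==1 {print $2}')
    if [ -z "$back" ]; then
        echo "$name -> $ip: reverse lookup failed"
        return 1
    fi
    echo "$name -> $ip -> $back"
}

# Examples (run on every node):
#   check_resolution "$(hostname)"    # host resolving its own name
#   check_resolution cerebro          # another node, by the name MDCS uses
```

If the name that comes back differs from the name that went in, or the address is a loopback address such as 127.0.0.1, that may be worth checking in /etc/hosts.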

You can also pass the "-infolevel 2" flag to the nodestatus command. It will tell you the ports that the job manager and workers are using.

You might want to try turning off the firewalls temporarily to see whether the port range is too restrictive or something else is "off". I've definitely encountered a fat-finger mistake in iptables that caused symptoms similar to what you are seeing.
