Cluster setup scripts

version 1.2.0.1 (7.21 KB) by Jos Martin
Shell scripts to easily control a MATLAB distributed computing cluster

Updated 01 Sep 2016

What do these scripts do?
=========================
These Bourne shell scripts allow you to control a MATLAB Distributed Computing
cluster (Version 2) easily. To use them, follow the steps below:

Step 1: Configure ssh
=====================

To use these scripts, first set up password-free ssh for the
user under whom you have chosen to run the Distributed Computing
Engine daemons (mdce). Numerous tutorials on the internet explain
how to do this, but here are the steps in brief:

(1) Run ssh-keygen -t dsa to generate files. If you choose a non-empty
passphrase, you can use ssh-agent and ssh-add to make life easier
later on.
(2) Copy the newly generated file id_dsa.pub to the .ssh directory of
the mdce user on the remote machine and rename the file to
"authorized_keys". If that file already exists, append the contents
of id_dsa.pub to it instead of overwriting it.
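
If the mdce user already has an authorized_keys file, the key has to be
appended rather than copied over it. The following is a minimal sketch of
that step, meant to be run on the remote machine once id_dsa.pub has been
copied there. The function name install_pubkey is hypothetical and is not
part of this package.

```shell
#!/bin/sh
# install_pubkey PUBKEY_FILE SSH_DIR
# Hypothetical helper: append the key in PUBKEY_FILE to
# SSH_DIR/authorized_keys unless an identical line is already there.
install_pubkey() {
    pubkey_file=$1
    ssh_dir=$2
    auth_file="$ssh_dir/authorized_keys"

    # Keep the directory and file private to the user; sshd's default
    # StrictModes setting rejects keys stored with loose permissions.
    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir" || return 1
    touch "$auth_file" && chmod 600 "$auth_file" || return 1

    # -x: match whole lines, -F: fixed strings, -f: patterns from a file.
    if ! grep -qxFf "$pubkey_file" "$auth_file"; then
        cat "$pubkey_file" >> "$auth_file"
    fi
}
```

A call such as install_pubkey /tmp/id_dsa.pub "$HOME/.ssh" can be repeated
safely, because a key that is already present is not added twice.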

Step 2: Edit the configuration files
====================================

Then, edit the following files:

(1) matlabroot
(2) mode
(3) hosts

More detail for each in turn:

(1) matlabroot

This must contain the root directory of the MATLAB installation on
each cluster node. It must be the same for all nodes to use these
scripts. For example:

$ cat matlabroot
/usr/local/matlab
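
As an illustration of how a script could use this file, the sketch below
reads the first line of matlabroot and derives the toolbox/distcomp/bin
directory that holds the standard commands. The function name distcomp_bin
is an assumption made for this example; the package's own scripts may read
the file differently.

```shell
#!/bin/sh
# distcomp_bin CONFIG_FILE
# Print the toolbox/distcomp/bin directory under the MATLAB root that is
# stored on the first line of CONFIG_FILE.
distcomp_bin() {
    root=
    # Read only the first line; "read -r" keeps backslashes literal.
    # The second test accepts a file that lacks a trailing newline.
    IFS= read -r root < "$1" || [ -n "$root" ] || return 1
    # Drop one trailing slash so the joined path has no "//".
    root=${root%/}
    printf '%s/toolbox/distcomp/bin\n' "$root"
}
```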

(2) mode

This must contain the string "synchronous" or the string
"asynchronous". If the former, the scripts execute commands
synchronously, one at a time, with each starting only when the
previous has returned. If the latter, the scripts execute commands
asynchronously, with all of them running in the background at the
same time. In both modes, the scripts wait until every command has
finished before returning.

The latter mode is faster, but it is easier to see what the scripts
are doing when run in the former mode.
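
The difference between the two modes can be sketched in a few lines of
Bourne shell. This is an illustration of the idea only, not the code used
by the package, and the function name run_for_each is hypothetical.

```shell
#!/bin/sh
# run_for_each MODE COMMAND ITEM...
# Run "COMMAND ITEM" once per item, either one after another
# (synchronous) or all at once in the background (asynchronous).
run_for_each() {
    mode=$1
    cmd=$2
    shift 2
    case $mode in
        synchronous)
            for item in "$@"; do
                "$cmd" "$item"
            done
            ;;
        asynchronous)
            for item in "$@"; do
                "$cmd" "$item" &
            done
            # Block until every background job has finished.
            wait
            ;;
        *)
            echo "unknown mode: $mode" >&2
            return 1
            ;;
    esac
}
```

In synchronous mode the output appears in the order of the items. In
asynchronous mode the order depends on which command finishes first, which
is why that mode is harder to follow.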

(3) hosts

This is the main configuration file for the cluster. It is easiest to see
what is going on with an example:

$ cat hosts
MDCE hosts    Job Managers    Workers
machine1      jm1             jm1,jm1
machine2      -               -
machine3      jm2,jm3         -
machine4      -               jm2
machine5      -               jm2,jm3

The first column consists of the list of machines which form the
cluster.

The second column consists of job manager names. In the above
example, machine1 will run a job manager called jm1 and machine3 will
run two job managers, called jm2 and jm3. The other machines run no
job managers.

The third column defines which workers will run, where they run and
to which job managers they will attach. In the example above, two
worker processes will run on machine1 and attach to the job manager
called jm1. On machine4, one worker will run, attached to jm2, and
machine5 will run two workers, one attached to jm2 and the other
attached to jm3. Nothing except the mdce itself will run on machine2.

Any whitespace can separate the columns but an empty entry must be a
hyphen "-" as in the example.
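
To make the column layout concrete, here is a minimal sketch that lists
the processes defined by a hosts file. It assumes that the first line is a
header, as in the example, and that every later line has exactly three
whitespace-separated columns. The function name list_processes is
hypothetical, and the package's scripts may parse the file differently.

```shell
#!/bin/sh
# list_processes HOSTS_FILE COLUMN
# COLUMN is 2 for job managers or 3 for workers. Prints one
# "host name" pair per process and skips "-" entries.
list_processes() {
    awk -v col="$2" '
        NR == 1 { next }    # assumption: the first line is a header
        NF < 3  { next }    # ignore blank or incomplete lines
        $col != "-" {
            # Entries within a column are separated by commas.
            n = split($col, names, ",")
            for (i = 1; i <= n; i++) print $1, names[i]
        }
    ' "$1"
}
```

Called as list_processes hosts 3 on the example file, this prints one line
per worker, pairing each host with the job manager that the worker
attaches to. With column 2 it prints one line per job manager.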

Step 3: Run the commands
========================

In the directory <matlabroot>/toolbox/distcomp/bin you will find the
following commands:

mdce
startjobmanager.sh
stopjobmanager.sh
startworker.sh
stopworker.sh

In this package, you will find a "distributed" version of each of
these, prefixed with a "d". Namely:

dmdce
dstartjobmanagers.sh
dstopjobmanagers.sh
dstartworkers.sh
dstopworkers.sh

Each of these commands accepts the same command-line arguments as
its non-distributed counterpart, with the exception of those that are
defined by the hosts file. These are:

For dstartjobmanagers.sh and dstopjobmanagers.sh, -name and -remotehost.
For dstartworkers.sh and dstopworkers.sh, -name, -jobmanager,
-jobmanagerhost and -remotehost.
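
One way to guard against passing such an option by mistake is to check the
argument list before calling a distributed command. The sketch below is an
assumption about how such a check could look. The function and variable
names are hypothetical, and the package itself may treat these options
differently.

```shell
#!/bin/sh
# has_reserved_option RESERVED_LIST ARG...
# Succeed (exit status 0) if any ARG is one of the space-separated
# option names in RESERVED_LIST.
has_reserved_option() {
    reserved=$1
    shift
    for arg in "$@"; do
        # Pad with spaces so only whole option names match.
        case " $reserved " in
            *" $arg "*) return 0 ;;
        esac
    done
    return 1
}

# Option names taken from the two lists above.
JOBMANAGER_RESERVED="-name -remotehost"
WORKER_RESERVED="-name -jobmanager -jobmanagerhost -remotehost"
```

For example, has_reserved_option "$WORKER_RESERVED" "$@" succeeds when the
caller's arguments contain -jobmanager, so a wrapper could print a warning
before it runs dstartworkers.sh.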

To bring up a cluster typical usage might be as follows (output suppressed):

$ ./dmdce start
$ ./dstartjobmanagers.sh -clean
$ ./dstartworkers.sh

and to take down a cluster:

$ ./dstopworkers.sh
$ ./dstopjobmanagers.sh
$ ./dmdce stop

although just the last line would do the trick. The only other command in
the directory is "dssh" which loops over the list of hosts and runs a
command using ssh on each. The dmdce command uses this to run mdce on
remote hosts, as did the other commands before the -remotehost option
became available. A nice way of checking that your ssh is configured
properly is to run something like:

$ ./dssh hostname
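
The loop that such a command performs can be sketched as follows. This is
not the dssh script itself, only an illustration of the idea under the
assumption that the first line of the hosts file is a header. The function
name run_on_hosts and the SSH variable are hypothetical.

```shell
#!/bin/sh
# SSH can be overridden (for example SSH=echo) to get a dry run.
SSH=${SSH:-ssh}

# run_on_hosts HOSTS_FILE COMMAND [ARG...]
# Run COMMAND on every machine named in the first column of HOSTS_FILE.
run_on_hosts() {
    hosts_file=$1
    shift
    # Column 1 of every line after the assumed header is a host name.
    for host in $(awk 'NR > 1 && NF > 0 { print $1 }' "$hosts_file"); do
        echo "== $host =="
        # Take stdin from /dev/null so ssh never waits for terminal input.
        $SSH "$host" "$@" < /dev/null || echo "FAILED on $host" >&2
    done
}
```

Setting SSH=echo before calling run_on_hosts hosts hostname prints the
commands that would be run instead of connecting to any machine.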

Another use of ./dssh would be to blow away all the checkpoint
history, usually in /var/lib/mdce:

$ ./dssh rm -rf /var/lib/mdce

----------------------------------------------------------------------
Version 1.1, 2006-04-27
Please send bug reports to jos.martin@mathworks.co.uk

Updates

1.2.0.1

Updated license

1.2.0.0

Changed bug report owner

MATLAB Release Compatibility
Created with R2006a
Compatible with any release
Platform Compatibility
Windows macOS Linux