Configure for a Generic Scheduler

    Note   You must use the generic scheduler interface for any of the following:

    • Any third-party scheduler not listed in previous chapters (e.g., Sun Grid Engine or GridMP)

    • PBS other than PBS Pro

    • A nonshared file system when the client cannot directly submit to the scheduler (e.g., TORQUE on Windows)

This chapter includes the following sections. Read all that apply to your configuration:

Interfacing with Generic Schedulers

Support Scripts

To support usage of the generic scheduler interface, templates and scripts are provided with the product in the folder:

matlabroot\toolbox\distcomp\examples\integration (on Windows)

matlabroot/toolbox/distcomp/examples/integration (on UNIX)

Subfolders are provided for several different kinds of schedulers, and each of those contains subfolders for the supported usage modes: shared file system, nonshared file system, or remote submission. Each folder contains a file named README that provides specific instructions on how to use the scripts.

For more information on programming jobs for generic schedulers, see the generic scheduler interface documentation for Parallel Computing Toolbox.

Submission Mode

The provided scripts support three possible submission modes:

  • Shared — When the client machine is able to submit directly to the cluster and there is a shared file system present between the client and the cluster machines.

  • Remote Submission — When there is a shared file system present between the client and the cluster machines, but the client machine is not able to submit directly to the cluster (for example, if the scheduler's client utilities are not installed).

  • Nonshared — When there is not a shared file system between client and cluster machines.

Before using the support scripts, decide which submission mode describes your particular network setup.

Custom MPI Builds

You can use an MPI build that differs from the one provided with Parallel Computing Toolbox™. For more information about using this option with the generic scheduler interface, see Use Different MPI Builds on UNIX Systems.

Configure Generic Scheduler on Windows Cluster

If your cluster is already set up to use mpiexec and smpd, you can use Parallel Computing Toolbox™ software with your existing configuration if you are using a compatible MPI implementation library (as defined in matlabroot\toolbox\distcomp\mpi\mpiLibConf.m). However, if you do not have mpiexec on your cluster and you want to use it, you can use the mpiexec software shipped with the parallel computing products.
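
To check which MPI library your installation is configured to use, you can call mpiLibConf from a MATLAB session; its first output is the path of the primary MPI library the workers load. This is a quick sanity check, not a required step:

    [primaryLib, extras] = mpiLibConf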

For further information about mpiexec and smpd, see the MPICH2 home page at http://www.mcs.anl.gov/research/projects/mpich2/. For user's guides and installation instructions on that page, select Documentation > User Docs.

In the following instructions, matlabroot refers to the MATLAB installation location.

To use mpiexec to distribute a job, the smpd service must be running on all nodes that will be used for running MATLAB® workers.

    Note   The smpd executable does not support running from a mapped drive. Use either a local installation, or the full UNC pathname to the executable. Microsoft® Windows Vista™ does not support the smpd executable on network share installations, so with Vista the installation must be local.

Choose one of the following configurations:

Without Delegation

  1. Log in as a user with administrator privileges.

  2. Start smpd by typing one of the following in a DOS command window, as appropriate for your system:

    matlabroot\bin\win32\smpd -install
    

    or

    matlabroot\bin\win64\smpd -install
    

    This command installs the service and starts it. As long as the service remains installed, it will start each time the node boots.

  3. If this is a worker machine and you did not run the installer on it to install MDCS software (for example, if you are running MDCS software from a shared installation), execute the following command in a DOS command window.

    matlabroot\bin\matlab.bat -install_vcrt
    

    This command installs the Microsoft run-time libraries needed for running jobs with your scheduler.

  4. If you are using Windows® firewalls on your cluster nodes, execute the following in a DOS command window.

    matlabroot\toolbox\distcomp\bin\addMatlabToWindowsFirewall.bat
    

    This command adds MATLAB as an allowed program. If you are using other firewalls, you must configure them to make similar accommodation.

  5. Log in as the user who will be submitting jobs for execution on this node.

  6. Register this user to use mpiexec by typing one of the following, as appropriate:

    matlabroot\bin\win32\mpiexec -register
    

    or

    matlabroot\bin\win64\mpiexec -register
    
  7. Repeat steps 5–6 for all users who will run jobs on this machine.

  8. Repeat all these steps on all Windows nodes in your cluster.
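
As a quick sanity check after completing these steps, you can query the smpd service from a DOS command window (the -status option is provided by the MPICH2 smpd utility; use win32 or win64 as appropriate for your installation):

    matlabroot\bin\win64\smpd -status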

Using Passwordless Delegation

  1. Log in as a user with administrator privileges.

  2. Start smpd by typing one of the following in a DOS command window, as appropriate for your system:

    matlabroot\bin\win32\smpd -register_spn
    

    or

    matlabroot\bin\win64\smpd -register_spn
    

    This command installs the service and starts it. As long as the service remains installed, it will start each time the node boots.

  3. If this is a worker machine and you did not run the installer on it to install MDCS software (for example, if you are running MDCS software from a shared installation), execute the following command in a DOS command window.

    matlabroot\bin\matlab.bat -install_vcrt
    

    This command installs the Microsoft run-time libraries needed for running jobs with your scheduler.

  4. If you are using Windows firewalls on your cluster nodes, execute the following in a DOS command window.

    matlabroot\toolbox\distcomp\bin\addMatlabToWindowsFirewall.bat
    

    This command adds MATLAB as an allowed program. If you are using other firewalls, you must configure them for similar accommodation.

  5. Repeat these steps on all Windows nodes in your cluster.

Configure Sun Grid Engine on Linux Cluster

To run communicating jobs with MATLAB Distributed Computing Server™ and Sun™ Grid Engine (SGE), you need to establish a "matlab" parallel environment for SGE. The "matlab" parallel environment described in these instructions is based on the "MPI" example shipped with SGE. To use this parallel environment, customize matlabpe.template to match the number of slots available and to indicate where the startmatlabpe.sh and stopmatlabpe.sh scripts are installed on your cluster.

In the following instructions, matlabroot refers to the MATLAB installation location.

Create the Parallel Environment

The following steps create the parallel environment (PE), and then make the parallel environment runnable on a particular queue. You should perform these steps on the head node of your cluster.

  1. Navigate to the folder of integration files appropriate for your cluster configuration (shared, nonshared, or remoteSubmission) with one of the following shell commands.

    cd matlabroot/toolbox/distcomp/examples/integration/sge/shared
    cd matlabroot/toolbox/distcomp/examples/integration/sge/nonshared
    cd matlabroot/toolbox/distcomp/examples/integration/sge/remoteSubmission
    
  2. Modify the contents of matlabpe.template to use the desired number of slots and the correct location of the startmatlabpe.sh and stopmatlabpe.sh files. (These files can exist in a shared location accessible by all hosts, or they can be copied to the same local folder on each host.) You can also change other values or add additional values to matlabpe.template to suit your cluster; a sketch of an edited template appears after these steps. For more information, refer to the sge_pe documentation provided with your scheduler.

  3. Add the "matlab" parallel environment, using a shell command like:

    qconf -Ap matlabpe.template
    
  4. Make the "matlab" parallel environment runnable on all queues:

    qconf -mq all.q
    

    This will bring up a text editor for you to make changes: search for the line pe_list, and add matlab.

  5. Ensure you can submit a trivial job to the PE:

    $ echo "hostname" | qsub -pe matlab 1
    
  6. Use qstat to check that the job runs correctly, and check that the output file contains the name of the host that ran the job. The default name for the output file is ~/STDIN.o###, where ### is the SGE job number.

      Note   The example submit functions for SGE rely on the presence of the "matlab" parallel environment. If you change the name of the parallel environment to something other than "matlab", you must ensure that you also change the submit functions.
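
For reference, after step 2 the key lines of matlabpe.template might look like the following sketch. The slot count and script paths are placeholders for your own values, and the field names follow the sge_pe format used by the shipped template:

    pe_name            matlab
    slots              16
    start_proc_args    /shared/scripts/startmatlabpe.sh
    stop_proc_args     /shared/scripts/stopmatlabpe.sh
    allocation_rule    $fill_up

Similarly, after step 4 the pe_list line in the all.q configuration might read as follows (make is a typical default entry):

    pe_list            make matlab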

Configure Windows Firewalls on Client

If you are using Windows firewalls on your client node:

  1. Log in as a user with administrative privileges.

  2. Execute the following in a DOS command window.

    matlabroot\toolbox\distcomp\bin\addMatlabToWindowsFirewall.bat
    

    This command adds MATLAB as an allowed program. If you are using other firewalls, you must configure them to make similar accommodations.

Validate Installation Using a Generic Scheduler

Testing the installation of the parallel computing products with a generic scheduler requires familiarity with your network configuration, with your scheduler interface, and with the generic scheduler interface of Parallel Computing Toolbox software.

    Note   The remainder of this chapter illustrates only the case of using LSF® in a nonshared file system. For other schedulers or a shared file system, look for the appropriate scripts and modify them as necessary, using the following instructions as a guide. If you have any questions, contact the MathWorks install support team.

Example Setup for LSF

This section provides guidelines for setting up your cluster profile to use the generic scheduler interface with an LSF scheduler in a network without a shared file system between the client and the cluster machines. The scripts necessary to set up your test are found in:

matlabroot/toolbox/distcomp/examples/integration/lsf/nonshared

These scripts are written for an LSF scheduler, but might require modification to work in your network.

In this type of configuration, job data is copied from the client host running a Windows operating system to a host on the cluster (cluster login node) running a UNIX® operating system. From the cluster login node, the LSF bsub command submits the job to the scheduler. When the job finishes, its output is copied back to the client host.

Requirements.  For this setup to work, the following conditions must be met:

  • The client node and cluster login node must support SSH and SFTP.

  • The cluster login node must be able to call the bsub command to submit a job to an LSF scheduler. You can find more information about this in the file:

    matlabroot\toolbox\distcomp\examples\integration\lsf\nonshared\README
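
Before proceeding, you can spot-check both requirements from the client. For example, assuming a command-line ssh client (such as OpenSSH or plink) is available on the client machine, the following submits a trivial job through the cluster login node; the user and host names are placeholders:

    ssh user@cluster-host-name "echo hostname | bsub"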
    

If these requirements are met, use the following steps to implement the solution:

Step 1: Set Up Windows Client Host

On the Client Host.  

  1. You need the necessary scripts on the path of the MATLAB client. You can do this by copying them to a folder already on the path, as follows; an alternative using addpath appears after this step.

    Browse to the folder:

    matlabroot\toolbox\distcomp\examples\integration\lsf\nonshared
    

    Copy all the files from that folder, and paste them into the folder:

    matlabroot\toolbox\local
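
Alternatively, instead of copying the files, you can add the example folder to the MATLAB path for the current session. This is a sketch; you can make the change permanent with savepath if desired:

    addpath(fullfile(matlabroot, 'toolbox', 'distcomp', ...
        'examples', 'integration', 'lsf', 'nonshared'))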
    

Step 2: Define a Cluster Profile

In this step you define a cluster profile to use in subsequent steps.

  1. Start a MATLAB session on the client host.

  2. Start the Cluster Profile Manager from the MATLAB desktop by selecting Parallel > Manage Cluster Profiles.

  3. Create a new profile in the Cluster Profile Manager by selecting New > Generic.

  4. With the new profile selected in the list, click Rename and edit the profile name to be InstallTest. Press Enter.

  5. In the Properties tab, provide settings for the following fields:

    1. Set the Description field to For testing installation.

    2. Set the JobStorageLocation to the location where you want job and task data to be stored on the client machine (not the cluster location).

        Note   JobStorageLocation should not be shared by parallel computing products running different versions; each version on your cluster should have its own JobStorageLocation.

    3. Set the NumWorkers to the number of workers you want to test your installation on.

    4. Set the ClusterMatlabRoot to the installation location of the MATLAB to be executed by the worker machines, as determined in Chapter 1 of the installation instructions.

    5. Set IndependentSubmitFcn with the following text:

      {@independentSubmitFcn, 'cluster-host-name', '/network/share/joblocation'}

      where

      cluster-host-name is the name of the cluster host from which the job is submitted to the scheduler, and /network/share/joblocation is the location on the cluster where the scheduler can access job data. This location must be accessible from all cluster nodes.

    6. Set CommunicatingSubmitFcn with the following text:

      {@communicatingSubmitFcn, 'cluster-host-name', '/network/share/joblocation'}

    7. Set the OperatingSystem to the operating system of your cluster worker machines.

    8. Set HasSharedFilesystem to false, indicating that the client node and worker nodes cannot share the same data location.

    9. Set the GetJobStateFcn to @getJobStateFcn.

    10. Set the DeleteJobFcn field to @deleteJobFcn.

  6. Click Done to save your cluster profile changes.
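
If you prefer to script this configuration rather than use the Cluster Profile Manager, you can set the same properties programmatically on a generic cluster object and save it as a profile. The following is a sketch with placeholder paths and host names, not a drop-in configuration:

    c = parallel.cluster.Generic;
    c.JobStorageLocation = 'C:\jobdata';        % client-side job data location
    c.NumWorkers = 4;
    c.ClusterMatlabRoot = '/usr/local/MATLAB';  % MATLAB location on the cluster
    c.OperatingSystem = 'unix';
    c.HasSharedFilesystem = false;
    c.IndependentSubmitFcn = {@independentSubmitFcn, ...
        'cluster-host-name', '/network/share/joblocation'};
    c.CommunicatingSubmitFcn = {@communicatingSubmitFcn, ...
        'cluster-host-name', '/network/share/joblocation'};
    c.GetJobStateFcn = @getJobStateFcn;
    c.DeleteJobFcn = @deleteJobFcn;
    saveAsProfile(c, 'InstallTest');            % save as a named profile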


Step 3: Validate Cluster Profile

In this step you validate your cluster profile, and thereby your installation.

  1. If it is not already open, start the Cluster Profile Manager from the MATLAB desktop by selecting Parallel > Manage Cluster Profiles on the Home tab, in the Environment area.

  2. Select your cluster profile in the listing.

  3. Click Validate.

The Validation Results tab shows the output; when the profile is configured correctly, all validation stages pass.

    Note   If your validation fails any stage, contact the MathWorks install support team.
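
In addition to the built-in validation, you can exercise the profile with a trivial independent job from the MATLAB command line. This sketch retrieves one random matrix computed on a worker:

    c = parcluster('InstallTest');
    j = createJob(c);
    createTask(j, @rand, 1, {3});   % one task: return rand(3)
    submit(j);
    wait(j);
    out = fetchOutputs(j)           % should contain a 3-by-3 matrix
    delete(j);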

If your validation passed, you now have a valid profile that you can use in other parallel applications. You can make any modifications to your profile appropriate for your applications, such as NumWorkersRange, AttachedFiles, or AdditionalPaths. To save your profile for other users, select the profile and click Export, then save your profile to a file in a convenient location. Later, when running the Cluster Profile Manager, other users can import your profile by clicking Import.
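
The export and import steps also have functional equivalents; parallel.exportProfile and parallel.importProfile write and read the same .settings files:

    parallel.exportProfile('InstallTest', 'InstallTest.settings')
    % On another user's machine:
    parallel.importProfile('InstallTest.settings')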
