Using the Generic Scheduler Interface

Overview

Parallel Computing Toolbox™ software provides a generic interface that lets you interact with third-party schedulers or use your own scripts to distribute tasks to other nodes on the cluster for evaluation.

Because each job in your application consists of several tasks, the purpose of your scheduler is to allocate a cluster node for the evaluation of each task, or to distribute each task to a cluster node. The scheduler starts remote MATLAB® worker sessions on the cluster nodes to evaluate individual tasks of the job. To evaluate its task, a MATLAB worker session needs access to certain information, such as where to find the job and task data. The generic scheduler interface provides a means of getting tasks from your Parallel Computing Toolbox client session to your scheduler and thereby to your cluster nodes.

To evaluate a task, a worker requires five parameters that you must pass from the client to the worker. You can transfer them by any means you choose, but because one of them, the name of the decode function, must be passed as an environment variable, the examples in this section pass all five parameters as environment variables.

MATLAB® Client Submit Function

When you submit a job to a scheduler, the function identified by the scheduler object's SubmitFcn property executes in the MATLAB client session. You set the scheduler's SubmitFcn property to identify the submit function and any arguments you might want to send to it. For example, to use a submit function called mysubmitfunc, you set the property with the command

set(sched, 'SubmitFcn', @mysubmitfunc)

where sched is the scheduler object in the client session, created with the findResource function. In this case, the submit function gets called with its three default arguments: scheduler, job, and properties object, in that order. The function declaration line of the function might look like this:

function mysubmitfunc(scheduler, job, props)

Inside the function of this example, the three argument objects are known as scheduler, job, and props.

You can write a submit function that accepts more than the three default arguments, and then pass those extra arguments by including them in the definition of the SubmitFcn property.

time_limit = 300
testlocation = 'Plant30'
set(sched, 'SubmitFcn', {@mysubmitfunc, time_limit, testlocation})

In this example, the submit function requires five arguments: the three defaults, along with the numeric value of time_limit and the string value of testlocation. The function's declaration line might look like this:

function mysubmitfunc(scheduler, job, props, localtimeout, plant)
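The cell-array form of SubmitFcn means your function is called with its three default arguments first, followed by the extra elements of the cell array in order. The script below imitates that call so you can see where time_limit and testlocation end up. It is a sketch under stated assumptions: the structs stand in for the real scheduler, job, and properties objects, demoSubmitFcn is a hypothetical function that only reports what it receives, and feval takes the place of the toolbox's own call. Local functions in scripts require MATLAB R2016b or later.

```matlab
% Stand-ins for the objects the toolbox passes to a submit function.
% These structs are hypothetical; the real objects come from the toolbox.
sched = struct('Type', 'generic');
job   = struct('ID', 1);
props = struct('NumberOfTasks', 5);

% Same cell-array form as the SubmitFcn setting shown above.
time_limit   = 300;
testlocation = 'Plant30';
submitSpec   = {@demoSubmitFcn, time_limit, testlocation};

% Mimic the calling convention: the three default arguments first,
% followed by the extra elements of the cell array, in order.
extraArgs = submitSpec(2:end);
summary = feval(submitSpec{1}, sched, job, props, extraArgs{:});
disp(summary)

function out = demoSubmitFcn(scheduler, job, props, localtimeout, plant) %#ok<INUSL>
    % Hypothetical submit function that only reports what it received.
    out = sprintf('%d tasks, timeout %d s, location %s', ...
        props.NumberOfTasks, localtimeout, plant);
end
```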

The following discussion focuses primarily on the minimum requirements of the submit and decode functions.

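The submit function has three main purposes: identifying the decode function, passing job and task data, and defining the scheduler command to run MATLAB workers. The following sections describe each.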

Identifying the Decode Function

The client's submit function and the worker's decode function work together as a pair. Therefore, the submit function must identify its corresponding decode function. The submit function does this by setting the environment variable MDCE_DECODE_FUNCTION. The value of this variable is a string identifying the name of the decode function on the path of the MATLAB worker. Neither the decode function itself nor its name can be passed to the worker in a job or task property; the file must already exist before the worker starts. For more information on the decode function, see MATLAB® Worker Decode Function.

Passing Job and Task Data

The third input argument (after scheduler and job) to the submit function is the object with the properties listed in the following table.

You do not set the values of any of these properties. They are automatically set by the toolbox so that you can program your submit function to forward them to the worker nodes.

Property Name        Description
StorageConstructor   String. Used internally to indicate that a file system is used to contain job and task data.
StorageLocation      String. Derived from the scheduler DataLocation property.
JobLocation          String. Indicates where this job's data is stored.
TaskLocations        Cell array. Indicates where each task's data is stored. Each element of this array is passed to a separate worker.
NumberOfTasks        Double. Indicates the number of tasks in the job. You do not need to pass this value to the worker, but you can use it within your submit function.

With these values passed into your submit function, the function can pass them to the worker nodes by any of several means. However, because the name of the decode function must be passed as an environment variable, the examples that follow pass all the other necessary property values also as environment variables.

The submit function writes the values of these object properties out to environment variables with the setenv function.
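Variables set with setenv belong to the MATLAB client process, and commands that MATLAB starts with system inherit them. That inheritance is what lets a scheduler command issued from the submit function see the values. The following check uses a shell echo in place of a scheduler command; the value 'Job1' is a made-up job location used only for this demonstration.

```matlab
% Set a variable in the MATLAB process, as the submit function does.
setenv('MDCE_JOB_LOCATION', 'Job1');   % hypothetical value

% Start a child process and have it print the value it inherited.
if ispc
    [status, out] = system('echo %MDCE_JOB_LOCATION%');
else
    [status, out] = system('echo $MDCE_JOB_LOCATION');
end
childValue = strtrim(out);   % remove the trailing newline from echo
fprintf('status %d, child process sees "%s"\n', status, childValue);
```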

Defining Scheduler Command to Run MATLAB® Workers

The submit function must define the command necessary for your scheduler to start MATLAB workers. The actual command is specific to your scheduler and network configuration. The commands for some popular schedulers are listed in the following table. This table also indicates whether or not the scheduler automatically passes environment variables with its submission. If not, your command to the scheduler must accommodate these variables.

Scheduler           Scheduler Command   Passes Environment Variables
Condor®             condor_submit       Not by default. Command can pass all or specific variables.
LSF®                bsub                Yes, by default.
PBS                 qsub                Command must specify which variables to pass.
Sun™ Grid Engine    qsub                Command must specify which variables to pass.

Your submit function might also use the following properties, among others, when constructing and invoking your scheduler command. The names scheduler, job, and props (chosen only for this example) refer to the first three arguments to the submit function.

Argument Object   Property
scheduler         MatlabCommandToRun
scheduler         ClusterMatlabRoot
job               MinimumNumberOfWorkers
job               MaximumNumberOfWorkers
props             NumberOfTasks
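For schedulers that need the variables named explicitly, the submit function can build the list from the variable names it uses. The sketch below assumes PBS or Sun Grid Engine, whose qsub -v option accepts a comma-separated list of names and copies each named variable's value from the environment qsub runs in, and LSF, which forwards the environment by default. The wrapper script name myWrapper.sh and the schedulerType selector are hypothetical, and your site's configuration might require additional options.

```matlab
% Names of the environment variables used in this section's examples.
varNames = {'MDCE_DECODE_FUNCTION', 'MDCE_STORAGE_CONSTRUCTOR', ...
            'MDCE_STORAGE_LOCATION', 'MDCE_JOB_LOCATION', 'MDCE_TASK_LOCATION'};
wrapperScript = 'myWrapper.sh';   % hypothetical script that starts a worker
schedulerType = 'pbs';            % hypothetical selector: 'pbs', 'sge', or 'lsf'

switch schedulerType
    case {'pbs', 'sge'}
        % qsub -v takes a comma-separated list; names given without
        % "=value" are copied from the environment that qsub runs in.
        varList = sprintf('%s,', varNames{:});
        cmd = sprintf('qsub -v %s %s', varList(1:end-1), wrapperScript);
    case 'lsf'
        % bsub forwards the submitting environment by default.
        cmd = sprintf('bsub %s', wrapperScript);
    otherwise
        error('No command template for scheduler type "%s".', schedulerType);
end
disp(cmd)
```

Condor is not covered here: condor_submit reads a submit description file, and a line such as getenv = True in that file forwards the submitting environment.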

Example — Writing the Submit Function

The submit function in this example uses environment variables to pass the necessary information to the worker nodes. Each step below indicates the lines of code you add to your submit function.

  1. Create the function declaration. There are three objects automatically passed into the submit function as its first three input arguments: the scheduler object, the job object, and the props object.

    function mysubmitfunc(scheduler, job, props)

    This example function uses only the three default arguments. You can have additional arguments passed into your submit function, as discussed in MATLAB® Client Submit Function.

  2. Identify the values you want to send to your environment variables. For convenience, you define local variables for use in this function.

    decodeFcn = 'mydecodefunc';
    jobLocation = get(props, 'JobLocation');
    taskLocations = get(props, 'TaskLocations'); %This is a cell array
    storageLocation = get(props, 'StorageLocation');
    storageConstructor = get(props, 'StorageConstructor');

    The name of the decode function that must be available on the MATLAB worker path is mydecodefunc.

  3. Set the environment variables, other than the task locations. All the MATLAB workers use these values when evaluating tasks of the job.

    setenv('MDCE_DECODE_FUNCTION', decodeFcn);
    setenv('MDCE_JOB_LOCATION', jobLocation);
    setenv('MDCE_STORAGE_LOCATION', storageLocation);
    setenv('MDCE_STORAGE_CONSTRUCTOR', storageConstructor);

    Your submit function can use any names you choose for the environment variables, with the exception of MDCE_DECODE_FUNCTION; the MATLAB worker looks for its decode function identified by this variable. If you use alternative names for the other environment variables, be sure that the corresponding decode function also uses your alternative variable names.

  4. Set the task-specific variables and scheduler commands. This is where you instruct your scheduler to start MATLAB workers for each task.

    for i = 1:props.NumberOfTasks
        setenv('MDCE_TASK_LOCATION', taskLocations{i});
        constructSchedulerCommand;
    end

    The line constructSchedulerCommand represents the code you write to construct and execute your scheduler's submit command. This command is typically a string that combines the scheduler command with necessary flags, arguments, and values derived from the values of your object properties. This command is inside the for-loop so that your scheduler gets a command to start a MATLAB worker on the cluster for each task.
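One way to fill in the constructSchedulerCommand placeholder is to issue the scheduler command once per task with system and stop on the first failure. The sketch below assumes a PBS-style qsub -v command, a hypothetical wrapper script myWrapper.sh, and made-up task locations; the dryRun flag only records what would be submitted, so the script runs without a scheduler.

```matlab
% Hypothetical inputs; a real submit function reads these from props.
taskLocations = {'Job1/Task1', 'Job1/Task2', 'Job1/Task3'};
schedCmd = ['qsub -v MDCE_DECODE_FUNCTION,MDCE_STORAGE_CONSTRUCTOR,' ...
            'MDCE_STORAGE_LOCATION,MDCE_JOB_LOCATION,MDCE_TASK_LOCATION ' ...
            'myWrapper.sh'];   % hypothetical PBS-style command
dryRun = true;   % true: record the submissions instead of running them

issued = cell(numel(taskLocations), 1);
for i = 1:numel(taskLocations)
    % Set the task-specific variable just before this task's submission.
    setenv('MDCE_TASK_LOCATION', taskLocations{i});
    issued{i} = sprintf('%s  (MDCE_TASK_LOCATION=%s)', ...
                        schedCmd, getenv('MDCE_TASK_LOCATION'));
    if ~dryRun
        [status, output] = system(schedCmd);
        if status ~= 0
            error('Submitting task %d failed: %s', i, output);
        end
    end
end
fprintf('%s\n', issued{:});
```

With qsub -v, values are copied when the command runs, so setting MDCE_TASK_LOCATION before each call gives each submission its own task location.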

MATLAB® Worker Decode Function

The sole purpose of the MATLAB worker's decode function is to read certain job and task information into the MATLAB worker session. This information could be stored in disk files on the network, or it could be available as environment variables on the worker node. Because the discussion of the submit function illustrated only the usage of environment variables, so does this discussion of the decode function.

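When working with the decode function, you must be aware of how the worker finds its file, by name and location, and how the function reads the job and task information. The following sections describe each.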

Identifying File Name and Location

The client's submit function and the worker's decode function work together as a pair. For more information on the submit function, see MATLAB® Client Submit Function. The decode function on the worker is identified by the submit function as the value of the environment variable MDCE_DECODE_FUNCTION. The environment variable must be copied from the client node to the worker node. Your scheduler might perform this task for you automatically; if it does not, you must arrange for this copying.

The value of the environment variable MDCE_DECODE_FUNCTION defines the filename of the decode function, but not its location. The file cannot be passed as part of the job PathDependencies or FileDependencies property, because the function runs in the MATLAB worker before that session has access to the job. Therefore, the file location must be available to the MATLAB worker as that worker starts.

You can get the decode function on the worker's path either by moving the file into a directory on the path (for example, matlabroot/toolbox/local) or by having the scheduler use cd in its command so that it starts the MATLAB worker from within the directory that contains the decode function.

In practice, the decode function might be identical for all workers on the cluster. In this case, all workers can use the same decode function file if it is accessible on a shared drive.

When a MATLAB worker starts, it automatically runs the file identified by the MDCE_DECODE_FUNCTION environment variable. This decode function runs before the worker does any processing of its task.
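Because the worker needs the decode function before it reads any job data, a quick check in a MATLAB session started the same way your scheduler starts workers can save debugging time. The sketch below looks up the name stored in MDCE_DECODE_FUNCTION, falling back to mydecodefunc, the name used in this section's submit function example, when the variable is not set.

```matlab
% Name the worker would look for; fall back to this section's example name.
decodeFcn = getenv('MDCE_DECODE_FUNCTION');
if isempty(decodeFcn)
    decodeFcn = 'mydecodefunc';
end

location = which(decodeFcn);   % '' when nothing by that name is on the path
isFound = ~isempty(location);
if isFound
    fprintf('%s found: %s\n', decodeFcn, location);
else
    fprintf('%s is not on the path of this MATLAB session.\n', decodeFcn);
end
```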

Reading the Job and Task Information

When the environment variables have been transferred from the client to the worker nodes (either by the scheduler or some other means), the decode function of the MATLAB worker can read them with the getenv function.

With those values from the environment variables, the decode function must set the appropriate property values of the object that is its argument. The property values that must be set are the same as those in the corresponding submit function, except that instead of the cell array TaskLocations, each worker has only the individual string TaskLocation, which is one element of the TaskLocations cell array. Therefore, the properties you must set on the decode function's argument object are StorageConstructor, StorageLocation, JobLocation, and TaskLocation.

Example — Writing the Decode Function

The decode function must read four environment variables and use their values to set the properties of the object that is both its input argument and its output.

In this example, the decode function's argument is the object props.

function props = mydecodefunc(props)
% Read the environment variables:
storageConstructor = getenv('MDCE_STORAGE_CONSTRUCTOR');
storageLocation = getenv('MDCE_STORAGE_LOCATION');
jobLocation = getenv('MDCE_JOB_LOCATION');
taskLocation = getenv('MDCE_TASK_LOCATION');
%
% Set props object properties from the local variables:
set(props, 'StorageConstructor', storageConstructor);
set(props, 'StorageLocation', storageLocation);
set(props, 'JobLocation', jobLocation);
set(props, 'TaskLocation', taskLocation);

When the object is returned from the decode function to the MATLAB worker session, its values are used internally for managing job and task data.
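getenv returns an empty string for a variable that was never set, which is what a worker sees when the scheduler did not forward it. The variant below checks each value before storing it, so such a problem is reported by name. To run outside a worker, it sets placeholder variables itself (including the invented constructor name exampleConstructor) and uses a struct named decoded in place of the props object; in an actual decode function you would call set(props, ...) instead.

```matlab
% Simulate variables the scheduler would have forwarded (hypothetical values).
setenv('MDCE_STORAGE_CONSTRUCTOR', 'exampleConstructor');
setenv('MDCE_STORAGE_LOCATION', '\\apps\data\project_101');
setenv('MDCE_JOB_LOCATION', 'Job1');
setenv('MDCE_TASK_LOCATION', 'Job1/Task1');

% Environment variable name -> property name, as in the decode function above.
varToProp = {'MDCE_STORAGE_CONSTRUCTOR', 'StorageConstructor';
             'MDCE_STORAGE_LOCATION',    'StorageLocation';
             'MDCE_JOB_LOCATION',        'JobLocation';
             'MDCE_TASK_LOCATION',       'TaskLocation'};

decoded = struct();   % stands in for the props object
for k = 1:size(varToProp, 1)
    value = getenv(varToProp{k, 1});
    if isempty(value)
        % An empty value means the variable did not reach this worker.
        error('Environment variable %s is not set on this worker.', varToProp{k, 1});
    end
    decoded.(varToProp{k, 2}) = value;
end
disp(decoded)
```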

Example — Programming and Running a Job in the Client

1. Create a Scheduler Object

You use the findResource function to create an object representing the scheduler in your local MATLAB client session.

You can specify 'generic' as the name for findResource to search for. (Any scheduler name starting with the string 'generic' creates a generic scheduler object.)

sched = findResource('scheduler', 'type', 'generic')

Generic schedulers must use a shared file system for workers to access job and task data. Set the DataLocation and HasSharedFilesystem properties to specify where the job data is stored and that the workers should access job data directly in a shared file system.

set(sched, 'DataLocation', '\\apps\data\project_101')
set(sched, 'HasSharedFilesystem', true)

If you do not set DataLocation, the default location for job data is the current working directory of the MATLAB client at the time you first use findResource to create an object for this type of scheduler. That directory might not be accessible to the worker nodes.

If MATLAB is not on the worker's system path, set the ClusterMatlabRoot property to specify where the workers are to find the MATLAB installation.

set(sched, 'ClusterMatlabRoot', '\\apps\matlab\')

You can look at all the property settings on the scheduler object. If no jobs are in the DataLocation directory, the Jobs property is a 0-by-1 array. All settable property values on a scheduler object are local to the MATLAB client, and are lost when you close the client session or when you remove the object from the client workspace with delete or clear all.

get(sched)
                   Type: 'generic'
           DataLocation: '\\apps\data\project_101'
    HasSharedFilesystem: 1
                   Jobs: [0x1 double]
      ClusterMatlabRoot: '\\apps\matlab\'
          ClusterOsType: 'pc'
               UserData: []
            ClusterSize: Inf
     MatlabCommandToRun: 'worker'
              SubmitFcn: []
      ParallelSubmitFcn: []
          Configuration: ''

You must set the SubmitFcn property to specify the submit function for this scheduler.

set(sched, 'SubmitFcn', @mysubmitfunc)

With the scheduler object and the user-defined submit and decode functions defined, programming and running a job is now similar to doing so with a job manager or any other type of scheduler.

2. Create a Job

You create a job with the createJob function, which creates a job object in the client session. The job data is stored in the directory specified by the scheduler object's DataLocation property.

j = createJob(sched)

This statement creates the job object j in the client session. Use get to see the properties of this job object.

get(j)
                Name: 'Job1'
                  ID: 1
            UserName: 'neo'
                 Tag: ''
               State: 'pending'
          CreateTime: 'Fri Jan 20 16:15:47 EDT 2006'
          SubmitTime: ''
           StartTime: ''
          FinishTime: ''
               Tasks: [0x1 double]
    FileDependencies: {0x1 cell}
    PathDependencies: {0x1 cell}
             JobData: []
              Parent: [1x1 distcomp.genericscheduler]
            UserData: []

This generic scheduler job has somewhat different properties than a job that uses a job manager. For example, this job has no callback functions.

The job's State property is pending. This state means the job has not been queued for running yet. This new job has no tasks, so its Tasks property is a 0-by-1 array.

The scheduler's Jobs property is now a 1-by-1 array of distcomp.simplejob objects, indicating the existence of your job.

get(sched)
                   Type: 'generic'
           DataLocation: '\\apps\data\project_101'
    HasSharedFilesystem: 1
                   Jobs: [1x1 distcomp.simplejob]
      ClusterMatlabRoot: '\\apps\matlab\'
          ClusterOsType: 'pc'
               UserData: []
            ClusterSize: Inf
     MatlabCommandToRun: 'worker'
              SubmitFcn: @mysubmitfunc
      ParallelSubmitFcn: []
          Configuration: ''

3. Create Tasks

After you have created your job, you can create tasks for the job. Tasks define the functions to be evaluated by the workers during the running of the job. Often, the tasks of a job are identical except for different arguments or data. In this example, each task generates a 3-by-3 matrix of random numbers.

createTask(j, @rand, 1, {3,3});
createTask(j, @rand, 1, {3,3});
createTask(j, @rand, 1, {3,3});
createTask(j, @rand, 1, {3,3});
createTask(j, @rand, 1, {3,3});

The Tasks property of j is now a 5-by-1 matrix of task objects.

get(j,'Tasks')
ans =
    distcomp.simpletask: 5-by-1

Alternatively, you can create the five tasks with one call to createTask by providing a cell array of five cell arrays defining the input arguments to each task.

T = createTask(j, @rand, 1, {{3,3} {3,3} {3,3} {3,3} {3,3}});

In this case, T is a 5-by-1 matrix of task objects.
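The nested cell array passed to createTask can be built programmatically rather than typed out. The sketch below builds the same five {3,3} argument lists, plus a variant in which task k receives {k,k}, as for tasks that differ only in their arguments. It assumes a job object j already exists for the createTask calls, which are left as comments so the script runs without a scheduler.

```matlab
numTasks = 5;

% Identical arguments for every task: a 1-by-5 cell, each element {3,3}.
taskArgs = repmat({{3, 3}}, 1, numTasks);
%   T = createTask(j, @rand, 1, taskArgs);

% Arguments that differ per task: task k receives {k,k}.
varyingArgs = arrayfun(@(k) {k, k}, 1:numTasks, 'UniformOutput', false);
%   T = createTask(j, @rand, 1, varyingArgs);

disp(size(taskArgs))
```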

4. Submit a Job to the Job Queue

To run your job and have its tasks evaluated, you submit the job to the scheduler's job queue.

submit(j)

The scheduler distributes the tasks of j to MATLAB workers for evaluation.

The job runs asynchronously. If you need to wait for it to complete before you continue in your MATLAB client session, you can use the waitForState function.

waitForState(j)

The default state to wait for is finished or failed. This function pauses MATLAB until the State property of j is 'finished' or 'failed'.

5. Retrieve the Job's Results

The results of each task's evaluation are stored in that task object's OutputArguments property as a cell array. Use getAllOutputArguments to retrieve the results from all the tasks in the job.

results = getAllOutputArguments(j);

Display the results from each task.

results{1:5}

    0.9501    0.4860    0.4565
    0.2311    0.8913    0.0185
    0.6068    0.7621    0.8214

    0.4447    0.9218    0.4057
    0.6154    0.7382    0.9355
    0.7919    0.1763    0.9169

    0.4103    0.3529    0.1389
    0.8936    0.8132    0.2028
    0.0579    0.0099    0.1987

    0.6038    0.0153    0.9318
    0.2722    0.7468    0.4660
    0.1988    0.4451    0.4186

    0.8462    0.6721    0.6813
    0.5252    0.8381    0.3795
    0.2026    0.0196    0.8318

Supplied Submit and Decode Functions

The toolbox provides several submit and decode functions for use with the generic scheduler interface. These files are in the directory

matlabroot/toolbox/distcomp/examples/integration

In this directory are subdirectories for each of several types of scheduler, containing wrappers, submit functions, and decode functions for distributed and parallel jobs. For example, the directory matlabroot/toolbox/distcomp/examples/integration/pbs contains the following files for use with a PBS scheduler:

Filename                 Description
pbsSubmitFcn.m           Submit function for a distributed job
pbsDecodeFunc.m          Decode function for a distributed job
pbsParallelSubmitFcn.m   Submit function for a parallel job
pbsParallelDecode.m      Decode function for a parallel job
pbsWrapper.sh            Script that is submitted to PBS to start workers that evaluate the tasks of a distributed job
pbsParallelWrapper.sh    Script that is submitted to PBS to start labs that evaluate the tasks of a parallel job

Depending on your network and cluster configuration, you might need to modify these files before they will work in your situation. Ask your system administrator for help.

At the time of publication, there are directories for PBS schedulers (pbs), Platform LSF® schedulers (lsf), generic UNIX®-based scripts (ssh), Sun Grid Engine (sge), and mpiexec on Microsoft® Windows® operating systems (winmpiexec). In addition, the pbs and lsf directories have subdirectories called nonshared, which contain scripts for use when the client and cluster computers do not share a file system. Each of these subdirectories contains a file called README, which provides instructions on how to use its scripts.

As more files or solutions might become available at any time, visit the support page for this product on the MathWorks Web site at http://www.mathworks.com/support/product/product.html?product=DM. This page also provides contact information in case you have any questions.
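Because you might need to modify the supplied files, one cautious approach is to copy them into a directory of your own and add that directory to the client path. The sketch below does this for the pbs directory named above; the destination folder under tempdir is a hypothetical choice, and later releases might not use this directory layout, in which case the script only reports that the directory is missing.

```matlab
% Directory layout described in this section; later releases may differ.
integrationDir = fullfile(matlabroot, 'toolbox', 'distcomp', 'examples', 'integration');
pbsDir = fullfile(integrationDir, 'pbs');
myDir  = fullfile(tempdir, 'my_pbs_integration');   % hypothetical working copy

if exist(pbsDir, 'dir') == 7
    if exist(myDir, 'dir') ~= 7
        mkdir(myDir);
    end
    copyfile(pbsDir, myDir);   % copies the folder's contents
    addpath(myDir);            % path change lasts for this session only
    fprintf('Copied %s to %s and added it to the path.\n', pbsDir, myDir);
else
    fprintf('No PBS integration directory at %s in this release.\n', pbsDir);
end
```

Adding the copy to the client path makes the submit function available to the client; the decode function must still be reachable on every worker, as described in Identifying File Name and Location.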

Summary

The following list summarizes the sequence of events that occur when running a job that uses the generic scheduler interface:

  1. Provide a submit function and a decode function. Be sure the decode function is on all the MATLAB workers' paths.

The following steps occur in the MATLAB client session:

  1. Define the SubmitFcn property of your scheduler object to point to the submit function.

  2. Send your job to the scheduler.

    submit(job)
  3. The client session runs the submit function.

  4. The submit function sets environment variables with values derived from its arguments.

  5. The submit function makes calls to the scheduler — generally, a call for each task (with environment variables identified explicitly, if necessary).

The following step occurs in your network:

  1. For each task, the scheduler starts a MATLAB worker session on a cluster node.

The following steps occur in each MATLAB worker session:

  1. The MATLAB worker automatically runs the decode function, finding it on the path.

  2. The decode function reads the pertinent environment variables.

  3. The decode function sets the properties of its argument object with values from the environment variables.

  4. The MATLAB worker uses these object property values in processing its task without your further intervention.

  


© 1984-2008 The MathWorks, Inc.