Using a Fully Supported Third-Party Scheduler

Creating and Running Jobs

If your network already uses Platform LSF (Load Sharing Facility), Microsoft Windows Compute Cluster Server (CCS), PBS Pro, or a TORQUE scheduler, you can use Parallel Computing Toolbox software to create jobs to be distributed by your existing scheduler. This section provides instructions for using your scheduler.

This section details the steps of a typical programming session with Parallel Computing Toolbox software for jobs distributed to workers by a fully supported third-party scheduler.

This section assumes you have an LSF, PBS Pro, TORQUE, or CCS (including HPC Server 2008) scheduler installed and running on your network. For more information about LSF, see http://www.platform.com/Products/. For more information about CCS, see http://www.microsoft.com/hpc.

The following sections illustrate how to program Parallel Computing Toolbox software to use these schedulers.

Find an LSF, PBS Pro, or TORQUE Scheduler

You use the findResource function to identify the type of scheduler and to create an object representing the scheduler in your local MATLAB client session.

You specify the scheduler type for findResource to search for with one of the following:

sched = findResource('scheduler','type','lsf')
sched = findResource('scheduler','type','pbspro')
sched = findResource('scheduler','type','torque')

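You set properties on the scheduler object to specify where the job data is stored, whether the workers access job data directly in a shared file system, and the MATLAB root directory that the workers use: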

set(sched, 'DataLocation', '\\share\scratch\jobdata')
set(sched, 'HasSharedFilesystem', true)
set(sched, 'ClusterMatlabRoot', '\\apps\matlab\')

Alternatively, you can use a parallel configuration to find the scheduler and set the object properties with a single findResource statement.
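
The configuration approach might look like the following minimal sketch. It assumes that a parallel configuration named 'myLSFConfig' (a hypothetical name) is already defined in your MATLAB client and that it stores the scheduler type and the property values you would otherwise set by hand.

```matlab
% 'myLSFConfig' is a hypothetical configuration name; substitute one that
% is defined in your own MATLAB client.
sched = findResource('scheduler', 'configuration', 'myLSFConfig');

% Confirm which values the configuration applied to the scheduler object.
get(sched, 'Type')
get(sched, 'DataLocation')
```

Keeping the settings in a configuration means the same statement works unchanged when the data location or MATLAB root changes.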

If DataLocation is not set, the default location for job data is the current working directory of the MATLAB client the first time you use findResource to create an object for this type of scheduler. All settable property values on a scheduler object are local to the MATLAB client, and are lost when you close the client session or when you remove the object from the client workspace with delete or clear all.

You can look at all the property settings on the scheduler object. If no jobs are in the DataLocation directory, the Jobs property is a 0-by-1 array.

get(sched)
                     Configuration: ''
                              Type: 'lsf'
                      DataLocation: '\\share\scratch\jobdata'
               HasSharedFilesystem: 1
                              Jobs: [0x1 double]
                 ClusterMatlabRoot: '\\apps\matlab\'
                     ClusterOsType: 'unix'
                          UserData: []
                       ClusterSize: Inf
                       ClusterName: 'CENTER_MATRIX_CLUSTER'
                        MasterName: 'masterhost.clusternet.ourdomain.com'
                   SubmitArguments: ''
   ParallelSubmissionWrapperScript: [1x92 char]
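
When you need only one or two values, you can query individual properties instead of displaying them all. The following sketch assumes an LSF scheduler is reachable from the client and reuses the data location from the example above.

```matlab
% Assumes an LSF scheduler is reachable from this client.
sched = findResource('scheduler', 'type', 'lsf');
set(sched, 'DataLocation', '\\share\scratch\jobdata');

% Query one property instead of displaying them all.
dataDir = get(sched, 'DataLocation');

% An empty Jobs property means no job data exists in DataLocation yet.
if isempty(get(sched, 'Jobs'))
    disp(['No jobs found in ' dataDir]);
end
```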

Find a CCS Scheduler

You use the findResource function to identify the CCS scheduler and to create an object representing the scheduler in your local MATLAB client session.

You specify 'ccs' as the scheduler type for findResource to search for.

sched = findResource('scheduler','type','ccs')

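You set properties on the scheduler object to specify where the job data is stored, the MATLAB root directory that the workers use, the operating system of the cluster, the name of the scheduler host, and whether to use SOA job submission: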

set(sched, 'DataLocation', '\\share\scratch\jobdata');
set(sched, 'ClusterMatlabRoot', '\\apps\matlab\');
set(sched, 'ClusterOsType', 'pc');
set(sched, 'SchedulerHostname', 'server04');
set(sched, 'UseSOAJobSubmission', false);

Alternatively, you can use a parallel configuration to find the scheduler and set the object properties with a single findResource statement.

If DataLocation is not set, the default location for job data is the current working directory of the MATLAB client the first time you use findResource to create an object for this type of scheduler. All settable property values on a scheduler object are local to the MATLAB client, and are lost when you close the client session or when you remove the object from the client workspace with delete or clear all.

You can look at all the property settings on the scheduler object. If no jobs are in the DataLocation directory, the Jobs property is a 0-by-1 array.

get(sched)
          Configuration: ''
                   Type: 'ccs'
           DataLocation: '\\share\scratch\jobdata'
    HasSharedFilesystem: 1
                   Jobs: [0x1 double]
      ClusterMatlabRoot: '\\apps\matlab\'
          ClusterOsType: 'pc'
               UserData: []
            ClusterSize: Inf
      SchedulerHostname: 'server04'
    UseSOAJobSubmission: 0

Create a Job

You create a job with the createJob function, which creates a job object in the client session. The job data is stored in the directory specified by the scheduler object's DataLocation property.

j = createJob(sched)

This statement creates the job object j in the client session. Use get to see the properties of this job object.

get(j)
       Configuration: ''
                Name: 'Job1'
                  ID: 1
            UserName: 'eng1'
                 Tag: ''
               State: 'pending'
          CreateTime: 'Fri Jul 29 16:15:47 EDT 2005'
          SubmitTime: ''
           StartTime: ''
          FinishTime: ''
               Tasks: [0x1 double]
    FileDependencies: {0x1 cell}
    PathDependencies: {0x1 cell}
             JobData: []
              Parent: [1x1 distcomp.lsfscheduler]
            UserData: []

This output varies only slightly between jobs that use LSF and CCS schedulers, but is quite different from a job that uses a job manager. For example, jobs on LSF or CCS schedulers have no callback functions.

The job's State property is pending. This state means the job has not been queued for running yet. This new job has no tasks, so its Tasks property is a 0-by-1 array.

The scheduler's Jobs property is now a 1-by-1 array of distcomp.simplejob objects, indicating the existence of your job.

get(sched, 'Jobs')
    Jobs: [1x1 distcomp.simplejob]

You can transfer files to the worker by using the FileDependencies property of the job object. Workers can access shared files by using the PathDependencies property of the job object. For details, see the FileDependencies and PathDependencies reference pages and Sharing Code.
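
As a minimal sketch of transferring a file to the workers, the following statements assume that the tasks call a function stored in myTaskFcn.m, a hypothetical M-file in the client's current directory.

```matlab
% Assumes an LSF scheduler is reachable from this client.
sched = findResource('scheduler', 'type', 'lsf');
j = createJob(sched);

% 'myTaskFcn.m' is a hypothetical M-file on the client that the tasks call.
% Files listed here are transferred to the workers with the job.
set(j, 'FileDependencies', {'myTaskFcn.m'});

% Display the property to confirm the setting.
get(j, 'FileDependencies')
```

Set this property before you submit the job, so that the file travels with the job data.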

Create Tasks

After you have created your job, you can create tasks for the job. Tasks define the functions to be evaluated by the workers during the running of the job. Often, the tasks of a job are all identical except for different arguments or data. In this example, each task will generate a 3-by-3 matrix of random numbers.

createTask(j, @rand, 1, {3,3});
createTask(j, @rand, 1, {3,3});
createTask(j, @rand, 1, {3,3});
createTask(j, @rand, 1, {3,3});
createTask(j, @rand, 1, {3,3});

The Tasks property of j is now a 5-by-1 matrix of task objects.

get(j,'Tasks')
ans =
    distcomp.simpletask: 5-by-1

Alternatively, you can create the five tasks with one call to createTask by providing a cell array of five cell arrays defining the input arguments to each task.

T = createTask(j, @rand, 1, {{3,3} {3,3} {3,3} {3,3} {3,3}});

In this case, T is a 5-by-1 matrix of task objects.
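
When the tasks differ only in their input arguments, you can build the cell array of arguments in a loop and still create all the tasks with one call. The following sketch assumes an LSF scheduler is available; the matrix sizes are arbitrary illustrative values.

```matlab
% Assumes an LSF scheduler is reachable from this client.
sched = findResource('scheduler', 'type', 'lsf');
j = createJob(sched);

% Build one inner cell array of input arguments per task.
sizes = [2 3 4];                      % hypothetical matrix sizes, one per task
args = cell(1, numel(sizes));
for k = 1:numel(sizes)
    args{k} = {sizes(k), sizes(k)};   % task k evaluates rand(n, n)
end

% One call creates numel(sizes) tasks.
T = createTask(j, @rand, 1, args);
```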

Submit a Job to the Job Queue

To run your job and have its tasks evaluated, you submit the job to the scheduler's job queue.

submit(j)

The scheduler distributes the tasks of job j to MATLAB workers for evaluation. For each task, the scheduler starts a MATLAB worker session on a worker node; this MATLAB worker session runs for only as long as it takes to evaluate the one task. If the same node evaluates another task in the same job, it does so with a different MATLAB worker session.

The job runs asynchronously with the MATLAB client. If you need to wait for the job to complete before you continue in your MATLAB client session, you can use the waitForState function.

waitForState(j)

The default state to wait for is finished. This function causes MATLAB to pause until the State property of j is 'finished'.
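
If you do not want the client to block indefinitely, you can give waitForState a state and a timeout in seconds. The following sketch assumes that j is the job submitted above; the 60-second limit is an arbitrary illustrative value.

```matlab
% Assumes j is a job that has already been submitted with submit(j).
% Wait at most 60 seconds for the job to finish.
waitForState(j, 'finished', 60);

% Check whether the job actually finished or the wait timed out.
if strcmp(get(j, 'State'), 'finished')
    disp('Job finished.');
else
    disp(['Job is still in state: ' get(j, 'State')]);
end
```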

Retrieve the Job's Results

The results of each task's evaluation are stored in that task object's OutputArguments property as a cell array. Use getAllOutputArguments to retrieve the results from all the tasks in the job.

results = getAllOutputArguments(j);

Display the results from each task.

results{1:5}

    0.9501    0.4860    0.4565
    0.2311    0.8913    0.0185
    0.6068    0.7621    0.8214

    0.4447    0.9218    0.4057
    0.6154    0.7382    0.9355
    0.7919    0.1763    0.9169

    0.4103    0.3529    0.1389
    0.8936    0.8132    0.2028
    0.0579    0.0099    0.1987

    0.6038    0.0153    0.9318
    0.2722    0.7468    0.4660
    0.1988    0.4451    0.4186

    0.8462    0.6721    0.6813
    0.5252    0.8381    0.3795
    0.2026    0.0196    0.8318
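
Before you use the results, you might want to confirm that each task completed without error. The following sketch assumes that j is the finished job from the steps above; it reads the ErrorMessage property of each task, which is empty when the task ran without error.

```matlab
% Assumes j is a finished job, as in the steps above.
results = getAllOutputArguments(j);   % one row per task
tasks = get(j, 'Tasks');

for k = 1:numel(tasks)
    msg = get(tasks(k), 'ErrorMessage');
    if isempty(msg)
        fprintf('Task %d returned a %d-by-%d matrix\n', ...
            k, size(results{k, 1}, 1), size(results{k, 1}, 2));
    else
        fprintf('Task %d failed: %s\n', k, msg);
    end
end
```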

Sharing Code

Because different machines evaluate the tasks of a job, each machine must have access to all the files needed to evaluate its tasks. The following sections explain the basic mechanisms for sharing data.

Directly Accessing Files

If all the workers have access to the same drives on the network, they can access needed files that reside on these shared resources. This is the preferred method for sharing data, as it minimizes network traffic.

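You must define each worker session's path so that it looks for files in the correct places. You can define the path by using the job's PathDependencies property or by placing a path command in one of the worker's startup files, such as startup.m.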
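
A minimal sketch of defining the worker path through the job's PathDependencies property follows. It assumes that j is a job object created with createJob and that /share/project/code is a directory all workers can reach; the directory name is hypothetical.

```matlab
% Assumes j is a job object created with createJob(sched).
% '/share/project/code' is a hypothetical directory on a shared drive.
% The directory is added to the workers' MATLAB path; no files are copied.
set(j, 'PathDependencies', {'/share/project/code'});

% Display the property to confirm the setting.
get(j, 'PathDependencies')
```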

Passing Data Between Sessions

A number of properties on task and job objects are for passing code or data from client to scheduler or worker, and back. This information could include M-code necessary for task evaluation, the input data for processing, or the output data resulting from task evaluation. These properties, such as FileDependencies, PathDependencies, JobData, and OutputArguments, are described in detail in their own reference pages.
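
As one illustration of passing data through a job property, the following sketch shows a task function that reads the job's JobData property on the worker. The function name scaleInput and the field name factor are hypothetical; getCurrentJob returns the job object for the task that the worker is evaluating.

```matlab
function y = scaleInput(x)
% scaleInput  Hypothetical task function, saved as scaleInput.m.
% Multiplies x by a factor that the client stored in the job's JobData.

job = getCurrentJob();            % job object, as seen from the worker
data = get(job, 'JobData');       % value set once on the client
y = data.factor * x;              % 'factor' is a hypothetical field name
```

On the client, you might store the data with set(j, 'JobData', struct('factor', 10)), list scaleInput.m in the job's FileDependencies property, and create a task with createTask(j, @scaleInput, 1, {5}).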

Passing M-Code for Startup and Finish

As a session of MATLAB, a worker session executes its startup.m file each time it starts. You can place the startup.m file in any directory on the worker's MATLAB path, such as toolbox/distcomp/user.

Three additional M-files, jobStartup.m, taskStartup.m, and taskFinish.m, can initialize and clean up a worker session as it begins or completes evaluations of tasks for a job.

Empty versions of these files are provided in the directory

matlabroot/toolbox/distcomp/user

You can edit these files to include whatever M-code you want the worker to execute at the indicated times.

Alternatively, you can create your own versions of these M-files and pass them to the job as part of the FileDependencies property, or include the pathnames to their locations in the PathDependencies property.

The worker gives precedence to the versions provided in the FileDependencies property, then to those pointed to in the PathDependencies property. If any of these files is not included in these properties, the worker uses the version of the file in the toolbox/distcomp/user directory of the worker's MATLAB installation.

For further details on these M-files, see the jobStartup, taskStartup, and taskFinish reference pages.
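
A custom version of one of these files might look like the following sketch of taskStartup.m. The body is purely illustrative; the worker passes in the task object that it is about to evaluate.

```matlab
function taskStartup(task)
% taskStartup  Hypothetical custom version, saved as taskStartup.m.
% The worker runs this function before it evaluates each task.

% Report which task is about to run, using the task's ID property.
disp(['Starting task ' num2str(get(task, 'ID'))]);
```

To have the workers use this version instead of the empty one in toolbox/distcomp/user, you could list taskStartup.m in the job's FileDependencies property before you submit the job.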

Managing Objects

Objects that the client session uses to interact with the scheduler are only references to data that is actually contained in the directory specified by the DataLocation property. After jobs and tasks are created, you can shut down your client session, restart it, and your job will still be stored in that remote location. You can find existing jobs using the Jobs property of the recreated scheduler object.

The following sections describe how to access these objects and how to permanently remove them.

What Happens When the Client Session Ends?

When you close the client session of Parallel Computing Toolbox software, all of the objects in the workspace are cleared. However, job and task data remains in the directory identified by DataLocation. When the client session ends, only its local reference objects are lost, not the data of the scheduler.

Therefore, if you have submitted your job to the scheduler job queue for execution, you can quit your client session of MATLAB, and the job will be executed by the scheduler. The scheduler maintains its job and task data. You can retrieve the job results later in another client session.

Recovering Objects

A client session of Parallel Computing Toolbox software can access any of the objects in the DataLocation, whether the current client session or another client session created these objects.

You create scheduler objects in the client session by using the findResource function.

sched = findResource('scheduler', 'type', 'LSF');
set(sched, 'DataLocation', '/share/scratch/jobdata');

When you have access to the scheduler by the object sched, you can create objects that reference all the data contained in the specified location for that scheduler. All the job and task data contained in the scheduler data location are accessible in the scheduler object's Jobs property, which is an array of job objects.

all_jobs = get(sched, 'Jobs')

You can index through the array all_jobs to locate a specific job.
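
For example, the following sketch lists the name and state of every job found in the data location. It assumes that sched is the scheduler object created above.

```matlab
% Assumes sched is the scheduler object from the statements above.
all_jobs = get(sched, 'Jobs');

% Print one line per job: its name and its current state.
for k = 1:numel(all_jobs)
    fprintf('%s: %s\n', get(all_jobs(k), 'Name'), get(all_jobs(k), 'State'));
end
```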

Alternatively, you can use the findJob function to search in a scheduler object for a particular job identified by any of its properties, such as its State.

finished_jobs = findJob(sched, 'State', 'finished')

This command returns an array of job objects that reference all finished jobs on the scheduler sched, whose data is found in the specified DataLocation.
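
Putting these steps together, a new client session might retrieve the results of a job that an earlier session submitted. The following sketch assumes an LSF scheduler and the data location used above; 'Job1' is the job name from the earlier example, so substitute the name of your own job.

```matlab
% New client session: recreate the scheduler object and point it at the
% same data location that was used when the job was submitted.
sched = findResource('scheduler', 'type', 'lsf');
set(sched, 'DataLocation', '/share/scratch/jobdata');

% Locate the job by name.
j = findJob(sched, 'Name', 'Job1');

% Retrieve the results only if the job exists and has finished.
if ~isempty(j) && strcmp(get(j(1), 'State'), 'finished')
    results = getAllOutputArguments(j(1));
end
```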

Destroying Jobs

Jobs in the scheduler continue to exist even after they are finished. From the command line in the MATLAB client session, you can call the destroy function for any job object. If you destroy a job, you destroy all tasks contained in that job. The job and task data is deleted from the DataLocation directory.

For example, find and destroy all finished jobs in your scheduler whose data is stored in a specific directory.

sched = findResource('scheduler', 'type', 'LSF');
set(sched, 'DataLocation', '/share/scratch/jobdata');
finished_jobs = findJob(sched, 'State', 'finished');
destroy(finished_jobs);
clear finished_jobs

The destroy function in this example permanently removes from the scheduler data those finished jobs whose data is in /share/scratch/jobdata. The clear function removes the object references from the local MATLAB client workspace.
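
When several users share one data location, you might prefer to destroy only your own finished jobs. The following sketch narrows the search with the UserName property; 'eng1' is the user name from the earlier job listing, so substitute your own.

```matlab
sched = findResource('scheduler', 'type', 'lsf');
set(sched, 'DataLocation', '/share/scratch/jobdata');

% Find only the finished jobs that belong to one user.
my_finished = findJob(sched, 'State', 'finished', 'UserName', 'eng1');

if ~isempty(my_finished)
    destroy(my_finished);   % permanently deletes the job and task data
end
clear my_finished
```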

  


© 1984-2009 The MathWorks, Inc.