MATLAB Examples

RUN MATLAB COMPUTATIONS ON A SUN GRID ENGINE CLUSTER

This demo shows how QSUB_SUBMIT_CM, QSUB_RUN_CM and QSUB_CHECK_FINISH can be used to run MATLAB computations on a cluster of UNIX/Linux machines. The actual user interface is QSUB_SUBMIT_CM. Both QSUB_RUN_CM and QSUB_CHECK_FINISH are usually only called from code that is generated in QSUB_SUBMIT_CM.

However, before describing QSUB_SUBMIT_CM, the underlying mechanisms of job execution are explained. Troubleshooting in a distributed environment can be tricky, so it is important to know where to start debugging when a particular kind of error occurs.

CREATE JOB STRUCTURE

Each computation job is stored in a job structure. Here, a very simple job is created to demonstrate how job submission, execution and results collection work. The task is to compute sin(rand(100)) and to return the result.

job.fun      = @sin;
job.job      = {rand(100)};
job.noutputs = 1;
job.ctx.path = path;

TESTING JOB EXECUTION

Before submitting jobs to a cluster, it is very important to test the correctness of job execution. There are different levels of tests:

  • Execute job.fun on a sample data set
  • Save job to a .mat file and run QSUB_RUN_CM from the current MATLAB session
  • Use the saved job and run MATLAB with the run script that would be generated by QSUB_SUBMIT_CM from a UNIX command line

TESTING JOB.FUN

The actual execution takes place in QSUB_RUN_CM around line 39ff. This code is copied here to test execution of job.fun. Since job.noutputs equals 1 in our example, the else clause of the if statement will be executed.

    if job.noutputs == 0,
        % evaluate job function - no output
        job.fun(job.job{:});
    else
        % evaluate job function - capture output
        out.out = cell(1,job.noutputs);
        [out.out{:}] = deal(job.fun(job.job{:}));
    end

The variable out should now contain a field out, which is a cell array with one member:

disp(out)
    out: {[100x100 double]}
    err: {}

It should contain the output argument of sin(rand(100)):

disp(isequalwithequalnans(out.out{1}, sin(job.job{1})))
     1
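
Since a failed job is reported through an MException stored in out.err, it can be useful to rehearse the error path locally as well. The following is a minimal sketch, assuming that errors are caught with try/catch and stored in out.err; it is not the actual QSUB_RUN_CM code.

```matlab
% Same job as above, repeated so that the sketch is self-contained
job.fun      = @sin;
job.job      = {rand(100)};
job.noutputs = 1;

% Start with empty results so that both fields always exist
out.out = {};
out.err = {};
try
    out.out = cell(1, job.noutputs);
    % Direct assignment collects all job.noutputs outputs of job.fun
    [out.out{:}] = job.fun(job.job{:});
catch err
    % Assumption: a failure is recorded as an MException in out.err
    out.err = err;
end
disp(out)
```

Unlike the deal-based line above, direct assignment to [out.out{:}] also collects several distinct outputs when job.noutputs is greater than 1.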

TESTING QSUB_RUN_CM

To test QSUB_RUN_CM, the job has to be saved to a file in MAT-file format. This file does not need to have a .mat extension. Note that the fields of the job variable are saved as individual variables in this file:

jobfilename = '/tmp/testjob.in';
save(jobfilename, '-struct','job');
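
To confirm that the fields really end up as individual variables, the file can be loaded back into a structure. This is a minimal sketch; the file name testjob_check.in is hypothetical and chosen so that the real job file is left untouched.

```matlab
% Same job as above, repeated so that the sketch is self-contained
job.fun      = @sin;
job.job      = {rand(100)};
job.noutputs = 1;

checkfile = fullfile(tempdir, 'testjob_check.in');  % hypothetical file name
save(checkfile, '-struct', 'job');

% '-mat' forces MAT-file interpretation despite the .in extension
saved = load(checkfile, '-mat');
disp(fieldnames(saved))    % should list fun, job and noutputs
delete(checkfile);         % remove the temporary file again
```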

QSUB_RUN_CM expects three filenames as input - jobfilename, outfilename and flagfilename. The first one is the existing file containing the job description and inputs. The other files will be created by QSUB_RUN_CM.

outfilename  = '/tmp/testjob.out';
flagfilename = '/tmp/testjob.flag';

Note: Running QSUB_RUN_CM will quit your MATLAB session after the job has finished! The command to run would be:

    qsub_run_cm(jobfilename, outfilename, flagfilename);

After running QSUB_RUN_CM and restarting MATLAB, the results can be loaded:

out = load(outfilename, '-mat');

disp(out)
    out: {[100x100 double]}
    err: {}

If everything went well, out.out should contain a cell array of output arguments computed by job.fun. If an error occurred, out.err contains an MException object with an error description.
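
A check along these lines could be used to tell the two cases apart. The sketch below works on a simulated result structure so that it runs without a finished job; in practice, out would come from the load call shown above.

```matlab
% Simulated result structure; in practice it would come from
%   out = load(outfilename, '-mat');
out.out = {sin(rand(100))};
out.err = {};

if isa(out.err, 'MException')
    % Job failed: show the stored error message
    fprintf('Job failed: %s\n', out.err.message);
else
    % Job succeeded: unpack the first output argument
    result = out.out{1};
    fprintf('Job returned a %dx%d result.\n', size(result, 1), size(result, 2));
end
```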

TESTING MATLAB INVOCATION

The next step is to test the script which will be created by QSUB_SUBMIT_CM to run MATLAB. The MATLAB command line is constructed in QSUB_SUBMIT_CM at line 91f:

    runpath = fileparts(which('qsub_run_cm'));
    mlcmd   = sprintf(['%s -nodisplay -r ' ...
                       '"addpath(''%s'');qsub_run_cm(''%%s'',''%%s'',''%%s'');"'], fullfile(matlabroot,'bin','matlab'), runpath);

This command contains placeholders for the three filename arguments of QSUB_RUN_CM. The actual shell script is created in QSUB_SUBMIT_CM around line 116ff:

    % Create executable shell script to run the job
    scriptname  = [jobfilename '.sh'];
    fid = fopen(scriptname,'w');
    fprintf(fid, '#!/bin/sh\n');
    fprintf(fid, mlcmd, jobfilename, outfilename, flagfilename);
    fclose(fid);
    fileattrib(scriptname, '+x');
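
The command line is formatted in two stages: sprintf fills in the MATLAB executable and the run path, and fprintf later fills in the three filenames. The escaped %%s placeholders survive the first stage as literal %s. The sketch below illustrates this with a shortened command; the path /path/to/matlab, the function name run_job and the file names are hypothetical.

```matlab
% Stage 1: %s is filled in, each %%s collapses to a literal %s
tmpl = sprintf('%s -r "run_job(''%%s'',''%%s'')"', '/path/to/matlab');
disp(tmpl)   % /path/to/matlab -r "run_job('%s','%s')"

% Stage 2: the remaining %s placeholders receive the filenames
cmd = sprintf(tmpl, 'in.mat', 'out.mat');
disp(cmd)    % /path/to/matlab -r "run_job('in.mat','out.mat')"
```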

This script can then be executed from a Linux/UNIX command line. It should open a MATLAB session, run QSUB_RUN_CM and close MATLAB again. Note that the correct invocation and shell may differ from this example, depending on the shells available on the system.

[sts, termout] = unix(sprintf('. %s',scriptname))
sts =

     0


termout =

/usr/local/matlab2009a/bin/matlab: 674: shopt: not found

                                                    < M A T L A B (R) >
                                          Copyright 1984-2009 The MathWorks, Inc.
                                        Version 7.8.0.347 (R2009a) 64-bit (glnxa64)
                                                     February 12, 2009

Warning: Duplicate directory name: /home/volkmar/matlab. 
Warning: Duplicate directory name: /usr/local/matlab2009a/toolbox/local. 
Warning: Duplicate directory name: /home/volkmar/matlab. 
 
  To get started, type one of these: helpwin, helpdesk, or demo.
  For product information, visit www.mathworks.com.
 


This script should be ready to be submitted to qsub as well:

[sts, qsubout] = unix(sprintf('qsub %s',scriptname))
sts =

     0


qsubout =

Your job 637 ("testjob.in.sh") has been submitted
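
If the job needs to be tracked by hand, the job ID can be extracted from the text returned by qsub. This is a minimal sketch that assumes the message format shown above; other Grid Engine versions or configurations may word it differently.

```matlab
% Sample text copied from the qsub output above
qsubout = 'Your job 637 ("testjob.in.sh") has been submitted';

% Extract the first number following 'Your job'
tok = regexp(qsubout, 'Your job (\d+)', 'tokens', 'once');
if isempty(tok)
    error('Could not find a job ID in the qsub output.');
end
jobid = str2double(tok{1});
disp(jobid)

% Optional, on a machine with the Grid Engine client tools installed:
% [sts, qstatout] = unix(sprintf('qstat -j %d', jobid));
```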


JOB SUBMISSION THROUGH QSUB_SUBMIT_CM

QSUB_SUBMIT_CM is the interface through which jobs are actually submitted. It saves the job inputs to a .mat file, creates the run script and invokes qsub to submit the job. In addition, it can start a MATLAB timer object to supervise the job and retrieve the computation results. To use this feature, a user-specified callback has to be provided. The simplest callback just displays the computed results.

jobname  = 'testjob';
jobdir   = '/tmp'; % This must be a rw folder on a shared network drive
finishcb = @disp;

qsub_submit_cm(job, jobdir, jobname, finishcb)

EVALUATION OF RESULTS IN FINISHCB

The job timer will be monitored by QSUB_CHECK_FINISH. When the job has finished, this function will try to load the job output file and pass its contents on to the specified callback. If the computation was successful, finishcb will be called with a cell array containing the output(s) of the computation. If the computation failed, finishcb will be called with an MException object describing the reason for failure. A simple callback could rethrow an exception or display the output:

function finishcb(out)
if isa(out, 'MException')
    rethrow(out)
else
    disp(out)
end
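
The monitoring step described in this section could be sketched with a MATLAB timer as follows. This is not the QSUB_CHECK_FINISH implementation; it assumes that the flag file appears once the job has finished, and the function name poll_job_sketch and the 10 second period are hypothetical choices. Local functions in scripts need a recent MATLAB release; on older releases, poll_job_sketch would go into its own file.

```matlab
outfilename  = '/tmp/testjob.out';
flagfilename = '/tmp/testjob.flag';
finishcb     = @disp;

% Poll every 10 seconds (the period is an arbitrary choice)
tmr = timer('ExecutionMode', 'fixedSpacing', 'Period', 10);
tmr.TimerFcn = @(t, evt) poll_job_sketch(t, flagfilename, outfilename, finishcb);
tmr.StopFcn  = @(t, evt) delete(t);   % clean up once polling stops
start(tmr);

function poll_job_sketch(tmr, flagfilename, outfilename, finishcb)
% Hypothetical stand-in for the monitoring step, not QSUB_CHECK_FINISH itself
if ~exist(flagfilename, 'file')
    return   % assumption: no flag file means the job is still running
end
stop(tmr);   % job finished: stop polling before handling the results
res = load(outfilename, '-mat');
if isa(res.err, 'MException')
    finishcb(res.err);    % failure: pass the exception on
else
    finishcb(res.out);    % success: pass the cell array of outputs
end
end
```

Stopping the timer before calling finishcb keeps a slow or failing callback from being triggered a second time.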