
Quick Start Parallel Computing for Statistics Toolbox

What Is Parallel Statistics Functionality?

You can use any of the Statistics Toolbox functions with Parallel Computing Toolbox constructs such as parfor and spmd. However, some functions, such as those with interactive displays, can lose functionality in parallel. In particular, displays and interactive usage are not effective on workers (see Vocabulary for Parallel Computation).
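As a minimal sketch of calling a Statistics Toolbox function inside parfor, the loop below fits a straight line to each column of a synthetic data matrix with regress. The data, the column count, and the variable names are illustrative assumptions. If no pool is open, parfor runs the loop serially on the client.

```matlab
% Sketch (synthetic data): fit a straight line to each column of DATA
% with regress, a Statistics Toolbox function, inside a parfor loop.
rng(0);                                  % reproducible synthetic data
nCols = 8;
x = (1:100)';
X = [ones(100,1) x];                     % design matrix: intercept and slope
data = x*3*ones(1,nCols) + randn(100,nCols);   % true slope is 3 in every column
slopes = zeros(1,nCols);
parfor k = 1:nCols
    b = regress(data(:,k), X);           % b(1) = intercept, b(2) = slope
    slopes(k) = b(2);                    % sliced output: one entry per iteration
end
disp(slopes)
```

Each iteration only reads its own column and writes its own element of slopes, so the iterations are independent, which is what parfor requires.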

Additionally, several functions, including bootstrp, crossval, jackknife, and TreeBagger, are enhanced to use parallel computing internally. These functions use parfor to parallelize their calculations.

This section gives the simplest way to use these enhanced functions in parallel. For more advanced topics, including reproducibility and nested parfor loops, see the other sections in this chapter.

For information on parallel statistical computing at the command line, enter

help parallelstats

How To Compute in Parallel

To have a function compute in parallel:

  1. Open matlabpool

  2. Set the UseParallel Option to 'always'

  3. Call the Function Using the Options Structure

Open matlabpool

To run a statistical computation in parallel, first set up a parallel environment.

Multicore. For a multicore machine, enter the following at the MATLAB command line:

matlabpool open n

n is the number of workers you want to use.
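When a script may be run more than once, it can help to open a pool only if one is not already open. The sketch below assumes four workers; nWorkers is an illustrative choice, not a recommendation. It relies on matlabpool('size') returning 0 when no pool is open.

```matlab
% Sketch: open a pool only if none is open yet.
% nWorkers = 4 is an assumption; use at most the number of cores you have.
nWorkers = 4;
if matlabpool('size') == 0               % 0 means no pool is currently open
    matlabpool('open', nWorkers);        % function form of: matlabpool open 4
end
fprintf('The pool has %d workers.\n', matlabpool('size'));
```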

Network. If you have multiple processors on a network, use Parallel Computing Toolbox functions and MATLAB Distributed Computing Server™ software to establish parallel computation. Make sure that your system is configured properly for parallel computing. Check with your system administrator, or refer to the Parallel Computing Toolbox documentation or the MATLAB Distributed Computing Server Administrator Guide.

Many parallel statistical functions call another function, which can be one you define in a file. For example, jackknife calls a function (jackfun) that can be a built-in function such as corr, or a function you define. Built-in functions are available to all workers. However, you must take extra steps to make a function file that you define accessible to the workers.
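For the built-in case, a minimal sketch of jackknife with corr as jackfun might look like the following. The data are synthetic and illustrative, and the sketch assumes a pool is already open so that the 'Options' setting has workers to use.

```matlab
% Sketch: jackknife with a built-in jackfun (corr), run in parallel.
% x and y are synthetic illustrative data.
rng(0);
x = randn(30,1);
y = 2*x + randn(30,1);
paroptions = statset('UseParallel','always');
% One correlation per left-out observation, so jackstat is 30-by-1
jackstat = jackknife(@corr, x, y, 'Options', paroptions);
```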

To place a function file on the path of all workers, and check that it is accessible:

  1. At the command line, enter

    matlabpool open conf

    or

    matlabpool open conf n

    where conf is your configuration, and n is the number of workers you want to use.

  2. If network_file_path is the network path to the folder that contains your function file, enter

    pctRunOnAll('addpath network_file_path')

    so that the workers can access your function file.

  3. Check whether the file is on the path of every worker by entering:

    pctRunOnAll('which filename')

    If any worker does not have a path to the file, it reports:

    filename not found.
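When the folder and file names are held in variables, the command strings for pctRunOnAll can be built with sprintf, which avoids hand-writing nested quotes. In this sketch, '/shared/mfiles' and 'mystat' are hypothetical names; substitute your own. It assumes a pool is already open.

```matlab
% Sketch: add a shared folder to every worker's path, then check a file on it.
% sharedDir and fileName are hypothetical; replace them with your own values.
sharedDir = '/shared/mfiles';
fileName  = 'mystat';
addCmd   = sprintf('addpath(''%s'')', sharedDir);   % builds: addpath('/shared/mfiles')
whichCmd = sprintf('which %s', fileName);           % builds: which mystat
pctRunOnAll(addCmd);     % runs on the client and on every worker
pctRunOnAll(whichCmd);   % each worker prints the file location or "not found"
```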

Set the UseParallel Option to 'always'

Create an options structure with the statset function. To run in parallel, set the UseParallel option to 'always':

paroptions = statset('UseParallel','always');
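You can query an options structure with statget, and create a serial copy by passing the existing structure back to statset. The sketch below assumes 'never' as the value that turns parallel computation off; seroptions is an illustrative name.

```matlab
% Sketch: inspect the UseParallel setting and make a serial copy.
paroptions = statset('UseParallel','always');
current = statget(paroptions, 'UseParallel');              % query the current setting
seroptions = statset(paroptions, 'UseParallel', 'never');  % same options, parallel off
```

Keeping both structures lets you switch a call between parallel and serial by changing only the 'Options' argument.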

Call the Function Using the Options Structure

Call your function with syntax that uses the options structure. For example:

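% Example data: Fisher iris measurements (ships with Statistics Toolbox)
load fisheriris                  % loads meas and species
x = meas(:,1:3);                 % predictors for crossval
y = meas(:,4);                   % response for crossval and bootstrp
spec = species;                  % class labels for TreeBagger
regf = @(XTRAIN,ytrain,XTEST) XTEST*regress(ytrain,XTRAIN);   % linear prediction function

% Run crossval in parallel
cvMse = crossval('mse',x,y,'predfun',regf,'Options',paroptions);

% Run bootstrp in parallel
sts = bootstrp(100,@(x)[mean(x) std(x)],y,'Options',paroptions);

% Run TreeBagger in parallel
b = TreeBagger(50,meas,spec,'OOBPred','on','Options',paroptions);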

For more complete examples of parallel statistical functions, see Example: Parallel Treebagger and Examples of Parallel Statistical Functions.

After you have finished computing in parallel, close the parallel environment:

matlabpool close
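If an error interrupts the computation, the pool can stay open. One way to guard against this, sketched below, is to wrap the work in a function and register the close with onCleanup, which runs when the function exits for any reason. The function name runParallelBootstrap is hypothetical, and the sketch assumes no pool is open before it is called.

```matlab
function runParallelBootstrap(y)
% RUNPARALLELBOOTSTRAP  Hypothetical helper: bootstrap the mean and standard
% deviation of Y in parallel, closing the pool even if an error occurs.
matlabpool open 2
cleanupObj = onCleanup(@() matlabpool('close'));   % runs on normal exit or error
paroptions = statset('UseParallel','always');
sts = bootstrp(100, @(x)[mean(x) std(x)], y, 'Options', paroptions);
disp(mean(sts))                                    % average bootstrap estimates
end
```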

Example: Parallel Treebagger

To run the example "Workflow Example: Regression of Insurance Risk Rating for Car Imports with TreeBagger" in parallel:

  1. Set up the parallel environment to use two cores:

    matlabpool open 2
    
    Starting matlabpool using the 'local' configuration ...
     connected to 2 labs.
  2. Set the options to use parallel processing:

    paroptions = statset('UseParallel','always');
  3. Load the problem data and separate it into input and response:

    load imports-85;
    Y = X(:,1);
    X = X(:,2:end);
  4. Estimate feature importance using leaf size 1 and 1000 trees in parallel. Time the function for comparison purposes:

    tic
    b = TreeBagger(1000,X,Y,'Method','r','OOBVarImp','on',...
        'cat',16:25,'MinLeaf',1,'Options',paroptions);
    toc
    
    Elapsed time is 37.357930 seconds.
  5. Perform the same computation in serial for timing comparison:

    tic
    b = TreeBagger(1000,X,Y,'Method','r','OOBVarImp','on',...
        'cat',16:25,'MinLeaf',1); % Omitting 'Options' runs serially
    toc
    
    Elapsed time is 63.921864 seconds.

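    In this run, computing in parallel took about 58% of the serial time (37.4 s versus 63.9 s), that is, less than 60%. The actual speedup depends on your hardware and the number of workers.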
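To compare the two runs directly, you can store the elapsed times returned by toc instead of printing them. This sketch repeats the steps above in one self-contained script; tPar, tSer, bPar, and bSer are illustrative names, and it assumes a pool is already open.

```matlab
% Sketch: time the parallel and serial TreeBagger runs and report the ratio.
load imports-85;
Y = X(:,1);
X = X(:,2:end);
paroptions = statset('UseParallel','always');

tic
bPar = TreeBagger(1000,X,Y,'Method','r','OOBVarImp','on',...
    'cat',16:25,'MinLeaf',1,'Options',paroptions);
tPar = toc;                               % elapsed seconds, parallel

tic
bSer = TreeBagger(1000,X,Y,'Method','r','OOBVarImp','on',...
    'cat',16:25,'MinLeaf',1);             % no 'Options': serial
tSer = toc;                               % elapsed seconds, serial

fprintf('Parallel/serial time ratio: %.2f (speedup %.2fx)\n', tPar/tSer, tSer/tPar);
```

With the timings reported above, the ratio is 37.36/63.92 ≈ 0.58, a speedup of about 1.71 on two workers.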

  


© 1984-2012 The MathWorks, Inc.