
## Quick Start Parallel Computing for Statistics and Machine Learning Toolbox

Note: To use parallel computing as described in this chapter, you must have a Parallel Computing Toolbox™ license.

### What Is Parallel Statistics Functionality?

You can use any of the Statistics and Machine Learning Toolbox™ functions with Parallel Computing Toolbox constructs such as parfor and spmd. However, some functions, such as those with interactive displays, can lose functionality in parallel. In particular, displays and interactive usage are not effective on workers (see Vocabulary for Parallel Computation).
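As an illustration of the first point, the following is a minimal sketch of calling a toolbox function inside a parfor loop. The data, the variable names, and the choice of trimmean are assumptions made for this example; they do not come from the original text. If no pool is open, parfor either starts one (depending on your parallel preferences) or runs the loop serially.

```matlab
% Hypothetical data: 200 observations of 4 variables
rng(0);
data = randn(200,4);

nRep = 8;
trimmedMeans = zeros(nRep,4);   % one row per trimming percentage

parfor k = 1:nRep
    % trimmean is a Statistics and Machine Learning Toolbox function;
    % here it excludes 5*k percent of the most extreme values per column
    trimmedMeans(k,:) = trimmean(data,5*k);
end
```

Each iteration writes only to its own row of trimmedMeans, so the iterations are independent, which is what parfor requires.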

Additionally, some functions are enhanced to use parallel computing internally; they call parfor to parallelize their calculations. Examples used later in this chapter include crossval, bootstrp, and TreeBagger.

Functions for fitting multiclass models for support vector machines and other classifiers are also enhanced to use parallel computing internally.

This chapter gives the simplest way to use these enhanced functions in parallel. For more advanced topics, including the issues of reproducibility and nested parfor loops, see the other sections in this chapter.

For information on parallel statistical computing at the command line, enter

help parallelstats

### How To Compute in Parallel

To have a function compute in parallel, complete the following three steps in order.

#### Set Up a Parallel Environment

To run a statistical computation in parallel, first set up a parallel environment.

Note: Setting up a parallel environment can take several seconds.

For a multicore machine, enter the following at the MATLAB® command line:

parpool(n)

n is the number of workers you want to use.
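If you are unsure how many workers your machine supports, one possible approach is to query the default cluster profile first. This is only a sketch: the cap of 2 workers is an arbitrary assumption, and the variable names are illustrative.

```matlab
c = parcluster;                 % cluster object for the default profile
n = min(2,c.NumWorkers);        % hypothetical cap of 2 workers

if isempty(gcp('nocreate'))     % open a pool only if none is running
    parpool(c,n);
end
```

Checking gcp('nocreate') first avoids the error that parpool raises when a pool is already open.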

#### Set the UseParallel Option to true

Create an options structure with the statset function. To run in parallel, set the UseParallel option to true:

paroptions = statset('UseParallel',true);

#### Call the Function Using the Options Structure

Call your function with syntax that uses the options structure. For example:

% Run crossval in parallel
cvMse = crossval('mse',x,y,'predfun',regf,'Options',paroptions);

% Run bootstrp in parallel
sts = bootstrp(100,@(x)[mean(x) std(x)],y,'Options',paroptions);

% Run TreeBagger in parallel
b = TreeBagger(50,meas,spec,'OOBPred','on','Options',paroptions);

For more complete examples of parallel statistical functions, see Parallel TreeBagger and Examples of Parallel Statistical Functions.
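The calls above assume that their inputs already exist in the workspace. The following sketch shows one possible setup for the crossval and TreeBagger calls. The synthetic regression data, the definition of regf, and the use of the fisheriris sample data (whose response variable is named species) are assumptions for illustration only.

```matlab
paroptions = statset('UseParallel',true);

% Hypothetical regression data: intercept column plus two predictors
rng(2);
x = [ones(50,1) randn(50,2)];
y = x*[1; 2; -1] + 0.1*randn(50,1);

% Prediction function for crossval: fit by least squares on the
% training fold, then predict the test fold
regf = @(XTRAIN,ytrain,XTEST) XTEST*regress(ytrain,XTRAIN);
cvMse = crossval('mse',x,y,'predfun',regf,'Options',paroptions);

% Classification ensemble on the fisheriris sample data
load fisheriris
b = TreeBagger(50,meas,species,'OOBPrediction','on','Options',paroptions);
```

If no pool is open when these functions run, they either start one (depending on your parallel preferences) or compute serially.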

After you have finished computing in parallel, close the parallel environment:

delete(gcp('nocreate'))

Tip: To save time, keep the pool open if you expect to compute in parallel again soon.
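Put together, the whole workflow can be sketched as follows. The sample data y is a hypothetical stand-in, and the pool uses the default size; adjust both to your own problem.

```matlab
% Hypothetical sample: 100 draws from an exponential distribution
rng(1);
y = exprnd(5,100,1);

% Step 1: set up the parallel environment, reusing a pool if one exists
pool = gcp('nocreate');
if isempty(pool)
    pool = parpool;
end

% Step 2: set the UseParallel option to true
paroptions = statset('UseParallel',true);

% Step 3: call the function using the options structure
sts = bootstrp(100,@(x)[mean(x) std(x)],y,'Options',paroptions);

% Close the pool when no further parallel work is expected
delete(pool);
```

Here sts has one row per bootstrap sample and one column per statistic returned by the function handle.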

### Parallel TreeBagger

To run the example Regression of Insurance Risk Rating for Car Imports Using TreeBagger in parallel:

1. Set up the parallel environment to use two cores:

mypool = parpool(2)
Starting parpool using the 'local' profile ... connected to 2 workers.

mypool =

Pool with properties:

AttachedFiles: {0x1 cell}
NumWorkers: 2
Cluster: [1x1 parallel.cluster.Local]
SpmdEnabled: 1

2. Set the options to use parallel processing:

paroptions = statset('UseParallel',true);

3. Load the problem data and separate it into input and response:

load imports-85;
Y = X(:,1);
X = X(:,2:end);

4. Estimate feature importance using a minimum leaf size of 1 and 1000 trees in parallel. Time the function for comparison purposes:

tic
b = TreeBagger(1000,X,Y,'Method','r','OOBVarImp','on',...
'cat',16:25,'MinLeaf',1,'Options',paroptions);
toc

Elapsed time is 16.696336 seconds.

5. Perform the same computation in serial for timing comparison:

tic
b = TreeBagger(1000,X,Y,'Method','r','OOBVarImp','on',...
'cat',16:25,'MinLeaf',1); % No options gives serial
toc

Elapsed time is 21.747950 seconds.

Computing in parallel took about 77% of the time of computing serially (16.7 seconds versus 21.7 seconds).
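The comparison follows directly from the two elapsed times reported above. As a small worked check (the variable names are illustrative):

```matlab
tParallel = 16.696336;          % seconds, from the parallel run
tSerial   = 21.747950;          % seconds, from the serial run

ratio   = tParallel/tSerial;    % fraction of the serial time
speedup = tSerial/tParallel;    % how many times faster the parallel run is

fprintf('Parallel run took %.0f%% of the serial time (speedup %.2fx).\n', ...
    100*ratio,speedup);
```

With two workers the ideal speedup would be 2, so the measured value of about 1.3 reflects overhead such as data transfer to the workers and any parts of the computation that are not parallelized. Your timings will differ depending on hardware and pool size.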