A *reproducible* computation is one that
gives the same results every time it runs. Reproducibility is important
for:

Debugging — To correct an anomalous result, you need to reproduce the result.

Confidence — When you can reproduce results, you can investigate and understand them.

Modifying existing code — When you change existing code, you want to ensure that you do not break anything.

Generally, you do not need to ensure reproducibility for your
computation. Often, when you want reproducibility, the simplest technique
is to run in serial instead of in parallel. In serial computation
you can simply call the `rng` function as follows:

```matlab
s = rng  % Obtain the current state of the random stream
% run the statistical function
rng(s)   % Reset the stream to the previous state
% run the statistical function again, obtain identical results
```
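As a concrete sketch, the snippet below substitutes a plain `rand` call for "the statistical function" (an assumption made only for illustration) and checks that restoring the saved generator settings repeats the draws exactly.

```matlab
s = rng;               % save the current generator settings (type, seed, state)
a = rand(1,5);         % stand-in for a call to a statistical function
rng(s)                 % restore the saved settings
b = rand(1,5);         % repeat the same call
assert(isequal(a,b))   % the two draws are identical
```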

This section addresses the case when your function uses random numbers, and you want reproducible results in parallel. This section also addresses the case when you want the same results in parallel as in serial.

To run a Statistics and Machine Learning Toolbox™ function reproducibly:

1. Set the `UseSubstreams` option to `true`.
2. Set the `Streams` option to a single `RandStream` object of a type that supports substreams: `'mlfg6331_64'` or `'mrg32k3a'`. For information on these streams, see Choosing a Random Number Generator in the MATLAB® Mathematics documentation.
3. To compute in parallel, set the `UseParallel` option to `true`.
4. Call the function with the options structure.

To reproduce the computation, reset the stream, then call the function again.

To understand why this technique gives reproducibility, see How Substreams Enable Reproducible Parallel Computations.

For example, to use the `'mlfg6331_64'` stream for reproducible computation:

Create an appropriate options structure:

```matlab
s = RandStream('mlfg6331_64');
options = statset('UseParallel',true, ...
    'Streams',s,'UseSubstreams',true);
```

Run your parallel computation. For instructions, see Quick Start Parallel Computing for Statistics and Machine Learning Toolbox.

Reset the random stream:

```matlab
reset(s);
```

Rerun your parallel computation. You obtain identical results.
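The steps above can be assembled into one script. This is a minimal sketch that assumes `bootstrp` with `@mean` as the parallel computation and a small made-up data vector `x`; any toolbox function that accepts a `statset` options structure could take its place.

```matlab
% Reproducible parallel bootstrap of the sample mean (illustrative data)
s = RandStream('mlfg6331_64');                 % stream type that supports substreams
options = statset('UseParallel',true, ...
    'Streams',s,'UseSubstreams',true);
x = (1:50)';                                   % hypothetical data set

B1 = bootstrp(100,@mean,x,'Options',options);  % first run

reset(s);                                      % return the stream to its initial state
B2 = bootstrp(100,@mean,x,'Options',options);  % second run

assert(isequal(B1,B2))                         % identical bootstrap replicates
```

If no parallel pool is available, the computation runs in serial, and the two runs still match because both use the same substreams.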

For an example of a parallel computation run this reproducible way, see Reproducible Parallel Bootstrap.

A *substream* is a portion of a random stream that `RandStream` can access quickly. There
is a number `M` such that for any positive integer `k`, `RandStream` can
go to the `kM`th pseudorandom number in the stream.
From that point, `RandStream` can generate the subsequent
entries in the stream. Currently, `RandStream` has `M` = 2^72 (about
5e21) or more.
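The following sketch illustrates the jump to a substream through the `Substream` property of a `RandStream` object, here with the `'mrg32k3a'` generator. Setting `Substream` moves the stream to the start of that substream, so returning to the same substream repeats its numbers, while a different substream gives different ones.

```matlab
s = RandStream('mrg32k3a');   % generator type that supports substreams

s.Substream = 3;              % jump to the start of substream 3
a = rand(s,1,4);              % first four numbers of substream 3

s.Substream = 1;              % jump to substream 1
b = rand(s,1,4);              % first four numbers of substream 1

s.Substream = 3;              % jump to substream 3 again
c = rand(s,1,4);              % repeats the values in a

assert(isequal(a,c))          % same substream, same numbers
assert(~isequal(a,b))         % different substreams, different numbers
```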

The entries in different substreams have good statistical properties,
similar to the properties of entries in a single stream: independence,
and lack of *k*-way correlation at various lags.
The substreams are so long that you can view the substreams as being
independent streams, as in the following picture.

Two `RandStream` stream types support substreams: `'mlfg6331_64'` and `'mrg32k3a'`.

When MATLAB performs computations in parallel with `parfor`,
each worker receives loop iterations in an unpredictable order. Therefore,
you cannot predict which worker gets which iteration, and so you cannot determine
the random numbers associated with each iteration.

Substreams allow MATLAB to tie each iteration to a particular
sequence of random numbers. `parfor` gives each
iteration an index, and the iteration uses that index as its substream
number. Since the random numbers are associated with the iterations,
not with the workers, the entire computation is reproducible.
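A minimal sketch of the same mechanism in a hand-written `parfor`-loop (not a toolbox function): each iteration sets the stream's `Substream` property to its own loop index before drawing, so every value depends only on the index and matches a serial `for`-loop that does the same. The names `stream`, `par`, and `ser` are illustrative.

```matlab
stream = RandStream('mlfg6331_64');   % one stream shared by all iterations
n = 4;

par = zeros(1,n);
parfor ii = 1:n
    set(stream,'Substream',ii);       % iteration ii always uses substream ii
    par(ii) = rand(stream);
end

reset(stream);                        % start the serial pass from a clean state
ser = zeros(1,n);
for ii = 1:n
    set(stream,'Substream',ii);
    ser(ii) = rand(stream);
end

assert(isequal(par,ser))              % worker order does not affect the values
```

Without a parallel pool, `parfor` runs on the client, and the check still holds.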

To obtain reproducible results, simply reset the stream, and all the substreams generate identical random numbers when called again. This method succeeds when all the workers use the same stream, and the stream supports substreams. This concludes the discussion of how the procedure in Running Reproducible Parallel Computations gives reproducible parallel results.

A few functions generate random numbers on the client before distributing them to parallel workers. The workers do not use random numbers, and therefore operate purely deterministically. For these functions, you can run a parallel computation reproducibly using any random stream type.
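The pattern can be sketched with a hand-written loop. This is a hypothetical illustration of the idea, not the internal implementation of any toolbox function: all random numbers come from the client generator (here `'twister'`, which has no substreams), and the `parfor` body only does deterministic arithmetic on them.

```matlab
rng(0,'twister');             % any generator type works, even one without substreams
R = rand(4,3);                % all random numbers are drawn on the client
out1 = zeros(4,1);
parfor i = 1:4
    out1(i) = sum(R(i,:));    % workers do only deterministic work
end

rng(0,'twister');             % reset the client generator
R = rand(4,3);
out2 = zeros(4,1);
parfor i = 1:4
    out2(i) = sum(R(i,:));
end

assert(isequal(out1,out2))    % identical results
```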

The functions that operate this way include:

To obtain identical results, reset the random stream on the client, or the random stream you pass to the client. For example:

```matlab
s = rng  % Obtain the current state of the random stream
% run the statistical function
rng(s)   % Reset the stream to the previous state
% run the statistical function again, obtain identical results
```

While this method enables you to run reproducibly in parallel,
the results can differ from a serial computation. The reason for the
difference is that `parfor` loops run in reverse order
from `for` loops. Therefore, a serial computation
can generate random numbers in a different order than a parallel computation.
For unequivocal reproducibility, use the technique in Running Reproducible Parallel Computations.

For testing or comparison using particular random number algorithms, you must set the random number generators. How do you set these generators in parallel, or initialize streams on each worker in a particular way? Or you might want to run a computation using a different sequence of random numbers than any other you have run. How can you ensure the sequence you use is statistically independent?

Parallel Statistics and Machine Learning Toolbox functions allow you to set random
streams on each worker explicitly. For information on *creating* multiple
streams, enter `help RandStream/create` at
the command line. To create four independent streams using the `'mrg32k3a'` generator:

```matlab
s = RandStream.create('mrg32k3a','NumStreams',4,...
    'CellOutput',true);
```
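To see what this call returns, the sketch below inspects the cell array: each element is a separate `RandStream` object, and its `StreamIndex` and `NumStreams` properties record its position among the four streams.

```matlab
s = RandStream.create('mrg32k3a','NumStreams',4,...
    'CellOutput',true);       % 1-by-4 cell array of RandStream objects
for k = 1:numel(s)
    % Each stream knows its own index and the total number of streams
    fprintf('Stream %d of %d: first draw %.4f\n', ...
        s{k}.StreamIndex, s{k}.NumStreams, rand(s{k}));
end
```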

Pass these streams to a statistical function using the `Streams` option.
For example:

```matlab
parpool(4) % if you have at least 4 cores
s = RandStream.create('mrg32k3a','NumStreams',4,...
    'CellOutput',true); % create 4 independent streams
paroptions = statset('UseParallel',true,...
    'Streams',s); % set the 4 different streams
x = [randn(700,1); 4 + 2*randn(300,1)];
latt = -4:0.01:12;
myfun = @(X) ksdensity(X,latt);
pdfestimate = myfun(x);
B = bootstrp(200,myfun,x,'Options',paroptions);
```

This method of distributing streams gives each worker a different stream for the computation. However, it does not allow for a reproducible computation, because the workers perform the 200 bootstraps in an unpredictable order. If you want to perform a reproducible computation, use substreams as described in Running Reproducible Parallel Computations.

If you set the `UseSubstreams` option to `true`,
then set the `Streams` option to a single random
stream of a type that supports substreams (`'mlfg6331_64'` or `'mrg32k3a'`).
This setting gives reproducible computations.
