Examples of Parallel Statistical Functions

Parallel Jackknife

This example is from the jackknife function reference page, but runs in parallel.

mypool = parpool()
Starting parpool using the 'local' profile ... connected to 2 workers.

mypool = 

  Pool with properties:

    AttachedFiles: {0x1 cell}
       NumWorkers: 2
      IdleTimeout: 30
          Cluster: [1x1 parallel.cluster.Local]
     RequestQueue: [1x1 parallel.RequestQueue]
      SpmdEnabled: 1

opts = statset('UseParallel',true);
sigma = 5;
rng('default')
y = normrnd(0,sigma,100,1);
m = jackknife(@var,y,1,'Options',opts);
n = length(y);
bias = -sigma^2 / n % known bias formula
jbias = (n - 1)*(mean(m)-var(y,1)) % jackknife bias estimate

bias =

   -0.2500


jbias =

   -0.3378
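
You can also use the jackknife bias estimate to correct the original statistic. The following is a minimal sketch (not part of the original example, and assuming y and jbias from the commands above are still in the workspace) that subtracts the estimated bias from the biased variance estimate:

% Jackknife bias correction: subtract the estimated bias from the original statistic
vcorrected = var(y,1) - jbias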

This simple example is not a good candidate for parallel computation; the overhead of distributing the work outweighs the small amount of computation:

% How long to compute in serial?
tic;m = jackknife(@var,y,1);toc
Elapsed time is 0.022771 seconds.

% How long to compute in parallel?
tic;m = jackknife(@var,y,1,'Options',opts);toc
Elapsed time is 0.299066 seconds.

jackknife does not use random numbers, so it gives the same results every time, whether run in parallel or in serial.
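
If you want to verify this, one quick check (assuming the pool and opts created above are still available) is to compare the serial and parallel results directly:

% Serial and parallel jackknife results are the same
mSerial   = jackknife(@var,y,1);
mParallel = jackknife(@var,y,1,'Options',opts);
isequal(mSerial,mParallel)   % returns 1 (true)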

Parallel Cross Validation

Simple Parallel Cross Validation

This example is the same as the first in the crossval function reference page, but runs in parallel.

mypool = parpool()
Starting parpool using the 'local' profile ... connected to 2 workers.

mypool = 

  Pool with properties:

    AttachedFiles: {0x1 cell}
       NumWorkers: 2
      IdleTimeout: 30
          Cluster: [1x1 parallel.cluster.Local]
     RequestQueue: [1x1 parallel.RequestQueue]
      SpmdEnabled: 1

opts = statset('UseParallel',true);

load('fisheriris');
y = meas(:,1);
X = [ones(size(y,1),1),meas(:,2:4)];
% Prediction function: fit a regression on the training data, predict the test responses
regf = @(XTRAIN,ytrain,XTEST)(XTEST*regress(ytrain,XTRAIN));

cvMse = crossval('mse',X,y,'Predfun',regf,'Options',opts)

cvMse =

    0.1028

This simple example is not a good candidate for parallel computation; the overhead of distributing the work outweighs the small amount of computation:

% How long to compute in serial?
tic;cvMse = crossval('mse',X,y,'Predfun',regf);toc
Elapsed time is 0.073438 seconds.

% How long to compute in parallel?
tic;cvMse = crossval('mse',X,y,'Predfun',regf,...
    'Options',opts);toc
Elapsed time is 0.289585 seconds.
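
Parallel execution pays off when each evaluation is expensive relative to the overhead of distributing the work. As a rough illustration (not part of the original example), you can make each crossval call more expensive, for instance by adding Monte Carlo repetitions with the 'mcreps' parameter, and then compare serial and parallel timings:

% Repeat the 10-fold cross validation 50 times so there is more work per call
tic;cvMse = crossval('mse',X,y,'Predfun',regf,'mcreps',50);toc
tic;cvMse = crossval('mse',X,y,'Predfun',regf,'mcreps',50,...
    'Options',opts);toc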

Reproducible Parallel Cross Validation

To run crossval in parallel in a reproducible fashion, set the options and reset the random stream appropriately (see Running Reproducible Parallel Computations).

mypool = parpool()

Starting parpool using the 'local' profile ... connected to 2 workers.

mypool = 

  Pool with properties:

    AttachedFiles: {0x1 cell}
       NumWorkers: 2
      IdleTimeout: 30
          Cluster: [1x1 parallel.cluster.Local]
     RequestQueue: [1x1 parallel.RequestQueue]
      SpmdEnabled: 1

s = RandStream('mlfg6331_64'); % has substreams
opts = statset('UseParallel',true,...
    'Streams',s,'UseSubstreams',true);

load('fisheriris');
y = meas(:,1);
X = [ones(size(y,1),1),meas(:,2:4)];
% Prediction function: fit a regression on the training data, predict the test responses
regf = @(XTRAIN,ytrain,XTEST)(XTEST*regress(ytrain,XTRAIN));

cvMse = crossval('mse',X,y,'Predfun',regf,'Options',opts)

cvMse =

    0.1020

Reset the stream:

reset(s)
cvMse = crossval('mse',X,y,'Predfun',regf,'Options',opts)

cvMse =

    0.1020
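
For contrast, calling crossval again without resetting the stream uses the stream state left over from the previous run, so the result is not guaranteed to match the value above:

cvMse = crossval('mse',X,y,'Predfun',regf,'Options',opts)
% run without reset(s); typically does not reproduce the 0.1020 obtained above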

Parallel Bootstrap

Bootstrap in Serial and Parallel

Here is an example timing a bootstrap in parallel versus in serial. The example generates data from a mixture of two Gaussians, constructs a nonparametric estimate of the density of the data, and uses a bootstrap to get a sense of the sampling variability of that estimate.

  1. Generate the data:

    % Generate a random sample of size 1000,
    % from a mixture of two Gaussian distributions 
    x = [randn(700,1); 4 + 2*randn(300,1)];
  2. Construct a nonparametric estimate of the density from the data:

    latt = -4:0.01:12;
    myfun = @(X) ksdensity(X,latt); 
    pdfestimate = myfun(x);
  3. Bootstrap the estimate to get a sense of its sampling variability. Run the bootstrap in serial for timing comparison.

    tic;B = bootstrp(200,myfun,x);toc
    
    Elapsed time is 10.878654 seconds.
  4. Run the bootstrap in parallel for timing comparison:

    mypool = parpool()
    Starting parpool using the 'local' profile ... connected to 2 workers.
    
    mypool = 
    
      Pool with properties:
    
        AttachedFiles: {0x1 cell}
           NumWorkers: 2
          IdleTimeout: 30
              Cluster: [1x1 parallel.cluster.Local]
         RequestQueue: [1x1 parallel.RequestQueue]
          SpmdEnabled: 1
    
    opt = statset('UseParallel',true);
    tic;B = bootstrp(200,myfun,x,'Options',opt);toc
    
    Elapsed time is 6.304077 seconds.

    Computing in parallel is about 1.7 times as fast as computing in serial for this example.

Overlay the ksdensity density estimate with the 200 bootstrapped estimates obtained in the parallel bootstrap. The spread of the bootstrapped curves gives a sense of the sampling variability of the density estimate.

hold on
for i = 1:size(B,1)
    plot(latt,B(i,:),'c:')
end
plot(latt,pdfestimate);
xlabel('x');ylabel('Density estimate')
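
To quantify the variability rather than judging it only from the plot, one option (a sketch that assumes the bootstrap matrix B and grid latt from above are in the workspace) is to compute pointwise percentile bands across the bootstrap replicates and add them to the figure:

% Pointwise 2.5th and 97.5th percentiles of the bootstrapped density estimates
prct = prctile(B,[2.5 97.5]);   % 2-by-numel(latt)
plot(latt,prct(1,:),'r--')
plot(latt,prct(2,:),'r--')

Because hold on is still in effect from the earlier commands, these lines add the percentile bands to the existing figure.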

Reproducible Parallel Bootstrap

To run the example in parallel in a reproducible fashion, set the options appropriately (see Running Reproducible Parallel Computations). First set up the problem and parallel environment as in Bootstrap in Serial and Parallel. Then set the options to use substreams along with a stream that supports substreams.

s = RandStream('mlfg6331_64'); % has substreams
opts = statset('UseParallel',true,...
    'Streams',s,'UseSubstreams',true);
B2 = bootstrp(200,myfun,x,'Options',opts);

To rerun the bootstrap and get the same result:

reset(s) % set the stream to initial state
B3 = bootstrp(200,myfun,x,'Options',opts);
isequal(B2,B3) % check if same results

ans =
     1