If you already have a coarse-grained application to perform, but you do not want to bother with the overhead of defining jobs and tasks, you can take advantage of the ease-of-use that pmode provides. Where an existing program might take hours or days to process all its independent data sets, you can shorten that time by distributing these independent computations over your cluster.
For example, suppose you have the following serial code:
results = zeros(1, numDataSets);
for i = 1:numDataSets
    load(['\\central\myData\dataSet' int2str(i) '.mat'])
    results(i) = processDataSet(i);
end
plot(1:numDataSets, results);
save \\central\myResults\today.mat results
The following changes make this code operate in parallel, either interactively in spmd or pmode, or in a communicating job:
results = zeros(1, numDataSets, codistributor());
for i = drange(1:numDataSets)
    load(['\\central\myData\dataSet' int2str(i) '.mat'])
    results(i) = processDataSet(i);
end
res = gather(results, 1);
if labindex == 1
    plot(1:numDataSets, res);
    print -dtiff -r300 fig.tiff;
    save \\central\myResults\today.mat res
end
Note that the length of the for iteration and the length of the codistributed array results need to match in order to index into results within a for-drange loop. This way, no communication is required between the workers. If results were simply a replicated array, as it would have been when running the original code in parallel, each worker would have assigned into its own part of results, leaving the remaining parts of results as 0. At the end, results would have been a variant array, and without explicitly calling communication functions such as labSend and labReceive, or gcat, there would be no way to get the total results back to one (or all) workers.
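The replicated-array problem described above can be illustrated with a short serial emulation. This is only a sketch that runs in plain MATLAB without the Parallel Computing Toolbox: the four workers are simulated as rows of a matrix, the even block split is assumed, and processDataSet is replaced here by a hypothetical stand-in function.

```matlab
% Serial emulation (no toolbox calls): 4 simulated workers, 8 data sets.
numWorkers  = 4;
numDataSets = 8;
processDataSet = @(i) 10 * i;      % hypothetical stand-in for the real function

% Assumed even split: each worker owns a contiguous block of iterations.
blockSize = numDataSets / numWorkers;

% Row w of "copies" plays the role of worker w's private replicated array.
copies = zeros(numWorkers, numDataSets);
for w = 1:numWorkers
    owned = (w - 1) * blockSize + (1:blockSize);   % iterations run by worker w
    for i = owned
        copies(w, i) = processDataSet(i);          % only the owned entries are set
    end
end

disp(copies)                  % every row differs: the array has become a variant
combined = sum(copies, 1);    % what an explicit collection step would have to rebuild
disp(combined)
```

Each row holds only that worker's two results and zeros elsewhere, so no single copy contains the complete answer. With a codistributed array, gather performs the collection step directly.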
When using the load function, make sure that the data files are accessible to every worker that needs them. The best practice is to use explicit paths to files on a shared file system.
Correspondingly, when using the save function, make sure that only one worker at a time saves to a particular file (on a shared file system). Thus, wrapping the code in if labindex == 1 is recommended.
Because results is distributed across the workers, this example uses gather to collect the data onto worker 1.
A worker cannot plot a visible figure, so the print function creates a viewable file of the plot.
When a for-loop over a distributed range
is executed in a communicating job, each worker performs its portion
of the loop, so that the workers are all working simultaneously. Because
of this, no communication is allowed between the workers while executing
a for-drange loop. In particular, a worker has access only to its
partition of a codistributed array. Any calculations in such a loop
that require a worker to access portions of a codistributed array
from another worker will generate an error.
To illustrate this characteristic, you can try the following example, in which one for loop works, but the other does not.
At the pmode prompt, create two codistributed arrays, one an identity matrix, the other set to zeros, distributed across four workers.
D = eye(8, 8, codistributor())
E = zeros(8, 8, codistributor())
By default, these arrays are distributed by columns; that is,
each of the four workers contains two columns of each array. If you
use these arrays in a
for-drange loop, any calculations
must be self-contained within each worker. In other words, you can
only perform calculations that are limited within each worker to the
two columns of the arrays that the workers contain.
For example, suppose you want to set each column of array E to some multiple of the corresponding column of array D:
for j = drange(1:size(D,2)); E(:,j) = j*D(:,j); end
This statement sets the j-th column of E to j times the j-th column of D. In effect, while D is an identity matrix with 1s down the main diagonal, E has the sequence 1, 2, 3, etc., down its main diagonal.
This works because each worker has access to the entire column of D and the entire column of E necessary to perform the calculation, as each worker works independently and simultaneously on two of the eight columns.
Suppose, however, that you attempt to set the values of the columns of E according to different columns of D:
for j = drange(1:size(D,2)); E(:,j) = j*D(:,j+1); end
This method fails, because when j is 2, you are trying to set the second column of E using the third column of D. These columns are stored in different workers, so an error occurs, indicating that communication between the workers is not allowed.
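To see which iterations of the failing loop cross a worker boundary, the column ownership can be worked out by hand. The following is a serial sketch in plain MATLAB, not toolbox code: it assumes the layout described above (eight columns, four workers, two columns each), and the variable names are illustrative only.

```matlab
% Serial sketch: which worker owns each column of an 8-column array
% distributed by columns across 4 workers (2 columns per worker, as above).
numWorkers    = 4;
numCols       = 8;
colsPerWorker = numCols / numWorkers;

owner = ceil((1:numCols) / colsPerWorker);   % owner(j) = worker holding column j
disp(owner)

% E(:,j) = j*D(:,j+1) needs columns j and j+1 on the same worker.
% Only j = 1..7 is checked here; j = 8 would refer to a ninth column of D,
% which does not exist in an 8-by-8 array.
crossing = find(owner(1:end-1) ~= owner(2:end));
disp(crossing)    % iterations whose source column lives on another worker
```

In this layout the iterations j = 2, 4, and 6 need a column held by the next worker, which is exactly the access that a for-drange loop does not permit.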
To use for-drange on a codistributed array, the following conditions must exist:
The codistributed array uses a 1-dimensional distribution.
The distribution complies with the default partition scheme.
The variable over which the for-drange loop is indexing provides the array subscript for the distribution dimension.
All other subscripts can be chosen freely (and can be taken from for-loops over the full range of each array dimension).
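As a rough illustration of what a default partition looks like, the sketch below splits n indices across a given number of workers in plain MATLAB. It assumes the rule that the first rem(n, numWorkers) workers receive ceil(n/numWorkers) elements and the rest receive floor(n/numWorkers); this matches my understanding of the default 1-dimensional partition, but treat it as an assumption and check the codistributor1d documentation for your release. All variable names are illustrative.

```matlab
% Sketch of an assumed default partition of n indices over numWorkers workers.
n          = 12;
numWorkers = 4;

base  = floor(n / numWorkers);   % minimum number of elements per worker
extra = rem(n, numWorkers);      % number of workers that get one extra element

partition = base * ones(1, numWorkers);
partition(1:extra) = partition(1:extra) + 1;

% First and last global index owned by each worker.
lastIdx  = cumsum(partition);
firstIdx = lastIdx - partition + 1;

for w = 1:numWorkers
    fprintf('worker %d: indices %d to %d\n', w, firstIdx(w), lastIdx(w));
end
```

A for-drange loop over 1:n assigns each worker the iterations in its own index range, which is why the loop range has to line up with a distribution of this form.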
To loop over all elements in the array, you can use for-drange on the dimension of distribution, and regular for-loops on all other dimensions. The following example executes in an spmd statement running on a parallel pool of 4 workers:
spmd
    PP = zeros(6,8,12,'codistributed');
    RR = rand(6,8,12,codistributor())
    % Default distribution:
    %   by third dimension, evenly across 4 workers.
    for ii = 1:6
        for jj = 1:8
            for kk = drange(1:12)
                PP(ii,jj,kk) = RR(ii,jj,kk) + labindex;
            end
        end
    end
end
To view the contents of the array, type:
PP