Working with Distributed Arrays

How MATLAB® Software Distributes Arrays

When you distribute an array to a number of labs, MATLAB® software partitions the array into segments and assigns one segment of the array to each lab. You can partition a two-dimensional array horizontally, assigning columns of the original array to the different labs, or vertically, by assigning rows. An array with N dimensions can be partitioned along any of its N dimensions. You choose which dimension of the array is to be partitioned by specifying it in the array constructor command.

For example, to distribute an 80-by-1000 array to four labs, you can partition it either by columns, giving each lab an 80-by-250 segment, or by rows, with each lab getting a 20-by-1000 segment. If the array dimension does not divide evenly over the number of labs, MATLAB partitions it as evenly as possible.

The following example creates an 80-by-1000 replicated array and assigns it to variable A. In doing so, each lab creates an identical array in its own workspace and assigns it to variable A, where A is local to that lab. The second command distributes A, creating a single 80-by-1000 array D that spans all four labs. lab 1 stores columns 1 through 250, lab 2 stores columns 251 through 500, and so on. The default distribution is by the last nonsingleton dimension, thus, columns in this case of a 2-dimesional array.

A = zeros(80, 1000);
D = distributed(A, 'convert')
  1: localPart(D) is 80-by-250
  2: localPart(D) is 80-by-250
  3: localPart(D) is 80-by-250
  4: localPart(D) is 80-by-250

Each lab has access to all segments of the array. Access to the local segment is faster than to a remote segment, because the latter requires sending and receiving data between labs and thus takes more time.

How MATLAB® Displays a Distributed Array

The MATLAB Parallel Command Window displays the local segments of a distributed array in tabbed or tiled windows for each lab, or combined into one display as shown below. Each lab displays that part of the array that is stored in its workspace. This part of the array is said to be local to that lab. The lab index appears at the left.

1: localPart(D) =
1:     11    12
1:     21    22
1:     31    32
1:     41    42
2: localPart(D) =
2:     13    14
2:     23    24
2:     33    34
2:     43    44

When displaying larger distributed arrays, MATLAB prints out only the sizes of the local segments.

1: localPart(D) is 4-by-250
2: localPart(D) is 4-by-250
3: localPart(D) is 4-by-250
4: localPart(D) is 4-by-250

How Much Is Distributed to Each Lab

In distributing an array of N rows, if N is evenly divisible by the number of labs, MATLAB stores the same number of rows (N/numlabs) on each lab. When this number is not evenly divisible by the number of labs, MATLAB partitions the array as evenly as possible.

MATLAB provides a functions called distributionDimension and distributionPartition that you can use to determine the exact distribution of an array. See dcolon Indexing Function for more information on dcolon.

Distribution of Other Data Types

You can distribute arrays of any MATLAB built-in data type, and also numeric arrays that are complex or sparse, but not arrays of function handles or object types.

Creating a Distributed Array

You can create a distributed array in any of the following ways:

Partitioning a Larger Array

If you have a large array already in memory that you want MATLAB to process more quickly, you can partition it into smaller segments and distribute these segments to all of the labs using the distributed function. Each lab then has an array that is a fraction the size of the original, thus reducing the time required to access the data that is local to each lab.

As a simple example, the following line of code creates a 4-by-8 replicated matrix on each lab assigned to the variable A:

P>> A = [11:18; 21:28; 31:38; 41:48]
A =
    11    12    13    14    15    16    17    18
    21    22    23    24    25    26    27    28
    31    32    33    34    35    36    37    38
    41    42    43    44    45    46    47    48

The next line uses the distributed function to construct a single 4-by-8 matrix D that is distributed along the second dimension of the array:

P>> D = distributed(A, 'convert')
1: localPart(D)| 2: localPart(D)| 3: localPart(D)| 4: localPart(D)
    11    12   |     13    14   |     15    16   |     17    18
    21    22   |     23    24   |     25    26   |     27    28
    31    32   |     33    34   |     35    36   |     37    38
    41    42   |     43    44   |     45    46   |     47    48

Note that arrays A and D are the same size (4-by-8). Array A exists in its full size on each lab, while only a segment of array D exists on each lab.

P>> whos
  Name      Size          Bytes  Class

  A         4x8             256  double
  D         4x8             460  distributed

See the distributed function reference page for syntax and usage information.

Building from Smaller Arrays

The distributed function is less useful for reducing the amount of memory required to store data when you first construct the full array in one workspace and then partition it into distributed segments. To save on memory, you can construct the smaller pieces (local part) on each lab first, and then combine them into a single array that is distributed across the labs.

This example creates a 4-by-250 variant array A on each of four labs and then uses distributor to distribute these segments across four labs, creating a 4-by-1000 distributed array. Here is the variant array, A:

P>> A = [1:250; 251:500; 501:750; 751:1000] + 250 * (labindex - 1);

     LAB 1       |       LAB 2                LAB 3
  1    2 ... 250 |  251   252 ... 500 |  501   502 ... 750 | etc.
251  252 ... 500 |  501   502 ... 750 |  751   752 ...1000 | etc.
501  502 ... 750 |  751   752 ...1000 | 1001  1002 ...1250 | etc.
751  752 ...1000 | 1001  1002 ...1250 | 1251  1252 ...1500 | etc.
                 |                    |                    |

Now combine these segments into an array that is distributed across the first (or vertical) dimension. The array is now 16-by-250, with a 4-by-250 segment residing on each lab:

P>> D = distributed(A, distributor('1d',1)
1: localPart(D) is 4-by-250
2: localPart(D) is 4-by-250
3: localPart(D) is 4-by-250
4: localPart(D) is 4-by-250

P>> whos
  Name          Size          Bytes  Class

  A             4x250          8000  double
  D            16x250          8396  distributed

You could also use replicated arrays in the same fashion, if you wanted to create a distributed array whose segments were all identical to start with. See the distributed function reference page for syntax and usage information.

Using MATLAB® Constructor Functions

MATLAB provides several array constructor functions that you can use to build distributed arrays of specific values, sizes, and classes. These functions operate in the same way as their nondistributed counterparts in the MATLAB language, except that they distribute the resultant array across the labs using the specified distributor, dist.

Constructor Functions.   The distributed constructor functions are listed here. Use the dist argument (created by the distributor function: dist=distributor()) to specify over which dimension to distribute the array. See the individual reference pages for these functions for further syntax and usage information.

cell(m, n, ..., dist)
eye(m, ..., classname, dist)
false(m, n, ..., dist)
Inf(m, n, ..., classname, dist)
NaN(m, n, ..., classname, dist)
ones(m, n, ..., classname, dist)
rand(m, n, ..., dist)
randn(m, n, ..., dist)
sparse(m, n, dist)
speye(m, ..., dist)
sprand(m, n, density, dist)
sprandn(m, n, density, dist)
true(m, n, ..., dist)
zeros(m, n, ..., classname, dist)

Local Arrays

That part of a distributed array that resides on each lab is a piece of a larger array. Each lab can work on its own segment of the common array, or it can make a copy of that segment in a variant or private array of its own. This local copy of a distributed array segment is called a local array.

Creating Local Arrays from a Distributed Array

The localPart function copies the segments of a distributed array to a separate variant array. This example makes a local copy L of each segment of distributed array D. The size of L shows that it contains only the local part of D for each lab. Suppose you distribute an array across four labs:

P>> A = [1:80; 81:160; 161:240];
P>> D = distributed(A, 'convert');

P>> size(D)
ans =
     3    80

P>> L = localPart(D);
P>> size(L)
ans =
     3    20

Each lab recognizes that the distributed array D is 3-by-80. However, notice that the size of the local part, L, is 3-by-20 on each lab, because the 80 columns of D are distributed over four labs.

Creating a Distributed from Local Arrays

Use the distributed function to perform the reverse operation. This function, described in Building from Smaller Arrays, combines the local variant arrays into a single array distributed along the specified dimension.

Continuing the previous example, take the local variant arrays L and put them together as segments of a new distributed array X.

P>> X = distributed(L);
P>> size(X)
ans =
     3    80

Obtaining Information About the Array

MATLAB offers several functions that provide information on any particular array. In addition to these standard functions, there are also two functions that are useful solely with distributed arrays.

Determining Whether an Array Is Distributed

The isdistributed function returns a logical 1 (true) if the input array is distributed, and logical 0 (false) otherwise. The syntax is

P>> TF = isdistributed(D)

where D is any MATLAB array.

Determining the Dimension of Distribution

The distributionDimension function returns a number that represents the dimension of distribution of a distributed array, and the distributionPartition function returns a vector that describes how the array is partitioned along its dimension of distribution.

The syntax is

P>> distributionDimension(distributor(D))
P>> distributionPartition(distributor(D))

where D is any distributed array. For a 250-by-10 matrix distributed across four labs by columns,

P>> D = ones(250, 10, distributor())
  1: localPart(D) is 250-by-3
  2: localPart(D) is 250-by-3
  3: localPart(D) is 250-by-2
  4: localPart(D) is 250-by-2
P>> dim = distributionDimension(distributor(D))
  1: dim = 2
P>> part = distributionPartition(distributor(D))
  1: part = [3 3 2 2]

The distributionDimension(distributor(D)) value of 2 means the array is distributed by columns (dimension 2); and the distributionPartition(distributor(D)) value of [3 3 2 2] means that the first three columns reside in the lab 1, the next three columns in lab 2, the next two columns in lab 3, and the final two columns in lab 4.

Other Array Functions

Other functions that provide information about standard arrays also work on distributed arrays and use the same syntax.

numel Not Supported on Distributed Arrays.   For a distributed array, the numel function does not return the number of elements, but instead always returns a value of 1.

Changing the Dimension of Distribution

When constructing an array, you distribute the parts of the array along one of the array's dimensions. You can change the direction of this distribution on an existing array using the distributed function.

Construct an 8-by-16 distributed array D of random values having distributed columns:

P>> D = rand(8, 16, distributor());

P>> size(localPart(D))
ans =
     8     4

Create a new array from this one that has distributed rows:

P>> X = redistribute(D, distributor('1d', 1));

P>> size(localPart(X))
ans =
     2    16

Restoring the Full Array

You can restore a distributed array to its undistributed form using the gather function. gather takes the segments of an array that reside on different labs and combines them into a replicated array on all labs, or into a single array on one lab.

Distribute a 4-by-10 array to four labs along the second dimension:

P>> A = [11:20; 21:30; 31:40; 41:50]
A =
    11    12    13    14    15    16    17    18    19    20
    21    22    23    24    25    26    27    28    29    30
    31    32    33    34    35    36    37    38    39    40
    41    42    43    44    45    46    47    48    49    50

P>> D = distributed(A, 'convert')
        Lab 1     |     Lab 2     |   Lab 3   |   Lab 4
    11   12   13  | 14   15   16  |  17   18  |  19    20
    21   22   23  | 24   25   26  |  27   28  |  29    30
    31   32   33  | 34   35   36  |  37   38  |  39    40
    41   42   43  | 44   45   46  |  47   48  |  49    50
                  |               |           |
P>> size(localPart(D))
  1: ans =
  1:      4     3
  2: ans =
  2:      4     3
  3: ans =
  3:      4     2
  4: ans =
  4:      4     2

Restore the undistributed segments to the full array form by gathering the segments:

P>> X = gather(D)
X =
    11    12    13    14    15    16    17    18    19    20
    21    22    23    24    25    26    27    28    29    30
    31    32    33    34    35    36    37    38    39    40
    41    42    43    44    45    46    47    48    49    50

P>> size(X)
ans =
     4    10

Indexing into a Distributed Array

While indexing into a nondistributed array is fairly straightforward, distributed arrays require additional considerations. Each dimension of a nondistributed array is indexed within a range of 1 to the final subscript, which is represented in MATLAB by the end keyword. The length of any dimension can be easily determined using either the size or length function.

With distributed arrays, these values are not so easily obtained. For example, the second segment of an array (that which resides in the workspace of lab 2) has a starting index that depends on the array distribution. For a 200-by-1000 array with a default distribution by columns over four labs, the starting index on lab 2 is 251. For a 1000-by-200 array also distributed by columns, that same index would be 51. As for the ending index, this is not given by using the end keyword, as end in this case refers to the end of the entire array; that is, the last subscript of the final segment. The length of each segment is also not given by using the length or size functions, as they only return the length of the entire array.

The MATLAB colon operator and end keyword are two of the basic tools for indexing into nondistributed arrays. For distributed arrays, MATLAB provides a distributed version of the colon operator, called dcolon. This actually is a function, not a symbolic operator like colon.

dcolon Indexing Function

The dcolon function returns a distributed vector of length L that maps the subscripts of an equivalent array residing on the same lab configuration. An equivalent array is an array for which the distributed dimension is also of length L. For example, the subscripts of a 50-element dcolon vector are as follows:

 [1:13]  for Lab 1
[14:26]  for Lab 2
[27:38]  for Lab 3
[39:50]  for Lab 4

This vector shows how MATLAB would distribute 50 rows, columns, or any dimension of an array in a configuration having the same number of labs (four in this case). A 50-row, 10-column array, for example, with the rows distributed over four labs

P>> D = rand(50, 10, distributor('1d',1));

will have rows 1 through 13 stored on lab 1, rows 14 through 26 on lab 2, rows 27 through 38 on lab 3, and rows 39 through 50 on lab 4.

The syntax for dcolon is as follows. The step input argument is optional:

P>> V = dcolon(first, step, last)

Inputs to dcolon are shown below. Each input must be a real scalar integer value.

Input ArgumentDescription
firstNumber of the first subscript in this dimension.
stepSize of the interval between numbers in the generated sequence. Optional; the default is 1.
lastNumber of the last subscript in this dimension.

To use dcolon to index into the 50-by-10 distributed array in the previous example, first generate the vector V that shows how the 50-row dimension is partitioned. Then you can use the elements of this vector to derive the range of rows that apply to particular segments of the array. For example:

P>> V = dcolon(1, length(D))

2-Dimensional Distribution

As an alternative to distributing by a single dimension of rows or columns, you can distribute a matrix by blocks using '2d' or two-dimensional distribution. Instead of segments that comprise a number of complete rows or columns of the matrix, the segments of the distributed array are 2-dimensional square blocks.

For example, consider a simple 8-by-8 matrix with ascending element values. You can create this array in a parallel job or in pmode:

P>> A = reshape(1:64, 8, 8)

The result is the replicated array:

     1     9    17    25    33    41    49    57

     2    10    18    26    34    42    50    58

     3    11    19    27    35    43    51    59

     4    12    20    28    36    44    52    60

     5    13    21    29    37    45    53    61

     6    14    22    30    38    46    54    62

     7    15    23    31    39    47    55    63

     8    16    24    32    40    48    56    64

Suppose you want to distribute this array among four labs, with a 4-by-4 block as the local part on each lab. In this case, the lab grid is a 2-by-2 arrangement of the labs, and the block size is a square of four elements on a side (i.e., each block is a 4-by-4 square). With this information, you can define the distributor object:

P>> DIST = distributor('2d', [2 2], 4)

Now you can use this distributor object to distribute the original matrix:

P>> AA = distributed(A, DIST, 'convert')

This distributes the array among the labs according to this scheme:

If the lab grid does not perfectly overlay the dimensions of the distributed array, you can still use '2d' distribution, which is block cyclic. In this case, you can imagine the lab grid being repeatedly overlaid in both dimensions until all the original matrix elements are included.

Using the same original 8-by-8 matrix and 2-by-2 lab grid, consider a block size of 3 instead of 4, so that 3-by-3 square blocks are distributed among the labs. The code looks like this:

P>> DIST = distributor('2d', [2 2], 3)
P>> AA = distributed(A, DIST, 'convert')

The first "row" of the lab grid is distributed to lab 1 and lab 2, but that contains only six of the eight columns of the original matrix. Therefore, the next two columns are distributed to lab 1. This process continues until all columns in the first rows are distributed. Then a similar process applies to the rows as you proceed down the matrix, as shown in the following distribution scheme:

The diagram above shows a scheme that requires four overlays of the lab grid to accommodate the entire original matrix. The following pmode session shows the code and resulting distribution of data to each of the labs:

The following points are worth noting:

  


 © 1984-2008- The MathWorks, Inc.    -   Site Help   -   Patents   -   Trademarks   -   Privacy Policy   -   Preventing Piracy   -   RSS