Create distributed array from data in client workspace


D = distributed(X)


D = distributed(X) creates a distributed array from X. X can be an array stored in the MATLAB client workspace or a datastore. D is a distributed array stored in parts on the workers of the open parallel pool.

Constructing a distributed array from local data this way is appropriate only if the MATLAB client can hold all of X in memory. To construct large distributed arrays, use a constructor function that builds the array directly on the workers, such as ones(___,'distributed') or zeros(___,'distributed').

If the input argument is already a distributed array, the result is the same as the input.

Use gather to retrieve the distributed array elements from the pool back to an array in the MATLAB workspace.
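The round trip described above can be sketched as follows. This is a minimal illustration that assumes Parallel Computing Toolbox is installed and a pool can be started; the variable names X, Dagain, and Y are chosen here for illustration only.

```matlab
% Small client-side matrix used only for illustration.
X = magic(4);

% Distribute X across the workers of the parallel pool.
D = distributed(X);

% Passing an array that is already distributed returns the same data.
Dagain = distributed(D);

% isdistributed reports whether the array lives on the workers.
% If no pool could be started, D is an ordinary array and this is false.
tf = isdistributed(D);

% gather copies the elements back into the client workspace.
Y = gather(D);
disp(class(Y))
disp(isequal(X, Y))
```

Comparing the gathered result with the original using isequal is a quick way to confirm that distributing and gathering did not change the data.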


Create a small array and distribute it:

Nsmall = 50;
D1 = distributed(magic(Nsmall));

Create a large distributed array directly, using a build method:

Nlarge = 1000;
D2 = rand(Nlarge,'distributed');

Gather a distributed array back to the client, then use whos to see where each array is stored, as indicated by its Class:

D3 = gather(D2);
whos
  Name           Size           Bytes  Class

  D1            50x50             733  distributed
  D2          1000x1000           733  distributed
  D3          1000x1000       8000000  double
  Nlarge         1x1                8  double
  Nsmall         1x1                8  double

This example shows how to create and load distributed arrays using a datastore. First, create a datastore from an example data set. This data set is too small to show equal partitioning of the data over the workers. To simulate a large data set, artificially increase the size of the datastore using repmat:

files = repmat({'airlinesmall.csv'}, 10, 1);
ds = tabularTextDatastore(files);

Select the example variables:

ds.SelectedVariableNames = {'DepTime','DepDelay'};
ds.TreatAsMissing = 'NA';

Create a distributed table by reading the datastore in parallel. Partition the datastore with one partition per worker. Each worker then reads all data from the corresponding partition. The files must be in a shared location accessible from the workers.
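To get a feel for what one partition per worker means, you can split the datastore manually with partition and inspect a single piece. This is only an illustrative sketch: the worker count n = 4 is an assumption, and the code does not claim to reproduce how distributed partitions the data internally.

```matlab
% Rebuild the example datastore so that this sketch is self-contained.
files = repmat({'airlinesmall.csv'}, 10, 1);
ds = tabularTextDatastore(files);
ds.SelectedVariableNames = {'DepTime','DepDelay'};
ds.TreatAsMissing = 'NA';

% Assumed number of workers (hypothetical; use your own pool size).
n = 4;

% Take the first of n partitions, roughly the share one worker would read.
sub = partition(ds, n, 1);

% Preview a few rows of that partition on the client.
t = preview(sub);
disp(t)
```

Each of the n partitions can be read independently, which is what allows the workers to load their shares in parallel.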

dt = distributed(ds);
Starting parallel pool (parpool) using the 'local' profile ... connected to 4 workers.

Finally, display summary information about the distributed table:

summary(dt)


    DepTime: 1,235,230×1 double

            min          1
            max       2505
            NaNs    23,510

    DepDelay: 1,235,230×1 double

            min      -1036
            max       1438
            NaNs    23,510


  • A distributed array is created on the workers of the existing parallel pool. If no pool exists, distributed starts a new parallel pool, unless automatic starting of pools is disabled in your parallel preferences. If there is no parallel pool and distributed cannot start one, the result is the full array in the client workspace.
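One way to check which of these cases applies is to query the pool before and the result after calling distributed. The following is a hedged sketch that assumes Parallel Computing Toolbox is available; the messages printed are illustrative only.

```matlab
% gcp('nocreate') returns the current pool, or empty if none is open.
% It never starts a pool on its own.
pool = gcp('nocreate');
if isempty(pool)
    disp('No parallel pool is open; distributed may try to start one.');
else
    fprintf('Using an existing pool with %d workers.\n', pool.NumWorkers);
end

% Create the array, then check where it actually ended up.
D = distributed(magic(4));
if isdistributed(D)
    disp('D is stored on the pool workers.');
else
    disp('No pool was available; D is a full array on the client.');
end
```

Checking isdistributed after the call avoids assuming that a pool was started successfully.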

