Parallel Computing Toolbox 

Handling Large Data Sets with Distributed Arrays and Parallel MATLAB Functions

For MATLAB algorithms that require processing of large data sets, Parallel Computing Toolbox provides distributed arrays and parallel functions.

Distributed arrays are special arrays that store segments of data on MATLAB workers that are participating in a parallel computation. Because distributed arrays store data on many workers, you can handle larger data sets than you can on a single MATLAB session. Distributed array data that resides on workers can be manipulated from your client MATLAB session; MATLAB automatically transmits operating instructions to workers that then work collaboratively on the arrays.

You can construct distributed arrays in several ways:

  • Using constructor functions such as rand, ones, and zeros
  • Concatenating arrays that have the same name but different data on different labs
  • Distributing a large matrix
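As a sketch, the three approaches might look like the following (assuming an open parallel pool; the sizes and values here are illustrative only):

```matlab
% 1) Constructor functions create the array directly on the workers.
D1 = rand(1000, 1000, 'distributed');    % also ones(..., 'distributed'), zeros(...)

% 2) Build codistributed pieces, each with different data, on the labs
%    inside spmd; the result is usable as a distributed array on the client.
spmd
    localPiece = labindex * ones(1000, 10);          % different data on each lab
    D2 = codistributed.build(localPiece, ...
             codistributor1d(2, 10 * ones(1, numlabs), [1000, 10 * numlabs]));
end

% 3) Distribute an existing large matrix from the client across the workers.
M = magic(1000);
D3 = distributed(M);
```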

More than 150 MATLAB functions, including linear algebra operations, are supported directly for distributed arrays, enabling you to quickly develop and prototype parallel applications without writing low-level message-passing programs. Some of the linear algebra functions are based on the ScaLAPACK library (a scalable LAPACK library designed for computer clusters). Refer to the documentation on Using MATLAB Functions on Distributed Arrays for the full list of supported functions.
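For instance (a sketch, assuming an open parallel pool), supported operations such as sum, mldivide, and norm apply to distributed arrays just as they do to ordinary arrays:

```matlab
A = rand(2000, 2000, 'distributed');   % distributed random matrix
b = sum(A, 2);                         % supported functions operate directly on A
x = A \ b;                             % mldivide is among the supported operations
res = norm(A*x - b);                   % residual is computed in parallel on the workers
xc = gather(x);                        % bring the result back to the client if needed
```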

You can also explicitly annotate sections of your code for parallel execution on several workers using the spmd (single program multiple data) construct. When MATLAB encounters an spmd statement, it sends the commands contained between spmd and end for execution on the MATLAB workers, and it ensures that the necessary variables are transferred from your workspace to those of the workers. Variables in the workspaces of the MATLAB workers can be manipulated from the MATLAB client, while the data contained in these variables continues to live remotely in the workspaces of the MATLAB workers.
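A minimal sketch of the spmd construct (hypothetical variable names, assuming an open parallel pool):

```matlab
x = 10;                  % defined in the client workspace
spmd
    % Each worker receives a copy of x transferred from the client.
    y = x + labindex;    % a different value on each worker
end
% On the client, y is a Composite: y{k} fetches worker k's value on demand,
% while the data itself remains in the workers' workspaces.
firstValue = y{1};
```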

Note how the example above can be recast using spmd-end in the example below. For details on the relationship between Composite, codistributor1d, and distributed objects and the spmd-end block, see the Single Program Multiple Data section in the Parallel Computing Toolbox documentation.

In general, very few changes are required to convert your serial program to a parallel program. Note that in the NAS Conjugate Gradient Benchmark example below, the only requirement for changing between serial and parallel code is the declaration of distributed arrays. Other operations work transparently on distributed arrays.

See the Working with Distributed Arrays section in the Parallel Computing Toolbox documentation for more information.

function CG
%CG     NAS Conjugate Gradient benchmark
%   This function is similar to the NAS CG benchmark; the pseudocode on
%   pages 19-20 of the benchmark description is translated to MATLAB below.
n = 1400; nonzer = 7; lambda = 20; niter = 15;
nz = n * (nonzer + 1) * (nonzer + 1) + n * (nonzer + 2);
% Create A, a distributed sparse random matrix

A = distributed.sprand(n, n, 0.5 * nz/n^2);
A = 0.5 * (A + A');
% I is a distributed sparse identity matrix
I = distributed.speye(n); 
A = A - lambda * I;             % Shift at lambda, an approx eigenvalue
% x is a vector of ones
x = ones(n, 1);  
%% Outer iteration: exactly the same code works for distributed and
%% non-distributed arrays.
for iter = 1 : niter
   [z, rnorm] = cgit(A, x);     % Conjugate Gradient iteration. 
   zeta = lambda + 1/(x' * z);
   x = z / norm(z);
end
%% ------------------------------------------------------------ %
function [z, rnorm] = cgit(A, x)
%CGIT  Conjugate Gradient iteration
%   [z, rnorm] = cgit(A, x) generates approximate solution to A*z = x.
z = zeros(size(x));
r = x;
rho = r' * r;
p = r;
for i = 1 : 15
   q = A * p;
   alpha = rho / (p' * q);
   z = z + alpha * p;
   rho0 = rho;
   r = r - alpha * q;
   rho = r' * r;
   beta = rho/rho0;
   p = r + beta * p;
end

rnorm = norm(x - A * z);

Working with distributed arrays: Implementation of the NAS Conjugate Gradient Benchmark. The declarations of the distributed arrays (the distributed.sprand and distributed.speye calls) are the only change required to convert the serial program to a parallel program.