Message Passing used in Calculating the Block Average of a Distributed Matrix
In parallel computing, communication between workers exacts a heavy cost, so a rule of thumb is to keep processing local. But certain array operations, such as averaging over a window, require access to the neighbors of an array element, and those neighbors may reside in the memory of a neighboring worker. In that case communication becomes necessary. However, instead of communicating with neighbors repeatedly, we can exploit the symmetry in element access and exchange all the necessary elements in one go.
In this exercise, we reduce a large image to a single smaller image that fits on our local machine. We do this in two ways, by subsampling and by averaging, and visually compare the results.
We will use a distributed array to hold the data in the memory of the participating workers. Since we want to keep processing local, for the averaging step we retrieve elements from neighboring workers in advance using the haloNodes function and construct a new, larger array that holds the local elements of the distributed array together with the elements from the neighbors. Each worker can then perform its operations independently, and we concatenate the results afterwards.
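The halo idea described above can be illustrated without a pool. The following is a minimal serial sketch, not the haloNodes implementation: it splits a small matrix by columns among pretend workers and pads each piece with the edge columns of its neighbors. All names (numWorkers, halo, parts, padded) are hypothetical, and an even column split is assumed.

```matlab
% Serial sketch of a halo exchange for a matrix split by columns.
% Names and sizes are illustrative assumptions, not part of the example.
A = magic(8);                        % small stand-in for the surface
numWorkers = 4;                      % pretend there are 4 workers
halo = 1;                            % neighbor columns needed per side
colsPer = size(A, 2) / numWorkers;   % assumes an even split

% Step 1: each "worker" owns a contiguous block of columns.
parts = cell(1, numWorkers);
for w = 1:numWorkers
    parts{w} = A(:, (w - 1) * colsPer + 1 : w * colsPer);
end

% Step 2: one exchange per neighbor gives each worker its halo columns.
padded = cell(1, numWorkers);
for w = 1:numWorkers
    left = [];
    right = [];
    if w > 1                         % first worker has no left neighbor
        left = parts{w - 1}(:, end - halo + 1 : end);
    end
    if w < numWorkers                % last worker has no right neighbor
        right = parts{w + 1}(:, 1:halo);
    end
    padded{w} = [left, parts{w}, right];
end
```

After the exchange, every worker holds all the columns its window operations touch, so no further communication is needed during the computation itself.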
Open matlabpool
if matlabpool('size') == 0
    matlabpool open 4
else
    warning('MATLAB:poolOpen', 'matlabpool already open')
end
Destroying 1 pre-existing parallel job(s) created by matlabpool that were in the finished or failed state.
Starting matlabpool using the parallel configuration 'local'.
Waiting for parallel job to start...
Connected to a matlabpool session with 4 labs.
clear all
n = 5000; % original size of the surface, n-x-n
distDim = 2; % dimension distributing over
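The matlabpool command used here was removed in later MATLAB releases in favor of parpool. For readers on a newer release, a rough equivalent of the opening step might look like the sketch below. It assumes Parallel Computing Toolbox is installed and that the default profile allows 4 workers.

```matlab
% Sketch for newer releases: open a pool of 4 workers only if none exists.
pool = gcp('nocreate');          % returns empty if no pool is running
if isempty(pool)
    pool = parpool(4);           % assumes the profile allows 4 workers
else
    warning('MATLAB:poolOpen', 'parallel pool already open')
end
```

The matching shutdown step in newer releases is delete(gcp('nocreate')).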
Create the surface to sample
spmd
    % Creating the surface to sample
    origSurf = createSurface(n,distDim); % creates a square matrix, n x n
    sizeFull = size(origSurf);
end
Averaging, every sampleRate points
sampleRate = 100;
spmd
    % uses message passing functionality to get necessary overlapping nodes
    a = haloNodes(origSurf,sampleRate);
    a = sampleFunctionAvg(a,sampleRate,sizeFull,distDim);
    a = codistributed(a,codistributor('1d', distDim));
    a = gather(a,1);
end
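One plausible reason the halo is needed here: if the 5000 columns are split evenly over 4 workers, each worker holds 1250 columns, which is not a multiple of sampleRate = 100, so some averaging windows would straddle a worker boundary. The source does not show sampleFunctionAvg, so the following is only a serial sketch of block averaging under the assumption of non-overlapping r-by-r blocks. The names A, r, blocks and avg are hypothetical.

```matlab
% Hypothetical serial sketch: average non-overlapping r-by-r blocks.
A = reshape(1:36, 6, 6);     % small stand-in for the surface
r = 3;                       % plays the role of sampleRate
[m, n] = size(A);            % assumes m and n are multiples of r

% Split rows and columns into (offset within block, block index) pairs,
% then average over the two within-block dimensions.
blocks = reshape(A, r, m / r, r, n / r);
avg = reshape(mean(mean(blocks, 1), 3), m / r, n / r);
```

For this 6-by-6 input, avg is a 2-by-2 matrix whose entry (1,1) equals the mean of A(1:3,1:3).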
Subsampling
spmd
    b = localPart(origSurf);
    b = sampleFunctionNoAvg(b,sampleRate,sizeFull,distDim);
    b = codistributed(b,codistributor('1d', distDim));
    b = gather(b,1);
end
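Subsampling needs no halo because each output value depends on exactly one input element, which is why the code above works on the local part alone. The source does not show sampleFunctionNoAvg. As a serial sketch of the idea, assuming every r-th element is kept starting from the first (names A, r and sub are hypothetical):

```matlab
% Hypothetical serial sketch: keep every r-th element in each dimension.
A = reshape(1:36, 6, 6);     % small stand-in for the surface
r = 3;                       % plays the role of sampleRate
sub = A(1:r:end, 1:r:end);   % rows 1 and 4, columns 1 and 4
```

Unlike averaging, this discards all other elements in each block, so fine detail and noise at the sampled points pass straight through to the result.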
Compare Averaging vs Subsampling
subplot(1,2,1);
imagesc(a{1});
title('Averaging every sampleRate Points');
subplot(1,2,2);
imagesc(b{1});
title('Subsampling every sampleRate Points');
Close matlabpool
if matlabpool('size') ~= 0
    matlabpool close
else
    warning('MATLAB:poolClose', 'matlabpool already closed')
end
Sending a stop signal to all the labs...
Waiting for parallel job to finish...
Performing parallel job cleanup...
Done.