Code covered by the BSD License  

Rating: 5.0 (2 ratings) | 28 Downloads (last 30 days) | File Size: 3.33 KB | File ID: #31972

Worker Object Wrapper

by Edric Ellis

 

27 Jun 2011 (Updated 22 Jun 2012)

Simplifies managing resources such as large data within PARFOR loops and SPMD blocks

File Information
Description

The WorkerObjWrapper is designed for situations where a piece of
data is needed multiple times inside the body of a PARFOR loop or
an SPMD block, and this piece of data is both expensive to
create and does not need to be re-created each time it is
used. Examples might include database connection handles, large
arrays, and so on.

Consider a situation where each worker needs access to a large
but constant set of data. While this data set can be passed in to
the body of a PARFOR block, it does not persist there, and will
be transferred to each worker for each PARFOR block. For example:

largeData = generateLargeData( 5000 );
parfor ii = 1:20
  x(ii) = someFcn( largeData );
end
parfor ii = 1:20
  y(ii) = someFcn( largeData, x(ii) );
end

With WorkerObjWrapper, the repeated transfer can be avoided like so:

wrapper = WorkerObjWrapper( @generateLargeData, 5000 );
parfor ii = 1:20
  x(ii) = someFcn( wrapper.Value );
end
parfor ii = 1:20
  y(ii) = someFcn( wrapper.Value, x(ii) );
end

In that case, the function "generateLargeData" is evaluated only
once on each worker, and no large data is transferred from the
client to the workers. The large data is cleared from the workers
when the variable "wrapper" goes out of scope or is cleared on
the client.
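
The same pattern applies inside an SPMD block, which the description mentions but does not show. The following is a minimal sketch only: "generateLargeData" is the placeholder function from the example above, and it is assumed here to return a numeric array.

```matlab
% generateLargeData is the placeholder from the example above; it is
% ASSUMED to return a numeric array.
wrapper = WorkerObjWrapper( @generateLargeData, 5000 );

spmd
    % wrapper.Value refers to the copy already held by this worker,
    % so nothing large is sent from the client here.
    data = wrapper.Value;
    localSum = sum( data(:) );
end

% On the client, localSum is a Composite; index it to get worker 1's result.
disp( localSum{1} );

clear wrapper; % clears the large data from the workers
```

Copying wrapper.Value into a local variable first keeps the indexing expression simple; whether that matters for performance is not something the description states.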

Another example might be constructing a worker-specific
log-file. This can be achieved like so:

% build a function handle to open a numbered text file:
fcn = @() fopen( sprintf( 'worker_%d.txt', labindex ), 'wt' );

% opens the file handle on each worker, specifying that fclose
% will be used later to "clean up" the file handle created.
w = WorkerObjWrapper( fcn, {}, @fclose );

% Run a parfor loop, logging to disk which worker operated on which
% loop iterates
parfor ii=1:10
   fprintf( w.Value, '%d\n', ii );
end

clear w; % causes "fclose(w.Value)" to be invoked on the workers
type worker_1.txt % see which iterates worker 1 got
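
The description also lists database connection handles as a use case. The same three-argument form used for the log file could be sketched as follows, assuming Database Toolbox is available. The data source name, credentials, table name and query are hypothetical placeholders, not part of this submission.

```matlab
% ASSUMPTIONS: Database Toolbox is installed; 'myDataSource', 'myUser',
% 'myPassword' and the table "results" are hypothetical placeholders.
w = WorkerObjWrapper( @database, ...
    {'myDataSource', 'myUser', 'myPassword'}, @close );

numRows = zeros( 1, 10 );
parfor ii = 1:10
    conn = w.Value;  % connection already opened once on this worker
    query = sprintf( 'SELECT * FROM results WHERE id = %d', ii );
    rows = fetch( conn, query );
    numRows(ii) = size( rows, 1 );
end

clear w; % causes "close(w.Value)" to be invoked on the workers
```

As with the log file, each worker opens one connection and reuses it for every loop iterate it handles, rather than reconnecting per iterate.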

Required Products: Parallel Computing Toolbox
MATLAB release: MATLAB 7.12 (R2011a)
Tags: parallel computing, parfor, shared memory, spmd
Comments and Ratings (8)
13 Jun 2013 Mohsen

Thanks for the nice file!

However, I tried to use it in my code in the following way, and the simulation took longer to run than the non-parallelized version of the code.

I am dealing with large arrays and data files and have to calculate some statistics for a very large number of different cases.

In the initial non-parallelized version of my code, I calculate the statistics for case 1 to case 10^7 inside a for loop.

Non-parallelized Code:

... "read many files"
... "generate large arrays"

for i = 1:10^7
... "Calculate statistics"
end

... "Write statistics in a text file"

In the parallelized version of the code, I use a PARFOR loop. However, I cannot put all the code that calculates the statistics directly inside the PARFOR loop because of MATLAB restrictions and errors. So I had to create a new function called STAT and copy all the statistics code into it.

Parallelized Code:

... "read many files"
... "generate large arrays"

w1=WorkerObjWrapper(Large_Array1);
w2=WorkerObjWrapper(Large_Array2);
w3=WorkerObjWrapper(Large_Array3);

parfor i = 1:10^7

STAT(w1.Value, w2.Value, w3.Value, arg1, arg2, arg3, arg4);
..."slice STAT"

end

... "Write statistics in a text file"

The problem with my parallelized version of the code is that it takes much longer than the non-parallelized version. Inside the PARFOR loop, I have to call a function (STAT) and pass many large arrays (w1.Value) at each iteration.

Does anyone know what is the best way to optimize/parallelize this code?

Many thanks!

04 Mar 2013 Matt J

That's a fair point Sebastian. My only thought is that, somehow, SPMD manages to make this kind of thing possible, via codistributed arrays, so I wonder why PARFOR cannot.

04 Mar 2013 Sebastian

Maybe I'm wrong, but I'll try to explain better. Generally, in a parfor loop that is called several times, I have two kinds of data: one that changes constantly and one that stays fixed over a parfor loop cycle.
For example:
a = rand(x2, 1);
b = rand(x2, 1);

for i1 = 1:x1
    parfor i2 = 1:x2
        a(i2) = fun1(a(i2), b(i2));
    end
    a = fun2(a, b);
end

It would be useful for the sliced variable "b" to be persistent while the sliced variable "a" is still handled automatically by the parfor.
But is it certain that the portion of "b" stored on each worker corresponds to the part of "a" that is sent to that worker in each call to the parfor loop?
That is my question.

Sorry for my English.

04 Mar 2013 Matt J

Hi Sebastian,

Not sure what you meant by "distributed the data in different way in each parfor call" or how it was relevant to my Comment. I was asking for a way to make the data slicing persist across parfor calls. This implies that the data would be distributed the same way in each parfor, not in different ways.

04 Mar 2013 Sebastian

Hi Matt J.
Good idea, but I think it is difficult in the parfor loop if it distributed the data in different way in each parfor call. As far as I know, parfor does not define which data each worker receives.

19 Jan 2013 Matt J

Hi Edric,

Looks very useful, but your example showing how data can be made to persist across 2 subsequent parfor loops seems restricted to unsliced data. I was wondering if the class would allow sliced data to persist as well (without needing to be resliced).

03 Dec 2012 Michael Völker  
03 Nov 2011 Christopher Kanan  
Updates
22 Jun 2012

Minor performance improvements; ability to construct wrapper from Composite.
