Code covered by the BSD License  

Highlights from
Worker Object Wrapper

5.0

5.0 | 5 ratings Rate this file 49 Downloads (last 30 days) File Size: 3.39 KB File ID: #31972

Worker Object Wrapper

by

 

27 Jun 2011 (Updated )

Simplifies managing resources such as large data within PARFOR loops and SPMD blocks

Editor's Notes:

This file was selected as MATLAB Central Pick of the Week

| Watch this File

File Information
Description

The WorkerObjWrapper is designed for situations where a piece of
data is needed multiple times inside the body of a PARFOR loop or
an SPMD block, and this piece of data is both expensive to
create, and does not need to be re-created multiple
times. Examples might include: database connection handles, large
arrays, and so on.

Consider a situation where each worker needs access to a large
but constant set of data. While this data set can be passed in to
the body of a PARFOR block, it does not persist there, and will
be transferred to each worker for each PARFOR block. For example:

largeData = generateLargeData( 5000 );
parfor ii = 1:20
  x(ii) = someFcn( largeData );
end
parfor ii = 1:20
  y(ii) = someFcn( largeData, x(ii) );
end

This could be simplified like so:

wrapper = WorkerObjWrapper( @generateLargeData, 5000 );
parfor ii = 1:20
  x(ii) = someFcn( wrapper.Value );
end
parfor ii = 1:20
  y(ii) = someFcn( wrapper.Value, x(ii) );
end

In that case, the function "generateLargeData" is evaluated only
once on each worker, and no large data is transferred from the
client to the workers. The large data is cleared from the workers
when the variable "wrapper" goes out of scope or is cleared on
the client.

Another example might be constructing a worker-specific
log-file. This can be achieved like so:

% build a function handle to open a numbered text file:
fcn = @() fopen( sprintf( 'worker_%d.txt', labindex ), 'wt' );

% opens the file handle on each worker, specifying that fclose
% will be used later to "clean up" the file handle created.
w = WorkerObjWrapper( fcn, {}, @fclose );

% Run a parfor loop, logging to disk which worker operated on which
% loop iterates
parfor ii=1:10
   fprintf( w.Value, '%d\n', ii );
end

clear w; % causes "fclose(w.Value)" to be invoked on the workers
type worker_1.txt % see which iterates worker 1 got

Required Products Parallel Computing Toolbox
MATLAB release MATLAB 8.2 (R2013b)
Tags for This File   Please login to tag files.
Please login to add a comment or rating.
Comments and Ratings (25)
03 Sep 2014 Edric Ellis

@Dan, try replacing line 136 with the following:

Map = containers.Map(uint32(0), [], 'UniformValues', false);

which is the R2009a syntax.

03 Sep 2014 Dan

@Edric - Hi. Thank you for responding so quickly! I am running a fairly old version of MATLAB, MATLAB 7.8.0 (R2009a).

When I run:
>> containers.Map( 'KeyType', 'uint32', 'ValueType', 'any' )

I get the same error I got before:

??? No constructor 'containers.Map' with matching signature found.

Thanks!

01 Sep 2014 Edric Ellis

@Dan - that error is definitely not expected. What version of MATLAB/PCT are you using? What happens if you try to execute

containers.Map('KeyType', 'uint32', 'ValueType', 'any');

in MATLAB?

30 Aug 2014 Dan

I get the following error when I run the example code
w = WorkerObjWrapper( magic(5) );

??? No constructor 'containers.Map' with matching signature found.

How do I fix the error?

Thanks

Dan

03 Mar 2014 Joao Henriques

Works well, thanks!

It does consume a fair bit amount of memory though. Since "random access constant data" seems to be the most common model for many data-intensive applications, it would be great if Matlab somehow supports read-only shared memory across labs in the future.

22 Jan 2014 Leah K

Worked perfectly for SQL JDBC connection. Thanks for your help Edric.

w = WorkerObjWrapper(@connJDBC, {'dbname','servername'}, @close);

function conn = connJDBC( databasename, servername )

s.DataReturnFormat = 'dataset';
s.NullNumberRead = 'NaN';
s.NullNumberWrite = 'NaN';
s.NullStringRead = 'null';
s.NullStringWrite = 'null';
setdbprefs(s)

conn = database(databasename,'','','com.microsoft.sqlserver.jdbc.SQLServerDriver',...
['jdbc:sqlserver://' servername ';database=' databasename ';integratedSecurity=true;']);

01 Nov 2013 Edric Ellis

Hi Matt, FYI - I updated the version here to include that fix. Cheers, Edric.

31 Oct 2013 Matt J

Hi Edric,

That seemed to work! Thanks.

31 Oct 2013 Edric Ellis

Hi Matt, calling "clear" inside SPMD is not necessary - that's not the problem. Because of the way Composites work, when they go out of scope the workers don't find out immediately, only on the next SPMD block. If you add an SPMD block to your function, that's not sufficient since that happens before the Composites are created during the WorkerObjWrapper deletion.

Next suggestion: try adding the following line

val = []; dtor = []; valdtor = [];

as the last line inside the SPMD block in workerDelete.

30 Oct 2013 Matt J

I meant to say, it is _peculiar_ that sending no command to the workers in an empty spmd block should have this effect.

30 Oct 2013 Matt J

Hi Edric. An empty spmd block does clear them, but only from the command line. Putting it at the end of the function that created the wrapper object does no good. I also tried inserting an empty spmd block in the classdef where you recommended. That also didn't affect anything.

Is it desired behavior for an empty spmd block to clear the object? It is particular that sending no command to the workers forces a clear. Wouldn't it be better to enable

spmd, clear obj; end

Why wouldn't spmd

30 Oct 2013 Edric Ellis

Hi Matt, with your latest reproduction put inside a function, I see that the workers hold on to the memory after the function returns, but an empty "spmd, end" block is sufficient to clear them. Do you not see that behaviour?

Note that the Composite causing the memory to be retained is one created in WorkerObjWrapper, not your code.

25 Oct 2013 Matt J

Hi Edric,

No, even doing

spmd, clear w, end

doesn't release memory. Also, the problem is there even in a modification of my test when SPMD is not used it all (i.e., no Composites)

[N,R,C]=deal(300,380,480);

S=sprand(C,N^2,4/N);

w = WorkerObjWrapper( @(a,b)zeros(a,b), {N^2,R});

tic;
parfor i=1:60
y{i}=S*w.Value;

end
toc;

25 Oct 2013 Edric Ellis

Hi Matt,

I see the same thing. The problem is that the memory for a Composite value is not getting cleaned up. The next "spmd" block will fix things, or you could try adding a line saying "spmd, end" at line 120 just after the call to workerDelete.

Cheers, Edric.

24 Oct 2013 Matt J

I don't know if this is a platform dependent thing, but in the following code, I am not seeing the memory allocated by w.Value clear from the workers when w goes out of scope.

function test

[N,R,C]=deal(300,380,480);

S=sprand(C,N^2,4/N);

w = WorkerObjWrapper( @(a,b)zeros(a,b)+labindex, {N^2,R});

spmd
w.Value;
end

tic;
parfor i=1:60
y{i}=S*w.Value;

end
toc;

end

After running the above, the Task Manager shows about 3GB more memory usage. This is on a

Dell Precision T7500
Intel Xeon X5680 3.33Ghz
dual hexacore

19 Sep 2013 Edric Ellis

Hi Joe, There's a bug in WorkerObjWrapper that I'll submit a fix for - but in the meantime you can try running

spmd, ?WorkerObjWrapper; end

before doing anything else and that should fix things for you.

18 Sep 2013 Joe

Great file! It's improved the speed of the parfor loop in a simulation I've written by 30%.

One issue: the first time I run this in my program, I get the following error:

Error using WorkerObjWrapper>(spmd body) (line 161)
You cannot get the 'Map' property of 'WorkerObjWrapper'.
You cannot get the 'Map' property of 'WorkerObjWrapper'.

This error occurs once for every worker I have, and aborts my script. When I then run my script again (no changes), WorkerObjWrapper works wonderfully and no errors occur. I'm calling it in this form (simplified):

tsteps = 1:10000;
xpoints = 1:100;
for jj = tsteps
% omitting stuff here that generates a large matrix of kk=6 different quantities ("bigMatrix(ii,kk)") that evolves in time and needs to be checked for each time step at each x point.

bigMatrixWrapped = WorkerObjWrapper(bigMatrix);

parfor ii = xpoints
bigMatrixW = bigMatrixWrapped.Value;
results(ii,jj) = processResults(bigMatrixW(ii));
end
end
......

The error occurs on the line "bigMatrixWrapped = WorkerObjWrapper(bigMatrix);" so I'm not even sure that the structure of my parfor loop matters, unless for some reason WorkerObjWrapper doesn't like being called repeatedly. But then running the whole script a second time works, with no "Map" error (I'm not even sure what that means) and the expected results collected in the "results(ii,jj)" matrix.

Any ideas?

13 Jun 2013 Mohsen

Thanks for the nice file!

However, I have tried to use it in my code in the following way and it resulted in an increase of the simulation running time compared to the non-parallelized version of the code.

I am dealing with large arrays and data files and have to calculate some statistics for a very large number of different cases.

In the initial non-parallelized version of my code, I calculate the statistics for case 1 to case 10^7 inside a for loop.

Non-parallelized Code:

... "read many files"
... "generate large arrays"

for i:1=10^7
... "Calculate statistics"
end

... "Write statistics in a text file"

In the parallelized version of the code, I use a PARFOR loop. However, I cannot have all the codes to calculate the statistics directly inside the PARFOR loop due to Matlab restrictions and errors. So, I had to create a new function called STAT and copy all the codes to calculate the statistics in this function.

Parallelized Code:

... "read many files"
... "generate large arrays"

w1=WorkerObjWrapper(Large_Array1);
w2=WorkerObjWrapper(Large_Array2);
w3=WorkerObjWrapper(Large_Array3);

parfor i:1=10^7

STAT(w1.Value, w2.Value, w3.Value, arg1, arg2, arg3, arg4);
..."slice STAT"

end

... "Write statistics in a text file"

The problem with my parallelized version of the code is that it takes much longer than the non-parallelized version. Inside the PARFOR loop, I have to call a function (STAT) and pass many large arrays (w1.Value) at each iteration.

Does anyone know what is the best way to optimize/parallelize this code?

Many thanks!

04 Mar 2013 Matt J

That's a fair point Sebastian. My only thought is that, somehow, SPMD manages to make this kind of thing possible, via codistributed arrays, so I wonder why PARFOR cannot.

04 Mar 2013 Sebastian

Maybe I'm wrong, but I'll try to explain better. Generally in a parfor loop called several times I have two data types. A constantly changing and one that is fixed over a parfor loop cycle.
For example:
a = rand (x2, 1);
b = rand (x2, 1);

for i1 = 1: x1

parfor i2 = 1: x2

a (i2) = fun1 (a (i2), b (i2));

end

a = fun2 (a, b);

end

It would be interesting that the sliced variable "b" to be persistent and that the sliced variable "a" the drive automatically by the parfor.
It is now certain that the remaining portion of b stored in each worker corresponds to the part that is sent from "a" to the worker in each call to parfor loop?
That is my question.

Sorry my english.

04 Mar 2013 Matt J

Hi Sebastian,

Not sure what you meant by "distributed the data in different way in each parfor call" or how it was relevant to my Comment. I was asking for a way to make the data slicing persist across parfor calls. This implies that the data would be distributed the same way in each parfor, not in different ways.

04 Mar 2013 Sebastian

Hi Matt J.
Good idea, but I think it is difficult in the parfor loop if it distributed the data in different way in each parfor call. As far as I know, in the parfor isn't defined what information receive each worker.

19 Jan 2013 Matt J

Hi Edric,

Looks very useful, but your example showing how data can be made to persist across 2 subsequent parfor loops seems restricted to unsliced data. I was wondering if the class would allow sliced data to persist as well (without needing to be resliced).

03 Dec 2012 Michael Völker  
03 Nov 2011 Christopher Kanan  
Updates
22 Jun 2012

Minor performance improvements; ability to construct wrapper from Composite.

19 Sep 2013

Fix first-time initialization problems in R2013a and R2013b.

01 Nov 2013

Change worker cleanup to retrieve memory sooner.

Contact us