How to make data persist on workers between calls to parfor?

Essentially, I want to evaluate a loop of the following form as quickly as possible:
for k = 1:K
    S = 0;
    for n = 1:N
        S = S + f(a,X{n});
    end
    a = g(a,S);
end
where 'f' and 'g' are functions, 'X' is an N-by-1 cell array (~1GB) with N large, and 'a' is a small numeric array (<1kB).
Using parfor in the inner loop will speed it up, but surely it would be even faster if the data 'X' persisted on the workers between calls to parfor. So far I have failed to get codistributed arrays and spmd to work the way I'm sure they're intended to. If someone could provide an explicit example for this special case I'd be very grateful.

Answers (1)

Edric Ellis on 17 Mar 2014
This looks like it might be a good case for the Worker Object Wrapper. The examples there should show you how to use it, but basically you need to wrap 'X' and then read the wrapper's Value property inside the PARFOR loop.
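Applied to the loop in the question, the pattern might look roughly like the sketch below. It assumes WorkerObjWrapper (a File Exchange submission, not part of MATLAB itself) is on the path and that a parallel pool is available. The 'f', 'g', 'X' and 'a' defined here are made-up stand-ins for illustration only, not the asker's real functions or data.

```matlab
% Hypothetical stand-ins for the asker's f, g, X and a (illustration only).
f = @(a, x) a * sum(x(:));
g = @(a, S) a / (1 + abs(S));
N = 200; K = 5;
X = cell(N, 1);
for n = 1:N
    X{n} = rand(8);
end
a = 1;

% Wrap X once, outside the outer loop, so the same wrapper is reused
% by every PARFOR loop instead of passing X itself each time.
Xw = WorkerObjWrapper(X);

for k = 1:K
    S = 0;
    parfor n = 1:N
        Xn = Xw.Value;                  % the copy of X held on this worker
        S = S + feval(f, a, Xn{n});     % S is a PARFOR reduction variable
    end
    a = g(a, S);                        % small update done on the client
end
```

Only the small array 'a' should need to travel to the workers on each pass. Whether this beats plain PARFOR depends on how expensive 'f' is relative to the transfer cost, so it is worth timing both versions.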
  4 Comments
Max on 18 Mar 2014
Thanks for your feedback also Sean. I've read a lot of the documentation on parfor, spmd, composite/codistributed arrays and so forth, and I feel like I understand the functionality conceptually; I just can't make any of it work in practice the way I imagine it should. It would be great if you could provide an example, or at least comment on my own (bad) example below, where the parfor version takes less than a second and the spmd version takes over 15 minutes! (with 8 workers on my dual quad-core desktop)
L = 8; N = 400; testdata = rand(L,L,N);
tic
result_par = zeros(1,N);
for k = 1:10
    parfor n = 1:N
        result_par(n) = result_par(n) + norm(testdata(:,:,n));
    end
end
toc
tic
testdata_dist = distributed(testdata);
resdist = zeros(1, N, codistributor());
for k = 1:10
    for n = drange(1:N)
        resdist(n) = resdist(n) + norm(testdata_dist(:,:,n));
    end
end
result_spmd = gather(resdist);
toc
max(abs(result_par-result_spmd))
Edric Ellis on 18 Mar 2014
Here's how to use WorkerObjWrapper in your first example. (For reasons that aren't entirely clear to me, this actually slows things down. Your actual code may still show a speedup if the amount of data to be transferred becomes more significant.)
L = 8; N = 400; testdata = WorkerObjWrapper(rand(L,L,N));
tic
result_par = zeros(1,N);
for k = 1:10
    parfor n = 1:N
        v = testdata.Value;
        result_par(n) = result_par(n) + norm(v(:,:,n));
    end
end
toc
I think your second example is a bit confused about whether you should be dealing with distributed or codistributed arrays. You've made 'testdata_dist' be 'distributed', but 'resdist' is 'codistributed'. In general, you should deal only with 'distributed' outside SPMD, and 'codistributed' inside. The data types are automatically transformed when you cross the SPMD boundary. Also, for-drange should only be used inside an SPMD context. For reference, here's what I think you should be doing there, although indexing distributed arrays is rather slow, so this does not achieve good performance:
tic
testdata_dist = distributed(testdata);
resdist = distributed.zeros(1, N);
for k = 1:10
    spmd
        % inside this block, 'testdata_dist' and 'resdist'
        % are both transformed to 'codistributed'.
        for n = drange(1:N)
            resdist(n) = resdist(n) + norm(testdata_dist(:,:,n));
        end
    end
end
result_spmd = gather(resdist);
toc
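One way around the slow element-by-element indexing might be to pull each worker's local part out once and then loop over ordinary arrays. The sketch below is only that, a sketch: it assumes an open parallel pool, and the names 'localData', 'localRes', 'allRes' and 'result_local' are made up here. It relies on variables created inside an SPMD block staying on the workers between SPMD blocks, which is also the persistence the original question asks about.

```matlab
L = 8; N = 400; testdata = rand(L, L, N);
% By default the array is split along its last nonsingleton dimension,
% here the 3rd, so each worker gets a contiguous run of L-by-L slices.
testdata_dist = distributed(testdata);

spmd
    % Ordinary (non-codistributed) arrays holding only this worker's slices.
    localData = getLocalPart(testdata_dist);
    localRes  = zeros(1, size(localData, 3));
end

for k = 1:10
    spmd
        % Plain indexing into local arrays; no communication in this loop.
        for j = 1:size(localData, 3)
            localRes(j) = localRes(j) + norm(localData(:,:,j));
        end
    end
end

spmd
    % Concatenate the per-worker results in worker order on every worker.
    allRes = gcat(localRes, 2);
end
result_local = allRes{1};   % read worker 1's copy back on the client
```

On the client, 'localData' and 'localRes' are Composite objects that refer to the data held on the workers, so nothing large is sent back and forth between the SPMD blocks. In recent releases gcat goes by the name spmdCat.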
