Parallel speedup problem with spmd

Trevor on 6 Jul 2011
Hi, I'm having trouble getting the parallel speedup I want from spmd. A simple example of the type of serial code I want to parallelize is:
Nt = 100;
N = 1e6;
a = rand(N, 1);
b = rand(N, 1);
tic
for loop = 1 : Nt
    c = a.*b;
    % Other functions here which make the loop NOT independent
    % and which use the result of c, and change a and b
end
toc;
This takes 0.55 sec on my computer, and if I halve N to 5e5, it takes half the time (i.e. 0.28 sec). Trouble is Nt actually needs to be about 1e6 which takes too long, so I was hoping to split N between different cores. The parallel code I wrote is:
matlabpool open 'local' 2
spmd
    a_p = getLocalPart(codistributed.rand(N, 1));
    b_p = getLocalPart(codistributed.rand(N, 1));
end
tic;
spmd
    for loop = 1 : Nt
        c = a_p.*b_p;
        % Other functions here which make the loop NOT independent
        % and which use the result c, and change a and b
    end
end
toc;
On my dual-core computer, this takes 0.87 sec, which is slower than the serial version, even though N is split between the cores. Is this a communication overhead problem, and is there a way around it?
Thanks

Answers (1)

Edric Ellis on 6 Jul 2011
There is definitely some overhead to entering and leaving an SPMD block. On my machine with "matlabpool local 2", it's about 0.02 seconds. When running locally, you also need to consider that many operations in MATLAB are multithreaded behind the scenes, and this multithreading has a much lower overhead than explicit parallelism. So, it may be that if your loop body can be multithreaded, that will always beat an SPMD approach.
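One way to estimate that fixed entry/exit cost on a given machine is to time a nearly empty spmd block several times. This is only a rough sketch: it assumes a pool is already open (for example via "matlabpool local 2"), and the names nRuns, t and dummy are illustrative rather than taken from the original code.

```matlab
% Rough estimate of the fixed cost of entering and leaving an spmd block.
% Assumes a pool is already open; nRuns, t and dummy are illustrative names.
nRuns = 10;
t = zeros(nRuns, 1);
for k = 1 : nRuns
    tic;
    spmd
        dummy = labindex;   % trivial work, so the timing is mostly overhead
    end
    t(k) = toc;
end
fprintf('Median spmd entry/exit time: %.4f s\n', median(t));
```

If the measured cost is comparable to the time the loop body takes, the block is too short-lived for explicit parallelism to pay off.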
(On the machine I tested on, the SPMD version was very slightly faster than the serial version)
One other thing, I notice you're building codistributed arrays and then immediately extracting the local part - why not simply use "rand(N/2,1)" to build the arrays?
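As a minimal sketch of that suggestion, each worker can create its own share of the data directly. Using numlabs instead of a hard-coded 2 is an assumption made here so the same code works for other pool sizes, and nLocal is an illustrative name.

```matlab
% Build each worker's share directly instead of going through
% codistributed.rand followed by getLocalPart.
N = 1e6;
spmd
    % floor() guards against N not dividing evenly among the workers;
    % any remainder elements are simply dropped in this sketch.
    nLocal = floor(N / numlabs);
    a_p = rand(nLocal, 1);
    b_p = rand(nLocal, 1);
end
```

On the client, a_p and b_p are then Composite objects, with one local array per worker.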
  1 Comment
Trevor on 6 Jul 2011
Hi Edric
On my machine the time to enter and leave the SPMD block is also about 0.02 sec. In my proper serial version (not just the simple one posted here) Nt = 1e6, and the for-loop (which is not independent) takes days to run. I had hoped that by implementing a parallel approach I could split N and reduce the time significantly. I basically have an embarrassingly parallel problem, where there is essentially only one instance where data from all workers is needed and passed back. Do you know of a better way of doing this within the parallel toolbox?
I am still learning the parallel toolbox, so thanks for the suggestion on just using "rand(N/2, 1)"
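For the pattern described in this comment, one possible layout is to keep the whole loop inside a single spmd block, so the entry/exit cost is paid once, and to use gplus for the one step that needs data from every worker. The sketch below is hypothetical: the global sum and the update of a_p are placeholders for the real coupling step, which the thread does not show, and it uses the function names of that MATLAB era (numlabs, gplus).

```matlab
% Sketch: one spmd block around the whole loop, with a single global
% reduction per iteration. The reduction and the update rule below are
% placeholders, not the original algorithm.
N  = 1e6;
Nt = 100;
spmd
    nLocal = floor(N / numlabs);   % each worker's share of N
    a_p = rand(nLocal, 1);
    b_p = rand(nLocal, 1);
    for loop = 1 : Nt
        c = a_p .* b_p;            % local work on this worker's slice
        s = gplus(sum(c));         % global sum, returned on every worker
        a_p = c ./ (1 + s / N);    % placeholder update that uses the global value
    end
end
total = s{1};                      % fetch worker 1's copy on the client
```

Whether this beats the multithreaded serial loop depends on how much work each iteration does relative to the cost of the gplus call, so it is worth timing both on the actual problem.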
