Large overheads in parallel solving

philjdc
philjdc on 14 Jan 2016
Commented: philjdc on 15 Jan 2016
I am trying to distribute some simple tasks (1500-by-1500 random matrix diagonalisation) over local workers to gain some speed-up. My attempt at parallelising a simple process is below; it simply records the time taken to finish the jobs when distributed vs. when performed sequentially. However, I find the overheads seem enormous, and they seem to scale with the problem size. I don't know if I am making a simple, avoidable error.
For context: ultimately I want to distribute the problem of diagonalising a large number (~10^5) of large matrices (n-by-n with n ~ 10^5), which was very slow. So slow, in fact, that when I ran it in parpool, MATLAB would typically lose the connection with the (local) workers before they finished and crash the script. Apparently the job-task structure is more robust to this problem.
profileName = parallel.defaultClusterProfile();
cluster_prof = parcluster(profileName);
job = createJob(cluster_prof);
N = 100;                       % total number of diagonalisations
C = cell(1, N/2);              % input arguments, one cell per task
for i = 1:(N/2)
    C{i} = {1500};             % each task diagonalises a 1500x1500 matrix
end
myfun = @(n) eig(randn(n));
createTask(job, myfun, 1, C);  % two batches of N/2 tasks each, so the
createTask(job, myfun, 1, C);  % two local workers get N/2 tasks apiece
tic;
submit(job);
wait(job);
t_par = toc;
tic;
for i = 1:N
    myfun(1500);
end
t_seq = toc;
The output tells me that for this job of N=100 diagonalisations (50 per worker when parallel) the times for running in parallel vs. running sequentially were
t_seq = 224.3...
t_par = 430.8...
Performing the same code with N=40 (20 matrix diagonalisations per worker) gives
t_seq = 91.9...
t_par = 165.9...
Performing the same code with N=20 (10 matrix diagonalisations per worker) gives
t_seq = 45.2...
t_par = 125.4...
Seemingly, distributing the job over two cores was consistently about twice as slow as doing it sequentially, regardless of the size of the job. This surprised me, as I imagined the overheads of distributing would have been a one-off cost of setting up the process. Is there something I can do to improve the efficiency of this? Or is there a better way to share the workload of processes that do not interact over several workers?
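For reference, since the tasks are independent and the input per task is tiny, the same workload can also be expressed with parfor on an already-open pool, which avoids paying the job-submission cost per batch (a minimal sketch, untested here; variable names are illustrative):

```matlab
% Same N = 100 independent diagonalisations via parfor.
% gcp() returns the current local pool, starting one if needed;
% the pool start-up cost is paid once, not per job submission.
pool = gcp();
n = 1500;
N = 100;
tic;
parfor i = 1:N
    d = eig(randn(n));  % result discarded, as in the timing test above
end
t_parfor = toc;
```

The first call to gcp() is slow because it launches the worker processes; timing a second run against the same pool gives a fairer comparison with the sequential loop.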

Answers (1)

Walter Roberson
Walter Roberson on 14 Jan 2016
Sometimes you just hit communications bottlenecks, if you are sending enough data around.
In such cases you can sometimes take advantage of the Worker Object Wrapper, http://www.mathworks.com/matlabcentral/fileexchange/31972-worker-object-wrapper
and if you have a new enough MATLAB, you might be able to use parallel.pool.Constant
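For example, a minimal sketch of the parallel.pool.Constant approach (requires R2015b or later; the payload here is illustrative, assuming a pool is already open):

```matlab
% Build large shared data once and copy it to each worker a single
% time, instead of shipping it with every task or iteration.
bigData = randn(1500);                 % example payload shared by all iterations
c = parallel.pool.Constant(bigData);   % one transfer per worker
parfor i = 1:100
    d = eig(c.Value);                  % each worker reads its local copy
end
```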
  1 Comment
philjdc
philjdc on 15 Jan 2016
Sorry, I'm not sure I understand: are you saying that the parallelised code is slowed by handing data to the workers? Currently the data being passed to each worker is a single integer value, and it seems unlikely to me that this should present a problem.

