Large overheads in parallel solving
I am trying to distribute some simple tasks (1500 by 1500 random matrix diagonalisation) over local workers to gain some speed-up. My attempt at parallelising a simple process is below; it simply records the time taken to finish the jobs when distributed vs. when performed sequentially. However, I find the overheads seem enormous, and they seem to scale with the problem. I don't know if I am making a simple, avoidable error.
For context: ultimately I want to distribute the problem of diagonalising large numbers (~10^5) of large matrices (n by n with n~10^5), which was very slow - so slow that when I ran it in a parpool, MATLAB would typically lose the connection with the (local) workers before they finished and crash the script. Apparently the job-task structure is more robust to this problem.
profileName = parallel.defaultClusterProfile();
cluster_prof = parcluster(profileName);
job = createJob(cluster_prof);
N = 100;
C = cell(1, N/2);               % one cell of input arguments per diagonalisation
for i = 1:(N/2)
    C{i} = {1500};              % each task call gets the matrix size as its argument
end
myfun = @(n) eig(randn(n));
createTask(job, myfun, 1, C);   % first batch of N/2 diagonalisations
createTask(job, myfun, 1, C);   % second batch of N/2
% time the parallel run
tic;
submit(job);
wait(job);
t_par = toc;
% time the same work done sequentially on the client
tic;
for i = 1:N
    myfun(1500);
end
t_seq = toc;
The output tells me that for this job of N=100 diagonalisations (50 per core when parallel), the times for running in parallel vs. running sequentially were
t_seq = 224.3...
t_par = 430.8...
performing the same code with N=40 (20 matrix diagonalisations per job) gives
t_seq = 91.9...
t_par = 165.9...
performing the same code with N=20 (10 matrix diagonalisations per job) gives
t_seq = 45.2...
t_par = 125.4...
Seemingly, distributing the job over two cores was consistently about twice as slow as doing it sequentially, regardless of the size of the job. This surprised me, as I imagined the overhead of distributing would have been a one-off cost of setting up the process. Is there something I can do to improve the efficiency of this? Or is there a better way to share the workload of processes that do not interact over several workers?
Answers (1)
Walter Roberson
on 14 Jan 2016
Sometimes you just hit communications bottlenecks if you are sending enough data around.
In such cases you can sometimes take advantage of the Worker Object Wrapper, http://www.mathworks.com/matlabcentral/fileexchange/31972-worker-object-wrapper
and if you have a new enough MATLAB (R2015b or later), you might be able to use parallel.pool.Constant.
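For example, here is a minimal parallel.pool.Constant sketch (assuming R2015b or later; the matrix size, loop count, and perturbation below are illustrative, not taken from your code). The idea is that any large fixed data is shipped to each worker once, rather than with every loop iteration:

```matlab
% Build the shared data once on the client. parallel.pool.Constant copies
% it to each worker a single time; parfor then reads the worker-local copy.
n = 1500;
H0 = parallel.pool.Constant(randn(n));  % base matrix, transferred once per worker

N = 100;
E = cell(1, N);
parfor i = 1:N
    % H0.Value is the worker-local copy; only the loop index crosses
    % the wire each iteration, not the 1500-by-1500 matrix.
    E{i} = eig(H0.Value + 0.01*randn(n));
end
```

If each task instead generates its own random matrix (as in your myfun), there is little data to ship anyway, and a plain parfor over N iterations may already have less per-task overhead than creating many small job tasks.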