How to avoid broadcast variable while optimizing a cost function in parallel computing?

I'm trying to minimize a heavy cost function (the biggest matrix in it is 2500x2500) using PSO with parallel computing. A single (!) iteration takes a couple of days and I'm not sure why. I would be very thankful for any help.
I use parallel computing to speed things up, but for now I get the message "The entire array or structure 'CostFunction' is a broadcast variable. This might result in unnecessary communication overhead". These are the problematic lines:
parfor i=1:nPop
% Evaluation (position value in the cost function)
particle(i).Cost = CostFunction(particle(i).Position);
end
Here CostFunction is a function handle I defined earlier in the code, and its input changes each iteration.
Using the MATLAB profiler I managed to get statistics on the running time of my code, which show that most of the running time is spent in that single parfor loop.
In the flame graph, ICF is my original cost function, and diss and null are its children. As I understand from the flame graph, ICF and its children are not children of the parfor loop, so the running time is split between the loop and the cost function separately. I don't recognize the time-consuming Java method, but I do know it is part of the parallel process.
So I'm basically asking two questions:
  1. Is the broadcast variable problem the cause for the long running time?
  2. How can I avoid broadcasting my cost function?
Thanks in advance.

Answers (1)

Edric Ellis on 7 Dec 2022
Investigating performance of parfor loops can be a bit tricky. Here are a few pointers:
  • Do you happen to know if your function already benefits from MATLAB's intrinsic multi-threading? (Check using your system's "Task Manager" or equivalent). If so, using only local workers with PCT will not speed things up as you are already using all your machine's resources. (Process workers run in single-threaded mode so each worker might well process things more slowly than your client - but if you've got several of them, you can still get speedup overall)
  • You can check the data transfer size using ticBytes and tocBytes. However, 2500x2500 is not particularly large, and I wouldn't expect it to cause things to take that long.
  • You can use mpiprofile to profile the execution time on the workers - the client profile only shows that you're waiting for workers to complete their work. (This works fine with parfor, despite the name.)
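
The pointers above can be tried out with something like the following rough sketch. It is not the original poster's code: nPop, nVar, positions, costs and the placeholder CostFunction are hypothetical stand-ins (the real ICF is not shown in the question), and it assumes Parallel Computing Toolbox with a release recent enough that mpiprofile can be started from the client for a parallel pool.

```matlab
% Hypothetical stand-ins; the poster's real cost function (ICF) and particle
% data are not shown in the question.
nPop = 8;                               % number of particles (assumed)
nVar = 5;                               % dimensions per particle (assumed)
CostFunction = @(x) sum(x.^2);          % placeholder cost, NOT the real ICF
positions = rand(nPop, nVar);           % one row per particle
costs = zeros(nPop, 1);

pool = gcp;                             % reuse the current pool, or start one

% 1) How many computational threads the client itself may use.
fprintf('Client computational threads: %d\n', maxNumCompThreads);

% 2) Measure how much data the parfor loop moves to and from the workers.
ticBytes(pool);
parfor i = 1:nPop
    costs(i) = CostFunction(positions(i, :));
end
bytes = tocBytes(pool);                 % one row per worker: [sent, received]
fprintf('Bytes sent to workers: %d\n', sum(bytes(:, 1)));
fprintf('Bytes received from workers: %d\n', sum(bytes(:, 2)));

% 3) Profile execution on the workers rather than on the client.
mpiprofile on
parfor i = 1:nPop
    costs(i) = CostFunction(positions(i, :));
end
mpiprofile viewer                       % opens the worker-side profile report
```

If the byte counts turn out to be small, as suggested above, the broadcast warning is unlikely to explain a multi-day iteration, and the worker-side profile is the better place to look.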
