How do I set the path for Matlab workers to access packages when using a parallel pool?

I have several packages that I made to better organize my functions. They work outside a parallel pool, where I can set them in the path by using the line,
import bedGeometry.*
where my package is called bedGeometry and contains the Matlab functions that construct a simulated particle bed.
That same method also worked with a parallel pool when I used the same line inside the parfor loop. It has since stopped working and gives me the following error (note that the function initializeBed is inside my package bedGeometry),
An UndefinedFunction error was thrown on the workers for 'initializeBed'. This might be because the file containing 'initializeBed' is not accessible on the workers. Use addAttachedFiles(pool, files) to specify the required files to be attached. See the documentation for 'parallel.Pool/addAttachedFiles' for more details.
So of course I tried their suggestion and added the line,
addAttachedFiles(pool, {'bedGeometry', 'particle.m', 'dataAnalysis'});
which worked for an even longer time. I did a lot of calculations in parfor, and now it suddenly doesn't work anymore. Although the pool claims that my packages are still attached to it (they show up under the pool variable field called attachedFiles), it now throws the error
Error using initializeBed (line 24) Not enough input arguments.
in which line 24 refers not to the function in the package that I added, but to an older version of the same function stored in
/private/tmp/tp14c1c490_ab01_4e7c_8294_0d96f6cac198rp2137/a/tp69bedd4e_277e_47fc_b926_7ddc2e3bb7b9/initializeBed.m
that has more input arguments. When I delete this file Matlab throws the original error about undefined functions and how I should add them to the path.
Why is Matlab calling this older version stored in this random place? How can I get Matlab back on the right path, so to speak? Has anyone had the same problem? What lines do you use to add packages to the workers' path?
This has completely boggled my mind; I even reinstalled Matlab. Any help would be appreciated, thanks.

Answers (2)

Edric Ellis on 21 Sep 2015
It's a little tricky from your description to work out exactly what's gone wrong here, but I think it ought to be possible to resolve this.
Firstly, note that the MATLAB import statement doesn't actually modify the MATLAB path; rather, it simply changes how MATLAB interprets names that it sees in your program. import statements that you make outside a parfor loop have no effect on how names are interpreted in the body of the loop.
So, in conclusion, you either need the import statements inside the parfor loops as well as outside, or you could use the fully-qualified names everywhere. Using the fully-qualified names might make subsequent debugging simpler (I realise that this does make the code more awkward to write).
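As a minimal sketch of the fully-qualified option, assuming the package folder is named +bedGeometry and its parent folder is visible to both the client and the workers. The loop count and the empty argument list are placeholders, since the real signature of initializeBed isn't shown in this thread:

```matlab
% Sketch only: assumes a package folder +bedGeometry containing
% initializeBed.m, with its parent folder visible to client and workers.
nBeds = 4;                 % placeholder loop count
beds  = cell(1, nBeds);    % sliced output, one cell per iteration

parfor k = 1:nBeds
    % The package-qualified name needs no import statement, so it is
    % interpreted the same way inside and outside the loop.
    % The empty argument list is a placeholder for the real arguments.
    beds{k} = bedGeometry.initializeBed();
end
```

Writing the package name at every call site is more verbose, but an error message then shows exactly which package function was being looked up.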
Next, we need to address how the parallel pool workers actually get hold of the source code to execute. Normally, when you open a parallel pool, the infrastructure synchronises MATLAB path entries that you have on your client with the workers. This system works well if client and workers can see the same code at the same path. For example, if you're using the 'local' cluster type, it is always the case that the workers can see the same code at the same location, so the path synchronisation tends to work well. For other cluster types, this doesn't always work (e.g. if the cluster workers don't have access to your home directory where your code is located).
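One way to check whether the client and the workers really resolve the same file is to ask each of them with which. The following is a rough diagnostic sketch, not an official procedure; it assumes a pool can be opened and reuses the package and function names from the question:

```matlab
% Diagnostic sketch: compare the file the client resolves with the file
% each worker resolves for the same package function.
pool = gcp();  % current pool; starts one if none exists and auto-create is on

clientFile = which('bedGeometry.initializeBed');

spmd
    % Runs on every worker; workerFile becomes a Composite on the client.
    workerFile = which('bedGeometry.initializeBed');
end

fprintf('client  : %s\n', clientFile);
for w = 1:pool.NumWorkers
    % Indexing a Composite with {w} fetches the value held by worker w.
    fprintf('worker %d: %s\n', w, workerFile{w});
end
```

If a worker prints an empty string, or a path under a temporary folder instead of your source folder, the worker is not seeing the same code as the client.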
If the workers can't see the same folders, that's when you need to use "attached files". When attaching files, note that a snapshot is taken of the file contents at the point you attach those files to the pool. You can use updateAttachedFiles to update files after you've modified them.
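The attach-then-refresh sequence might look like the sketch below. The file and folder names are copied from the question and are assumed to be resolvable from the current folder. The snapshot behaviour may also explain the stale copy under a temporary folder described in the question, though that is a guess:

```matlab
% Sketch: attach code to the pool, then refresh the snapshot after edits.
pool = gcp();  % current pool; starts one if none exists and auto-create is on

% Names taken from the question; a snapshot of their contents is sent
% to the workers at this point.
addAttachedFiles(pool, {'bedGeometry', 'particle.m', 'dataAnalysis'});

% ... edit initializeBed.m or other attached files on the client ...

% Re-send any attached files that have changed since they were attached.
updateAttachedFiles(pool);

% List what the pool currently considers attached.
disp(pool.AttachedFiles);
```

Calling updateAttachedFiles after each round of edits keeps the workers from running an out-of-date copy.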

Francis Turney on 24 Sep 2015
Thanks for the response, Edric. I was making it more complicated than it needed to be. Instead of adding the packages to the path, I just converted them to regular directories (removed the + from the front of their names) and used the line,
addpath('bedGeometry', 'dataAnalysis')
at the beginning of my script, and like you said, Matlab just synced the path to all the workers when the pool was created. I'm still not quite sure why import didn't work, but this seems to be a good workaround for me.
