I'm going to answer my own question, at least approximately.
The issue does not seem to be specific to the MATLAB 2014b compiler; we reproduced it with 2015b and 2013b as well (although on far fewer processors).
It seems the issue had to do with the data loading. Basically, our workflow was like this:
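A minimal sketch of that original ordering, using only the stage names mentioned in this answer. How `data` is passed between the stages is an assumption made for illustration:

```matlab
% Original ordering (sketch; the argument passing is assumed)
data = loadData();       % loads several gigabytes, including anonymous functions
createParallelPool();    % pool is created while all of that data is in memory
doEstimation(data);      % the parallel work happens here
```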
In this case, the loadData() stage was very large: it loaded several gigabytes of data into memory, including a set of anonymous functions. That was fine hardware-wise, but creating the parallel pool afterwards was very slow. We believe (but are not sure) that pool creation was slow because the loaded data was being replicated several hundred times, once per worker. Precisely why this was so slow, I am not sure.
However, we were able to resolve the problem by moving the createParallelPool() command to the beginning of our function, so the revised workflow looked like:
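Sketched with the same stage names (the argument passing is again an assumption), the revised ordering is roughly:

```matlab
% Revised ordering (sketch; the argument passing is assumed)
createParallelPool();    % pool is created first, while the workspace is still small
data = loadData();       % the large load now happens after the workers exist
doEstimation(data);      % the parallel work happens here
```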
Because no jobs were assigned to the pool before the doEstimation() stage, the revised workflow did not run into the same problem. Essentially, we dramatically reduced the amount of memory that had to be transferred to the other instances of MATLAB.
As an aside, we ended up using MATLAB 2015b because the 2014b version has great difficulty with anonymous functions in a parallel environment: it would quickly use far too much memory and start paging, which crashed the program.
- Keep as little data in RAM as possible when doing parallel computations, and especially avoid complex data types. The matfile command is very useful for this.
- Create parallel pools as early as possible in your program to reduce overhead.
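
As a rough illustration of both points, here is a minimal sketch that starts the pool before touching any data and then reads only the slices it needs through matfile. The file name `bigData.mat`, the variable `X`, and the column-mean computation are hypothetical and stand in for the real data and estimation step. It assumes the Parallel Computing Toolbox is available and that the workers can see the file path.

```matlab
% Hypothetical example data; in practice the MAT-file would already exist.
dataFile = fullfile(pwd, 'bigData.mat');
X = rand(1000, 50);
save(dataFile, 'X', '-v7.3');   % v7.3 format allows efficient partial loading
clear X

% Start the pool first, while the client workspace is still small.
if isempty(gcp('nocreate'))
    parpool;
end

% matfile returns a handle to the file; no variable data is loaded yet.
m = matfile(dataFile);
sz = size(m, 'X');              % query dimensions without loading X
nCols = sz(2);

colMeans = zeros(1, nCols);
parfor j = 1:nCols
    w = matfile(dataFile);      % each iteration opens its own handle
    col = w.X(:, j);            % read a single column from disk
    colMeans(j) = mean(col);
end
```

Opening the handle inside the loop is a conservative choice: only the file path is sent to the workers, not the data.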