Parfor problems

Christopher Kanan on 2 Nov 2011
I'm having some trouble with a parfor loop. I'm getting this error when running with a large dataset that has to be copied into the workers (it works great otherwise):
Warning: Error caught during construction of remote parfor code.
The parfor construct will now be run locally rather than
on the remote matlabpool. The most likely cause of this is
an inability to send over input arguments because of a
serialization error. The error report from the caught error is:
Error using ==> distcompserialize
Error during serialization
Error in ==> distcomp.remoteparfor.remoteparfor at 41
obj.SerializedInitData = distcompMakeByteBufferHandle(distcompserialize(varargin));
Error in ==> parallel_function at 437
P = distcomp.remoteparfor(W, @make_channel, parfor_C);
I thought this might be because I was running out of memory; however, my machine has 24 GB of RAM and MATLAB is only consuming 3 GB. It gives this error even when creating only two workers, so I don't think it is a simple case of running out of memory. I'm running 64-bit R2011a on a Windows 7 machine.
When I use a dataset that is only about 1 GB in size, the program works great with 6 workers.
My code looks something like this.
M = Big_Database;              % large structure with many substructures
res = zeros(1, n);
parfor t = 1:n
    data = load(file{t});      % per-iteration data read from disk
    res(t) = Process(M, data); % M is serialized and sent to every worker
end
M is a structure with many substructures.
The error is not parfor-specific; spmd produces a similar error message. All of the workers are local.
The problem occurs even when the number of workers is just 1, so I really don't think it is a memory problem. As far as I can tell it works fine with a plain for loop, although I have only let it run for 10 hours. The parfor failure happens immediately upon entering the loop.
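For scale, here is a rough way to gauge how large M is when serialized (a sketch; saving to a MAT-file only approximates what parfor's serializer does, and releases of this era had a per-transfer serialization cap on the order of a couple of gigabytes that a multi-gigabyte M could exceed):
s = whos('M');
fprintf('In-memory size of M: %.2f GB\n', s.bytes / 2^30);
save('M_check.mat', 'M', '-v7.3');  % -v7.3 sidesteps the 2 GB limit of v7 MAT-files
d = dir('M_check.mat');
fprintf('On-disk size of M:   %.2f GB\n', d.bytes / 2^30);  % compressed, so a lower bound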
  3 Comments
Christopher Kanan on 2 Nov 2011
Yes, it works fine with a for loop. It hasn't crashed in the 10 hours it has been running.
Christopher Kanan on 3 Nov 2011
The problem also occurs after increasing my Java heap size by a factor of 10, and after upgrading to R2011b.


Accepted Answer

Edric Ellis on 3 Nov 2011
The memory problem may well be to do with the PARFOR communication mechanism, which is more restrictive than MATLAB in general. Does it work to instantiate 'M' inside the PARFOR loop? If so, that would cut down the amount of data to transfer. To do that efficiently, you could use my WorkerObjWrapper like this:
Mw = WorkerObjWrapper(@Big_Database, {});
parfor t = 1:n
    data = load(file{t});
    res(t) = Process(Mw.Value, data);
end
This ensures that Big_Database() is evaluated once per worker, and you can use the value across multiple PARFOR loops (and SPMD blocks if you wish).
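If you would rather not depend on a File Exchange submission, a persistent variable gives a similar once-per-worker effect (a sketch; getBigDatabase is a hypothetical helper name, and the cache is rebuilt if a worker clears its functions):
function M = getBigDatabase()
% Build the database at most once per MATLAB process; each worker
% caches its own copy, so no large data crosses the wire.
persistent Mcached
if isempty(Mcached)
    Mcached = Big_Database();
end
M = Mcached;
end
Inside the loop you would then call res(t) = Process(getBigDatabase(), data); instead of passing M in from the client.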
  5 Comments
Edric Ellis on 4 Nov 2011
The Wrapper gets its job done by sending only the function handle @Big_Database across the wire, and then invoking it in parallel on the workers; the 'workerInit' method is where this happens. Walter is completely correct to say that the workers possibly being on different machines is one reason why we don't have any shared-memory support. Another reason is that we would then have to manage concurrent accesses.
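In spirit it is close to doing this by hand with spmd (a sketch, assuming a pool is already open; WorkerObjWrapper additionally manages the value's lifetime across loops for you):
fcn = @Big_Database;   % only this lightweight handle is serialized
spmd
    Mlocal = fcn();    % the big value is built on, and stays on, each worker
end
After the spmd block, Mlocal is a Composite on the client: a reference to the per-worker values rather than a copy of them.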
Ali Arslan on 4 Apr 2013
Edric, could you please explain the use of that anonymous function in
Mw = WorkerObjWrapper(@Big_Database, {})
The Big_Database here apparently cannot be a variable that is already initialized (MATLAB complains if it is), but it doesn't look like a function call either. What exactly is it?


More Answers (1)

Walter Roberson on 2 Nov 2011
I do not know much about the configuration and implementation of parfor, and my memory can be weak at this time of day, so keep your salt grains handy:
I recall a previous thread in which the error message was nearly identical to the one you are seeing, and the cause there turned out to be that the workers were not allocated enough memory.
I seem to recall that the worker configuration allows memory limits to be placed. I also seem to recall from a different thread that the Java heap space for workers can be set differently than for base MATLAB.
I do recall a thread not long ago in which someone asked how to place memory resource limits on MATLAB in MS Windows, and my research at the time indicated that you could not really do that in MS Windows; that would tend to argue against the memory configuration possibility (but it was the solution for someone...)
One possibility is that the workers being started are the 32-bit version of MATLAB. I do not know offhand the mechanism for selecting 32-bit vs. 64-bit workers, but if a 32-bit worker were indeed being started, it would not be able to handle the array sizes you are using.
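One quick way to check (a sketch, assuming a pool is open): have each worker report its own architecture string.
spmd
    arch = computer('arch');   % 'win64' on a 64-bit worker, 'win32' on 32-bit
end
for w = 1:length(arch)
    fprintf('worker %d is %s\n', w, arch{w});
end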
Sorry, no solutions, just hints -- at least until The Mathworks (Lagos) Online Email Lottery picks my lucky Toolkit Serial Numbers ;-)
  3 Comments
Walter Roberson on 2 Nov 2011
Does the memory() command exist in the 64-bit Windows version? Or feature('memstats')? If so, perhaps you could run one of those inside a parfor to determine the worker limits.
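For example (a sketch; memory() is Windows-only, and spmd rather than parfor guarantees exactly one report per worker):
spmd
    m = memory;   % user-view struct: MaxPossibleArrayBytes, MemAvailableAllArrays, ...
    fprintf('worker %d: largest possible array = %.2f GB\n', ...
        labindex, m.MaxPossibleArrayBytes / 2^30);
end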
Christopher Kanan on 2 Nov 2011
Yes, I actually use memory() to decide the maximum number of workers to use, subtracting a few to be conservative. The number I compute is 5.5, but the error occurs even with a single worker.
I verified that all workers are 64 bit.
All workers report being able to allocate an array of the same maximum size, and they claim access to more than enough memory to hold the data (assuming there isn't some enormous memory overhead I'm unaware of).

