parfor error - variable cannot be classified
I have been using MATLAB for a while, but I am new to parallel computing. I have a set of .m files that take a long time to run. After profiling the code to see which sections take up most of the time, I thought perhaps I could take advantage of parallel computing.
Here is one of those sections (sxyt is ~1 million, numFrames = 10, numParams is ~18).
numParams = length(a);
sxyt = size(xytIn, 1);
DxDyOut = single(zeros(sxyt, numFrames, 2));
osc = single(zeros(sxyt, numFrames, numParams));
mag = single(zeros(sxyt, numFrames, numParams));
t = zeros(sxyt, numFrames);
randomPhase = single(zeros(1, numParams));
randomPhaseOffset = single(ones(1, numParams));
randomPhase = randomPhaseOffset .* rand(1,numParams) .* 2 .* pi;
for k = 1:numParams
osc(:,:,k) = exp((2.*pi.*f(k).*t(:,:) + randomPhase(k)).* 1i);
mag(:,:,k) = a(k) .* (k1(k) + k2(k).*exp(-t(:,:)/(tau(k) + 0.0001)));
DxDyOut(:,:,1) = DxDyOut(:,:,1) + real( jonesx(k) .* osc(:,:,k) .* mag(:,:,k) );
DxDyOut(:,:,2) = DxDyOut(:,:,2) + real( jonesy(k) .* osc(:,:,k) .* mag(:,:,k) );
end
When I change the for loop to a parfor loop, I get the error "The variable DxDyOut in a parfor cannot be classified". I read the MATLAB help on parfor and a number of other submissions but still can't figure it out. Some earlier posts had double for loops, which were solved by making the outer loop a parfor and leaving the inner one as a for loop to slice the data. I don't get any errors for the osc and mag lines, which made me think the error comes from accumulating the results in the variable DxDyOut. So, I changed the for loop thusly:
A = single(zeros(sxyt, numFrames, numParams));
B = single(zeros(sxyt, numFrames, numParams));
DxDyOut2 = single(zeros(sxyt, numFrames, 2));
for k = 1:numParams
osc(:,:,k) = exp((2.*pi.*f(k).*t(:,:) + randomPhase(k)).* 1i);
mag(:,:,k) = a(k) .* (k1(k) + k2(k).*exp(-t(:,:)/(tau(k) + 0.0001)));
A(:,:,k) = real( jonesx(k) .* osc(:,:,k) .* mag(:,:,k) );
B(:,:,k) = real( jonesy(k) .* osc(:,:,k) .* mag(:,:,k) );
end
DxDyOut2(:,:,1) = sum(A,3);
DxDyOut2(:,:,2) = sum(B,3);
I no longer get the error but this time the loop takes 10x longer to complete when using parfor. I noticed that Matlab worker used all of my 16GB RAM with parfor. A, B, mag arrays are about 750MB each and osc is 1.5 GB, with all the variables in the workspace adding up to ~4.2 GB.
So, I should have enough memory with two workers running. I don't understand why I am running out of memory. Either way, it seems like in cases where large data is manipulated, parallel computing won't help because one runs into memory limits. I would appreciate any help.
A quick update - when both forms of the for loop are run, the results are identical, i.e., isequal(DxDyOut, DxDyOut2) = 1. However when I convert the second form to parfor, isequal(DxDyOut, DxDyOut2) = 0. The first form won't work with parfor.
Answers (1)
Walter Roberson
on 12 May 2015
Try
A = single(zeros(sxyt, numFrames));
B = single(zeros(sxyt, numFrames));
DxDyOut2 = single(zeros(sxyt, numFrames, 2));
parfor k = 1:numParams
osc(:,:,k) = exp((2.*pi.*f(k).*t(:,:) + randomPhase(k)).* 1i);
mag(:,:,k) = a(k) .* (k1(k) + k2(k).*exp(-t(:,:)/(tau(k) + 0.0001)));
A = A + real( jonesx(k) .* osc(:,:,k) .* mag(:,:,k) );
B = B + real( jonesy(k) .* osc(:,:,k) .* mag(:,:,k) );
end
DxDyOut2(:,:,1) = A;
DxDyOut2(:,:,2) = B;
5 Comments
Varoujan
on 12 May 2015
Walter Roberson
on 12 May 2015
Edited: Walter Roberson
on 12 May 2015
You have neglected floating-point round-off. Remember that in floating point, P+Q+R might not be the same as P+R+Q. parfor performs the addition reductions in an unspecified order that might vary dynamically (e.g., whatever is ready first). If you require bit-for-bit reproduction of the for-loop results, then parfor reduction variables are not an appropriate tool.
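A minimal sketch of the order dependence, using three hand-picked single-precision values (P, Q, R here are illustrative numbers, not data from the question). The helper isClose and its tolerance are assumptions for this example; the point is only that a tolerance-based comparison is usually a better check than isequal when the summation order can change.

```matlab
% 1e8 is exactly representable in single precision, but neighbouring
% singles near 1e8 are 8 apart, so adding 1 to it is lost to rounding.
P = single(1e8);
Q = single(-1e8);
R = single(1);

sumPQR = (P + Q) + R;   % P and Q cancel exactly first, then R is added
sumPRQ = (P + R) + Q;   % P + R rounds back to P, then cancels with Q

fprintf('(P+Q)+R = %g, (P+R)+Q = %g\n', sumPQR, sumPRQ);

% Hypothetical helper: compare two arrays up to an absolute tolerance
% instead of demanding bit-for-bit equality.
isClose = @(x, y, tol) max(abs(double(x(:)) - double(y(:)))) <= tol;
closeEnough = isClose(sumPQR, sumPRQ, 2);
```

For the arrays in the question, a comparison along the lines of isClose(DxDyOut, DxDyOut2, tol) with a tolerance suited to the data scale would show whether the parfor result differs only by round-off.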
parfor can take longer if there is not enough work per iteration, due to the overhead of creating the tasks and coordinating them. One technique to reduce that is to "unroll" the loop so that each iteration handles two values of k. Because parfor needs a range of consecutive integers and one fixed index form per sliced variable, the version below loops over pair numbers and keeps the osc and mag products in temporaries rather than storing them. For example, presuming numParams is even,
parfor j = 1:numParams/2
ka = 2*j - 1;  % first parameter of this pair
kb = 2*j;      % second parameter of this pair
% osc .* mag for parameter ka
om = exp((2.*pi.*f(ka).*t + randomPhase(ka)).*1i) .* (a(ka) .* (k1(ka) + k2(ka).*exp(-t/(tau(ka) + 0.0001))));
A1 = real( jonesx(ka) .* om );
B1 = real( jonesy(ka) .* om );
% osc .* mag for parameter kb
om = exp((2.*pi.*f(kb).*t + randomPhase(kb)).*1i) .* (a(kb) .* (k1(kb) + k2(kb).*exp(-t/(tau(kb) + 0.0001))));
% one reduction update per pair
A = A + (A1 + real( jonesx(kb) .* om ));
B = B + (B1 + real( jonesy(kb) .* om ));
end
Walter Roberson
on 13 May 2015
drange() might also be a useful mechanism in the case when each worker does not do enough work.
for k = drange(1:numParams)
osc(:,:,k) = exp((2.*pi.*f(k).*t(:,:) + randomPhase(k)).*1i);
mag(:,:,k) = a(k) .* (k1(k) + k2(k).*exp(-t(:,:)/(tau(k) + 0.0001)));
A = A + real( jonesx(k) .* osc(:,:,k) .* mag(:,:,k) );
B = B + real( jonesy(k) .* osc(:,:,k) .* mag(:,:,k) );
end
This allocates chunks of k to workers, with each chunk handled entirely by one worker. Which k values end up in which chunk is unspecified, so this has the same limitation regarding round-off error.
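The chunking idea can also be sketched with plain parfor: loop over a few chunks in parallel and run an ordinary for loop over the k values inside each chunk. This is only an illustration under assumptions, not the drange mechanism itself. The input values, the chunk count numChunks, and the names edges, Ac, and Bc are made up for the example, and the osc and mag products are kept as temporaries instead of being stored.

```matlab
% Hypothetical small inputs so the sketch runs in a moment
rng(0);
sxyt = 4; numFrames = 3; numParams = 6;
t = rand(sxyt, numFrames);
f = rand(1, numParams);      a = rand(1, numParams);
k1 = rand(1, numParams);     k2 = rand(1, numParams);
tau = rand(1, numParams);    randomPhase = 2*pi*rand(1, numParams);
jonesx = rand(1, numParams); jonesy = rand(1, numParams);

numChunks = 2;                                        % assumed chunk count
edges = round(linspace(0, numParams, numChunks + 1)); % chunk boundaries
A = zeros(sxyt, numFrames);
B = zeros(sxyt, numFrames);
parfor c = 1:numChunks
    Ac = zeros(sxyt, numFrames);   % partial sums for this chunk only
    Bc = zeros(sxyt, numFrames);
    kFirst = edges(c) + 1;
    kLast = edges(c + 1);
    for k = kFirst:kLast
        % osc .* mag for parameter k, kept as a temporary
        om = exp((2.*pi.*f(k).*t + randomPhase(k)).*1i) ...
            .* a(k) .* (k1(k) + k2(k).*exp(-t/(tau(k) + 0.0001)));
        Ac = Ac + real(jonesx(k) .* om);
        Bc = Bc + real(jonesy(k) .* om);
    end
    A = A + Ac;   % one reduction update per chunk instead of one per k
    B = B + Bc;
end
```

With fewer, larger iterations the per-task overhead is paid less often. The chunk totals are still combined in an unspecified order, so the result may differ from a serial loop by round-off.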
Varoujan
on 15 May 2015
Walter Roberson
on 15 May 2015
Could you try it with double precision and see how the speed changes?