Asked by Jia Hao
on 14 Sep 2013

Hello,everyone! Nowadays I am trying to do some parallel computing on Matlab.I get a question and the codes below may explain it.

a=1:1000; b=zeros(10,1); for i=1:1000 j=mod(i,10); b(j+1)=b(j+1)+a(i); end

In this example,I want to calculate a sum to *a* in every ten numbers,and save the results in an array *b*. And as it can be seen,it works well in a serial program. However,when I try to do it in parallel(below),an error appear:

a=1:1000; b=zeros(10,1); matlabpool local; parfor i=1:1000 j=mod(i,10); b(j+1)=b(j+1)+a(i); end matlabpool close;

"Error: The variable b in a parfor cannot be classified."

I think variable *b* is a reduction variable,because in line 5 of the code,it has the form "X=X+expr" (expr is any expression). But the examples of reduction variables in the "product help" of Matlab are all simple variable,while *b* is an array.So I turn to "sliced variable" for help.It is a pity that the index of *b* is not *i*(the loop variable).So maybe Matlab cannot make sure what kind of variable the "_b_" is.
So,my question is:How to achieve my goal(sum *a* in a sliced way) in parallel? In other words, how to modify the code in parallel so as to get the same results as that in serial way?
Your answer or idea is important for me.Thank you in advance!

*No products are associated with this question.*

Answer by BookSword
on 20 Sep 2013

Accepted answer

Hi, I think you can try to use spmd programming to reach your goal. It can deal with your problem regardless of the form of j(i). See below for example:

matlabpool local; % startup of the parallel environment tic; spmd a=1:100000; D = codistributed(a); % distribute a in parallel workers b_in_worker=zeros(10,1); % temporal b array in each parallel worker % do the operation on the partial a array in each parallel worker % and restore the partial results in b_in_worker. for i = drange(1:length(D)) j = mod(i,10); b_in_worker(j+1) = b_in_worker(j+1)+D(i); end end % sum up all partial results b = zeros(10,1); for i = 1:length(b_in_worker) b = b+ b_in_worker{i}; end disp Parallel: toc; matlabpool close;

I have not tested the parallel efficiency. However, since matlab is based on java, the communication between prallel workers can be very consuming. I advise that you need to make some effort on reducing the data exchange between workers.

The following are some selected official references of spmd which I think could be helpful:

Working with Codistributed Arrays

Answer by Walter Roberson
on 14 Sep 2013

For arrays that are not effectively local to the parfor loop, the only permitted indexing is constants, or the loop index, or a constant expression involving the loop index. Your j is not a constant expression.

I do not have the toolbox, so I do not know if expanding j would work:

b(mod(i-1,10)+1) = b(mod(i-1,10)+1) + a(i);

You might have better luck writing everything out:

parfor i = 1 : 10 : 1000 b(1) = b(1) + a(i); b(2) = b(2) + a(i+1); b(3) = b(3) + a(i+2); ... b(10) = b(10) + a(i+9); end

safer yet might be

parfor i = 1 : 10 b(i) = sum(a(i:10:1000)); end

You could also do the task serially by using

b = accumarray( 1 + mod((1:1000).'-1,10), a(:), [10, 1] );

Show 1 older comment

Walter Roberson
on 14 Sep 2013

parfor() is not suitable for use in writing to a random location.

For sliced variables, parfor tries to enforce that the outputs are all to different locations, and then it lets them all run as needed, without interlock, counting on the fact that the result will be order-invariant because no location will be touched twice.

I *gather* (but do not know) that in the situations in which parfor is able to identify a reduction variable, it creates a local variable per worker, runs the workers, and then at the end combines all the local variables. I do not know the circumstances under which parfor is able to identify a reduction as I do not have the toolbox. The static analysis parfor does is always going to be limited (it is at best an automated theorem prover, and Godel proved that there are always going to be true theorems that cannot be automatically proved.)

When the calculation of each iteration is sufficiently costly, a straight-forward way to handle situations like yours is to output two items per iteration, into arrays indexed by the iteration number: output the target destination index in one, and output the corresponding value into the other

outputidx(K) = mod(i-1,10)+1; outputval(K) = ....

Then after the parfor, run a serial reduction, possibly using accumarray

accumarray( outputidx(:), outputval(:), [], @ReductionFunction )

where @ReductionFunction defaults to sum

You might also be able to use parfor, as in

parfor K = 1 : maximumidx reducedvalue(K) = ReductionFunction( outputval(outputidx==K) ); end

which might or might not be more efficient in any given case.

Edric Ellis
on 16 Sep 2013

Opportunities for recent engineering grads.

## 0 Comments