You cannot use a parfor-loop inside another parfor-loop.
As an example, the following nesting of parfor-loops
is not allowed:
    parfor i = 1:10
        parfor j = 1:5
            ...
        end
    end
You cannot nest parfor directly within another parfor-loop. A parfor-loop
can call a function that contains a parfor-loop,
but you do not get any additional parallelism.
You cannot nest parfor-loops
because parallelization can be performed at only one level. Therefore,
choose which loop to run in parallel, and convert the other loop to a for-loop.
Consider the following performance issues when dealing with nested loops:
Parallel processing incurs overhead. Generally, you
should run the outer loop in parallel, because overhead only occurs
once. If you run the inner loop in parallel, then each of the multiple
parfor executions incurs an overhead. See Convert Nested for-Loops to parfor for an example of how to
measure parallel overhead.
Make sure that the number of iterations exceeds the number of workers. Otherwise, you do not use all available workers.
Try to balance the parfor-loop iteration times.
parfor tries to compensate for
some load imbalance.
Always run the outermost loop in parallel, because you reduce parallel overhead.
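When the outer loop has fewer iterations than there are workers, one possible way to expose more iterations is to collapse both loops into a single parfor-loop over a linear index. The following is a minimal sketch under stated assumptions, not a recommended pattern from this page: the sizes n and m, the variable names, and the placeholder computation a + b are all chosen for illustration only.

```matlab
n = 3;                      % assumed: only a few outer iterations
m = 50;                     % assumed: many inner iterations
x = zeros(1, n*m);          % one slot per (a,b) pair

parfor k = 1:n*m
    % Recover the two original loop indices from the linear index k.
    [a, b] = ind2sub([n, m], k);
    x(k) = a + b;           % placeholder for the real computation
end

X = reshape(x, n, m);       % X(a,b) matches the nested-loop layout
```

With this layout the single parfor-loop has n*m iterations to distribute, at the cost of one ind2sub call per iteration. Whether that trade is worthwhile depends on how expensive each iteration is.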
You can also use a function that uses parfor and
embed it in a
parfor-loop. Parallelization occurs
only at the outer level. In the following example, call a function
MyFun.m inside the outer parfor-loop. The inner
parfor-loop embedded in MyFun.m runs sequentially, not in parallel.
    parfor i = 1:10
        MyFun(i)
    end

    function MyFun(i)
        parfor j = 1:5
            ...
        end
    end
Nested parfor-loops generally give you
no computational benefit.
A typical use of nested loops is to step through an array using one loop variable to index one dimension, and a nested loop variable to index another dimension. The basic form is:
    X = zeros(n,m);
    for a = 1:n
        for b = 1:m
            X(a,b) = fun(a,b)
        end
    end
The following code shows a simple example. Use tic and toc to
measure the computing time needed.
    A = 100;
    tic
    for i = 1:100
        for j = 1:100
            a(i,j) = max(abs(eig(rand(A))));
        end
    end
    toc
Elapsed time is 49.376732 seconds.
You can parallelize either of the nested loops, but you cannot run both in parallel. The reason is that the workers in a parallel pool cannot start or access further parallel pools.
If the loop counted by
i is converted to a
parfor-loop, then each worker in the pool executes
the nested loops using the
j loop counter. The j loops
themselves cannot run as a
parfor on each worker.
Because parallel processing incurs overhead, you must choose
carefully whether you want to convert the inner or the outer
for-loop to a parfor-loop. The following example shows how
to measure the parallel overhead.
First convert only the outer for-loop to a parfor-loop. Use tic and toc to
measure the computing time needed. Use ticBytes and tocBytes to
measure how much data is transferred to and from the workers in the parallel pool.
Run the new code, and run it again. The first run is slower than subsequent runs, because the parallel pool takes some time to start and make the code available to the workers.
    A = 100;
    tic
    ticBytes(gcp);
    parfor i = 1:100
        for j = 1:100
            a(i,j) = max(abs(eig(rand(A))));
        end
    end
    tocBytes(gcp)
    toc
                 BytesSentToWorkers    BytesReceivedFromWorkers
                 __________________    ________________________

        1             32984                    24512
        2             33784                    25312
        3             33784                    25312
        4             34584                    26112
        Total    1.3514e+05               1.0125e+05

    Elapsed time is 14.130674 seconds.
Next convert only the inner loop to a parfor-loop.
Measure the time needed and data transferred as in the previous case.
    A = 100;
    tic
    ticBytes(gcp);
    for i = 1:100
        parfor j = 1:100
            a(i,j) = max(abs(eig(rand(A))));
        end
    end
    tocBytes(gcp)
    toc
                 BytesSentToWorkers    BytesReceivedFromWorkers
                 __________________    ________________________

        1        1.3496e+06                5.487e+05
        2        1.3496e+06               5.4858e+05
        3        1.3677e+06               5.6034e+05
        4        1.3476e+06               5.4717e+05
        Total    5.4144e+06               2.2048e+06

    Elapsed time is 48.631737 seconds.
If you convert the inner loop to a parfor-loop,
both the time and amount of data transferred are much greater than
in the parallel outer loop. In this case, the elapsed time is almost
the same as in the nested
for-loop example. The
speedup is smaller than running the outer loop in parallel, because
you have more data transfer and thus more parallel overhead. Therefore,
if you execute the inner loop in parallel, you
get no computational benefit compared to running the serial for-loop.
If you want to reduce parallel overhead and speed up your computation, run the outer loop in parallel.
If you convert the inner loop instead,
then each iteration of the outer loop initiates a separate parfor-loop.
That is, the inner loop conversion creates 100 parfor-loops.
Each of the multiple
parfor executions incurs overhead.
If you want to reduce parallel overhead, you should run the outer
loop in parallel instead, because overhead only occurs once.
If you want to speed up your code, always run the outer loop in parallel, because you reduce parallel overhead.
If you want to convert a nested for-loop to a
parfor-loop, you must ensure that your loop
variables are properly classified. See Troubleshoot Variables in parfor-Loops.
For proper variable classification, you must define the range of a
for-loop nested in a
parfor-loop by constant numbers or broadcast
variables. In the following example, the code on the left does not
work because you define the upper limit of the for-loop
by a function call. The code on the right provides a workaround by
first defining a broadcast or constant variable outside the parfor-loop:
Left (invalid):

    A = zeros(100, 200);
    parfor i = 1:size(A, 1)
        for j = 1:size(A, 2)
            A(i, j) = i + j;
        end
    end

Right (valid):

    A = zeros(100, 200);
    n = size(A, 2);
    parfor i = 1:size(A,1)
        for j = 1:n
            A(i, j) = i + j;
        end
    end
The index variable for the nested for-loop
must never be explicitly assigned other than in its for statement.
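As a minimal sketch of this restriction: if the loop body needs a value derived from the nested index, one option is to store that value in a separate temporary variable and leave the index itself untouched. The variable name shifted and the computation are hypothetical, introduced only for illustration.

```matlab
A = zeros(4, 10);
parfor i = 1:4
    for j = 1:10
        % j = j + 1;        % not allowed: assigns to the nested loop index
        shifted = j + 1;    % hypothetical temporary holding the derived value
        A(i, j) = i + shifted;
    end
end
```

Here j is assigned only by its for statement, and it still appears in plain form when indexing A.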
When using the nested
for-loop variable for indexing
the sliced array, you must use the variable in plain form, not as
part of an expression. For example, the following code on the left
does not work, but the code on the right does:
Left (invalid):

    A = zeros(4, 11);
    parfor i = 1:4
        for j = 1:10
            A(i, j + 1) = i + j;
        end
    end

Right (valid):

    A = zeros(4, 11);
    parfor i = 1:4
        for j = 2:11
            A(i, j) = i + j - 1;
        end
    end
If you use a nested
for-loop to index into
a sliced array, you cannot use that array elsewhere in the parfor-loop.
In the following example, the code on the left does not work because
A is sliced and indexed inside the nested for-loop.
The code on the right works because
v is assigned to
A outside of the nested loop:
Left (invalid):

    A = zeros(4, 10);
    parfor i = 1:4
        for j = 1:10
            A(i, j) = i + j;
        end
        disp(A(i, 1))
    end

Right (valid):

    A = zeros(4, 10);
    parfor i = 1:4
        v = zeros(1, 10);
        for j = 1:10
            v(j) = i + j;
        end
        disp(v(1))
        A(i, :) = v;
    end
Suppose that you use multiple for-loops
(not nested inside each other) inside a parfor-loop
to index into a single sliced array. In this case, the for-loops
must loop over the same range of values. A sliced output variable
can be used in only one nested for-loop. In the following example,
the code on the left does not work because j and k loop
over different values. The code on the right works to index different
portions of the sliced array A:
Left (invalid):

    A = zeros(4, 10);
    parfor i = 1:4
        for j = 1:5
            A(i, j) = i + j;
        end
        for k = 6:10
            A(i, k) = pi;
        end
    end

Right (valid):

    A = zeros(4, 10);
    parfor i = 1:4
        for j = 1:10
            if j < 6
                A(i, j) = i + j;
            else
                A(i, j) = pi;
            end
        end
    end
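Another possible arrangement, sketched here as an assumption rather than as part of the original example, keeps the two separate for-loops but has them fill a temporary row vector, so that the sliced array is assigned only once per parfor iteration. The variable name row is hypothetical.

```matlab
A = zeros(4, 10);
parfor i = 1:4
    row = zeros(1, 10);     % temporary variable, local to each iteration
    for j = 1:5
        row(j) = i + j;     % first portion of the row
    end
    for k = 6:10
        row(k) = pi;        % second portion of the row
    end
    A(i, :) = row;          % single sliced assignment outside the for-loops
end
```

Because the nested for-loops index only the temporary vector, the rule about looping over the same range does not apply to them. The result matches the single-loop version with the if/else branch.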
The body of a
parfor-loop cannot make reference
to a nested function; see Nested Functions (MATLAB). However, it can call a nested function
by a function handle. Try the following example. Note that
A(idx) = nfcn(idx) in the parfor-loop does
not work. You must use
feval to invoke the fcn handle.
    function A = pfeg
        function out = nfcn(in)
            out = 1 + in;
        end
        fcn = @nfcn;
        parfor idx = 1:10
            A(idx) = feval(fcn, idx);
        end
    end
    >> pfeg
    Starting parallel pool (parpool) using the 'local' profile ...
    connected to 4 workers.

    ans =

         2     3     4     5     6     7     8     9    10    11
If you use function handles that refer to nested functions inside a
parfor-loop, then the values of externally scoped
variables are not synchronized among the workers. For more information
on handles, see Copying Objects (MATLAB).
The body of a
parfor-loop cannot contain an
spmd statement, and an spmd statement
cannot contain a parfor-loop.
You can call P-code script files from within a parfor-loop,
but a P-code script cannot contain a parfor-loop.
However, if a script introduces a variable, you cannot call
this script from within a parfor-loop.
The reason is that this script would cause a transparency violation.
For more details, see Ensure Transparency in parfor-Loops.