Random Number Generation for Parallel Computing Toolbox
I am running Monte Carlo simulations with multiple chains. To run the chains in parallel, I open a worker for each chain and use a parfor loop. The problem is that each time I run the code, the randomized initial values are the same. I have tried using the rng function, but this does not seem to work when using the Parallel Computing Toolbox. Is there a way to randomize the starting points for each matlabpool worker?
Thank you, Stephen
1 Comment
John Fox
on 20 Jul 2017
I had the exact same problem. My for loops gave a different answer than my parfor loops. The reason is given in the documentation:
As described in Control Random Number Streams, each worker in a cluster has an independent random number generator stream. By default, therefore, each worker in a pool, and each iteration in a parfor-loop has a unique, independent set of random numbers. Subsequent runs of the parfor-loop generate different numbers.
I fixed this with rng(123,'twister'). At least this worked for me.
Answers (7)
Jill Reese
on 8 Nov 2012
1 vote
The R2012b documentation provides a section on controlling the random number streams on the client and on the workers. If it does not address your use case, that would be helpful to know so that we can improve it in future.
Best,
Jill
Peter Perkins
on 8 Nov 2012
1 vote
Just to be clear, MATLAB initializes the random number generators on each worker so that they are definitely not the same, and are suitable for parallel computation. In many cases (needing reproducibility being one common exception), it should not be necessary to worry about initializing them.
It may be that something in your code is doing something to spoil that. The link Jill pointed to should help.
15 Comments
stephen
on 8 Nov 2012
Gabriele
on 26 Mar 2013
I have the very same problem with PCT in R2012a. Every time my code is run, I get the same random sequence. This is basically the usual MATLAB behavior if a random initialization is not provided at startup (e.g. through rng('shuffle') in startup.m).
Using rng('shuffle') in startupTask.m is not working, since two different tasks can get the same seed (tested!).
Sean de Wolski
on 26 Mar 2013
Use pctrunonall and labindex along with rng to set a different seed on each worker.
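A minimal sketch of this suggestion, assuming a pool is already open. It uses an spmd block rather than pctRunOnAll, so that labindex is evaluated on each worker; baseSeed is a hypothetical name for any integer you change between sessions. Newer releases document spmdIndex as the replacement for labindex.

```matlab
% Assumption: a parallel pool is already open (parpool or matlabpool).
baseSeed = 1000;   % hypothetical per-session value; pick a new one for each session

spmd
    % labindex runs from 1 to the number of workers, so each worker
    % ends up with a distinct seed.
    seedUsed = baseSeed + labindex;
    rng(seedUsed, 'twister');
    fprintf('worker %d seeded with %d, first draw %.4f\n', ...
            labindex, seedUsed, rand());
end
```

Because the seed depends only on baseSeed and the worker index, rerunning with the same baseSeed reproduces the same per-worker seeds, and changing baseSeed changes all of them.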
Gabriele
on 26 Mar 2013
I tried. I might be wrong, but it seems not to apply to my problem: I have a series of job(s) in a local cluster, each with different tasks.
Caroline
on 16 Apr 2013
I am also having this issue. I tried to search for what matlabpool/parfor does with different seeds and didn't find clear documentation. When I set the seed outside of the parfor loop, I get the same seed for each parfor iteration (which has nothing to do with the seed set outside the parfor loop).
Code:
[status, seed] = system('od /dev/urandom --read-bytes=4 -tu | awk ''{print $2}''');
seed = str2double(seed);
rng(seed);
s = rng;
display(s)
matlabpool(pool_size)
parfor i_pitch = 1:size(pitch_list,2)
    s2 = rng;
    display(s2)
end
Result:
ans =
4079047459
Starting matlabpool using the 'local' configuration ... connected to 1 labs.
ans =
3216
ans =
3216
(I am running this code further in parallel on multiple Amazon instances which start at the same time, hence the serious attempt to get a good, time-independent seed at the beginning.)
Caroline
on 16 Apr 2013
Oh, sorry, I gave the result when I was just printing out the Seed, not the full rng state, slightly different than the code.
Caroline
on 16 Apr 2013
Ok, sorry, I guess I'm debugging this as I go along. The first seed is always 3216, the second seed is always 6433, etc. Setting rng outside of the parfor loop has no effect.
What remains unclear to me is the best way to initialize streams with different seeds for different workers. I've found three possible directions:
1 - Set different offsets for each loop, e.g.
stream = RandStream('mrg32k3a','Seed',seed);
parfor ii = 1:10
set(stream,'Substream',ii);
par(ii) = rand(stream);
end
Disadvantage: Worried that the way I use random numbers in my code, some weird symmetries between different runs could be introduced if the offsets (ii) are too small, thus it feels like bad coding practice.
(Code modified from http://blogs.mathworks.com/loren/2008/11/13/new-ways-with-random-numbers-part-ii/)
2 - Create multiple streams, e.g.
stream = RandStream.create('mrg32k3a','NumStreams',3,'StreamIndices',1);
Disadvantage: You can't set different seeds for the different streams and "The streams are not necessarily independent from streams created at other times" (from http://www.mathworks.com/help/matlab/ref/randstream.create.html)
3 - Initialize a new stream in each (parallel) iteration / different worker, e.g.
parfor ii = 1:10
sc = RandStream('CombRecursive','Seed',seed(ii));
RandStream.setGlobalStream(sc);
end
where seed is a vector of well-chosen, distinct seeds, one per iteration.
Disadvantages: Need to pick a good set of seeds. Code is unclear because you call RandStream.setGlobalStream(sc), which only applies to the current worker, despite the word Global in the call. (Or I have been led to believe so by the last example here http://www.mathworks.com/help/distcomp/control-random-number-streams.html)
--
I think 3 is the best if it works but I'd be glad to hear other opinions. Also for any Matlab developers reading this, I find this to be horribly documented. I was only able to find out the proper practice (hopefully) by seeing and searching for why things weren't working, not by reading the documentation.
Caroline
on 16 Apr 2013
And for those following the conversation I'm having with myself, I can confirm that #3 does indeed work.
Peter Perkins
on 17 Apr 2013
Caroline, first of all, as I said, unless you need to be able to reproduce results, you may not need to do anything at all. The parallel workers are set up to give you different results on each.
Second, the mrg32k3a generator is not intended to be parallelized using seeds. It has multiple streams and substreams that are specifically intended for parallel generation. You do not need to worry about correlations between streams or substreams. And each substream is VERY long. Are you really worried that you'll use up 2^57 random numbers in each iteration of your loop?
Use something like (1) if you really need reproducibility, but first ask yourself if you really do need reproducible results. Or use something like (2) but initialize each worker with different stream indices. I really recommend not using something like (3).
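A minimal sketch of option (1), written so that each iteration creates its own stream handle and selects substream ii. The seed value and the names r and rSerial are hypothetical; the assumption is that selecting a substream positions the stream at the start of that substream, so a plain for-loop over the same substreams should give matching values.

```matlab
seed = 42;                         % hypothetical fixed seed
n = 10;
r = zeros(1, n);

parfor ii = 1:n
    % Create the stream inside the loop so each iteration has its own handle.
    s = RandStream('mrg32k3a', 'Seed', seed);
    s.Substream = ii;              % iteration ii always reads from substream ii
    r(ii) = rand(s);
end

% Serial cross-check using the same seed and the same substreams.
rSerial = zeros(1, n);
s = RandStream('mrg32k3a', 'Seed', seed);
for ii = 1:n
    s.Substream = ii;
    rSerial(ii) = rand(s);
end
disp(isequal(r, rSerial))
```

Since the substream is tied to the loop index and not to the worker, the result does not depend on how many workers are in the pool or on which worker runs which iteration.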
@Peter: I have the feeling you are misunderstanding the question. I have the same problems and it really drives me crazy. Stephen said "each time I run the code, the randomized initial values are the same" and that is indeed a problem. We simply need on each run of the code different seeds (depending on time or whatever), comparable to
rng('shuffle');
in the non-parallel case... it's not about the different seeds among the workers.
There are a couple of obscure methods that address this problem on Stack Exchange, but I would really appreciate some official advice on how to handle this problem without introducing hidden dependencies or whatever.
And yes, the documentation is not really helpful here.
Schuyler
on 9 Apr 2014
@peter I agree with @matheburg and would also like some guidance from mathworks. It appears that opening the matlab pool resets some global seed to some fixed value. See below for a quick example from the command line:
>> parfor i = 1:3
disp(rand(1));
end
Starting parallel pool (parpool) using the 'local' profile ... connected to 3 workers.
0.3246
0.2646
0.8847
>> matlabpool close
Parallel pool using the 'local' profile is shutting down.
>> parfor i = 1:3
disp(rand(1));
end
Starting parallel pool (parpool) using the 'local' profile ... connected to 3 workers.
0.3246
0.2646
0.8847
Michael
on 14 Sep 2014
Hello All,
I agree with @Schuyler and @matheburg. I have this problem with some Monte Carlo simulations I have been running. I would also appreciate an official answer and explanation from Mathworks.
A non-optimal workaround I have found is to call the rng function inside the parfor loop and explicitly seed the random number generator with a time-based seed. However, if you only use the classic sum(100*clock) as the seed, then any workers that happen to initialize at the same time will use the same random numbers. I end up adding the parfor loop index to distinguish the seeds.
Include inside parfor: rng(sum(100*clock)+i)
Like I wrote, not ideal, but I think it works. I would be interested in continued discussion.
Carl
on 5 Feb 2019
Worked perfectly for me!
Ebru Angun
on 12 Mar 2022
I have a related question for independent and reproducible random number generation in parallel computing.
1- Can I use 'Threefry' instead of 'mrg32k3a' below?
stream = RandStream('mrg32k3a','Seed',seed);
parfor ii = 1:10
set(stream,'Substream',ii);
par(ii) = rand(stream);
end
2- In the parfor loop, I am using 'normrnd' and 'mvnrnd' which need 'rng' to set the seed. How can I make the 'normrnd' and 'mvnrnd' functions use the substream on a local worker?
Thanks in advance.
Ebru
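One possible way to approach the second question, offered as a sketch and not as an official answer: normrnd and mvnrnd draw from the global stream, so a substream can be made the worker's global stream for the duration of an iteration. The values of seed, mu and sigma are hypothetical, and normrnd requires the Statistics and Machine Learning Toolbox. The generator here is mrg32k3a; the same pattern should apply to other generators that the RandStream documentation lists as supporting substreams.

```matlab
seed = 7;                  % hypothetical fixed seed
mu = 0;                    % hypothetical mean
sigma = 1;                 % hypothetical standard deviation
n = 10;
x = zeros(1, n);

parfor ii = 1:n
    s = RandStream('mrg32k3a', 'Seed', seed);
    s.Substream = ii;                       % one substream per iteration
    prev = RandStream.setGlobalStream(s);   % normrnd now draws from s on this worker
    x(ii) = normrnd(mu, sigma);
    RandStream.setGlobalStream(prev);       % restore the worker's previous stream
end
```

setGlobalStream only changes the stream of the process it runs in, so the call inside the loop affects the current worker and leaves the client untouched.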
Ebru Angun
on 12 Mar 2022
If we have to run a single program on 60 different randomly generated data sets, is it a better idea to use the rng command instead of creating 60 substreams, as follows? We have 12 workers (so at most 12 problems can be solved at a time), and the workers do not communicate with each other. The important issue here is to obtain 60 non-overlapping (independent) and reproducible random number streams that can be used with functions such as 'normrnd' and 'mvnrnd'. Thanks in advance.
parpool(60)
parfor i=1:60
rng(i);
r=normrnd(mu,sigma);
end
delete(gcp);
Peter Perkins
on 18 Sep 2014
If I'm understanding correctly, the problem is that, just as with ordinary non-parallel MATLAB, the random numbers on each worker are the same each time you start up (the random number generators are set up using each worker's labindex). If you are doing one calculation in one session, that's fine. But if you want to combine results of MC simulations from multiple sessions, and be able to treat them as statistically independent, then obviously that is a problem.
If that's right, then the solution is to (re)initialize the generator differently on each worker each time you start it up, using pctrunonall. "Differently on each worker each time you start it up" can be achieved using something involving 'shuffle', but it's theoretically possible to get the same initialization in two places by random chance. So a better idea is a combination of labindex and some sort of unique session number.
Just as in the serial case, you could use rng(i), where i is based on the lab index and the session number. But there are parallel generators that are designed specifically for this kind of large-scale MC simulation context: mrg32k3a and mlfg6331_64. If you know how many workers and sessions, then do something like this:
% session and worker both count from 1, so the index stays within 1..workers*sessions
stream = RandStream.create('mrg32k3a','NumStreams',workers*sessions, ...
'StreamIndices',workers*(session-1)+worker)
That gives you statistical independence across workers, across sessions. That will work for those two generators. With a non-parallel generator like mt19937ar, your only course would be to use different seeds, but again you could base the seeds on labindex and the session number.
Hope this helps.
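The stream-per-worker-per-session idea above can be sketched as follows, assuming an open pool and using spmd with labindex and numlabs (spmdIndex and spmdSize in newer releases). The names sessions, session and idxFor are hypothetical; session is counted from 1 here, hence the (session - 1) in the index.

```matlab
sessions = 10;   % hypothetical total number of sessions planned
session  = 3;    % hypothetical index of the current session, counted from 1

% Maps a (session, worker) pair to a unique stream index in 1..workers*sessions.
idxFor = @(session, worker, workers) workers*(session - 1) + worker;

spmd
    idx = idxFor(session, labindex, numlabs);
    s = RandStream.create('mrg32k3a', 'NumStreams', numlabs*sessions, ...
                          'StreamIndices', idx);
    RandStream.setGlobalStream(s);   % applies to this worker only
end
```

Any parfor loop run afterwards on the same pool then draws from these streams, as long as nothing inside the loop resets the generator. The pool size has to stay the same across sessions for the indices to remain unique.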
Daniel Golden
on 25 Feb 2015
Try something like this to shuffle the random number generator on the local worker and on all the parallel workers:
pool = gcp;
rng('shuffle'); % Shuffles on local worker only
% Shuffle on each parallel worker
seed_offset = randi(floor(intmax/10));
parfor kk = 1:pool.NumWorkers
rng(kk + seed_offset);
end
Tested on R2014b
Matteo
on 9 Mar 2016
0 votes
I experienced this problem several times. The proposed approach, that is, setting the seed from 100*clock, was the solution. However, when the loop runs fast, it is better to increase the multiplier; otherwise the seed will not change at every iteration.
Chuck
on 5 May 2016
0 votes
It does work with Parallel Computing Toolbox. Just add rng("shuffle") after the parfor line.
It might be because of your version, since this post is from 2012.
Chibuzo Nnonyelu
on 10 Mar 2018
0 votes
One way to approach this is to generate the random numbers just before the parfor-loop. This may use more memory, depending on the size of the parfor-loop.
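A minimal sketch of this approach, with hypothetical sizes and a sum standing in for the real per-iteration work. All random inputs come from the client's generator, so the workers' generators never matter.

```matlab
nIter  = 8;                % hypothetical number of iterations
nDraws = 5;                % hypothetical number of random values per iteration

rng(123, 'twister');       % seed on the client only; 123 is an arbitrary example
R = rand(nIter, nDraws);   % all random inputs generated up front

total = zeros(nIter, 1);
parfor i = 1:nIter
    u = R(i, :);           % sliced variable: only row i is sent to the worker
    total(i) = sum(u);     % stand-in for the real computation
end
```

Replacing rng(123, 'twister') with rng('shuffle') gives different inputs on every run while keeping the loop body unchanged.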