Creating a large set of random numbers
Hello! I have the following problem. I need to generate 55 705 600 000 unique random numbers uniformly distributed between 0 and 1 (quite a lot, huh?). Normally I have a loop and generate one random number on every iteration. That is obviously slow, so I switched to generating all the random numbers I need at the start of the program and storing them in a vector. The problem is that with this huge count (once again, 55 705 600 000) I cannot fit them all in one vector, because I get an "Out of Memory" error. That leaves me with generating one random number per loop iteration, which (combined with the rest of my code) will take about half a year to complete!!! Do you have any other ideas for speeding up the process? Thank you in advance!
Best regards
Accepted Answer
John D'Errico
on 21 Nov 2014
People get spoiled. They see a big, fast computer that can handle normal problems in stride, so what the heck, I want to solve a REALLY BIG problem! Then they are surprised when that computer runs out of memory or time to solve it. COMPUTERS ARE FINITE IN SIZE.
So if a problem this large is too much for your machine, then get more memory. Yeah, it will take nearly half a terabyte of RAM (55,705,600,000 doubles at 8 bytes each is about 446 GB), and that is just to store one vector. Don't make a copy of it, even a temporary one. Anyway, the RAM required is your problem, since you have set yourself the goal of solving it. The fact is, if you could solve this problem trivially, odds are that next week you would decide to tackle a problem 3 orders of magnitude larger, or more!
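To put the memory figure in concrete numbers, here is a quick back-of-the-envelope calculation in Python (used only for illustration; the thread itself is about MATLAB). It assumes every value is stored as an 8-byte double, which is what MATLAB's `rand` returns by default; the function name `storage_bytes` is hypothetical.

```python
# Back-of-the-envelope memory estimate for holding every number at once.
# Assumption: IEEE 754 double precision, 8 bytes per value (MATLAB's default for rand).
N_VALUES = 55_705_600_000
BYTES_PER_DOUBLE = 8


def storage_bytes(n_values: int, bytes_per_value: int = BYTES_PER_DOUBLE) -> int:
    """Bytes needed to hold n_values numbers in one contiguous array (hypothetical helper)."""
    return n_values * bytes_per_value


total = storage_bytes(N_VALUES)
print(f"{total:,} bytes")          # 445,644,800,000 bytes
print(f"{total / 1e12:.3f} TB")    # 0.446 TB (decimal terabytes)
print(f"{total / 2**30:.1f} GiB")  # 415.0 GiB

# A single block of 1e6 doubles, by contrast, is tiny.
print(f"{storage_bytes(1_000_000) / 1e6:.0f} MB per 1e6-value block")  # 8 MB
```

That is roughly 55,700 times the size of one 1e6-value block, which is why working block by block keeps memory use trivial.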
John's axiom: Computer programs (and the problems they solve) expand to fit the RAM available, and often just a wee bit more.
If you insist on solving this particular problem, then generate the random numbers in smaller chunks, say 1e6 of them at a time. When you use them up, generate more. Don't expect it to run much faster, since I'll bet that most of the computation time will NOT be spent generating those random numbers. Instead of a half year to finish, expect it to take 5.5 months. But again, that would be your problem, not mine. Get a faster computer, lol.
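A minimal Python sketch of the chunked approach, assuming the real per-number work can be done inside the inner loop. `process_in_chunks` and the running-mean workload are hypothetical stand-ins for the poster's actual computation; each chunk of `n` values plays the role of one MATLAB `rand(n, 1)` call. Only one chunk exists at a time, so memory is bounded by `chunk_size` values no matter how large `total` is.

```python
import random
from typing import Optional


def process_in_chunks(total: int, chunk_size: int = 1_000_000,
                      seed: Optional[int] = None) -> float:
    """Consume `total` uniform [0, 1) draws while holding at most `chunk_size`
    of them in memory. Returns the mean of all draws as a placeholder for the
    real per-number work (hypothetical workload)."""
    if total <= 0 or chunk_size <= 0:
        raise ValueError("total and chunk_size must be positive")
    rng = random.Random(seed)
    running_sum = 0.0
    remaining = total
    while remaining > 0:
        n = min(chunk_size, remaining)             # the last chunk may be shorter
        chunk = [rng.random() for _ in range(n)]   # MATLAB analogue: rand(n, 1)
        for x in chunk:                            # use the numbers one at a time
            running_sum += x                       # stand-in for the real computation
        remaining -= n
        # `chunk` is rebound on the next pass, so the old block can be freed
    return running_sum / total
```

In MATLAB the gain comes mostly from one vectorized `rand` call per chunk instead of one call per iteration; in pure Python each draw is still a separate function call, so this sketch demonstrates the bounded memory rather than a speedup. It is consistent with the point above that generation is unlikely to dominate the run time.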
It is also true that there are often ways to optimize your code, or even to reformulate your problem so that it can be solved far more rapidly. But I have no idea what you are doing, so how could I even guess?
More Answers (2)
Martin
on 21 Nov 2014
2 Comments
Thorsten
on 21 Nov 2014
The idea is to work on each vector separately; otherwise you would still end up needing nearly half a terabyte (about 446 GB) of RAM...
John D'Errico
on 22 Nov 2014
NO! NEVER do that. It is just poor programming style to create many numbered variables. And as Thorsten points out, it would still force you to allocate nearly half a terabyte of RAM! Unless you have a supercomputer on your desk, I doubt you have that much RAM.
There is no need to create all of those arrays up front. As I explained in my answer, create ONE block of numbers. Use them one at a time. When the last element is used, create another block, overwriting the first.
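The single reusable block can be sketched in Python as a small class, assuming values are consumed strictly one at a time. `BlockedUniform`, `draw`, and `refills` are hypothetical names. The buffer is allocated once as an `array` of doubles and overwritten in place whenever its last element has been used, so memory stays at `block_size` values (about 8 MB for a block of 1e6).

```python
import random
from array import array
from typing import Optional


class BlockedUniform:
    """Hand out uniform [0, 1) numbers one at a time from ONE reusable block.
    When the last element has been used, the same buffer is refilled in place."""

    def __init__(self, block_size: int = 1_000_000, seed: Optional[int] = None):
        if block_size <= 0:
            raise ValueError("block_size must be positive")
        self._rng = random.Random(seed)
        self._buf = array("d", [0.0]) * block_size  # allocated once, never grown
        self._pos = block_size                      # forces a fill on the first draw
        self.refills = 0                            # number of blocks generated so far

    def _refill(self) -> None:
        rand = self._rng.random
        buf = self._buf
        for i in range(len(buf)):                   # overwrite the old block in place
            buf[i] = rand()
        self._pos = 0
        self.refills += 1

    def draw(self) -> float:
        """Return the next number, generating a fresh block only when needed."""
        if self._pos == len(self._buf):
            self._refill()
        x = self._buf[self._pos]
        self._pos += 1
        return x
```

For example, drawing 10 values from `BlockedUniform(block_size=4, seed=0)` fills the block three times (on draws 1, 5 and 9), and the values come out in the same order as 10 direct calls to `random.Random(0).random()`.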