Creating a large set of random numbers

Hello! I have the following problem. I need to generate 55,705,600,000 unique random numbers uniformly distributed between 0 and 1 (quite a lot, huh?). Usually I have a loop and generate one random number on every iteration. However, that is obviously slow, which is why I switched to generating all the random numbers I will need at the start of the program and storing them in a vector. The problem is that with this huge number (once again, 55,705,600,000) I cannot put them all in one vector, because I get an "Out of Memory" error. That leaves me with the only option of generating one random number per loop iteration, which (combined with the other parts of my code) will take about half a year to complete! Do you have any alternative ideas for speeding up the process? Thank you in advance!
Best regards

Accepted Answer

John D'Errico on 21 Nov 2014
People get spoiled. They see a big, fast computer that can handle normal problems in stride, so what the heck, I want to solve a REALLY BIG problem! Then they are surprised when that computer runs out of memory or time to solve it. COMPUTERS ARE FINITE IN SIZE.
So if a problem this large is too much for you, then get more memory. Yeah, it will take close to half a terabyte of RAM (about 446 GB at 8 bytes per double), and that is just to store one vector. Don't make a copy of it, even a temporary one. Anyway, the RAM required is your problem, since you have set yourself the goal of solving it. The fact is, if you could solve this problem trivially, then odds are that next week you would decide to try a problem 3 orders of magnitude larger yet, or more!
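A quick back-of-the-envelope check of the storage figure, assuming `rand`'s default output class `double` (8 bytes per element) and counting only the data itself, not any per-array overhead or temporary copies:

```matlab
% Storage needed for one double-precision vector holding every number.
nTotal = 55705600000;                  % count taken from the question
bytesPerDouble = 8;                    % rand returns class double by default
totalBytes = nTotal * bytesPerDouble;  % 445,644,800,000 bytes

fprintf('%.1f GB (decimal)\n', totalBytes / 1e9);   % prints 445.6 GB (decimal)
fprintf('%.1f GiB (binary)\n', totalBytes / 2^30);  % prints 415.0 GiB (binary)
```

So a single vector of all the numbers needs roughly 446 GB (415 GiB), before MATLAB makes even one temporary copy of it.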
John's axiom: Computer programs (and the problems they solve) expand to fit the RAM available, and often just a wee bit more.
If you insist on solving this particular problem, then generate the random numbers in smaller chunks, say 1e6 of them at a time. When you use them up, generate more. Don't expect it to run much faster, since I'll bet that most of the computation time will NOT be spent generating those random numbers. Instead of a half year to finish, expect it to take 5.5 months. But again, that would be your problem, not mine. Get a faster computer, lol.
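A minimal sketch of the chunked approach, with sizes scaled down so it runs instantly. `total` and `count` are hypothetical stand-ins for whatever the real loop computes. The outer loop creates one block at a time, and the `min` call handles a final partial block, since 55,705,600,000 / 1e6 = 55705.6 and blocks of 1e6 do not divide the total evenly:

```matlab
nTotal    = 2500;      % demo size; the question has 55705600000
blockSize = 1000;      % e.g. 1e6 as suggested above
nBlocks   = ceil(nTotal / blockSize);

total = 0;             % hypothetical result of the per-number work
count = 0;
for b = 1:nBlocks
    % The last block may be shorter when nTotal is not a multiple of blockSize.
    n = min(blockSize, nTotal - (b - 1) * blockSize);
    block = rand(1, n);          % only this block is held in memory
    for k = 1:n
        r = block(k);            % one random number per inner iteration
        total = total + r;       % placeholder for the real computation
        count = count + 1;
    end
end
fprintf('used %d numbers in %d blocks\n', count, nBlocks);
```

As the answer says, this mainly fixes the memory problem; the per-number work in the inner loop still dominates the run time.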
It is also true that there are often ways to optimize your code, or even to reformulate your problem so that it can be solved far more rapidly. But I have no idea what you are doing, so how could I even guess?

More Answers (2)

Thorsten on 21 Nov 2014
Edited: Thorsten on 21 Nov 2014
You could split your random numbers into, e.g., 557056 vectors of 10^5 random numbers each and work on one 10^5-element vector at a time, which is probably faster than running your algorithm on one random number at a time.
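This split covers the total exactly, since 557056 × 10^5 = 55,705,600,000. The speedup only materializes if the per-number work can be written as whole-vector (vectorized) operations on each chunk. The sketch below assumes that, uses a hypothetical statistic (the count of values below 0.5) as a stand-in for the real algorithm, and processes only a few chunks for the demo:

```matlab
chunkSize = 1e5;                    % as suggested above
nChunks   = 557056;                 % 557056 * 1e5 = 55705600000
assert(nChunks * chunkSize == 55705600000);

nDemo = 5;                          % demo: process only a few chunks
below = 0;                          % hypothetical statistic to accumulate
for c = 1:nDemo                     % use 1:nChunks for the full run
    r = rand(1, chunkSize);         % one chunk, about 0.8 MB as doubles
    below = below + sum(r < 0.5);   % vectorized work on the whole chunk
end
fraction = below / (nDemo * chunkSize);
fprintf('fraction below 0.5: %.4f\n', fraction);
```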

Martin on 21 Nov 2014
Edited: Martin on 21 Nov 2014
OK... Thank you for the answers, guys. That raises another question. Say I split the data into multiple vectors... Is there a way to name those vectors so that I can use them in a loop? Here's an example:
    a1 = rand(1,10^6);
    a2 = rand(1,10^6);
    a3 = rand(1,10^6);
    % ......
    for i = 1:3
        % here I want to access the vectors by their name, so I want to somehow
        % put the variable i IN THE NAME of the vectors. Is that possible?
    end
I need this because there will still be a large number of these vectors, and I can't type out each one separately whenever I want to use it; it would get too complicated. I hope you understand what I'm asking... Thanks!
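For the naming question itself, the usual MATLAB idiom is an indexed container such as a cell array, so the loop index selects the vector (`a{i}`) instead of being spliced into a variable name. A minimal sketch with small sizes, where `a`, `nVec`, and `vecLen` are illustrative names. Note that this still holds every vector in memory at once, so on its own it does not solve the memory problem discussed in the comments:

```matlab
% Instead of a1, a2, a3, ... keep the vectors in one cell array.
nVec   = 3;
vecLen = 10;                      % 10^6 in the question; small for the demo
a = cell(1, nVec);
for i = 1:nVec
    a{i} = rand(1, vecLen);       % a{i} plays the role of "ai"
end

for i = 1:nVec
    v = a{i};                     % access the i-th vector by its index
    fprintf('vector %d: mean %.3f\n', i, mean(v));
end
```

If all the vectors have the same length, a matrix with one vector per row, accessed as `A(i,:)`, works the same way.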
  2 Comments
Thorsten on 21 Nov 2014
The idea is to work on each vector separately; otherwise you would still end up needing roughly 0.45 TB of RAM...
John D'Errico on 22 Nov 2014
NO! NEVER do that. It is just poor programming style to create many numbered variables. And as Thorsten points out, it would still force you to allocate nearly half a terabyte of RAM! Unless you have a supercomputer on your desk, I doubt you have that much RAM.
There is no need to create all of those arrays up front. As I explained in my answer, create ONE block of numbers. Use them one at a time. When the last element is used, create another block, overwriting the first.
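The single-buffer scheme described in this comment might look roughly like the following sketch. The sizes are scaled down for a quick run, `pos` and `refills` are illustrative names, and the loop body is a placeholder for the real per-iteration work:

```matlab
blockSize = 1000;                   % e.g. 1e6 in the real program
nIter     = 3500;                   % demo; the question has 55705600000
buffer  = rand(1, blockSize);       % the ONE block kept in memory
pos     = 0;                        % index of the last number used
refills = 0;

for iter = 1:nIter
    if pos == blockSize             % last element used: create a new block,
        buffer = rand(1, blockSize);  % overwriting the old one
        pos = 0;
        refills = refills + 1;
    end
    pos = pos + 1;
    r = buffer(pos);                % next random number for this iteration
    % ... the rest of the loop body would use r here ...
end
fprintf('%d iterations, %d refills\n', nIter, refills);
```

Memory use stays at one block regardless of how many iterations run, and the refill cost is amortized over `blockSize` iterations.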
