How to know if my data is randomly generated?

nawaf on 3 Sep 2014
Answered: Christian on 28 Jul 2015
Hi,
Thank you all in advance...
I am not very good at statistics. I have a random function that generates 10000 random digits between 0 and 9. So, my questions are:
Is there a way I can check whether the generated data is random? Should it follow a normal distribution (which produces a bell shape on a histogram)?
Suggestions and resources are most welcome.
  1 Comment
Guillaume on 3 Sep 2014
Usually, random number generators produce uniformly distributed values.


Answers (3)

Roger Stafford on 3 Sep 2014
You have asked whether your source of the ten possible digits is random. Obviously, if you are getting many different results, there is something about that source that is "random" in a sense: it changes with time. However, I suspect that what you are really asking is whether the sequence of digits is statistically "stationary", that is, whether successive digits are independent of one another. More precisely: for each pair of digits i and j, is the probability of getting an i followed by a j equal to the probability of getting an i times the probability of getting a j?
You can test this with your "random" function by counting every possible pairing of successive digits, though getting significant estimates would undoubtedly require many more than a paltry 10000 samples. Collect the counts in a 10 x 10 matrix whose (i,j)-th entry records the number of times digit j immediately followed digit i. Then, with N the number of samples, check whether (number of i digits / N) times (number of j digits / N) is approximately equal to (number of times i was immediately followed by j) / N. That is the essence of statistical independence.
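A minimal Python sketch of that pair-count check, assuming the digits are a list of ints 0-9. Instead of eyeballing all 100 ratios, it folds the deviations into a single Pearson chi-square statistic for independence on the 10 x 10 table; that summary test is my substitution, not part of the answer above. The function names (pair_counts, chi2_sf, independence_test) are hypothetical, and the p-value uses the Wilson-Hilferty normal approximation to the chi-square distribution rather than an exact CDF, so it stays within the standard library.

```python
import random
from statistics import NormalDist


def pair_counts(digits):
    """counts[i][j] = number of times digit j immediately followed digit i."""
    counts = [[0] * 10 for _ in range(10)]
    for first, second in zip(digits, digits[1:]):
        counts[first][second] += 1
    return counts


def chi2_sf(x, df):
    """Approximate P(X > x) for X ~ chi-square(df) (Wilson-Hilferty approximation)."""
    mean = 1 - 2 / (9 * df)
    sd = (2 / (9 * df)) ** 0.5
    z = ((x / df) ** (1 / 3) - mean) / sd
    return 1 - NormalDist().cdf(z)


def independence_test(digits):
    """Pearson chi-square comparing observed pair counts with the counts
    expected if P(i followed by j) = P(i) * P(j)."""
    counts = pair_counts(digits)
    n = sum(map(sum, counts))                    # number of successive pairs
    first = [sum(row) for row in counts]         # how often each digit starts a pair
    second = [sum(col) for col in zip(*counts)]  # how often each digit ends a pair
    stat = 0.0
    for i in range(10):
        for j in range(10):
            expected = first[i] * second[j] / n
            if expected > 0:                     # skip digits that never occur
                stat += (counts[i][j] - expected) ** 2 / expected
    df = 9 * 9  # (10 - 1) * (10 - 1), assuming all ten digits occur
    return stat, chi2_sf(stat, df)


if __name__ == "__main__":
    random.seed(1)
    digits = [random.randrange(10) for _ in range(10000)]
    stat, p = independence_test(digits)
    print(f"chi-square = {stat:.1f}, approximate p = {p:.3f}")
```

A very small p (for example below 0.01) would suggest that successive digits are not independent; a moderate p only means no dependence of this kind was detected.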
  1 Comment
Roger Stafford on 4 Sep 2014
I should clear up a misleading assertion in my previous discussion. Actually a time series is considered stationary if its statistical distribution is constant through time. The fact of successive events being independent is a further requirement and does not follow from their merely being stationary.
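As a rough illustration of that definition, one can split the sequence into consecutive blocks and ask whether the digit frequencies are the same in every block. The Python sketch below does this with a chi-square test of homogeneity; the block count (10), the function names (chi2_sf, homogeneity_test) and the Wilson-Hilferty p-value approximation are all assumptions of this example. It only checks that the single-digit distribution stays constant over time; as noted above, that says nothing about whether successive digits are independent.

```python
import random
from statistics import NormalDist


def chi2_sf(x, df):
    """Approximate P(X > x) for X ~ chi-square(df) (Wilson-Hilferty approximation)."""
    mean = 1 - 2 / (9 * df)
    sd = (2 / (9 * df)) ** 0.5
    z = ((x / df) ** (1 / 3) - mean) / sd
    return 1 - NormalDist().cdf(z)


def homogeneity_test(digits, n_blocks=10):
    """Chi-square test that the digit frequencies are the same in every block."""
    size = len(digits) // n_blocks  # leftover digits at the end are ignored
    blocks = [digits[k * size:(k + 1) * size] for k in range(n_blocks)]
    table = [[block.count(d) for d in range(10)] for block in blocks]
    total = size * n_blocks
    overall = [sum(col) for col in zip(*table)]  # count of each digit over all blocks
    stat = 0.0
    for row in table:
        for d in range(10):
            expected = size * overall[d] / total  # block size * overall share of d
            if expected > 0:
                stat += (row[d] - expected) ** 2 / expected
    df = (n_blocks - 1) * 9  # assuming all ten digits occur
    return stat, chi2_sf(stat, df)


if __name__ == "__main__":
    random.seed(2)
    digits = [random.randrange(10) for _ in range(10000)]
    stat, p = homogeneity_test(digits)
    print(f"chi-square = {stat:.1f}, approximate p = {p:.3f}")
```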



José-Luis on 3 Sep 2014
Determining which function generated a set of random numbers is a tall order. Looking at the distribution of your values is an altogether different thing.
To look at the distribution, just generate a histogram:
doc hist
To test if it comes from a certain distribution:
doc kstest
To try and fit it to several distributions interactively:
doc dfittool
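The Kolmogorov-Smirnov test behind kstest assumes a continuous distribution, so for ten discrete digits a chi-square goodness-of-fit test against the uniform distribution is the more usual check (MATLAB's chi2gof covers this case). Below is a standard-library Python sketch under that assumption: if the generator is uniform, each digit should appear about N/10 times, giving a roughly flat histogram rather than a bell shape. uniformity_test and chi2_sf are hypothetical names, and the p-value uses the Wilson-Hilferty approximation rather than an exact chi-square CDF.

```python
import random
from collections import Counter
from statistics import NormalDist


def chi2_sf(x, df):
    """Approximate P(X > x) for X ~ chi-square(df) (Wilson-Hilferty approximation)."""
    mean = 1 - 2 / (9 * df)
    sd = (2 / (9 * df)) ** 0.5
    z = ((x / df) ** (1 / 3) - mean) / sd
    return 1 - NormalDist().cdf(z)


def uniformity_test(digits):
    """Chi-square goodness-of-fit of digit counts against a uniform 0-9 distribution."""
    counts = Counter(digits)
    observed = [counts[d] for d in range(10)]  # Counter returns 0 for missing digits
    expected = len(digits) / 10                # uniform: every digit equally likely
    stat = sum((o - expected) ** 2 / expected for o in observed)
    return observed, stat, chi2_sf(stat, 9)    # 10 categories - 1 degree of freedom


if __name__ == "__main__":
    random.seed(3)
    digits = [random.randrange(10) for _ in range(10000)]
    observed, stat, p = uniformity_test(digits)
    for d, c in enumerate(observed):
        print(d, c, "#" * (c // 50))           # crude text histogram: should look flat
    print(f"chi-square = {stat:.2f}, approximate p = {p:.3f}")
```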
  2 Comments
nawaf on 3 Sep 2014
So would the distribution be sufficient to say whether the data was generated randomly or not?
José-Luis on 3 Sep 2014
Edited: José-Luis on 3 Sep 2014
No. It would be very difficult to know how the data was generated. Somebody might have used this or that algorithm, drawn numbers from a hat, or counted lightning strikes. You would be delving into cryptanalysis, which is an entirely different beast.



Christian on 28 Jul 2015
There is actually a function for testing whether data is random: runstest. It tests the hypothesis that the values come in random order.
But in reality it is almost impossible to prove that data is random; the best you can do is say that your numbers look quite random.
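For illustration, here is a minimal Python version of a Wald-Wolfowitz runs test. It is not MATLAB's runstest implementation: this sketch assumes runs above and below the sample median, discards values equal to the median, and uses the large-sample normal approximation for the number of runs. Too few runs suggests clustering; too many suggests alternation. A large p-value only means the order looks random, not that it is.

```python
import math
import random
from statistics import NormalDist, median


def runs_test(values):
    """Runs test above/below the median.
    Returns (number of runs, z statistic, two-sided p-value)."""
    m = median(values)
    above = [v > m for v in values if v != m]  # drop values equal to the median
    n1 = sum(above)                            # values above the median
    n2 = len(above) - n1                       # values below the median
    n = n1 + n2
    if n1 == 0 or n2 == 0:
        raise ValueError("need values on both sides of the median")
    runs = 1 + sum(a != b for a, b in zip(above, above[1:]))
    mean = 2 * n1 * n2 / n + 1                 # expected runs under random order
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    z = (runs - mean) / math.sqrt(var)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return runs, z, p


if __name__ == "__main__":
    random.seed(4)
    digits = [random.randrange(10) for _ in range(10000)]
    runs, z, p = runs_test(digits)
    print(f"runs = {runs}, z = {z:.2f}, approximate p = {p:.3f}")
```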
