| Statistics Toolbox™ | ![]() |
| On this page… |
|---|
Random number generators for supported distributions are discussed in Random Number Generators.
A GUI for generating random numbers from supported distributions is discussed in Random Number Generation Tool.
This section discusses additional topics in random number generation.
In statistics, randomness is a notion commonly associated with groups of, rather than individual, samples. More accurately, it is a notion associated with a process that produces samples. D. H. Lehmer, an early pioneer in computing, said:
A random sequence is a vague notion... in which each term is unpredictable to the uninitiated and whose digits pass a certain number of tests traditional with statisticians...
The notion of randomness is formalized by the Central Limit Theorem, which says that sample averages from a random process always approximate a normal distribution. This formalization is central to the theory of statistical estimation.
Pseudo-random numbers are generated by deterministic algorithms. They are "random" in the sense that, on average, they pass statistical tests regarding their distribution and correlation. Random number generators (RNGs), like those in MATLAB and Statistics Toolbox software, are algorithms for generating pseudo-random numbers with a specified distribution.
Methods for generating pseudo-random numbers usually start with uniform random numbers, like those produced by the MATLAB rand function. Random numbers from other distributions are produced using the methods described in this section.
Direct methods make direct use of the definition of the distribution.
For example, consider binomial random numbers. A binomial random number is the number of heads in N tosses of a coin with probability p of a heads on any single toss. If you generate N uniform random numbers on the interval (0,1) and count the number less than p, then the count is a binomial random number with parameters N and p.
The following function is a simple implementation of a binomial RNG using this direct approach:
function X = directbinornd(N,p,m,n)
X = zeros(m,n); % Preallocate memory
for i = 1:m*n
u = rand(N,1);
X(i) = sum(u < p);
endFor example,
X = directbinornd(100,0.3,1e4,1); hist(X,101) set(get(gca,'Children'),'FaceColor',[.8 .8 1])

The Statistics Toolbox function binornd uses a modified direct method, based on the definition of a binomial random variable as the sum of Bernoulli random variables.
The method above is easily converted to a random number generator for the Poisson distribution with parameter λ. The Poisson distribution is the limiting case of the binomial distribution as N approaches infinity, p approaches zero, and Np is held fixed at λ. To generate Poisson random numbers, create a version of the above generator that inputs λ rather than N and p, and internally sets N to some large number and p to λ / N.
The Statistics Toolbox function poissrnd actually uses two direct methods: a waiting time method for small values of λ, and a method due to Ahrens and Dieter for larger values of λ.
Inversion methods are based on the observation that continuous cumulative distribution functions (cdfs) range uniformly over the interval (0,1). If u is a uniform random number on (0,1), then a random number X from a continuous distribution with specified cdf F is obtained using X = F -1(U).
For example, the following code generates random numbers from a specific exponential distribution using the inverse cdf and the MATLAB uniform random number generator rand:
mu = 1; X = expinv(rand(1e4,1),mu);
To compare the distribution of the generated random numbers to the pdf of the specified exponential distribution, the pdf, with area = 1, must be scaled to the area of the histogram used to display the distribution:
numbins = 50; hist(X,numbins) set(get(gca,'Children'),'FaceColor',[.8 .8 1]) hold on [bincounts,binpositions] = hist(X,numbins); binwidth = binpositions(2) - binpositions(1); histarea = binwidth*sum(bincounts); x = binpositions(1):0.001:binpositions(end); y = exppdf(x,mu); plot(x,histarea*y,'r','LineWidth',2)

Inversion methods also work for discrete distributions. To generate a random number X from a discrete distribution with probability mass vector P(X = xi) = pi where x0 < x1 < x2 < ... , generate a uniform random number u on (0,1) and then set X = xi if F(xi–1) < u < F(xi).
For example, the following function implements an inversion method for a discrete distribution with probability mass vector p:
function X = discreteinvrnd(p,m,n)
X = zeros(m,n); % Preallocate memory
for i = 1:m*n
u = rand;
I = find(u < cumsum(p));
X(i) = min(I);
endUse the function to generate random numbers from any discrete distribution:
p = [0.1 0.2 0.3 0.2 0.1 0.1]; % Probability mass vector X = discreteinvrnd(p,1e4,1); [n,x] = hist(X,length(p)); bar(1:length(p),n) set(get(gca,'Children'),'FaceColor',[.8 .8 1])

The functional form of some distributions makes it difficult or time-consuming to generate random numbers using direct or inversion methods. Acceptance-rejection methods provide an alternative in these cases.
Acceptance-rejection methods begin with uniform random numbers, but require an additional random number generator. If the goal is to generate a random number from a continuous distribution with pdf f, acceptance-rejection methods first generate a random number from a continuous distribution with pdf g satisfying f (x) ≤ cg (x) for some c and all x.

A continuous acceptance-rejection RNG proceeds as follows:
Choose a density g.
Find a constant c such that f (x) / g (x) ≤ c for all x.
Generate a uniform random number u.
Generate a random number v from g.
If c*u ≤ f (v) / g (v) , accept and return v.
Otherwise, reject v and go to 3.
For efficiency, a "cheap" method is required for generating random numbers from g, and the scalar c should be small. The expected number of iterations to produce a single random number is c.
The following function implements an acceptance-rejection method for generating random numbers from pdf f, given f, g, the RNG grnd for g, and the constant c:
function X = accrejrnd(f,g,grnd,c,m,n)
X = zeros(m,n); % Preallocate memory
for i = 1:m*n
accept = false;
while accept == false
u = rand();
v = grnd();
if c*u <= f(v)/g(v)
X(i) = v;
accept = true;
end
end
endFor example, the function f (x) = xe–x2/2 satisfies the conditions for a pdf on [0,∞) (nonnegative and integrates to 1). The exponential pdf with mean 1, f (x) = e–x, dominates g for c greater than about 2.2. Thus, you can use rand and exprnd to generate random numbers from f :
f = @(x)x.*exp(-(x.^2)/2); g = @(x)exp(-x); grnd = @()exprnd(1); X = accrejrnd(f,g,grnd,2.2,1e4,1);
The pdf f is actually a Rayleigh distribution with shape parameter 1. The following compares the distribution of random numbers generated by the acceptance-rejection method with those generated by raylrnd:
Y = raylrnd(1,1e4,1);
hist([X Y])
h = get(gca,'Children');
set(h(1),'FaceColor',[.8 .8 1])
legend('A-R RNG','Rayleigh RNG')

The Statistics Toolbox function raylrnd uses a transformation method, expressing a Rayleigh random variable in terms of a chi-square random variable, which can be computed using randn.
Acceptance-rejection methods also work for discrete distributions. In this case, the goal is to generate random numbers from a distribution with probability mass Pp(X = i) = pi, assuming you have a method for generating random numbers from a distribution with probability mass Pq(X = i) = qi. The RNG proceeds as follows:
Choose a density Pq.
Find a constant c such that pi / qi ≤ c for all i .
Generate a uniform random number u.
Generate a random number v from Pq.
If c*u ≤ pv / qv , accept and return v.
Otherwise, reject v and go to 3.
The methods discussed in Common Generation Methods may be inadequate when sampling distributions are difficult to represent in computations. Such distributions arise, for example, in Bayesian data analysis and in the large combinatorial problems of Markov chain Monte Carlo (MCMC) simulations. An alternative is to construct a Markov chain with a stationary distribution equal to the target sampling distribution, using the states of the chain to generate random numbers after an initial burn-in period in which the state distribution converges to the target.
The Metropolis-Hastings algorithm draws samples from a distribution that is only known up to a constant. Random numbers are generated from a distribution with a probability density function that is equal to or proportional to a proposal function.
To generate random numbers:
Assume an initial value x(t).
Draw a sample, y(t), from a proposal distribution q(y | x(t)).
Accept y(t) as the next sample x(t+1) with probability r(x(t),y(t)), and keep x(t) as the next sample x(t+1) with probability 1–r(x(t),y(t)), where:
![]()
Increment t → t+1, and repeat steps 2 and 3 until the desired number of samples are obtained.
You can generate random numbers using the Metropolis-Hastings method with the mhsample function. To produce quality samples efficiently with Metropolis-Hastings algorithm, it is crucial to select a good proposal distribution. If it is difficult to find an efficient proposal distribution, you can use the slice sampling algorithm without explicitly specifying a proposal distribution.
In instances where it is difficult to find an efficient Metropolis-Hastings proposal distribution, there are a few algorithms, such as the slice sampling algorithm, that do not require an explicit specification for the proposal distribution. The slice sampling algorithm draws samples from the region under the density function using a sequence of vertical and horizontal steps. First, it selects a height at random between 0 and the density function f (x). Then, it selects a new x value at random by sampling from the horizontal "slice" of the density above the selected height. A similar slice sampling algorithm is used for a multivariate distribution.
If a function f (x) proportional to the density function is given, then do the following to generate random numbers:
Assume an initial value x(t) within the domain of f (x).
Draw a real value y uniformly from (0,f (x(t))), thereby defining a horizontal "slice" as S = {x: y < f (x)}.
Find an interval I = (L,R) around x(t) that contains all, or much of the "slice" S.
Draw the new point x(t+1) within this interval.
Increment t → t+1 and repeat steps 2 through 4 until the desired number of samples are obtained.
Slice sampling can generate random numbers from a distribution with an arbitrary form of the density function, provided that an efficient numerical procedure is available to find the interval I = (L,R), which is the "slice" of the density.
You can generate random numbers using the slice sampling method with the slicesample function.
Quasi-random number generators (QRNGs) produce highly uniform samples of the unit hypercube. They are designed to minimize the discrepancy between the distribution of generated points and a distribution with equal proportions of points in each sub-cube of a uniform partition of the hypercube. As a result, QRNGs systematically fill the "holes" in any initial segment of the generated quasi-random sequence.
Unlike the pseudo-random sequences described in the Introduction to common generation methods, quasi-random sequences fail many statistical tests for randomness. Approximating true randomness, however, is not their goal. Quasi-random sequences seek to fill space uniformly, and to do so in such a way that initial segments approximate this behavior up to a specified density.
Applications of QRNGs include:
Quasi-Monte Carlo (QMC) integration. Monte Carlo techniques are often used to evaluate difficult, multi-dimensional integrals without a closed-form solution. QMC uses quasi-random sequences to improve the convergence properties of these techniques.
Space-filling experimental designs. In many experimental settings, taking measurements at every factor setting is expensive or infeasible. Quasi-random sequences provide efficient, uniform sampling of the design space.
Global optimization. Optimization algorithms typically find a local optimum in the neighborhood of an initial value. By using a quasi-random sequence of initial values, searches for global optima uniformly sample the basins of attraction of all local minima.
Introduction. Statistics Toolbox functions support quasi-random sequences of the following types:
Halton sequences. Produced by the haltonset function. These sequences use different prime bases to form successively finer uniform partitions of the unit interval in each dimension.
Sobol sequences. Produced by the sobolset function. These sequences use a base of 2 to form successively finer uniform partitions of the unit interval, and then reorder the coordinates in each dimension.
Latin hypercube sequences. Produced by the lhsdesign function. Though not quasi-random in the sense of minimizing discrepancy, these sequences nevertheless produce sparse uniform samples useful in experimental designs.
Quasi-random sequences are functions from the positive integers to the unit hypercube. To be useful in application, an initial point set of a sequence must be generated. Point sets are matrices of size n-by-d, where n is the number of points and d is the dimension of the hypercube being sampled. The functions haltonset and sobolset construct point sets with properties of a specified quasi-random sequence. Initial segments of the point sets are generated by the net method of the qrandset class (parent class of the haltonset class and sobolset class), but points can be generated and accessed more generally using parenthesis indexing.
Because of the way in which quasi-random sequences are generated, they may contain undesirable correlations, especially in their initial segments, and especially in higher dimensions. To address this issue, quasi-random point sets often skip, leap over, or scramble values in a sequence. The haltonset and sobolset functions allow you to specify both a Skip and a Leap property of a quasi-random sequence, and the scramble method of the qrandset class allows you apply a variety of scrambling techniques. Scrambling reduces correlations while also improving uniformity.
Example: Quasi-Random Point Sets. For example, the following uses haltonset to construct a 2-dimensional Halton point set—an object p of the haltonset class—that skips the first 1000 values of the sequence and then retains every 101st point:
p = haltonset(2,'Skip',1e3,'Leap',1e2)
p =
Halton point set in 2 dimensions (8.918019e+013 points)
Properties:
Skip : 1000
Leap : 100
ScrambleMethod : none
The object p encapsulates properties of the specified quasi-random sequence. The point set is finite, with a length determined by the Skip and Leap properties and by limits on the size of point set indices (maximum value of 253).
Use scramble to apply reverse-radix scrambling:
p = scramble(p,'RR2')
p =
Halton point set in 2 dimensions (8.918019e+013 points)
Properties:
Skip : 1000
Leap : 100
ScrambleMethod : RR2
Use net to generate the first 500 points:
X0 = net(p,500);
This is equivalent to:
X0 = p(1:500,:);
Values of the point set X0 are not generated and stored in memory until you access p using net or parenthesis indexing.
To appreciate the nature of quasi-random numbers, create a scatter of the two dimensions in X0:
scatter(X0(:,1),X0(:,2),5,'r')
axis square
title('{\bf Quasi-Random Scatter}')

Compare this to a scatter of uniform pseudo-random numbers generated by the MATLAB rand function:
X = rand(500,2);
scatter(X(:,1),X(:,2),5,'b')
axis square
title('{\bf Uniform Random Scatter}')

The quasi-random scatter appears more uniform, avoiding the clumping in the pseudo-random scatter.
In a statistical sense, quasi-random numbers are too uniform to pass traditional tests of randomness. For example, a Kolmogorov-Smirnov test, performed by kstest, is used to assess whether or not a point set has a uniform random distribution. When performed repeatedly on uniform pseudo-random samples, such as those generated by rand, the test produces a uniform distribution of p-values:
nTests = 1e5;
sampSize = 50;
PVALS = zeros(nTests,1);
for test = 1:nTests
x = rand(sampSize,1);
[h,pval] = kstest(x,[x,x]);
PVALS(test) = pval;
end
hist(PVALS,100)
set(get(gca,'Children'),'FaceColor',[.8 .8 1])
xlabel('{\it p}-values')
ylabel('Number of Tests')

The results are quite different when the test is performed repeatedly on uniform quasi-random samples:
p = haltonset(1,'Skip',1e3,'Leap',1e2);
p = scramble(p,'RR2');
nTests = 1e5;
sampSize = 50;
PVALS = zeros(nTests,1);
for test = 1:nTests
x = p(test:test+(sampSize-1),:);
[h,pval] = kstest(x,[x,x]);
PVALS(test) = pval;
end
hist(PVALS,100)
set(get(gca,'Children'),'FaceColor',[.8 .8 1])
xlabel('{\it p}-values')
ylabel('Number of Tests')

Small p-values call into question the null hypothesis that the data are uniformly distributed. If the hypothesis is true, about 5% of the p-values are expected to fall below 0.05. The results are remarkably consistent in their failure to challenge the hypothesis.
Introduction. Quasi-random streams, produced by the qrandstream function, are used to generate sequential quasi-random outputs, rather than point sets of a specific size. Streams are used like pseudo-RNGS, such as rand, when client applications require a source of quasi-random numbers of indefinite size that can be accessed intermittently. Properties of a quasi-random stream, such as its type (Halton or Sobol), dimension, skip, leap, and scramble, are set when the stream is constructed.
In implementation, quasi-random streams are essentially very large quasi-random point sets, though they are accessed differently. The state of a quasi-random stream is the scalar index of the next point to be taken from the stream. Use the qrand method of the qrandstream class to generate points from the stream, starting from the current state. Use the reset method to reset the state to 1. Unlike point sets, streams do not support parenthesis indexing.
Example: Quasi-Random Streams. For example, the following code, taken from the example at the end of Quasi-Random Point Sets, uses haltonset to create a quasi-random point set p, and then repeatedly increments the index into the point set, test, to generate different samples:
p = haltonset(1,'Skip',1e3,'Leap',1e2);
p = scramble(p,'RR2');
nTests = 1e5;
sampSize = 50;
PVALS = zeros(nTests,1);
for test = 1:nTests
x = p(test:test+(sampSize-1),:);
[h,pval] = kstest(x,[x,x]);
PVALS(test) = pval;
end
The same results are obtained by using qrandstream to construct a quasi-random stream q based on the point set p and letting the stream take care of increments to the index:
p = haltonset(1,'Skip',1e3,'Leap',1e2);
p = scramble(p,'RR2');
q = qrandstream(p)
nTests = 1e5;
sampSize = 50;
PVALS = zeros(nTests,1);
for test = 1:nTests
X = qrand(q,sampSize);
[h,pval] = kstest(X,[X,X]);
PVALS(test) = pval;
end
![]() | Multivariate Modeling | Hypothesis Tests | ![]() |

Includes the most popular MATLAB recorded presentations with Q&A sessions led by MATLAB experts.
| © 1984-2009- The MathWorks, Inc. - Site Help - Patents - Trademarks - Privacy Policy - Preventing Piracy - RSS |