Code covered by the BSD License  

Highlights from
Random Numbers from a Discrete Distribution

5.0

5.0 | 1 rating Rate this file 20 Downloads (last 30 days) File Size: 2.11 KB File ID: #34101
image thumbnail

Random Numbers from a Discrete Distribution

by Tristan Ursell

 

06 Dec 2011 (Updated 21 Mar 2012)

Simple algorithm to generate random numbers from a user-defined discrete probability distribution.

| Watch this File

File Information
Description

GENDIST - generate random numbers according to a discrete probability distribution
Tristan Ursell, 2011.

T = gendist(P,N,M)
T = gendist(P,N,M,'plot')

The function gendist(P,N,M) takes in a positive vector P whose values form a discrete probability distribution for the indices of P. The function outputs an N x M matrix of integers corresponding to the indices of P chosen at random from the given underlying distribution.

P will be normalized, if it is not normalized already. Both N and M must be greater than or equal to 1. The optional parameter 'plot' creates a plot displaying the input distribution in red and the generated points as a blue histogram.

Conceptual EXAMPLE:

If P = [0.2 0.4 0.4] (note sum(P)=1), then T can only take on values of 1, 2 or 3, corresponding to the possible indices of P. If one calls gendist(P,1,10), then on average the output T should contain two 1's, four 2's and four 3's, in accordance with the values of P, and a possible output might look like:

T = gendist(P,1,10)
T = 2 2 2 3 3 3 1 3 1 3

If, for example, P is a probability distribution for position, with positions X = [5 10 15] (does not have to be linearly spaced), then the set of generated positions is X(T).

EXAMPLE 1:

P = rand(1,50);
T = gendist(P,100,1000,'plot');

EXAMPLE 2:

X = -3:0.1:3;
P = 1+sin(X);
T = gendist(P,100,1000,'plot');
Xrand = X(T);

Required Products MATLAB
MATLAB release MATLAB 7.9 (2009b)
Tags for This File  
Everyone's Tags
Tags I've Applied
Add New Tags Please login to tag files.
Comments and Ratings (5)
07 Dec 2011 Derek O'Connor

It would be useful if the author would explain clearly what P, M, N are. For example, if P is a "discrete probability distribution for the indices of P", then why does it need to be normalized? Also I think the author means "density" or "mass function" rather than "distribution".

Is M the size of the sample and is N the length of P?

I presume the author is trying to do what this simple function does:

function S = DiscITSamp1(x,p,ns);

% Generate a random sample S of size ns
% with replacement from x(1:n) with prob. % density p(1:n), using the discrete
% inverse transform method.
% Time Complexity: O(n*ns).
% Derek O'Connor, 31 July 2011.
% derekroconnor@eircom.net
%
% USE: S = DiscITSamp1([5 7 8 11],
% [0.2 0.3 0.4 0.1],1000);hist(S)

cdf = cumsum(p);
S = zeros(1,ns);
for k = 1:ns
    u = rand;
    i = 1;
    while cdf(i) <= u
        i = i+1;
    end;
    S(k) = x(i);
end

This is three times faster than GENDIST on a 2.3GHz machine, Matlab2008a 64-bit

>> ns=10^6;n = 10^3;x=1:n;p=rand(1,n);p=p/sum(p);
>> tic;S1 = DiscITSamp1(x,p,ns);t=toc;disp([ns t])
       1e+006 5.6115

>> tic;S2 = gendist(p,1,ns);t=toc;disp([ns t])
       1e+006 19.278

07 Dec 2011 Derek O'Connor

ALSO:

 See here
 
 http://math.stackexchange.com/questions/48919/generating-a-non-uniform-discrete-random-variable

 and here
  
 http://math.stackexchange.com/questions/58060/minimize-expected-value

07 Dec 2011 Tristan Ursell

@Derek, I am aware and glad that there are other formulations of this task. 'N' and 'M' are clearly defined in the description above. There is no overt relationship between P and N & M. P is a distribution, in the mathematical sense of the word, and because it describes the probability of picking an index of P, the sum of those values must equal one, i.e. it must be normalized.

For reference:
http://en.wikipedia.org/wiki/Probability_distribution

http://en.wikipedia.org/wiki/Normalizing_constant

09 Dec 2011 Derek O'Connor

Here is a function that beats those above by a mile:

function S = DiscSampVec2(x,p,ns);
%
[~,idx] = histc(rand(1,ns),[0,cumsum(p)]);
S = x(idx);

This is a slight modification of the function on page 47, Kroese, Taimre, and Botev, Handbook of Monte Carlo Methods, Wiley, 2011.

>> ns=10^6; n=10^3; x=1:n;
>> p = rand(1,n); p = p/sum(p);

>> t2=tic;
>> S = DiscSampVec2(x,p,ns);
>> t2=toc(t2)

      0.15835

02 Mar 2012 NNNN

Thanks a lot, Derek, this one is indeed a champion !

On a 2.1GHz machine, Matlab2008a 32-bit,
I got:
t2 = 0.18
VS
t2 within 18-22 for a couple of other tested methods.
Kind regards,
N

Please login to add a comment or rating.
Updates
08 Dec 2011

minor code reorganization, a little faster, a little cleaner.

21 Mar 2012

replaced older for loop implementation with histc implementation

21 Mar 2012

plot fixed

Tag Activity for this File
Tag Applied By Date/Time
random Tristan Ursell 07 Dec 2011 10:36:11
number Tristan Ursell 07 Dec 2011 10:36:11
variable Tristan Ursell 07 Dec 2011 10:36:11
probability Tristan Ursell 07 Dec 2011 10:36:11
discrete Tristan Ursell 07 Dec 2011 10:36:11
distribution Tristan Ursell 07 Dec 2011 10:36:11
generate Tristan Ursell 07 Dec 2011 10:36:11
noise Tristan Ursell 09 Dec 2011 09:57:39
rand Tristan Ursell 22 Mar 2012 09:57:22
randn Tristan Ursell 22 Mar 2012 09:57:22
randi Tristan Ursell 22 Mar 2012 09:57:22
choose Tristan Ursell 22 Mar 2012 09:57:22
pick Tristan Ursell 22 Mar 2012 09:57:22
normrnd Tristan Ursell 22 Mar 2012 09:57:22
pick gfd 09 Apr 2012 00:46:20

Contact us at files@mathworks.com