Simple algorithm to generate random numbers from a userdefined discrete probability distribution.
GENDIST  generate random numbers according to a discrete probability distribution
Tristan Ursell, 2011.
T = gendist(P,N,M)
T = gendist(P,N,M,'plot')
The function gendist(P,N,M) takes in a positive vector P whose values form a discrete probability distribution for the indices of P. The function outputs an N x M matrix of integers corresponding to the indices of P chosen at random from the given underlying distribution.
P will be normalized, if it is not normalized already. Both N and M must be greater than or equal to 1. The optional parameter 'plot' creates a plot displaying the input distribution in red and the generated points as a blue histogram.
Conceptual EXAMPLE:
If P = [0.2 0.4 0.4] (note sum(P)=1), then T can only take on values of 1, 2 or 3, corresponding to the possible indices of P. If one calls gendist(P,1,10), then on average the output T should contain two 1's, four 2's and four 3's, in accordance with the values of P, and a possible output might look like:
T = gendist(P,1,10)
T = 2 2 2 3 3 3 1 3 1 3
If, for example, P is a probability distribution for position, with positions X = [5 10 15] (does not have to be linearly spaced), then the set of generated positions is X(T).
EXAMPLE 1:
P = rand(1,50);
T = gendist(P,100,1000,'plot');
EXAMPLE 2:
X = 3:0.1:3;
P = 1+sin(X);
T = gendist(P,100,1000,'plot');
Xrand = X(T);
1.7  plot fixed 

1.6  replaced older for loop implementation with histc implementation 

1.2  minor code reorganization, a little faster, a little cleaner. 
D G (view profile)
Does what it is supposed to do with good efficiency.
misbah aiad (view profile)
Very good
Radu Trimbitas (view profile)
Nice code! Instructive!
ChiFu (view profile)
Senthil (view profile)
Thanks. It helped me in my program
NNNN (view profile)
Thanks a lot, Derek, this one is indeed a champion !
On a 2.1GHz machine, Matlab2008a 32bit,
I got:
t2 = 0.18
VS
t2 within 1822 for a couple of other tested methods.
Kind regards,
N
Derek O'Connor (view profile)
Here is a function that beats those above by a mile:
function S = DiscSampVec2(x,p,ns);
%
[~,idx] = histc(rand(1,ns),[0,cumsum(p)]);
S = x(idx);
This is a slight modification of the function on page 47, Kroese, Taimre, and Botev, Handbook of Monte Carlo Methods, Wiley, 2011.
>> ns=10^6; n=10^3; x=1:n;
>> p = rand(1,n); p = p/sum(p);
>> t2=tic;
>> S = DiscSampVec2(x,p,ns);
>> t2=toc(t2)
0.15835
Tristan Ursell (view profile)
@Derek, I am aware and glad that there are other formulations of this task. 'N' and 'M' are clearly defined in the description above. There is no overt relationship between P and N & M. P is a distribution, in the mathematical sense of the word, and because it describes the probability of picking an index of P, the sum of those values must equal one, i.e. it must be normalized.
For reference:
http://en.wikipedia.org/wiki/Probability_distribution
http://en.wikipedia.org/wiki/Normalizing_constant
Derek O'Connor (view profile)
ALSO:
See here
http://math.stackexchange.com/questions/48919/generatinganonuniformdiscreterandomvariable
and here
http://math.stackexchange.com/questions/58060/minimizeexpectedvalue
Derek O'Connor (view profile)
It would be useful if the author would explain clearly what P, M, N are. For example, if P is a "discrete probability distribution for the indices of P", then why does it need to be normalized? Also I think the author means "density" or "mass function" rather than "distribution".
Is M the size of the sample and is N the length of P?
I presume the author is trying to do what this simple function does:
function S = DiscITSamp1(x,p,ns);
% Generate a random sample S of size ns
% with replacement from x(1:n) with prob. % density p(1:n), using the discrete
% inverse transform method.
% Time Complexity: O(n*ns).
% Derek O'Connor, 31 July 2011.
% derekroconnor@eircom.net
%
% USE: S = DiscITSamp1([5 7 8 11],
% [0.2 0.3 0.4 0.1],1000);hist(S)
cdf = cumsum(p);
S = zeros(1,ns);
for k = 1:ns
u = rand;
i = 1;
while cdf(i) <= u
i = i+1;
end;
S(k) = x(i);
end
This is three times faster than GENDIST on a 2.3GHz machine, Matlab2008a 64bit
>> ns=10^6;n = 10^3;x=1:n;p=rand(1,n);p=p/sum(p);
>> tic;S1 = DiscITSamp1(x,p,ns);t=toc;disp([ns t])
1e+006 5.6115
>> tic;S2 = gendist(p,1,ns);t=toc;disp([ns t])
1e+006 19.278