Path: news.mathworks.com!not-for-mail
From: <HIDDEN>
Newsgroups: comp.soft-sys.matlab
Subject: Re: Random number choice according to probability distribution
Date: Wed, 7 Mar 2012 20:53:12 +0000 (UTC)
Organization: The MathWorks, Inc.
Lines: 26
Message-ID: <jj8hro$c0q$1@newscl01ah.mathworks.com>
References: <jj5t0t$d5f$1@newscl01ah.mathworks.com> <jj62ua$432$1@newscl01ah.mathworks.com> <jj7i9c$l0s$1@newscl01ah.mathworks.com>
Reply-To: <HIDDEN>
NNTP-Posting-Host: www-01-blr.mathworks.com
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
X-Trace: newscl01ah.mathworks.com 1331153592 12314 172.30.248.46 (7 Mar 2012 20:53:12 GMT)
X-Complaints-To: news@mathworks.com
NNTP-Posting-Date: Wed, 7 Mar 2012 20:53:12 +0000 (UTC)
X-Newsreader: MATLAB Central Newsreader 1187260
Xref: news.mathworks.com comp.soft-sys.matlab:760268

"Emerson" wrote in message <jj7i9c$l0s$1@newscl01ah.mathworks.com>...
> Hi Roger,
> I've already tried this, following your suggestion from an older message in this forum. My code is like this:
> 
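> c = cumsum(p);        % cumulative probabilities; c(end) should be 1
> r = rand(1,1);        % one uniform draw on (0,1)
> e = [0,c];            % bin edges (assumes p is a row vector)
> [~,bin] = histc(r,e); % index of the bin that contains r
> x = s(bin);           % the selected value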
> 
> But running the routine several times, I noticed that sometimes the distribution of the random choices doesn't follow the probability distribution. For example, sometimes the 2nd or the 3rd values are chosen more frequently than the 1st value of vector 's'. I'm expecting something like
> 
> nc_s(1) > nc_s(2) > nc_s(3) > ... > nc_s(19)
> 
> and instead of this, sometimes I have an output like
> 
> nc_s(2) > nc_s(1) > nc_s(3) > ... > nc_s(19)
> 
> (where nc_s(k) represents how many times the k-th value from vector 's' has been chosen).
> Am I doing something wrong?
> 
> Thanks.
- - - - - - - - -
  Emerson, you might be surprised at how large your sample would have to be before you could be reasonably sure that the first element of 's', whose probability is 0.0848, occurs more often than the second element, whose probability is the slightly smaller 0.0843.  To simplify the calculation I dropped the other 17 elements and computed that probability for just two elements with probabilities in the same ratio: 0.0848/(0.0848+0.0843) and 0.0843/(0.0848+0.0843).  After 8190 samples the probability is still about 0.4 that the second, less likely, element will nevertheless be in the majority!  With all 19 elements present, only about 17% of the draws (0.0848+0.0843 = 0.1691) land on one of these two, so you would need far more samples than this just to get back to that same 0.4.  For a reversal of these particular two elements to become fairly unlikely, you would need hundreds of thousands or possibly millions of samples.
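
For anyone who wants to check the two-element figure, here is a minimal sketch of one way to compute it, not necessarily the calculation used above. It assumes a plain binomial model for the count of element 1, uses only base MATLAB functions (gammaln, erfc), and all variable names are made up for illustration. The last few lines give a rough total sample size for the reversal to become a 2-sigma event; the threshold zTarget = 2 is an arbitrary choice, and the estimate treats the number of draws that land on these two elements as fixed at its expected value.

```matlab
% Two-element model from the post: only elements 1 and 2 of 's' are kept,
% with their probabilities rescaled to sum to 1.
p1 = 0.0848;  p2 = 0.0843;
q1 = p1/(p1 + p2);            % chance that a draw is element 1
n  = 8190;                    % number of two-element samples

% Element 2 is in the majority when element 1 occurs fewer than n/2 times.
kMax = ceil(n/2) - 1;         % largest count of element 1 that still loses
k = 0:kMax;

% Exact binomial tail, summed in log space to avoid huge binomial coefficients.
logPmf = gammaln(n+1) - gammaln(k+1) - gammaln(n-k+1) ...
         + k*log(q1) + (n-k)*log(1-q1);
pExact = sum(exp(logPmf));

% Normal approximation with continuity correction.
z = (kMax + 0.5 - n*q1) / sqrt(n*q1*(1-q1));
pNormal = 0.5*erfc(-z/sqrt(2));

% Rough total sample size (all 19 elements) for a 2-sigma separation.
zTarget = 2;                                       % illustrative threshold
m = (zTarget * 2*sqrt(q1*(1-q1)) / (2*q1 - 1))^2;  % two-element draws needed
nTotal = m/(p1 + p2);         % only a fraction p1+p2 of all draws hit these two

fprintf('exact: %.3f  normal: %.3f  total samples: %.2g\n', ...
        pExact, pNormal, nTotal);
```

Both pExact and pNormal should agree with the "about 0.4" quoted above, and nTotal should land in the millions.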

Roger Stafford