Path: news.mathworks.com!not-for-mail
From: <HIDDEN>
Newsgroups: comp.soft-sys.matlab
Subject: Re: Generation of random number from Define probability
Date: Thu, 19 Feb 2009 08:52:02 +0000 (UTC)
Organization: The MathWorks, Inc.
Lines: 25
Message-ID: <gnj6ji$9i6$1@fred.mathworks.com>
References: <gnepo2$18j$1@fred.mathworks.com> <gnf7ru$fg5$1@fred.mathworks.com> <gnivii$de5$1@fred.mathworks.com>
Reply-To: <HIDDEN>
NNTP-Posting-Host: webapp-05-blr.mathworks.com
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 8bit
X-Trace: fred.mathworks.com 1235033522 9798 172.30.248.35 (19 Feb 2009 08:52:02 GMT)
X-Complaints-To: news@mathworks.com
NNTP-Posting-Date: Thu, 19 Feb 2009 08:52:02 +0000 (UTC)
X-Newsreader: MATLAB Central Newsreader 1187260
Xref: news.mathworks.com comp.soft-sys.matlab:519354

"Matt Fig" <spamanon@yahoo.com> wrote in message <gnivii$de5$1@fred.mathworks.com>...
> "Roger Stafford" <ellieandrogerxyzzy@mindspring.com.invalid> wrote in message
>
> Please forgive me if this is totally wrong (I admit I hated probability and statistics back in college), but why wouldn't something simple like this work:
>
> p = [.679 .179 .129 .013]'; p = p/sum(p); % They must have a sum of one
> n = 20000000;
> [ps,idx] = sort(p);
> ps = cumsum(ps);
> x = rand(n,1);
> E = x;
> E(x<=ps(1)) = idx(1);
> for ii = 2:length(p)
>     E(x>=ps(ii-1) & x<=ps(ii)) = idx(ii);
> end
>
> It seems pretty fast for large n, and I don't see the reason why this wouldn't work.

It looks as though it would work all right, Matt, but I don't see the point in doing the sort. Since all the elements of p must be non-negative, after doing the cumsum you would necessarily have an ascending (or at least a non-descending) sequence in 'ps' anyway, and the inequality checks you have would still work. It also wouldn't then be necessary to use 'idx' indexing; you could insert the ii's directly into E, as is.
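For concreteness, here is one way the simplified version just described might look, with the sort and the 'idx' indexing removed (my rendering of the idea, not Matt's original code):

```matlab
% Simplified version: the cumsum of a non-negative p is already
% non-descending, so no sort is needed, and the bin number ii can
% go straight into E.
p = [.679 .179 .129 .013]';  p = p/sum(p);   % must sum to one
n = 20000000;
ps = cumsum(p);
x = rand(n,1);
E = zeros(n,1);
E(x <= ps(1)) = 1;
for ii = 2:length(p)
    E(x >= ps(ii-1) & x <= ps(ii)) = ii;
end
```

(A value of x landing exactly on a boundary ps(ii-1) matches two bins, but with rand that happens with probability zero, and the later assignment simply wins.)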
This algorithm, as well as Jos' 'randp', is order n*m, where m is the number of elements in p and n the length of x. For a small m like the 4 in this example, one of these would certainly be the best way to go. However, as I said earlier, when m is large, which is what I had in mind with the for-loop stuff, the order n*log(m) binary search should eventually win out even if carried out iteratively in a for-loop. Try timing both of them with m = 2^16 along with a large n, and see if the binary search method wouldn't come in first. The binary search for-loop would be taking a lazy 16 trips against a frantic 65,536 for-loop trips for the serial search through ps. In both cases vectors of length n are being processed each time, though of course in the binary search the operations are admittedly more complicated.

When trying to simulate accurately a probability density function f(x) with high resolution, one could conceivably use just such a binary search technique with very high values of m to accomplish the task, rather than, say, attempting to compute analytically the inverse of the desired cumulative distribution function for use with 'rand'. In a sense one could consider this binary search method a kind of fast table look-up technique as opposed to computing values of the inverse cdf.

Roger Stafford
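A sketch of the iterative binary search being described, vectorized over the whole length-n vector on each trip (my own rendering, assuming m is a power of two so the interval halving stays exact; the distribution p here is just a placeholder):

```matlab
% Order n*log(m) method: each of the log2(m) trips through the loop
% halves every element's search interval in ps simultaneously.
m = 2^16;
p = rand(m,1);  p = p/sum(p);    % some length-m distribution (placeholder)
ps = cumsum(p);
n = 1000000;
x = rand(n,1);
lo = zeros(n,1);                 % invariant: ps(lo) < x, taking ps(0) = 0
step = m/2;
for t = 1:log2(m)                % 16 trips when m = 2^16
    mid = lo + step;             % mid >= 1 throughout, so ps(mid) is valid
    up = x > ps(mid);
    lo(up) = mid(up);
    step = step/2;
end
E = lo + 1;                      % smallest index with x <= ps(index)
```

Each trip does one vectorized comparison and one logical-indexed assignment on the full length-n vectors, so the cost is about 16 such passes instead of the 65,536 serial passes of the order n*m search when m = 2^16.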