Discover MakerZone

MATLAB and Simulink resources for Arduino, LEGO, and Raspberry Pi

Learn more

Discover what MATLAB® can do for your career.

Opportunities for recent engineering grads.

Apply Today

Thread Subject:
faster implementation of discrete random number generator?

Subject: faster implementation of discrete random number generator?

From: Misha Koshelev

Date: 24 May, 2009 23:31:01

Message: 1 of 9

Seem like this is where most time is spent according to profiler:

function samples = discreternd(p,N)
% DISCRETERND Sample from a discrete distribution.
% SAMPLES = DISCRETERND(p,N) Return N elements from the discrete
% distribution with individual probabilities p.
%

    sum = [0 cumsum(p)];
    sum = sum ./ max(sum);

    samples = zeros(1,N);
    u = rand(N,1);
    for i=1:length(sum)-1
        samples((u > sum(i)) & (u <= sum(i+1))) = i;
    end

Thank you
Misha

Subject: faster implementation of discrete random number generator?

From: Roger Stafford

Date: 25 May, 2009 00:59:02

Message: 2 of 9

"Misha Koshelev" <mk144210@bcm.edu> wrote in message <gvclbk$39o$1@fred.mathworks.com>...
> Seem like this is where most time is spent according to profiler:
>
> function samples = discreternd(p,N)
> % DISCRETERND Sample from a discrete distribution.
> % SAMPLES = DISCRETERND(p,N) Return N elements from the discrete
> % distribution with individual probabilities p.
> %
>
> sum = [0 cumsum(p)];
> sum = sum ./ max(sum);
>
> samples = zeros(1,N);
> u = rand(N,1);
> for i=1:length(sum)-1
> samples((u > sum(i)) & (u <= sum(i+1))) = i;
> end
>
> Thank you
> Misha

  If m is the number of elements in your p vector, the algorithm you describe is of order m*N. For each of the m values of i in the for-loop, the formation of the logical vector

 (u > sum(i)) & (u <= sum(i+1))

takes N steps in scanning through vector u, giving m*N steps altogether.

  However, locating the appropriate value to assign each of the N samples variable ought to be only an order log(m) operation and therefore an order N*log(m) algorithm. Some time ago I wrote a procedure that would do just that, using a binary search method which would be of this latter kind, but I would have to do some searching to find it if you are interested.

  Whether such an alternative would prove to be appreciably faster depends strongly on how large m is. Just how large is the m (=length(p)) you contemplate using?

Roger Stafford

Subject: faster implementation of discrete random number generator?

From: Roger Stafford

Date: 25 May, 2009 01:50:03

Message: 3 of 9

"Roger Stafford" <ellieandrogerxyzzy@mindspring.com.invalid> wrote in message <gvcqgm$seu$1@fred.mathworks.com>...
> ........
> Some time ago I wrote a procedure that would do just that, using a binary search method which would be of this latter kind, but I would have to do some searching to find it if you are interested.
> ........

  I did manage to locate that procedure I mentioned. It is contained in a thread entitled "Generation of random number from Define probability" dated Tue, 17 Feb 2009. In it I wrote this:

 x = rand(n,1);
 y = cumsum(p);
 m = size(p,1);
 f = ones(size(x));
 e = m*f;
 for k = 1:ceil(log2(m))
  c = floor((f+e)/2);
  t = x > y(c);
  f = f + t.*(c-f);
  e = c + t.*(e-c);
 end

Here p is the same as the p in your discreternd and n is the same as N. The vector e is the same result as in discreternd's 'samples'.

  You will note that the for-loop here takes about log2(m) steps and each one deals with arrays of length n, giving it an order log(m)*n algorithm. Whether it is actually faster depends on whether the smaller number of for-loop steps makes up for the larger amount of processing within the loop, and that in turn depends very much on the size of m.

  You may be interested in the rather extensive discussion that took place on the subject in the above thread.

  By the way, I am curious as to where you came across your particular 'discreternd' function. I couldn't find mention of it anywhere doing a Google search, though there is another one which obtains a different result that is also called 'discreternd'.

Roger Stafford

Subject: faster implementation of discrete random number generator?

From: Misha Koshelev

Date: 25 May, 2009 02:12:01

Message: 4 of 9

Thank you very much. I hope your function will be useful.. actually my N is fairly small but I call the function quite a bit so we'll see...

As for my discreternd function I believe it is the same one you found on Google but with some modifications I have made to it. I should probably credit the original author but did not really mean to post it publicly or anything I just noticed its where most time is spent.

Thank you again
Misha

"Misha Koshelev" <mk144210@bcm.edu> wrote in message <gvclbk$39o$1@fred.mathworks.com>...
> Seem like this is where most time is spent according to profiler:
>
> function samples = discreternd(p,N)
> % DISCRETERND Sample from a discrete distribution.
> % SAMPLES = DISCRETERND(p,N) Return N elements from the discrete
> % distribution with individual probabilities p.
> %
>
> sum = [0 cumsum(p)];
> sum = sum ./ max(sum);
>
> samples = zeros(1,N);
> u = rand(N,1);
> for i=1:length(sum)-1
> samples((u > sum(i)) & (u <= sum(i+1))) = i;
> end
>
> Thank you
> Misha

Subject: faster implementation of discrete random number generator?

From: Roger Stafford

Date: 25 May, 2009 03:05:02

Message: 5 of 9

"Misha Koshelev" <mk144210@bcm.edu> wrote in message <gvcuph$dvv$1@fred.mathworks.com>...
> Thank you very much. I hope your function will be useful.. actually my N is fairly small but I call the function quite a bit so we'll see...
> ........

  It is not N that is crucial to speed here but that of m=length(p). Its length - that is the number of discrete points in your distribution - is what determines whether it pays to choose an order log(m)*N algorithm as opposed to just an order m*N algorithm.

  I should mention that the code I sent you assumes that p has already been normalized to a sum of 1 and possesses only positive elements.

Roger Stafford

Subject: faster implementation of discrete random number generator?

From: John D'Errico

Date: 25 May, 2009 03:09:02

Message: 6 of 9

"Roger Stafford" <ellieandrogerxyzzy@mindspring.com.invalid> wrote in message <gvctgb$rff$1@fred.mathworks.com>...
> "Roger Stafford" <ellieandrogerxyzzy@mindspring.com.invalid> wrote in message <gvcqgm$seu$1@fred.mathworks.com>...
> > ........
> > Some time ago I wrote a procedure that would do just that, using a binary search method which would be of this latter kind, but I would have to do some searching to find it if you are interested.
> > ........
>
> I did manage to locate that procedure I mentioned. It is contained in a thread entitled "Generation of random number from Define probability" dated Tue, 17 Feb 2009. In it I wrote this:
>
> x = rand(n,1);
> y = cumsum(p);
> m = size(p,1);
> f = ones(size(x));
> e = m*f;
> for k = 1:ceil(log2(m))
> c = floor((f+e)/2);
> t = x > y(c);
> f = f + t.*(c-f);
> e = c + t.*(e-c);
> end
>
> Here p is the same as the p in your discreternd and n is the same as N. The vector e is the same result as in discreternd's 'samples'.
>
> You will note that the for-loop here takes about log2(m) steps and each one deals with arrays of length n, giving it an order log(m)*n algorithm. Whether it is actually faster depends on whether the smaller number of for-loop steps makes up for the larger amount of processing within the loop, and that in turn depends very much on the size of m.
>
> You may be interested in the rather extensive discussion that took place on the subject in the above thread.
>
> By the way, I am curious as to where you came across your particular 'discreternd' function. I couldn't find mention of it anywhere doing a Google search, though there is another one which obtains a different result that is also called 'discreternd'.
>
> Roger Stafford

The downside is Roger's code is extremely slow. He
has an old release of MATLAB that does not have histc,
which was not introduced until roughly version 5.3.

n = 100000;
p = rand(1000,1);
p = p/sum(p);

tic
x = rand(n,1);
y = cumsum(p);
m = size(p,1);
f = ones(size(x));
e = m*f;
for k = 1:ceil(log2(m))
  c = floor((f+e)/2);
  t = x > y(c);
  f = f + t.*(c-f);
  e = c + t.*(e-c);
end
toc

Elapsed time is 0.909336 seconds.

tic
edges = cumsum([0;p]);
[junk,E] = histc(x,edges);
toc

Elapsed time is 0.042687 seconds.

See that both code fragments generate the same
result. Just that histc is far better at what it does.

>> e(1:10)
ans =
    59
   288
   850
   970
   374
   978
   379
   736
   915
   502
>> E(1:10)
ans =
    59
   288
   850
   970
   374
   978
   379
   736
   915
   502

John

Subject: faster implementation of discrete random number generator?

From: Roger Stafford

Date: 25 May, 2009 14:49:55

Message: 7 of 9

"John D'Errico" <woodchips@rochester.rr.com> wrote in message <gvd24e$j6$1@fred.mathworks.com>...
> .........
> n = 100000;
> p = rand(1000,1);
> p = p/sum(p);
>
> tic
> x = rand(n,1);
> y = cumsum(p);
> m = size(p,1);
> f = ones(size(x));
> e = m*f;
> for k = 1:ceil(log2(m))
> c = floor((f+e)/2);
> t = x > y(c);
> f = f + t.*(c-f);
> e = c + t.*(e-c);
> end
> toc
>
> Elapsed time is 0.909336 seconds.
>
> tic
> edges = cumsum([0;p]);
> [junk,E] = histc(x,edges);
> toc
>
> Elapsed time is 0.042687 seconds.
> ......

  Yes, you are quite right, John. My thinking was not clear on this problem.

  For a single value in x and many in y, a binary search method would be a decent method of solving the problem. However for large numbers of values in both x and y, if a sort is performed on the combined x and y quantities, an order log(n+m) algorithm is possible. This can make a dramatic improvement over a merely order n*log(m) algorithm, as the numbers you have used in your example, n=100000 and m=1000, clearly illustrate.

Roger Stafford

Subject: faster implementation of discrete random number generator?

From: Roger Stafford

Date: 25 May, 2009 15:44:02

Message: 8 of 9

"Roger Stafford" <ellieandrogerxyzzy@mindspring.com.invalid> wrote in message <gveb6j$qq1$1@fred.mathworks.com>...
> .......
> However for large numbers of values in both x and y, if a sort is performed on the combined x and y quantities, an order log(n+m) algorithm is possible.
> .......

  Hmm! My thinking wasn't clear on that response either. The sorting I mentioned would give an (n+m)*log(n+m) algorithm, not log(n+m). I am not sure now how 'histc' manages such a dramatic improvement.

Roger Stafford

Subject: faster implementation of discrete random number generator?

From: Bruno Luong

Date: 25 May, 2009 16:15:05

Message: 9 of 9

"Roger Stafford" <ellieandrogerxyzzy@mindspring.com.invalid> wrote in message <gveec2$ejh$1@fred.mathworks.com>...
> "Roger Stafford" <ellieandrogerxyzzy@mindspring.com.invalid> wrote in message <gveb6j$qq1$1@fred.mathworks.com>...
> > .......
> > However for large numbers of values in both x and y, if a sort is performed on the combined x and y quantities, an order log(n+m) algorithm is possible.
> > .......
>
> Hmm! My thinking wasn't clear on that response either. The sorting I mentioned would give an (n+m)*log(n+m) algorithm, not log(n+m). I am not sure now how 'histc' manages such a dramatic improvement.

Hi Roger,

I guess the difference come mainly from a built-in dichotomy search and the Matlab inefficient FOR loop. If you have MEX running, you might try my MEX code I have implemented the binary search:

http://www.mathworks.com/matlabcentral/fileexchange/23049

Bruno

Tags for this Thread

No tags are associated with this thread.

What are tags?

A tag is like a keyword or category label associated with each thread. Tags make it easier for you to find threads of interest.

Anyone can tag a thread. Tags are public and visible to everyone.

Contact us