Discover MakerZone

MATLAB and Simulink resources for Arduino, LEGO, and Raspberry Pi

Learn more

Discover what MATLAB® can do for your career.

Opportunities for recent engineering grads.

Apply Today

Thread Subject:
Generation of random number from Define probability

Subject: Generation of random number from Define probability

From: Salman Siddiqui

Date: 17 Feb, 2009 16:48:02

Message: 1 of 14

Hi
Dear All

If I have 4 things which have following probability

Probability of 1 = 0.679,

Probability of 2 = 0.179,

Probability of 3 = 0.129,

Probability of 4 = 0.013


Can I generate any scheme by which I can get the same result...

I meant to say how can I manage to get the same result?? OR Can it manage by the generation of random variables but how???

Plz guide me in this regard....

Thanks

Subject: Generation of random number from Define probability

From: Jomar Bueyes

Date: 17 Feb, 2009 17:35:08

Message: 2 of 14

On Feb 17, 11:48=A0am, "Salman Siddiqui" <sad_withoutfri...@yahoo.com>
wrote:
> Hi
> Dear All
>
> If I have 4 things which have following probability
>
> Probability of 1 =3D 0.679,
>
> Probability of 2 =3D 0.179,
>
> Probability of 3 =3D 0.129,
>
> Probability of 4 =3D 0.013
>
> Can I generate any scheme by which I can get the same result...
>
> I meant to say how can I manage to get the same result?? OR Can it manage=
 by the generation of random variables but how???
>
> Plz guide me in this regard....
>
> Thanks

Hi Salman,

To generate a pseudo random number (PRN) Y with the above
distribution, start with X =3D rand(1). This is a PRN 0 <=3D X <=3D 1.
If X <=3D 0.679, then Y =3D 1;
If 0.679 < X <=3D 0.679+0.179 then Y =3D 2
if 0.679+0.179 < X <=3D 0.679+0.179+0.129, then Y =3D 3,
etc

(This is called the rejection method)

HTH

Jomar

Subject: Generation of random number from Define probability

From: Roger Stafford

Date: 17 Feb, 2009 20:49:02

Message: 3 of 14

"Salman Siddiqui" <sad_withoutfriend@yahoo.com> wrote in message <gnepo2$18j$1@fred.mathworks.com>...
> ......
> If I have 4 things which have following probability
>
> Probability of 1 = 0.679,
> Probability of 2 = 0.179,
> Probability of 3 = 0.129,
> Probability of 4 = 0.013
>
> Can I generate any scheme by which I can get the same result...
> I meant to say how can I manage to get the same result?? OR Can it manage by the generation of random variables but how???
> .....

  For the general case when your desired probabilities are given in a column vector p of m elements, you can generate a column of n random integers, e, ranging from 1 to m - your "things" - in accordance with the probabilities in p as follows:

% Generate probability vector
p = rand(100,1); p = p/sum(p); % They must have a sum of one
n = 8000; % Choose how many random integers you want

% The Engine
x = rand(n,1);
y = cumsum(p);
m = size(p,1);
f = ones(size(x));
e = m*f;
for k = 1:ceil(log2(m))
 c = floor((f+e)/2);
 t = x > y(c);
 f = f + t.*(c-f);
 e = c + t.*(e-c);
end

The n-element column vector, e, will contain the desired random integers.

  The algorithm here is order n*log(m) which would be useful if m is large. It might not be the right method with only the four probabilities you mentioned.

Roger Stafford

Subject: Generation of random number from Define probability

From: Salman Siddiqui

Date: 18 Feb, 2009 19:58:02

Message: 4 of 14

"Roger Stafford" <ellieandrogerxyzzy@mindspring.com.invalid> wrote in message <gnf7ru$fg5$1@fred.mathworks.com>...
> "Salman Siddiqui" <sad_withoutfriend@yahoo.com> wrote in message <gnepo2$18j$1@fred.mathworks.com>...
> > ......
> > If I have 4 things which have following probability
> >
> > Probability of 1 = 0.679,
> > Probability of 2 = 0.179,
> > Probability of 3 = 0.129,
> > Probability of 4 = 0.013
> >
> > Can I generate any scheme by which I can get the same result...
> > I meant to say how can I manage to get the same result?? OR Can it manage by the generation of random variables but how???
> > .....
>
> For the general case when your desired probabilities are given in a column vector p of m elements, you can generate a column of n random integers, e, ranging from 1 to m - your "things" - in accordance with the probabilities in p as follows:
>
> % Generate probability vector
> p = rand(100,1); p = p/sum(p); % They must have a sum of one
> n = 8000; % Choose how many random integers you want
>
> % The Engine
> x = rand(n,1);
> y = cumsum(p);
> m = size(p,1);
> f = ones(size(x));
> e = m*f;
> for k = 1:ceil(log2(m))
> c = floor((f+e)/2);
> t = x > y(c);
> f = f + t.*(c-f);
> e = c + t.*(e-c);
> end
>
> The n-element column vector, e, will contain the desired random integers.
>
> The algorithm here is order n*log(m) which would be useful if m is large. It might not be the right method with only the four probabilities you mentioned.
>
> Roger Stafford


Really Thanks for your kind replies.

I got a lot of clues from your kind and gentle ideas

Really thanks...

Subject: Generation of random number from Define probability

From: Salman Siddiqui

Date: 18 Feb, 2009 19:58:02

Message: 5 of 14

"Roger Stafford" <ellieandrogerxyzzy@mindspring.com.invalid> wrote in message <gnf7ru$fg5$1@fred.mathworks.com>...
> "Salman Siddiqui" <sad_withoutfriend@yahoo.com> wrote in message <gnepo2$18j$1@fred.mathworks.com>...
> > ......
> > If I have 4 things which have following probability
> >
> > Probability of 1 = 0.679,
> > Probability of 2 = 0.179,
> > Probability of 3 = 0.129,
> > Probability of 4 = 0.013
> >
> > Can I generate any scheme by which I can get the same result...
> > I meant to say how can I manage to get the same result?? OR Can it manage by the generation of random variables but how???
> > .....
>
> For the general case when your desired probabilities are given in a column vector p of m elements, you can generate a column of n random integers, e, ranging from 1 to m - your "things" - in accordance with the probabilities in p as follows:
>
> % Generate probability vector
> p = rand(100,1); p = p/sum(p); % They must have a sum of one
> n = 8000; % Choose how many random integers you want
>
> % The Engine
> x = rand(n,1);
> y = cumsum(p);
> m = size(p,1);
> f = ones(size(x));
> e = m*f;
> for k = 1:ceil(log2(m))
> c = floor((f+e)/2);
> t = x > y(c);
> f = f + t.*(c-f);
> e = c + t.*(e-c);
> end
>
> The n-element column vector, e, will contain the desired random integers.
>
> The algorithm here is order n*log(m) which would be useful if m is large. It might not be the right method with only the four probabilities you mentioned.
>
> Roger Stafford


Really Thanks for your kind replies.

I got a lot of clues from your kind and gentle ideas

Really thanks...

Subject: Generation of random number from Define probability

From: Jos

Date: 18 Feb, 2009 20:37:01

Message: 6 of 14

"Roger Stafford" <ellieandrogerxyzzy@mindspring.com.invalid> wrote in message <gnf7ru$fg5$1@fred.mathworks.com>...
> "Salman Siddiqui" <sad_withoutfriend@yahoo.com> wrote in message <gnepo2$18j$1@fred.mathworks.com>...
> > ......
> > If I have 4 things which have following probability
> >
> > Probability of 1 = 0.679,
> > Probability of 2 = 0.179,
> > Probability of 3 = 0.129,
> > Probability of 4 = 0.013
> >
> > Can I generate any scheme by which I can get the same result...
> > I meant to say how can I manage to get the same result?? OR Can it manage by the generation of random variables but how???
> > .....
>
> For the general case when your desired probabilities are given in a column vector p of m elements, you can generate a column of n random integers, e, ranging from 1 to m - your "things" - in accordance with the probabilities in p as follows:
>
> % Generate probability vector
> p = rand(100,1); p = p/sum(p); % They must have a sum of one
> n = 8000; % Choose how many random integers you want
>
> % The Engine
> x = rand(n,1);
> y = cumsum(p);
> m = size(p,1);
> f = ones(size(x));
> e = m*f;
> for k = 1:ceil(log2(m))
> c = floor((f+e)/2);
> t = x > y(c);
> f = f + t.*(c-f);
> e = c + t.*(e-c);
> end
>
> The n-element column vector, e, will contain the desired random integers.
>
> The algorithm here is order n*log(m) which would be useful if m is large. It might not be the right method with only the four probabilities you mentioned.
>
> Roger Stafford

A vectorized solution is offered by RANDP:

% data
  Prob = [0.679 0.179 0.129 0.013] ;
  N = 100000 ;
% engine
  R = randp(Prob,N,1) ;
% results
  Pobserved = histc(R,1:4)./ N ;
  plot([0 1],[0 1],'k-', Prob,Pobserved,'bo') ; % almost on a straight line ...
  disp([Prob(:) Pobserved])

RANDP can be found here:
http://www.mathworks.com/matlabcentral/fileexchange/8891

hth
Jos

Subject: Generation of random number from Define probability

From: Roger Stafford

Date: 19 Feb, 2009 04:30:04

Message: 7 of 14

"Jos " <#10584@fileexchange.com> wrote in message <gnhrhd$ll7$1@fred.mathworks.com>...
> A vectorized solution is offered by RANDP:
> ......

Hello Jos,

  I wrote the for-loop procedure above to provide an order (n*log(m)) algorithm for large values of m. It is a simple binary search for the appropriate value among m values, and for a sufficiently large m it is bound to go faster than any order (n*m) algorithm, the for-loop structure notwithstanding. However, it defeated my attempts to vectorize it and eliminate the for-loop structure. Do you have any ideas for doing that which would allow it to remain an order (n*log(m)) algorithm?

Roger Stafford

Subject: Generation of random number from Define probability

From: Matt Fig

Date: 19 Feb, 2009 06:52:02

Message: 8 of 14

"Roger Stafford" <ellieandrogerxyzzy@mindspring.com.invalid> wrote in message

Please forgive me if this is totally wrong, (I admit I hated probability and statistics back in college) but why wouldn't something simple like this work:

p = [.679 .179 .129 .013]'; p = p/sum(p); % They must have a sum of one
n = 20000000;
[ps,idx] = sort(p);
ps = cumsum(ps);
x = rand(n,1);
E = x;
E(x<=ps(1)) = idx(1);
for ii = 2:length(p)
   E(x>=ps(ii-1) & x<=ps(ii) ) = idx(ii);
end

It seems pretty fast for large n, and I don't see the reason why this wouldn't work.




h]LJJRJhPKhXoY^QK]JoVJQXbhNNLJhWY]6_.hXXNhUT)QvJB#Rh^UNXVWX

Subject: Generation of random number from Define probability

From: Bruno Luong

Date: 19 Feb, 2009 07:53:02

Message: 9 of 14

"Roger Stafford" <ellieandrogerxyzzy@mindspring.com.invalid> wrote in message <gnin8c$35u$1@fred.mathworks.com>...
> "Jos " <#10584@fileexchange.com> wrote in message <gnhrhd$ll7$1@fred.mathworks.com>...
> > A vectorized solution is offered by RANDP:
> > ......
>
> Hello Jos,
>
> I wrote the for-loop procedure above to provide an order (n*log(m)) algorithm for large values of m. It is a simple binary search for the appropriate value among m values, and for a sufficiently large m it is bound to go faster than any order (n*m) algorithm, the for-loop structure notwithstanding. However, it defeated my attempts to vectorize it and eliminate the for-loop structure. Do you have any ideas for doing that which would allow it to remain an order (n*log(m)) algorithm?
>

Hi Roger,

What about using sort of concatenated bins and data. This should give you
(m+n)*log(m+n) complexity. Much better than (m*n); but not quite (n*log(m)).

I also bump into this problem previously, and so far leave my dichotomy search loop un-vectorized.

Bruno

Subject: Generation of random number from Define probability

From: Bruno Luong

Date: 19 Feb, 2009 08:09:01

Message: 10 of 14

"Bruno Luong" <b.luong@fogale.findmycountry> wrote in message <gnj34u$rl2$1@fred.mathworks.com>...

>
> Hi Roger,
>
> What about using sort of concatenated bins and data. This should give you
> (m+n)*log(m+n) complexity. Much better than (m*n); but not quite (n*log(m)).
>
> I also bump into this problem previously, and so far leave my dichotomy search loop un-vectorized.
>
> Bruno

There is also the INTERP1(..., 'nearest') which in principle can achieve in (n*log(m)), but my experience is this function is poorly implemented for speed.

Bruno

Subject: Generation of random number from Define probability

From: Jos

Date: 19 Feb, 2009 08:24:01

Message: 11 of 14

"Roger Stafford" <ellieandrogerxyzzy@mindspring.com.invalid> wrote in message <gnin8c$35u$1@fred.mathworks.com>...
> "Jos " <#10584@fileexchange.com> wrote in message <gnhrhd$ll7$1@fred.mathworks.com>...
> > A vectorized solution is offered by RANDP:
> > ......
>
> Hello Jos,
>
> I wrote the for-loop procedure above to provide an order (n*log(m)) algorithm for large values of m. It is a simple binary search for the appropriate value among m values, and for a sufficiently large m it is bound to go faster than any order (n*m) algorithm, the for-loop structure notwithstanding. However, it defeated my attempts to vectorize it and eliminate the for-loop structure. Do you have any ideas for doing that which would allow it to remain an order (n*log(m)) algorithm?
>
> Roger Stafford

Hello Roger (and others!),

I know too little about algorithm complexities, so I am wondering about the use of HISTC in this context:

p = [.679 .179 .129 .013]'; p = p/sum(p);
N = 10000 ;
[dum,RandomIntegers] = histc(rand(N,1),[0 ; cumsum(p)]) ;

Best,
Jos

Subject: Generation of random number from Define probability

From: us

Date: 19 Feb, 2009 08:33:01

Message: 12 of 14

"Salman Siddiqui"
> If I have 4 things which have following probability
> Probability of 1 = 0.679,
> Probability of 2 = 0.179,
> Probability of 3 = 0.129,
> Probability of 4 = 0.013
> Can I generate any scheme by which I can get the same result...

one of the many solutions is outlined below (copy/paste)
in addition, if you own the stats tbx, look at

     help randsample;

% the snippet
% the data
     nr=100000; % <- #samples
     d=[.1,.5,.1,.3]; % <- the distribution: sum==1!
     ds=cumsum([0,d]);
% - show dist
     subplot(2,1,1);
     stairs(ds(2:end),'o-');
     set(gca,...
          'xlim',[.5,numel(d)+.5],...
          'xtick',1:numel(d),...
          'xticklabel',d);
     ylabel('cumulative frequency');
     title('distribution');
% the engine
% - get rands according to dist
     [n,ix]=histc(rand(1,nr),ds);
% - the random numbers according to your dist
     rn=d(ix);
% - prepare the hist
     n=n(1:end-1); % we never get rands == 1
     ns=cumsum(n);
     nsm=max(ns);
     n=n./nsm;
     ns=ns./nsm;
% the result
     subplot(2,1,2);
     bar(n);
     hold on;
     line(1:numel(n),ns,...
         'marker','s',...
         'markerfacecolor',[0,0,0],...
         'color',[0,0,0]);
     set(gca,'xticklabel',d);
     xlabel('probability distribution');
     ylabel('frequency / cumulative frequency');
     title(sprintf('%d samples',nr));

us

Subject: Generation of random number from Define probability

From: Roger Stafford

Date: 19 Feb, 2009 08:52:02

Message: 13 of 14

"Matt Fig" <spamanon@yahoo.com> wrote in message <gnivii$de5$1@fred.mathworks.com>...
> "Roger Stafford" <ellieandrogerxyzzy@mindspring.com.invalid> wrote in message
>
> Please forgive me if this is totally wrong, (I admit I hated probability and statistics back in college) but why wouldn't something simple like this work:
>
> p = [.679 .179 .129 .013]'; p = p/sum(p); % They must have a sum of one
> n = 20000000;
> [ps,idx] = sort(p);
> ps = cumsum(ps);
> x = rand(n,1);
> E = x;
> E(x<=ps(1)) = idx(1);
> for ii = 2:length(p)
> E(x>=ps(ii-1) & x<=ps(ii) ) = idx(ii);
> end
>
> It seems pretty fast for large n, and I don't see the reason why this wouldn't work.

  It looks as though it would work all right, Matt, but I don't see the point in doing the sort. Since all the elements of p must be non-negative, after doing the cumsum you would necessarily have an ascending (or at least a non-descending) sequence in 'ps' anyway, and the inequality checks you have would still work. Also it wouldn't then be necessary to use 'idx' indexing; you could insert the ii's directly into E, as is.

  This algorithm as well as Jos' 'randp' are order (n*m) where m is the number of elements in p and n the length of x. For small m like the 4 in this example, one of these would certainly be the best way to go. However, as I said earlier, when m is large, which is what I had in mind with the for-loop stuff, the order n*log(m) binary search should eventually win out even if carried out iteratively in a for-loop. Try timing both of them with m = 2^16 along with a large n as well, and see if the binary search method wouldn't come in first. The binary search for-loop would be taking a lazy 16 trips against a frantic 65,536 for-loop trips for the serial search through ps. In both cases vectors of length n are being processed each time, though of course in the binary search they are admittedly more complicated.

  When trying to simulate accurately a probability density function f(x) with high resolution, one could conceivably use just such a binary search technique with very high values of m to accomplish the task rather than things like attempting to compute analytically the inverse of the desired cumulative distribution function along with the use of 'rand'. In a sense one could consider this binary search method a kind of fast table look-up technique as opposed to computing values of the inverse cdf.

Roger Stafford

Subject: Generation of random number from Define probability

From: Bruno Luong

Date: 19 Feb, 2009 22:25:03

Message: 14 of 14

Hi Roger and all,

I just submitted this in FEX. The function does not seem to overrun histc, but beats interp1 by 5 times in my laptop. Complexity is known since we can see what under the hood.

function idx=find_idx(xi, xgrid, options, InputCheck)
% function idx=find_idx(xi, xgrid)
% function idx=find_idx(xi, xgrid, 'extrap')
%
% Purpose: bins of array 'xi' in the container 'xgrid'
%
% INPUT
% xgrid: is a sorted array (ascending or descending)
% xi: array of points
% CALL idx = find_idx(..., InputCheck)
% InputCheck [false]: optional boolean flag, force check whereas
% xgrid is sorted
% OUTPUT
% idx: fractional indices of xi in xgrid, same size as xi
% - The first element starts from 1
% - The fractional part is calculated as linear interpolation
% between two brakets
%
% Note 1: This function is equivalent to
% idx = interp1(xgrid,(1:length(xgrid)),xi,'linear',options)
% except that overflow values are clipped (interp1 returns NaN).
% Speed improvement is about 5 times
% Note 2: round(idx) is equivalent to interp1(..., 'nearest')
% Note 3: floor(idx) is equivalent to [trash idx] = histc(xi, xgrid)
% except overflow values
%
% Call mex function findidxmex.c,
% recompile it for other platform than Win32 (mex -O -v findidxmex.c)
%
% Example: bining 1 billions data points into 1000 intervals
% xgrid=cumsum(3*rand(1,1000));
% xi=rand(1,1e6)*2000;
% idx=find_idx(xi, xgrid,'extrap');
%
% Author: Bruno Luong
% Original: 19/Feb/2009

if nargin<4 || isempty(InputCheck)
    InputCheck=false; % Not checking by default
end

if InputCheck
    if ~isvector(xgrid) || (~issorted(xgrid) && ~issorted(-xgrid))
        error('find_idx: xgrid must be sorted vector');
    elseif any(diff(xgrid)==0)
        error('find_idx: xgrid elements must be distinct');
    end
end

nx=length(xgrid);

descending = (nx>0) && xgrid(nx)<xgrid(1);
if descending % decending
    xgrid = xgrid(end:-1:1); % flip
end

% Cast if necessary
if ~strcmp(class(xgrid),'double')
    xgrid = double(xgrid);
end
if ~strcmp(class(xi),'double')
    xi = double(xi);
end

% Call the mex engine (dichotomy search)
idx=findidxmex(xgrid(:),xi(:));
if isempty(idx) && ~isempty(xi) % problem
    error('find_idx: out of memory');
end

if descending % decending
    idx = nx+1-idx;
end

if nargin<3 || ~strcmpi(options,'extrap')
    idx=max(min(idx,nx),1); % clipping
end

% Reshape the output
idx=reshape(idx,size(xi));

% End of find_idx

//////////////////////////////////////////////////////////////////////////
// mex function findidxmex.c
// called by find_idx.m
// Dichotomy search of indices
// Calling:
// idx=findidxmex(xgrid, xi);
// where xgrid and xi are double column vectors
// xgrid must be sorted in the ascending order
// Compile:
// mex -O -v findidxmex.c
// Author: Bruno Luong
// Original: 19/Feb/2009
//////////////////////////////////////////////////////////////////////////

#include "mex.h"
#include "matrix.h"

// Gateway routine

void mexFunction(int nlhs, mxArray *plhs[],int nrhs, const mxArray *prhs[])
{
    const mxArray *xi, *xgrid;
    mxArray *idx;
    size_t nx, m, k, i1, i9, imid;
    double *xiptr, *xgridptr, *idxptr;
    double xik;
    mwSize dims[2];
    
    // Get inputs and dimensions
    xgrid = prhs[0];
    nx = mxGetM(xgrid);
    xi = prhs[1];
    m = mxGetM(xi);
    
    // Create output idx
    dims[0] = m; dims[1] = 1;
    plhs[0] = idx = mxCreateNumericArray(2, dims, mxDOUBLE_CLASS, mxREAL);
    if (idx==NULL) // Cannot allocate memory
    {
        // Return empty array
        dims[0] = 0; dims[1] = 0;
        plhs[0] = mxCreateNumericArray(2, dims, mxDOUBLE_CLASS, mxREAL);
        return;
    }
    idxptr = mxGetPr(idx);
    
    // Get pointers
    xiptr = mxGetPr(xi);
    xgridptr = mxGetPr(xgrid);
  
    // Loop over the points
    for (k=m; k--;) // Reverse of for (k=0; k<m; k++) {...}
    {
        // Get data value
        xik = xiptr[k];
        
        i1=0;
        i9=nx-1;
        while (i9>i1+1) // Dichotomy search
        {
            imid = (i1+i9+1)/2;
            if (xgridptr[imid]<xik)
                i1=imid;
            else
                i9=imid;
        } // of while loop
        if (i1==i9)
            idxptr[k] = (double)(i1+1);
        else
            idxptr[k] = (double)(i1+1) + (xik-xgridptr[i1])/(xgridptr[i9]-xgridptr[i1]);
    } // for loop
    
    return;
        
}

Tags for this Thread

What are tags?

A tag is like a keyword or category label associated with each thread. Tags make it easier for you to find threads of interest.

Anyone can tag a thread. Tags are public and visible to everyone.

Contact us