Thread Subject: Probability matching

Subject: Probability matching

From: Shonkhor

Date: 3 May, 2008 13:55:04

Message: 1 of 19

Can anyone help with this one?

I want to use a probability matching rule in an ideal observer model. From a non-uniform discrete probability distrubution (in this case p of angles between 1 and 360 degreed in 1 degree integers) I want to select a response angle with a probability equal to its probability in this distribution.

I hope this makes sense but any questions just ask. Does anyone have any idea how to implement this in Matlab? It has me stumped.

Cheers,

Will

Subject: Probability matching

From: ImageAnalyst

Date: 3 May, 2008 16:21:28

Message: 2 of 19

On May 3, 9:55=A0am, Shonkhor <will.al...@hotmail.com> wrote:
> Can anyone help with this one?
>
> I want to use a probability matching rule in an ideal observer model. From=
 a non-uniform discrete probability distrubution (in this case p of angles b=
etween 1 and 360 degreed in 1 degree integers) I want to select a response a=
ngle with a probability equal to its probability in this distribution. =A0
>
> I hope this makes sense but any questions just ask. Does anyone have any i=
dea how to implement this in Matlab? It has me stumped. =A0
>
> Cheers,
>
> Will

Um, not really. At least not to me. What is a "response angle"? Can
it have non-integer values? Can you just use ceil(rand()*360) to
generate random numbers between 1 and 360? I think with this, every
integer between 1 and 360 will be generated with equal probability.
Is that what you're after? If not, please explain in more detail.

Subject: Probability matching

From: alessandro mura

Date: 3 May, 2008 17:50:39

Message: 3 of 19

Hi,
generating a random variable according to a given
PDF is not trivial. Usually, I either:
1) use the Von Neuman algorithm:
x=rand
while rand max(PDF)> PDF(x)
x=rand
end

that is, chose a random point in the PDF vs x
plan: x,y=(rand, rand), and if the point is below the PDF curve, I take the
x, otherwise I chose again...


2) integrate the PDF. The IPDF always goes from 0 to 1.
Then I calculate the inverse function inIPDF
y=rand
then x=inIPDF(y).

regards

--
Alessandro Mura
Istituto Nazionale di Astrofisica - IFSI
http://pptt4.ifsi-roma.inaf.it/~mura/index.html
http://www.alessandromura.it


Subject: Probability matching

From: Roger Stafford

Date: 3 May, 2008 19:38:02

Message: 4 of 19

Shonkhor <will.allen@hotmail.com> wrote in message
<3675663.1209822934493.JavaMail.jakarta@nitrogen.mathforum.org>...
> Can anyone help with this one?
>
> I want to use a probability matching rule in an ideal observer model. From a
non-uniform discrete probability distrubution (in this case p of angles
between 1 and 360 degreed in 1 degree integers) I want to select a response
angle with a probability equal to its probability in this distribution.
>
> I hope this makes sense but any questions just ask. Does anyone have any
idea how to implement this in Matlab? It has me stumped.
>
> Cheers,
>
> Will
-------------
  If I understand you correctly, you have a set of 360 probability numbers
which are known, and you wish to set up a random selection of angles in one
degree intervals in accordance with these probabilities.

  Here is one way. Let the probabilities be given by a 360-element vector
called p. We assume that sum(p) is equal to 1. (If the p values are only
proportional to the probabilities, do this p = p/sum(p).) Set n to the desired
number of random samples.

 x = rand(n,1); % Choose n random numbers in (0,1)
 N = 360;
 a = ceil(N/2)*ones(size(x)); % Initially set all angles to 180
 b = zeros(size(x)); % Lower limit minus one
 c = N*ones(size(x)); % Upper limit
 q = cumsum(p); % Generate cumulative probabilities
 for k = 1:ceil(log2(N)) % To choose from 512 takes 9 tests
  t1 = +(x<=q(a)); % The test
  t2 = 1-t1; % The negation of the test
  c = c.*t2+a.*t1; % If x<=q(a), bring upper limit down to a
  b = b.*t1+a.*t2; % If x>q(a), bring lower limit up to a
  a = ceil((b+c)/2); % Next test half way between limits
 end

This produces an n x 1 vector, a, containing the corresponding n angles from
1 to 360 selected in accordance with the given probabilities. The method
used here is sometimes referred to as a "binary search". (Note: The "+" sign
in the t1 generation converts a logical array to a numeric one, but this may
not be necessary here.)

  Note: The values in a are all integers from 1 to 360. I can't be sure from
your description that you actually want integer-valued angles. If not, please
state precisely what it is you want.

Roger Stafford

Subject: Probability matching

From: Roger Stafford

Date: 3 May, 2008 19:56:03

Message: 5 of 19

"Roger Stafford" <ellieandrogerxyzzy@mindspring.com.invalid> wrote in
message <fvieuq$1qb$1@fred.mathworks.com>...
> ......
> for k = 1:ceil(log2(N)) % To choose from 512 takes 9 tests
> ......
------
  In the line

 for k = 1:ceil(log2(N))

I meant for the comment to say "% To choose from 360 takes 9 tests".

Roger Stafford


Subject: Probability matching

From: alessandro mura

Date: 3 May, 2008 22:33:26

Message: 6 of 19

Here is an example of the two possible methods i've mentioned.

Regards

% generation of N casual variables with a given probability distribution



N=5d4;


% x: casual variable in an interval A-B


A=0;

B=360;


dx=1;


X=[A:dx:B];


% Some Probability distribution function


p=randn(5,1);


pdf=(polyval(p,X/B)).^2;


% must be normalized to 1

pdf=pdf/sum(dx*pdf);

mpdf=max(pdf);



tic

% 1) Von Neumann method (the code can be optimized a lot!)

for i=1:N;

j=ceil(rand*(B-A)/dx);

x(i)=j*dx+A;

while pdf(j) < rand*mpdf

j=ceil(rand*(B-A)/dx);

x(i)=j*dx+A;

end

end

[Y]=hist(x,X);

subplot(2,1,1)

plot(X,Y/N,'R')

hold on

plot(X,pdf)

toc



tic

% 2) with integrated PDF


ipdf=[cumsum(pdf*dx)];


y=rand(1,N);


for i=1:N

j=find(y(i)<ipdf);

x(i)=X(j(1));

end


[Y]=hist(x,X);

subplot(2,1,2)

plot(X,Y/N,'R')

hold on

plot(X,pdf)

toc


--
Alessandro Mura
Istituto Nazionale di Astrofisica - IFSI
http://pptt4.ifsi-roma.inaf.it/~mura/index.html
http://www.alessandromura.it
"alessandro mura" <fake@address.com> wrote in message
news:fvi8lc$v0f$1@aioe.org...


> Hi,
> generating a random variable according to a given
> PDF is not trivial. Usually, I either:
> 1) use the Von Neuman algorithm:
> x=rand
> while rand max(PDF)> PDF(x)
> x=rand
> end
>
> that is, chose a random point in the PDF vs x
> plan: x,y=(rand, rand), and if the point is below the PDF curve, I take
> the x, otherwise I chose again...
>
>
> 2) integrate the PDF. The IPDF always goes from 0 to 1.
> Then I calculate the inverse function inIPDF
> y=rand
> then x=inIPDF(y).
>
> regards
>
> --
> Alessandro Mura
> Istituto Nazionale di Astrofisica - IFSI
> http://pptt4.ifsi-roma.inaf.it/~mura/index.html
> http://www.alessandromura.it
>


Subject: Probability matching

From: Simon Preston

Date: 3 May, 2008 23:56:03

Message: 7 of 19

Shonkhor <will.allen@hotmail.com> wrote in message
<3675663.1209822934493.JavaMail.jakarta@nitrogen.mathforum.org>...
> Can anyone help with this one?
>
> I want to use a probability matching rule in an ideal
observer model. From a non-uniform discrete probability
distrubution (in this case p of angles between 1 and 360
degreed in 1 degree integers) I want to select a response
angle with a probability equal to its probability in this
distribution.
>
> I hope this makes sense but any questions just ask. Does
anyone have any idea how to implement this in Matlab? It has
me stumped.
>
> Cheers,
>
> Will

For given probability vector p, a one line (though not
necessarily fast) solution is:
find(histc(rand,[0 cumsum(p)]))

Better is to draw a sample of n, in which case the number of
each type in this sample is:
x = histc(rand(1,n),[0 cumsum(p)])); x = x(1:end-1)

Best wishes, S


Subject: Probability matching

From: Roger Stafford

Date: 4 May, 2008 00:14:04

Message: 8 of 19

"alessandro mura" <fake@address.com> wrote in message <fvip7h$nus
$1@aioe.org>...
> Here is an example of the two possible methods i've mentioned.
> Regards
>
> % generation of N casual variables with a given probability distribution
> N=5d4;
>
> % x: casual variable in an interval A-B
> A=0;
> B=360;
> dx=1;
> X=[A:dx:B];
>
> % Some Probability distribution function
> p=randn(5,1);
> pdf=(polyval(p,X/B)).^2;
> % must be normalized to 1
> pdf=pdf/sum(dx*pdf);
> mpdf=max(pdf);
>
> tic
> % 1) Von Neumann method (the code can be optimized a lot!)
> for i=1:N;
> j=ceil(rand*(B-A)/dx);
> x(i)=j*dx+A;
> while pdf(j) < rand*mpdf
> j=ceil(rand*(B-A)/dx);
> x(i)=j*dx+A;
> end
> end
> ......
> toc
>
> tic
> % 2) with integrated PDF
> ipdf=[cumsum(pdf*dx)];
> y=rand(1,N);
> for i=1:N
> j=find(y(i)<ipdf);
> x(i)=X(j(1));
> end
> ......
> toc
> --
> Alessandro Mura
----------
Hello Alessandro Mura,

  In your second method involving ipdf=cumsum(pdf), the 'find' operation in
the line "j=find(y(i)<ipdf)" requires a full sweep through all of 'ipdf', making
this an order (N*M) algorithm, where M = 360. To make a fair comparison
with the first, Von Neumann rejection method, there ought to be a binary
search done for j rather than the full sweep. This would make an order
(N*log(M)) algorithm out of it.

Roger Stafford


Subject: Probability matching

From: Shonkhor

Date: 4 May, 2008 09:50:35

Message: 9 of 19

Thanks very much to all who responded. You understood my problem perfectly. All solutions seemed to be effective although I have not tested one against another rigorously. Creating each solution would have also been well beyond my current ability. I have used Roger Stafford's - can I credit your effort in a report I have for submission?
This has been a big help to me. Kind regards,

Will

Subject: Probability matching

From: Roger Stafford

Date: 4 May, 2008 18:49:03

Message: 10 of 19

Shonkhor <will.allen@hotmail.com> wrote in message
<27978954.1209894665341.JavaMail.jakarta@nitrogen.mathforum.org>...
> Thanks very much to all who responded.

  You are quite welcome.

> I have used Roger Stafford's - can I credit your effort in a report I have for
submission?

  Yes, you may do that.

Roger Stafford

Subject: Probability matching

From: alessandro mura

Date: 4 May, 2008 21:14:05

Message: 11 of 19

Dear Roger,
thank you. Yes, both methods require massive
optimizations. For example, I usually need to
generate N (millions or more) of test particles in my
montecarlo models, and I prefer the first method,
picking N/w particles per iteraction, having more than w iteractions
until I get n> N particles. This leads to a number of "good" values
slighlty higher than N,
but allows some vectorialization in the code.
The second method: I would go for an analytical (or polynomial best fit)
expression for ipdf to avoid "find". I was posting a code like that, then I
preferred the "find" option
for better explain the phylosophy of the method.

Regards

Ale
--
Alessandro Mura
Istituto Nazionale di Astrofisica - IFSI
http://pptt4.ifsi-roma.inaf.it/~mura/index.html
http://www.alessandromura.it


Subject: Probability matching

From: Roger Stafford

Date: 4 May, 2008 22:32:03

Message: 12 of 19

"alessandro mura" <fake@address.com> wrote in message <fvl8um$1np
$1@aioe.org>...
> Dear Roger,
> thank you. Yes, both methods require massive
> optimizations. For example, I usually need to
> generate N (millions or more) of test particles in my
> montecarlo models, and I prefer the first method,
> picking N/w particles per iteraction, having more than w iteractions
> until I get n> N particles. This leads to a number of "good" values
> slighlty higher than N,
> but allows some vectorialization in the code.
> The second method: I would go for an analytical (or polynomial best fit)
> expression for ipdf to avoid "find". I was posting a code like that, then I
> preferred the "find" option
> for better explain the phylosophy of the method.
>
> Regards
>
> Ale
> --
> Alessandro Mura
---------
  Yes, some kind of reasonably easy to calculate, analytical fit to the inverse
cumulative probability distribution function would be fine if it can be found.
Even if it is somewhat inaccurate, it can be used in conjunction with a
subsequent search, hopefully requiring only a very few comparisons, to find
the exact value, starting from the estimate and referring to the known cdf.
However, I think you would have trouble doing polynomial fits to inverse cdf's
for distributions with unbounded ranges.

Roger Stafford


Subject: Probability matching

From: Peter Perkins

Date: 5 May, 2008 15:54:59

Message: 13 of 19

Shonkhor wrote:
> Can anyone help with this one?
>
> I want to use a probability matching rule in an ideal observer model. From a non-uniform discrete probability distrubution (in this case p of angles between 1 and 360 degreed in 1 degree integers) I want to select a response angle with a probability equal to its probability in this distribution.
>
> I hope this makes sense but any questions just ask. Does anyone have any idea how to implement this in Matlab? It has me stumped.

Will, unless I've misunderstood your description, you have

1) the values 1:360, and
2) a probability for each of those angles.

If you have access to the Statistics Toolbox, all you need is

    randsample(1:360,p,n,true)

to generate a sample of size n, drawing values from 1:360 independently with
replacement.

Subject: Probability matching

From: Roger Stafford

Date: 5 May, 2008 18:25:05

Message: 14 of 19

Peter Perkins <Peter.PerkinsRemoveThis@mathworks.com> wrote in message
<fvnakj$17t$1@fred.mathworks.com>...
> Will, unless I've misunderstood your description, you have
>
> 1) the values 1:360, and
> 2) a probability for each of those angles.
>
> If you have access to the Statistics Toolbox, all you need is
>
> randsample(1:360,p,n,true)
>
> to generate a sample of size n, drawing values from 1:360 independently
> with replacement.
-----------
  That will teach me not to go diving into a problem without checking to see if
someone else has already done it. I have a weakness in that direction!
Anyway, Shonkhor got three versions that don't require the Statistics Toolbox.

  According to the documentation for 'randsample', the call should be made
as:

 randsample(1:360,n,true,p);

Does your argument sequence really work also?

  I do have a question about 'randsample' if it doesn't violate Mathworks'
proprietary interests. The four-argument case is markedly more difficult to
implement than the others. Does it use something like the binary search I
used to effect the inverse cumulative distribution of p, Simon's 'histc' method,
or perhaps something akin to Alessandro's acceptance-rejection method
directly on the p values? In other words, if m = 360, is it an order(n*log(m))
algorithm, order ((n+m)*log(n+m)) algorithm, or something altogether
different? Users would probably like to know this, but the documentation
doesn't say.

Roger Stafford

Subject: Probability matching

From: Peter Perkins

Date: 5 May, 2008 20:30:43

Message: 15 of 19

Roger Stafford wrote:
> Peter Perkins <Peter.PerkinsRemoveThis@mathworks.com> wrote in message
>> randsample(1:360,p,n,true)
>>
>> to generate a sample of size n, drawing values from 1:360 independently
>> with replacement.
> -----------
> That will teach me not to go diving into a problem without checking to see if
> someone else has already done it. I have a weakness in that direction!
> Anyway, Shonkhor got three versions that don't require the Statistics Toolbox.
>
> According to the documentation for 'randsample', the call should be made
> as:
>
> randsample(1:360,n,true,p);
>
> Does your argument sequence really work also?

No, and that will teach _me_ to not check the help before typing example code.
The correct argument order is what you described. Thanks.

Subject: Probability matching

From: Simon Preston

Date: 6 May, 2008 10:55:05

Message: 16 of 19


> I do have a question about 'randsample' if it doesn't
violate Mathworks'
> proprietary interests. The four-argument case is markedly
more difficult to
> implement than the others. Does it use something like the
binary search I
> used to effect the inverse cumulative distribution of p,
Simon's 'histc' method,
> or perhaps something akin to Alessandro's
acceptance-rejection method
> directly on the p values? In other words, if m = 360, is
it an order(n*log(m))
> algorithm, order ((n+m)*log(n+m)) algorithm, or something
altogether
> different? Users would probably like to know this, but
the documentation
> doesn't say.
>
> Roger Stafford
>

Having looked, it uses histc. (I promise I hadn't known
about randsample before my original reply!)

In fact, the second output of histc is the length-n vector
which indexes the bin into which each observation falls,
i.e. what Shonkhor wanted. Hence
[x,y] = histc(rand(1,n),[0 cumsum(p)]);
returns as y Shonkhor's desired sample.

As for the order of this approach, maybe Roger could
explain: are the other methods (theoretically) faster? I
suppose, though, i) the order calculations are asymptotic
results, and ii) it's hard to imagine an application where
this calculation, whichever method used, would be
rate-limiting. Still it's interesting to see there are
various nice ways to solve the same problem.

Best wishes, S




Subject: Probability matching

From: Roger Stafford

Date: 6 May, 2008 20:32:03

Message: 17 of 19

"Simon Preston" <preston.simon+mathsworks@gmail.com> wrote in
message <fvpde9$rm2$1@fred.mathworks.com>...
> Having looked, it uses histc. (I promise I hadn't known
> about randsample before my original reply!)
>
> In fact, the second output of histc is the length-n vector
> which indexes the bin into which each observation falls,
> i.e. what Shonkhor wanted. Hence
> [x,y] = histc(rand(1,n),[0 cumsum(p)]);
> returns as y Shonkhor's desired sample.
>
> As for the order of this approach, maybe Roger could
> explain: are the other methods (theoretically) faster? I
> suppose, though, i) the order calculations are asymptotic
> results, and ii) it's hard to imagine an application where
> this calculation, whichever method used, would be
> rate-limiting. Still it's interesting to see there are
> various nice ways to solve the same problem.
>
> Best wishes, S
---------
Hello Simon,

  In any discussion of algorithmic order it needs to be understood that, as you
say, this has only to do with asymptotic times of execution. I initially favored
the binary search approach with its order (n*log(m)) to provide for cases of
large values of both m and n. It would be better than (n+m)*log(n+m) for
very large n and better than n*m for very large m. (I am guessing 'histc' is
order ((n+m)*log(n+m)) because they may do a initial sort of combined edges
and points. However, perhaps they have a simpler order (n*m) algorithm
instead, as my own 'hist' does.) Whether it would be faster for numbers likely
to be used is quite another question and requires experimentation. I did
have the distinct impression from Shonkhor's mention of distributions that a
very large value of n might be needed and the value m = 360 was evidently
large.

  Whatever is true here, you did certainly provide Shonkhor with a very fine
one-liner that doesn't require the Statistics Toolbox, Simon. When I saw it, I
said to myself, "now why didn't I think of that?"

Roger Stafford


Subject: Probability matching

From: alessandro mura

Date: 7 May, 2008 17:45:25

Message: 18 of 19

> However, I think you would have trouble doing polynomial fits to inverse
> cdf's
> for distributions with unbounded ranges.
>
> Roger Stafford

Exactly. Usually what I need to simulate are random
distribution of velocity or energy for a test particle, and in fact they
have (almost) never upper bound.
What I sometimes do (when it is not a maxwellian, of course), is to split
the problem in two parts:
I use a Neumann rejection for the main part of the distribution, for example
E<E0when this method has a reasonable
fraction of "good" guesses (to avoid wasting machine time), and I add a
"high energy tail" (E>E0) with a polynomial fit of the inverse CDF.

Ciao
--
Alessandro Mura
Istituto Nazionale di Astrofisica - IFSI
http://pptt4.ifsi-roma.inaf.it/~mura/index.html
http://www.alessandromura.it


Subject: Probability matching

From: Shonkhor

Date: 19 May, 2008 18:12:38

Message: 19 of 19

Hi all,

Just looked back at the thread to see what had happened since my problem was initially solved. I'm both annoyed and delighted that it can be solved in one line. Very neat, thanks. I guess I have to keep learning everything each function can achieve. Cheers,

Will

Tags for this Thread

Add a New Tag:

Separated by commas
Ex.: root locus, bode

What are tags?

A tag is like a keyword or category label associated with each thread. Tags make it easier for you to find threads of interest.

Anyone can tag a thread. Tags are public and visible to everyone.

rssFeed for this Thread

Public Submission Policy

NOTICE: Any content you submit to MATLAB Central, including personal information, is not subject to the protections which may be afforded information collected under other sections of The MathWorks, Inc. Web site. You are entirely responsible for all content that you upload, post, e-mail, transmit or otherwise make available via MATLAB Central. The MathWorks does not control the content posted by visitors to MATLAB Central and, does not guarantee the accuracy, integrity, or quality of such content. Under no circumstances will The MathWorks be liable in any way for any content not authored by The MathWorks, or any loss or damage of any kind incurred as a result of the use of any content posted, e-mailed, transmitted or otherwise made available via MATLAB Central. Read the complete Disclaimer prior to use.

Contact us at files@mathworks.com