Discover MakerZone

MATLAB and Simulink resources for Arduino, LEGO, and Raspberry Pi

Learn more

Discover what MATLAB® can do for your career.

Opportunities for recent engineering grads.

Apply Today

Thread Subject:
Compute a bivariate probability mass function on a predefined grid without loops

Subject: Compute a bivariate probability mass function on a predefined grid without loops

From: Massimo

Date: 20 Apr, 2012 16:55:12

Message: 1 of 11

Hi,
I have a problem very similar to the one I posted some time ago here:

http://www.mathworks.it/matlabcentral/newsreader/view_thread/311682#849815

the main difference is that now I am dealing with a 2D grid and therefore the density needs to be distributed in 4 points to obtain a discretized pmf (you can think of it as a 2D histogram).

More precisely I have a predefined equally-spaced 2D grid of values over two vectors k,m , say:
k=linspace(k_min,k_max,N);
m=linspace(m_min,m_max,N);
[K, M]=ndgrid(k,m);

I also have 2 grids of values Sk and Sm of the same size (N*N) which are functions of k and m and whose values are suche that:
max(Sk(:))<k_max;
max(Sm(:))<m_max;
min(Sk(:))>k_min;
min(Sm(:))>m_min;

What I want to obtain is a joint probability mass function f(k,m) such that the joint density associated to each point (Sk(i,j),Sm(i,j)) in the domain spanned by k and m, is distributed to the grid points of the rectangle that contains (Sk(i,j),Sm(i,j)) according to its relative distance from such values.
In other words, suppose (Sk(i,j),Sm(i,j)) is contained in the rectangle: k(l),k(l+1),m(r),m(r+1),
then
define first:
a=Sk(i,j)-k(l);
b=Sm(i,j)-m(r);
c=k(l+1)-Sk(i,j);
d=m(r+1)-Sm(i,j);
and:
A=sqrt(a^2+b^2);
B=sqrt(c^2+b^2);
C=sqrt(c^2+d^2);
D=sqrt(a^2+d^2);
hence:

f(k(l),m(r))=A/(A+B+C+D);
f(k(l+1),m(r))=B/(A+B+C+D);
f(k(l+1),m(r+1))=C/(A+B+C+D);
f(k(l),m(r+1))=D/(A+B+C+D);

I am clearly able to compute f(k,m) using for loops, but since N is pretty large (between 5000 and 10000) and since this computation is inside an fsolve I am looking for a smart way to vectorize it.
I hope to have described the problem accurately, if you have questions please don't hesitate to ask them.
Thank you,
Massimo

Subject: Compute a bivariate probability mass function on a predefined grid without loops

From: Roger Stafford

Date: 21 Apr, 2012 03:48:17

Message: 2 of 11

"Massimo" <giovanma@bc.edu> wrote in message <jms4dg$7db$1@newscl01ah.mathworks.com>...
> I have a problem very similar to the one I posted some time ago here:
>
> http://www.mathworks.it/matlabcentral/newsreader/view_thread/311682#849815
> ..........
> A=sqrt(a^2+b^2);
> B=sqrt(c^2+b^2);
> C=sqrt(c^2+d^2);
> D=sqrt(a^2+d^2);
> hence:
>
> f(k(l),m(r))=A/(A+B+C+D);
> f(k(l+1),m(r))=B/(A+B+C+D);
> f(k(l+1),m(r+1))=C/(A+B+C+D);
> f(k(l),m(r+1))=D/(A+B+C+D);
> .........
- - - - - - - - -
  Your method of weighting here seems inconsistent with the intention established in your thread 311682 of Aug. 17, 2011. In that thread when s was equal to a(j) then a(j) received a weight of one and adjacent points weights of zero. In this thread when Sk = k(l) and Sm = m(r), then the point (k(l),m(r)) receives a weight of zero with the three other rectangle points receiving all the weight. As you pass over a point the particular three rectangular points receiving all the weights takes a discontinuous jump to a different rectangle. Why is this? Your weighting in accordance with relative Euclidean distances seems very strange.

  I would have thought you would make the following definitions:

 a = Sk(i,j)-k(l);
 b = Sm(i,j)-m(r);
 c = k(l+1)-Sk(i,j);
 d = m(r+1)-Sm(i,j);

 f(k(l),m(r)) = (c*d)/((a+c)*(b+d));
 f(k(l+1),m(r)) = (a*d)/((a+c)*(b+d));
 f(k(l+1),m(r+1)) = (a*b)/((a+c)*(b+d));
 f(k(l),m(r+1)) = (b*c)/((a+c)*(b+d));

This way when {Sk,Sm) coincides with a point of (k,m), that point would receive a weight of one and the surrounding grid points would all have weights of zero from (Sk,Sm). As you vary (Sk,Sm) the weighting would vary without discontinuous jumps.

Roger Stafford

Subject: Compute a bivariate probability mass function on a predefined grid without loops

From: Massimo

Date: 21 Apr, 2012 11:05:19

Message: 3 of 11

"Roger Stafford" wrote in message <jmtam1$e0q$1@newscl01ah.mathworks.com>...
> "Massimo" <giovanma@bc.edu> wrote in message <jms4dg$7db$1@newscl01ah.mathworks.com>...

> Your method of weighting here seems inconsistent with the intention established in your thread 311682 of Aug. 17, 2011. In that thread when s was equal to a(j) then a(j) received a weight of one and adjacent points weights of zero. In this thread when Sk = k(l) and Sm = m(r), then the point (k(l),m(r)) receives a weight of zero with the three other rectangle points receiving all the weight. As you pass over a point the particular three rectangular points receiving all the weights takes a discontinuous jump to a different rectangle. Why is this? Your weighting in accordance with relative Euclidean distances seems very strange.
>
> I would have thought you would make the following definitions:
>
> a = Sk(i,j)-k(l);
> b = Sm(i,j)-m(r);
> c = k(l+1)-Sk(i,j);
> d = m(r+1)-Sm(i,j);
>
> f(k(l),m(r)) = (c*d)/((a+c)*(b+d));
> f(k(l+1),m(r)) = (a*d)/((a+c)*(b+d));
> f(k(l+1),m(r+1)) = (a*b)/((a+c)*(b+d));
> f(k(l),m(r+1)) = (b*c)/((a+c)*(b+d));
>
> This way when {Sk,Sm) coincides with a point of (k,m), that point would receive a weight of one and the surrounding grid points would all have weights of zero from (Sk,Sm). As you vary (Sk,Sm) the weighting would vary without discontinuous jumps.
>
> Roger Stafford

Roger,
you are absolutely right!! Thanks for spotting my dumb mistake!!
I don't know honestly how I came up with the idea of redistributing weights according to the euclidean distance...maybe I was just very tired:)
What I intended to obtain is exactly what you suggested.
Now the point is to figure out how to do it on Matlab in a smart way..

Subject: Compute a bivariate probability mass function on a predefined grid without loops

From: Bruno Luong

Date: 21 Apr, 2012 12:23:07

Message: 4 of 11

p = interp1(k,1:length(k),Sk);
q = interp1(m,1:length(m),Sm);
l = min(floor(p),N-1);
r = min(floor(q),N-1);
u = p-l;
v = q-r;

% Not sure how you can put those values in a single function f as the thread suggests

f00 = (1-u).*(1-v) % f(k(l),m(r))
f10 = u.*(1-v) % f(k(l+1),m(r))
f01 = u.*v % f(k(l+1),m(r+1))
f11 = (1-u).*v % f(k(l),m(r+1))

% Bruno

Subject: Compute a bivariate probability mass function on a predefined grid without loops

From: Roger Stafford

Date: 21 Apr, 2012 14:51:24

Message: 5 of 11

"Bruno Luong" <b.luong@fogale.findmycountry> wrote in message <jmu8rb$6fh$1@newscl01ah.mathworks.com>...
> p = interp1(k,1:length(k),Sk);
> q = interp1(m,1:length(m),Sm);
- - - - - - - - - -
  Hi Bruno. Why is it necessary to call on 'interp1'? Can't we just say:

 p = (Sk-k(1))*((N-1)/(k(N)-k(1)))+1;
 q = (Sm-m(1))*((N-1)/(m(N)-m(1)))+1;

which takes advantage of the assumption of uniform spacing in k and m.

Roger Stafford

Subject: Compute a bivariate probability mass function on a predefined grid without loops

From: Bruno Luong

Date: 21 Apr, 2012 15:33:11

Message: 6 of 11

Sure Roger. INTERP1 has no advantage whatsoever but clarity of the coding.

Bruno

Subject: Compute a bivariate probability mass function on a predefined grid without loops

From: Roger Stafford

Date: 21 Apr, 2012 16:08:13

Message: 7 of 11

"Bruno Luong" <b.luong@fogale.findmycountry> wrote in message <jmujvn$i1b$1@newscl01ah.mathworks.com>...
> Sure Roger. INTERP1 has no advantage whatsoever but clarity of the coding.
- - - - - - - - - - -
  In my mind the question is whether there might be an actual disadvantage in calling on 'interp1' with such large numbers as Massimo's N = 10000. Or in other words, does 'interp1' check for uniform spacing in k and m before proceeding with its interpolation? If not, then I would think it would be forced into an order(N*log(N)) complexity algorithm, as compared with only the necessary order(N).

Roger Stafford

Subject: Compute a bivariate probability mass function on a predefined grid without loops

From: Bruno Luong

Date: 22 Apr, 2012 08:47:07

Message: 8 of 11

"Roger Stafford" wrote in message <jmum1d$pfn$1@newscl01ah.mathworks.com>...
> - - - - - - - - - - -
> In my mind the question is whether there might be an actual disadvantage in calling on 'interp1' with such large numbers as Massimo's N = 10000. Or in other words, does 'interp1' check for uniform spacing in k and m before proceeding with its interpolation? If not, then I would think it would be forced into an order(N*log(N)) complexity algorithm, as compared with only the necessary order(N).

Roger,

I test INTERP1 with various size and time it, and it looks like it does not check for uniform grid, meaning the run time seems to depends on the number of (known) grid points.

Now here is an interesting: The doc of the function INTERP2 mention that users can signal the input grid is uniform (by adding the '*' in front of the method string) for "faster interpolation". However my timing still show the run time still depends on the input grids.

I'm not sure what is the reason that the run time does not become independent with the '*' methods (I expect otherwise).

Back to direct vs interp1

The question I wonder is : does the direct calculation

p = (Sk-k(1))*((N-1)/(k(N)-k(1)))+1

always lead to the correct index from an input vector generated by LINSPACE due to numerical accuracy.

We have discussed in the past the discrepancy between LINSPACE and COLON-generated vectors. Similarly I guess in some vary extreme cases, p could return a wrong interval (?), whereas dichotomy search will always return the "right" interval, thus more robust. Your opinion?

Bruno

Subject: Compute a bivariate probability mass function on a predefined grid without loops

From: Bruno Luong

Date: 22 Apr, 2012 09:17:15

Message: 9 of 11

>
> p = (Sk-k(1))*((N-1)/(k(N)-k(1)))+1
>

Addendum:

Read from the code of LINSPACE, with the inversion calculation (note the parenthesis)

p = ( ( (Sk-k(1)) * (N-1) ) ./ (k(N)-k(1)) ) + 1;

should undo accurately what LINSPACE did.

Bruno

Subject: Compute a bivariate probability mass function on a predefined grid without loops

From: Roger Stafford

Date: 22 Apr, 2012 17:12:09

Message: 10 of 11

"Bruno Luong" <b.luong@fogale.findmycountry> wrote in message <jn0gib$4dv$1@newscl01ah.mathworks.com>...
> .......
> The question I wonder is : does the direct calculation
>
> p = (Sk-k(1))*((N-1)/(k(N)-k(1)))+1
>
> always lead to the correct index from an input vector generated by LINSPACE due to numerical accuracy.
>
> We have discussed in the past the discrepancy between LINSPACE and COLON-generated vectors. Similarly I guess in some vary extreme cases, p could return a wrong interval (?), whereas dichotomy search will always return the "right" interval, thus more robust. Your opinion?
> .......
"Bruno Luong" <b.luong@fogale.findmycountry> wrote in message <jn0iar$ar0$1@newscl01ah.mathworks.com>...
> ......
> p = ( ( (Sk-k(1)) * (N-1) ) ./ (k(N)-k(1)) ) + 1;
>
> should undo accurately what LINSPACE did.
- - - - - - - - - - -
  Hi again, Bruno. My reason for suggesting the formulas

 p = (Sk-k(1))*((N-1)/(k(N)-k(1)))+1;
 q = (Sm-m(1))*((N-1)/(m(N)-m(1)))+1;

was based, as I have said, on the supposition that 'interp1' would have to use an order(N*log(N)) algorithm to accomplish the same task unless it first tested for uniformity of distribution of k and m, and would therefore take more time for large N.

  As to its accuracy, that is quite another matter. Certainly there is the possibility that Sk or Sm values that happen to be exceedingly close to k or m values will end up in the "wrong" rectangle. What this really amounts to is a difference of opinion, so to speak, between these formulas and where 'linspace' in generating k and m would place the rectangle boundaries. Both are subject to rounding errors.

  With the weighting functions that you and I have recommended in this thread, there is continuity as you cross these rectangular boundaries, so the consequences of being in the wrong rectangle are very small, of the same magnitude as rounding errors. That is, as you cross a boundary, two of the weighting factors are simultaneously dropping down to zero while two others on an adjacent rectangle on the opposite side are starting up from zero. In other words, I am suggesting that perhaps robustness in this respect isn't really necessary in this weighting problem.

  As to your addendum, yes, if you know exactly how 'linspace' does its computation, then using precisely the same formula with Sk and Sm with identical parentheses would (presumably) ensure that the correct rectangle is always selected. It would follow from the IEEE 754 rounding specifications (assuming no weird actions by the compiler.) That assumes of course that the user actually used 'linspace' to generate k and m rather than, say, 'colon'.

  I chose the other parentheses arrangement to ensure that the compiler only computes the factor (N-1)/(k(N)-k(1)) once, but a sufficiently smart compiler really ought to realize that anyway.

Roger Stafford

Subject: Compute a bivariate probability mass function on a predefined grid without loops

From: Roger Stafford

Date: 22 Apr, 2012 18:43:05

Message: 11 of 11

"Bruno Luong" <b.luong@fogale.findmycountry> wrote in message <jn0gib$4dv$1@newscl01ah.mathworks.com>...
> ........
> We have discussed in the past the discrepancy between LINSPACE and COLON-generated vectors. Similarly I guess in some vary extreme cases, p could return a wrong interval (?), whereas dichotomy search will always return the "right" interval, thus more robust. Your opinion?
- - - - - - - - -
  An afterthought, Bruno. In problems where it is essential to fall within the correct interval or rectangle, it would require only a single pass to reliably correct each formula after "flooring", rather than having to go the order(N*log(N)) route. This could take the place of your 'min' operation.

Roger Stafford

Tags for this Thread

What are tags?

A tag is like a keyword or category label associated with each thread. Tags make it easier for you to find threads of interest.

Anyone can tag a thread. Tags are public and visible to everyone.

Contact us