
Thread Subject:
Multivariate mutual independence test

Subject: Multivariate mutual independence test

From: sinoTrinity Liu

Date: 20 Jun, 2011 17:40:20

Message: 1 of 6

Hi everyone,

I have a series of observations {Xi, Yi, Zi} for random variables X, Y, and Z, whose actual distributions are unknown. Now I want to test whether X, Y, and Z are mutually independent based on these observations. Can anyone help me?

Some pairwise independence tests, such as this one http://www.gatsby.ucl.ac.uk/~gretton/indepTestFiles/indep.htm, are available, but I haven't found such a test for multivariate mutual independence yet.

BTW, all the random variables are continuous, so the chi-square independence test may not work, at least not directly.

Thanks in advance.

Subject: Multivariate mutual independence test

From: Roger Stafford

Date: 20 Jun, 2011 18:36:02

Message: 2 of 6

"sinoTrinity Liu" <whu_lxh@hotmail.com> wrote in message <ito0m4$j4m$1@newscl01ah.mathworks.com>...
> I have a series of observations {Xi, Yi, Zi} for random variables X, Y, and Z, whose actual distributions are unknown. Now I want to test whether X, Y, and Z are mutually independent based on these observations. Can anyone help me?
- - - - - - - - - -
  If you don't already have advance information about the three variables' joint distribution, an accurate determination of their independence inherently requires a very large number of observations. There is no short cut to this.

  All such tests have as their essential ingredient the test of whether the probability equation

 P(a<=X<=b & c<=Y<=d & e<=Z<=f) = P(a<=X<=b) * P(c<=Y<=d) * P(e<=Z<=f)

holds for all combinations of intervals [a,b], [c,d], and [e,f]. You can approximate this using observed joint cumulative probabilities on some finite three-dimensional grid of x, y, z values, to see whether, over all grid points a, b, c,

 P(X<=a,Y<=b,Z<=c)

is approximately equal to

 P(X<=a) * P(Y<=b) * P(Z<=c)

  Such computations are easy to do with MATLAB once you have accumulated the data, but a large amount of data is unavoidable in the case of three random variables. For example, if you need a hundred points to accurately record the variations in each random variable, that amounts to some suitable multiple of one million joint observations altogether, which is a very large number.
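As an illustrative sketch of that grid comparison (not something posted in the thread; the function names, grid size, and sample sizes are arbitrary choices, and it is written in plain Python rather than MATLAB so it stands alone), one could estimate the empirical joint CDF and the empirical marginal CDFs from the observations and report the worst disagreement over the grid:

```python
import random

# Sketch: compare the empirical joint CDF with the product of the
# empirical marginal CDFs over a grid of sample quantiles, and report
# the worst discrepancy found.

def max_cdf_discrepancy(xs, ys, zs, grid_size=8):
    n = len(xs)

    def quantile_grid(vals):
        # grid_size roughly equally spaced sample quantiles of one variable
        s = sorted(vals)
        return [s[k * (n - 1) // (grid_size - 1)] for k in range(grid_size)]

    def ecdf_at(vals, pts):
        # empirical marginal CDF evaluated at each grid point
        return {p: sum(v <= p for v in vals) / n for p in pts}

    gx, gy, gz = quantile_grid(xs), quantile_grid(ys), quantile_grid(zs)
    Fx, Fy, Fz = ecdf_at(xs, gx), ecdf_at(ys, gy), ecdf_at(zs, gz)

    worst = 0.0
    for a in gx:
        for b in gy:
            for c in gz:
                # empirical joint probability P(X<=a, Y<=b, Z<=c)
                joint = sum(x <= a and y <= b and z <= c
                            for x, y, z in zip(xs, ys, zs)) / n
                worst = max(worst, abs(joint - Fx[a] * Fy[b] * Fz[c]))
    return worst

random.seed(0)
n = 1000
xs = [random.gauss(0, 1) for _ in range(n)]
ys = [random.gauss(0, 1) for _ in range(n)]
zs = [random.gauss(0, 1) for _ in range(n)]   # independent of X and Y
d_indep = max_cdf_discrepancy(xs, ys, zs)

zs_dep = [x + y for x, y in zip(xs, ys)]      # strongly dependent on X and Y
d_dep = max_cdf_discrepancy(xs, ys, zs_dep)
print(d_indep, d_dep)
```

On independent samples the worst discrepancy should shrink toward zero as the sample grows, at roughly the usual 1/sqrt(n) rate, while a genuine dependence such as Z = X + Y leaves a gap that no amount of data removes.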

  Often the evidence for independence is based on more than simple repeated observations and takes other considerations into account. For example, the tosses of three dice thrown simultaneously onto a table can be considered inherently independent unless, say, magnets are discovered in their interiors.

Roger Stafford

Subject: Multivariate mutual independence test

From: sinoTrinity Liu

Date: 20 Jun, 2011 19:21:04

Message: 3 of 6

Thanks for your help.

The large sample size required may not be the major issue here, because in principle I can get an infinite number of samples. The major issue is that neither the joint distribution P(X<=a,Y<=b,Z<=c) nor the marginal distributions (e.g., P(X<=a)) are known a priori, because all I have are samples from physical observation. So neither P(X<=a) * P(Y<=b) * P(Z<=c) nor P(X<=a,Y<=b,Z<=c) can be determined precisely, which makes brute-force checking of equality or closeness for all combinations of a, b, c very difficult, if it is possible at all.

"Roger Stafford" wrote in message <ito3ui$13q$1@newscl01ah.mathworks.com>...
> [snip]

Subject: Multivariate mutual independence test

From: sinoTrinity Liu

Date: 20 Jun, 2011 19:27:04

Message: 4 of 6

To make things worse, all random variables here are continuous.

"Roger Stafford" wrote in message <ito3ui$13q$1@newscl01ah.mathworks.com>...
> [snip]

Subject: Multivariate mutual independence test

From: Roger Stafford

Date: 20 Jun, 2011 20:20:19

Message: 5 of 6

"sinoTrinity Liu" <whu_lxh@hotmail.com> wrote in message <ito6j0$amg$1@newscl01ah.mathworks.com>...
> The large sample size required may not be the major issue here, because in principle I can get an infinite number of samples. The major issue is that neither the joint distribution P(X<=a,Y<=b,Z<=c) nor the marginal distributions (e.g., P(X<=a)) are known a priori, because all I have are samples from physical observation. So neither P(X<=a) * P(Y<=b) * P(Z<=c) nor P(X<=a,Y<=b,Z<=c) can be determined precisely, which makes brute-force checking of equality or closeness for all combinations of a, b, c very difficult, if it is possible at all.
> ..........
> To make things worse, all random variables here are continuous.
- - - - - - - -
  I assume that when you say that "all random variables here are continuous", you mean that their various probability distribution functions are continuous. What else could you mean, unless there is some underlying parameter(s) here you haven't talked about?

  In that case, continuity is a definite asset, not a disadvantage. The continuity of such distributions means that fewer observations are necessary. If your distributions had numerous discontinuities, you would need heavy sampling in each discontinuous neighborhood to get any kind of respectable estimates, whereas continuity makes things far easier. If you are able to get by with some decent multiple of 100^3 observations, it would undoubtedly be due to this continuity. Otherwise you might be faced with needing an astronomical number of observations.

  As for not knowing the joint and marginal distribution functions, that is precisely what the observations are for. If you know nothing else about your random variables, they are the only thing you have to work with. No one can ever exactly determine a random variable's distribution function from observations alone, but the more numerous the observations, the more accurate the approximations become. What you have to do is approximate the quantities I mentioned using all the observational data you have time to accumulate, and then perform the necessary tests for independence, realizing that you can never satisfy them exactly with only a finite amount of data.

  As I asserted earlier, there is no shortcut for the large amount of data that would need to be gathered for this purpose. That is inherent in the lack of a priori knowledge of such random variables.
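To judge whether such a finite-sample discrepancy is larger than chance, one standard technique (offered here only as a hedged sketch; it is not something proposed in the thread, and every name in it is hypothetical) is a permutation test: shuffling the Y and Z samples independently of X enforces mutual independence while preserving each marginal distribution exactly, so the shuffled discrepancies show how large the statistic gets under the null hypothesis.

```python
import random

# Sketch: permutation test for the discrepancy between the empirical
# joint CDF and the product of the empirical marginal CDFs, evaluated
# at a single grid point (the per-variable sample medians).

def discrepancy(xs, ys, zs, a, b, c):
    n = len(xs)
    joint = sum(x <= a and y <= b and z <= c
                for x, y, z in zip(xs, ys, zs)) / n
    prod = (sum(x <= a for x in xs) / n
            * sum(y <= b for y in ys) / n
            * sum(z <= c for z in zs) / n)
    return abs(joint - prod)

def permutation_pvalue(xs, ys, zs, n_perm=200):
    # Evaluate the statistic at the per-variable sample medians.
    a, b, c = (sorted(v)[len(v) // 2] for v in (xs, ys, zs))
    observed = discrepancy(xs, ys, zs, a, b, c)
    ys, zs = list(ys), list(zs)   # copies, so the caller's data is untouched
    hits = 0
    for _ in range(n_perm):
        random.shuffle(ys)        # break any X-Y or Y-Z link
        random.shuffle(zs)        # break any X-Z link
        if discrepancy(xs, ys, zs, a, b, c) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

random.seed(1)
n = 500
xs = [random.gauss(0, 1) for _ in range(n)]
ys = [random.gauss(0, 1) for _ in range(n)]
zs_dep = [x + y for x, y in zip(xs, ys)]   # strongly dependent case
p_dep = permutation_pvalue(xs, ys, zs_dep)
print(p_dep)
```

With the strongly dependent Z = X + Y, the observed discrepancy should exceed essentially every shuffled one, giving a small p-value; a full test would of course use many grid points rather than one.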

Roger Stafford

Subject: Multivariate mutual independence test

From: Roger Stafford

Date: 20 Jun, 2011 23:33:04

Message: 6 of 6

"sinoTrinity Liu" <whu_lxh@hotmail.com> wrote in message <ito6u8$bo8$1@newscl01ah.mathworks.com>...
> To make things worse, all random variables here are continuous.
- - - - - - - - - -
  I hadn't thought of the possibility that when you said "random variables here are continuous" you might have meant only that the range of values the variables can assume is a continuum, as opposed to being discrete-valued. That doesn't matter if the distribution functions themselves turn out to be (reasonably) continuous. Presumably such continuity would allow you to treat your variables as if they were, in some sense, discrete-valued, and to analyze them accordingly. If you are faced with a continuum of values and discontinuous distributions, that would be an exceedingly difficult thing to analyze for independence, as I previously mentioned.

Roger Stafford
