From: <HIDDEN>
Newsgroups: comp.soft-sys.matlab
Subject: Re: Multivariate mutual independence test
Date: Mon, 20 Jun 2011 20:20:19 +0000 (UTC)
Organization: The MathWorks, Inc.
Lines: 14
Message-ID: <itoa23$lcs$>
References: <ito0m4$j4m$> <ito3ui$13q$> <ito6j0$amg$>
Reply-To: <HIDDEN>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
X-Trace: 1308601219 21916 (20 Jun 2011 20:20:19 GMT)
NNTP-Posting-Date: Mon, 20 Jun 2011 20:20:19 +0000 (UTC)
X-Newsreader: MATLAB Central Newsreader 1187260
Xref: comp.soft-sys.matlab:732862

"sinoTrinity Liu" <> wrote in message <ito6j0$amg$>...
> The large sample size required may not be the major issue here, because in principle I can get an infinite number of samples. The major issue is that neither the joint distribution P(X<=a,Y<=b,Z<=c) nor the marginal distributions (e.g., P(X<=a)) are known a priori, because all I have are samples obtained through physical observation. So neither P(X<=a) * P(Y<=b) * P(Z<=c) nor P(X<=a,Y<=b,Z<=c) can be determined precisely, which makes brute-force checking of equality or closeness for all combinations of a, b, c very difficult, if possible at all.
> ..........
> To make things worse, all random variables here are continuous.
- - - - - - - -
  I assume that when you say that "all random variables here are continuous", you mean that their various probability distribution functions are continuous.  What else could you mean, unless there are underlying parameters here that you haven't talked about?

  In that case, continuity is a definite asset, not a disadvantage.  Continuity of the distributions means that fewer observations are necessary.  If your distributions had numerous discontinuities, you would need heavy sampling in the neighborhood of each discontinuity to get any kind of respectable estimates, whereas continuity makes things far easier.  If you are able to get by with some decent multiple of 100^3 observations, it would undoubtedly be due to this continuity.  Otherwise you might need an astronomical number of observations.

  As to not knowing the joint and marginal distribution functions, that is precisely what observations are made for.  If you know nothing else about your random variables, observations are the only thing you have to work with.  No one can ever determine the distribution functions of random variables exactly using only observations, but the more numerous the observations, the more accurate the approximations become.  What you have to do is approximate the quantities I mentioned using all the observational data you have time to accumulate, and then perform the necessary tests for independence, realizing that you can never satisfy them exactly with only a finite amount of data.
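
  As a rough illustration of the approximation described above, here is a minimal sketch in Python (standard library only; it is not MATLAB and not code from this thread).  It estimates P(X<=a,Y<=b,Z<=c) and the three marginals by counting observations, then reports the largest gap between the joint estimate and the product of the marginal estimates over a grid of (a,b,c) values.  The function names, the grid, and the uniform test data are all assumptions made for the example.  It does not say how small the gap must be to accept independence; that would need a proper significance threshold.

```python
import random


def empirical_joint_cdf(samples, a, b, c):
    """Fraction of observations (x, y, z) with x <= a, y <= b and z <= c."""
    hits = sum(1 for x, y, z in samples if x <= a and y <= b and z <= c)
    return hits / len(samples)


def empirical_marginal_cdf(samples, axis, t):
    """Fraction of observations whose coordinate `axis` is <= t."""
    hits = sum(1 for s in samples if s[axis] <= t)
    return hits / len(samples)


def max_discrepancy(samples, grid):
    """Largest |joint - product of marginals| over all (a, b, c) in grid^3.

    Both sides are estimates from the same finite sample, so the result
    is never exactly zero even for truly independent variables.
    """
    # Marginal estimates depend on one threshold only, so compute them once.
    fx = {t: empirical_marginal_cdf(samples, 0, t) for t in grid}
    fy = {t: empirical_marginal_cdf(samples, 1, t) for t in grid}
    fz = {t: empirical_marginal_cdf(samples, 2, t) for t in grid}

    worst = 0.0
    for a in grid:
        for b in grid:
            for c in grid:
                joint = empirical_joint_cdf(samples, a, b, c)
                worst = max(worst, abs(joint - fx[a] * fy[b] * fz[c]))
    return worst


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    # Hypothetical data: three independent uniform variables on [0, 1).
    independent = [(rng.random(), rng.random(), rng.random())
                   for _ in range(5000)]
    # Hypothetical dependent data: Z is an exact copy of X.
    dependent = [(x, y, x) for x, y, _ in independent]

    grid = [0.1, 0.3, 0.5, 0.7, 0.9]
    print("independent:", max_discrepancy(independent, grid))
    print("dependent:  ", max_discrepancy(dependent, grid))
```

  With data like this, the gap for the independent sample should shrink toward zero as the number of observations grows, while the gap for the dependent sample stays well away from zero.  A coarse grid keeps the cost down; a finer grid checks more (a,b,c) combinations but needs more observations in each cell to be meaningful.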

  As I asserted earlier, there is no shortcut for the large amount of data that would need to be gathered for this purpose.  That is inherent in the lack of a priori knowledge of such random variables.

Roger Stafford