Thread Subject: how to test independence in Matlab?

Subject: how to test independence in Matlab?

From: kiki

Date: 24 Jan, 2007 21:09:52

Message: 1 of 5

Hi all,

I have some data and want to test whether they are independently distributed.
Which function should I use in MATLAB?

Thanks a lot

Subject: how to test independence in Matlab?

From: Predictor

Date: 25 Jan, 2007 03:05:49

Message: 2 of 5

You don't say what type of data you have, and I will assume that
"independent" means "uncorrelated". There are many choices, but an
easy selection for numeric data is 'corrcoef'. With categorical data,
and the Statistics Toolbox handy, try the chi-squared output of
'crosstab'. Coding one's own chi-squared routine is very easy in
MATLAB. See the following on-line tutorial, which is very accessible:

http://www.mste.uiuc.edu/patel/chisquare/keyprob.html
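
As a rough illustration of the two suggestions above, here is a minimal sketch. The data are made up for the example, and 'crosstab' needs the Statistics Toolbox; 'corrcoef' is in base MATLAB.

```matlab
rng(0);                          % fixed seed so the sketch is repeatable
n = 500;

% Numeric data: correlation coefficient and its p-value
x = randn(n,1);
y = 0.5*x + randn(n,1);          % made-up, linearly related sample
[R,P] = corrcoef(x,y);           % R(1,2) is the correlation, P(1,2) its p-value
fprintf('r = %.3f, p = %.3g\n', R(1,2), P(1,2));

% Categorical data: chi-squared test on the cross-tabulation
a = randi(3,n,1);                % made-up category labels 1..3
b = randi(2,n,1);                % made-up labels 1..2, drawn independently of a
[tbl,chi2,p] = crosstab(a,b);    % requires the Statistics Toolbox
fprintf('chi2 = %.3f, p = %.3g\n', chi2, p);
```

A small p-value from either call is evidence against "no association"; a large one does not by itself establish independence.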


-Will Dwinnell
http://matlabdatamining.blogspot.com/


On Jan 25, 12:09 am, "kiki" <losemi...@hotmail.com> wrote:
> I have some data and want to test if they are independently distributed, which
> function shall I use in Matlab?

Subject: how to test independence in Matlab?

From: ellieandrogerxyzzy@mindspring.com.invalid (Roger Stafford)

Date: 25 Jan, 2007 12:11:57

Message: 3 of 5

In article <ep9e31$b8q$1@news.Stanford.EDU>, "kiki"
<loseminds@hotmail.com> wrote:

> HI all,
>
> I have some data and want to test if they are independently distributed, which
> function shall I use in Matlab?
>
> Thanks a lot
--------------
  Without any a priori knowledge about a set of random variables,
demonstrating their independence is very difficult. If, for example, you
happen to know they are jointly normal, then it is sufficient to show that
they are uncorrelated, but in general, lack of correlation is not
sufficient to establish true statistical independence.
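
  A standard illustration of that last point, sketched with made-up data: take x standard normal and y = x^2. The two are essentially uncorrelated, yet y is completely determined by x, so the product rule for probabilities fails.

```matlab
rng(0);                          % fixed seed so the sketch is repeatable
n = 1e5;
x = randn(n,1);
y = x.^2;                        % y is completely determined by x
R = corrcoef(x,y);
fprintf('sample correlation = %.4f\n', R(1,2));   % close to zero

% The events |x| > 1 and y > 1 are the same event, so the product rule fails
pBoth = mean(abs(x) > 1 & y > 1);
pProd = mean(abs(x) > 1) * mean(y > 1);
fprintf('P(both) = %.3f, product = %.3f\n', pBoth, pProd);
```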

  With two random variables, x and y, you can prove independence if the
following equality holds for their cumulative distributions:

 P(x <= a & y <= b) = P(x <= a) * P(y <= b)

for all numbers a and b. The difficulty with this is that, even if
perfection is not expected, there are a great many combinations of numbers
a and b to test, and this requires an even greater number of joint samples
to establish reliably.
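
  A minimal sketch of such a check, assuming made-up independent data and an arbitrary 9-by-9 grid of thresholds placed at the sample deciles. It only reports the largest discrepancy between the two sides of the equality; deciding how large a gap is too large is a separate question that depends on the sample size.

```matlab
rng(0);                          % fixed seed so the sketch is repeatable
n = 5000;
x = randn(n,1);
y = randn(n,1);                  % generated independently of x (made-up data)

q  = 0.1:0.1:0.9;                % arbitrary choice: thresholds at the deciles
xs = sort(x);  ys = sort(y);
a  = xs(round(q*n));
b  = ys(round(q*n));

maxGap = 0;
for i = 1:numel(a)
    for j = 1:numel(b)
        pJoint = mean(x <= a(i) & y <= b(j));           % left-hand side
        pProd  = mean(x <= a(i)) * mean(y <= b(j));     % right-hand side
        maxGap = max(maxGap, abs(pJoint - pProd));
    end
end
fprintf('largest |joint - product| on the grid: %.4f\n', maxGap);
```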

  Similarly, a set of n random variables x1, x2,..., xn is independent if

 P(x1<=a1 & x2<=a2 & ... & xn<=an) = P(x1<=a1)*P(x2<=a2)*...*P(xn<=an)

but testing this can require a truly enormous number of joint samples as n
gets large.

  It is not the computing task that is the difficulty here. The problem
is collecting a sufficiently large amount of joint data. It is far, far
better to already have some a priori knowledge of the nature of the random
variables when trying to establish their independence.

Roger Stafford

Subject: how to test independence in Matlab?

From: ellieandrogerxyzzy@mindspring.com.invalid (Roger Stafford)

Date: 25 Jan, 2007 22:52:21

Message: 4 of 5

> In article <ep9e31$b8q$1@news.Stanford.EDU>, "kiki"
> <loseminds@hotmail.com> wrote:
>
> > HI all,
> >
> > I have some data and want to test if they are independently distributed, which
> > function shall I use in Matlab?
> >
> > Thanks a lot
---------------
  Some further thoughts about testing for the independence of random
variables. Suppose again that x and y are two random variables. From
sufficiently large samplings it is possible to obtain an estimate for the
separate cumulative distribution functions (cdf's) for each of them. Call
u the cdf for x, and v the cdf for y. Then for a very large joint
sampling of x and y one can make a plot, using

 plot(u(x),v(y),'y.') % Make a single yellow dot for each joint sample
 axis equal

This will be a plot over the unit u-v square.
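
  One possible way to produce such a plot, as a sketch only: here u and v are vectors of empirical cdf values obtained from the ranks of the samples, rather than functions, and the data are a made-up dependent pair so that the non-uniform pattern is visible. The default dot colour is used in place of yellow.

```matlab
rng(0);                          % fixed seed so the sketch is repeatable
n = 2000;
x = randn(n,1);
y = x.^2 + 0.5*randn(n,1);       % made-up dependent pair

% Empirical cdf values via ranks (continuous data, so ties are not expected)
[~,ix] = sort(x);  u = zeros(n,1);  u(ix) = (1:n)'/n;
[~,iy] = sort(y);  v = zeros(n,1);  v(iy) = (1:n)'/n;

plot(u, v, '.');                 % one dot per joint sample
axis equal
axis([0 1 0 1])
xlabel('u = estimated cdf of x');  ylabel('v = estimated cdf of y');
```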

  By the nature of statistical independence, the variables x and y are
independent if the dots on this plot are uniformly distributed area-wise
throughout the entire u-v square. That is, in each small delta-u by
delta-v rectangle, delta-u and delta-v are the respective probabilities
that x and y lie in the corresponding ranges, and their product which is
the rectangle's area should always be equal to the probability that both
do so, if x and y are to be independent. A non-uniform distribution will
indicate a lack of independence. This is something that can be done
fairly well by a visual inspection of the yellow dots, provided there are
enough of them.

  This rough description should give you a feeling for how many samples
are needed to arrive at a reliable indication of independence. As far as
I know, there is no single MATLAB function that is designed to carry out
such an investigation. (Perhaps the MathWorks people know otherwise?)

  Of course, for more than two variables, such a method becomes much more
difficult and obviously requires many more samples since the dots have to
be shown to fill an n-dimensional hypercube uniformly.

Roger Stafford

Subject: how to test independence in Matlab?

From: Peter Perkins

Date: 26 Jan, 2007 10:10:10

Message: 5 of 5

Roger Stafford wrote:

> This rough description should give you a feeling for how many samples
> are needed to arrive at a reliable indication of independence. As far as
> I know, there is no single matlab function that is designed to carry out
> such an investigation. (Perhaps the MathWorks people know otherwise?)

Roger, you're right, there is no single function, but the pieces are all there.

This is something like a bivariate copula -- transform your data to uniform
marginals, and look for dependencies on the unit square. You could use the ECDF
or KSDENSITY functions in the Statistics Toolbox to estimate the marginal
cumulative distribution functions empirically, or one of many univariate data
fitting functions to estimate a parametric distribution, then use the
corresponding CDF function to do the transformation.

The CHI2GOF or KSTEST functions could be used to determine how uniform the
result of that transformation is, by defining an m-by-m grid over the unit
square. Normally, you'd think of KSTEST as a univariate test, but in this case,
I think it might be reasonable to pick some 1D enumeration of the 2D grid.
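
A rough sketch of the grid idea, with several assumptions: the data are made up, the marginals are made uniform with a simple rank transform instead of ECDF or KSDENSITY, and the chi-squared statistic is computed by hand rather than with CHI2GOF. Because the rank transform fixes the row and column totals of the grid, the sketch uses (m-1)^2 degrees of freedom, as in a contingency-table test; treat that choice as an assumption too.

```matlab
rng(0);                          % fixed seed so the sketch is repeatable
n = 2000;  m = 5;                % m-by-m grid over the unit square
x = randn(n,1);
y = randn(n,1);                  % made-up independent pair

% Rank-based uniform marginals, kept strictly inside (0,1)
[~,ix] = sort(x);  u = zeros(n,1);  u(ix) = ((1:n)' - 0.5)/n;
[~,iy] = sort(y);  v = zeros(n,1);  v(iy) = ((1:n)' - 0.5)/n;

% Count the points falling in each grid cell
row = min(floor(u*m) + 1, m);
col = min(floor(v*m) + 1, m);
counts = accumarray([row col], 1, [m m]);

expected = n/m^2;                % uniform dots give equal expected counts
chi2stat = sum((counts(:) - expected).^2) / expected;
df = (m-1)^2;                    % assumption: margins fixed by the rank transform
p  = 1 - chi2cdf(chi2stat, df);  % chi2cdf requires the Statistics Toolbox
fprintf('chi2 = %.2f, df = %d, p = %.3f\n', chi2stat, df, p);
```

A coarser grid gives more points per cell but can miss fine-scale dependence, so m is a trade-off to choose with the sample size in mind.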

This demo

<<http://www.mathworks.com/products/statistics/demos.html?file=/products/demos/shipping/stats/copulademo.html>>

touches on some of the copula aspects of this idea, but kind of in the "opposite
direction" -- it's really aimed at simulating bivariate data, so it starts on
the unit square and transforms to the marginals.

- Peter Perkins
   The MathWorks, Inc.
