From: <HIDDEN>
Newsgroups: comp.soft-sys.matlab
Subject: Re: why mvnrnd works with a singular covariance matrix
Date: Wed, 12 May 2010 02:05:20 +0000 (UTC)
Organization: The MathWorks, Inc.
Lines: 24
Message-ID: <hsd2d0$fuq$>
References: <hsc2vg$lj8$> <hscfl2$lma$> <hscji4$s04$>
Reply-To: <HIDDEN>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
X-Trace: 1273629920 16346 (12 May 2010 02:05:20 GMT)
NNTP-Posting-Date: Wed, 12 May 2010 02:05:20 +0000 (UTC)
X-Newsreader: MATLAB Central Newsreader 1187260
Xref: comp.soft-sys.matlab:634869

"Farzin Zareian" <> wrote in message <hscji4$s04$>...
> Peter and Roger
> Many thanks for your instructive responses. After reading your comments and a few other sources I can see your points, and I now have a better handle on the situation.
> As a last question, what is the measure of reliability of a distribution fitted to data? As you mentioned, with three realizations it is meaningless to fit a seven-dimensional joint normal distribution. What is the minimum number of realizations needed to get meaningful results when the number of variables is N?
> many thanks
> farzin

  I was afraid you might ask that question.  I can only answer it in a general way.

  To begin with, consider a single random variable and the effect of using a number of independent observations to estimate its mean value.  Then ask what standard deviation one can expect between this estimate and the true mean.  You will find that it is the original random variable's standard deviation divided by the square root of the number of observations.  In other words, if you make a hundred independent observations, the accuracy of your mean estimate improves over that of a single observation only by a factor of ten.  Something similar holds for sample estimates of the variable's variance, and with more than one random variable the same situation arises in estimating covariance values.  It takes a great many observations to make them accurate.
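  That square-root effect is easy to see numerically.  Here is a small Python sketch (not part of the original post; the sample sizes and trial count are arbitrary illustrative choices) that forms many independent estimates of the mean, each from n observations, and measures how widely those estimates scatter about the true mean:

```python
import random
import statistics

random.seed(0)

sigma = 1.0    # standard deviation of the underlying variable
trials = 2000  # number of independent mean estimates per sample size

for n in (1, 100):
    # each estimate is the average of n independent observations
    estimates = [statistics.fmean(random.gauss(0.0, sigma) for _ in range(n))
                 for _ in range(trials)]
    spread = statistics.stdev(estimates)
    print("n = %3d: spread of mean estimates %.3f (theory sigma/sqrt(n) = %.3f)"
          % (n, spread, sigma / n ** 0.5))
```

  With a hundred times as many observations, the spread of the estimates shrinks only by a factor of ten, as described above.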

  A second consideration is that as the number of variables increases, the assumption of joint normality becomes increasingly critical.  Forget that assumption for a moment and imagine a standard two-dimensional chess board representing eight discrete values for each of two random variables.  Clearly you would need enough samples to put a fairly large number of pairs in each square to make a good estimate of the joint probability - a good-sized multiple of 64 at the very least.  Now change this to a seven-dimensional chess board corresponding to your seven variables, if you can imagine such a thing.  To have several samples in each - what shall we call them - each seven-dimensional "square", you would need a healthy multiple of eight to the seventh power, which is a large multiple of two million samples.  It is precisely to avoid such horrific requirements that statisticians are so eager to assume joint normality in so many situations.  It greatly reduces the number of observations necessary in their work.
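  The arithmetic of that seven-dimensional chess board is quick to check.  A small Python sketch (not in the original post; the target of five samples per cell is an arbitrary illustrative choice):

```python
values_per_axis = 8  # squares along each axis of the "chess board"
per_cell = 5         # assumed modest occupancy target per cell

for d in (2, 7):
    cells = values_per_axis ** d
    print("%d dimensions: %d cells, about %d samples for ~%d per cell"
          % (d, cells, per_cell * cells, per_cell))
```

  In seven dimensions there are 8^7 = 2,097,152 cells, so even five samples per cell already demands over ten million observations.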

  However, this reasoning should at least strike a note of caution: in a seven-variable situation, either you had better be awfully sure of joint normality, or the number of samples should be a good deal more than the square-root-of-the-number-of-observations criterion above would indicate.
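  Returning to the thread's subject line, the three-realization case also shows up as plain linear algebra: a sample covariance matrix built from n observations has rank at most n - 1, so with three realizations of seven variables it is necessarily singular.  A pure-Python sketch (not from the original post; the rank routine is a naive Gaussian elimination written only for illustration):

```python
import random

def matrix_rank(rows, tol=1e-9):
    # rank via Gaussian elimination with partial pivoting
    m = [row[:] for row in rows]
    rank = 0
    n_rows, n_cols = len(m), len(m[0])
    for col in range(n_cols):
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]),
                    default=None)
        if pivot is None or abs(m[pivot][col]) < tol:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            f = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= f * m[rank][c]
        rank += 1
    return rank

random.seed(1)
n_obs, n_vars = 3, 7  # three realizations of seven variables
data = [[random.gauss(0, 1) for _ in range(n_vars)] for _ in range(n_obs)]
means = [sum(col) / n_obs for col in zip(*data)]
centered = [[x - mu for x, mu in zip(row, means)] for row in data]
# sample covariance S[i][j] = sum_k centered[k][i]*centered[k][j] / (n-1)
cov = [[sum(centered[k][i] * centered[k][j] for k in range(n_obs)) / (n_obs - 1)
        for j in range(n_vars)] for i in range(n_vars)]
print(matrix_rank(cov))  # prints 2: rank is at most n_obs - 1, so the 7x7 matrix is singular
```

  This is why a fitted seven-dimensional covariance from three realizations is degenerate no matter how the fitting is done; mvnrnd merely tolerates the degeneracy rather than curing it.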

  Now as you see I have done a lot of hand-waving over this question, but perhaps it will help you a little, even if it is unwelcome information.

  (By the way, I too was a graduate student at UCI in years past - in fact at the time they first opened.)

Roger Stafford