
# Thread Subject: Conditional sampling from multivariate normal distribution

Subject: Conditional sampling from multivariate normal distribution

From: Tomaz

### Tomaz

Date: 12 Apr, 2010 09:54:03

Message: 1 of 14

Hi all!

In Matlab I know the mvnrnd(MU,SIGMA) function. This gives me a random vector drawn from a multivariate normal distribution characterized by MU and SIGMA. Now I am searching for the simplest way to get a value for only one of the attributes that make up the vector (given that I know the values of the remaining attributes). If I am not mistaken this would be called conditional sampling?

To illustrate: I have 5 attributes (independent variables) all together and I build multivariate normal distribution based on dataset consisting of 999 data points. Next, I have a 1000th data point with the value for attribute no. 5 missing, but I do know the values of attributes 1-4. I would like to sample the value for the missing attribute based on the values of the other 4. I imagine that the expected value should be different for [-1 2 1 5 missing] than for [1000 3456 221 8901 missing]. Is there any simple way to take the values of the n-1 known attributes into account when sampling?

Subject: Conditional sampling from multivariate normal distribution

From: Peter Perkins

### Peter Perkins

Date: 12 Apr, 2010 14:14:13

Message: 2 of 14

On 4/12/2010 5:54 AM, Tomaz wrote:

> In Matlab I know the mvnrnd(MU,SIGMA) function. This gives me a random
> vector drawn from a multivariate normal distribution characterized by MU
> and SIGMA. Now I am searching for the simplest way to get a value for
> only one of the attributes that make up the vector (given that I know
> the values of the remaining attributes). If I am not mistaken this would be
> called conditional sampling?

Tomaz, I think this Wikipedia section is what you are looking for:

<http://en.wikipedia.org/wiki/Multivariate_normal_distribution#Conditional_distributions>

Subject: Conditional sampling from multivariate normal distribution

From: Tomaz

### Tomaz

Date: 12 Apr, 2010 16:41:05

Message: 3 of 14

Peter Perkins <Peter.Perkins@MathRemoveThisWorks.com> wrote in message <hpv9rm$n2d$1@fred.mathworks.com>...
> On 4/12/2010 5:54 AM, Tomaz wrote:
>
> > In Matlab I know the mvnrnd(MU,SIGMA) function. This gives me a random
> > vector drawn from a multivariate normal distribution characterized by MU
> > and SIGMA. Now I am searching for the simplest way to get a value for
> > only one of the attributes that make up the vector (given that I know
> > the values of the remaining attributes). If I am not mistaken this would be
> > called conditional sampling?
>
> Tomaz, I think this Wikipedia section is what you are looking for:
>
> <http://en.wikipedia.org/wiki/Multivariate_normal_distribution#Conditional_distributions>

Peter, thanks, but is this also useful when dealing with more than 2 independent variables?
And I guess that there is no 'straightforward' way of doing this in Matlab?

Subject: Conditional sampling from multivariate normal distribution

From: Peter Perkins

### Peter Perkins

Date: 12 Apr, 2010 17:09:24

Message: 4 of 14

On 4/12/2010 12:41 PM, Tomaz wrote:

> Peter thanks, but is this also useful when dealing with more than 2
> independent variables? And I guess that there is no 'straightforward'
> way of doing this in Matlab?

Look closer at those formulas, and the definitions above them: the formula is entirely general, and it is simple to implement in MATLAB. I'm guessing someone has already posted something like this to the MATLAB Central File Exchange, but I haven't checked.

If you have N variables, then "1" and "2" in the formula represent subsets of 1:N. That Wikipedia page happens to have things set up so that the conditioning variables are all at the end (i.e., "2" corresponds to (q+1):N) and the unobserved variables are all at the beginning (1:q), but that's just to make the notation simpler.

Given a row vector mu and a cov matrix Sigma, define i2 as the coordinates that you are conditioning on, and i1 as everything else. Then let mu1 = mu(i1), Sigma11 = Sigma(i1,i1), etc., and apply those formulas. Two things:

1) You'll want to do something like

    Sigma1_2 = Sigma11 - Sigma21*(Sigma22\Sigma12)

and similarly for mu, rather than explicitly using INV. Type "help slash".

2) You might have trouble because that Wikipedia page has the MVN in terms of col vectors. You'll want to use row vectors. And so:

    mu1_2 = mu1 + ((a-mu2)/Sigma22)*Sigma21
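For reference, the result from the linked Wikipedia section can be written out as below. This is only a restatement in that page's column-vector convention; the symbols $\bar{\mu}$ and $\bar{\Sigma}$ for the conditional mean and covariance are the page's notation, not names used in this thread, and $a$ is the vector of observed values. The last line is the same mean written for row vectors, which is what mean() returns.

```latex
x_1 \mid (x_2 = a) \;\sim\; \mathcal{N}\!\left(\bar{\mu},\, \bar{\Sigma}\right)

\bar{\mu} = \mu_1 + \Sigma_{12}\,\Sigma_{22}^{-1}\,(a - \mu_2)

\bar{\Sigma} = \Sigma_{11} - \Sigma_{12}\,\Sigma_{22}^{-1}\,\Sigma_{21}

\bar{\mu}^{\top} = \mu_1^{\top} + (a - \mu_2)^{\top}\,\Sigma_{22}^{-1}\,\Sigma_{21}
```

Note the plus sign in the mean, and that $\Sigma_{12}$ (size $n_1 \times n_2$) stands to the left of $\Sigma_{22}^{-1}$ in the covariance term, which is the ordering confirmed in message 8 of this thread. Since $\Sigma$ is symmetric, $\Sigma_{21} = \Sigma_{12}^{\top}$.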

Subject: Conditional sampling from multivariate normal distribution

From: Tomaz

### Tomaz

Date: 12 Apr, 2010 17:43:04

Message: 5 of 14

Peter Perkins <Peter.Perkins@MathRemoveThisWorks.com> wrote in message <hpvk44$2if$1@fred.mathworks.com>...
> On 4/12/2010 12:41 PM, Tomaz wrote:
>
> > Peter thanks, but is this also useful when dealing with more than 2
> > independent variables? And I guess that there is no 'straightforward'
> > way of doing this in Matlab?
>
> Look closer at those formulas, and the definitions above them: the formula is entirely general, and it is simple to implement in MATLAB. I'm guessing someone has already posted something like this to the MATLAB Central File Exchange, but I haven't checked.
>
> If you have N variables, then "1" and "2" in the formula represent subsets of 1:N. That Wikipedia page happens to have things set up so that the conditioning variables are all at the end (i.e., "2" corresponds to (q+1):N) and the unobserved variables are all at the beginning (1:q), but that's just to make the notation simpler.
>
> Given a row vector mu and a cov matrix Sigma, define i2 as the coordinates that you are conditioning on, and i1 as everything else. Then let mu1 = mu(i1), Sigma11 = Sigma(i1,i1), etc., and apply those formulas. Two things:
>
> 1) You'll want to do something like
>
> Sigma1_2 = Sigma11 - Sigma21*(Sigma22\Sigma12)
>
> and similarly for mu, rather than explicitly using INV. Type "help slash".
>
> 2) You might have trouble because that Wikipedia page has the MVN in terms of col vectors. You'll want to use row vectors. And so:
>
> mu1_2 = mu1 + ((a-mu2)/Sigma22)*Sigma21

Thank you Peter! I appreciate your effort and I believe I will be able to solve my problem now (with some effort). Could you please just tell me what would be 'statistical expression' that describes my problem the best? Is it 'Conditional sampling', 'Conditional distributions' or something else? Any synonyms/ alternatives? I am asking this to be able to search for related data more efficiently...

Subject: Conditional sampling from multivariate normal distribution

From: Peter Perkins

### Peter Perkins

Date: 12 Apr, 2010 19:22:33

Message: 6 of 14

On 4/12/2010 1:43 PM, Tomaz wrote:

> Could you please just tell
> me what would be 'statistical expression' that describes my problem the
> best? Is it 'Conditional sampling', 'Conditional distributions' or
> something else? Any synonyms/ alternatives? I am asking this to be able
> to search for related data more efficiently...

I would think either of those, plus perhaps a "multivariate normal". The same simple result does not hold for MVT, for example.

Subject: Conditional sampling from multivariate normal distribution

From: Tomaz

### Tomaz

Date: 20 Apr, 2010 21:46:04

Message: 7 of 14

Peter Perkins <Peter.Perkins@MathRemoveThisWorks.com> wrote in message <hpvrtp$iun$1@fred.mathworks.com>...
> On 4/12/2010 1:43 PM, Tomaz wrote:
>
> > Could you please just tell
> > me what would be 'statistical expression' that describes my problem the
> > best? Is it 'Conditional sampling', 'Conditional distributions' or
> > something else? Any synonyms/ alternatives? I am asking this to be able
> > to search for related data more efficiently...
>
> I would think either of those, plus perhaps a "multivariate normal". The same simple result does not hold for MVT, for example.

I tried really hard to understand the formulas and solve my problem, but I got stuck. Peter (or anybody else), I would really appreciate it if you would point out where I go wrong. I tried to follow your directions and the Wiki page, but this happens (look below). Should I change anything because of the row/ column vector thing?

mu=mean (origData)
mu1=mu (1)
mu2 = mu (2:4)
sigma = cov (origData)
sigma11 = sigma (1:1, 1:1)
sigma12 = sigma (1:1, 2:4)
sigma21 = sigma (2:4, 1:1)
sigma22= sigma (2:4, 2:4)

sigma1_2 = sigma11 - sigma21*(sigma22\sigma12)
??? Error using ==> mldivide
Matrix dimensions must agree.

Subject: Conditional sampling from multivariate normal distribution

From: Peter Perkins

### Peter Perkins

Date: 20 Apr, 2010 22:38:07

Message: 8 of 14

On 4/20/2010 5:46 PM, Tomaz wrote:

> sigma1_2 = sigma11 - sigma21*(sigma22\sigma12)
> ??? Error using ==> mldivide
> Matrix dimensions must agree.

My fault, I think. All of these are equivalent:

    sigma1_2 = sigma11 - sigma12*(sigma22\sigma21)
    sigma1_2 = sigma11 - (sigma12/sigma22)*sigma21
    sigma1_2 = sigma11 - sigma12*inv(sigma22)*sigma21
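The equivalence of the three expressions can be checked numerically. The following is a minimal sketch, assuming a made-up 4-by-4 covariance matrix (the poster's origData is not available); only the variable names sigma11, sigma12, sigma21 and sigma22 come from the thread.

```matlab
% Made-up covariance matrix for 4 jointly normal variables.
% It is symmetric and strictly diagonally dominant, hence positive definite.
sigma = [4.0 1.2 0.8 0.5;
         1.2 3.0 0.6 0.4;
         0.8 0.6 2.0 0.3;
         0.5 0.4 0.3 1.5];

i1 = 1;      % unobserved variable
i2 = 2:4;    % variables being conditioned on

sigma11 = sigma(i1,i1);   % 1-by-1
sigma12 = sigma(i1,i2);   % 1-by-3 (row)
sigma21 = sigma(i2,i1);   % 3-by-1 (column)
sigma22 = sigma(i2,i2);   % 3-by-3

% The three forms from the message above
v1 = sigma11 - sigma12*(sigma22\sigma21);
v2 = sigma11 - (sigma12/sigma22)*sigma21;
v3 = sigma11 - sigma12*inv(sigma22)*sigma21;

% Show the three values side by side; they should agree up to rounding
disp([v1 v2 v3]);
```

With a 1-by-3 sigma12 and a 3-by-1 sigma21 the inner dimensions match in all three forms, which is exactly what failed in the earlier attempt with the two blocks swapped.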

Subject: Conditional sampling from multivariate normal distribution

From: Roger Stafford

### Roger Stafford

Date: 20 Apr, 2010 23:56:05

Message: 9 of 14

"Tomaz " <tomaz.bartolj@gmail.com> wrote in message <hpvif0$9ih$1@fred.mathworks.com>...
> ......
> To illustrate: I have 5 attributes (independent variables) all together and I build multivariate normal distribution based on dataset consisting of 999 data points.
> ........
> Peter thanks, but is this also useful when dealing with more than 2 independent variables?
> .......

Tomaz, I would like to point out one assertion you made which I don't think you really meant. In the first two articles you said, "To illustrate: I have 5 attributes (independent variables) all together and I build multivariate normal distribution based on dataset consisting of 999 data points" and "is this also useful when dealing with more than 2 independent variables?"

I don't think you really meant that these were independent variables. If they were actually independent, then the conditional probability distribution of the one variable given other variables' values would be the same as its unconditional distribution. I suspect you really meant to say "jointly normal". That's the assumption you do need.
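Roger's point can be illustrated with a small sketch. It assumes made-up values for mu, a diagonal sigma (which, for jointly normal variables, means they are independent) and arbitrary observed values a; none of these numbers come from the thread.

```matlab
% Independent jointly normal variables: the covariance matrix is diagonal.
mu    = [1 2 3 4];
sigma = diag([4 3 2 1]);

i1 = 1;           % variable of interest
i2 = 2:4;         % variables being conditioned on
a  = [10 -5 7];   % arbitrary observed values for variables 2..4

% Conditional mean and variance of variable 1 given variables 2..4
muCond  = mu(i1) + ((a - mu(i2))/sigma(i2,i2))*sigma(i2,i1);
varCond = sigma(i1,i1) - (sigma(i1,i2)/sigma(i2,i2))*sigma(i2,i1);

% Compare with the unconditional mean and variance of variable 1
disp([muCond mu(i1); varCond sigma(i1,i1)]);
```

Because the off-diagonal blocks are all zero, the correction terms vanish and the observed values carry no information about the missing one. Conditioning only changes the answer when the covariances are nonzero.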

Roger Stafford

Subject: Conditional sampling from multivariate normal distribution

From: Tomaz

### Tomaz

Date: 21 Apr, 2010 10:13:05

Message: 10 of 14

"Roger Stafford" <ellieandrogerxyzzy@mindspring.com.invalid> wrote in message <hqleul$sgi$1@fred.mathworks.com>...
> "Tomaz " <tomaz.bartolj@gmail.com> wrote in message <hpvif0$9ih$1@fred.mathworks.com>...
> > ......
> > To illustrate: I have 5 attributes (independent variables) all together and I build multivariate normal distribution based on dataset consisting of 999 data points.
> > ........
> > Peter thanks, but is this also useful when dealing with more than 2 independent variables?
> > .......
>
> Tomaz, I would like to point out one assertion you made which I don't think you really meant. In the first two articles you said, "To illustrate: I have 5 attributes (independent variables) all together and I build multivariate normal distribution based on dataset consisting of 999 data points" and "is this also useful when dealing with more than 2 independent variables?"
>
> I don't think you really meant that these were independent variables. If they were actually independent, then the conditional probability distribution of the one variable given other variables' values would be the same as its unconditional distribution. I suspect you really meant to say "jointly normal". That's the assumption you do need.
>
> Roger Stafford

Roger, you are absolutely correct. They are jointly normally distributed. Sorry for any confusion - I am learning as I go, in the statistical sense as well as in the use of Matlab. Thanks to all of you for helping me out - I really try not only to ask questions here, but to work things out on my own as much as I can. It is just that I am under a time limit and I lack so much of the knowledge about multivariate distributions that I need for a sub-part of my assignment...
Peter, if I understand correctly, I should simply try a different sequence of calculations (so the dimensions will match)? But in principle I am on the right track now?

Subject: Conditional sampling from multivariate normal distribution

From: Tomaz

### Tomaz

Date: 21 Apr, 2010 10:13:08

Message: 11 of 14

"Roger Stafford" <ellieandrogerxyzzy@mindspring.com.invalid> wrote in message <hqleul$sgi$1@fred.mathworks.com>...
> "Tomaz " <tomaz.bartolj@gmail.com> wrote in message <hpvif0$9ih$1@fred.mathworks.com>...
> > ......
> > To illustrate: I have 5 attributes (independent variables) all together and I build multivariate normal distribution based on dataset consisting of 999 data points.
> > ........
> > Peter thanks, but is this also useful when dealing with more than 2 independent variables?
> > .......
>
> Tomaz, I would like to point out one assertion you made which I don't think you really meant. In the first two articles you said, "To illustrate: I have 5 attributes (independent variables) all together and I build multivariate normal distribution based on dataset consisting of 999 data points" and "is this also useful when dealing with more than 2 independent variables?"
>
> I don't think you really meant that these were independent variables. If they were actually independent, then the conditional probability distribution of the one variable given other variables' values would be the same as its unconditional distribution. I suspect you really meant to say "jointly normal". That's the assumption you do need.
>
> Roger Stafford

Roger, yes, you are absolutely right. Sorry for the confusion - I am learning as I go, in the statistical sense too. I appreciate your help very much: I seriously lack knowledge of multivariate distributions and at the same time I am under time pressure. I try to work on things on my own as much as possible - but when I get stuck, you guys here are my last hope: thanks again.

@Peter: so I just change the sequence of calculations so that dimensions match. Otherwise I am on the right track.
And about sigmas:
basically sigma11 is the element at coordinates (i,i) - given I always condition on n-1 variables.
sigma12 is row i of sigma without element in column i
sigma21 is column i of sigma without element in row i
sigma22 is everything else.

And there is no issue of column/ row vectors here, since sigma is diagonally symmetrical?
Are these statements above correct?

Subject: Conditional sampling from multivariate normal distribution

From: Peter Perkins

### Peter Perkins

Date: 21 Apr, 2010 13:15:28

Message: 12 of 14

On 4/21/2010 6:13 AM, Tomaz wrote:

> @Peter: so I just change the sequence of calculations so that dimensions
> match. Otherwise I am on the right track.

As per my most recent post.

> And about sigmas:
> basically sigma11 is the element at coordinates (i,i) - given I always
> condition on n-1 variables.
> sigma12 is row i of sigma without element in column i
> sigma21 is column i of sigma without element in row i
> sigma22 is everything else.
> And there is no issue of column/ row vectors here, since sigma is
> diagonally symmetrical?

Only that you need the right one in the right place. But yes, sigma12 is just sigma21', and that will be true regardless of how many variables you (don't) condition on.
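The partition described in the quoted text can be written with index vectors, so that it works for any position i. This is a sketch on a made-up symmetric matrix; the names i, i1, i2, S12 and S21 are illustrative, and setdiff is used here only as one convenient way to get "everything else".

```matlab
% Made-up symmetric covariance matrix for 4 variables
sigma = [4.0 1.2 0.8 0.5;
         1.2 3.0 0.6 0.4;
         0.8 0.6 2.0 0.3;
         0.5 0.4 0.3 1.5];
n = size(sigma, 1);

% Case 1: one unobserved variable at position i, conditioning on the rest
i  = 3;
i2 = setdiff(1:n, i);     % all indices except i, in ascending order
sigma11 = sigma(i, i);    % element (i,i)
sigma12 = sigma(i, i2);   % row i without column i
sigma21 = sigma(i2, i);   % column i without row i
sigma22 = sigma(i2, i2);  % everything else

% Case 2: two unobserved variables; the off-diagonal blocks are now matrices
i1 = [1 3];
j2 = setdiff(1:n, i1);
S12 = sigma(i1, j2);      % 2-by-2 block
S21 = sigma(j2, i1);      % 2-by-2 block, the transpose of S12

% Both comparisons print 1 (true) because sigma is symmetric
disp(isequal(sigma12, sigma21'));
disp(isequal(S12, S21'));
```

In the second case S12 and S21 are still transposes of each other, but neither is a vector, so the order of the factors in the formulas matters even more.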

Subject: Conditional sampling from multivariate normal distribution

From: Tomaz

### Tomaz

Date: 22 Apr, 2010 15:30:24

Message: 13 of 14

Peter Perkins <Peter.Perkins@MathRemoveThisWorks.com> wrote in message <hqmtpg$pab$1@fred.mathworks.com>...
> On 4/21/2010 6:13 AM, Tomaz wrote:
> > @Peter: so I just change the sequence of calculations so that dimensions
> > match. Otherwise I am on the right track.
>
> As per my most recent post.
> > basically sigma11 is the element at coordinates (i,i) - given I always
> > condition on n-1 variables.
> > sigma12 is row i of sigma without element in column i
> > sigma21 is column i of sigma without element in row i
> > sigma22 is everything else.
> > And there is no issue of column/ row vectors here, since sigma is
> > diagonally symmetrical?
>
> Only that you need the right one in the right place. But yes, sigma12 is just sigma21', and that will be true regardless of how many variables you (don't) condition on.

Peter, sorry I don't quite get the meaning of this comment:
"Only that you need the right one in the right place." What do you mean by that? I have the values for each independent variable in the columns of a matrix. Then I simply obtain the mu row vector with the function mean() and the covariance matrix with the function cov(). I save subsections of the covariance matrix to variables sigmaIJ as described above. Is there something else that I should watch for?

Subject: Conditional sampling from multivariate normal distribution

From: Peter Perkins

### Peter Perkins

Date: 22 Apr, 2010 17:19:04

Message: 14 of 14

On 4/22/2010 11:30 AM, Tomaz wrote:

>> As per my most recent post.
>> > And about sigmas:
>> > basically sigma11 is the element at coordinates (i,i) - given I always
>> > condition on n-1 variables.
>> > sigma12 is row i of sigma without element in column i
>> > sigma21 is column i of sigma without element in row i
>> > sigma22 is everything else.
>> > And there is no issue of column/ row vectors here, since sigma is
>> > diagonally symmetrical?
>>
>> Only that you need the right one in the right place. But yes, sigma12
>> is just sigma21', and that will be true regardless of how many
>> variables you (don't) condition on.
>
> Peter, sorry I don't quite get the meaning of this comment:
> "Only that you need the right one in the right place." What do you mean
> by that?

All I meant was "sigma11 - (sigma12/sigma22)*sigma21", and not "sigma11 - sigma21*(sigma22\sigma12)". And that even when you are generating conditional random values for more than one variable, sigma12 and sigma21 are always transposes of each other. Just not necessarily vectors.
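Putting the pieces of the thread together, the whole procedure might look like the sketch below. It is not code posted by anyone in the thread: origData is replaced by synthetic correlated data, the mixing matrix A and the count nDraws are made up, and the conditional mean uses the plus sign from the Wikipedia formula. Only base functions (mean, cov, chol, randn) are used.

```matlab
% Synthetic stand-in for the poster's data: 999 rows, 5 correlated columns.
% A is an arbitrary lower-triangular mixing matrix (an assumption).
A = [1.0 0   0   0   0;
     0.5 1.0 0   0   0;
     0.3 0.2 1.0 0   0;
     0.1 0.4 0.2 1.0 0;
     0.6 0.3 0.5 0.2 1.0];
origData = randn(999, 5) * A';

mu    = mean(origData);    % 1-by-5 row vector
sigma = cov(origData);     % 5-by-5 covariance matrix

i1 = 5;                    % index of the missing attribute
i2 = setdiff(1:5, i1);     % indices of the known attributes
a  = [-1 2 1 5];           % known values of the new data point

mu1 = mu(i1);              mu2 = mu(i2);
sigma11 = sigma(i1,i1);    sigma12 = sigma(i1,i2);
sigma21 = sigma(i2,i1);    sigma22 = sigma(i2,i2);

% Conditional mean (row-vector form) and conditional covariance
muCond    = mu1 + ((a - mu2)/sigma22)*sigma21;
sigmaCond = sigma11 - (sigma12/sigma22)*sigma21;

% Draw samples: chol returns R with R'*R = sigmaCond, so randn*R has
% covariance sigmaCond. Symmetrize first to guard against rounding.
sigmaCond = (sigmaCond + sigmaCond')/2;
R = chol(sigmaCond);
nDraws = 1000;
x = repmat(muCond, nDraws, 1) + randn(nDraws, numel(i1))*R;
```

With the Statistics Toolbox available, the last step could presumably be replaced by mvnrnd(muCond, sigmaCond, nDraws). The chol version is used here only to keep the sketch free of toolbox dependencies, and it also works when i1 contains more than one index.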