Thread Subject:
Generation of Correlated Data

Subject: Generation of Correlated Data

From: Deva MDP

Date: 12 Aug, 2008 08:39:01

Message: 1 of 12

Can someone tell me how to generate two random data sets
with a known correlation (say, correlation coefficient = 0.5)?

Subject: Generation of Correlated Data

From: Roger Stafford

Date: 12 Aug, 2008 15:26:01

Message: 2 of 12

"Deva MDP" <devasiri@gmail.com> wrote in message <g7ri75$8hr$1@fred.mathworks.com>...
> Can someone tell me how to generate two random data sets
> with a known correlation (say, correlation coefficient = 0.5)?

  Generate any two mutually independent variables, x and y, whose means are
0 and variances 1. For example x = randn(n,1) and y = randn(n,1). Then we
have

 E(x) = E(y) = 0,
 E(x^2) = E(y^2) = 1
 E(x*y) = E(x)*E(y) = 0

  Then construct z = x + k*y where k is some constant yet to be determined.
Now we have

 E(z) = 0,
 E(z^2) = E(x^2) + 2*k*E(x*y) + k^2*E(y^2) = 1 + k^2,
 E(x*z) = E(x^2) + k*E(x*y) = 1

Hence

 corr(x,z) = E(x*z)/sqrt(E(x^2)*E(z^2)) = 1/sqrt(1+k^2)

Therefore solve for the value of k that gives you the desired correlation
coefficient, that is, k = sqrt(1/corr^2 - 1) for 0 < corr <= 1. For
corr = 0.5 it would be k = sqrt(3).
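
As a rough illustration of the construction above, the following sketch builds z = x + k*y for a target correlation and checks the sample correlation. The names r, n, and rhat and the sample size are assumptions made for this example, not part of the original post, and the sketch only covers positive target correlations.

```matlab
% Minimal sketch, assuming a positive target correlation r (0 < r <= 1).
r = 0.5;                 % target correlation coefficient (example value)
n = 1e5;                 % sample size (assumed; larger n gives a closer match)
k = sqrt(1/r^2 - 1);     % solves r = 1/sqrt(1 + k^2); equals sqrt(3) for r = 0.5

x = randn(n,1);          % independent, mean 0, variance 1
y = randn(n,1);
z = x + k*y;             % note that var(z) is 1 + k^2, not 1

R = corrcoef(x, z);      % 2-by-2 sample correlation matrix
rhat = R(1,2);           % close to r, up to sampling noise
```

The sample value rhat only approximates r; the match is exact in expectation, not for any particular finite sample.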

  Are you sure you don't have further requirements? You have left a lot of
freedom in your description here.

Roger Stafford

Subject: Generation of Correlated Data

From: Deva MDP

Date: 17 Aug, 2008 03:19:01

Message: 3 of 12

Dear Friend,

Thank you for the support given. I understood how to generate
two data vectors with a required correlation between them. But
my problem is as follows, which I couldn't clarify yet.

I have generated a correlated random matrix with 3 columns
for a desired correlation matrix. Though my work is
successful, I still don't know the theory behind this procedure.
The procedure adopted is as follows.

(1) Generated 3 random vectors with independent, normally
distributed entries, X = [x1 x2 x3].
Corr(X) = identity matrix, approximately.

(2) Then X is transformed into Y by Y = X*c, where c is the
square root of G (G is the ultimate correlation matrix of Y;
c is a positive definite matrix).
The form of G is [1 g g; g 1 g; g g 1], where g is the correlation
between the formed vectors.

I would be thankful if you could kindly let me know the theory behind this
procedure.
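
The two steps described above can be sketched roughly as follows. The value of g, the sample size n, and the names RX and RY are assumptions chosen for illustration. G is positive definite only for -0.5 < g < 1 in the 3-by-3 case.

```matlab
% Minimal sketch of the two-step procedure, with assumed example values.
g = 0.5;                        % common pairwise correlation (example value)
G = [1 g g; g 1 g; g g 1];      % target correlation matrix
n = 1e5;                        % number of rows (assumed)

X = randn(n,3);                 % step (1): independent N(0,1) columns
c = sqrtm(G);                   % symmetric principal square root, c*c = G
Y = X*c;                        % step (2): mix the columns

RX = corrcoef(X);               % approximately eye(3)
RY = corrcoef(Y);               % approximately G
```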


Best regards

Devasiri

Subject: Generation of Correlated Data

From: Roger Stafford

Date: 17 Aug, 2008 05:27:02

Message: 4 of 12

  For the sake of discussion suppose that your three independent normally
distributed random variables x1, x2, and x3 have mean 0 and variance 1, so
that correlation and covariance are one and the same. If c is the matrix
square root of the G you have defined, then the following holds true. With
a factor 1/n to average over the n rows, the covariance matrix of your
n by 3 matrix Y = X*c is given by

 (1/n)*E{Y'*Y} = (1/n)*E{(X*c)'*(X*c)} = (1/n)*E{c'*X'*X*c}
 = c'*((1/n)*E{X'*X})*c = c'*I*c = c'*c = c*c = G

Here I is the identity matrix, which is the covariance matrix of X, and c = c'
because it is symmetric. Thus Y has the desired covariances. Since the
diagonal of G is all ones, each column of Y also has variance 1, so G is the
correlation matrix of Y as well.

  Note that the covariance result would hold for any positive definite G. All
you have to do is find its matrix square root (presumably using eigenvector
methods).
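
One way the eigenvector approach could look is sketched below, assuming G is real, symmetric, and positive definite. The example matrix and the names errSquare, errSym, and errSqrtm are illustrative only.

```matlab
% Minimal sketch: matrix square root of a symmetric positive definite G
% from its eigendecomposition G = V*D*V'.
G = [1 0.5 0.5; 0.5 1 0.5; 0.5 0.5 1];   % example matrix (assumed)

[V, D] = eig(G);              % V has orthonormal columns for symmetric G
c = V*sqrt(D)*V';             % D is diagonal, so sqrt(D) roots the eigenvalues

errSquare = norm(c*c - G);        % c*c reproduces G up to rounding
errSym    = norm(c - c');         % c is symmetric up to rounding
errSqrtm  = norm(c - sqrtm(G));   % agrees with the principal root from sqrtm
```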

Roger Stafford

Subject: Generation of Correlated Data

From: Deva MDP

Date: 17 Aug, 2008 07:26:01

Message: 5 of 12

Dear Roger,

Thank you very much for your support.
I can understand that. But I get Corr(A*Y) = Corr(Y).
With that in mind, I still find it difficult
to understand the correlation relationship between the X and Y
data sets.

Regards from

Devasiri

Subject: Generation of Correlated Data

From: Greg Heath

Date: 17 Aug, 2008 17:47:31

Message: 6 of 12

On Aug 17, 1:27 am, "Roger Stafford"
<ellieandrogerxy...@mindspring.com.invalid> wrote:
> Note that the same would be true for any positive definite G. All you have to
> do is find its matrix square root (using eigenvector methods presumably.)

The matrix square root is obviously not unique. Why do you prefer
using eigenvector methods instead of SQRTM?

Hope this helps.

Greg

Subject: Generation of Correlated Data

From: Roger Stafford

Date: 17 Aug, 2008 21:48:01

Message: 7 of 12

Greg Heath <heath@alumni.brown.edu> wrote in message
<193fc757-3622-4faa-b8ee-abf2fa117d70@p25g2000hsf.googlegroups.com>...

> The matrix square root is obviously not unique.

  The square root of a positive definite matrix is unique in essentially the
same sense as the positive square root of a positive real number. To partially
quote MathWorks, it "is the principal square
root of the matrix ... the unique square root for which every eigenvalue has
nonnegative real part."

> Why do you prefer using eigenvector methods instead of SQRTM?

  As you must be fully aware, Greg, the MATLAB function 'sqrtm' does in fact use
eigenvector methods. It is an excellent way of finding a matrix principal square
root. It is not a question of "instead of".
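
The two views in this exchange can be put side by side with a small sketch. For the purpose of generating Y = X*c, what matters is that c'*c = G, and more than one factor satisfies that. The Cholesky factor is brought in here as an illustrative alternative that is not mentioned in the posts above, and the names c1, c2, err1, err2, and differ are assumptions.

```matlab
% Minimal sketch: two different factors c with c'*c = G.
G = [1 0.5 0.5; 0.5 1 0.5; 0.5 0.5 1];   % example matrix (assumed)

c1 = sqrtm(G);            % symmetric principal square root: c1*c1 = G
c2 = chol(G);             % upper triangular Cholesky factor: c2'*c2 = G

err1 = norm(c1'*c1 - G);  % both reproduce G up to rounding
err2 = norm(c2'*c2 - G);
differ = norm(c1 - c2);   % yet the two factors are clearly different
```

The principal square root is unique among square roots with nonnegative eigenvalues, while the wider family of factors with c'*c = G is not; either kind gives Y = X*c the covariance G.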

Roger Stafford

Subject: Generation of Correlated Data

From: Deva MDP

Date: 15 Sep, 2008 07:13:02

Message: 8 of 12

Dear Roger,

I have already generated the required correlated data sets
using C = sqrtm(G) and Y = X*C. I also understand what you explained to me previously.

But I am still struggling with the proof of Corr(Y) = G (here C'*C = G, and C is a symmetric, positive semidefinite matrix).
I can prove Corr(X) = I (the identity matrix),
but since I get Corr(X*C) = Corr(X),

I do not get Corr(Y) = G.
Please help me to prove this.
I need this theory to include it in my methodology.


Thank you very much.
Devasiri
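
A possible source of the difficulty, offered as a guess rather than something stated in the thread: correlation is unchanged when each column is rescaled separately by a positive factor, but it is not unchanged when the columns are mixed by a general matrix such as C. The sketch below compares the two cases numerically. The scaling matrix S and the names dScale and dMix are assumptions for illustration.

```matlab
% Minimal sketch: column rescaling preserves correlation, mixing does not.
n = 1e5;                                   % number of rows (assumed)
X = randn(n,3);                            % independent N(0,1) columns
G = [1 0.5 0.5; 0.5 1 0.5; 0.5 0.5 1];     % example target matrix
C = sqrtm(G);

S = diag([2 5 0.1]);                       % positive rescaling of each column

% Largest change in any correlation entry under each transformation
dScale = max(max(abs(corrcoef(X*S) - corrcoef(X))));   % zero up to rounding
dMix   = max(max(abs(corrcoef(X*C) - corrcoef(X))));   % about 0.5 here
```

So Corr(X*S) = Corr(X) holds for a diagonal S with positive entries, whereas Corr(X*C) is approximately G and differs from Corr(X).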

Subject: Generation of Correlated Data

From: Deva MDP

Date: 16 Sep, 2008 08:40:04

Message: 9 of 12

"Deva MDP" <devasiri@gmail.com> wrote in message <gal1tu$2ha$1@fred.mathworks.com>...
> Dear Roger,
>
> I have already generated the required correlated data sets
> using C = sqrtm(G) and Y = X*C. I also understand what you explained to me previously.
>
> But I am still struggling with the proof of Corr(Y) = G (here C'*C = G, and C is a symmetric, positive semidefinite matrix).
> I can prove Corr(X) = I (the identity matrix),
> but since I get Corr(X*C) = Corr(X),
>
> I do not get Corr(Y) = G.
> Please help me to prove this.
> I need this theory to include it in my methodology.
>
> Thank you very much.
> Devasiri

Subject: Generation of Correlated Data

From: Chetan

Date: 29 Apr, 2013 20:12:12

Message: 10 of 12

Hi Roger,
Is it possible to generalize this to data sets with any mean, variance, and covariance? I did some calculations at home, but I find that the correlation of (X,Z) does not change much from the original correlation of (X,Y). In fact, the correlation as a function of k declines asymptotically. I could be totally wrong, of course, but if you provide me an email address, I can send you my file, if you care to look, that is.
Ty
Chet
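
One way the two-variable construction might be carried over to arbitrary means and variances is sketched below. It normalizes the combination so that it has unit variance, then scales and shifts each variable. All names and numeric values (r, muX, sdX, muZ, sdZ, n) are assumptions made for this example.

```matlab
% Minimal sketch: two variables with chosen means, standard deviations,
% and correlation r (-1 <= r <= 1), built from independent N(0,1) draws.
r   = 0.5;                 % target correlation (example value)
muX = 10;  sdX = 2;        % target mean and standard deviation of x
muZ = -3;  sdZ = 0.5;      % target mean and standard deviation of z
n   = 1e5;                 % sample size (assumed)

u = randn(n,1);
v = randn(n,1);
w = r*u + sqrt(1 - r^2)*v; % unit variance, corr(u,w) = r in expectation

x = muX + sdX*u;           % shifting and positive scaling keep the correlation
z = muZ + sdZ*w;

R = corrcoef(x, z);
rhat = R(1,2);             % close to r, up to sampling noise
```

For r > 0 this is the same as x + k*y with k = sqrt(1/r^2 - 1), rescaled by r so that the result has variance 1. The decline of the correlation as k grows matches 1/sqrt(1 + k^2).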

Subject: Generation of Correlated Data

From: Tom Lane

Date: 30 Apr, 2013 14:01:52

Message: 11 of 12

> Is it possible to generalize this to data sets with any mean, variance and
> covariance?

I didn't see the original posting, but it sounds like you want to specify an
exact mean and covariance. This is not the same as generating from a
theoretical mean and covariance (which is what I believe most people ought
to want to do most of the time), because the result is less random than it
should be. But the question comes up from time to time. Here's an example:

% Desired mean and covariance
Mu = [10 15.5 9.99];
% Sigma must be symmetric; chol reads only the upper triangle, so the
% (3,2) entry is set to 1 here to match the (2,3) entry
Sigma = [5 -3 0;-3 4 1;0 1 10];

% Create too-perfect sample with zero mean and identity covariance
x = randn(100,length(Mu));
% If you don't have Statistics Toolbox, subtract the mean and divide by std
z = zscore(x);
c = chol(cov(z));   % c'*c = cov(z)
z = z/c;            % sample covariance is now the identity
cov(z)
mean(z)

% Apply desired mean and covariance
z = z*chol(Sigma);
z = bsxfun(@plus,Mu,z);
cov(z)
mean(z)
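
For contrast, a sketch of the other case mentioned above, generating from a theoretical mean and covariance so that the sample statistics only approximate the targets, might look like this. The symmetric Sigma, the sample size n, and the names R, covErr, and meanErr are assumptions chosen for this example.

```matlab
% Minimal sketch: draw from a normal distribution with theoretical mean Mu
% and covariance Sigma; sample mean and covariance match only approximately.
Mu    = [10 15.5 9.99];
Sigma = [5 -3 0; -3 4 1; 0 1 10];   % symmetric positive definite (assumed)
n     = 1e5;                        % sample size (assumed)

R = chol(Sigma);                    % upper triangular, R'*R = Sigma
z = bsxfun(@plus, Mu, randn(n, length(Mu))*R);

covErr  = max(max(abs(cov(z) - Sigma)));   % small, but not zero
meanErr = max(abs(mean(z) - Mu));          % small, but not zero
```

Unlike the exact-moment version, no whitening of the sample is done first, so the errors shrink with n but do not vanish.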

-- Tom

Subject: Generation of Correlated Data

From: Chetan

Date: 6 May, 2013 13:34:07

Message: 12 of 12

Thanks Tom,
I'm going to try this and get back to you.
Best
Chet
