Thread Subject:
Generating random numbers for correlated variables

Subject: Generating random numbers for correlated variables

From: Kirk

Date: 27 Jan, 2009 15:50:18

Message: 1 of 8

I have a question regarding the generation of random data when the two variables are strongly correlated.

Take the example:

I have a 62 year record of monthly temperature data in the form of two variables TMAX and TMIN. That gives us 12 months (January - December) of means and standard deviations

tmax_mu    = [-7.22 -3.67 2.40 11.25 19.11 23.89 26.73 25.26 19.61 12.78 2.38 -4.82];
tmax_sigma = [3.04 2.82 2.48 2.73 2.26 1.78 1.64 1.77 1.79 2.06 2.46 2.63];

tmin_mu    = [-18.47 -16.03 -9.69 -2.23 3.22 8.47 11.95 11.34 6.91 1.340 -6.19 -14.46];
tmin_sigma = [3.63 3.64 2.95 1.28 1.59 1.24 1.19 1.44 1.27 1.72 2.40 3.33];

To generate random numbers about those means and standard deviations, I used the function 'normrnd(mu,sigma)'.

tmax_r1 = normrnd(tmax_mu,tmax_sigma);
tmin_r1 = normrnd(tmin_mu,tmin_sigma);

However, as you can see, I am calling the 'normrnd' function independently for TMAX and TMIN and I am concerned about covariance.

A correlation matrix indicates (as expected) that the two variables are highly correlated. Here are the correlation coefficients for January. The other 11 months are similar (~0.95).

        TMAX    TMIN
TMAX   1.000   0.948
TMIN   0.948   1.000

Can anyone direct me to a technique that would generate random estimates for the two variables and still account for the high degree of correlation?

Thanks in advance

Subject: Generating random numbers for correlated variables

From: Ting Su

Date: 27 Jan, 2009 15:53:55

Message: 2 of 8

Kirk,
If I understand your question correctly, then 'mvnrnd' can satisfy your
requirement.
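
A minimal sketch of how 'mvnrnd' might be applied here, for January only, assuming the Statistics Toolbox is installed. The covariance matrix is built from the January standard deviations and the 0.948 correlation coefficient given in the original post. The variable names and the sample count n are illustrative, not part of the thread.

```matlab
% January means and standard deviations for [TMAX TMIN], from the original post
mu    = [-7.22 -18.47];
sigma = [3.04 3.63];
rho   = 0.948;                 % January TMAX/TMIN correlation coefficient

% Covariance matrix = D * R * D, where D = diag(sigma) and R is the
% correlation matrix
R = [1 rho; rho 1];
C = diag(sigma) * R * diag(sigma);

n = 5000;                      % illustrative number of draws
X = mvnrnd(mu, C, n);          % n-by-2: column 1 is TMAX, column 2 is TMIN

r = corrcoef(X);               % sample correlation; r(1,2) should be near rho
```

Comparing r(1,2) with rho is a quick check that the draws carry the intended correlation.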

-Ting Su
The Mathworks.

Subject: Generating random numbers for correlated variables

From: ImageAnalyst

Date: 27 Jan, 2009 16:02:24

Message: 3 of 8

Not an answer to your question, but related, and some people might
want to check it out. Microsoft has posted MATLAB code on
http://research.microsoft.com/en-us/downloads/db1653f0-1308-4b45-b358-d8e1011385a0/default.aspx

Here is their description:

Fast subroutines for Matlab programs
This library provides highly optimized versions of primitive functions
such as repmat, set intersection, and gammaln. It provides efficient
random number generators and evaluation of common probability
densities. It provides routines for counting floating-point operations
(FLOPS), useful for benchmarking algorithms. There are also some
useful utilities such as filename globbing and parsing of variable-
length argument lists.

Subject: Generating random numbers for correlated variables

From: Kirk

Date: 27 Jan, 2009 18:13:02

Message: 4 of 8

Thanks for the tip. I think this is on the right track. However, I'm still a bit confused. Would the approach then be to use 'normrnd' to get a randomly generated vector of TMAX values from measured TMAX means and standard deviations:

tmax_mean = [-7.22 -3.67 2.40 11.25 19.11 23.89 26.73 25.26 19.61 12.78 2.38 -4.82];
tmax_std = [3.04 2.82 2.48 2.73 2.26 1.78 1.64 1.77 1.79 2.06 2.46 2.63];

tmax_r1 = normrnd(tmax_mean,tmax_std);

Then use 'mvnrnd' to generate a vector of synthetic TMIN data based on TMIN means and a covariance matrix?

tmin_mean = [-18.47 -16.03 -9.69 -2.23 3.22 8.47 11.95 11.34 6.91 1.340 -6.19 -14.46];
tmin_cov = [9.15 10.40; 10.40 13.17];

tmin_r1 = mvnrnd(tmin_mean,tmin_cov);

The covariance matrix looks like this:

        TMAX    TMIN
TMAX    9.15   10.40
TMIN   10.40   13.17

Or am I missing something here?


"Ting Su" <Ting.Su@mathworks.com> wrote in message <glnaml$ndp$1@fred.mathworks.com>...
> Kirk,
> If I understand your questions correctly, then 'mvnrnd' can satisfy your
> requirement.
>
> -Ting Su
> The Mathworks.

Subject: Generating random numbers for correlated variables

From: Roger Stafford

Date: 27 Jan, 2009 20:05:07

Message: 5 of 8

  Probably you should use 'mvnrnd(mu,sigma,cases)' separately for each of the twelve months. For each one you would need its two corresponding means in 'mu' and the month's two by two covariances in 'sigma'. The number in 'cases' tells how many random numbers you want to generate for that month.
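
The month-by-month approach could be sketched as below. The thread only gives the January correlation and says the other months are "similar (~0.95)", so this sketch assumes a single correlation of 0.95 for every month; with real data each month would use its own measured 2-by-2 covariance matrix. The output arrays tmax and tmin are illustrative names.

```matlab
% Monthly means and standard deviations from the original post
tmax_mu    = [-7.22 -3.67 2.40 11.25 19.11 23.89 26.73 25.26 19.61 12.78 2.38 -4.82];
tmax_sigma = [3.04 2.82 2.48 2.73 2.26 1.78 1.64 1.77 1.79 2.06 2.46 2.63];
tmin_mu    = [-18.47 -16.03 -9.69 -2.23 3.22 8.47 11.95 11.34 6.91 1.34 -6.19 -14.46];
tmin_sigma = [3.63 3.64 2.95 1.28 1.59 1.24 1.19 1.44 1.27 1.72 2.40 3.33];
rho        = 0.95;             % ASSUMPTION: same correlation in every month

cases = 1000;                  % draws per month, e.g. one per simulated year
tmax  = zeros(cases, 12);      % rows are years, columns are months
tmin  = zeros(cases, 12);

for m = 1:12
    mu    = [tmax_mu(m) tmin_mu(m)];
    c12   = rho * tmax_sigma(m) * tmin_sigma(m);          % covariance term
    sigma = [tmax_sigma(m)^2 c12; c12 tmin_sigma(m)^2];   % 2-by-2 covariance
    r     = mvnrnd(mu, sigma, cases);                     % cases-by-2 draws
    tmax(:, m) = r(:, 1);
    tmin(:, m) = r(:, 2);
end
```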

  Don't use 'normrnd', which only generates independent random variables, unless you are prepared to take the appropriate linear combinations of its results.
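
One common way to form such linear combinations is through a Cholesky factor of the covariance matrix. The sketch below is one possible construction, not necessarily the one intended in this reply. It uses 'randn' (equivalent to 'normrnd' with mean 0 and standard deviation 1), the January means from the original post, and the January covariance matrix Kirk quoted; n is an illustrative sample count.

```matlab
mu = [-7.22 -18.47];               % January [TMAX TMIN] means
C  = [9.15 10.40; 10.40 13.17];    % January covariance matrix from the thread

n = 5000;                          % illustrative number of draws
U = chol(C);                       % upper triangular factor with U'*U equal to C
Z = randn(n, 2);                   % independent standard normal columns

% Each row of Z*U is a linear combination of independent normals whose
% covariance is U'*U = C; adding mu shifts the rows to the desired means.
X = Z*U + repmat(mu, n, 1);

Cs = cov(X);                       % sample covariance, should be close to C
```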

Roger Stafford

Subject: Generating random numbers for correlated variables

From: Kirk

Date: 28 Jan, 2009 15:42:02

Message: 6 of 8

> Probably you should use 'mvnrnd(mu,sigma,cases)' separately for each of the twelve months. For each one you would need its two corresponding means in 'mu' and the month's two by two covariances in 'sigma'. The number in 'cases' tells how many random numbers you want to generate for that month.
>
> Don't use 'normrnd' which only generates independent random variables unless you are prepared to take the appropriate linear combinations of its results.
>

I have a clarification question about how to set up a matrix (or vector) of 12 months of 2 means (mu), and 12 months of 2 by 2 covariances.

In my specific case I need to fill a 1000 year vector with stochastically generated temperature data. Therefore, when I was using 'normrnd' to generate the values (and I understand why this is not the function to use), I used a struct named "clim" to hold the vectors "tmax", "tmin", "par", and "prec". I also created vectors of length 12 for each variable's means and standard deviations (e.g. tmax_mu and tmax_sigma). I used the 'repmat' function to repeat the 12 months of means and standard deviations for the 1000 year series like this:

clim.tmax(1:12000)=normrnd(repmat(tmax_mu,1000,1),repmat((tmax_sigma),1000,1));


I would like to apply a similar approach with the 'mvnrnd' function. It seems to me that I would still have to use 'normrnd' for the first variable "tmax", then use 'mvnrnd' to build the succeeding variables "tmin", "par", "prec", based on the covariances with tmax. Correct?

However, it is unclear to me how to build the 12 months of 2 means and 12 months of 2 by 2 covariances for the arguments to the 'mvnrnd' function.

Instead of the vectors:
tmax_mu=[-7.22,-3.67,2.40,11.25,19.11,23.89,26.73,25.26,19.61,12.78,2.38,-4.82]';
tmax_sigma=[3.04,2.82,2.48,2.73,2.26,1.78,1.64,1.77,1.79,2.06,2.46,2.63]';

Would the input arguments for 'mvnrnd' look something like:
tmaxtmin_mu = [-7.22 -18.47, -3.67 -16.03, 2.40 -9.69, 11.25 -2.23, 19.11 3.22, 23.89 8.47, 26.73 11.95, 25.26 11.34, 19.61 6.91, 12.78 1.34, 2.38 -6.19, -4.82 -14.46];
where each pair separated by commas represents tmax and tmin mean?

and for the 2 by 2 covariances

tmaxtmin_sigma = [ 9.15 10.40; 10.40 13.17, 7.97 9.55; 9.55 13.22, 6.13 6.38; 6.38 8.69,...];
where each block between the commas represents a 2 by 2 covariance matrix?

If this implementation for the 2 means and 4 covariances is correct, then I could still use the 'repmat' function to build the 1000 year struct of stochastic climate based on measured means.

Subject: Generating random numbers for correlated variables

From: Roger Stafford

Date: 29 Jan, 2009 02:30:04

Message: 7 of 8

"Kirk" <kwythers.nospam@umn.edu> wrote in message <glpuca$dsc$1@fred.mathworks.com>...
> .......
> I would like to apply a similar approach with the 'mvnrnd' function. It seems to me that I would still have to use 'normrnd' for the first variable "tmax", then use 'mvnrnd' to build the succeeding variables "tmin", "par", "prec", based on the covariances with tmax. Correct?
> ........

  No Kirk, it is not valid to simply use the covariances of the three variables "tmin", "par", "prec", based on their individual covariances with 'tmax'. You need the covariances between each possible pairing of the four, including pairings of the variables with respect to themselves, i.e. variances, if you expect to generate a valid set of four random variables that truly reflect the statistical properties of your data. Otherwise the covariances that hold between the latter three will not be reflected properly in the output statistics. Using 'normrnd' will most definitely not do the job, at least not as you are carrying it out here. There is no reason not to use 'mvnrnd' which is expressly designed for this kind of computation.

  Its documentation states "MU is an n-by-d matrix, and mvnrnd generates each row of R using the corresponding row of mu. SIGMA is a d-by-d symmetric positive semi-definite matrix, or a d-by-d-by-n array." For you with d = 4 you will need twelve different 4 x 4 covariance matrices, one for each month, combined in a 4 x 4 x 12 multi-dimensional array, and these need to have a 'repmat' applied to them to achieve the desired 4 x 4 x 12000 size. Your MU needs to be 12000 x 4. The result will be a 12000 x 4 array containing the 12000 sets of the four random variables.
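
The array form described above might look like the following sketch. It is cut down to d = 2 (TMAX and TMIN only) because the thread gives no statistics for "par" or "prec", and it assumes a correlation of 0.95 in every month; with the full data each page would be a measured 4 x 4 covariance matrix. The names S12, MU, SIGMA and nYears are illustrative.

```matlab
% Monthly means and standard deviations from the original post
tmax_mu    = [-7.22 -3.67 2.40 11.25 19.11 23.89 26.73 25.26 19.61 12.78 2.38 -4.82];
tmax_sigma = [3.04 2.82 2.48 2.73 2.26 1.78 1.64 1.77 1.79 2.06 2.46 2.63];
tmin_mu    = [-18.47 -16.03 -9.69 -2.23 3.22 8.47 11.95 11.34 6.91 1.34 -6.19 -14.46];
tmin_sigma = [3.63 3.64 2.95 1.28 1.59 1.24 1.19 1.44 1.27 1.72 2.40 3.33];
rho        = 0.95;             % ASSUMPTION: same correlation in every month
nYears     = 1000;

% One 2-by-2 covariance matrix per month, stacked along the third dimension
S12 = zeros(2, 2, 12);
for m = 1:12
    c12 = rho * tmax_sigma(m) * tmin_sigma(m);
    S12(:, :, m) = [tmax_sigma(m)^2 c12; c12 tmin_sigma(m)^2];
end

% Repeat the 12 months for every year: row k of MU pairs with page k of SIGMA
MU    = repmat([tmax_mu(:) tmin_mu(:)], nYears, 1);   % 12000-by-2
SIGMA = repmat(S12, [1 1 nYears]);                    % 2-by-2-by-12000

R = mvnrnd(MU, SIGMA);         % 12000-by-2, one row per simulated month
clim.tmax = R(:, 1);
clim.tmin = R(:, 2);
```

Because everything happens in a single call, the last three lines could be repeated whenever a fresh 1000 year series is needed.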

  I believe a frequent practice in using 'mvnrnd' is to have a non-varying set of means and covariances which is used to generate a random set of variables having constant statistical properties. This does not require the immense number of repetitions that you have here. That is why I originally recommended the use of 'mvnrnd' with the 'cases' option, performed twelve different times, once for each month, with no repetitions necessary.

  One thing strikes me as strange here. Why have you chosen to make the number of random sets generated (12000) the same as the number of observations? You could just as easily generate 120000 or 1200 random samples from this same set of statistics by altering the number of repetitions in 'repmat'. Once you have computed the means and covariances for each month from the 12 sets of 1000 yearly observations, you are not restricted to just generating 12000 random variables. The number should depend on what your needs are in the use of these random variables.

Roger Stafford

Subject: Generating random numbers for correlated variables

From: Kirk

Date: 29 Jan, 2009 05:17:01

Message: 8 of 8

> Its documentation states "MU is an n-by-d matrix, and mvnrnd generates each row of R using the corresponding row of mu. SIGMA is a d-by-d symmetric positive semi-definite matrix, or a d-by-d-by-n array." For you with d = 4 you will need twelve different 4 x 4 covariance matrices, one for each month, combined in a 4 x 4 x 12 multi-dimensional array, and these need to have a 'repmat' applied to them to achieve the desired 4 x 4 x 12000 size. Your MU needs to be 12000 x 4. The result will be a 12000 x 4 array containing the 12000 sets of the four random variables.
>
> I believe a frequent practice in using 'mvnrnd' is to have a non-varying set of means and covariances which is used to generate a random set of variables having constant statistical properties. This does not require the immense number of repetitions that you have here. That is why I originally recommended the use of 'mvnrnd' with the 'cases' option, performed twelve different times, once for each month, with no repetitions necessary.
>
> One thing strikes me as strange here. Why have you chosen to make the number of random sets generated (12000) the same as the number of observations?

Thanks for the reply Roger.

 I see the source of the confusion. I have 62 years of measured monthly climate data, i.e. observations (tmax, tmin, par, and precip). From those data I need to stochastically generate 1000 years of climate data to drive an ecosystem model. I need the synthetic climate to realistically represent the 62 years of observed data including realistic variability, noise, and covariance. To further complicate things, I need to wrap all this (including the ecosystem model) in a loop in order to do Monte Carlo style iterations. I did consider your "cases" suggestion, but I need to calculate the climate "on the fly", i.e. each pass through the loop a new 1000 year set of climate.

I understand now why 'normrnd' is not fit for generating random numbers for 4 variables that covary. Understand that the attraction to 'normrnd' was that since it took standard deviations as arguments, it was easy to increase or decrease the variance (i.e. the noise in my climate) by simply generating random numbers from 2*sigma or 0.5*sigma. While your suggestion of 'mvnrnd' does deal with covariance of my climate data, it does not allow me to increase or decrease "noise" as easily.
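
For what it is worth, scaling the noise may still be possible with 'mvnrnd': multiplying every standard deviation by a factor k is the same as multiplying the whole covariance matrix by k^2, which leaves the correlations unchanged. A minimal sketch, using the January values quoted earlier in the thread (k, Ck and the sample count are illustrative):

```matlab
mu = [-7.22 -18.47];               % January [TMAX TMIN] means
C  = [9.15 10.40; 10.40 13.17];    % January covariance matrix from the thread

k  = 2;                            % noise factor: 2 doubles every sigma, 0.5 halves it
Ck = k^2 * C;                      % scaled covariance matrix

X = mvnrnd(mu, Ck, 1000);          % draws with k times the original spread

% The correlation implied by C and by Ck is the same
rhoC  = C(1,2)  / sqrt(C(1,1)  * C(2,2));
rhoCk = Ck(1,2) / sqrt(Ck(1,1) * Ck(2,2));
```

If each variable needs its own factor, the same idea works with D = diag([k1 k2]) and the scaled covariance D*C*D.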
