Skip to Main Content Skip to Search
Home |   Select Country  Choose Country  |  Contact Us  |  Cart Store 
Create Account | Log In
Products & Services Industries Academia Support User Community Company

 

MATLAB News & Notes - November 2003

Monte-Carlo Simulation in MATLAB Using Copulas

by Peter Perkins and Tom Lane

Copulas are functions that describe dependencies among random variables and provide a way to create probability distributions to model dependent multivariate data.

MATLAB is an ideal tool for running simulations with random inputs or noise. If it’s reasonable for the inputs to be independent, functions in the Statistics Toolbox can generate them according to a variety of common univariate distributions. But what if they need to be correlated? Unless the inputs have a standard multivariate distribution, this can be difficult. Copulas may be the answer. In this article, we discuss how to use copulas to generate dependent random data using the Statistics Toolbox. Using a copula, a data analyst can construct a multivariate distribution by specifying marginal univariate distributions and then choosing a particular copula to provide a correlation structure between variables.

Dependence Among Simulation Inputs

To perform a Monte-Carlo simulation, you must choose the probability distribution of each random input and the dependence between the inputs. Neglecting dependence might lead to the wrong conclusions.

For example, consider simulating a population of animals that exists as three subpopulations at separate locations. At any given time, there is a probability pd. that a subpopulation at one location will die off because of random environmental variability. There is also a probability pr that the location will subsequently be recolonized by another subpopulation. pr depends on the number of active subpopulations. If all three subpopulations die off, then the population becomes extinct.

We want to estimate the probability that the population will survive for one century. In this simple example, each subpopulation is either present or absent, and there is no model for the size of each subpopulation.

To simulate this process, we pick a uniform random number Ui at each time step for each location. The animals at the i th location die off if Ui<pd. If the environmental conditions at each location were unrelated, we could choose the Ui’s independently. But if conditions at the three sites were similar, it would be more sensible for the Ui’s to have some positive dependence.

Generating random values from independent uniform distributions is trivial. To generate values with dependence, we start from the multivariate normal distribution with a correlation parameter . Applying the standard normal cumulative distribution function (CDF) to normal random variables transforms them to a uniform distribution.

Figure 1: Estimated probability of extinction within a century as a function of the degree of dependence of the environmental variability between locations, using pd = 0.03 and pr = 0.25. The horizontal axis does not extend to –1, because a correlation < -0.33 is not physically possible among three locations.
Click on image to see enlarged view.

 

The following code generates 1,000 random triples from a trivariate uniform distribution with positive dependence:

n = 1000; rho = .6;z = mvnrnd([0 0 0],[1 rho rho; rho 1 rho; rho rho 1], n);u = normcdf(z);

Using this method, we can vary rho to see what effect dependence in environmental variability among locations has on the probability of extinction.We find that a positive dependence increases the probability that two subpopulations become extinct at the same time. Because they act as backups for each other, it also decreases the probability that the whole population will survive. A negative dependence has the opposite effect. Figure 1 shows estimates of the probability of extinction over one simulated century for a range of dependence.

Dependent Nonuniform Simulation Inputs

Suppose the uniform distribution isn’t what we need. For example, we might want to use a beta distribution to model a subpopulation’s size relative to the carrying capacity (the maximum that can be supported by the habitat). Just as we transformed from normal to uniform using a normal CDF, we can use an inverse CDF to transform from uniform to other distributions. Given u from above, the following line produces random values with beta marginal distributions:

x = betainv(u,3,1.5);

We can even get different marginal distributions for each column. The
following line generates dependent gamma, t, and beta random values:

x = [gaminv(u(:,1),2,1), tinv(u(:,2),5),betainv(u(:,3),3,1.5)];

Figure 2 shows a scatter plot of the first two columns of x, with histograms of each column. (The third column is left out of the plot for clarity.)
Click on image to see enlarged view.

The idea is to use mvnrnd to generate dependent variables, then normcdf to create uniform marginal distributions, then one or more inverse CDF functions to produce the desired marginal distributions.

Measuring Dependence with Rank Correlation

Dependence among columns of x in this construction is determined by the correlation parameter rho of the underlying multivariate normal used to generate z. However, because of the nonlinear transformations imposed by the normal CDF and the beta, t, or gamma inverse CDFs, the linear correlations between columns of x are not the same as those between columns of z (the latter are all approximately equal to rho, except for random variability). In fact, the linear correlations among columns of x depend on the marginal distributions. A more useful way to quantify dependence in this context is with a rank correlation, such as Kendall’s .

Like a linear correlation coefficient, Kendall’s measures the degree to which large or small values of one random variable associate with large or small values of another. However, unlike a linear correlation, the rank correlation is preserved under the CDF and inverse CDF transformations. Therefore, the rank correlations between the final transformed variables in x are exactly equal to those between columns of z, regardless of what marginal distributions are chosen for x.

Kendall’s between two normal random variables is simply , where is their linear correlation. Thus, it’s easy to choose the desired rank correlations among columns of x by specifying the correct linear correlations among columns of z.

Figure 3: Random pairs generated using five different copulas. In all cases, the marginal distributions of both x(:,1) and x(:,2) are Student’s t with 5 degrees of freedom, with Kendall’s rank correlation of 0.6 between them.
Click on image to see enlarged view.

Families of Copulas

The method we used to generate u is an example of a Gaussian copula. A copula is simply a multivariate probability distribution whose marginal distributions are uniform. Copulas let you specify dependence between uniform random variables and then transform to any desired marginal distribution. Specification of the dependence is completely separate from specification of the ultimate marginal distributions.

There are many other families of copulas. A Student’s t copula starts from a multivariate t distribution and uses the t CDF to transform to uniform marginals. Still other families of copulas, known as Archimedian copulas, are defined in terms of their (multivariate) CDFs, rather than with the “constructive” definition of the Gaussian and t copulas. The Gumbel, Clayton, and Frank families are all examples of Archimedian copulas.

Even with the same rank correlations, different copula families impose different dependence structures. Your choice of family may depend on theoretical concerns, on your need to mimic historical data, or on your desire to experiment with different possibilities. Figure 3 compares the five copulas already mentioned. Each graph has 1,000 simulated pairs of random values with marginal t distributions. Each pair was created with a different type of copula but has the same rank correlation. Notice how the five copulas create distributions that are very different with respect to the association of the extremes of either variable.

You may want to try using copulas in your own simulations to see how different families, marginal distributions, and correlations affect the results.

1. Caughley, G. and A. Gunn (1996) Conservation Biology in Theory and Practice, Blackwell Science, 459pp.
2. Frees, E. and E. Valdez (1998) “Understanding Relationships Using Copulas,” North American Actuarial Journal, 2(1):1-25.
3. Genest, C. and J. MacKay (1986) “The joy of copulas: bivariate distributions with uniform marginals,” The American Statistician, 40:280-283.
4. Nelson, R.B. (1999) An Introduction to Copulas, Lecture Notes in Statistics, Vol. 139, Springer-Verlag, 216pp.

For more information
Contact sales
E-mail this page
Print this page
Subscribe to newsletters