## Linear Correlation

### Introduction

*Correlation* quantifies the strength of a linear relationship
between two variables. When there is no correlation between two variables, then
there is no tendency for the values of the variables to increase or decrease in
tandem. Two variables that are uncorrelated are not necessarily independent,
however, because they might have a nonlinear relationship.

You can use linear correlation to investigate whether a linear relationship exists between variables without having to assume or fit a specific model to your data. Two variables that have a small or no linear correlation might have a strong nonlinear relationship. However, calculating linear correlation before fitting a model is a useful way to identify variables that have a simple relationship. Another way to explore how variables are related is to make scatter plots of your data.
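As a minimal sketch of the point that uncorrelated variables can still be dependent, the following example uses hypothetical data (the vectors `x` and `y` are illustrative and not part of any shipped data set). Here `y` is completely determined by `x`, yet the sample correlation coefficient is essentially zero because the relationship is quadratic rather than linear.

```matlab
% Hypothetical data: y depends on x exactly, but not linearly.
x = (-3:3)';            % values symmetric about zero
y = x.^2;               % deterministic quadratic relationship

R = corrcoef([x y]);    % 2-by-2 matrix of sample correlation coefficients
r = R(1,2)              % correlation between x and y (essentially zero)

% A scatter plot makes the nonlinear relationship visible:
% scatter(x, y)
```

A near-zero value of `r` here rules out only a linear trend, which is why a scatter plot is a useful companion to the coefficient.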

*Covariance* quantifies the strength of a linear relationship
between two variables in units relative to their variances. Correlations are
standardized covariances, giving a dimensionless quantity that measures the degree
of a linear relationship, separate from the scale of either variable.
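For reference, the standard textbook definitions can be written as follows. The symbols $x_{ki}$ (the *k*-th observation of variable *i*), $\bar{x}_i$ (the sample mean of variable *i*), *m* (the number of observations), and $r_{ij}$ (the sample correlation coefficient) are notation introduced here for illustration, and the formula assumes the usual normalization by *m* − 1:

$$\begin{array}{c}{s}^{2}{}_{ij}=\dfrac{1}{m-1}\sum _{k=1}^{m}\left({x}_{ki}-{\bar{x}}_{i}\right)\left({x}_{kj}-{\bar{x}}_{j}\right)\\ {r}_{ij}=\dfrac{{s}^{2}{}_{ij}}{\sqrt{{s}^{2}{}_{ii}\,{s}^{2}{}_{jj}}}\end{array}$$

Dividing by the two standard deviations cancels the units of both variables, which is why the correlation coefficient is dimensionless.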

The following MATLAB® functions compute sample correlation coefficients and covariance.
These sample coefficients are estimates of the true covariance and correlation
coefficients of the population from which the data sample is drawn.

### Covariance

Use the MATLAB `cov` function to calculate the sample covariance matrix for a data matrix (where each column represents a separate quantity).

The sample covariance matrix has the following properties:

- `cov(X)` is symmetric.
- `diag(cov(X))` is a vector of variances for each data column. The variances represent a measure of the spread or dispersion of data in the corresponding column. (The `var` function calculates variance.)
- `sqrt(diag(cov(X)))` is a vector of standard deviations. (The `std` function calculates standard deviation.)
- The off-diagonal elements of the covariance matrix represent the covariances between the individual data columns.

Here, `X` can be a vector or a matrix. For an *m*-by-*n* matrix, the covariance matrix is *n*-by-*n*.
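The properties listed above can be checked numerically. The following is a small sketch that uses a hypothetical 5-by-3 matrix `X` (made-up values, chosen only for illustration) and compares results within a floating-point tolerance.

```matlab
% Hypothetical 5-by-3 data matrix: each column is a separate quantity.
X = [ 2  8  1
      4  6  3
      6  5  2
      8  3  5
     10  1  4];

C = cov(X);        % 3-by-3 sample covariance matrix
tol = 1e-12;       % tolerance for floating-point comparisons

isSym   = max(max(abs(C - C'))) < tol              % C equals its transpose
sameVar = max(abs(diag(C)' - var(X))) < tol        % diagonal holds the column variances
sameStd = max(abs(sqrt(diag(C))' - std(X))) < tol  % square roots give the standard deviations
```

Each of the three logical values should be true for any real data matrix with at least two rows.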

For an example of calculating the covariance, load the sample data in `count.dat` that contains a 24-by-3 matrix:

    load count.dat

Calculate the covariance matrix for this data:

    cov(count)

MATLAB responds with the following result:

    ans =

      1.0e+003 *

        0.6437    0.9802    1.6567
        0.9802    1.7144    2.6908
        1.6567    2.6908    4.6278

The covariance matrix for this data has the following form:

$$\begin{array}{c}\left[\begin{array}{ccc}{s}^{2}{}_{11}& {s}^{2}{}_{12}& {s}^{2}{}_{13}\\ {s}^{2}{}_{21}& {s}^{2}{}_{22}& {s}^{2}{}_{23}\\ {s}^{2}{}_{31}& {s}^{2}{}_{32}& {s}^{2}{}_{33}\end{array}\right]\\ {s}^{2}{}_{ij}={s}^{2}{}_{ji}\end{array}$$

Here, $s^{2}_{ij}$ is the sample covariance between column *i* and column *j* of the data. Because the `count` matrix contains three columns, the covariance matrix is 3-by-3.

**Note**

In the special case when a vector is the argument of `cov`, the function returns the variance.
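To illustrate the vector case, the sketch below uses a hypothetical data vector `v` and compares `cov(v)` with `var(v)` and with a direct calculation that assumes the usual normalization by one less than the number of observations.

```matlab
v = [3; 7; 8; 12; 15];      % hypothetical data vector (mean is 9)

c = cov(v)                  % scalar result for a vector input
m = numel(v);               % number of observations
manual = sum((v - mean(v)).^2) / (m - 1)   % sum of squared deviations over m - 1

agrees = abs(c - var(v)) < 1e-12 && abs(c - manual) < 1e-12
```

For this vector the squared deviations sum to 86, so all three calculations give 21.5.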

### Correlation Coefficients

The function `corrcoef` produces a matrix of sample correlation coefficients for a data matrix (where each column represents a separate quantity). The correlation coefficients range from -1 to 1, where:

- Values close to 1 indicate that there is a positive linear relationship between the data columns.
- Values close to -1 indicate that one column of data has a negative linear relationship to another column of data (*anticorrelation*).
- Values close to or equal to 0 suggest there is no linear relationship between the data columns.
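The three cases above can be illustrated with hypothetical columns constructed for the purpose: `b` rises exactly linearly with `a`, `c` falls exactly linearly with `a`, and `d` is an alternating pattern chosen to have no linear trend against `a`. Exactly linear columns give the extreme values 1 and -1; real data typically fall somewhere in between.

```matlab
a = (1:8)';                        % reference column
b = 2*a + 1;                       % increases linearly with a
c = -3*a + 20;                     % decreases linearly with a
d = [1 -1 -1 1 1 -1 -1 1]';        % pattern with no linear trend against a

R = corrcoef([a b c d]);           % 4-by-4 matrix of correlation coefficients

r_pos  = R(1,2)    % close to  1: positive linear relationship
r_neg  = R(1,3)    % close to -1: anticorrelation
r_none = R(1,4)    % close to  0: no linear relationship
```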

For an *m*-by-*n* matrix, the
correlation-coefficient matrix is *n*-by-*n*.
The arrangement of the elements in the correlation coefficient matrix corresponds to
the location of the elements in the covariance matrix, as described in Covariance.
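Because correlations are standardized covariances, the correlation matrix can be recovered from the covariance matrix by dividing each element by the product of the two corresponding standard deviations. The sketch below checks this on a hypothetical data matrix `X`; the names `s`, `R_manual`, and `maxDiff` are illustrative.

```matlab
% Hypothetical 5-by-3 data matrix: each column is a separate quantity.
X = [ 2  8  1
      4  6  3
      6  5  2
      8  3  5
     10  1  4];

C = cov(X);                  % sample covariance matrix
s = sqrt(diag(C));           % column vector of standard deviations
R_manual = C ./ (s * s');    % element (i,j) is C(i,j) / (s(i)*s(j))

R = corrcoef(X);             % built-in correlation coefficients
maxDiff = max(max(abs(R - R_manual)))   % should be at round-off level
```

The element at row *i*, column *j* of `R_manual` sits in the same position as the covariance it was computed from, which is the correspondence described above.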

For an example of calculating correlation coefficients, load the sample data in `count.dat` that contains a 24-by-3 matrix:

    load count.dat

Type the following syntax to calculate the correlation coefficients:

    corrcoef(count)

This results in the following 3-by-3 matrix of correlation coefficients:

    ans =

        1.0000    0.9331    0.9599
        0.9331    1.0000    0.9553
        0.9599    0.9553    1.0000

Because all correlation coefficients are close to 1, there is a strong positive correlation between each pair of data columns in the `count` matrix.