Linear Correlation

Introduction

Before you fit a function to model the relationship between two measured quantities, it is a good idea to determine whether a relationship exists between them.

Correlation quantifies the strength of a linear relationship between two variables. When two quantities are uncorrelated, the values of one quantity show no linear tendency to increase or decrease with the values of the other.
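The word "linear" matters. The following minimal sketch uses made-up data (not from the sample files discussed here) in which y is completely determined by x, yet the correlation coefficient is zero because the relationship is not linear.

```matlab
% Hypothetical data for illustration only: y depends exactly on x, but not linearly.
x = (-3:3)';            % seven values, symmetric about zero
y = x.^2;               % deterministic, nonlinear relationship

R = corrcoef([x y]);    % 2-by-2 correlation coefficient matrix
r = R(1,2);             % correlation coefficient between x and y

fprintf('r = %.4f\n', r);   % essentially zero despite the exact dependence
```

Because the deviations of x are symmetric about zero, the positive and negative products cancel, so correlation reports no linear relationship even though one exists of another kind.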

The following three MATLAB® functions compute correlation coefficients, covariance, and cross-correlation. In typical data analysis applications, where you are mostly interested in the degree of relationship between variables, you might calculate only correlation coefficients. However, correlation coefficients are derived from covariances.

Function                                        Description
corrcoef                                        Correlation coefficient matrix
cov                                             Covariance matrix
xcorr (a Signal Processing Toolbox™ function)   Cross-correlation sequence of a random process (includes autocorrelation)

Covariance

Use the MATLAB cov function to explicitly calculate the covariance matrix for a data matrix (where each column represents a separate quantity).

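The covariance matrix has the following properties:

cov(X) is symmetric.

diag(cov(X)) is a vector of variances for each data column. The variances measure the spread, or dispersion, of the data in the corresponding column.

sqrt(diag(cov(X))) is a vector of standard deviations.

The off-diagonal elements of the covariance matrix are the covariances between the individual data columns.

Here, X can be a vector or a matrix. For an m-by-n matrix, the covariance matrix is n-by-n.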
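A minimal sketch that checks some basic properties of cov(X) on a small hypothetical 5-by-3 data matrix (not count.dat): the result is n-by-n and symmetric, its diagonal holds the column variances, and the square roots of the diagonal are the column standard deviations. The data values are invented for illustration.

```matlab
% Hypothetical 5-by-3 data matrix: rows are observations, columns are quantities.
X = [ 2   4  1
      3   7  2
      5   9  2
      6  12  4
      9  18  6 ];

C = cov(X);                           % 3-by-3 covariance matrix

symErr    = max(max(abs(C - C')));    % cov(X) is symmetric, so this is ~0
variances = diag(C);                  % diagonal: variance of each column
stdDevs   = sqrt(diag(C));            % standard deviation of each column

% var and std also normalize by m-1 by default, so these differences are ~0.
varErr = max(abs(variances - var(X)'));
stdErr = max(abs(stdDevs   - std(X)'));

disp(size(C))                         % n-by-n, where n = number of columns of X
```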

For an example of calculating the covariance, load the sample data in count.dat that contains a 24-by-3 matrix:

load count.dat

Calculate the covariance matrix for this data:

cov(count)

MATLAB responds with the following result:

ans =
    1.0e+003 *
       0.6437  0.9802  1.6567
       0.9802  1.7144  2.6908
       1.6567  2.6908  4.6278

The covariance matrix for this data has the following form:

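$$
\begin{bmatrix}
\sigma^2_{11} & \sigma^2_{12} & \sigma^2_{13} \\
\sigma^2_{21} & \sigma^2_{22} & \sigma^2_{23} \\
\sigma^2_{31} & \sigma^2_{32} & \sigma^2_{33}
\end{bmatrix},
\qquad \sigma^2_{ij} = \sigma^2_{ji}
$$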

Here, $\sigma^2_{ij}$ is the covariance between column i and column j of the data. Because the count matrix contains three columns, the covariance matrix is 3-by-3.
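To make each entry concrete, the sketch below computes every covariance entry directly: subtract each column's mean, multiply the centered columns i and j element by element, sum, and divide by m - 1 (the default normalization of cov). It uses a small hypothetical 5-by-3 matrix rather than count.dat, and compares the loop, the equivalent matrix product, and the built-in cov.

```matlab
% Hypothetical data: m observations (rows) of n quantities (columns).
X = [ 2   4  1
      3   7  2
      5   9  2
      6  12  4
      9  18  6 ];
[m, n] = size(X);

Xc = X - repmat(mean(X), m, 1);   % subtract each column's mean

% Entry by entry: sigma^2_ij = sum_k (x_ki - mean_i)(x_kj - mean_j) / (m - 1)
S = zeros(n, n);
for i = 1:n
    for j = 1:n
        S(i,j) = sum(Xc(:,i) .* Xc(:,j)) / (m - 1);
    end
end

% Equivalent matrix form, compared with the built-in function.
S2 = (Xc' * Xc) / (m - 1);
maxErrLoop   = max(max(abs(S  - cov(X))));
maxErrMatrix = max(max(abs(S2 - cov(X))));
```

The loop mirrors the definition of $\sigma^2_{ij}$; the matrix form computes all entries at once and makes the symmetry $\sigma^2_{ij} = \sigma^2_{ji}$ evident.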

Correlation Coefficients

The correlation coefficient matrix represents the normalized measure of the strength of the linear relationship between variables.

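The correlation coefficient $r_{X,Y}$ between two random variables X and Y with expected values $\mu_X$ and $\mu_Y$ and standard deviations $\sigma_X$ and $\sigma_Y$ is their covariance normalized by their standard deviations:

$$
r_{X,Y} = \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y} = \frac{\mathrm{E}\big[(X-\mu_X)(Y-\mu_Y)\big]}{\sigma_X \sigma_Y}
$$

where E is the expected value operator and cov means covariance. Since $\mu_X = \mathrm{E}(X)$ and $\sigma_X^2 = \mathrm{E}(X^2) - \mathrm{E}^2(X)$, and likewise for Y, $r_{X,Y}$ can also be written as

$$
r_{X,Y} = \frac{\mathrm{E}(XY) - \mathrm{E}(X)\,\mathrm{E}(Y)}{\sqrt{\mathrm{E}(X^2) - \mathrm{E}^2(X)}\;\sqrt{\mathrm{E}(Y^2) - \mathrm{E}^2(Y)}}
$$

The correlation is defined only if both standard deviations are finite and nonzero.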
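A minimal sketch on hypothetical paired data that computes $r_{X,Y}$ in two ways: as cov(X,Y) divided by $\sigma_X \sigma_Y$, and from the moments E(XY), E(X), E(Y), E(X²), E(Y²), with each expectation estimated by a sample mean. The result is compared with corrcoef.

```matlab
% Hypothetical paired observations of X and Y (column vectors).
x = [1; 2; 3; 4; 5; 6];
y = [2; 1; 4; 3; 6; 5];

% Form 1: covariance normalized by the standard deviations.
C  = cov([x y]);                       % 2-by-2; C(1,2) is cov(X,Y)
r1 = C(1,2) / (std(x) * std(y));       % cov and std both normalize by m-1

% Form 2: moments, with E(.) estimated by the sample mean (1/m).
Ex  = mean(x);      Ey  = mean(y);
Exy = mean(x .* y);
Ex2 = mean(x.^2);   Ey2 = mean(y.^2);
r2  = (Exy - Ex*Ey) / (sqrt(Ex2 - Ex^2) * sqrt(Ey2 - Ey^2));

% Built-in result for comparison.
R  = corrcoef([x y]);
r3 = R(1,2);

fprintf('r1 = %.4f, r2 = %.4f, r3 = %.4f\n', r1, r2, r3);   % all 0.8286 (= 29/35)
```

The normalization factors cancel in each form, so using 1/m in form 2 and 1/(m-1) in form 1 gives the same coefficient. Form 2 subtracts nearly equal quantities when the means are large relative to the spread, so the centered form 1 is usually the safer choice numerically.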

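For time series, the correlation coefficients $r_k$ are given by

$$
r_k = \frac{\sum_{t=1}^{N-k} (x_t - \bar{x})(x_{t+k} - \bar{x})}{\sum_{t=1}^{N} (x_t - \bar{x})^2}
$$

where $x_t$ is the data value at time step t, k is the lag, N is the number of observations, and the overall mean is given by

$$
\bar{x} = \frac{1}{N} \sum_{t=1}^{N} x_t
$$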
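A minimal sketch of the time-series coefficients: for each lag k, sum the products of deviations from the overall mean that are k steps apart, then divide by the sum of squared deviations. The series and the choice of maximum lag K are hypothetical.

```matlab
% Hypothetical time series x_1, ..., x_N (column vector).
x = [1; 2; 3; 4; 5];
N = numel(x);
K = N - 1;                  % largest lag to evaluate (illustrative choice)

xbar  = sum(x) / N;         % overall mean
d     = x - xbar;           % deviations from the overall mean
denom = sum(d.^2);          % sum over t = 1..N of (x_t - xbar)^2

r = zeros(K + 1, 1);        % r(k+1) holds r_k for lags k = 0..K
for k = 0:K
    r(k + 1) = sum(d(1:N-k) .* d(1+k:N)) / denom;
end

disp([(0:K)' r])            % lag k in column 1, r_k in column 2: 1, 0.4, -0.1, -0.4, -0.4
```

By construction $r_0 = 1$. If you have the Signal Processing Toolbox, xcorr(x - mean(x), 'coeff') should return the same values at nonnegative lags (it also returns the negative lags); treat that as a cross-check rather than a guarantee.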

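The MATLAB function corrcoef produces a matrix of correlation coefficients for a data matrix (where each column represents a separate quantity). The correlation coefficients range from -1 to 1, where:

Values close to 1 indicate a positive linear relationship between the data columns.

Values close to -1 indicate that one column of data has a negative linear relationship to another column of data (anticorrelation).

Values close to or equal to 0 suggest that there is no linear relationship between the data columns.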

For an m-by-n matrix, the correlation coefficient matrix is n-by-n. The arrangement of the elements in the correlation coefficient matrix corresponds to the location of the elements in the covariance matrix, as described in the Covariance section.
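Because correlation coefficients are derived from covariances, you can rebuild the corrcoef result from cov by dividing each entry C(i,j) by the standard deviations of columns i and j. The sketch below does this on hypothetical data built so that column 2 rises linearly with column 1 (r = 1), column 3 falls linearly with it (r = -1), and column 4 follows a symmetric pattern with zero linear correlation to column 1.

```matlab
% Hypothetical data with known relationships to column 1.
t = (1:6)';
X = [t, 2*t + 1, -3*t + 10, [1; -1; -1; -1; -1; 1]];

C = cov(X);                 % 4-by-4 covariance matrix
s = sqrt(diag(C));          % standard deviation of each column
R = C ./ (s * s');          % R(i,j) = C(i,j) / (s(i) * s(j)), same layout as C

maxErr = max(max(abs(R - corrcoef(X))));   % compare with the built-in function
disp(R)
```

Entry R(i,j) sits in the same position as C(i,j), and the diagonal is 1 because each column is perfectly correlated with itself.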

For an example of calculating correlation coefficients, load the sample data in count.dat that contains a 24-by-3 matrix:

load count.dat

Enter the following command to calculate the correlation coefficients:

corrcoef(count)

This results in the following 3-by-3 matrix of correlation coefficients:

ans = 
    1.0000    0.9331    0.9599
    0.9331    1.0000    0.9553
    0.9599    0.9553    1.0000

Because all the off-diagonal correlation coefficients are close to 1, there is a strong positive linear correlation between each pair of data columns in the count matrix.

© 1984-2008 The MathWorks, Inc.