# corrcoef

Correlation coefficients

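## Syntax

`R = corrcoef(A)`

`R = corrcoef(A,B)`

`[R,P] = corrcoef(___)`

`[R,P,RL,RU] = corrcoef(___)`

`___ = corrcoef(___,Name,Value)`
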
## Description

`R = corrcoef(A)` returns the matrix of correlation coefficients for `A`, where the columns of `A` represent random variables and the rows represent observations.

`[R,P] = corrcoef(___)` returns the matrix of correlation coefficients and the matrix of p-values for testing the hypothesis that there is no relationship between the observed phenomena (null hypothesis). Use this syntax with any of the arguments from the previous syntaxes. If an off-diagonal element of `P` is smaller than the significance level (default is `0.05`), then the corresponding correlation in `R` is considered significant. This syntax is invalid if `R` contains complex elements.
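
The significance rule above can be applied programmatically by comparing `P` against the chosen level. The following is a minimal sketch on made-up data (the values of `x`, `y`, and `z` and the variable names are illustrative assumptions, not part of the original page):

```matlab
% Illustrative data: y tracks x closely, z alternates in sign.
x = (1:10)';
y = 2*x + [0.1 -0.2 0.3 -0.1 0.2 -0.3 0.1 -0.1 0.2 -0.2]';
z = repmat([1; -1], 5, 1);
A = [x y z];

[R,P] = corrcoef(A);

alpha = 0.05;                    % default significance level
isSignificant = P < alpha;       % diagonal of P is 1, so it is never flagged

% List each significant variable pair once (upper triangle only).
[row,col] = find(triu(isSignificant,1));
significantPairs = [row col]
```

Only pairs whose p-value falls below `alpha` appear in `significantPairs`; here that is expected to be the strongly related pair of columns 1 and 2.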

`___ = corrcoef(___,Name,Value)` returns any of the output arguments from the previous syntaxes with additional options specified by one or more `Name,Value` pair arguments. For example, `corrcoef(A,'Alpha',0.1)` specifies a 90% confidence interval, and `corrcoef(A,'Rows','complete')` omits all rows of `A` containing one or more `NaN` values.

## Examples

### Random Columns of Matrix

Compute the correlation coefficients for a matrix with two normally distributed, random columns and one column that is defined in terms of another. Since the third column of `A` is a linear function of the second (with positive slope), these two variables are directly correlated, so the correlation coefficient in the `(2,3)` and `(3,2)` entries of `R` is `1`.

x = randn(6,1); y = randn(6,1); A = [x y 2*y+3]; R = corrcoef(A)

`R = `*3×3*
1.0000 -0.6237 -0.6237
-0.6237 1.0000 1.0000
-0.6237 1.0000 1.0000

### Two Random Variables

Compute the correlation coefficient matrix between two normally distributed, random vectors of 10 observations each.

A = randn(10,1); B = randn(10,1); R = corrcoef(A,B)

`R = `*2×2*
1.0000 0.4518
0.4518 1.0000

### P-Values of Matrix

Compute the correlation coefficients and p-values of a normally distributed, random matrix, with an added fourth column equal to the sum of the other three columns. Since the last column of `A` is a linear combination of the others, a correlation is introduced between the fourth variable and each of the other three variables. Therefore, the fourth row and fourth column of `P` contain very small p-values, identifying them as significant correlations.

A = randn(50,3); A(:,4) = sum(A,2); [R,P] = corrcoef(A)

`R = `*4×4*
1.0000 0.1135 0.0879 0.7314
0.1135 1.0000 -0.1451 0.5082
0.0879 -0.1451 1.0000 0.5199
0.7314 0.5082 0.5199 1.0000

`P = `*4×4*
1.0000 0.4325 0.5438 0.0000
0.4325 1.0000 0.3146 0.0002
0.5438 0.3146 1.0000 0.0001
0.0000 0.0002 0.0001 1.0000

### Correlation Bounds

Create a normally distributed, random matrix, with an added fourth column equal to the sum of the other three columns, and compute the correlation coefficients, p-values, and lower and upper bounds on the coefficients.

A = randn(50,3); A(:,4) = sum(A,2); [R,P,RL,RU] = corrcoef(A)

`R = `*4×4*
1.0000 0.1135 0.0879 0.7314
0.1135 1.0000 -0.1451 0.5082
0.0879 -0.1451 1.0000 0.5199
0.7314 0.5082 0.5199 1.0000

`P = `*4×4*
1.0000 0.4325 0.5438 0.0000
0.4325 1.0000 0.3146 0.0002
0.5438 0.3146 1.0000 0.0001
0.0000 0.0002 0.0001 1.0000

`RL = `*4×4*
1.0000 -0.1702 -0.1952 0.5688
-0.1702 1.0000 -0.4070 0.2677
-0.1952 -0.4070 1.0000 0.2825
0.5688 0.2677 0.2825 1.0000

`RU = `*4×4*
1.0000 0.3799 0.3575 0.8389
0.3799 1.0000 0.1388 0.6890
0.3575 0.1388 1.0000 0.6974
0.8389 0.6890 0.6974 1.0000

The matrices `RL` and `RU` give lower and upper bounds, respectively, on each correlation coefficient according to a 95% confidence interval by default. You can change the confidence level by specifying the value of `Alpha`, which defines the percent confidence, `100*(1-Alpha)`%. For example, use an `Alpha` value equal to 0.01 to compute a 99% confidence interval, which is reflected in the bounds `RL` and `RU`. The intervals defined by the coefficient bounds in `RL` and `RU` are wider for 99% confidence than for 95%, since higher confidence requires a more inclusive range of potential correlation values.

`[R,P,RL,RU] = corrcoef(A,'Alpha',0.01)`

`R = `*4×4*
1.0000 0.1135 0.0879 0.7314
0.1135 1.0000 -0.1451 0.5082
0.0879 -0.1451 1.0000 0.5199
0.7314 0.5082 0.5199 1.0000

`P = `*4×4*
1.0000 0.4325 0.5438 0.0000
0.4325 1.0000 0.3146 0.0002
0.5438 0.3146 1.0000 0.0001
0.0000 0.0002 0.0001 1.0000

`RL = `*4×4*
1.0000 -0.2559 -0.2799 0.5049
-0.2559 1.0000 -0.4792 0.1825
-0.2799 -0.4792 1.0000 0.1979
0.5049 0.1825 0.1979 1.0000

`RU = `*4×4*
1.0000 0.4540 0.4332 0.8636
0.4540 1.0000 0.2256 0.7334
0.4332 0.2256 1.0000 0.7407
0.8636 0.7334 0.7407 1.0000

### `NaN` Values

Create a normally distributed matrix involving `NaN` values, and compute the correlation coefficient matrix, excluding any rows that contain `NaN`.

A = randn(5,3); A(1,3) = NaN; A(3,2) = NaN; A

`A = `*5×3*
0.5377 -1.3077 NaN
1.8339 -0.4336 3.0349
-2.2588 NaN 0.7254
0.8622 3.5784 -0.0631
0.3188 2.7694 0.7147

R = corrcoef(A,'Rows','complete')

`R = `*3×3*
1.0000 -0.8506 0.8222
-0.8506 1.0000 -0.9987
0.8222 -0.9987 1.0000

Use `'all'` to include all `NaN` values in the calculation.

R = corrcoef(A,'Rows','all')

`R = `*3×3*
1 NaN NaN
NaN NaN NaN
NaN NaN NaN

Use `'pairwise'` to compute each two-column correlation coefficient on a pairwise basis. If either of the two columns contains a `NaN` in a given row, that row is omitted from the calculation for that pair.

R = corrcoef(A,'Rows','pairwise')

`R = `*3×3*
1.0000 -0.3388 0.4649
-0.3388 1.0000 -0.9987
0.4649 -0.9987 1.0000

## Input Arguments

`A`: Input array

matrix

Input array, specified as a matrix.

- If `A` is a scalar, `corrcoef(A)` returns `NaN`.
- If `A` is a vector, `corrcoef(A)` returns `1`.

**Data Types:** `single` | `double`

**Complex Number Support:** Yes
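
As a quick illustration of the two special cases listed for `A`, the sketch below calls `corrcoef` on a scalar and on a vector. The input values and variable names are arbitrary choices for the example:

```matlab
% A scalar is a single observation of a single variable, so the
% correlation is undefined.
rScalar = corrcoef(5);

% A vector is treated as one variable, so the result is the trivial
% correlation of that variable with itself.
rVector = corrcoef([1 2 3 4]);

disp(rScalar)
disp(rVector)
```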

`B`: Additional input array

vector | matrix | multidimensional array

Additional input array, specified as a vector, matrix, or multidimensional array.

- `A` and `B` must be the same size.
- If `A` and `B` are scalars, then `corrcoef(A,B)` returns `1`. If `A` and `B` are equal, however, `corrcoef(A,B)` returns `NaN`.
- If `A` and `B` are matrices or multidimensional arrays, then `corrcoef(A,B)` converts each input into its vector representation and is equivalent to `corrcoef(A(:),B(:))` or `corrcoef([A(:) B(:)])`.
- If `A` and `B` are 0-by-0 empty arrays, `corrcoef(A,B)` returns a 2-by-2 matrix of `NaN` values.

**Data Types:** `single` | `double`

**Complex Number Support:** Yes
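
The equivalence stated for matrix inputs can be checked directly. This is a small sketch with two arbitrary 3-by-3 arrays (the contents of `B` are a hypothetical choice for illustration):

```matlab
A = magic(3);
B = [2 9 4; 7 5 3; 6 1 8];      % hypothetical second array, same size as A

R1 = corrcoef(A,B);             % two-input form
R2 = corrcoef(A(:),B(:));       % inputs flattened to column vectors
R3 = corrcoef([A(:) B(:)]);     % flattened inputs as a two-column matrix

% All three forms should give the same 2-by-2 result.
maxDiff = max([abs(R1(:)-R2(:)); abs(R1(:)-R3(:))])
```

Because each input is flattened to a single variable, the result is always 2-by-2 regardless of the shape of `A` and `B`.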

### Name-Value Arguments

Specify optional pairs of arguments as `Name1=Value1,...,NameN=ValueN`, where `Name` is the argument name and `Value` is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

*Before R2021a, use commas to separate each name and value, and enclose `Name` in quotes.*

**Example:** `R = corrcoef(A,'Alpha',0.03)`

`Alpha`: Significance level

`0.05` (default) | number between 0 and 1

Significance level, specified as a number between 0 and 1. The value of the `'Alpha'` parameter defines the percent confidence level, `100*(1-Alpha)`%, for the correlation coefficients, which determines the bounds in `RL` and `RU`.

**Data Types:** `single` | `double`

`Rows`: Use of `NaN` option

`'all'` (default) | `'complete'` | `'pairwise'`

Use of `NaN` option, specified as one of these values:

- `'all'`: Include all `NaN` values in the input before computing the correlation coefficients.
- `'complete'`: Omit any rows of the input containing `NaN` values before computing the correlation coefficients. This option always returns a positive semi-definite matrix.
- `'pairwise'`: Omit any rows containing `NaN` only on a pairwise basis for each two-column correlation coefficient calculation. This option can return a matrix that is not positive semi-definite.

**Data Types:** `char`
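
To make the difference between `'complete'` and `'pairwise'` concrete, the sketch below reproduces each option by deleting rows manually before calling `corrcoef`. The matrix `A` and the helper variable names are assumptions made up for this example:

```matlab
A = [1 2 3; 4 NaN 6; 7 8 10; 2 5 NaN; 3 1 4; 6 9 2];

% 'complete': drop every row that has a NaN in any column, then correlate.
keepAll   = ~any(isnan(A),2);
Rcomplete = corrcoef(A,'Rows','complete');
Rmanual   = corrcoef(A(keepAll,:));

% 'pairwise': for columns 1 and 2, drop only rows with a NaN in those two.
keep12    = ~any(isnan(A(:,[1 2])),2);
Rpairwise = corrcoef(A,'Rows','pairwise');
R12       = corrcoef(A(keep12,1),A(keep12,2));

completeDiff = max(abs(Rcomplete(:) - Rmanual(:)))
pairwiseDiff = abs(Rpairwise(1,2) - R12(1,2))
```

With `'pairwise'`, each entry of `R` can be based on a different subset of rows, which is why the resulting matrix is not guaranteed to be positive semi-definite.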

## Output Arguments

`R`: Correlation coefficients

matrix

Correlation coefficients, returned as a matrix.

- For one matrix input, `R` has size `[size(A,2) size(A,2)]` based on the number of random variables (columns) represented by `A`. The diagonal entries are set to one by convention, while the off-diagonal entries are correlation coefficients of variable pairs. The values of the coefficients can range from -1 to 1, with -1 representing a direct, negative correlation, 0 representing no correlation, and 1 representing a direct, positive correlation. `R` is symmetric.
- For two input arguments, `R` is a 2-by-2 matrix with ones along the diagonal and the correlation coefficients along the off-diagonal.
- If any random variable is constant, its correlation with all other variables is undefined, and the respective row and column value is `NaN`.
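
The size, symmetry, and constant-variable behavior described for `R` can be seen in a short sketch. The matrix below is a made-up example whose first column is constant:

```matlab
A = [1 2 7; 1 4 3; 1 6 5; 1 9 8];   % first column never varies

R = corrcoef(A);

sizeR        = size(R)                   % one row and column per variable
undefinedRow = isnan(R(1,2:3))           % constant variable: correlations are NaN
symmetryGap  = abs(R(2,3) - R(3,2))      % remaining entries are symmetric
```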

`P`: P-values

matrix

P-values, returned as a matrix. `P` is symmetric and is the same size as `R`. The diagonal entries are all ones and the off-diagonal entries are the p-values for each variable pair. P-values range from 0 to 1, where values close to 0 correspond to a significant correlation in `R`, that is, a low probability of observing a correlation at least this strong if the null hypothesis were true.
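
One common way to obtain such a p-value is a two-sided t-test on the correlation coefficient with `n-2` degrees of freedom. The sketch below assumes that test (this page does not state which test `corrcoef` uses internally, so treat the comparison as something to verify rather than a guarantee) and uses made-up data:

```matlab
x = [1 2 3 4 5 6 7 8]';
y = [2 1 4 3 7 5 8 6]';

[R,P] = corrcoef(x,y);

n  = numel(x);
r  = R(1,2);
df = n - 2;                          % degrees of freedom of the t statistic
t  = r * sqrt(df / (1 - r^2));       % t statistic for the correlation

% Two-sided tail probability of Student's t distribution, written with the
% regularized incomplete beta function so that no toolbox is required.
pManual = betainc(df / (df + t^2), df/2, 1/2);

pGap = abs(pManual - P(1,2))
```

If the assumption holds, `pGap` is at the level of floating-point round-off.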

`RL`: Lower bound for correlation coefficient

matrix

Lower bound for correlation coefficient, returned as a matrix. `RL` is symmetric and is the same size as `R`. The diagonal entries are all ones and the off-diagonal entries are the lower bounds of the confidence interval for the corresponding coefficients in `R`. The confidence level is 95% by default and is `100*(1-Alpha)`% when `Alpha` is specified. The syntax returning `RL` is invalid if `R` contains complex values.

`RU`: Upper bound for correlation coefficient

matrix

Upper bound for correlation coefficient, returned as a matrix. `RU` is symmetric and is the same size as `R`. The diagonal entries are all ones and the off-diagonal entries are the upper bounds of the confidence interval for the corresponding coefficients in `R`. The confidence level is 95% by default and is `100*(1-Alpha)`% when `Alpha` is specified. The syntax returning `RU` is invalid if `R` contains complex values.
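
Confidence bounds on a correlation coefficient are often built with the Fisher z-transformation, `atanh(r)`, treated as approximately normal with variance `1/(n-3)`. The sketch below assumes that construction (it is not confirmed by this page as the method behind `RL` and `RU`) and compares it with the returned bounds on made-up data:

```matlab
x = [1 2 3 4 5 6 7 8]';
y = [2 1 4 3 7 5 8 6]';
alpha = 0.05;

[R,~,RL,RU] = corrcoef(x,y,'Alpha',alpha);

n = numel(x);
r = R(1,2);
z = atanh(r);                                   % Fisher z-transformation

% Normal critical value for a two-sided interval, via erfinv, scaled by the
% approximate standard error 1/sqrt(n-3).
halfWidth = sqrt(2) * erfinv(1 - alpha) / sqrt(n - 3);

rlManual = tanh(z - halfWidth);                 % back-transform to the r scale
ruManual = tanh(z + halfWidth);

boundGap = max(abs([rlManual - RL(1,2), ruManual - RU(1,2)]))
```

Decreasing `alpha` increases `halfWidth`, which is why a 99% interval is wider than a 95% interval.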

## More About

### Correlation Coefficient

The correlation coefficient of two random variables is a measure of their
linear dependence. If each variable has *N* scalar observations,
then the Pearson correlation coefficient is defined as

$$\rho (A,B)=\frac{1}{N-1}{\displaystyle \sum _{i=1}^{N}\left(\frac{{A}_{i}-{\mu}_{A}}{{\sigma}_{A}}\right)\left(\frac{{B}_{i}-{\mu}_{B}}{{\sigma}_{B}}\right)},$$

where $${\mu}_{A}$$ and $${\sigma}_{A}$$ are the mean and standard deviation of *A*,
respectively, and $${\mu}_{B}$$ and $${\sigma}_{B}$$ are the mean and standard deviation of *B*.
Alternatively, you can define the correlation coefficient in terms of the covariance
of *A* and *B*:

$$\rho (A,B)=\frac{\mathrm{cov}(A,B)}{{\sigma}_{A}{\sigma}_{B}}.$$
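
Both definitions can be evaluated directly and compared with `corrcoef`. This is a minimal sketch on two short, made-up vectors:

```matlab
A = [2 4 6 8 10 13]';
B = [1 3 2 5 4 8]';
N = numel(A);

% Definition as a normalized sum of products of standardized observations.
rhoSum = sum(((A - mean(A)) / std(A)) .* ((B - mean(B)) / std(B))) / (N - 1);

% Definition in terms of the covariance of A and B.
C = cov(A,B);                        % 2-by-2 covariance matrix
rhoCov = C(1,2) / (std(A) * std(B));

R = corrcoef(A,B);
formulaGap = max(abs([rhoSum rhoCov] - R(1,2)))
```

Note that `std` and `cov` both normalize by `N-1` by default, which matches the `1/(N-1)` factor in the first formula.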

The correlation coefficient *matrix* of two random variables is the matrix
of correlation coefficients for each pairwise variable combination,

$$R=\left(\begin{array}{cc}\rho (A,A)& \rho (A,B)\\ \rho (B,A)& \rho (B,B)\end{array}\right).$$

Since *A* and *B* are always
directly correlated to themselves, the diagonal entries are just 1, that is,

$$R=\left(\begin{array}{cc}1& \rho (A,B)\\ \rho (B,A)& 1\end{array}\right).$$
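
The same construction extends to any number of variables: dividing each covariance by the product of the two standard deviations yields the full coefficient matrix. A small sketch, with an arbitrary three-column data matrix `X` as the assumed input:

```matlab
X = [2 1 9; 4 3 7; 6 2 8; 8 5 4; 10 4 5; 13 8 1];

C = cov(X);                  % covariance matrix of the columns
s = sqrt(diag(C));           % standard deviation of each column
Rmanual = C ./ (s * s');     % rho(i,j) = cov(i,j) / (sigma_i * sigma_j)

R = corrcoef(X);
matrixGap = max(abs(Rmanual(:) - R(:)))
```

The diagonal of `Rmanual` is 1 (up to round-off) because each variable's covariance with itself equals its variance.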

## Extended Capabilities

### Tall Arrays

Calculate with arrays that have more rows than fit in memory.

Usage notes and limitations:

- `A` and `B` must be tall arrays of the same size, even if both are vectors.
- Inputs `A` and `B` cannot be scalars for `corrcoef(A,B)`.
- The second input `B` must be 2-D.
- The `'pairwise'` option is not supported.

For more information, see Tall Arrays.

### C/C++ Code Generation

Generate C and C++ code using MATLAB® Coder™.

Usage notes and limitations:

Row-vector input is only supported when the first two inputs are vectors and nonscalar.

### Thread-Based Environment

Run code in the background using MATLAB® `backgroundPool` or accelerate code with Parallel Computing Toolbox™ `ThreadPool`.

This function fully supports thread-based environments. For more information, see Run MATLAB Functions in Thread-Based Environment.

### GPU Arrays

Accelerate code by running on a graphics processing unit (GPU) using Parallel Computing Toolbox™.

This function fully supports GPU arrays. For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).

### Distributed Arrays

Partition large arrays across the combined memory of your cluster using Parallel Computing Toolbox™.

This function fully supports distributed arrays. For more information, see Run MATLAB Functions with Distributed Arrays (Parallel Computing Toolbox).

## Version History

**Introduced before R2006a**

## See Also

`plotmatrix` | `cov` | `mean` | `std`
