Why does the result of the corr function depend on the number of columns?

Philippe Garnier on 22 Aug 2020
Commented: Adam Danz on 27 Aug 2020
Hello everyone,
I want to calculate the correlation coefficients between several physical parameters for astrophysical purposes (with MATLAB R2019b). Since I have many parameters, instead of calling corr(A,B) for each pair (A,B), I built a matrix X whose columns are the parameters of interest and computed corr(X,X) to get the correlation matrix, then read the pairwise coefficients from the off-diagonal terms. However, I am surprised that the coefficient for the same pair (A,B) varies depending on which other parameters (columns) I include in the matrix. As far as I understand, corr(X,Y) computes pairwise correlation coefficients, which should depend only on the pair of columns considered, right?
Thank you in advance if someone can explain whether I have misunderstood the use of the corr function!

Accepted Answer

Adam Danz on 22 Aug 2020
Edited: Adam Danz on 27 Aug 2020
Understanding the output of rho=corr(x)
r = corr(x,x);
is the same as
r = corr(x);
If a single value changes in column n of x, it can affect every off-diagonal result in row n and column n of the correlation matrix, since those are the entries that involve column n.
For example,
% x is an n-by-3 matrix
% r = corr(x)
% r shows the correlation between
% the following column pairs:
r =
    1 & 1    1 & 2    1 & 3
    2 & 1    2 & 2    2 & 3
    3 & 1    3 & 2    3 & 3
If a value changes in column 3, you can see above that it would affect all values in column 3 and row 3 of the correlation matrix.
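Here's a demo:
x0 = [1 6 5; 9 3 5; 7 5 3; 5 9 5];
r0 = corr(x0);   % correlation matrix of the original data
x1 = x0;
x1(10) = 9;      % linear index 10 is row 2, column 3 of the 4-by-3 matrix
r1 = corr(x1);   % correlation matrix after the change
changed = abs(r1 - r0) > 1e-12   % true only for off-diagonal entries in row 3 and column 3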
NaN infestation
As explained in this answer, a single NaN value in the input matrix of r=corr(x) at x(i,j) will result in all NaN values in row j and column j of the output matrix, because every correlation involving column j of x is affected.
A single NaN value in one of the two matrices of r=corr(x,y) at coordinate (i,j) affects only the correlations involving column j of that matrix: if the NaN is in x, row j of the output matrix is NaN; if it is in y, column j of the output matrix is NaN. The rest of the output is otherwise OK.
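A minimal sketch of how a single NaN spreads through the output, assuming the default 'Rows','all' behavior. The data values and the variable names xn, y, rx and ry are made up for illustration and are not from the original post.

```matlab
% Illustrative data (hypothetical values)
x  = [1 6 5; 9 3 5; 7 5 3; 5 9 5];
xn = x;
xn(2,3) = NaN;        % one missing value in column 3

r = corr(xn);         % single-input form
nanMask = isnan(r)    % NaN appears in row 3 and column 3 only

y  = x(:,1:2);        % a second input with 2 complete columns
rx = corr(xn, y);     % NaN is in the first input:  row 3 of the 3-by-2 output is NaN
ry = corr(y, xn);     % NaN is in the second input: column 3 of the 2-by-3 output is NaN
```

The row index of the NaN (2 here) does not matter for where the NaN values land; only the column it sits in does.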
Ignoring missing values (e.g. NaN).
As explained in this answer, to compute column-wise correlation while ignoring missing values, set the 'Rows' property to either 'complete' or 'pairwise'.
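The two options differ in a way that may bear on the original question. As a rough sketch with made-up data (A, B and C are hypothetical columns, not the poster's parameters): 'complete' removes any row containing a NaN from all columns, so the coefficient for a given pair can change when another column with missing values is added, whereas 'pairwise' removes rows only for the pairs that involve the missing value.

```matlab
% Illustrative data (hypothetical values): A and B are complete,
% C has one missing value in row 2.
A = [1 9 7 5 3]';
B = [6 3 5 9 2]';
C = [5 NaN 3 5 4]';

rAB       = corr([A B]);                          % A and B alone, all 5 rows
rComplete = corr([A B C], 'Rows', 'complete');    % row 2 dropped for every pair
rPairwise = corr([A B C], 'Rows', 'pairwise');    % row 2 dropped only for pairs with C

% A-B coefficient under each setting:
% 'complete' differs from the two-column result, 'pairwise' matches it.
disp([rAB(1,2), rComplete(1,2), rPairwise(1,2)])
```

If the pairwise coefficients should depend only on the two columns involved, 'pairwise' is the setting that behaves that way; the trade-off is that different entries of the matrix may then be based on different numbers of rows.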
