Why does the corr function's result depend on the number of columns?

Hello everyone,
I want to calculate the correlation coefficients between several physical parameters for astrophysical purposes (with MATLAB R2019b). Since I have a number of different parameters, instead of calling corr(A,B) for each pair of parameters (A,B), I created a matrix whose columns correspond to all the parameters of interest and calculated corr(X,X) to get the correlation matrix, then looked at the off-diagonal terms to get the pairwise correlation coefficients. However, I am surprised that the correlation coefficient for the same pair of parameters (A,B) varies depending on which other parameters (columns) I include in the matrix. As far as I understand, corr(X,Y) calculates pairwise correlation coefficients, each of which should depend only on the pair of columns considered, right?
Thank you in advance if someone can explain whether I have misunderstood how to use the corr function!
Philippe

Accepted Answer

Adam Danz on 22 Aug 2020
Edited: Adam Danz on 27 Aug 2020
Understanding the output of rho=corr(x)
r = corr(x,x);
is the same as
r = corr(x);
If a single value in column n of x changes, it affects the off-diagonal entries in row n and column n of the correlation matrix and nothing else (the diagonal entry r(n,n) remains 1).
For example,
% x is an n-by-3 matrix
% r = corr(x)
% r shows the correlation between
% the following column pairs:
r =
  1 & 1   1 & 2   1 & 3
  2 & 1   2 & 2   2 & 3
  3 & 1   3 & 2   3 & 3
If a value changes in column 3, you can see above that it would affect all values in column 3 and row 3 of the correlation matrix.
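To make the column-pair structure concrete, here is a minimal sketch in Python (standard library only, not MATLAB) of what corr(x) returns under its default Pearson type, assuming x contains no missing values. The helper names pearson and corr_matrix are hypothetical. Entry [a][b] is built from columns a and b alone, which is why a change in one column can only reach its own row and column of the result. The printed values should agree with the corr(x0) output quoted in the comments of this thread (1.0000, -0.5270, -0.2928, ...).

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (no NaN handling)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    dev_a = [v - mean_a for v in a]
    dev_b = [v - mean_b for v in b]
    num = sum(p * q for p, q in zip(dev_a, dev_b))
    den = math.sqrt(sum(p * p for p in dev_a) * sum(q * q for q in dev_b))
    return num / den


def corr_matrix(x):
    """Correlation matrix of the columns of x (a list of rows), like corr(x)."""
    cols = [list(c) for c in zip(*x)]
    # Each entry only ever sees two columns: cols[a] and cols[b].
    return [[pearson(ca, cb) for cb in cols] for ca in cols]


x0 = [[1, 6, 5], [9, 3, 5], [7, 5, 3], [5, 9, 5]]
r = corr_matrix(x0)
for row in r:
    print([round(v, 4) for v in row])
```

Because the loop never looks at a third column while filling an entry, adding, removing, or editing other columns cannot change that entry.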
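Here's a demo:
x0 = [1 6 5; 9 3 5; 7 5 3; 5 9 5];
x1 = x0;
x1(10) = 9;     % linear index 10 is x1(2,3): changes that value from 5 to 9
r0 = corr(x0)
r1 = corr(x1)   % differs from r0 only in row 3 and column 3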
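The same experiment can be mirrored in Python as a sanity check (a sketch, not MATLAB; pearson and corr_matrix are hypothetical helpers). MATLAB's x1(10) uses a column-major linear index, so in a 4-row matrix it addresses row 2, column 3, which is x1[1][2] with 0-based Python indexing. The script lists which entries of the correlation matrix change.

```python
import copy
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (no NaN handling)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    dev_a = [v - mean_a for v in a]
    dev_b = [v - mean_b for v in b]
    num = sum(p * q for p, q in zip(dev_a, dev_b))
    den = math.sqrt(sum(p * p for p in dev_a) * sum(q * q for q in dev_b))
    return num / den


def corr_matrix(x):
    """Correlation matrix of the columns of x (a list of rows), like corr(x)."""
    cols = [list(c) for c in zip(*x)]
    return [[pearson(ca, cb) for cb in cols] for ca in cols]


x0 = [[1, 6, 5], [9, 3, 5], [7, 5, 3], [5, 9, 5]]
x1 = copy.deepcopy(x0)

# MATLAB x1(10) = 9: column-major linear index 10 in a 4-row matrix
# -> 0-based row (10 - 1) % 4 = 1, column (10 - 1) // 4 = 2.
n_rows = len(x0)
k = 10 - 1
x1[k % n_rows][k // n_rows] = 9

r0 = corr_matrix(x0)
r1 = corr_matrix(x1)
changed = [(a, b) for a in range(3) for b in range(3)
           if not math.isclose(r0[a][b], r1[a][b])]
print(changed)  # -> [(0, 2), (1, 2), (2, 0), (2, 1)]
```

Only pairs involving 0-based column 2 (MATLAB column 3) change; the diagonal stays 1 and the entry for columns 1 and 2 is untouched.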
NaN infestation
As explained in this answer, a single NaN value at x(i,j) in the input of r=corr(x) makes row j and column j of the output matrix NaN. The output is p-by-p, where p is the number of columns of x, so only the column index j of the NaN matters, not its row index i.
For r=corr(x,y), a single NaN at x(i,j) makes row j of the output NaN, and a single NaN at y(i,j) makes column j of the output NaN; all other entries are unaffected.
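The NaN pattern follows directly from the column-pair structure, as this Python sketch shows (hypothetical helpers corr_xy and nan_pattern; the matrix y is made-up data). With plain Pearson arithmetic, a NaN anywhere in a column turns every coefficient that uses that column into NaN: a NaN in column j of x blanks row j of corr(x,y), and a NaN in column j of y blanks column j. In this plain sketch the diagonal entry of corr(x) for the affected column is NaN as well; whether MATLAB's corr reports NaN or 1 on that diagonal is not checked here.

```python
import math

NAN = float("nan")


def pearson(a, b):
    """Plain Pearson correlation; a NaN anywhere in a or b propagates to the result."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    dev_a = [v - mean_a for v in a]
    dev_b = [v - mean_b for v in b]
    num = sum(p * q for p, q in zip(dev_a, dev_b))
    den = math.sqrt(sum(p * p for p in dev_a) * sum(q * q for q in dev_b))
    return num / den


def corr_xy(x, y):
    """Entry [a][b] correlates column a of x with column b of y, like corr(x, y)."""
    xc = [list(c) for c in zip(*x)]
    yc = [list(c) for c in zip(*y)]
    return [[pearson(ca, cb) for cb in yc] for ca in xc]


def nan_pattern(r):
    """Mark NaN entries with 'N' and finite entries with '.'."""
    return ["".join("N" if math.isnan(v) else "." for v in row) for row in r]


x = [[1, 6, 5], [9, 3, 5], [7, 5, 3], [5, 9, 5]]
y = [[2, 1], [3, 4], [4, 2], [1, 3]]   # hypothetical second matrix

x_nan = [row[:] for row in x]
x_nan[1][2] = NAN            # NaN at x(2,3) in MATLAB terms
y_nan = [row[:] for row in y]
y_nan[0][1] = NAN            # NaN at y(1,2) in MATLAB terms

print(nan_pattern(corr_xy(x_nan, x_nan)))  # -> ['..N', '..N', 'NNN']
print(nan_pattern(corr_xy(x_nan, y)))      # -> ['..', '..', 'NN']
print(nan_pattern(corr_xy(x, y_nan)))      # -> ['.N', '.N', '.N']
```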
Ignoring missing values (e.g. NaN).
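As explained in this answer, to compute column-wise correlations while ignoring missing values, set the 'Rows' name-value argument to either 'complete' or 'pairwise', e.g. r = corr(x,'Rows','pairwise'). With 'complete', any row that contains a NaN in any column is removed before all coefficients are computed; with 'pairwise', each coefficient uses only the rows that have no NaN in its own two columns.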
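The difference between the two options matters for the original question. A plausible (unconfirmed, since the data were not shown) explanation for a coefficient that changes when other columns are added is row deletion under 'complete': 'complete' drops every row with a NaN in any column, so adding a column whose NaNs sit in other rows removes those rows from every coefficient, including corr(A,B). The Python sketch below imitates both options on made-up data, assuming only those documented meanings; corr_rows is a hypothetical helper, not MATLAB's implementation.

```python
import math

NAN = float("nan")


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    dev_a = [v - mean_a for v in a]
    dev_b = [v - mean_b for v in b]
    num = sum(p * q for p, q in zip(dev_a, dev_b))
    den = math.sqrt(sum(p * p for p in dev_a) * sum(q * q for q in dev_b))
    return num / den


def corr_rows(x, rows="all"):
    """Correlation matrix of the columns of x, imitating the 'Rows' options.

    'all'      : use every row (NaNs propagate)
    'complete' : drop any row with a NaN in ANY column, then correlate
    'pairwise' : for each pair of columns, drop rows with a NaN in either one
    """
    if rows == "complete":
        x = [row for row in x if not any(math.isnan(v) for v in row)]
    cols = [list(c) for c in zip(*x)]
    result = []
    for ca in cols:
        out_row = []
        for cb in cols:
            if rows == "pairwise":
                kept = [(p, q) for p, q in zip(ca, cb)
                        if not (math.isnan(p) or math.isnan(q))]
                ca_use = [p for p, _ in kept]
                cb_use = [q for _, q in kept]
            else:
                ca_use, cb_use = ca, cb
            out_row.append(pearson(ca_use, cb_use))
        result.append(out_row)
    return result


# Hypothetical data: columns A and B are complete; column C has one NaN.
ab = [[1, 6], [9, 3], [7, 5], [5, 9], [3, 2]]
abc = [[1, 6, NAN], [9, 3, 1], [7, 5, 2], [5, 9, 3], [3, 2, 4]]

r_ab = corr_rows(ab)[0][1]                      # A vs B on their own
r_complete = corr_rows(abc, "complete")[0][1]   # row 1 dropped for every pair
r_pairwise = corr_rows(abc, "pairwise")[0][1]   # A vs B still uses all 5 rows
print(round(r_ab, 4), round(r_complete, 4), round(r_pairwise, 4))
# -> -0.1732 -0.0417 -0.1732
```

With 'pairwise', corr(A,B) is unchanged by the extra column; with 'complete', it shifts because one of its five rows was discarded.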
Ilaria Sani on 21 Apr 2022
Thanks for the above explanations.
I have a follow up question.
What if I add an entire new column?
What I see is that the correlation of untouched columns changes. Is that possible? Is there a correction for the number of comparisons?
Thank you so much for your kind reply.
Ilaria
Adam Danz on 21 Apr 2022
>... the correlation of untouched columns changes...
That shouldn't be the case. Consider the demo below, where x1 is the same as x0 except for the addition of a 4th column. The corr results differ only in row 4 and column 4.
x0 = [1 6 5; 9 3 5; 7 5 3; 5 9 5]
x0 = 4×3
     1     6     5
     9     3     5
     7     5     3
     5     9     5
corr(x0)
ans = 3×3
    1.0000   -0.5270   -0.2928
   -0.5270    1.0000    0.2000
   -0.2928    0.2000    1.0000
x1 = [x0,[2;3;4;1]]
x1 = 4×4
     1     6     5     2
     9     3     5     3
     7     5     3     4
     5     9     5     1
corr(x1)
ans = 4×4
    1.0000   -0.5270   -0.2928    0.5292
   -0.5270    1.0000    0.2000   -0.7746
   -0.2928    0.2000    1.0000   -0.7746
    0.5292   -0.7746   -0.7746    1.0000
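The same check can be reproduced outside MATLAB with a short Python sketch (hypothetical helpers pearson and corr_matrix, same data as the demo). The original 3-by-3 block is recomputed from exactly the same column pairs, so it comes out identical, and only the new 4th row and column are new values. Nothing in the Pearson formula depends on how many other columns are present, so no correction for the number of comparisons is applied to the coefficients themselves.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (no NaN handling)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    dev_a = [v - mean_a for v in a]
    dev_b = [v - mean_b for v in b]
    num = sum(p * q for p, q in zip(dev_a, dev_b))
    den = math.sqrt(sum(p * p for p in dev_a) * sum(q * q for q in dev_b))
    return num / den


def corr_matrix(x):
    """Correlation matrix of the columns of x (a list of rows), like corr(x)."""
    cols = [list(c) for c in zip(*x)]
    return [[pearson(ca, cb) for cb in cols] for ca in cols]


x0 = [[1, 6, 5], [9, 3, 5], [7, 5, 3], [5, 9, 5]]
new_col = [2, 3, 4, 1]
x1 = [row + [v] for row, v in zip(x0, new_col)]   # like [x0,[2;3;4;1]]

r0 = corr_matrix(x0)
r1 = corr_matrix(x1)

# The original 3-by-3 block is computed from exactly the same column pairs.
same_block = all(r1[a][b] == r0[a][b] for a in range(3) for b in range(3))
print(same_block)                     # -> True
print([round(v, 4) for v in r1[3]])   # -> [0.5292, -0.7746, -0.7746, 1.0]
```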

Release

R2019b
