Why does the result of the corr function depend on the number of columns?
Hello everyone,
I want to calculate the correlation coefficients between several physical parameters for an astrophysics project (with MATLAB R2019b). Since I have a number of different parameters, instead of calling corr(A,B) for each pair of parameters (A,B), I created a matrix X whose columns correspond to all the parameters of interest and calculated corr(X,X) to get the correlation matrix, then read the pairwise correlation coefficients from the off-diagonal terms. However, I am surprised that the correlation coefficient for the same pair of parameters (A,B) varies depending on which other parameters (columns) I include in the matrix. As far as I understand, corr(X,Y) calculates pairwise correlation coefficients, which should depend only on the pair of columns considered, right?
Thank you in advance to anyone who can explain whether I have misunderstood the use of the corr function!
Philippe
Accepted Answer
Adam Danz
on 22 Aug 2020
Edited: Adam Danz
on 27 Aug 2020
Understanding the output of rho=corr(x)
r = corr(x,x);
is the same as
r = corr(x);
If a single value in column n of x changes, every correlation that involves column n can change, that is, the entries in row n and column n of the correlation matrix. All other entries stay the same.
For example,
% x is an nx3 matrix
% r = corr(x)
% r shows the correlation between
% the following column pairs:
r =
    1 & 1    1 & 2    1 & 3
    2 & 1    2 & 2    2 & 3
    3 & 1    3 & 2    3 & 3
If a value changes in column 3, you can see above that it would affect all values in column 3 and row 3 of the correlation matrix.
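For the original question, a minimal sketch of why an off-diagonal entry should not depend on the other columns: it compares one entry of corr(x) with the coefficient computed from the two columns alone, and also compares corr(x,x) with corr(x). The matrix is the one used in this answer's demo; the tolerance tol is an arbitrary choice for the floating-point comparison, and the variable names are only illustrative.

```matlab
% Sketch: an entry of corr(x) depends only on the two columns involved.
x = [1 6 5; 9 3 5; 7 5 3; 5 9 5];   % 4 observations, 3 variables

r     = corr(x);                    % full 3x3 correlation matrix
r12   = corr(x(:,1), x(:,2));       % columns 1 and 2 on their own
rSame = corr(x, x);                 % two-input form with the same matrix

tol = 1e-12;                        % arbitrary floating-point tolerance
pairMatches = abs(r(1,2) - r12) < tol;
sameMatrix  = all(abs(r(:) - rSame(:)) < tol);
fprintf('r(1,2) matches the two-column value: %d\n', pairMatches);
fprintf('corr(x,x) matches corr(x): %d\n', sameMatrix);
```

If a check like this fails on real data, missing values (NaN) in one of the other columns are a likely cause worth ruling out first.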
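Here's a demo:
x0 = [1 6 5; 9 3 5; 7 5 3; 5 9 5];
x1 = x0;
x1(10) = 9;      % linear index 10 is row 2, column 3 of the 4x3 matrix
r0 = corr(x0)
r1 = corr(x1)    % only entries in row 3 and column 3 can differ from r0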
NaN infestation
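As explained in this answer, a single NaN value in the input matrix of r=corr(x) at x(i,j) will result in all NaN values in row j and column j of the output matrix, because every correlation that involves column j of x is undefined.
For r=corr(x,y), output row k corresponds to column k of x and output column k corresponds to column k of y. A single NaN at coordinate (i,j) of y will therefore result in NaN values throughout column j of the output matrix, while a NaN at (i,j) of x does the same to row j; all other entries are unaffected.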
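The propagation pattern can be checked with a small sketch like the one below. It assumes the default 'Rows','all' behavior of corr and reuses the demo matrix from this answer; the position of the NaN (row 2, column 3) and the variable names are arbitrary choices for illustration.

```matlab
% Sketch: how one NaN propagates through corr with the default 'Rows','all'.
x = [1 6 5; 9 3 5; 7 5 3; 5 9 5];
xNaN = x;
xNaN(2,3) = NaN;               % one missing value in row 2, column 3

rAuto = corr(xNaN);            % one input: row 3 and column 3 become NaN
rX    = corr(xNaN, x);         % NaN in the first input: row 3 becomes NaN
rY    = corr(x, xNaN);         % NaN in the second input: column 3 becomes NaN

% Logical maps of where the NaN values ended up
disp(isnan(rAuto));
disp(isnan(rX));
disp(isnan(rY));
```

In each case the affected index is the column number of the NaN in the input (3 here), not its row number.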
Ignoring missing values (e.g. NaN).
As explained in this answer, to compute column-wise correlation while ignoring missing values, set the 'Rows' property to either 'complete' or 'pairwise'.
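The two settings behave differently, which may be one explanation for coefficients that seem to depend on the other columns. The sketch below is only an illustration on the demo matrix with one artificially inserted NaN: with 'complete', any row containing a NaN in any column is dropped for every pair, so a NaN in column 3 also changes the coefficient for columns 1 and 2; with 'pairwise', a row is dropped only for the pairs that involve the column holding the NaN.

```matlab
% Sketch: 'Rows','complete' vs 'Rows','pairwise' when one column has a NaN.
x = [1 6 5; 9 3 5; 7 5 3; 5 9 5];
x(2,3) = NaN;                               % missing value in column 3 only

rAll      = corr(x);                        % default 'Rows','all': NaN spreads
rComplete = corr(x, 'Rows', 'complete');    % row 2 dropped for every pair
rPairwise = corr(x, 'Rows', 'pairwise');    % row 2 dropped only for pairs with column 3

% Columns 1 and 2 contain no NaN, so their coefficient from all 4 rows is:
r12Full = corr(x(:,1), x(:,2));

fprintf('all rows  r(1,2) = %.4f\n', r12Full);
fprintf('pairwise  r(1,2) = %.4f\n', rPairwise(1,2));
fprintf('complete  r(1,2) = %.4f\n', rComplete(1,2));
```

Here the 'pairwise' value for columns 1 and 2 agrees with the two-column result, while the 'complete' value is computed from the three NaN-free rows only and therefore differs.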
8 Comments
Ilaria Sani
on 21 Apr 2022
Thanks for the above explanations.
I have a follow up question.
What if I add an entire new column?
What I see is that the correlation of untouched columns changes. Is that possible? Is there a correction for the number of comparisons?
Thank you so much for your kind reply.
Ilaria
Adam Danz
on 21 Apr 2022
>... the correlation of untouched columns changes...
That shouldn't be the case. Consider the demo below, where x1 is the same as x0 except for the addition of a 4th column. The upper-left 3x3 block of corr(x1) matches corr(x0); the only new values are in row 4 and column 4.
x0 = [1 6 5; 9 3 5; 7 5 3; 5 9 5]
corr(x0)
x1 = [x0,[2;3;4;1]]
corr(x1)
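Rather than comparing the two printed matrices by eye, the same check can be done numerically. This is a minimal sketch on the demo data above; maxDiff is just an illustrative variable name, and the result is expected to be zero up to floating-point rounding as long as no column contains NaN.

```matlab
% Sketch: appending a NaN-free column leaves the existing coefficients unchanged.
x0 = [1 6 5; 9 3 5; 7 5 3; 5 9 5];
x1 = [x0, [2; 3; 4; 1]];            % same data plus a 4th column

r0 = corr(x0);                      % 3x3
r1 = corr(x1);                      % 4x4

% Largest absolute difference between r0 and the matching block of r1
d = abs(r1(1:3,1:3) - r0);
maxDiff = max(d(:));
fprintf('max |r1(1:3,1:3) - r0| = %g\n', maxDiff);
```

If the difference is clearly nonzero on real data, it is worth checking the new column for NaN values and checking which 'Rows' option is in use.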