# different result of PCA function and my PCA

farshad jahangiri on 12 Oct 2022
Edited: David Goodmanson on 12 Oct 2022
Hi. I wrote code for principal component analysis (PCA).
The numbers in the results are the same, but in some PCs the sign is different (e.g., PC3 in coeff is the negative of PC3 in Coeff2). I am confused about the signs. Is my code incomplete? What is the meaning of this sign?
I checked the results in Minitab and the same problem occurs there. In any case, I want my results to match the results of the MATLAB function.
data=importdata('n_data.txt');
%% function
[coeff] = pca(data,'Algorithm','eig')
coeff = 9×9
    0.0025    0.6199   -0.4172    0.0801   -0.4719    0.4581   -0.0491    0.0043   -0.0188
    0.4372    0.0212   -0.1046    0.1054    0.0479   -0.0391    0.2120   -0.4842    0.7093
   -0.1774    0.5928   -0.2163   -0.3074    0.6132   -0.3022    0.0888    0.0067    0.0254
    0.3655    0.2313    0.2419   -0.3789   -0.3934   -0.4807   -0.3934   -0.2079   -0.1645
    0.4045   -0.0324   -0.2678    0.4054    0.2959    0.0051   -0.0489   -0.3734   -0.6081
    0.4222    0.0394    0.0190   -0.1810   -0.1516   -0.0930    0.7627    0.3612   -0.2081
   -0.0631    0.4477    0.6480    0.5930    0.0096   -0.0965    0.1144    0.0355    0.0170
    0.3501    0.0853    0.4194   -0.3535    0.3364    0.6541   -0.1644    0.0246   -0.0248
    0.4188    0.0111   -0.2045    0.2658    0.1430   -0.1430   -0.4072    0.6712    0.2342
%% my PCA
C = cov(data);
[u,d]=eig(C); % eigenvectors (columns of u) and eigenvalues (diagonal of d)
D=diag(d); % extract the eigenvalues into a column vector
[D,indx]=sort(D,'descend'); % sort eigenvalues in descending order; indx reorders the eigenvectors to match
Coeff2=u(:,indx)
Coeff2 = 9×9
    0.0025    0.6199    0.4172   -0.0801   -0.4719   -0.4581    0.0491    0.0043    0.0188
    0.4372    0.0212    0.1046   -0.1054    0.0479    0.0391   -0.2120   -0.4842   -0.7093
   -0.1774    0.5928    0.2163    0.3074    0.6132    0.3022   -0.0888    0.0067   -0.0254
    0.3655    0.2313   -0.2419    0.3789   -0.3934    0.4807    0.3934   -0.2079    0.1645
    0.4045   -0.0324    0.2678   -0.4054    0.2959   -0.0051    0.0489   -0.3734    0.6081
    0.4222    0.0394   -0.0190    0.1810   -0.1516    0.0930   -0.7627    0.3612    0.2081
   -0.0631    0.4477   -0.6480   -0.5930    0.0096    0.0965   -0.1144    0.0355   -0.0170
    0.3501    0.0853   -0.4194    0.3535    0.3364   -0.6541    0.1644    0.0246    0.0248
    0.4188    0.0111    0.2045   -0.2658    0.1430    0.1430    0.4072    0.6712   -0.2342

John D'Errico on 12 Oct 2022
Edited: John D'Errico on 12 Oct 2022
Congratulations! You are the 1 millionth person to have this happen, and then be confused. Your reward is a free trip to Newark, New Jersey. Free, that is, in the sense, that we pay nothing towards it, you pay all costs, make all arrangements. ;-)
Anyway, the eigenvectors that are returned (as part of the PCA) are NOT unique. Any of them can be arbitrarily multiplied by -1. Think about it like this. What is an eigenvector, eigenvalue pair, but a solution to the problem:
A*x = lambda*x
But if x and lambda form a valid solution to the problem, then is it not true that -x and lambda ALSO form a solution? So is it not equally true that
A*(-x) = lambda*(-x)
What did you find? That in some cases, the vectors had a DIFFERENT sign. This is completely expected, and cannot be controlled, nor should you care.
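If matching the signs is still wanted, as the question asks, one option is to impose a fixed sign convention on each column after sorting. The sketch below assumes the convention that the largest-magnitude entry of each column is made positive. That convention is consistent with the coeff output shown in the question, but it is an assumption here, not something taken from the pca documentation. The small data matrix and the names Coeff2n, colSign, lambda, and residual are made up for illustration.

```matlab
% Illustrative data only (a stand-in for n_data.txt, which is not available here)
data = [2 4 1; 3 1 5; 5 6 2; 7 3 8; 8 9 4; 1 5 7];

% Same steps as in the question
C = cov(data);
[u, d] = eig(C);
[lambda, indx] = sort(diag(d), 'descend');
Coeff2 = u(:, indx);

% Assumed convention: the largest-magnitude entry of each column is positive
[~, maxRow] = max(abs(Coeff2), [], 1);                       % row of the largest |entry| per column
linIdx  = sub2ind(size(Coeff2), maxRow, 1:size(Coeff2, 2));  % linear indices of those entries
colSign = sign(Coeff2(linIdx));                              % +1 or -1 for each column

% Flip the columns whose largest-magnitude entry is negative
Coeff2n = Coeff2 .* colSign;   % implicit expansion (R2016b or later)

% The flipped columns still satisfy C*x = lambda*x
residual = norm(C*Coeff2n - Coeff2n .* lambda', 'fro');
disp(residual)
```

Applying the same rule to the output of any other eigen-solver should put both results in the same orientation, as long as the largest-magnitude entry of each column is not tied with another entry of opposite sign.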

David Goodmanson on 12 Oct 2022
Edited: David Goodmanson on 12 Oct 2022
Each column vector in the cof matrix is arbitrary up to an overall constant of absolute value 1, which for the case of everything being real means an overall factor of ±1. That is the nature of the disagreement that you have, which is good since it shows that your code is basically correct. Those sign changes do not really matter, but it's not the only thing that changes.
In the usual case, X is an n×m matrix with n > m. The means are subtracted down each column, Xm = X - mean(X). Then the scores are calculated with
score = Xm*cof
Then, since the cof matrix is unitary (orthogonal, in the real case), so that cof*cof' is the identity,
Xm = score*cof' (1)
Suppose that the kth column of your cof matrix has its sign flipped compared to Matlab pca. If you compute the scores, you find that the kth column of your scores also has its sign flipped compared to the Matlab pca scores. What's important is that, as long as cof and score stay paired up, (1) is unchanged and characterizes the data.
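The pairing argument can be checked numerically. The following is a minimal sketch using a small made-up data matrix; the names cofFlip, scoreFlip, flipErr, recErr1, and recErr2 are hypothetical and chosen only for this illustration. It flips one column of cof, recomputes the scores, and confirms that relation (1) still recovers Xm.

```matlab
% Illustrative data only: n = 6 observations, m = 3 variables
X  = [2 4 1; 3 1 5; 5 6 2; 7 3 8; 8 9 4; 1 5 7];
Xm = X - mean(X);              % subtract the mean of each column (implicit expansion)

% Coefficient matrix from the covariance eigen-decomposition
[u, d] = eig(cov(X));
[~, indx] = sort(diag(d), 'descend');
cof   = u(:, indx);
score = Xm*cof;

% Flip the sign of the kth column and recompute the scores
k = 2;
cofFlip = cof;
cofFlip(:, k) = -cofFlip(:, k);
scoreFlip = Xm*cofFlip;

% The kth score column flips sign along with the kth coefficient column
flipErr = norm(scoreFlip(:, k) + score(:, k));

% Relation (1) holds for both pairings
recErr1 = norm(score*cof' - Xm, 'fro');
recErr2 = norm(scoreFlip*cofFlip' - Xm, 'fro');
disp([flipErr recErr1 recErr2])
```

All three quantities should be at the level of floating-point round-off, which is the sense in which the sign choice does not affect the description of the data.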