How to apply PCA correctly?

Sepp on 12 Dec 2015
Commented: the cyclist on 26 Dec 2020 at 18:54
I'm currently struggling with PCA and MATLAB. Let's say we have a data matrix X and a response y (classification task). X consists of 12 rows and 4 columns. The rows are the data points, the columns are the predictors (features).
Now, I can do PCA with the following command:
[coeff, score] = pca(X);
As I understood from the MATLAB documentation, coeff contains the loadings and score contains the principal components in its columns. That means the first column of score contains the first principal component (associated with the highest variance) and the first column of coeff contains the loadings for the first principal component.
Is this correct?
But if this is correct, why is X * coeff then not equal to score?
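A short sketch of where the difference comes from, assuming the Statistics and Machine Learning Toolbox (for pca) and MATLAB R2016b or later (for implicit expansion in X - mean(X)). The 12-by-4 size mirrors the question; the data are random placeholders. pca centers each column of X before projecting, so score matches the centered data times coeff, and the raw product X*coeff differs from score by the projected column means, repeated on every row.

```matlab
rng('default')              % reproducible placeholder data
X = rand(12,4);             % 12 observations (rows), 4 predictors (columns)

[coeff, score] = pca(X);    % pca centers the columns of X internally

% Projecting the raw (uncentered) data does NOT reproduce score
rawProjection = X*coeff;

% Projecting the centered data does
Xc = X - mean(X);           % subtract each column's mean (implicit expansion)
centeredProjection = Xc*coeff;

% The gap between the two is the projected column means, repeated on every row
offset = rawProjection - score;
expectedOffset = repmat(mean(X)*coeff, size(X,1), 1);

fprintf('max |X*coeff - score|        = %g\n', max(abs(rawProjection(:) - score(:))));
fprintf('max |Xc*coeff - score|       = %g\n', max(abs(centeredProjection(:) - score(:))));
fprintf('max |offset - mean(X)*coeff| = %g\n', max(abs(offset(:) - expectedOffset(:))));
```

The second and third printed values should be at the level of floating-point round-off, while the first is not.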

DrJ on 11 Dec 2019
@Sepp your doubt can be clarified by this tutorial (even though it is in another program context), especially after 5' in
@the cyclist fabulous and generous explanation

the cyclist on 12 Dec 2015
Edited: the cyclist on 18 Apr 2020
Maybe this script will help.
rng 'default'
M = 7; % Number of observations
N = 5; % Number of variables observed
X = rand(M,N);
% De-mean
X = bsxfun(@minus,X,mean(X));
% Do the PCA
[coeff,score,latent] = pca(X);
% Calculate eigenvalues and eigenvectors of the covariance matrix
covarianceMatrix = cov(X);
[V,D] = eig(covarianceMatrix);
% "coeff" are the principal component vectors.
% These are the eigenvectors of the covariance matrix.
% Compare the columns of coeff and V.
% (Note that the columns are not necessarily in the same *order*,
% they may differ in *sign*, and they might be *slightly* different
% from each other due to floating-point error.)
coeff
V
% Multiply the original data by the principal component vectors
% to get the projections of the original data on the
% principal component vector space. This is also the output "score".
% Compare ...
dataInPrincipalComponentSpace = X*coeff
score
% The columns of X*coeff are orthogonal to each other. This is shown with ...
corrcoef(dataInPrincipalComponentSpace)
% The variances of these vectors are the eigenvalues of the covariance
% matrix, and are also the output "latent". Compare these three outputs:
var(dataInPrincipalComponentSpace)'
latent
sort(diag(D),'descend')
the cyclist on 14 Apr 2020
The first line (in green) is a comment that describes what these lines mean.
The second line displays the value of the variable coeff on the screen. coeff is one of the outputs of MATLAB's pca function. It is a numeric array in which each column is a principal component vector.
The third line displays the value of the variable V on the screen. V is one of the outputs of MATLAB's eig function. I applied eig to the covariance matrix of the data to calculate its eigenvectors. My code shows that coeff and V contain the same vectors.
V_sal on 18 Apr 2020
Thanks, your explanation is really good. I'm working with EEG data, and I can't make 'V' and 'coeff' come out the same. I don't really understand why; do you have any idea?
the cyclist on 18 Apr 2020
coeff and V are not necessarily identical.
Their columns will be the same (within floating-point error, and possibly with opposite sign), but those columns will not necessarily appear in the same order.
I've edited the comments in the code above to reflect that.
If that does not fix the issue, maybe you could upload your data and I could take a look.
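One way to line the two up, sketched under the assumption that the Statistics and Machine Learning Toolbox is available and MATLAB is R2016b or later (implicit expansion): sort the eigenvalues from eig in descending order, reorder the columns of V to match, then flip the sign of any column that points the opposite way from the corresponding column of coeff. The sign step is needed because an eigenvector is only defined up to sign, and pca applies its own sign convention while eig does not. The 7-by-5 random data are placeholders.

```matlab
rng('default')
X = rand(7,5);                       % placeholder data (7 observations, 5 variables)
[coeff, ~, latent] = pca(X);         % pca centers X internally

[V, D] = eig(cov(X));                % cov also centers, so this is the same covariance matrix
[eigenvalues, order] = sort(diag(D), 'descend');  % put eigenvalues in pca's order
V = V(:, order);                     % reorder the eigenvector columns to match

% Flip any column of V whose direction is opposite to the matching column of coeff
flip = sign(sum(V .* coeff, 1));     % +1 if aligned, -1 if opposite (row vector)
V = V .* flip;                       % implicit expansion applies one sign per column

disp(max(abs(V(:) - coeff(:))))      % should be near floating-point precision
disp(max(abs(eigenvalues - latent))) % latent holds the same eigenvalues
```

If the remaining differences are still large after this, two or more eigenvalues may be (nearly) equal, in which case the corresponding eigenvectors are not uniquely determined.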

Yaser Khojah on 17 Apr 2019
Dear the cyclist, thanks for showing this example. I have a question regarding the order of the columns of COEFF, since it differs from that of V. Is there any way to see the order of these columns? In other words, which variables does each column correspond to?
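A sketch of how to read coeff, assuming the Statistics and Machine Learning Toolbox: row i of coeff belongs to variable (column) i of X, and column k is the k-th principal component, with pca ordering the columns by decreasing variance (latent). Labeling the rows and columns makes this explicit. The names x1 to x5 are hypothetical placeholders for your own variable names, and the data are random.

```matlab
rng('default')
X = rand(7,5);                              % placeholder data
varNames = {'x1','x2','x3','x4','x5'};      % hypothetical names for the 5 columns of X
[coeff, ~, latent, ~, explained] = pca(X);

% Rows = original variables, columns = principal components (largest variance first)
pcNames = arrayfun(@(k) sprintf('PC%d', k), 1:size(coeff,2), 'UniformOutput', false);
loadings = array2table(coeff, 'RowNames', varNames, 'VariableNames', pcNames)

% Percentage of total variance explained by each PC, in the same column order
explained'
```

Each principal component is a mix of all the original variables, so a column does not correspond to a single variable; the entries in that column show how much each variable contributes.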


the cyclist on 31 Mar 2020
As you can see in my code above it is
X * coeff
that should equal score, not
coeff * X
(where X is the de-meaned input to pca).
Yuan Luo on 8 Nov 2020
Why does X need to be de-meaned, since pca centers the data by default?
the cyclist on 26 Dec 2020 at 18:54
Sorry it took me a while to see this question.
If you do
[coeff,score] = pca(X);
it is true that pca() will internally de-mean the data. So, score is derived from de-meaned data.
But it does not mean that X itself [outside of pca()] has been de-meaned. So, if you are trying to re-create what happens inside pca(), you need to manually de-mean X first.
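As a complement, pca also returns the column means it subtracted as its sixth output (mu), which gives a direct way to reproduce the centering outside pca(). This sketch assumes the Statistics and Machine Learning Toolbox and MATLAB R2016b or later (implicit expansion); the data are random placeholders and are deliberately not de-meaned.

```matlab
rng('default')
X = rand(7,5);                            % raw data, NOT de-meaned
[coeff, score, ~, ~, ~, mu] = pca(X);     % mu: row vector of the column means pca subtracted

% Recreating score outside pca() requires subtracting the same means first
scoreManual = (X - mu)*coeff;

% Going the other way, the means must be added back to recover X
% (using all components, so the reconstruction is exact up to round-off)
Xreconstructed = score*coeff' + mu;

disp(max(abs(scoreManual(:) - score(:))))
disp(max(abs(Xreconstructed(:) - X(:))))
```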

Greg Heath on 13 Dec 2015
