PCA and data projection issue

Question

George Pavlidis on 29 Jan 2018

https://www.mathworks.com/matlabcentral/answers/379546-pca-and-data-projection-issue

Commented: George Pavlidis on 29 Jan 2018

I am using PCA to identify the main data components and re-project the data onto that space. With X the input (row-wise MxN data), Y the output, and D the required number of dimensions, this is typically done as follows:

[~,Y] = pca( X );
Y = Y(:,1:D);

If I want to do this "manually" I would compute the covariance matrix, then the eigenvalues/vectors and multiply the data with the new coordinate system as follows:

X = X - mean(X);                     % center each column (variable) around zero
A = (X'*X) / size(X,1);              % covariance matrix, normalised by the number of observations (rows)
[V,L] = eig(A);                      % eigenvectors and eigenvalues; the eigenvalue order is not guaranteed
[~,idx] = sort(diag(L),'descend');   % order the components so that the most significant come first
V = V(:,idx(1:D));                   % take only those eigenvectors required
Y = X * V;                           % project the centered data onto the new coordinate system

Unfortunately, the two methods above do not produce the same results! In particular, and this is the interesting part, some of the resulting values are equal and some have flipped signs! If I take the difference of the absolute values, I get almost zero (on the order of 1e-14) for all matrix elements.

I tried even simple examples like the one presented here but I see the same issue.

Any ideas?

Answer 1

John D'Errico on 29 Jan 2018

https://www.mathworks.com/matlabcentral/answers/379546-pca-and-data-projection-issue#answer_302341

Edited: John D'Errico on 29 Jan 2018

Flipped signs are completely irrelevant.

An eigenvector is not unique, since you can multiply it by any nonzero constant and still have a valid eigenvector. A factor of -1 does not even change the norm. So flipping the sign changes nothing except that factor of -1.
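A minimal sketch of that point, using a small symmetric matrix chosen purely for illustration (the matrix and the variable names are assumptions, not part of the original question): both v and -v satisfy the eigenvector equation, and both have the same norm.

```matlab
% Illustrative 2x2 symmetric matrix (an assumption for this example)
A = [2 1; 1 2];
[V, L] = eig(A);

v      = V(:,1);     % one unit-norm eigenvector returned by eig
lambda = L(1,1);     % the matching eigenvalue

res_pos   = norm(A*v    - lambda*v);     % residual of A*v = lambda*v
res_neg   = norm(A*(-v) - lambda*(-v));  % same residual for the flipped vector
norm_diff = abs(norm(v) - norm(-v));     % the norms are identical
```

Both residuals should be at round-off level, so neither sign is "more correct" than the other.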

A relative difference of 1e-14 is also irrelevant. Just floating point trash, because the computations were done in a different sequence. NEVER trust the least significant bits of a floating point number.
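As a small illustration of why the order of operations matters, the following sketch sums the same three numbers in two different orders. The values are arbitrary, and the tolerance used in the last line is an assumed choice, not a universal rule.

```matlab
a = (0.1 + 0.2) + 0.3;   % one order of summation
b = 0.1 + (0.2 + 0.3);   % mathematically the same sum, different order

exactly_equal = isequal(a, b);   % false in IEEE double arithmetic
gap = abs(a - b);                % tiny, on the order of eps

% Compare with a tolerance instead of testing for exact equality
same_within_tol = gap <= 10 * eps(max(abs([a b])));
```

The same reasoning applies to the PCA outputs: differences near 1e-14 are expected whenever two routes perform the arithmetic in a different sequence.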

So, no, that is not the "interesting" part. In fact, nothing about what you have said is even remotely surprising.

John D'Errico on 29 Jan 2018

Edited: John D'Errico on 29 Jan 2018

If you call eig twice, you will get the same sign. Eig and SVD are, as I recall, deterministic (as opposed to tools like svds or eigs). Call them twice with EXACTLY the same input, and you should get the same output.

But two different PCA codes may not get the same signs each time, because there are multiple subtly different ways of doing the computations. This can easily result in opposite signs between methods.

For example, you can do a PCA using eig OR svd. I am quite confident that some of the signs will be arbitrarily flipped between eig and svd, and consistently so. So repeat the call, and they will not change.
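The eig-versus-svd comparison can be sketched as follows. The names X, D and Y follow the question. The synthetic data, the fixed random seed and the sign-alignment step are assumptions made for illustration, and nothing here is claimed to be what pca does internally. The code relies on implicit expansion, so it assumes R2016b or later.

```matlab
rng(0);                                % fixed seed, only for reproducibility
X  = randn(50, 4) * diag([4 3 2 1]);   % hypothetical 50x4 data with distinct variances
D  = 2;                                % number of components to keep
Xc = X - mean(X, 1);                   % center each column

% Route 1: eigen-decomposition of the covariance matrix
[V, L]   = eig((Xc' * Xc) / (size(Xc, 1) - 1));
[~, idx] = sort(diag(L), 'descend');   % most significant components first
V        = V(:, idx(1:D));

% Route 2: SVD of the centered data (right singular vectors are the same axes)
[~, ~, W] = svd(Xc, 'econ');
W         = W(:, 1:D);

% The two bases may differ by a factor of -1 per column.
% Align W to V using the sign of each column's dot product.
s = sign(sum(V .* W, 1));
W = W .* s;

Y_eig = Xc * V;
Y_svd = Xc * W;
max_diff = max(abs(Y_eig(:) - Y_svd(:)));   % round-off level after alignment
```

Before the alignment step, some columns of Y_eig and Y_svd may have opposite signs. After it, the remaining difference should be floating point noise only.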

George Pavlidis on 29 Jan 2018

Thanks. I figured out the same thing but I was wondering if there is any insight on what's happening behind the scenes...
