Need help determining the order of the eigenvectors output from eig

I'm performing PCA on a data set with three variables to reduce correlation, but I also need to analyze and compare the eigenvectors output from PCA. Thus, I need to know which eigenvectors correspond to which of the three variables.
The eig function seems to sort the eigenvalues in increasing or decreasing order, which reorders the eigenvectors so they no longer match the order of the variables I input.
Is there any way to undo the sorting, or would it be worth it to write my own eig function?
Also, would it make sense to plot the 3 eigenvectors with the data and then assign each eigenvector to the data variable whose variation it aligns with most?
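For reference, a minimal sketch of what I'm seeing (synthetic 3-variable data, just for illustration):
X = randn(100,3) * [2 0 0; 0 1 0.5; 0 0 0.2];   % made-up 3-variable data set
eig(cov(X))               % eigenvalues come back sorted (ascending here)
[~,~,latent] = pca(X);
latent                    % pca reports the same values, sorted descending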

 Accepted Answer

It appears that you have 3 variables in ONE eigenproblem. So this is not a parametric family of problems, where my eigenshuffle code could be appropriate.
The problem is more in terms of your understanding of what eigenvectors and eigenvalues mean. An eigenvector does NOT correspond to any given variable. So writing your own version of eig would be a waste of time, if I read your question correctly.
Likewise, you cannot simply assign an eigenvector to one of the variables.
An eigenvector corresponds to a linear combination of your data variables. Sometimes one eigenvector might be composed mainly of one of the variables. Usually that is not true, nor is it something that you can control.
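A minimal sketch of that point (made-up data, assuming two strongly correlated variables):
x = randn(200,1);
X = [x, 0.9*x + 0.1*randn(200,1)];   % two strongly correlated variables
[V,D] = eig(cov(X));
V(:,end)                             % leading eigenvector: a roughly equal blend of BOTH variables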
Finally, there is no way to "undo the sorting" of the eigenvectors, since they have no intrinsic sorted order.
As for what you can do, perhaps the best idea would be to look at some published PCA analyses. Learn what PCA does, and how to use it. For example, read through an analysis of the rather famous IRIS data using PCA. On your own data, you should look at the coefficients in the eigenvectors; the first one (with the largest eigenvalue) is the most important. What does it tell you about the relationship between those variables? What kind of decrease do you see in the eigenvalues from largest to smallest? If you see little decrease, then PCA will offer very little help in analyzing this data. In the worst case, all of your eigenvalues will be nearly equal, and then the eigenvectors will be virtually random rotations of your data.
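A sketch of that check, where X stands in for your n-by-3 data matrix (one column per variable):
C = cov(X);                      % covariance of the variables
lam = sort(eig(C),'descend');    % eigenvalues, largest first
pct = 100*lam/sum(lam)           % percent of total variance per component
If pct(1) dominates, PCA is telling you something; if the three values are all similar, it is not.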

2 Comments

Is there anything wrong with having 3 variables, i.e., a 3-dimensional data set, in 1 eigenproblem, where the 3x3 covariance matrix is then my eigenproblem?
If an eigenvector can't correspond to any given variable, then I should stop here, since that is my main assumption. I had assumed that the axes of a PCA-corrected data set are the eigenvectors, since that's what I've read in other literature.
I'm confident I know what PCA does and how to do it, since I could do it manually if I wanted. It's just that I would like to compare eigenvectors, so I thought it was necessary to know which eigenvector belongs to which data dimension/feature/variable.
The eigenvalues decrease significantly, from about 300 to 6 and then to below 1. So the transformation shifts the data into a subspace where the covariance between variables is eliminated, which is what I want.
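A quick check with those reported values (taking 0.5 as a stand-in for the "below 1" eigenvalue):
lam = [300, 6, 0.5];       % eigenvalues as reported above
100*lam/sum(lam)           % the first component carries roughly 98% of the variance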
Um, NO. There is nothing wrong with a 3 variable problem. But there is no presumption that an eigenvector should correspond to ANY of the data variables on their own. I'm not sure why you think this should be true. It is not at all the case.
The axes of the data set (the variables) are NOT the eigenvectors. If you read that in the literature, then I'm sorry, but you misread it.
The eigenvectors correspond to directions where your data shows variability.
If you are asking if the eigenvectors define new axes where the data lies, ignoring those directions with small variability, then yes. The eigenvectors do define a new set of "axes", in a lower dimensional space.
load irisdata                  % 150x4 iris measurements (plus the irisnames labels used below)
C = cov(irisdata);             % 4x4 covariance matrix of the four variables
[V,D] = eig(C)                 % eigenvectors in the columns of V, eigenvalues on diag(D)
V =
     -0.31725      0.581        0.65654      0.36159
      0.32409     -0.59642      0.72971     -0.082269
      0.47972     -0.072524    -0.17577      0.85657
     -0.75112     -0.54906     -0.074706     0.35884
D =
      0.023683     0            0            0
      0            0.078524     0            0
      0            0            0.24224      0
      0            0            0            4.2248
So the two largest eigenvalues do cover most of the variability. In fact, as is the case for your problem, the ONE largest eigenvalue represents 92% of the variability.
diag(D)/trace(D)*100
ans =
0.51831
1.7185
5.3016
92.462
The eigenvector corresponding to the most variability is
V(:,4)
ans =
0.36159
-0.082269
0.85657
0.35884
So this linear combination of the variables is where stuff is mainly happening in the iris data.
iristrans = bsxfun(@minus,irisdata,mean(irisdata));   % center each variable at zero
plot(iristrans*V(:,4),iristrans*V(:,3),'o')           % project onto the top two eigenvectors
xlabel 'V4: 92.4%'
ylabel 'V3: 5.3%'
Are the axes of that plot at all related to the original variables? Well, only in the sense that they are created as linear combinations of the original variables. There is no sorting involved, and none was needed.
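To make that concrete: the horizontal coordinate of the plot is literally a weighted sum of the centered variables, with the entries of V(:,4) as the weights.
w = V(:,4);
score4 = iristrans(:,1)*w(1) + iristrans(:,2)*w(2) + iristrans(:,3)*w(3) + iristrans(:,4)*w(4);
max(abs(score4 - iristrans*V(:,4)))   % zero, up to roundoff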
As you can see from the figure, there is indeed something happening. We can see what are essentially two distinct populations in the data. Since these are iris measurements, one might presume there are some genetic differences that would correspond to the sub-populations. In fact, were I to do further analysis on this data, I'd be looking to see if there is something to that. For example,
k = iristrans*V(:,4) > -1;        % split the data along the dominant component
unique(irisnames(find(k)))
ans =
'Iris-versicolor'
'Iris-virginica'
unique(irisnames(find(~k)))
ans =
'Iris-setosa'
So indeed, it appears that the most significant component does allow us to identify two genetically different populations.
I would STRONGLY recommend that you sit down with a good text on PCA. The book by Ted (J.E.) Jackson, "A User's Guide to Principal Components" is a good one. That I worked with Ted many years ago might bias me, since I learned much about PCA from him. :)


More Answers (1)

You might look at John D'Errico's eigenshuffle submission on the MATLAB File Exchange.

2 Comments

Yeah, I checked that out, and it gave me the same order as the output from the pca function, which I'm pretty sure is sorted. That's what confuses me.
There is NO specific order of a set of eigenvalues. All you can do is try to use a tool like eigenshuffle, which essentially looks at the sequence of eigenvalues and eigenvectors, to try to put them into a consistent sequence. Consistency from one step to the next is all you can ask for. Will eigenshuffle get it wrong sometimes? Definitely yes. It can be mistaken. How can that happen?
Suppose I give you two sequences, two sets of matrices based on some parameter t. If the matrices in the sequence are all very similar, because the parameter t does not change significantly at each step, then eigenshuffle will be very good at what it is trying to do. But if the sequence is very coarse, with relatively large steps in the parameter t, then each successive matrix will be very different from the previous one. And now eigenshuffle can get confused.
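A toy illustration of that failure mode (my own construction): fix the eigenvector basis and let two eigenvalues cross as t varies. eig returns the values sorted at every step, so the sorted "branches" kink at the crossing instead of following the smooth curves; eigenshuffle tries to reassociate them, and with coarse steps in t that reassociation becomes guesswork.
theta = 0.3;
R = [cos(theta) -sin(theta); sin(theta) cos(theta)];   % fixed eigenvector basis
t = linspace(0,2,41);
lam = zeros(numel(t),2);
for k = 1:numel(t)
    A = R*diag([t(k), 2-t(k)])*R';   % true eigenvalues t and 2-t cross at t = 1
    lam(k,:) = eig(A)';              % eig returns them sorted at each step
end
plot(t,lam,'.-'), xlabel('t'), ylabel('sorted eigenvalues')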
