Thread Subject: eigenvalues of the covariance matrix (princomp)

Subject: eigenvalues of the covariance matrix (princomp)

From: yakir gagnon

Date: 15 Nov, 2007 12:15:14

Message: 1 of 4

I'm sorry to ask this, but just to put things to rest and make it extra clear:

scaling is: X/var( X )
centering is: X-mean( X )
standardizing is: zscore( X )
in matlab doing this:
princomp( X ) is called the: covariance PCA
this: princomp( X/var( X ) ) is: correlation PCA
and princomp(zscore( X )) is a CORRECT PCA...

I have tried reading a number of books about PCA, but could anyone try to clarify what the difference is between these options, and which case fits which situation?
Tons of thanks,
PCA rules.

Subject: eigenvalues of the covariance matrix (princomp)

From: Peter Perkins

Date: 15 Nov, 2007 14:18:20

Message: 2 of 4

yakir gagnon wrote:

> in matlab doing this:
> this: princomp( X/var( X ) ) is: correlation PCA

X/var(X) is not going to get you the right thing for a couple reasons.
You want either

X ./ repmat(std(X),size(X,1),1), or
bsxfun(@rdivide,X,std(X)), or even
X*diag(1./std(X))
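
A minimal sketch of the point above, on a small made-up data matrix (the numbers are illustrative only). It checks that the three expressions give the same column-scaled matrix. As far as I can tell, the two problems with X/var(X) are that "/" is matrix right division rather than an elementwise operation, and that var is the squared standard deviation, so even an elementwise version would not give unit-variance columns.

```matlab
% Made-up data: 5 observations (rows), 3 variables (columns).
X = [2 10 300; 4 14 100; 6 11 250; 8 19 400; 10 16 150];
n = size(X, 1);

% Three equivalent ways to divide each column by its sample std dev.
A = X ./ repmat(std(X), n, 1);
B = bsxfun(@rdivide, X, std(X));
C = X * diag(1 ./ std(X));

% The three results agree to rounding error, and each column has unit std.
maxDiffAB = max(abs(A(:) - B(:)));
maxDiffAC = max(abs(A(:) - C(:)));
colStd = std(A);

% X/var(X) is matrix right division (mrdivide), a least-squares solve,
% so the result does not even have the same size as X.
D = X / var(X);
sizeD = size(D);   % 5-by-1 here, not 5-by-3
```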

> and princomp(zscore( X )) is a CORRECT PCA...

There is absolutely no point in doing this (as opposed to what you've
called "correlation PCA"), since PRINCOMP already centers the data.

Subject: eigenvalues of the covariance matrix (princomp)

From: yakir gagnon

Date: 15 Nov, 2007 16:24:19

Message: 3 of 4

Thanks a lot for answering!

> yakir gagnon wrote:
>
> > in matlab doing this:
> > this: princomp( X/var( X ) ) is: correlation PCA
>
> X/var(X) is not going to get you the right thing for
> a couple reasons.
> You want either
>
> X ./ repmat(std(X),size(X,1),1), or
> bsxfun(@rdivide,X,std(X)), or even
> X*diag(1./std(X))

Yes, right. I understand (my mistake).

>
> > and princomp(zscore( X )) is a CORRECT PCA...
>
> There is absolutely no point in doing this

Why? Doing princomp(X) or princomp(zscore(X)) yields two different answers, and zscore(X) = zscore(zscore(X)).

> (as opposed to what you've
> called "correlation PCA"), since PRINCOMP already
> centers the data.

Here you say 'centre the data', which confuses me, since I thought you were talking about the zscoring (which I thought was called standardizing), but I might be wrong.

So why would I choose to do a so-called "correlation PCA"? What is it good for?

Subject: eigenvalues of the covariance matrix (princomp)

From: Peter Perkins

Date: 15 Nov, 2007 17:55:59

Message: 4 of 4

yakir gagnon wrote:
>>> and princomp(zscore( X )) is a CORRECT PCA...
>> There is absolutely no point in doing this
>
> Why? Doing princomp(X) or princomp(zscore(X)) yields two different answers, and zscore(X) = zscore(zscore(X)).

Yes, princomp(X) and princomp(zscore(X)) do give different results. All
I meant was that princomp(X./repmat(std(X),size(X,1),1)) and
princomp(zscore(X)) will give the same results, because princomp already
centers the data to have zero mean, and so the centering step in zscore
is redundant. On the other hand, since it's easier to type zscore(X)
than X./repmat(std(X),size(X,1),1), choosing the former does no harm.
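
A rough way to check this without princomp is to compare covariance matrices directly, since the component variances are the eigenvalues of the covariance matrix of the input. The sketch below uses made-up data, writes the zscore computation out by hand with base functions, and assumes the default std normalization (n-1). Here eig(cov(...)) is only a stand-in for princomp, not its actual implementation.

```matlab
% Made-up data: 5 observations (rows), 3 variables (columns).
X = [2 10 300; 4 14 100; 6 11 250; 8 19 400; 10 16 150];

% Scale only: divide each column by its sample std dev.
S = bsxfun(@rdivide, X, std(X));

% Center and scale: the zscore computation, written out by hand.
Z = bsxfun(@rdivide, bsxfun(@minus, X, mean(X)), std(X));

% cov subtracts the column means itself, so the extra centering in Z
% changes nothing; both covariances equal the correlation matrix of X.
covS = cov(S);
covZ = cov(Z);
R = corrcoef(X);

% Eigenvalues of the covariance matrix, largest first.
latentS = sort(eig(covS), 'descend');
latentZ = sort(eig(covZ), 'descend');
```

With unit-variance columns the eigenvalues sum to the number of variables (3 here).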


>> (as opposed to what you've
>> called "correlation PCA"), since PRINCOMP already
>> centers the data.
>
> Here you say 'centre the data', which confuses me, since I thought you were talking about the zscoring (which I thought was called standardizing), but I might be wrong.

ZSCORE centers each column to have zero mean, and normalizes each column
to have unit variance. "Standardized" is kind of an ambiguous term; the
best description of what ZSCORE does is its own source, which you can
read with "type zscore".

PRINCOMP always centers the data to have zero mean before doing
anything. There's limited use in doing PCA on non-centered data,
because the first component will typically describe the mean of the
data, and that's not what most people want out of PCA (some would argue
with that).
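
To illustrate the remark about non-centered data, here is a small sketch on made-up two-variable data whose mean is far from the origin. It compares the leading eigenvector of the raw second-moment matrix X'*X/(n-1) with that of cov(X). Treating the non-centered analysis as an eigen-decomposition of X'*X/(n-1) is an assumption made for the illustration.

```matlab
% Made-up data with mean [100 50]; the spread around the mean runs
% roughly along [1 -2], which is nearly orthogonal to the mean direction.
X = [101 48; 99 52; 102 47; 98 53; 100 50];
n = size(X, 1);
meanDir = mean(X) / norm(mean(X));

% Non-centered: leading eigenvector of the raw second-moment matrix.
[Vraw, Draw] = eig(X' * X / (n - 1));
[~, iRaw] = max(diag(Draw));
pc1Raw = Vraw(:, iRaw);

% Centered: leading eigenvector of the covariance matrix.
[Vc, Dc] = eig(cov(X));
[~, iC] = max(diag(Dc));
pc1Centered = Vc(:, iC);

% Absolute cosine between each first component and the mean direction
% (absolute value because an eigenvector's sign is arbitrary).
alignRaw = abs(meanDir * pc1Raw);
alignCentered = abs(meanDir * pc1Centered);
```

For this data the non-centered first component lines up almost exactly with the mean, while the centered one points along the actual spread of the points.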


> So why would I choose to do a so-called "correlation PCA"? What is it good for?

There are a lot of differing opinions on this. My own opinion is that
doing PCA on unstandardized variables implies that you think that the
scales on which the different variables are measured are somehow
"natural" and "comparable", in the sense that variation of some absolute
magnitude in one variable is no more or less important than the same
amount of absolute variation in another variable. Doing PCA on
standardized variables (scaling each column by the inverse of its sample
std dev) implies that you think that the scales of the different
variables are an artifact of the units in which you measured them, and
that you need to rescale in order to make the variation in the different
variables "comparable". The classic example is doing PCA on things like
body measurements. Should your PCA results differ if you choose to
measure weight in grams vs. stones? Probably they shouldn't.
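
The grams-vs-stones point can be sketched numerically. The heights and weights below are invented for illustration, and the conversion factor of about 6.35 kg per stone is the only outside fact used. The covariance-based first component changes completely with the unit chosen for weight, while the correlation matrix does not change at all.

```matlab
% Made-up body measurements: height in cm, weight in kg.
heightCm = [170; 180; 160; 175; 165];
weightKg = [65; 80; 55; 72; 60];

% The same weights expressed in two different units.
Xgrams  = [heightCm, weightKg * 1000];
Xstones = [heightCm, weightKg / 6.35029318];

% Covariance PCA: first component with weight in grams.
[Vg, Dg] = eig(cov(Xgrams));
[~, ig] = max(diag(Dg));
pc1Grams = Vg(:, ig);

% Covariance PCA: first component with weight in stones.
[Vs, Ds] = eig(cov(Xstones));
[~, is] = max(diag(Ds));
pc1Stones = Vs(:, is);

% Correlation PCA starts from the correlation matrix, which is
% unchanged by rescaling a column.
Rgrams  = corrcoef(Xgrams);
Rstones = corrcoef(Xstones);
maxCorrDiff = max(abs(Rgrams(:) - Rstones(:)));
```

In grams the first component is almost entirely weight; in stones it is mostly height. The correlation matrices agree to rounding error, so any PCA built on them gives the same answer in either unit.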

Whether or not you center the data before doing PCA affects these
arguments too.

I would not describe either as "correct", but would apply a method as
appropriate to circumstances. Again some would argue with that.

Hope this helps.

- Peter Perkins
   The MathWorks, Inc.
