
Thread Subject:
Hadi&Ling1998 Principal Components Analysis pitfalls?

Subject: Hadi&Ling1998 Principal Components Analysis pitfalls?

From: Paul

Date: 24 Feb, 2010 20:26:29

Message: 1 of 27

First, let me qualify my math background -- I am a mere engineer, so I
work with cartoon mental pictures way better than algebraic symbols.

I recently had reason to look at PCA (a topic to which I am new) for a
complex data set (lots of variables, relationship between them not
certain, and data yet to be generated). After concentrated reading on
PCA background, I fired up Matlab and ran through their help (and made
sense of Biplots).

Unfortunately, I ran across a fly in the ointment:
Hadi & Ling, "Some Cautionary Notes on Use of Principal Components
Regression", The American Statistician, Feb. 1998, vol. 52, no. 1.
http://www.jstor.org/stable/2685559

The problem (as I understood it, anyway) applies when you have a set
of p input variables and an output variable Y. PCA is applied to the
input variables to identify p input components. It turns out that
the m most principal components may have very little relation to Y.
Instead, it may be the p-m least principal components that best
explain the variation of Y.
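
A rough toy illustration of how this can happen (made-up data, not the
paper's example; it assumes the Statistics Toolbox functions ZSCORE and
PRINCOMP, renamed PCA in newer releases):

n = 200;
z = randn(n,2);                      % two latent factors
X = [z(:,1), z(:,1) + 0.05*z(:,2)];  % two highly correlated inputs; almost all
                                     % of their variance lies along z(:,1)
Y = z(:,2) + 0.1*randn(n,1);         % but Y depends only on the minor direction
[coeff, score] = princomp(zscore(X));
corrcoef([score Y])                  % Y correlates with the LAST score, not the first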

This makes sense to me because Y did not even enter into the PCA. I
understand the difference between correlation and causality, and that
the notion of input versus output variables doesn't play into PCA. If
the analyst had dispensed with the distinction between input and
output variables and treated all variables alike, we would have a 2nd
PCA on p+1 variables (including Y). Y would obviously contribute
heavily toward the (p+1)-m components that best explain its variation.

The 2nd PCA seems to me to be the correct usage of PCA. The 1st PCA
excludes Y, and it doesn't seem to make sense to expect the most
principal components to have the strongest effect on Y.

Is this correct?

Subject: Hadi&Ling1998 Principal Components Analysis pitfalls?

From: Peter Perkins

Date: 24 Feb, 2010 22:14:07

Message: 2 of 27

On 2/24/2010 3:26 PM, Paul wrote:

> This makes sense to me because Y did not even enter into the PCA. I
> understand the difference between correlation and causality, and that
> the notion of input versus output variables doesn't play into PCA. If
> the analyst had dispensed with the distinction between input and
> output variables and treated all variables alike, we would have a 2nd
> PCA on p+1 variables (including Y). Y would obviously contribute
> heavily toward the (p+1)-m components that best explain its variation.
>
> The 2nd PCA seems to me to be the correct usage of PCA. The 1st PCA
> excludes Y, and it doesn't seem to make sense to expect the most
> principal components to have the strongest effect on Y.
>
> Is this correct?

I think first you should distinguish between PCA and PCA regression. PCA is not intended to have a notion of input (predictor) and output (response) variables. Your observations about the shortcomings of PCA regression are correct, but they are really not about PCA per se. But that's partially just terminology.

I'm not sure if including Y in a PCA as a way to regress y on X would work, because you'd end up explaining variation among the predictor variables as well as the response, and in a regression, you usually don't primarily care about describing the variation in X (looking at X can help with fit diagnostics and experimental design, but not so much with the actual estimate of the regression coefs).

If you have access to the Statistics Toolbox, you might take a look at this demo:

<http://www.mathworks.com/products/statistics/demos.html?file=/products/demos/shipping/stats/plspcrdemo.html>


> I recently had reason to look at PCA (a topic to which I am new) for a
> complex data set (lots of variables, relationship between them not
> certain, and data yet to be generated). After concentrated reading on
> PCA background, I fired up Matlab and ran through their help (and made
> sense of Biplots).

I'd be curious to know if that means, "once I read the help for BIPLOT, biplots made sense", or if it means, "I read the help for BIPLOT, and then had to struggle to make sense of biplots." And if the latter, do you have any suggestions for improving that documentation?

Hope this helps.

- Peter Perkins
   The MathWorks, Inc.

Subject: Hadi&Ling1998 Principal Components Analysis pitfalls?

From: Rich Ulrich

Date: 25 Feb, 2010 04:28:38

Message: 3 of 27

On Wed, 24 Feb 2010 12:26:29 -0800 (PST), Paul
<paul.domaskis@gmail.com> wrote:

>First, let me qualify my math background -- I am a mere engineer, so I
>work with cartoon mental pictures way better than algebraic symbols.
>
>I recently had reason to look at PCA (a topic to which I am new) for a
>complex data set (lots of variables, relationship between them not
>certain, and data yet to be generated). After concentrated reading on
>PCA background, I fired up Matlab and ran through their help (and made
>sense of Biplots).
>
>Unfortunately, I ran across a fly in the ointment:
>Hadi & Ling, "Some Cautionary Notes on Use of Principal Components
>Regression", The American Statistician, Feb. 1998, vol. 52, no. 1.
>http://www.jstor.org/stable/2685559
>
>The problem (as I understood it, anyway) applies when you have a set
>of p input variables and an output variable Y. PCA is applied to the
>input variables to identify p input components . It turns out that
>the m most principal components may have very little relation to Y.
>Instead, it may be the p-m least principal components that best
>explain the variation of Y.

If that is going to happen, then it might be that PCA was a
poor choice of preliminary analysis for these seemingly
unselected data.

The convention in the social sciences is that PCA (or PFA)
is performed on a selected "universe" of items or tests. The
universe, by presumption, can be simplified and represented by
fewer scores.

The guiding assumption for using a reduced subset of scores is
that there are "latent scores" that represent the variance of
all the scores; or that important variation will be captured
by composite scores. IF you don't have that assumption,
then you *might* still want to use the decomposition (for
various reasons, including computational simplicity), but it
may be important to keep *all* the components in the later
analyses.

>
>This makes sense to me because Y did not even enter into the PCA. I
>understand the difference between correlation and causality, and that
>the notion of input versus output variables doesn't play into PCA. If
>the analyst had dispensed with the distinction between input and
>output variables and treated all variables alike, we would have a 2nd
>PCA on p+1 variables (including Y). Y would obviously contribute
>heavily toward the (p+1)-m components that best explain its variation.
>
>The 2nd PCA seems to me to be the correct usage of PCA. The 1st PCA
>excludes Y, and it doesn't seem to make sense to expect the most
>principal components to have the strongest effect on Y.
>
>Is this correct?

No, it is not a good idea, nor the "correct usage of PCA."
Yes, you find some variables that are correlated with Y,
but why not just look at the univariate correlations with Y?
The latter is easier to understand and gets approximately
the same results. It has the same problems of not giving
you an equation and of relying on an arbitrary "selection" of
variables; and simply pointing to the correlations is simpler.

--
Rich Ulrich

Subject: Hadi&Ling1998 Principal Components Analysis pitfalls?

From: Greg Heath

Date: 25 Feb, 2010 06:40:35

Message: 4 of 27

On Feb 24, 3:26 pm, Paul <paul.domas...@gmail.com> wrote:
> First, let me qualify my math background -- I am a mere engineer, so I
> work with cartoon mental pictures way better than algebraic symbols.

I taught engineering, so I know that is not an unconditionally
logical conclusion.

> I recently had reason to look at PCA (a topic to which I am new) for a
> complex data set (lots of variables, relationship between them not
> certain, and data yet to be generated). After concentrated reading on
> PCA background, I fired up Matlab and ran through their help (and made
> sense of Biplots).
>
> Unfortunately, I ran across a fly in the ointment:
> Hadi & Ling, "Some Cautionary Notes on Use of Principal Components
> Regression", The American Statistician, Feb. 1998, vol. 52, no. 1.http://www.jstor.org/stable/2685559
>
> The problem (as I understood it, anyway) applies when you have a set
> of p input variables and an output variable Y. PCA is applied to the
> input variables to identify p input components . It turns out that
> the m most principal components may have very little relation to Y.
> Instead, it may be the p-m least principal components that best
> explain the variation of Y.

Cartoon mental picture:

Coyote cutting a road-runner sandwich with a spoon ...

... the wrong tool for the job.

PCA can be particularly inappropriate for classification.
For example,

http://groups.google.com/group/comp.soft-sys.matlab/msg/6fa51506b7aab414?hl=en

(beware of URL wrap-around)

A more appropriate approach is

1. PLS (Partial Least Squares).
I believe MATLAB now offers that in a PLS Toolbox.

Other less rigorous alternatives:

2. Analysis of the correlation coefficient matrix
corrcoef([X;y]) and its eigenstructure.

3. Use of the function STEPWISEFIT or its GUI version,
STEPWISE.

I tend to use 2 and 3 together (partly because I'm
too poor to buy the PLS Toolbox).
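
For concreteness, a rough sketch of 2 and 3 (assuming the Statistics
Toolbox, with X stored n-by-p, observations in rows, and y as n-by-1,
so the concatenation below is written [X y]):

R = corrcoef([X y]);                          % (p+1)-by-(p+1) correlation matrix, y in the last column
[V, D] = eig(R);                              % its eigenstructure
[b, se, pval, inmodel] = stepwisefit(X, y);   % stepwise/stagewise predictor selection
find(inmodel)                                 % predictors retained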

> This makes sense to me because Y did not even enter into the PCA. I
> understand the difference between correlation and causality, and that
> the notion of input versus output variables doesn't play into PCA. If
> the analyst had dispensed with the distinction between input and
> output variables and treated all variables alike, we would have a 2nd
> PCA on p+1 variables (including Y). Y would obviously contribute
> heavily toward the (p+1)-m components that best explain its variation.
>
> The 2nd PCA seems to me to be the correct usage of PCA. The 1st PCA
> excludes Y, and it doesn't seem to make sense to expect the most
> principal components to have the strongest effect on Y.
>
> Is this correct?

Partially. See above.

Hope this helps.

Greg

Subject: Hadi&Ling1998 Principal Components Analysis pitfalls?

From: Paige Miller

Date: 25 Feb, 2010 14:31:43

Message: 5 of 27

On Feb 25, 1:40 am, Greg Heath <he...@alumni.brown.edu> wrote:
> On Feb 24, 3:26 pm, Paul <paul.domas...@gmail.com> wrote:
>
> > First, let me qualify my math background -- I am a mere engineer, so I
> > work with cartoon mental pictures way better than algebraic symbols.
>
> I taught engineering, so I know that is not an unconditionally
> logical
> conclusion.
>
>
>
> > I recently had reason to look at PCA (a topic to which I am new) for a
> > complex data set (lots of variables, relationship between them not
> > certain, and data yet to be generated).  After concentrated reading on
> > PCA background, I fired up Matlab and ran through their help (and made
> > sense of Biplots).
>
> > Unfortunately, I ran across a fly in the ointment:
> > Hadi & Ling, "Some Cautionary Notes on Use of Principal Components
> > Regression", The American Statistician, Feb. 1998, vol. 52, no. 1.http://www.jstor.org/stable/2685559
>
> > The problem (as I understood it, anyway) applies when you have a set
> > of p input variables and an output variable Y.  PCA is applied to the
> > input variables to identify p input components .  It turns out that
> > the m most principal components may have very little relation to Y.
> > Instead, it may be the p-m least principal components that best
> > explain the variation of Y.
>
> Cartoon mental picture:
>
> Coyote cutting a road-runner sandwhich with a spoon ...
>
> ... the wrong tool for the job.
>
> PCA can be particularly innappropriate for classification.
> For example,
>
> http://groups.google.com/group/comp.soft-sys.matlab/msg/6fa51506b7aab...
>
> (beware of URL wrap-around)
>
> A more appropriate approach is
>
> 1. PLS (Partial Least Squares).
> I believe MATLAB now offers that in a PLS Toolbox.
>
> Other less rigororous alternatives:
>
> 2. Analysis of the correlation coefficient matrix
> corrcoef([X;y]) and it's eigenstructure.
>
> 3. Use of the function STEPWISEFIT or it's GUI version,
> STEPWISE.
>
> I tend to use 2 and 3 together (partly because I'm
> too poor to buy the PLS Toolbox),

I'll vote strongly in favor of using PLS in this situation as well
(and by the way, if you can't afford the PLS toolbox in MATLAB, the
actual PLS algorithm isn't that hard to program in MATLAB).

The reason that PLS is appropriate here (and not PCA) is that PLS
specifically looks for components in the X data that are highly
correlated with Y. (technical detail: actually, it finds components of
X that have the highest squared covariance with Y). These PLS
components may or may not be the same components that PCA finds, even
if you keep all the PCA components -- in other words, PLS may find
components of X that are not perpendicular to the PCA components, the
PLS components could be at a 45 degree angle to the PCA components.

PCA on a combined matrix of X and Y also differs from PLS. PLS treats
the Y explicitly in an objective function that the algorithm tries to
maximize ... as I said, it tries to find components of X that maximize
the squared covariance of the X components with Y. PCA on a combined matrix
of X and Y treats Y as just another variable, so if you have ten X and
one Y and each has been scaled to have a variance of 1, then the PCA
analysis of X and Y combined treats Y as 1/11th of the total variance
it is trying to work with, and this isn't an analysis I would want in
this situation.
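
A quick way to see the difference in MATLAB (a sketch only, assuming the
Statistics Toolbox's PLSREGRESS and PRINCOMP, and made-up standardized data
Xs (n-by-10) and ys (n-by-1)):

[XL, yL, XS, YS, beta, pctvar] = plsregress(Xs, ys, 3);   % 3 PLS components
[coeff, score] = princomp(Xs);                            % PCA components of X alone
pctvar(2,:)                     % fraction of Y variance explained per PLS component
corrcoef([score(:,1:3) ys])     % how the leading PCA scores relate to Y, for contrast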

--
Paige Miller
paige\dot\miller \at\ kodak\dot\com

Subject: Hadi&Ling1998 Principal Components Analysis pitfalls?

From: Johann Hibschman

Date: 25 Feb, 2010 20:26:06

Message: 6 of 27

Paige Miller <paige.miller@kodak.com> writes:

> I'll vote strongly in favor of using PLS in this situation as well
> (and by the way, if you can't afford the PLS toolbox in MATLAB, the
> actual PLS algorithm isn't that hard to program in MATLAB).

PLS does seem more likely to be useful here, but there are other
approaches like ridge regression, lasso, etc., that may help.

My usual reference for PCA, PLS, etc., is Hastie, Tibshirani, and
Friedman's _The Elements of Statistical Learning_, which is conveniently
available online at:

  http://www-stat.stanford.edu/~hastie/Papers/ESLII.pdf

Chapter 3 goes into a lot of this, and Section 3.6 explicitly tries to
compare several methods.

-Johann

Subject: Hadi&Ling1998 Principal Components Analysis pitfalls?

From: Paige Miller

Date: 25 Feb, 2010 21:39:40

Message: 7 of 27

On Feb 25, 3:26 pm, Johann Hibschman <jhibschman+use...@gmail.com>
wrote:

> PLS does seem more likely to be useful here, but there are other
> approaches like ridge regression, lasso, etc., that may help.
>
> My usual reference for PCA, PLS, etc., is Hastie, Tibshirani, and
> Friedman's _The Elements of Statistical Learning_, which is conveniently
> available online at:
>
>  http://www-stat.stanford.edu/~hastie/Papers/ESLII.pdf
>
> Chapter 3 goes into a lot of this, and Section 3.6 explicitly tries to
> compare several methods.

Hard to imagine what Section 3.6 really means. In the one case shown,
where the regression coefficients are 4 and 2 and the correlation is either
0.5 or -0.5, I suppose the charts in Section 3.6 indicate which methods are
better. How this applies to a more general situation is unclear.

The paper cited there (Frank and Friedman), called a "full study",
does not cover situations where PLS is normally used (correlations
among all the X variables), and so its results have always been
suspect in those situations.

--
Paige Miller
paige\dot\miller \at\ kodak\dot\com

Subject: Hadi&Ling1998 Principal Components Analysis pitfalls?

From: thebluecliffrecord

Date: 26 Feb, 2010 15:25:31

Message: 8 of 27

It is unnecessary to repeat the cliché: which technique is best/
appropriate/inappropriate depends on the purpose of the research.

PCA accounts for the maximum variance of X. Thus, PCR explores
whether the maximum variance in X is associated with Y or not. I have
found PCR very useful in engineering because X (e.g., fuel
economy) and Y (e.g., customer satisfaction) are usually well
associated. However, there are many cases in which the PCs of X are
not well associated with Y.

PCA accounts for the maximum variance of "X". OLS accounts for the
maximum variance of "Y". PLS is in the middle of PCA and OLS: PLS
accounts for the maximum covariance of "X and Y". The three methods
are combined into "continuum regression."

I read the article mentioned above a few years ago, and I think the
paper compared apples with oranges: which technique is best/
appropriate/inappropriate depends on the purpose of the research.

Regards,

Sangdon Lee,
GM Tech. Center,

Subject: Hadi&Ling1998 Principal Components Analysis pitfalls?

From: Greg Heath

Date: 27 Feb, 2010 06:10:17

Message: 9 of 27

On Feb 25, 9:31 am, Paige Miller <paige.mil...@kodak.com> wrote:
------SNIP>
> > A more appropriate approach is
>
> > 1. PLS (Partial Least Squares).
> > I believe MATLAB now offers that in a PLS Toolbox.
>
> > Other less rigororous alternatives:
>
> > 2. Analysis of the correlation coefficient matrix
> > corrcoef([X;y]) and it's eigenstructure.
>
> > 3. Use of the function STEPWISEFIT or it's GUI version,
> > STEPWISE.
>
> > I tend to use 2 and 3 together (partly because I'm
> > too poor to buy the PLS Toolbox),
>
> I'll vote strongly in favor of using PLS in this situation as well
> (and by the way, if you can't afford the PLS toolbox in MATLAB, the
> actual PLS algorithm isn't that hard to program in MATLAB).

Provided you have a good explanation to work from.
I coded PLS in MATLAB 5 or more years ago. It was
tough going because the 2 references I was using were
neither helpful nor complementary.

Unfortunately I lost all of my self-written code when
my hard drive crashed. Disgusted, I decided to
rewrite the lost code when and only when I had
to use it.

So far I have convinced myself that using various
options in STEPWISEFIT and the Neural Network TB
is sufficient.

I took a quick look at Algorithm 3.3 in Hastie et al.
It doesn't look familiar. So, I need to dig up my old
references and decide whether or not I want to
tackle it again.

Greg

P.S. Can you recommend a better explanation
on the order of "PLS for Dummies"?

Subject: Hadi&Ling1998 Principal Components Analysis pitfalls?

From: Peter Perkins

Date: 1 Mar, 2010 13:55:19

Message: 10 of 27

On 2/27/2010 1:10 AM, Greg Heath wrote:

> Provided you have a good explanation to work from.
> I coded PLS in MATLAB 5 or more years ago. It was
> tough going because the 2 references I was using were
> neither helpful nor complementary.

I share your pain. Much like Factor Analysis, there are multiple versions, and multiple algorithms, and most references do not make any comparisons. And few talk about PLS from anything other than an algorithm point of view, making it even harder to follow what the differences might be.

> I took a quick look at Algorithm 3.3 in Hastie et al.
> It doesn't look familiar. So, I need to dig up my old
> references and decide whether or not I want to
> tackle it again.

Just for the record, the Statistics Toolbox includes the PLSREGRESS function.

Subject: Hadi&Ling1998 Principal Components Analysis pitfalls?

From: Paige Miller

Date: 1 Mar, 2010 14:21:58

Message: 11 of 27

On Feb 27, 1:10 am, Greg Heath <he...@alumni.brown.edu> wrote:

> Provided you have a good explanation to work from.
> I coded PLS in MATLAB 5 or more years ago. It was
> tough going because the 2 references I was using were
> neither helpful nor complementary.
>
> Unfortunately I lost all of my self written codes when
> my hard-drive crashed. Disgusted, I decided to
> rewrite the lost codes when and only when I had
> to use them.
>
> So far I have convinced myself that using various
> options in STEPWISEFIT and the Neutal Network TB
> is sufficient.
>
> I took a quick look at Algorithm 3.3 in Hastie et al.
> It doesn't look familiar. So, I need to dig up my old
> references and decide whether or not I want to
> tackle it again.
>
> Greg
>
> P.S. Can you recommend a better explanation
> on the order of "PLS for Dummies"?

The algorithm is described quite clearly in an appendix to Kresta,
MacGregor and Marlin (1991), The Canadian Journal of Chemical
Engineering, Volume 69, Issue 1, pp. 35-47.

MacGregor and other co-authors have repeated this algorithm in many
later publications as well.

--
Paige Miller
paige\dot\miller \at\ kodak\dot\com

Subject: Hadi&Ling1998 Principal Components Analysis pitfalls?

From: thebluecliffrecord

Date: 1 Mar, 2010 15:45:12

Message: 12 of 27

On Feb 27, 1:10 am, Greg Heath <he...@alumni.brown.edu> wrote:
> On Feb 25, 9:31 am, Paige Miller <paige.mil...@kodak.com> wrote:
>
> Provided you have a good explanation to work from.
> I coded PLS in MATLAB 5 or more years ago. It was
> tough going because the 2 references I was using were
> neither helpful nor complementary.
>

I just want to point out a publicly available (free) MATLAB file for
PLS and other methods (PCA, PARAFAC, etc.) developed by my colleague
Rasmus Bro and others. I have used it many times.

C. A. Andersson and R. Bro
The N-way Toolbox for MATLAB
Chemometrics & Intelligent Laboratory Systems, 52(1):1-4, 2000.
http://www.models.life.ku.dk/source/nwaytoolbox/

Click the following web address.
http://www.models.life.ku.dk/intranet/dl-monitor/go2engine.asp?get=mf.m&email=sangdonlee@gmail.com&realname=Sangdon+Lee&purpose=Industry&country=us

Hope this helps.

Sangdon Lee

Subject: Hadi&Ling1998 Principal Components Analysis pitfalls?

From: thebluecliffrecord

Date: 1 Mar, 2010 15:47:59

Message: 13 of 27

Dear Greg,

I just want to point out a publicly available (free) MATLAB file for
PLS and other methods (PCA, PARAFAC, etc.) developed by my colleague
Rasmus Bro and others. I have used it many times.

C. A. Andersson and R. Bro
The N-way Toolbox for MATLAB
Chemometrics & Intelligent Laboratory Systems, 52(1):1-4, 2000.
http://www.models.life.ku.dk/source/nwaytoolbox/

Hope this helps.

Sangdon Lee

Subject: Hadi&Ling1998 Principal Components Analysis pitfalls?

From: Ray Koopman

Date: 2 Mar, 2010 01:31:56

Message: 14 of 27

On Mar 1, 6:21 am, Paige Miller <paige.mil...@kodak.com> wrote:
> On Feb 27, 1:10 am, Greg Heath <he...@alumni.brown.edu> wrote:
>
>> Provided you have a good explanation to work from.
>> I coded PLS in MATLAB 5 or more years ago. It was
>> tough going because the 2 references I was using were
>> neither helpful nor complementary.
>>
>> Unfortunately I lost all of my self written codes when
>> my hard-drive crashed. Disgusted, I decided to
>> rewrite the lost codes when and only when I had
>> to use them.
>>
>> So far I have convinced myself that using various
>> options in STEPWISEFIT and the Neutal Network TB
>> is sufficient.
>>
>> I took a quick look at Algorithm 3.3 in Hastie et al.
>> It doesn't look familiar. So, I need to dig up my old
>> references and decide whether or not I want to
>> tackle it again.
>>
>> Greg
>>
>> P.S. Can you recommend a better explanation
>> on the order of "PLS for Dummies"?
>
> The algorithm is described quite clearly in an appendix to Kresta,
> MacGregor and Marlin (1991), The Canadian Journal of Chemical
> Engineering, Volume 69, Issue 1 (p 35-47)
>
> MacGregor and other co-authors have repeated this algorithm in
> many later publications as well.
>
> --
> Paige Miller
> paige\dot\miller \at\ kodak\dot\com

Hastie et al's Algorithm 3.3 is pretty much what I've understood
PLS regression to be, the only difference being the way they define
phihat_{mj} in step 2a. My understanding allows other coefficients
that relate y to x_j^{m-1}, the important constraint being that
they must be bivariate, not multivariate, coefficients.

For those of us who don't have access to the Kresta et al paper,
could you summarize the differences between their algorithm and
Hastie et al's?

Subject: Hadi&Ling1998 Principal Components Analysis pitfalls?

From: Paige Miller

Date: 2 Mar, 2010 13:40:04

Message: 15 of 27

On Mar 1, 8:31 pm, Ray Koopman <koop...@sfu.ca> wrote:

> Hastie et al's Algorithm 3.3 is pretty much what I've understood
> PLS regression to be, the only difference being the way they define
> phihat_{mj} in step 2a. My understanding allows other coefficients
> that relate y to x_j^{m-1}, the important constraint being that
> they must be bivariate, not multivariate, coefficients.
>
> For those of us who don't have access to the Kresta et al paper,
> could you summarize the differences between their algorithm and
> Hastie et al's?

Friedman and co-authors have always had a somewhat idiosyncratic view
of PLS, hence the criticism of the Frank and Friedman paper. I suppose
algorithm 3.3 can also be called idiosyncratic, in the sense that it
uses different letters/symbols for the elements of the PLS algorithm.
I have read many papers on PLS, and the algorithm always looks very
similar to Kresta et al, with consistent use of symbols/letters to
represent the different elements of the PLS algorithm; only Hastie et
al uses different symbols.

So to answer the question ... I have no idea what the differences are
between Hastie et al and all the other PLS algorithms I have seen.

Another perhaps more accessible version of the Kresta et al algorithm
is published in P Nomikos, JF MacGregor, Technometrics, 1995.

--
Paige Miller
paige\dot\miller \at\ kodak\dot\com

Subject: Hadi&Ling1998 Principal Components Analysis pitfalls?

From: root

Date: 2 Mar, 2010 15:30:45

Message: 16 of 27

>
> Friedman and co-authors have always had a somewhat idiosyncratic view
> of PLS, hence the criticism of the Frank and Friedman paper. I suppose
> algorithm 3.3 can also be called idiosyncratic, in the sense that it
> uses different letters/symbols for the elements of the PLS algorithm.
> I have read many papers on PLS, and the algorithm always looks very
> similar to Kresta et al, with consistent use of symbols/letters to
> represent the different elements of the PLS algorithm; only Hastie et
> al uses different symbols.
>
> So to answer the question ... I have no idea what the differences are
> between Hastie et al and all the other PLS algorithms I have seen.
>
> Another perhaps more accessible version of the Kresta et al algorithm
> is published in P Nomikos, JF MacGregor, Technometrics, 1995.
>
> --
> Paige Miller
> paige\dot\miller \at\ kodak\dot\com

I was not familiar with PLS, so I turned to Wikipedia
for a description. There, the general case is described
as a regression of a matrix Y against a matrix X. That
description sounded a lot like what I know as canonical
correlation. Is there some difference that I am missing?

Wikipedia includes a specific example, and an algorithm,
for regressing a vector Y against a matrix X. I haven't
programmed the algorithm, but it looks to me as if
at each stage it includes the element of X that is most
correlated with the current residual; is that correct?
If my understanding of the Wiki article is correct,
then it follows that the first element of X included
in the PLS will remain in all subsequent stages; it
is easily demonstrated that such a method cannot
guarantee optimality at any step after the first.

Any clarification would be appreciated.
TIA

Subject: Hadi&Ling1998 Principal Components Analysis pitfalls?

From: Paige Miller

Date: 2 Mar, 2010 17:16:02

Message: 17 of 27

On Mar 2, 10:30 am, root <NoEM...@home.org> wrote:

> I have not been familiar with PLS, so I turned to Wikipedia
> for a description. There, the general case is described
> as a regression of a matrix Y against a matrix X. That
> description sounded a lot like what I know as canonical
> correlation. Is there some difference that I am missing.
>
> Wikipedia includes a specific example, and an algorithm,
> for regressing a vector Y against a matrix X. I haven't
> programmed the algorithm, but it looks to me as if
> at each stage it includes the element of X which is most
> correlated with the current residual, is that correct?
> If my understanding of the Wiki article is correct,
> then it follows that the first element of X included
> in the PLS will remain in all subsequent stages; it
> is easily demonstrated that such a method cannot
> guarantee optimality at any step after the first.

I have no interest in reading or defending the presentation of PLS in
Wikipedia, so I also don't know what it says or doesn't say.

To answer your questions directly, without reference to Wikipedia:

Canonical correlations looks at correlations between (linear
combinations of the columns of) matrix X and (linear combinations of
the columns of) matrix Y. Similarly, in the univariate case, you can
compute correlation between vector X and vector Y. This contrasts with
regression, where the objective function attempts to minimize the
prediction errors. Thus, univariate regression finds a function of X
that minimizes (sum of squared) prediction errors in Y. In
multivariate regression, PLS specifically, the algorithm finds linear
combinations of X that are "highly predictive" of matrix Y. So to sum
up, with correlations and canonical correlations, minimizing a
function of Y isn't a goal; with regression and PLS, minimizing a
function of Y is a goal.

In the PLS algorithm, both X and Y are "deflated" before the next
dimension is computed. In other words, the second dimension works with
the residuals from the X matrix and residuals from the Y matrix, both
after the first dimension is fit. The third dimension works on the
residuals after two dimensions, and so on.
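
A bare-bones single-response sketch of that deflation (my own, assuming X
is n-by-p and y is n-by-1, both centered; not production code):

ncomp = 3;
[n, p] = size(X);
T = zeros(n, ncomp);  W = zeros(p, ncomp);
for a = 1:ncomp
    w = X'*y;  w = w/norm(w);   % weights: direction of max covariance with y
    t = X*w;                    % scores for this dimension
    pload = X'*t/(t'*t);        % X loadings
    q = y'*t/(t'*t);            % y loading
    X = X - t*pload';           % deflate X ...
    y = y - t*q;                % ... and deflate y before the next dimension
    T(:,a) = t;  W(:,a) = w;
end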

--
Paige Miller
paige\dot\miller \at\ kodak\dot\com

Subject: Hadi&Ling1998 Principal Components Analysis pitfalls?

From: Ray Koopman

Date: 2 Mar, 2010 17:22:05

Message: 18 of 27

On Mar 2, 5:40 am, Paige Miller <paige.mil...@kodak.com> wrote:
> On Mar 1, 8:31 pm, Ray Koopman <koop...@sfu.ca> wrote:
>
>> Hastie et al's Algorithm 3.3 is pretty much what I've understood
>> PLS regression to be, the only difference being the way they define
>> phihat_{mj} in step 2a. My understanding allows other coefficients
>> that relate y to x_j^{m-1}, the important constraint being that
>> they must be bivariate, not multivariate, coefficients.
>>
>> For those of us who don't have access to the Kresta et al paper,
>> could you summarize the differences between their algorithm and
>> Hastie et al's?
>
> Friedman and co-authors have always had a somewhat idiosyncratic view
> of PLS, hence the criticism of the Frank and Friedman paper. I suppose
> algorithm 3.3 can also be called idiosyncratic, in the sense that it
> uses different letters/symbols for the elements of the PLS algorithm.
> I have read many papers on PLS, and the algorithm always looks very
> similar to Kresta et al, with consistent use of symbols/letters to
> represent the different elements of the PLS algorithm; only Hastie et
> al uses different symbols.
>
> So to answer the question ... I have no idea what the differences are
> between Hastie et al and all the other PLS algorithms I have seen.
>
> Another perhaps more accessible version of the Kresta et al algorithm
> is published in P Nomikos, JF MacGregor, Technometrics, 1995.

I now have a copy of Kresta et al, and I agree with 'root': it
describes the 'power method' for doing a Canonical Correlation
Analysis. (For a description of CCA by another method, see
http://groups.google.ca/group/sci.stat.math/msg/0346065ce9ad74cc .)

So the question is how the PLS label got attached to such different
procedures.

Subject: Hadi&Ling1998 Principal Components Analysis pitfalls?

From: Paige Miller

Date: 2 Mar, 2010 17:53:07

Message: 19 of 27

On Mar 2, 12:22 pm, Ray Koopman <koop...@sfu.ca> wrote:

> I now have a copy of Kresta et al, and I agree with 'root': it
> describes the 'power method' for doing a Canonical Correlation
> Analysis. (For a description of CCA by another method, see
> http://groups.google.ca/group/sci.stat.math/msg/0346065ce9ad74cc .)
>
> So the question is how the PLS label got attached to such different
> procedures.

Here is a version of MATLAB code that performs PLS:
http://www.cdpcenter.org/files/plsr/nipals.html

As far as the "power method" for doing a Canonical Correlation goes, this paper
establishes the equivalence of Canonical Correlation and a variation
of PLS known as Orthonormalized PLS: http://ijcai.org/papers09/Papers/IJCAI09-207.pdf.
This paper also explains there is a distinct difference between
Canonical Correlation and general Partial Least Squares: "While CCA
maximizes the correlation of data in the dimensionality-reduced space,
partial least squares (PLS) maximizes their covariance." (And just to
get overly technical, I think that sentence should read PLS maximizes
their squared covariance, so a covariance of -5 is treated the same as
a covariance of 5)

--
Paige Miller
paige\dot\miller at kodak\dot\com

Subject: Hadi&Ling1998 Principal Components Analysis pitfalls?

From: Ray Koopman

Date: 3 Mar, 2010 06:50:09

Message: 20 of 27

On Mar 2, 9:53 am, Paige Miller <paige.mil...@kodak.com> wrote:
> On Mar 2, 12:22 pm, Ray Koopman <koop...@sfu.ca> wrote:
>
>> I now have a copy of Kresta et al, and I agree with 'root': it
>> describes the 'power method' for doing a Canonical Correlation
>> Analysis. (For a description of CCA by another method, see
>> http://groups.google.ca/group/sci.stat.math/msg/0346065ce9ad74cc.)
>>
>> So the question is how the PLS label got attached to such different
>> procedures.

And the answer is ... the two procedures are not that different.
Hastie et al's Algorithm 3.3 is what the algorithm in Kresta et al's
Appendix simplifies to when there is only one Y-variable.

>
> Here is a version of MATLAB code that performs PLS
> http://www.cdpcenter.org/files/plsr/nipals.html
>
> As far as "power method" for doing a Canonical Correlation,
> this paper establishes the equivalence of Canonical Correlation
> and a variation of PLS known as Orthonormalized PLS:
> http://ijcai.org/papers09/Papers/IJCAI09-207.pdf.
> This paper also explains there is a distinct difference between
> Canonical Correlation and general Partial Least Squares: "While CCA
> maximizes the correlation of data in the dimensionality-reduced space,
> partial least squares (PLS) maximizes their covariance." (And just to
> get overly technical, I think that sentence should read PLS maximizes
> their squared covariance, so a covariance of -5 is treated the same as
> a covariance of 5)

I was wrong. Kresta et al's algorithm is not CCA by the power method.

Subject: Hadi&Ling1998 Principal Components Analysis pitfalls?

From: Greg Heath

Date: 3 Mar, 2010 13:56:54

Message: 21 of 27

On Feb 25, 1:40 am, Greg Heath <he...@alumni.brown.edu> wrote:
> On Feb 24, 3:26 pm, Paul <paul.domas...@gmail.com> wrote:
>
> > First, let me qualify my math background -- I am a mere engineer, so I
> > work with cartoon mental pictures way better than algebraic symbols.
>
> I taught engineering, so I know that is not an unconditionally
> logical conclusion.
>
> > I recently had reason to look at PCA (a topic to which I am new) for a
> > complex data set (lots of variables, relationship between them not
> > certain, and data yet to be generated). After concentrated reading on
> > PCA background, I fired up Matlab and ran through their help (and made
> > sense of Biplots).
>
> > Unfortunately, I ran across a fly in the ointment:
> > Hadi & Ling, "Some Cautionary Notes on Use of Principal Components
> > Regression", The American Statistician, Feb. 1998, vol. 52, no. 1.http://www.jstor.org/stable/2685559
>
> > The problem (as I understood it, anyway) applies when you have a set
> > of p input variables and an output variable Y. PCA is applied to the
> > input variables to identify p input components . It turns out that
> > the m most principal components may have very little relation to Y.
> > Instead, it may be the p-m least principal components that best
> > explain the variation of Y.
>
> Cartoon mental picture:
>
> Coyote cutting a road-runner sandwhich with a spoon ...
>
> ... the wrong tool for the job.
>
> PCA can be particularly innappropriate for classification.
> For example,
>
> http://groups.google.com/group/comp.soft-sys.matlab/msg/6fa51506b7aab...
>
> (beware of URL wrap-around)
>
> A more appropriate approach is
>
> 1. PLS (Partial Least Squares).
> I believe MATLAB now offers that in a PLS Toolbox.
>
> Other less rigororous alternatives:
>
> 2. Analysis of the correlation coefficient matrix
> corrcoef([X;y]) and it's eigenstructure.
>
> 3. Use of the function STEPWISEFIT or it's GUI version,
> STEPWISE.
>
> I tend to use 2 and 3 together (partly because I'm
> too poor to buy the PLS Toolbox),

Although this is suboptimal, let me clarify:

The PCs are the orthonormal eigenvectors of corrcoef(X).
However, instead of ranking them by eigenvalue,
they are ranked by their importance in stagewise
regression obtained via the misnamed MATLAB function
STEPWISEFIT. (Misnamed because "stepwise" implies a
very suboptimal greedy forward or backward search,
whereas "stagewise" allows discarding a previously
entered variable whenever its contribution is superseded
by its correlation to a newly chosen variable
... it's a moot point here because the PCs are uncorrelated).
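
In MATLAB terms, something like this (a sketch, assuming the Statistics
Toolbox, X as n-by-p with observations in rows, and y as n-by-1):

[V, D] = eig(corrcoef(X));                    % orthonormal eigenvectors = the PCs
Z = zscore(X)*V;                              % scores on all p components (mutually uncorrelated)
[b, se, pval, inmodel] = stepwisefit(Z, y);   % rank/keep components by regression importance
find(inmodel)                                 % components retained, regardless of eigenvalue order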

Hope this helps.

Greg

Subject: Hadi&Ling1998 Principal Components Analysis pitfalls?

From: Greg Heath

Date: 3 Mar, 2010 14:47:08

Message: 22 of 27

On Mar 2, 10:30 am, root <NoEM...@home.org> wrote:
> > Friedman and co-authors have always had a somewhat idiosyncratic view
> > of PLS, hence the criticism of the Frank and Friedman paper. I suppose
> > algorithm 3.3 can also be called idiosyncratic, in the sense that it
> > uses different letters/symbols for the elements of the PLS algorithm.
> > I have read many papers on PLS, and the algorithm always looks very
> > similar to Kresta et al, with consistent use of symbols/letters to
> > represent the different elements of the PLS algorithm; only Hastie et
> > al uses different symbols.
>
> > So to answer the question ... I have no idea what the differences are
> > between Hastie et al and all the other PLS algorithms I have seen.
>
> > Another perhaps more accessible version of the Kresta et al algorithm
> > is published in P Nomikos, JF MacGregor, Technometrics, 1995.
>
> > --
> > Paige Miller
> > paige\dot\miller \at\ kodak\dot\com
>
> I have not been familiar with PLS, so I turned to Wikipedia
> for a description. There, the general case is described
> as a regression of a matrix Y against a matrix X. That
> description sounded a lot like what I know as canonical
> correlation. Is there some difference that I am missing.
>
> Wikipedia includes a specific example, and an algorithm,
> for regressing a vector Y against a matrix X. I haven't
> programmed the algorithm, but it looks to me as if
> at each stage it includes the element of X which is most
> correlated with the current residual, is that correct?
> If my understanding of the Wiki article is correct,
> then it follows that the first element of X included
> in the PLS will remain in all subsequent stages; it
> is easily demonstrated that such a method cannot
> guarantee optimality at any step after the first.
>
> Any clarification would be appreciated.
> TIA

At the end of each stage both unchosen predictors
and responses should be regressed against the
chosen predictors so that the beginning of the next
stage only considers predictor and response residuals.

This mitigates the correlation backside bite in a manner
similar to Gram-Schmidt orthogonalization.

Hope this helps.

Greg

Subject: Hadi&Ling1998 Principal Components Analysis pitfalls?

From: root

Date: 3 Mar, 2010 15:10:15

Message: 23 of 27

Greg Heath <heath@alumni.brown.edu> wrote:
>
> At the end of each stage both unchosen predictors
> and responses should be regressed against the
> chosen predictors so that the beginning of the next
> stage only considers predictor and response residuals.
>
> This mitigates the correlation backside bite in a manner
> simlilar Graham-Schmidt orthogonalization.
>
> Hope this helps.
>
> Greg

I'm sorry, it really doesn't help. In Gram-Schmidt, the
first vector remains forever, which I suspect is what
happens with the procedure described in the PLS Wiki.

At the risk of repeating myself, the first included
regression variable need not be present in a "best"
three-factor regression, say.

Subject: Hadi&Ling1998 Principal Components Analysis pitfalls?

From: Greg Heath

Date: 3 Mar, 2010 16:25:28

Message: 24 of 27

On Mar 2, 12:53 pm, Paige Miller <paige.mil...@kodak.com> wrote:
> On Mar 2, 12:22 pm, Ray Koopman <koop...@sfu.ca> wrote:
>
> > I now have a copy of Kresta et al, and I agree with 'root': it
> > describes the 'power method' for doing a Canonical Correlation
> > Analysis. (For a description of CCA by another method, see
> > http://groups.google.ca/group/sci.stat.math/msg/0346065ce9ad74cc .)
>
> > So the question is how the PLS label got attached to such different
> > procedures.
>
> Here is a version of MATLAB code that performs PLS:
> http://www.cdpcenter.org/files/plsr/nipals.html
>
> As far as "power method" for doing a Canonical Correlation, this paper
> establishes the equivalence of Canonical Correlation and a variation
> of PLS known as Orthonormalized PLS:
> http://ijcai.org/papers09/Papers/IJCAI09-207.pdf.
> This paper also explains there is a distinct difference between
> Canonical Correlation and general Partial Least Squares: "While CCA
> maximizes the correlation of data in the dimensionality-reduced space,
> partial least squares (PLS) maximizes their covariance." (And just to
> get overly technical, I think that sentence should read PLS maximizes
> their squared covariance, so a covariance of -5 is treated the same as
> a covariance of 5)

Are they the same when all original variables are
normalized to unit variance?

Hope this helps.

Greg

Subject: Hadi&Ling1998 Principal Components Analysis pitfalls?

From: Paige Miller

Date: 3 Mar, 2010 16:58:24

Message: 25 of 27

On Mar 3, 11:25 am, Greg Heath <he...@alumni.brown.edu> wrote:
> On Mar 2, 12:53 pm, Paige Miller <paige.mil...@kodak.com> wrote:
>
>
>
> > On Mar 2, 12:22 pm, Ray Koopman <koop...@sfu.ca> wrote:
>
> > > I now have a copy of Kresta et al, and I agree with 'root': it
> > > describes the 'power method' for doing a Canonical Correlation
> > > Analysis. (For a description of CCA by another method, see
> > > http://groups.google.ca/group/sci.stat.math/msg/0346065ce9ad74cc .)
>
> > > So the question is how the PLS label got attached to such different
> > > procedures.
>
> > Here is a version of MATLAB code that performs PLS:
> > http://www.cdpcenter.org/files/plsr/nipals.html
>
> > As far as "power method" for doing a Canonical Correlation, this paper
> > establishes the equivalence of Canonical Correlation and a variation
> > of PLS known as Orthonormalized PLS:
> > http://ijcai.org/papers09/Papers/IJCAI09-207.pdf.
> > This paper also explains there is a distinct difference between
> > Canonical Correlation and general Partial Least Squares: "While CCA
> > maximizes the correlation of data in the dimensionality-reduced space,
> > partial least squares (PLS) maximizes their covariance." (And just to
> > get overly technical, I think that sentence should read PLS maximizes
> > their squared covariance, so a covariance of -5 is treated the same as
> > a covariance of 5)
>
> Are they the same when all original variables are
> normalized to unit variance?
>
> Hope this helps.
>
> Greg

I don't think so. According to the paper I cited, CCA is the same as
orthonormalized PLS, not normalized PLS.

--
Paige Miller
paige\dot\miller \at\ kodak\dot\com

Subject: Hadi&Ling1998 Principal Components Analysis pitfalls?

From: Ray Koopman

Date: 4 Mar, 2010 07:16:47

Message: 26 of 27

On Mar 2, 10:50 pm, Ray Koopman <koop...@sfu.ca> wrote:
> [...]
> I was wrong. Kresta et al's algorithm is not CCA by the power method.

In Kresta et al's algorithm, w and q at convergence are the left and
right singular vectors corresponding to the largest singular value of
X'Y, or (equivalently) the eigenvectors corresponding to the largest
eigenvalue of X'YY'X and Y'XX'Y. After n iterations, the unnormalized
q is (Y'XX'Y)^n * q_0, where q_0 is the initial q -- all zeros, except
for a single one -- implied by taking the initial u to be one of the
columns of Y. That's the classical "power method" for getting the
eigenvector corresponding to the largest eigenvalue.

Note that X and Y in the above are the original X and Y only for the
first (w,q) pair; for subsequent pairs X and Y are residuals, as per
steps 11 & 12. This means that we cannot short-cut the process by
getting all the singular vectors of the original X'Y.
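
A quick numerical check of that claim (a made-up sketch; X and Y are just
random matrices, and the loop follows the two-block iteration as I read it):

X = randn(100,6);  Y = randn(100,3);      % made-up data just to check the algebra
[U, S, V] = svd(X'*Y);
u = Y(:,1);                               % initial u = a column of Y
for k = 1:500                             % power-method iterations
    w = X'*u;  w = w/norm(w);
    t = X*w;
    q = Y'*t;  q = q/norm(q);
    u = Y*q;
end
[w U(:,1)]     % w matches the leading left singular vector of X'Y (up to sign)
[q V(:,1)]     % q matches the leading right singular vector (up to sign)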

Subject: Hadi&Ling1998 Principal Components Analysis pitfalls?

From: Johann Hibschman

Date: 4 Mar, 2010 15:30:23

Message: 27 of 27

Paige Miller <paige.miller@kodak.com> writes:

> As far as "power method" for doing a Canonical Correlation, this paper
> establishes the equivalence of Canonical Correlation and a variation
> of PLS known as Orthonormalized PLS:
> http://ijcai.org/papers09/Papers/IJCAI09-207.pdf. This paper also
> explains there is a distinct difference between Canonical Correlation
> and general Partial Least Squares: "While CCA maximizes the
> correlation of data in the dimensionality-reduced space, partial least
> squares (PLS) maximizes their covariance." (And just to get overly
> technical, I think that sentence should read PLS maximizes their
> squared covariance, so a covariance of -5 is treated the same as a
> covariance of 5)

That's funny. I just added that paper to my "to read" stack last
week, after scratching my head for a while trying to understand the
difference between PLS and CCA.

Thanks for bringing my attention back to it, now with a bit more context.

-Johann
