There is no consensus on how best to validate the choice of the number of PLS components. However, there are several metrics you can use: root mean squared error of cross-validation (RMSECV), prediction residual error sum of squares (PRESS), R^2, and Q^2. Some references on choosing the number of PLS components:
http://scholar.google.com/scholar?q=A+comparison+of+partial+least+squares+regression+with+other+prediction+methods&hl=en&as_sdt=0&as_vis=1&oi=scholart
Multiway Analysis with Applications in the Chemical Sciences, by Age Smilde.
Some people use classification accuracy as the criterion for choosing the number of PLS components, but as far as I know that approach has not been verified.
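
To make the metrics above concrete, here is a minimal sketch (plain Python, not MATLAB) of how PRESS, RMSECV and Q^2 could be computed from cross-validated predictions and used to pick a component count. The function names, the "smallest PRESS wins" rule and the numbers are my own assumptions for illustration; they do not come from plsregress or from the references above. In MATLAB you would normally get a cross-validated MSE directly from plsregress via its 'CV' option.

```python
import math

def cv_metrics(y_true, y_cv_pred):
    """Return (PRESS, RMSECV, Q^2) for one candidate number of components.

    y_cv_pred holds the cross-validated prediction for each sample, i.e. the
    prediction from a model that did not see that sample during fitting.
    """
    n = len(y_true)
    y_mean = sum(y_true) / n
    # PRESS: sum of squared cross-validated prediction errors
    press = sum((y - p) ** 2 for y, p in zip(y_true, y_cv_pred))
    # Total sum of squares of the response around its mean
    ss_tot = sum((y - y_mean) ** 2 for y in y_true)
    rmsecv = math.sqrt(press / n)
    q2 = 1.0 - press / ss_tot
    return press, rmsecv, q2

def pick_n_components(y_true, cv_preds_by_ncomp):
    """Hypothetical helper: component count with the smallest PRESS.

    Ties go to the smaller count because candidates are visited in ascending
    order and only a strictly smaller PRESS replaces the current best.
    """
    best_ncomp, best_press = None, None
    for ncomp in sorted(cv_preds_by_ncomp):
        press, _, _ = cv_metrics(y_true, cv_preds_by_ncomp[ncomp])
        if best_press is None or press < best_press:
            best_ncomp, best_press = ncomp, press
    return best_ncomp

# Made-up response values and cross-validated predictions (illustration only)
y = [1.0, 2.0, 3.0, 4.0]
cv_preds = {
    1: [1.5, 1.5, 3.5, 3.5],
    2: [1.0, 2.0, 3.0, 4.5],
    3: [1.0, 2.0, 2.0, 5.0],
}

for ncomp in sorted(cv_preds):
    press, rmsecv, q2 = cv_metrics(y, cv_preds[ncomp])
    print(ncomp, press, rmsecv, q2)
print("chosen:", pick_n_components(y, cv_preds))
```

Taking the plain minimum is only one possible rule; many people prefer the smallest number of components after which the cross-validated error stops improving noticeably.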
Hope this helps.
"Frank Sabouri" <Frank.Sabouri@gmail.com> wrote in message <i5171m$p0k$1@fred.mathworks.com>...
> Hi all,
>
> In partial least squares (plsregress), we need to define number of predictors or number of components in function to avoid overfitting. I noticed that people plot either PCTVAR or MSE versus number of principal components. If we have “n” predictors the size of PCTVAR and MSE are 2-by-n and 2-by-(n+1), respectively. I have some questions:
>
> 1. The question is whether PCTVAR or MSE should be considered to define number of “n” in “plsregress” function. May you please explain why?
>
> 2. If I would plot MSE vs. number of components or “n”, but not “n+1”; which column of the data (MSE) should be removed #1 or #n+1.
>
> 3. To define the number of components, I need to know whether we need to look at the first row (predictors) of either PCTVAR or MSE, or whether we need to plot the 2nd row (responses) of either PCTVAR or MSE.
>
> Regards,
> Frank
