Partial Least-Squares and Discriminant Analysis

version 1.0 (32.4 KB) by

A tutorial and tool using PLS for discriminant analysis.

7 Ratings



Partial Least-Squares (PLS) is a widely used technique in many areas. This package provides a function that performs PLS regression using the Nonlinear Iterative Partial Least-Squares (NIPALS) algorithm. It also includes a tutorial that explains the NIPALS algorithm and shows how to perform discriminant analysis with the PLS function.

The difference between ordinary least squares regression (called total least squares regression here) and partial least squares regression can be explained as follows:

For given independent data X and dependent data Y, to fit a model

Y = X*B + E

the total least squares regression chooses B to minimize the error in the least-squares sense:

J = E'*E
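
As a minimal sketch of the direct fit, assuming small synthetic data (the names n, p, m and Btrue are illustrative and not part of the package), the backslash operator returns the B that minimizes the squared residuals. When Y has several columns, E'*E is a matrix, so the sketch takes its trace to obtain a scalar J.

```matlab
% Direct least-squares fit of Y = X*B + E on synthetic data (illustrative only).
rng(0);                           % reproducible random numbers
n = 50; p = 3; m = 2;             % samples, X columns, Y columns
X = randn(n, p);                  % independent data
Btrue = [1 -2; 0.5 0; 3 1];       % coefficients used to generate Y
Y = X*Btrue + 0.01*randn(n, m);   % dependent data with a little noise

B = X \ Y;                        % least-squares solution for B
E = Y - X*B;                      % residual matrix
J = trace(E'*E);                  % sum of squared residuals over all responses
```

With this noise level, B should land close to Btrue, and X'*E is zero up to rounding, which is the normal-equation condition for the minimum.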

Instead of fitting a model between X and Y directly, PLS first decomposes X and Y in a low-dimensional space (the so-called latent variable space):

X = T*P' + E0, and
Y = U*Q' + F0

where P and Q are orthogonal matrices, i.e. P'*P=I and Q'*Q=I, and T and U have the same number of columns, a, which is much smaller than the number of columns of X. A least squares regression is then performed between T and U:

U = T*B + F1

At the end, the overall regression model is

Y = X*(P*B*Q') + F

i.e. the overall regression coefficient is P*B*Q'.
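
The decomposition and inner regression above can be sketched with a textbook NIPALS loop. This is an illustration on synthetic, mean-centred data and is not taken from the package's pls function; the tolerance, the iteration cap and the names X0, Y0, C_desc and C_w are assumptions. The sketch forms the coefficient both as written above (P*B*Q') and in the weight-based form W/(P'*W)*B*Q' that one of the comments on this page recommends.

```matlab
% Textbook NIPALS PLS sketch on synthetic data (illustrative, not the package code).
rng(1);                                    % reproducible synthetic data
n = 40; a = 2;                             % samples, latent variables kept
X = randn(n, 5);
Y = X(:, 1:2)*[1 0.5; -1 2] + 0.05*randn(n, 2);
X = bsxfun(@minus, X, mean(X));            % centre the columns of X
Y = bsxfun(@minus, Y, mean(Y));            % centre the columns of Y
X0 = X;  Y0 = Y;                           % keep copies; X and Y are deflated below

T = zeros(n, a);  U = zeros(n, a);
P = zeros(size(X, 2), a);  W = P;
Q = zeros(size(Y, 2), a);  B = zeros(a);

for h = 1:a
    u = Y(:, 1);                           % starting Y score
    t_old = zeros(n, 1);
    for it = 1:500
        w = X'*u;  w = w/norm(w);          % X weights
        t = X*w;                           % X scores
        q = Y'*t;  q = q/norm(q);          % Y loadings
        u = Y*q;                           % Y scores
        if norm(t - t_old) < 1e-10*norm(t), break; end
        t_old = t;
    end
    p = X'*t/(t'*t);                       % X loadings
    b = (u'*t)/(t'*t);                     % inner regression of u on t
    X = X - t*p';                          % deflate X
    Y = Y - b*t*q';                        % deflate Y
    T(:, h) = t;  U(:, h) = u;  P(:, h) = p;
    W(:, h) = w;  Q(:, h) = q;  B(h, h) = b;
end

C_desc = P*B*Q';                           % coefficient as written in the description
C_w    = (W/(P'*W))*B*Q';                  % weight-based form of the coefficient
```

In this sketch X0*C_w reproduces the fitted values T*B*Q' up to rounding, because T equals X0*W/(P'*W) by construction, whereas X0*C_desc generally differs from them.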

The reason to use PLS instead of total LS regression is that the data sets X and Y may contain random noise, which should be excluded from the regression. Decomposing X and Y into the latent space ensures that the regression is based on the most reliable variation.

Comments and Ratings (12)

Maggie Zhai

very good

excellent code,
On NIPALS for PCA, what’s the basis for tol2=(1-0.95)*5*(10-1)? If I had a matrix of (20,100) would tol2 be calculated as =(1-0.95)*100*(20-1)?
Can we use RSq instead of tol2? With the following calculation?
VarE = var(X,0,2);
VarX = var((T*P'+X),0,2);
RSq(r)= 1-((VarE)'/(VarX)');
if RSq(r)<0.95

Oskar Vivero

Illustrative code of Wold's PLS algorithm based on Geladi and Kowalski 1988 paper. The predictor in the example is incorrect. You state the prediction Y_hat_new = (X_new*P)*B*Q', which yields an error norm(Y_new-Y_hat_new)=0.187. The correct predictor is Y_hat_new = X_new* (W/(P'*W))*B*Q'.

Ramy Baly

Hi, I am really wondering how to use this code to predict the response variable. Is it like this:
- I get the BETA values from applying PLS on some training data
- I multiply the BETA with the testing data to get the predicted (Y) ??

Or is there some kind of iteration, such as picking only the components with higher BETAs?

Yi Cao

ncomp? No such variable in my code.


Matlabus Ach

I just did that and I have two questions:
What does the number ncomp mean and how can we define it?
Then how can I use the results to determine which variables are important towards the output, as I get a matrix of weights?
my X is 220 * 33
my Y is 220 * 1

V. Poor


Paul

Su, I believe you can use the PLS algorithm directly. Look at the example discussed in the HTML file - the IRIS data set - where the Y responses are all binary.
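
A minimal sketch of that idea, using dummy-coded (binary) class indicators as Y, is given here. It swaps in plsregress and the fisheriris data from the Statistics and Machine Learning Toolbox rather than this package's own PLS function, and the choice ncomp = 2 is an assumption made only for illustration.

```matlab
% PLS discriminant analysis sketch with binary indicator responses.
% Assumes the Statistics and Machine Learning Toolbox (plsregress, fisheriris).
load fisheriris                                   % meas (150x4), species (150x1 cell)
[classes, ~, g] = unique(species);                % g: class index per sample
Y = double(bsxfun(@eq, g, 1:numel(classes)));     % one binary column per class

ncomp = 2;                                        % number of latent variables (assumed)
[~, ~, ~, ~, BETA] = plsregress(meas, Y, ncomp);  % BETA includes an intercept row

Yhat = [ones(size(meas, 1), 1) meas]*BETA;        % predicted class indicators
[~, pred] = max(Yhat, [], 2);                     % pick the class with the largest response
accuracy = mean(pred == g);                       % training accuracy (optimistic)
```

New samples would be classified the same way: prepend a column of ones, multiply by BETA, and take the column with the largest predicted value. The accuracy here is computed on the training data, so a held-out set or cross-validation would give a fairer estimate.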


Su

I have a general question about PLS regression that confuses me:

Suppose the response variable Y is binary. Can we run a PLS regression on it directly, or do we need to resort to a logistic version?


kevin chen

Are you also interested in the convolution algorithms in Reading's Modulated Differential Scanning Calorimetry? -- I read a lot of books and technical articles, but only got confusion: how to deconvolute the modulated profile into reversible and non-reversible parts?

kevin chen

It is excellent for a PLS beginner like me, but is this a non-linear PLS algorithm, or only PLS1?

fielen cathnic



update pls function

update description

update the example file.

update description

MATLAB Release
MATLAB 7.5 (R2007b)
