This mfile returns a useful residual scaling, the prediction error sum of squares (PRESS). To calculate PRESS, select an observation i. Fit the regression model to the remaining n1 observations and use this equation to predict the withheld observation y_i. Denoting this predicted value by ye_(i), we may find the prediction error for point i as e_(i)=y_i  ye_(i). The prediction error is often called the ith PRESS residual. This procedure is repeated for each observation i = 1,2,...,n, producing a set of n PRESS residuals e_(1),e_(2),...,e_(n). Then the PRESS statistic is defined as the sum of squares of the n PRESS residuals as in,
PRESS = i_Sum_n e_(i)^2 = i_Sum_n [y_i  ye_(i)]^2
Thus PRESS uses such possible subset of n1 observations as an estimation data set, and every observation in turn is used to form a prediction data set. In the construction of this mfile, we use this statistical approach.
As we have seen that calculating PRESS requires fitting n different regressions, also it is possible to calculate it from the results of a single least squares fit to all n observations. It turns out that the ith PRESS residual is,
e_(i) = e_i/(1  h_ii)
Thus, because PRESS is just the sum of the squares of the PRESS residuals, a simple computing formula is
PRESS = i_Sum_n [e_i/(1  h_ii)]^2
It is easy to see that the PRESS residual is just the ordinary residual weighted according to the diagonal elements of the hat matrix h_ii. Also, for all the interested people, here we just indicate, in an inactive form, this statistical approaching.
Data points for which h_ii are large will have large PRESS residuals. These observations will generally be high influence points. Generally, a large difference between the ordinary residual and the PRESS residual will indicate a point where the model fits the data well, but a model built without that point predicts poorly.
This is also known as leaveoneout crossvalidation (LOOCV) in linear models as a measure of the accuracy. [an anon's suggestion]
In order to improve the matrix script for avoiding the squares condition number in the regression parameter estimation are by using a pivoted QR factorization of X.
Syntax: function x = press(D)
Inputs:
D  matrix data (=[X Y]) (last column must be the Ydependent variable).
(Xindependent variables).
Output:
x  prediction error sum of squares (PRESS).
1.5  Text was improved. [The anon suggestion was taken into account.] 

1.3  Text was improved. 

1.1  Code was improved. Taking into account the mathematical suggestions done by Bart and John D'Errico (08262013). 
giuseppe (view profile)
Hi. I tried to use a RBF Network to fit my data. In this case, should the Design Matrix H my input data?
I mean: D = [H;y]?
Thanks
Antonio TrujilloOrtiz (view profile)
annon. Yes, it is.
Anon (view profile)
Refers PRESS to the wellknown leaveoneoutcrossvalidation? If yes, it would be good to indicate so in the description.
John D'Errico (view profile)
Thanks for fixing it. I'm happy now.
Antonio TrujilloOrtiz (view profile)
There were already done to this mfile the valuable suggestions done by Bart and John D'Errico. Thx
Antonio TrujilloOrtiz (view profile)
Hi John,
Thanks for your broad and useful mathematical fundament comment. I'll review the file script and take into account those in order to improve it.
Antonio TrujilloOrtiz
John D'Errico (view profile)
Bart  Worse, this uses a poor form to solve the least squares problem. Here is the line in question in press:
b = inv(X'*X)*(X'*Y); %least squares parameters estimation (=X\Y)
This is arguably bad because it uses inv, but FAR worse, it uses the normal equations. That is the WRONG way to solve the least squares problem.
Faster, more efficient, and more accurate is:
b = X\Y;
This works for nonsquare matrices X. Why is it better? Because the form in press forms X'*X, which squares the condition number of the problem. So even moderately poorly conditioned problems become vastly more illconditioned. You can lose a lot of accuracy using that form.
Instead, the X\Y form uses a pivoted QR factorization of X, which avoids squaring the condition number.
As far as the other lines in question, they too can be greatly improved. For example, this line:
H = X*inv(X'*X)*X'; %hat matrix
Again, this forms X'*X, then calls inv. A better solution might be to compute the QR factors:
[Q,R] = qr(X);
We know that X = Q*R. Therefore,
X'*X = R'*Q'*Q*R
But Q is orthogonal, so Q'*Q is an identity matrix. So we have that
X'*X = R'*R
As importantly,
inv(X'*X) = inv(R'*R)
If the matrix X'*X is invertible, then R is better conditioned, and more easily inverted. So one would be better off forming inv(X'*X) as inv(R)*inv(R)':
k = size(X,2);
r = inv(R(1:k,:));
r*r'
While this does indeed use a matrix inverse, at least it is throwing inv at the unsquared matrix.
Even better is to use the column pivoted QR here. So
[Q,R,E] = qr(X);
We can use that factorization to give both the least squares solution in lieu of \ as well as the hat matrix.
From the help to QR:
If A is full:
[Q,R,E] = qr(A) produces unitary Q, upper triangular R and a
permutation matrix E so that A*E = Q*R. The column permutation E is
chosen so that ABS(DIAG(R)) is decreasing.
So we know that
X*E = Q*R
E is a permutation matrix, so we have that E*E' is an identity. Therefore,
X = Q*R*E'
So the least squares solution can be gained from the system of equations:
Q*R*E'*b = Y
R*E'*b = Q'*Y
E'*b = R\(Q'*Y)
Finally,
b = E*(R\(Q'*Y));
Note that since R is upper triangular, MATLAB will be smart about the use of \ to do a backsubstitution to solve the system.
The hat matrix also falls out nicely enough from that pivoted QR, where all can be done efficiently and using wellposed numerical operations.
MatlabUser (view profile)
Giving this a little more thought, I realized that the use of (pseudo) inverse for multiplication is best avoided, both in press() and in general. To quote the R2013a help:
"A frequent misuse of inv arises when solving the system of linear equations
Ax = b. One way to solve this is with
x = inv(A)*b. A better way, from both an execution time and numerical accuracy standpoint, is to use the matrix division operator x = A\b. This produces the solution using Gaussian elimination, without forming the inverse."
MatlabUser (view profile)
Thank you for this file. Very helpful.
I use it in slightly modified form, as I think the original here has some room for improvement:
1) The line: ye = [1 D(i,1:2)]*b;
can be changed into: ye = [1 D(i,1:c1)]*b;
This will make the function work for X with more than two columns.
2) The line: H = X*inv(X'*X)*X'; %hat matrix
This is problematic because of inv(). Current form may match textbook form most closely, but it gives unexpected results. I have had cases where the 'loop method' result is orders of magnitude different from the 'matrix method' result. Chanching inv() to pinv() may be an improvement. Better still may be to incorporate the work of Prof. Tim Davis: http://www.mathworks.nl/matlabcentral/fileexchange/24119
Mind you, I am no expert on matrix operations. I just noticed that multiplying by inv() is not very 'robust'.
Leidy (view profile)
Thanks
David Herrera (view profile)
Correction: My computation of PRESS was wrong, and Mr. Antonio TrujilloOrtiz was absolutely correct in the calculation of PRESS from his code. The problem was that I forgot to place a minus in front of one of the numbers, so I got a slightly different answer. Independently with different code, I got the same answer as Antonio TrujilloOrtiz, which was 22225.0575. I apologize for the mistake.
David Herrera (view profile)
The correct answer is 22225.0, which is correctly stated in the comment section of this code. However, the code does not compute this. Revision needed to compute PRESS value.
David Herrera (view profile)
This code does not give the correct answer to the problem stated in Response Surface Methodology by Montgomery & Myers, page 48, which is 22,225. This code gives 21265.2514 by the two different methods in the code. It needs revision. See output below (some variables were printed):
Calculation of PRESS
n = 14
r = 13, c = 3
r = 13, c = 3
r = 13, c = 3
r = 13, c = 3
r = 13, c = 3
r = 13, c = 3
r = 13, c = 3
r = 13, c = 3
r = 13, c = 3
r = 13, c = 3
r = 13, c = 3
r = 13, c = 3
r = 13, c = 3
r = 13, c = 3
PRESS = 21265.2514
hii = 0.358818
hii = 0.379259
hii = 0.323410
hii = 0.294624
hii = 0.094723
hii = 0.137715
hii = 0.142682
hii = 0.242150
hii = 0.236885
hii = 0.202703
hii = 0.209627
hii = 0.073729
hii = 0.225672
hii = 0.078004
2nd PRESS = 21265.2514
>>