This m-file returns a useful residual scaling, the prediction error sum of squares (PRESS). To calculate PRESS, select an observation i. Fit the regression model to the remaining n-1 observations and use this equation to predict the withheld observation y_i. Denoting this predicted value by ye_(i), we may find the prediction error for point i as e_(i)=y_i - ye_(i). The prediction error is often called the ith PRESS residual. This procedure is repeated for each observation i = 1,2,...,n, producing a set of n PRESS residuals e_(1),e_(2),...,e_(n). Then the PRESS statistic is defined as the sum of squares of the n PRESS residuals as in,
PRESS = i_Sum_n e_(i)^2 = i_Sum_n [y_i - ye_(i)]^2
Thus PRESS uses such possible subset of n-1 observations as an estimation data set, and every observation in turn is used to form a prediction data set. In the construction of this m-file, we use this statistical approach.
As we have seen that calculating PRESS requires fitting n different regressions, also it is possible to calculate it from the results of a single least squares fit to all n observations. It turns out that the ith PRESS residual is,
e_(i) = e_i/(1 - h_ii)
Thus, because PRESS is just the sum of the squares of the PRESS residuals, a simple computing formula is
PRESS = i_Sum_n [e_i/(1 - h_ii)]^2
It is easy to see that the PRESS residual is just the ordinary residual weighted according to the diagonal elements of the hat matrix h_ii. Also, for all the interested people, here we just indicate, in an inactive form, this statistical approaching.
Data points for which h_ii are large will have large PRESS residuals. These observations will generally be high influence points. Generally, a large difference between the ordinary residual and the PRESS residual will indicate a point where the model fits the data well, but a model built without that point predicts poorly.
This is also known as leave-one-out cross-validation (LOOCV) in linear models as a measure of the accuracy. [an anon's suggestion]
In order to improve the matrix script for avoiding the squares condition number in the regression parameter estimation are by using a pivoted QR factorization of X.
Syntax: function x = press(D)
Inputs:
D - matrix data (=[X Y]) (last column must be the Y-dependent variable).
(X-independent variables).
Output:
x - prediction error sum of squares (PRESS). |