File Size: 3.13 KB File ID: #14564

press

by Antonio Trujillo-Ortiz

 

09 Apr 2007 (Updated 11 Apr 2007)

Code covered by BSD License  

Prediction error sum of squares.


File Information
Description

This m-file returns a useful residual scaling, the prediction error sum of squares (PRESS). To calculate PRESS, select an observation i. Fit the regression model to the remaining n-1 observations and use the fitted equation to predict the withheld observation y_i. Denoting this predicted value by ye_(i), the prediction error for point i is e_(i) = y_i - ye_(i). This prediction error is often called the ith PRESS residual. The procedure is repeated for each observation i = 1,2,...,n, producing a set of n PRESS residuals e_(1),e_(2),...,e_(n). The PRESS statistic is then defined as the sum of squares of the n PRESS residuals:

PRESS = Sum(i=1..n) e_(i)^2 = Sum(i=1..n) [y_i - ye_(i)]^2

Thus PRESS uses each possible subset of n-1 observations as an estimation data set, and every observation in turn is used to form a prediction data set. This m-file is built on this statistical approach.
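The leave-one-out procedure described above can be sketched in a few lines. The following is a minimal illustration in Python using only the standard library; it is not the author's m-file. The helper names (solve, fit_ols, predict, press_loo), the inclusion of an intercept term, and the tiny data set are all assumptions made for this example.

```python
def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented copy, A is not modified
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def fit_ols(X, y):
    """Least squares coefficients via the normal equations (intercept assumed)."""
    Z = [[1.0] + list(row) for row in X]  # prepend the intercept column
    p = len(Z[0])
    A = [[sum(z[j] * z[k] for z in Z) for k in range(p)] for j in range(p)]
    b = [sum(z[j] * yi for z, yi in zip(Z, y)) for j in range(p)]
    return solve(A, b)


def predict(beta, row):
    """Predicted response for one row of independent variables."""
    return beta[0] + sum(bj * v for bj, v in zip(beta[1:], row))


def press_loo(X, y):
    """PRESS by refitting the model n times, withholding one observation each time."""
    total = 0.0
    for i in range(len(y)):
        beta = fit_ols(X[:i] + X[i + 1:], y[:i] + y[i + 1:])  # fit on the other n-1 points
        e_press = y[i] - predict(beta, X[i])                  # e_(i) = y_i - ye_(i)
        total += e_press ** 2
    return total


if __name__ == "__main__":
    # Illustrative data only: three points, one independent variable.
    X = [[0.0], [1.0], [2.0]]
    y = [0.0, 1.0, 0.0]
    print(press_loo(X, y))
```

For this toy data the three PRESS residuals work out by hand to -2, 1 and -2, so PRESS is 9 up to floating-point rounding. The normal-equations solver keeps the sketch self-contained; for ill-conditioned data a QR-based least squares routine would be the safer choice.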

Although calculating PRESS as described above requires fitting n different regressions, it can also be calculated from the results of a single least squares fit to all n observations. It turns out that the ith PRESS residual is

e_(i) = e_i/(1 - h_ii)
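A short derivation may help to see where this identity comes from. The sketch below assumes the usual linear model setup, with symbols that are introduced here for illustration and do not appear in the original description: X is the model matrix with rows x_i', b is the least squares estimate from all n observations, and b_(i) is the estimate with observation i withheld.

```latex
\begin{align*}
h_{ii} &= x_i^\top (X^\top X)^{-1} x_i, \qquad e_i = y_i - x_i^\top b \\
b_{(i)} &= b - \frac{(X^\top X)^{-1} x_i \, e_i}{1 - h_{ii}}
  \quad \text{(from the Sherman--Morrison rank-one update)} \\
e_{(i)} &= y_i - x_i^\top b_{(i)}
  = e_i + \frac{h_{ii} \, e_i}{1 - h_{ii}}
  = \frac{e_i}{1 - h_{ii}}
\end{align*}
```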

Thus, because PRESS is just the sum of the squares of the PRESS residuals, a simple computing formula is

PRESS = Sum(i=1..n) [e_i/(1 - h_ii)]^2

It is easy to see that the PRESS residual is just the ordinary residual e_i weighted according to h_ii, the ith diagonal element of the hat matrix H = X(X'X)^(-1)X'. For interested readers, the m-file also indicates this second statistical approach, in an inactive form.
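The single-fit computing formula can be sketched as follows, again as a standard-library Python illustration rather than the author's code. The names solve, dot and press_shortcut, the intercept term, and the sample data are assumptions for this example. Each h_ii is obtained as z_i'(Z'Z)^(-1)z_i by solving one small linear system, where z_i is the ith row of the model matrix including the intercept.

```python
def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented copy, A is not modified
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def press_shortcut(X, y):
    """PRESS from one least squares fit: sum of [e_i / (1 - h_ii)]^2."""
    Z = [[1.0] + list(row) for row in X]  # model matrix with intercept (an assumption)
    p = len(Z[0])
    ZtZ = [[sum(z[j] * z[k] for z in Z) for k in range(p)] for j in range(p)]
    Zty = [sum(z[j] * yi for z, yi in zip(Z, y)) for j in range(p)]
    beta = solve(ZtZ, Zty)                # single fit to all n observations
    total = 0.0
    for z, yi in zip(Z, y):
        e = yi - dot(z, beta)             # ordinary residual e_i
        h = dot(z, solve(ZtZ, z))         # hat diagonal h_ii = z'(Z'Z)^(-1)z
        total += (e / (1.0 - h)) ** 2
    return total


if __name__ == "__main__":
    # Illustrative data only: three points, one independent variable.
    X = [[0.0], [1.0], [2.0]]
    y = [0.0, 1.0, 0.0]
    print(press_shortcut(X, y))
```

For this toy data the ordinary residuals are -1/3, 2/3 and -1/3 with hat diagonals 5/6, 1/3 and 5/6, which gives PRESS residuals of -2, 1 and -2 and a PRESS of 9 up to rounding.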

Data points for which h_ii is large will have large PRESS residuals. These observations will generally be high influence points. Generally, a large difference between the ordinary residual and the PRESS residual indicates a point where the model fits the data well, but a model built without that point predicts poorly.
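To make the role of h_ii concrete, the sketch below tabulates the ordinary residual, the hat diagonal and the PRESS residual for a straight-line fit with one independent variable. It relies on the closed form h_ii = 1/n + (x_i - xbar)^2 / Sxx, which holds for simple linear regression with an intercept. The function name leverage_table and the data, in which the last x value sits far from the others, are made up for illustration.

```python
def leverage_table(x, y):
    """Return (e_i, h_ii, e_(i)) for each point of a straight-line least squares fit."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    rows = []
    for xi, yi in zip(x, y):
        e = yi - (intercept + slope * xi)      # ordinary residual
        h = 1.0 / n + (xi - xbar) ** 2 / sxx   # hat diagonal for simple regression
        rows.append((e, h, e / (1.0 - h)))     # PRESS residual e_i / (1 - h_ii)
    return rows


if __name__ == "__main__":
    # Illustrative data only: the last point is isolated in x, so its h_ii is large.
    x = [1.0, 2.0, 3.0, 4.0, 10.0]
    y = [1.0, 2.1, 2.9, 4.2, 7.0]
    for e, h, e_press in leverage_table(x, y):
        print(f"e = {e: .4f}   h = {h:.4f}   PRESS residual = {e_press: .4f}")
```

Because every h_ii lies between 0 and 1, the PRESS residual is never smaller in magnitude than the ordinary residual, and the gap widens as h_ii approaches 1.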

Syntax: function x = press(D)

Inputs:
D - data matrix (=[X Y]): the leading columns are the independent variables X; the last column must be the dependent variable Y.
 
Output:
x - prediction error sum of squares (PRESS).
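The input convention, independent variables first and the dependent variable in the last column, can be illustrated with a small helper. This is a Python sketch of the data layout only; the name split_data and its validation checks are hypothetical and are not taken from the m-file.

```python
def split_data(D):
    """Split rows of D = [X Y] into X (all but the last column) and y (the last column)."""
    if not D or len(D[0]) < 2:
        raise ValueError("D needs at least one X column and the Y column")
    width = len(D[0])
    if any(len(row) != width for row in D):
        raise ValueError("all rows of D must have the same number of columns")
    X = [list(row[:-1]) for row in D]  # independent variables
    y = [row[-1] for row in D]         # dependent variable (last column)
    return X, y


if __name__ == "__main__":
    # Illustrative data only: two independent variables and one response.
    D = [[1.0, 2.0, 3.0],
         [4.0, 5.0, 6.0]]
    X, y = split_data(D)
    print(X, y)
```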

Required Products: Statistics Toolbox
MATLAB release: MATLAB 7 (R14)
Comments and Ratings (3)
04 Oct 2009 David Herrera

This code does not give the correct answer to the problem stated in Response Surface Methodology by Montgomery & Myers, page 48, which is 22,225. This code gives 21265.2514 by the two different methods in the code. It needs revision. See output below (some variables were printed):
Calculation of PRESS
n = 14
r = 13, c = 3
r = 13, c = 3
r = 13, c = 3
r = 13, c = 3
r = 13, c = 3
r = 13, c = 3
r = 13, c = 3
r = 13, c = 3
r = 13, c = 3
r = 13, c = 3
r = 13, c = 3
r = 13, c = 3
r = 13, c = 3
r = 13, c = 3
PRESS = 21265.2514
hii = 0.358818
hii = 0.379259
hii = 0.323410
hii = 0.294624
hii = 0.094723
hii = 0.137715
hii = 0.142682
hii = 0.242150
hii = 0.236885
hii = 0.202703
hii = 0.209627
hii = 0.073729
hii = 0.225672
hii = 0.078004
2nd PRESS = 21265.2514
>>

04 Oct 2009 David Herrera

The correct answer is 22225.0, which is correctly stated in the comment section of this code. However, the code does not compute this. Revision needed to compute PRESS value.

05 Oct 2009 David Herrera

Correction: My computation of PRESS was wrong, and Mr. Antonio Trujillo-Ortiz was absolutely correct in the calculation of PRESS from his code. The problem was that I forgot to place a minus in front of one of the numbers, so I got a slightly different answer. Independently with different code, I got the same answer as Antonio Trujillo-Ortiz, which was 22225.0575. I apologize for the mistake.

Updates
11 Apr 2007

Text was improved.

Tag Activity for this File
Tag                               Applied By               Date/Time
statistics                        Antonio Trujillo-Ortiz   22 Oct 2008 09:08:14
probability                       Antonio Trujillo-Ortiz   22 Oct 2008 09:08:14
residual scaling                  Antonio Trujillo-Ortiz   22 Oct 2008 09:08:14
prediction error sum of squares   Antonio Trujillo-Ortiz   22 Oct 2008 09:08:14
residuals                         Antonio Trujillo-Ortiz   22 Oct 2008 09:08:14
regression                        Antonio Trujillo-Ortiz   22 Oct 2008 09:08:14
 
