Code covered by the BSD License  

from press by Antonio Trujillo-Ortiz
Prediction error sum of squares.

function [x] = press(D)
%PRESS Prediction error sum of squares.
% This m-file returns a useful residual scaling, the prediction error sum of
% squares (PRESS). According to Myers and Montgomery (2002, p. 46-47), to
% calculate PRESS, select an observation i. Fit the regression model to the
% remaining n-1 observations and use this equation to predict the withheld
% observation y_i. Denoting this predicted value by ye_(i), we may find the
% prediction error for point i as e_(i)=y_i - ye_(i).
% The prediction error is often called the ith PRESS residual. This procedure
% is repeated for each observation i = 1,2,...,n, producing a set of n PRESS
% residuals e_(1),e_(2),...,e_(n). Then the PRESS statistic is defined as
% the sum of squares of the n PRESS residuals as in,
%
%     PRESS = Sum_{i=1}^{n} e_(i)^2 = Sum_{i=1}^{n} [y_i - ye_(i)]^2
%
% Thus PRESS uses each possible subset of n-1 observations as an estimation
% data set, and every observation in turn is used to form a prediction data
% set. This m-file uses this leave-one-out approach directly, which
% requires fitting n different regressions. However, it is also possible to
% calculate PRESS from the results of a single least squares fit to all n
% observations. It turns out that the ith PRESS residual is
%
%                          e_(i) = e_i/(1 - h_ii)
%
% Thus, because PRESS is just the sum of the squares of the PRESS residuals,
% a simple computing formula is
%
%              PRESS = Sum_{i=1}^{n} [e_i/(1 - h_ii)]^2
%
% It is easy to see that the PRESS residual is just the ordinary residual
% e_i weighted according to the diagonal elements h_ii of the hat matrix
% H = X*inv(X'*X)*X'. For interested readers, this single-fit approach is
% included, commented out, at the end of this file.
%
% Data points for which h_ii are large will have large PRESS residuals. These
% observations will generally be high influence points. Generally, a large
% difference between the ordinary residual and the PRESS residual will
% indicate a point where the model fits the data well, but a model built
% without that point predicts poorly.
%
% Syntax: function x = press(D)
%
% Inputs:
%    D - data matrix (=[X Y]). The leading columns hold the independent
%           variables X; the last column must be the dependent variable Y.
% Output:
%    x - prediction error sum of squares (PRESS).
%
% Example: 
% For Example 2.1 of Myers and Montgomery (2002, p. 23) we want to
% calculate the prediction error sum of squares (PRESS). The data are,
%
%                      X1         X2         Y
%                   -----------------------------
%                      -1         -1        1004
%                       1         -1        1636
%                      -1       0.6667       852
%                       1       0.6667      1506
%                       0      -0.4444      1272
%                       0      -0.7222      1270
%                       0       0.6667      1269
%                      -1      -0.1667       903
%                       1      -0.1667      1555
%                       0         -1        1260
%                       0       0.9444      1146
%                       0      -0.1667      1276
%                       0          1        1225
%                    0.1667    -0.1667      1321
%                   -----------------------------
%
% Data matrix must be:
%  D=[-1 -1 1004;1 -1 1636;-1 0.6667 852;1 0.6667 1506;0 -0.4444 1272;
%  0 -0.7222 1270;0 0.6667 1269;-1 -0.1667 903;1 -0.1667 1555;0 -1 1260;
%  0 0.94444 1146;0 -0.1667 1276;0 1 1225;0.1667 -0.1667 1321];
%
% Calling the function in MATLAB:
%    x = press(D)
%
% Answer is:
%
% x = 2.2225e+004   (= 22,225.0)
%
% Created by A. Trujillo-Ortiz, R. Hernandez-Walls, A. Castro-Perez
%            and K. Barba-Rojo
%            Facultad de Ciencias Marinas
%            Universidad Autonoma de Baja California
%            Apdo. Postal 453
%            Ensenada, Baja California
%            Mexico.
%            atrujo@uabc.mx
%
% Copyright (C) April 02, 2007.
%
% To cite this file, this would be an appropriate format:
% Trujillo-Ortiz, A., R. Hernandez-Walls, K. Barba-Rojo, and 
%   A. Castro-Perez (2006). press: Prediction error sum of squares.
%   A MATLAB file. [WWW document]. URL http://www.mathworks.com/
%   matlabcentral/fileexchange/loadFile.do?objectId=14564
%
%  Reference:
%  Myers, R. H. and Montgomery, D. C. (2002), Response Surface Methodology:
%          Process and Product Optimization Using Designed Experiments. 2nd.
%          Ed. NY: John Wiley & Sons, Inc.
%

[n, c] = size(D); %number of observations and number of columns
Y = D(:,c); %response vector
X = [ones(n,1) D(:,1:c-1)]; %design matrix

e = zeros(n,1); %squared PRESS residuals
for i = 1:n
    idx = [1:i-1 i+1:n]; %leave out the i-th observation
    b = X(idx,:)\Y(idx); %least squares parameters estimation on n-1 points
    ye = X(i,:)*b; %predicted value of the withheld observation
    e(i) = (Y(i)-ye)^2; %squared i-th PRESS residual
end

x = sum(e); %prediction error sum of squares (PRESS)

return,

%Prediction error sum of squares (PRESS) by using hat matrix
%[r c] = size(D);
%n = r; %number of data
%Y = D(:,c); %response vector
%X = [ones(n,1) D(:,1:c-1)]; %design matrix
%b = inv(X'*X)*(X'*Y); %least squares parameters estimation
%Ye = X*b; %expected response value
%e = Y-Ye; %residual term
%H = X*inv(X'*X)*X'; %hat matrix
%hii = diag(H); %leverage of the i-th observation
%x = sum((e./(1-hii)).^2); %prediction error sum of squares (PRESS)
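%
%A minimal sketch of the single-fit shortcut above, written as a callable
%local function. The name press_hat and the use of an economy-size QR
%factorization to obtain the leverages h_ii (instead of forming the hat
%matrix with inv) are assumptions made for illustration; they are not part
%of the original submission. The sketch assumes that the design matrix
%[1 X] has full column rank and that every h_ii < 1. Under those
%assumptions press_hat(D) should agree with press(D) up to rounding error.
function x = press_hat(D)
[n, c] = size(D); %number of observations and number of columns
Y = D(:,c); %response vector
X = [ones(n,1) D(:,1:c-1)]; %design matrix
[Q, R] = qr(X,0); %economy-size QR factorization, X = Q*R (R is not used)
e = Y - Q*(Q'*Y); %ordinary residuals, since the hat matrix is H = Q*Q'
hii = sum(Q.^2,2); %leverages h_ii = diag(Q*Q')
x = sum((e./(1-hii)).^2); %prediction error sum of squares (PRESS)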
