Code covered by the BSD License  

Highlights from
lofrtest

from lofrtest by Antonio Trujillo-Ortiz
Lack-of-fit test for regression model with independent replicate values.

lofrtest(D,alpha)
function lofrtest(D,alpha)
% LOFRTEST Lack-of-fit test for regression model with independent replicate values.
%  LOFRTEST(D,alpha) is a statistical test that gives information on the form
%  of the model under consideration. A significant lack-of-fit suggest that there
%  may be some systematic variation unaccounted for in the hypothesized model
%  (chosen model does not well describe the data). It arises when there are exact
%  replicate values of the independent variable in the model that provide an estimate
%  of pure error. Pure error is in essence the amount of error that cannot be accounted
%  for by any model. Then allows a test on whether there is error present aside 
%  from pure error. For the construction of the lack-of-fit test we need to examine 
%  three common types of linear models:
%      - single mean (one parameter)
%      - slope and intercept or common regression model (two parameters)
%      - separate means for each x-value or one-way ANOVA (many parameters).
%  So, the pure error is the error of the separate means on ANOVA and the total error
%  in the residual resulting in the regression analysis: the lack-of-fit results
%  to be the difference between this two sources of error,
%
%                 SS(LOF) = SSR(Model) - SSE(ANOVA).
%
%  Syntax: lofrtest(D,alpha) 
%      
%  Inputs:
%       D - matrix data (=[X Y]) (last column must be the Y-dependent variable).
%           (X-independent variable entry can be for a simple [X], multiple [X1,X2,X3,...Xp] 
%           or polynomial [X,X^2,X^3,...,X^p] regression model).
%   alpha - significance level (default = 0.05). 
%
%  Outputs:
%       A complete summary (table) of analysis of variance partitioning sources of
%       variation for testing lack-of-fit.
%
%  Example from the data on height and weight of 19 students in Psy 202. Assigment 3 of Psych
%  3030 from the Department of Psychology of the York University. Available on Internet at
%  the URL address http://www.psych.yorku.ca/lab/psy3030/assign/assign3.htm
%  We are interested to test with a significance-value = 0.05 if there is a lack-of-fit on the
%  regression model due to the height replicate values.
%
%              -------------------   -------------------
%                Height   Weight       Height   Weight
%              -------------------   -------------------
%                  60       90           68      140 
%                  60      100           68      135
%                  62      110           70      160
%                  62      116           70      145
%                  62      120           70      148
%                  66      140           71      143
%                  66      170           71      135
%                  68      130           74      195
%                  68      117           74      164
%                  68      155
%              -------------------   -------------------
%
%  Data matrix must be:
%  D=[60 90;60 100;62 110;62 116;62 120;66 140;66 170;68 130;68 117;68 155;68 140;68 135;
%     70 160;70 145;70 148;71 143;71 135;74 195;74 164];
%
%  Calling on Matlab the function: 
%             lofrtest(D)
%
%  Answer is:
%
%  Lack-of-fit test for regression model with independent replicate values.
%  --------------------------------------------------------------------------
%  SOV                      SS         df           MS         F        P
%  --------------------------------------------------------------------------
%  Model               7658.359         1      7658.359     31.720   0.0000
%  Residual            4104.378        17       241.434
%  --------------------------------------------------------------------------
%  Lack-of-fit         2142.011         5       428.402      2.620   0.0797
%  Pure error          1962.367        12       163.531
%  --------------------------------------------------------------------------
%  Total              11762.737        18
%  --------------------------------------------------------------------------
%  If the associated P-value for any F test is equal or larger than 0.05
%  The corresponding null hypothesis is met. Otherwise it is not met.
%
%  Created by A. Trujillo-Ortiz, R. Hernandez-Walls, A. Castro-Perez and 
%             F.J. Marquez-Rocha
%             Facultad de Ciencias Marinas
%             Universidad Autonoma de Baja California
%             Apdo. Postal 453
%             Ensenada, Baja California
%             Mexico.
%             atrujo@uabc.mx
%  Copyright (C) March 4, 2005.
%
%  To cite this file, this would be an appropriate format:
%  Trujillo-Ortiz, A., R. Hernandez-Walls, A. Castro-Perez and F.J Marquez-Rocha.
%    (2005). lofrtest:Lack-of-fit test for regression model with independent replicate
%    values. A MATLAB file. [WWW document]. URL http://www.mathworks.com/matlabcentral/
%    fileexchange/loadFile.do?objectId=7074
%
%  References:
% 
%  Department of Psychology of the York University. Available on Internet at the 
%          URL address http://www.psych.yorku.ca/lab/psy3030/assign/assign3.htm
%  Zar, J. H. (1999), Biostatistical Analysis (2nd ed.). 
%          NJ: Prentice-Hall, Englewood Cliffs. p. 345-350.
%

if nargin < 2,
   alpha = 0.05;  %(default) 
elseif (length(alpha)>1),
   error('Requires a scalar alpha value.');
elseif ((alpha <= 0) | (alpha >= 1)),
   error('Requires 0 < alpha < 1.');
end;

cd = size(D,2);
if cd < 2,
    error('The number of columns in D must be at least 2.');
end;

%Regrouping procedure
X = D(:,1:end-1);
[CX,IX,JX] = unique(X,'rows');
[nr,nc] = size(JX);
A=[];
for k=1:9;
    A=[A min(find(JX==k))];
end;
A=[A' [1:length(A)]'];
[ai,bi]=sort(A(:,1));
aii=[1:length(bi)]';
JJX(ai)=aii;
JJX(end+1:length(D))=0;
for k=1:length(JX);
    np=find(JX==k);
    min(np);
    JJX(np)=JJX(min(np));
end;
G = JJX';
g = max(G);

X = [D(:,end) G];

%One-Way Analysis of Variance Procedure
m = [];n = [];nn = [];S = [];
indice = X(:,2);
for i = 1:g
   Xe = find(indice==i);
   eval(['X' num2str(i) ' = X(Xe,1);']);
   eval(['m' num2str(i) ' = mean(X' num2str(i) ');'])
   eval(['n' num2str(i) ' = length(X' num2str(i) ') ;'])
   eval(['nn' num2str(i) ' = (length(X' num2str(i) ').^2);'])
   eval(['xm = m' num2str(i) ';'])
   eval(['xn=  n' num2str(i) ';'])
   eval(['xnn = nn' num2str(i) ';'])
   eval(['x = (sum(X' num2str(i) ').^2)/(n' num2str(i) ');']);
   m = [m;xm];n = [n;xn];nn = [nn,xnn];S = [S,x];
end;

C = (sum(X(:,1)))^2/length(X(:,1)); %correction term.
TSS = sum(X(:,1).^2)-C; %total sum of squares.
dfT = length(X(:,1))-1; %total degrees of freedom.

SSS = sum(S)-C; %sample sum of squares.
v1 = g-1; %sample degrees of freedom.
ESS = TSS-SSS; %error sum of squares.

if ESS < 0,
    ESS = 0;
end;

v2 = dfT-v1; %error degrees of freedom.
ESSMS = ESS/v2;  %error mean square

%Linear Regression Analysis by the Least Squares Method
X = D(:,1:end-1);
X = [ones(size(X(:,1))) X];  %design X-matrix.
[n,p] = size(X);
Y = D(:,end);

[Q, R] = qr(X,0);  %ordinary least squares (OLS) solution by orthogonal-triangular decomposition.
b = R\(Q'*Y);  %regression coefficients vector.

Ye = X*b;  %expected Y values.
r = Y-Ye;  %residuals
RSS = norm(r)^2;  %residual sum of squares.
MSS = norm(Ye-mean(Y))^2;  % model (regression) sum of squares.
v3 = p-1;
v4 = n-p;
MMS = MSS/v3;  %model mean square.
RMS = RSS/v4;  %residual mean square
FM = MMS/RMS;  %model F-statistic
PM = 1-fcdf(FM,v3,v4);  %P-value associated to the model F-statistic

LOFSS = RSS-ESS;  %lack-of-fit estimation.
v5 = v4-v2;  %lack-of-fit degrees of freedom.
LOFMS = LOFSS/v5;  %lack-of-fit mean square.
FLOF = LOFMS/ESSMS;  %lack-of-fit F-statistic.

if isinf(FLOF),
    FLOF = 0;
end;

PLOF = 1-fcdf(FLOF,v5,v2);  %P-value associated to the lack-of-fit F-statistic.

disp(' ')
disp('Lack-of-fit test for regression model with independent replicate values.')
fprintf('--------------------------------------------------------------------------\n');
disp('SOV                      SS         df           MS         F        P')
fprintf('--------------------------------------------------------------------------\n');
fprintf('Model   %20.3f%10i%14.3f%11.3f%9.4f\n\',MSS,v3,MMS,FM,PM);
fprintf('Residual%20.3f%10i%14.3f\n\',RSS,v4,RMS);
fprintf('--------------------------------------------------------------------------\n');
fprintf('Lack-of-fit%17.3f%10i%14.3f%11.3f%9.4f\n\',LOFSS,v5,LOFMS,FLOF,PLOF);
fprintf('Pure error%18.3f%10i%14.3f\n\',ESS,v2,ESSMS);
fprintf('--------------------------------------------------------------------------\n');
fprintf('Total%23.3f%10i\n\',TSS,dfT);
fprintf('--------------------------------------------------------------------------\n');
fprintf('If the associated P-value for any F test is equal or larger than% 3.2f\n', alpha);
fprintf('The corresponding null hypothesis is met. Otherwise it is not met.\n');

return,

Contact us at files@mathworks.com