function lofrtest(D,alpha)
% LOFRTEST Lack-of-fit test for regression model with independent replicate values.
% LOFRTEST(D,alpha) is a statistical test that gives information on the form
% of the model under consideration. A significant lack-of-fit suggest that there
% may be some systematic variation unaccounted for in the hypothesized model
% (chosen model does not well describe the data). It arises when there are exact
% replicate values of the independent variable in the model that provide an estimate
% of pure error. Pure error is in essence the amount of error that cannot be accounted
% for by any model. Then allows a test on whether there is error present aside
% from pure error. For the construction of the lack-of-fit test we need to examine
% three common types of linear models:
% - single mean (one parameter)
% - slope and intercept or common regression model (two parameters)
% - separate means for each x-value or one-way ANOVA (many parameters).
% So, the pure error is the error of the separate means on ANOVA and the total error
% in the residual resulting in the regression analysis: the lack-of-fit results
% to be the difference between this two sources of error,
%
% SS(LOF) = SSR(Model) - SSE(ANOVA).
%
% Syntax: lofrtest(D,alpha)
%
% Inputs:
% D - matrix data (=[X Y]) (last column must be the Y-dependent variable).
% (X-independent variable entry can be for a simple [X], multiple [X1,X2,X3,...Xp]
% or polynomial [X,X^2,X^3,...,X^p] regression model).
% alpha - significance level (default = 0.05).
%
% Outputs:
% A complete summary (table) of analysis of variance partitioning sources of
% variation for testing lack-of-fit.
%
% Example from the data on height and weight of 19 students in Psy 202. Assigment 3 of Psych
% 3030 from the Department of Psychology of the York University. Available on Internet at
% the URL address http://www.psych.yorku.ca/lab/psy3030/assign/assign3.htm
% We are interested to test with a significance-value = 0.05 if there is a lack-of-fit on the
% regression model due to the height replicate values.
%
% ------------------- -------------------
% Height Weight Height Weight
% ------------------- -------------------
% 60 90 68 140
% 60 100 68 135
% 62 110 70 160
% 62 116 70 145
% 62 120 70 148
% 66 140 71 143
% 66 170 71 135
% 68 130 74 195
% 68 117 74 164
% 68 155
% ------------------- -------------------
%
% Data matrix must be:
% D=[60 90;60 100;62 110;62 116;62 120;66 140;66 170;68 130;68 117;68 155;68 140;68 135;
% 70 160;70 145;70 148;71 143;71 135;74 195;74 164];
%
% Calling on Matlab the function:
% lofrtest(D)
%
% Answer is:
%
% Lack-of-fit test for regression model with independent replicate values.
% --------------------------------------------------------------------------
% SOV SS df MS F P
% --------------------------------------------------------------------------
% Model 7658.359 1 7658.359 31.720 0.0000
% Residual 4104.378 17 241.434
% --------------------------------------------------------------------------
% Lack-of-fit 2142.011 5 428.402 2.620 0.0797
% Pure error 1962.367 12 163.531
% --------------------------------------------------------------------------
% Total 11762.737 18
% --------------------------------------------------------------------------
% If the associated P-value for any F test is equal or larger than 0.05
% The corresponding null hypothesis is met. Otherwise it is not met.
%
% Created by A. Trujillo-Ortiz, R. Hernandez-Walls, A. Castro-Perez and
% F.J. Marquez-Rocha
% Facultad de Ciencias Marinas
% Universidad Autonoma de Baja California
% Apdo. Postal 453
% Ensenada, Baja California
% Mexico.
% atrujo@uabc.mx
% Copyright (C) March 4, 2005.
%
% To cite this file, this would be an appropriate format:
% Trujillo-Ortiz, A., R. Hernandez-Walls, A. Castro-Perez and F.J Marquez-Rocha.
% (2005). lofrtest:Lack-of-fit test for regression model with independent replicate
% values. A MATLAB file. [WWW document]. URL http://www.mathworks.com/matlabcentral/
% fileexchange/loadFile.do?objectId=7074
%
% References:
%
% Department of Psychology of the York University. Available on Internet at the
% URL address http://www.psych.yorku.ca/lab/psy3030/assign/assign3.htm
% Zar, J. H. (1999), Biostatistical Analysis (2nd ed.).
% NJ: Prentice-Hall, Englewood Cliffs. p. 345-350.
%
if nargin < 2,
alpha = 0.05; %(default)
elseif (length(alpha)>1),
error('Requires a scalar alpha value.');
elseif ((alpha <= 0) | (alpha >= 1)),
error('Requires 0 < alpha < 1.');
end;
cd = size(D,2);
if cd < 2,
error('The number of columns in D must be at least 2.');
end;
%Regrouping procedure
X = D(:,1:end-1);
[CX,IX,JX] = unique(X,'rows');
[nr,nc] = size(JX);
A=[];
for k=1:9;
A=[A min(find(JX==k))];
end;
A=[A' [1:length(A)]'];
[ai,bi]=sort(A(:,1));
aii=[1:length(bi)]';
JJX(ai)=aii;
JJX(end+1:length(D))=0;
for k=1:length(JX);
np=find(JX==k);
min(np);
JJX(np)=JJX(min(np));
end;
G = JJX';
g = max(G);
X = [D(:,end) G];
%One-Way Analysis of Variance Procedure
m = [];n = [];nn = [];S = [];
indice = X(:,2);
for i = 1:g
Xe = find(indice==i);
eval(['X' num2str(i) ' = X(Xe,1);']);
eval(['m' num2str(i) ' = mean(X' num2str(i) ');'])
eval(['n' num2str(i) ' = length(X' num2str(i) ') ;'])
eval(['nn' num2str(i) ' = (length(X' num2str(i) ').^2);'])
eval(['xm = m' num2str(i) ';'])
eval(['xn= n' num2str(i) ';'])
eval(['xnn = nn' num2str(i) ';'])
eval(['x = (sum(X' num2str(i) ').^2)/(n' num2str(i) ');']);
m = [m;xm];n = [n;xn];nn = [nn,xnn];S = [S,x];
end;
C = (sum(X(:,1)))^2/length(X(:,1)); %correction term.
TSS = sum(X(:,1).^2)-C; %total sum of squares.
dfT = length(X(:,1))-1; %total degrees of freedom.
SSS = sum(S)-C; %sample sum of squares.
v1 = g-1; %sample degrees of freedom.
ESS = TSS-SSS; %error sum of squares.
if ESS < 0,
ESS = 0;
end;
v2 = dfT-v1; %error degrees of freedom.
ESSMS = ESS/v2; %error mean square
%Linear Regression Analysis by the Least Squares Method
X = D(:,1:end-1);
X = [ones(size(X(:,1))) X]; %design X-matrix.
[n,p] = size(X);
Y = D(:,end);
[Q, R] = qr(X,0); %ordinary least squares (OLS) solution by orthogonal-triangular decomposition.
b = R\(Q'*Y); %regression coefficients vector.
Ye = X*b; %expected Y values.
r = Y-Ye; %residuals
RSS = norm(r)^2; %residual sum of squares.
MSS = norm(Ye-mean(Y))^2; % model (regression) sum of squares.
v3 = p-1;
v4 = n-p;
MMS = MSS/v3; %model mean square.
RMS = RSS/v4; %residual mean square
FM = MMS/RMS; %model F-statistic
PM = 1-fcdf(FM,v3,v4); %P-value associated to the model F-statistic
LOFSS = RSS-ESS; %lack-of-fit estimation.
v5 = v4-v2; %lack-of-fit degrees of freedom.
LOFMS = LOFSS/v5; %lack-of-fit mean square.
FLOF = LOFMS/ESSMS; %lack-of-fit F-statistic.
if isinf(FLOF),
FLOF = 0;
end;
PLOF = 1-fcdf(FLOF,v5,v2); %P-value associated to the lack-of-fit F-statistic.
disp(' ')
disp('Lack-of-fit test for regression model with independent replicate values.')
fprintf('--------------------------------------------------------------------------\n');
disp('SOV SS df MS F P')
fprintf('--------------------------------------------------------------------------\n');
fprintf('Model %20.3f%10i%14.3f%11.3f%9.4f\n\',MSS,v3,MMS,FM,PM);
fprintf('Residual%20.3f%10i%14.3f\n\',RSS,v4,RMS);
fprintf('--------------------------------------------------------------------------\n');
fprintf('Lack-of-fit%17.3f%10i%14.3f%11.3f%9.4f\n\',LOFSS,v5,LOFMS,FLOF,PLOF);
fprintf('Pure error%18.3f%10i%14.3f\n\',ESS,v2,ESSMS);
fprintf('--------------------------------------------------------------------------\n');
fprintf('Total%23.3f%10i\n\',TSS,dfT);
fprintf('--------------------------------------------------------------------------\n');
fprintf('If the associated P-value for any F test is equal or larger than% 3.2f\n', alpha);
fprintf('The corresponding null hypothesis is met. Otherwise it is not met.\n');
return,