function AnDarexptest(x,alpha)
%ANDAREXPTEST Anderson-Darling test for assessing whether sample data follow
% an exponential distribution.
% The Anderson-Darling test (Anderson and Darling, 1952) is used to test if
% a sample of data comes from a specific distribution. It is a modification
% of the Kolmogorov-Smirnov (K-S) test and gives more weight to the tails
% than the K-S test. The K-S test is distribution-free in the sense that the
% critical values do not depend on the specific distribution being tested.
% The Anderson-Darling test makes use of the specific distribution in calculating
% critical values. This has the advantage of allowing a more sensitive test
% and the disadvantage that critical values must be calculated for each
% distribution.
% The Anderson-Darling test is only available for a few specific distributions.
% The test is calculated as:
%
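% AD2 = n*integral{[F_o(x)-F_t(x)]^2/[F_t(x)(1-F_t(x))]}dF_t(x)
%
% where F_o is the empirical (observed) distribution function, F_t is the
% hypothesized (theoretical) distribution function and n is the sample size.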
%
% AD2a = AD2*a
%
% Note that for a given distribution, the Anderson-Darling statistic may be
% multiplied by a constant, a (which usually depends on the sample size, n).
% These constants are given in the various papers by Stephens (1974, 1977a,
% 1977b, 1979, 1986). The adjusted statistic, AD2a, is what should be compared
% against the critical values. Different constants (and therefore different
% critical values) have been published, so check which constant was used for
% a given set of critical values (it is typically given alongside them).
% The critical values for the Anderson-Darling test are dependent on the
% specific distribution that is being tested. Tabulated values and formulas
% have been published for a few specific distributions (normal, lognormal,
% exponential, Weibull, logistic, extreme value type 1). The test is a one-sided
% test and the hypothesis that the distribution is of a specific form is
% rejected if the test statistic, AD2a, is greater than the critical value.
% Here we develop the m-file for detecting departure from an exponential
% distribution. It is one of the most powerful statistics for testing this.
% For the null hypothesis test, we also report the P-value associated with
% the adjusted statistic.
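%
% As a minimal sketch (not part of the original description), the integral
% above is usually evaluated from the ordered sample x(1) <= ... <= x(n) with
% the computing formula
%
%     AD2 = -n - (1/n)*sum_{i=1..n} (2i-1)*[ln F_t(x(i)) + ln(1-F_t(x(n+1-i)))]
%
% Assuming the exponential CDF F_t(x) = 1-exp(-x/m), with m estimated by the
% sample mean, this could be written as follows (xs, m, F and k are
% illustrative names only):
%
%     xs  = sort(x(:));  n = numel(xs);   % ordered sample as a column
%     m   = mean(xs);                     % estimated exponential mean
%     F   = 1 - exp(-xs./m);              % fitted CDF at each ordered point
%     k   = (1:n)';                       % ranks 1..n
%     AD2 = -n - sum((2*k-1).*(log(F) + log(1-F(n+1-k))))/n;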
%
% Syntax: function AnDarexptest(x,alpha)
%
% Input:
% x - data vector
% alpha - significance level (default = 0.05)
%
% Output:
% - Complete Anderson-Darling test for an exponential distribution
%
% Example: The data in the table below represent the days between homicides
% in Waco, Texas in 1989 as reported in Kittlitz (1999). (* Two homicides
% occurred on June 16 and were defined to be 12 hours apart.) It is suggested
% that the data may follow an exponential distribution. We want to test
% whether the days between homicides follow an exponential distribution.
%
% ---------------------------------------------------------------------------------
% Month/Date   Days Between   Month/Date   Days Between   Month/Date   Days Between
% ---------------------------------------------------------------------------------
% Jan 20            -         Jun 16            9.25*     Sep 24            2
% Feb 23           34         Jun 16            0.50*     Oct 1             7
% Feb 25            2         Jun 22            5.25*     Oct 4             3
% Mar 5             8         Jun 25            3         Oct 8             4
% Mar 10            5         Jul 6            11         Oct 19           11
% Apr 4            25         Jul 8             2         Nov 2            14
% May 7            33         Jul 9             1         Nov 25           23
% May 24           17         Jul 26           17         Dec 28           33
% May 28            4         Sep 9            45         Dec 29            1
% Jun 7            10         Sep 22           13
% ---------------------------------------------------------------------------------
%
% Data vector is:
% x=[34 2 8 5 25 33 17 4 10 9.25 0.5 5.25 3 11 2 1 17 45 13 2 7 3 4 11 14 ...
% 23 33 1];
%
% Calling the function in MATLAB:
% AnDarexptest(x)
%
% Answer is:
%
% Sample size: 28
% Anderson-Darling statistic: 0.2268
% Anderson-Darling adjusted statistic: 0.2293
% Probability associated to the Anderson-Darling statistic = 0.8929
% With a given significance = 0.050
% The sampled population has an exponential distribution.
% Thus, this sample have been drawn from an exponential population with parameter = 12.2500
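%
% As a further illustration (a sketch, not from the original example), the
% significance level can be supplied as the second argument, for instance to
% run the same test at the 1% level:
%
%     AnDarexptest(x,0.01)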
%
% Created by A. Trujillo-Ortiz, R. Hernandez-Walls, K. Barba-Rojo,
% A. Castro-Perez and B.E. Lavaniegos-Espejo*
% Facultad de Ciencias Marinas
% Universidad Autonoma de Baja California
% Apdo. Postal 453
% *Centro de Investigacion Cientifica y de Educacion Superior
% de Ensenada
% Ensenada, Baja California
% Mexico.
% atrujo@uabc.mx
%
% Copyright. July 13, 2007.
%
% To cite this file, this would be an appropriate format:
% Trujillo-Ortiz, A., R. Hernandez-Walls, K. Barba-Rojo, A. Castro-Perez and
% B.E. Lavaniegos-Espejo. (2007). AnDarexptest:Anderson-Darling test for assessing
% exponential distribution of a sample data. A MATLAB file. [WWW document].
% URL http://www.mathworks.com/matlabcentral/fileexchange/loadFile.do?objectId=15746
%
% References:
% Anderson, T. W. and Darling, D. A. (1952), Asymptotic theory of certain
% 'goodness-of-fit' criteria based on stochastic processes. Annals of
% Mathematical Statistics, 23:193-212.
% Kittlitz, Jr., R. G. (1999), Transforming the Exponential for SPC
% Applications. Journal of Quality Technology 31:301-308.
% Stephens, M. A. (1974), EDF Statistics for goodness of fit and some
% comparisons. Journal of the American Statistical Association,
% 69:730-737.
% Stephens, M. A. (1976), Asymptotic Results for goodness-of-fit statistics
% with unknown parameters. Annals of Statistics, 4:357-369.
% Stephens, M. A. (1977a), Goodness of fit for the extreme value distribution.
% Biometrika, 64:583-588.
% Stephens, M. A. (1977b), Goodness of fit with special reference to tests
% for exponentiality. Technical Report No. 262, Department of Statistics,
% Stanford University, Stanford, CA.
% Stephens, M. A. (1979), Tests of fit for the logistic distribution based
% on the empirical distribution function. Biometrika, 66:591-595.
% Stephens, M. A. (1986), Tests based on EDF statistics. In: D'Agostino,
% R.B. and Stephens, M.A., eds.: Goodness-of-Fit Techniques. Marcel
% Dekker, New York.
%
%Check the inputs: the data vector is required, alpha is optional
if nargin < 1
    error('Requires at least one input argument.');
end
if nargin < 2
    alpha = 0.05; %default significance level
end
if (alpha <= 0 || alpha >= 1)
    fprintf('Warning: significance level must be between 0 and 1\n');
    return;
end
n = length(x);
if n < 7,
disp('Sample size must be at least 7.');
return,
else
x = x(:);
x = sort(x);
p = mean(x); %Exponential parameter estimation
fx = 1 - exp(-x./p);
i = (1:n)'; %ranks of the ordered observations (column, like fx)
S = sum((2*i-1).*(log(fx)+log(1-fx(n+1-i))))/n;
AD2 = -n-S;
AD2a = AD2*(1 + 0.3/n); %correction factor for small sample sizes (adjusted)
%P-value (observed significance level probability)
a = 1.6193162; b = 2.5964836;
P = min(1, a*exp(-b*AD2a)); %capped at 1, since the formula can exceed 1 for small AD2a
end
disp(' ')
fprintf('Sample size: %i\n', n);
fprintf('Anderson-Darling statistic: %3.4f\n', AD2);
fprintf('Anderson-Darling adjusted statistic: %3.4f\n', AD2a);
fprintf('Probability associated to the Anderson-Darling statistic = %3.4f\n', P);
fprintf('With a given significance = %3.3f\n', alpha);
if P >= alpha
disp('The sampled population has an exponential distribution.');
fprintf('Thus, this sample have been drawn from an exponential population with parameter = %6.4f\n',p);
else
disp('The sampled population does not have an exponential distribution.');
end
return