File Exchange

AnDarksamtest

version 1.0.0.0 (7.02 KB) by Antonio Trujillo-Ortiz
Anderson-Darling k-sample procedure to test whether k sampled populations are identical.

Updated 26 Dec 2007

Anderson and Darling (1952, 1954) introduced a goodness-of-fit statistic to test the hypothesis that a random sample comes from a continuous population with a specified distribution function. It is a modification of the Kolmogorov-Smirnov (K-S) test and gives more weight to the tails than the K-S test.
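In standard textbook notation (the symbols F_n, F, n and x_(i) are ours, not the author's), the one-sample statistic is a weighted integrated squared distance between the empirical distribution function F_n of the sample and the specified F. The weight 1/[F(1 - F)] grows large near F = 0 and F = 1, so discrepancies in the tails count more than in the K-S statistic D_n. The right-hand side of the first line is the usual computing form, with ordered observations x_(1) <= ... <= x_(n):

```latex
\begin{aligned}
A^2 &= n \int_{-\infty}^{\infty} \frac{[F_n(x) - F(x)]^2}{F(x)\,[1 - F(x)]}\, dF(x)
     = -n - \frac{1}{n} \sum_{i=1}^{n} (2i - 1)\Bigl[\ln F(x_{(i)}) + \ln\bigl(1 - F(x_{(n+1-i)})\bigr)\Bigr],\\
D_n &= \sup_x \lvert F_n(x) - F(x) \rvert \quad \text{(K-S statistic, for comparison)}.
\end{aligned}
```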

The corresponding two-sample version was proposed by Darling (1957) and studied in detail by Pettitt (1976).

The Anderson-Darling k-sample test was introduced by Scholz and Stephens (1987) as a generalization of the two-sample Anderson-Darling test. It is a nonparametric statistical procedure, i.e., a rank test, and thus requires no assumptions other than that the samples are truly independent random samples from their respective continuous populations (although provisions for tied observations are made). It tests the hypothesis that the populations from which two or more independent samples were drawn are identical. The test can be used to decide whether data from different sources may be combined: the data are judged to come from one common distribution when the null hypothesis Ho of identical population distributions cannot be rejected. Used the other way round, to detect differences between populations, it can be seen as a generalization of one-way ANOVA, a setting in which the k-sample Kruskal-Wallis test (1952, 1953) is the most commonly used rank test.

It is an omnibus test: it is effective against all alternatives to the null hypothesis Ho (all k populations being identical), not only against differences in location. For example, it is effective against changes in scale when locations are matched, a situation in which the Kruskal-Wallis test is weak.

The Anderson-Darling k-sample procedure assumes that the i-th sample (i = 1, ..., k) comes from a population with a continuous distribution function F_i, and tests the null hypothesis Ho: F_1 = F_2 = ... = F_k, i.e., that all sampled populations have the same distribution, without specifying the nature of that common distribution.
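In the notation of Scholz and Stephens (1987), let n_i be the size of sample i, N = n_1 + ... + n_k, Z_(1) < ... < Z_(N) the ordered pooled sample (assuming no ties), and M_ij the number of observations in sample i that are less than or equal to Z_(j). The k-sample statistic is then commonly written as below, with expected value k - 1 under Ho. This is the textbook form from that paper; the exact variant coded in AnDarksamtest may differ:

```latex
A^2_{kN} = \frac{1}{N} \sum_{i=1}^{k} \frac{1}{n_i}
           \sum_{j=1}^{N-1} \frac{\left(N M_{ij} - j\, n_i\right)^2}{j\,(N - j)},
\qquad
\operatorname{E}\!\left[A^2_{kN}\right] = k - 1 \ \text{under } H_0
```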

The observed k-sample Anderson-Darling statistic (ADK) is standardized using its exact mean and standard deviation under Ho, which removes some of its dependence on the sample sizes. We note that other mathematical expressions of the statistic appear in the literature, for example in MIL-HDBK-17-1E (1997).
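A minimal sketch of that standardization, assuming no ties: it computes A2_kN and then T_kN = (A2_kN - (k - 1)) / sigma_N, using the exact null mean k - 1 and the exact null variance sigma_N^2 as given by Scholz and Stephens (1987). It is plain Python with only the standard library; the function names are hypothetical and this is not a port of AnDarksamtest. The variance coefficients are reproduced from that paper as we understand it, so check them against the original before relying on them.

```python
import math
from bisect import bisect_right


def adk_no_ties(samples):
    """A2_kN of Scholz and Stephens (1987); assumes no tied observations."""
    pooled = sorted(x for s in samples for x in s)   # Z_(1) < ... < Z_(N)
    N = len(pooled)
    total = 0.0
    for s in samples:
        s_sorted = sorted(s)
        n_i = len(s_sorted)
        inner = 0.0
        for j in range(1, N):                        # j = 1, ..., N - 1
            m_ij = bisect_right(s_sorted, pooled[j - 1])  # values of sample i <= Z_(j)
            inner += (N * m_ij - j * n_i) ** 2 / (j * (N - j))
        total += inner / n_i
    return total / N


def adk_null_variance(sizes):
    """Exact variance of A2_kN under Ho; requires N = sum(sizes) >= 4."""
    k, N = len(sizes), sum(sizes)
    H = sum(1.0 / n for n in sizes)
    h = sum(1.0 / i for i in range(1, N))
    g = sum(1.0 / ((N - i) * j) for i in range(1, N - 1) for j in range(i + 1, N))
    a = (4 * g - 6) * (k - 1) + (10 - 6 * g) * H
    b = (2 * g - 4) * k**2 + 8 * h * k + (2 * g - 14 * h - 4) * H - 8 * h + 4 * g - 6
    c = (6 * h + 2 * g - 2) * k**2 + (4 * h - 4 * g + 6) * k + (2 * h - 6) * H + 4 * h
    d = (2 * h + 6) * k**2 - 4 * h * k
    return (a * N**3 + b * N**2 + c * N + d) / ((N - 1) * (N - 2) * (N - 3))


def standardized_adk(samples):
    """T_kN = (A2_kN - (k - 1)) / sigma_N, the standardized statistic."""
    k = len(samples)
    sigma = math.sqrt(adk_null_variance([len(s) for s in samples]))
    return (adk_no_ties(samples) - (k - 1)) / sigma
```

For the two samples [1, 2] and [3, 4] (N = 4), this gives A2_kN = 5/3 and sigma_N^2 = 2/9, hence T_kN = sqrt(2). Enumerating the six equally likely rank arrangements of this tiny case by hand gives the same null mean (1) and variance (2/9), which is a quick consistency check on the coefficients.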

The approximate P-value of the observed ADK statistic is calculated using a spline interpolation method. For interested users, the mathematical procedure for obtaining the ADK critical value is also included as a comment in the code.
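The sketch below illustrates the critical-value step and a P-value approximation. It assumes the interpolation formula t_m(alpha) = b0 + b1/sqrt(m) + b2/m with m = k - 1 and the coefficient table commonly quoted from Scholz and Stephens (1987); verify the coefficients against the paper. For the P-value it swaps in plain piecewise-linear interpolation of log(alpha) against the critical values, not the spline used by AnDarksamtest, and it returns a reject/do-not-reject flag of the kind requested in the comments. All names are hypothetical.

```python
import math

# Coefficients for t_m(alpha) ~ b0 + b1/sqrt(m) + b2/m with m = k - 1, as commonly
# quoted from Scholz and Stephens (1987); check them against the paper before use.
ALPHAS = (0.25, 0.10, 0.05, 0.025, 0.01)
B0 = (0.675, 1.281, 1.645, 1.960, 2.326)
B1 = (-0.245, 0.250, 0.678, 1.149, 1.822)
B2 = (-0.105, -0.305, -0.362, -0.391, -0.396)


def critical_values(k):
    """Approximate critical values of the standardized statistic for k >= 2 samples."""
    m = k - 1
    return [b0 + b1 / math.sqrt(m) + b2 / m for b0, b1, b2 in zip(B0, B1, B2)]


def approx_p_value(t, k):
    """Piecewise-linear interpolation of log(alpha) against the critical values.

    A stand-in for the spline used by AnDarksamtest. Outside the tabulated range
    (p > 0.25 or p < 0.01) the end segments are extrapolated, which is only rough.
    """
    crit = critical_values(k)
    logs = [math.log(a) for a in ALPHAS]
    seg = 0
    while seg < len(crit) - 2 and t > crit[seg + 1]:
        seg += 1                                   # t lies in [crit[seg], crit[seg + 1]]
    frac = (t - crit[seg]) / (crit[seg + 1] - crit[seg])
    log_p = logs[seg] + frac * (logs[seg + 1] - logs[seg])
    return min(1.0, math.exp(log_p))


def decide(t, k, alpha=0.05):
    """Return (approximate P-value, True if Ho is rejected at level alpha)."""
    p = approx_p_value(t, k)
    return p, p < alpha
```

For k = 2 the interpolated critical values come out at about 0.325, 1.226, 1.961, 2.718 and 3.752 for alpha = 0.25, 0.10, 0.05, 0.025 and 0.01.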

We give the Anderson-Darling k-sample procedure with and without adjustment for ties.
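Scholz and Stephens (1987) describe more than one way of handling ties. One is the midrank version A2_akN, in which each tied group of l_j pooled values at a distinct value Z*_j is counted at its midpoint. The sketch below implements that version in stdlib Python as a hypothetical helper; we do not know which tie-adjusted variant AnDarksamtest itself uses.

```python
from bisect import bisect_left, bisect_right


def adk_midrank(samples):
    """A2_akN of Scholz and Stephens (1987): midrank version, adjusted for ties.

    Assumes the pooled sample is not entirely made of one repeated value.
    """
    pooled = sorted(x for s in samples for x in s)
    N = len(pooled)
    total = 0.0
    for s in samples:
        s_sorted = sorted(s)
        n_i = len(s_sorted)
        inner = 0.0
        for z in sorted(set(pooled)):                 # distinct values Z*_1 < ... < Z*_L
            below = bisect_left(pooled, z)            # pooled values strictly below z
            l_j = bisect_right(pooled, z) - below     # pooled observations tied at z
            b_aj = below + l_j / 2                    # B_aj = B_j - l_j / 2
            upto = bisect_right(s_sorted, z)          # values of sample i <= z
            f_ij = upto - bisect_left(s_sorted, z)    # values of sample i equal to z
            m_aij = upto - f_ij / 2                   # M_aij = M_ij - f_ij / 2
            inner += l_j * (N * m_aij - n_i * b_aj) ** 2 / (b_aj * (N - b_aj) - N * l_j / 4)
        total += inner / n_i
    return (N - 1) / N**2 * total
```

For [1, 2] and [3, 4], which have no ties, this gives 19/11 (about 1.727) rather than the 5/3 of the untied formula, so the two versions agree only approximately; for the tied samples [1, 1, 2] and [1, 2, 2] it gives 5/9.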

Finally, we compare the P-value with the desired significance level alpha to support a decision about the null hypothesis Ho: Ho is rejected when P is smaller than alpha.

Syntax: function AnDarksamtest(X,alpha)

Inputs:
X - data matrix (size must be n-by-2; column 1 = data, column 2 = sample
label identifying the sample each observation belongs to)
alpha - significance level (default = 0.05)
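Column 2 is a label saying which sample each observation belongs to; it is not a pdf or cdf (the patch in the comments below calls it the grouping vector). Because each row carries its own label, nothing in this layout requires the samples to have equal sizes. The sketch below, using a hypothetical helper to_two_column and labels 1, 2, ..., k chosen by us, builds the layout from a list of samples.

```python
def to_two_column(samples):
    """Stack k samples (possibly of different sizes) as rows [value, sample label]."""
    rows = []
    for label, sample in enumerate(samples, start=1):   # labels 1, 2, ..., k
        for value in sample:
            rows.append([value, label])
    return rows
```

For example, to_two_column([[5.1, 4.8, 5.0], [6.2, 5.9]]) returns [[5.1, 1], [4.8, 1], [5.0, 1], [6.2, 2], [5.9, 2]], which corresponds to typing X = [5.1 1; 4.8 1; 5.0 1; 6.2 2; 5.9 2] in MATLAB.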

Output:
- Printed report of the complete Anderson-Darling k-sample test

Cite As

Antonio Trujillo-Ortiz (2020). AnDarksamtest (https://www.mathworks.com/matlabcentral/fileexchange/17451-andarksamtest), MATLAB Central File Exchange. Retrieved .

Comments and Ratings (13)

Neel

Many thanks! A very useful function. I just added an output flag (reject/do not reject the null hypothesis) to the function so that I can use it automatically within a loop. Also, a simple example of how to input two-sample data to the function would help inexperienced users like me.

Sia

Pankaj Dey

Dear Antonio
Thanks for sharing this important function. I have gone through some of the literature and found that this test gives good results with very large samples. So I am trying to find out whether there is a minimum sample size for each class that gives satisfactory results.

Thank you in advance.

Pankaj Dey

caravansary

Thank you for your file. What if I have two samples of different sizes?

Jon

I retract my silly question.

Jon

So this can only be used if your two samples have the same size?

Joanne

Hi, as mentioned:

"Inputs:
X - data matrix (Size of matrix must be n-by-2; data=column 1, sample=column 2)"

For the sample that has to be provided in column 2, should it be of the pdf form or cdf form?

Tom Davidson

Excellent, thank you for posting this!

A couple requests: It would be great if the calculated values were returned in the output of the function (and if the text output could be suppressed).

Also, I find it convenient to pass in a cell array of samples, rather than the n x 2 array currently required. A quick patch to allow this is below:

Replace the lines:

X1 = X(:,1); %data vector
X2 = X(:,2); %grouping vector

With:

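if iscell(X)
    % Cell-array input: X{k} holds the observations of sample k
    X1 = [];  % pooled data vector
    X2 = [];  % grouping vector
    for k = 1:numel(X)
        Xk = X{k};
        X1 = vertcat(X1, Xk(:));                    % append sample k as a column
        X2 = vertcat(X2, repmat(k, numel(Xk), 1));  % label its values with k
    end
else
    X1 = X(:,1); %data vector
    X2 = X(:,2); %grouping vector
end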

Michael Singer

This fills a major gap. Thanks!

Updates

1.0.0.0

The text was improved following valuable suggestions from Fritz Scholz and Michael Stephens.

Summary was improved.

An appropriate format for citing this file was added.

MATLAB Release Compatibility
Created with R14
Compatible with any release
Platform Compatibility
Windows macOS Linux