KSTEST2 P-Value Calculation, how does Matlab do it?

9 views (last 30 days)
Hi All. I am currently using the KSTEST2 function in Matlab 2014B to compare two datasets. I understand finding the D statistic ( largest vertical difference between the Empirical CDFs), and also how to reject using the D_Alpha method (D_Alpha = 1.36*sqrt(n1+n2/n1*n2) at 5% sig, and reject when D is less than D_Alpha). But I do not understand how the p_value is calculated for either 1 or 2 sided tests. From the matlab function I have found that:
% Compute the asymptotic P-value approximation and accept or
% reject the null hypothesis on the basis of the P-value.
%
n1 = length(x1);
n2 = length(x2);
n = n1 * n2 /(n1 + n2);
lambda = max((sqrt(n) + 0.12 + 0.11/sqrt(n)) * KSstatistic , 0);
if tail ~= 0 % 1-sided test.
pValue = exp(-2 * lambda * lambda);
else % 2-sided test (default).
%
% Use the asymptotic Q-function to approximate the 2-sided P-value.
%
j = (1:101)';
pValue = 2 * sum((-1).^(j-1).*exp(-2*lambda*lambda*j.^2));
pValue = min(max(pValue, 0), 1);
end
H = (alpha >= pValue);
From this I figured first Matlab is taking the size of the two datasets, n1,n2 then calculating n (where is this equation from?). Then lambda is calculated, and has to be >=0. 1 tail is straight forward equation, where does it come from? 2 tailed is more complex, why is there j = (1:101) when the equation for the p-value only gives one answer when j = 1 (well at least in the examples I tried). Also where does this equation come from?
Basically which paper did Matlab base their script on? Where did these equations come from as I have looked at a lot of literature on this and have not found a similar equation.

Accepted Answer

Steve Clarke
Steve Clarke on 17 Jun 2015
Edited: Steve Clarke on 17 Jun 2015
After a lot of searching I found some good resources on the subject. 'Numerical Recipes - The Art of Scientific Computing' by Press et all, Page 736-740, 2007 which references (but is harder to understand) Use of the Kolmogorov-Smirnov, Cramer-Von Mises and Related Statistics Without Extensive Tables by Stephens, Page 115-122 from the Journal of the Royal Statistical Society. Series B (Methodological), 1970.
Just thought I would put this here incase anyone else wants to understand the KS Test better. Also check out COMPARING DISTRIBUTIONS: THE TWO-SAMPLE ANDERSON-DARLING TEST AS AN ALTERNATIVE TO THE KOLMOGOROV-SMIRNOFF TEST for more info on the limitations of the test.

More Answers (0)

Products

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!