kstest - One-sample Kolmogorov-Smirnov test

Syntax

h = kstest(x)
h = kstest(x,CDF)
h = kstest(x,CDF,alpha)
h = kstest(x,CDF,alpha,tail)
[h,p,ksstat,cv] = kstest(...)

Description

h = kstest(x) performs a Kolmogorov-Smirnov test to compare the values in the data vector x to a standard normal distribution. The null hypothesis is that x has a standard normal distribution. The alternative hypothesis is that x does not have that distribution. The result h is 1 if the test rejects the null hypothesis at the 5% significance level, 0 otherwise.

The test statistic is:

$$\max_x \left| \hat{F}(x) - G(x) \right|$$

where $\hat{F}(x)$ is the empirical cdf and $G(x)$ is the standard normal cdf.
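To see how the statistic is computed, here is a minimal base-MATLAB sketch (not kstest's own implementation). It writes the standard normal cdf with erfc so that no toolbox function is needed (normcdf returns the same values), and it assumes the sample has no tied values. Because the empirical cdf is a step function, the largest gap must occur either just before or just after one of its jumps, so only those points need to be checked.

```matlab
x  = -2:1:4;                      % the sample used in the example on this page
xs = sort(x(:));                  % sorted sample as a column vector
n  = numel(xs);
G  = 0.5 * erfc(-xs / sqrt(2));   % standard normal cdf at each data point

% The empirical cdf equals (i-1)/n just before the i-th sorted value
% and i/n just after it; the maximum gap is attained at one of these.
gapAfter  = abs((1:n)'/n - G);
gapBefore = abs((0:n-1)'/n - G);
ksstat = max([gapAfter; gapBefore])   % about 0.41277
```

The result agrees with the statistic k that kstest returns for this sample in the example below.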

h = kstest(x,CDF) compares the distribution of x to the hypothesized continuous distribution defined by the two-column matrix CDF. Column 1 contains a set of possible x values, and column 2 contains the corresponding hypothesized cumulative distribution function values $G(x)$. If possible, define CDF so that column 1 contains the values in x. If there are values in x not found in column 1 of CDF, kstest approximates $G(x)$ by interpolation. All values in x must lie in the interval between the smallest and largest values in the first column of CDF. If the second argument is empty ([]), kstest uses the standard normal distribution.
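As a minimal sketch of building the CDF argument, assume a hypothesized exponential distribution with mean 1, whose cdf is $G(t) = 1 - e^{-t}$ for $t \ge 0$. The distribution and the sample values are illustrative choices only. The first matrix follows the advice above and puts the values of x in column 1; the second tabulates G on a grid instead, which relies on kstest's interpolation and therefore must span the range of x.

```matlab
% Hypothetical sample, to be tested against an exponential distribution
% with mean 1 (an illustrative assumption).
x = [0.2 0.5 0.9 1.4 2.3 3.1];

% Column 1: the values in x; column 2: hypothesized cdf G(x) = 1 - exp(-x).
% Column 1 contains every value in x, so no interpolation is needed.
CDF = [x(:), 1 - exp(-x(:))];

% Alternative: tabulate G on a grid. kstest would then interpolate G at x,
% so every value of x must lie between the first and last grid points.
t = (0:0.5:5)';
CDFgrid = [t, 1 - exp(-t)];
inRange = all(x >= CDFgrid(1,1) & x <= CDFgrid(end,1));   % must be true
```

Either matrix would then be passed as the second argument, for example h = kstest(x,CDF).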

The Kolmogorov-Smirnov test requires that CDF be specified in advance. The test is not accurate if CDF, or any of its parameters, is estimated from the data. To test x against a normal distribution without specifying the parameters, use lillietest instead.

h = kstest(x,CDF,alpha) specifies the significance level alpha for the test. The default is 0.05.

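h = kstest(x,CDF,alpha,tail) specifies the type of test using one of the following values for the string tail:

  'unequal'   Two-sided test (default): the population cdf is not equal to the hypothesized cdf.
  'larger'    One-sided test: the population cdf is larger than the hypothesized cdf.
  'smaller'   One-sided test: the population cdf is smaller than the hypothesized cdf.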

[h,p,ksstat,cv] = kstest(...) also returns the p-value p, the test statistic ksstat, and the cutoff value cv for determining if ksstat is significant.
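A short illustration of how h relates to the other outputs, under the standard convention (assumed here) that the test rejects when p < alpha, which should agree with rejecting when ksstat > cv. The values are copied from the two-sided example output on this page:

```matlab
% Values copied from the two-sided example output on this page.
alpha  = 0.05;
p      = 0.13632;
ksstat = 0.41277;
cv     = 0.48342;

rejectByP  = p < alpha;       % false: the p-value exceeds the significance level
rejectByCV = ksstat > cv;     % false: the statistic is below the critical value
h = double(rejectByP)         % 0, matching the h returned by kstest
```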

Example

Generate evenly spaced numbers and perform a Kolmogorov-Smirnov test to see if they come from a standard normal distribution:

x = -2:1:4
x =
  -2  -1   0   1   2   3   4

[h,p,k,c] = kstest(x,[],0.05,0)
h =
   0
p =
   0.13632
k =
   0.41277
c =
   0.48342

The test fails to reject the null hypothesis that the values come from a standard normal distribution. This illustrates the difficulty of testing normality in small samples. (The Lilliefors test, implemented by the Statistics Toolbox function lillietest, may be more appropriate.)

The following code plots the empirical cdf of x and the standard normal cdf to illustrate the test statistic:

xx = -3:.1:5;                      % grid for the normal cdf curve
F = cdfplot(x);                    % empirical cdf of x (staircase)
hold on
G = plot(xx,normcdf(xx),'r-');     % standard normal cdf
set(F,'LineWidth',2)
set(G,'LineWidth',2)
legend([F G],...
       'Empirical','Standard Normal',...
       'Location','NW')
hold off

The test statistic k is the maximum vertical distance between the two curves.
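For this sample the maximum distance can be located by hand. Just below x = 1 the empirical cdf still equals 3/7 (three of the seven values are smaller than 1), while the standard normal cdf has already reached $\Phi(1) \approx 0.84134$, so

$$k = \Phi(1) - \tfrac{3}{7} \approx 0.84134 - 0.42857 = 0.41277$$

which matches the value of k returned above.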

Setting tail to 'smaller' tests the alternative that the population cdf is smaller than the normal cdf:

[h,p,k] = kstest(x,[],0.05,'smaller')
h =
   0
p =
   0.068181
k =
   0.41277

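The test statistic is the same as before, because the largest gap between the two curves occurs where the empirical cdf lies below the normal cdf, but the one-sided p-value (0.068181) is about half the two-sided p-value (0.13632).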

Setting tail to 'larger' changes the test statistic, which now measures only how far the empirical cdf rises above the normal cdf:

[h,p,k] = kstest(x,[],0.05,'larger')
h =
   0
p =
   0.77533
k =
   0.12706
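Both one-sided statistics can be reproduced from the sorted data. This is a minimal base-MATLAB sketch, not kstest's implementation; it writes the standard normal cdf with erfc so that no toolbox function is required (normcdf gives the same values), and it assumes the sample has no tied values.

```matlab
x  = -2:1:4;
xs = sort(x(:));                  % sorted sample as a column vector
n  = numel(xs);
G  = 0.5 * erfc(-xs / sqrt(2));   % standard normal cdf at each data point

% 'larger': how far the empirical cdf rises above G, checked just after each jump
kLarger  = max((1:n)'/n - G)      % about 0.12706
% 'smaller': how far G rises above the empirical cdf, checked just before each jump
kSmaller = max(G - (0:n-1)'/n)    % about 0.41277
```

The two-sided statistic is the larger of the two, max(kLarger, kSmaller), which is why the 'smaller' test reported the same statistic as the two-sided test.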

See Also

kstest2, lillietest

  


© 1984-2008 The MathWorks, Inc.