Kolmogorov-Smirnov test help?

I have the test data below. The kstest(x) function compares the distribution of the data against a standard normal distribution (mean 0, std 1). Is it better to simply call the function as kstest(x), or to correct the data so that its mean and standard deviation are 0 and 1, respectively?
Also, when doing so, do you get a p-value of 0.1267 for the uncorrected data and 0.6506 for the corrected data? I ask because I got significantly different values earlier.
Another question: are these probabilities realistic? When I plot the values in Excel, they look more or less normally distributed, yet they don't pass the 5% significance level.
Thanks
1.481336
-0.15023
2.253639
-3.44891
-2.06993
-0.54504
3.077467
-0.49623
-0.23977
0.098674
0.237035
-5.38399
1.753639
-1.65023
0.644677
1.407635
0.077467
-0.66607
1.981336
2.644677
-0.12763
4.035716
-1.18049
-1.04504
0.614422
1.345996
1.224973
-3.49454
-4.23659
0.223383
0.907635
0.724973
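A minimal sketch of both options from the question, assuming the Statistics and Machine Learning Toolbox (kstest, lillietest) is installed; the variable names are illustrative. It prints h and p for the raw data, for the z-scored data, and for the Lilliefors test, so you can compare the p-values with the ones you obtained. One caveat worth knowing: z-scoring with the sample mean and std and then calling kstest gives a p-value that tends to be too large, because the parameters were estimated from the same data. The Lilliefors test is designed to account for exactly that.

```matlab
% Data from the question (32 values)
x = [1.481336, -0.15023, 2.253639, -3.44891, -2.06993, -0.54504, 3.077467, -0.49623, ...
     -0.23977, 0.098674, 0.237035, -5.38399, 1.753639, -1.65023, 0.644677, 1.407635, ...
     0.077467, -0.66607, 1.981336, 2.644677, -0.12763, 4.035716, -1.18049, -1.04504, ...
     0.614422, 1.345996, 1.224973, -3.49454, -4.23659, 0.223383, 0.907635, 0.724973];

% 1) Raw data against a standard normal distribution N(0,1)
[hRaw, pRaw] = kstest(x);

% 2) Standardize with the SAMPLE mean and std, then test against N(0,1).
%    Caution: mu and sigma were estimated from x itself, so this p-value is biased upward.
z = (x - mean(x)) ./ std(x);
[hStd, pStd] = kstest(z);

% 3) Lilliefors test: normal family with unknown mean/std, estimation accounted for
[hLil, pLil] = lillietest(x);

fprintf('kstest(x):        h = %d, p = %.4f\n', hRaw, pRaw);
fprintf('kstest(z-scored): h = %d, p = %.4f\n', hStd, pStd);
fprintf('lillietest(x):    h = %d, p = %.4f\n', hLil, pLil);
```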

Answers (1)

Adam Danz on 20 Jan 2020
Edited: Adam Danz on 20 Jan 2020
"Is it better to simply call the function as kstest(x) or correct the data so that its standard deviation and mean is 1 and 0 respectively"
The one-sample Kolmogorov-Smirnov test tests the null hypothesis that the data comes from a standard normal distribution (mean 0, std 1). If you correct your data so that it does have a mean of 0 and std of 1, what's the point of testing it?
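For intuition about what kstest(x) measures, here is a base-MATLAB sketch (no toolbox needed) of the one-sample KS statistic against N(0,1): the largest vertical gap between the empirical CDF of the data and the standard normal CDF. The variable names are illustrative; with the toolbox installed, the third output of kstest (ksstat) should match D up to rounding for data without ties.

```matlab
% One-sample KS statistic against a standard normal, base MATLAB only
x  = [1.481336, -0.15023, 2.253639, -3.44891, -2.06993, -0.54504];  % a few of the posted values
xs = sort(x(:));                    % sorted sample as a column
n  = numel(xs);
k  = (1:n)';                        % ranks 1..n
F  = 0.5 * erfc(-xs / sqrt(2));     % standard normal CDF at each sorted point

Dplus  = max(k/n - F);              % empirical CDF (just after each step) above F
Dminus = max(F - (k-1)/n);          % F above empirical CDF (just before each step)
D = max(Dplus, Dminus);             % KS statistic: largest gap in either direction

fprintf('D = %.4f (n = %d)\n', D, n);
```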
If you want a more general test that your data come from a normal distribution with any mean or std, use the Anderson-Darling test or the Lilliefors test.
Null hypotheses (from the documentation)
One-sample Kolmogorov-Smirnov test: the data in vector x comes from a standard normal distribution (mean 0, std 1).
Lilliefors test: the data in vector x comes from a distribution in the normal family.
Anderson-Darling test: the data in vector x is from a population with a normal distribution.
If the null hypothesis is rejected (an output of 1 from any of the three tests), the test concludes, at the 5% significance level, that the data do not come from that distribution.
Note that a failure to reject the null hypothesis (an output of 0) does not show that the data do come from that distribution; it only means the test did not find enough evidence against it. This is a common misunderstanding when interpreting hypothesis tests.
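One way to see that "fail to reject" is not proof of normality: draw many small samples from a uniform distribution (clearly not normal) and count how often lillietest rejects. This is a sketch assuming the Statistics and Machine Learning Toolbox; the sample size, number of trials, and variable names are illustrative choices. Warnings are switched off temporarily because lillietest interpolates p from a table and may warn when p falls outside it; that does not affect h.

```matlab
% Small uniform samples: how often does lillietest detect non-normality?
rng('default');                     % reproducible random numbers
nSamples = 10;                      % small sample size
nTrials  = 1000;
rejected = false(nTrials, 1);

oldState = warning('off', 'all');   % silence table-range warnings inside the loop
for t = 1:nTrials
    u = rand(nSamples, 1);          % uniform on (0,1), not normal
    rejected(t) = lillietest(u);    % h = 1 means normality rejected at alpha = 0.05
end
warning(oldState);                  % restore previous warning settings

rejectionRate = mean(rejected);
fprintf('lillietest rejected normality in %.1f%% of uniform samples (n = %d)\n', ...
        100 * rejectionRate, nSamples);
```

With samples this small the rejection rate is typically low, so most of these clearly non-normal samples "pass".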
Here's a demonstration showing the difference between kstest and the other two tests.
% Create data from a normal distribution with
% mean 0 and std 1.
x0 = randn(1,10000);
% Use that same exact data to create a normal distribution
% with mean 5 and std 2
x1 = x0*2 + 5;
% Plot both distributions
clf()
histogram(x0)
hold on
histogram(x1)
Notice how this creates two normal distributions. The blue distribution has a mean of 0 and std of 1, while the reddish distribution has a mean of 5 and std of 2 (approximately).
% Look at the results of the ks-tests
ks0 = kstest(x0) % fail to reject
ks1 = kstest(x1) % reject null hyp
% Look at the results of the Lilliefors test
lt0 = lillietest(x0) % fail to reject
lt1 = lillietest(x1) % fail to reject
% Look at the results from the Anderson-Darling test
ad0 = adtest(x0) % fail to reject
ad1 = adtest(x1) % fail to reject
As you can see, kstest() does not reject the hypothesis that the blue distribution is standard normal, which is expected since it has a mean of 0 and std of 1 (approximately), while it does reject that hypothesis for the other distribution. However, lillietest() and adtest() fail to reject normality for both distributions, consistent with both being normal.

6 Comments

Hi, thank you for your answer. Maybe I misunderstood, then.
The tests you have shown seem to be better analysis tools for my data.
However, your comments say the tests reject the hypothesis, yet the return value is 0, which indicates that they do not reject it?
Sorry, I'm a bit confused. The data seem to be a good fit to a normal distribution, so I'd expect the tests to reject the null hypothesis, since there is clear evidence that the two data sets are related. However, they return 0, indicating a failure to reject.
Adam Danz on 20 Jan 2020
Edited: Adam Danz on 20 Jan 2020
In case you haven't seen the update in my answer, I copy-pasted the wrong comments next to the normality tests. They are corrected now.
When any of the tests outputs a value of 1 (true), that means you are rejecting the null hypothesis (that the values come from those distributions).
Since the data in my demo are indeed good fits to normal distributions, the tests fail to reject the null hypothesis, except for kstest() on the reddish distribution, which clearly isn't a standard normal distribution.
Hmmm, thank you for the correction.
Your clarification has contradicted my understanding of the null hypothesis, so I hope you can help me clear it up.
The definition of the null hypothesis: A null hypothesis is a type of hypothesis used in statistics that proposes that no statistical significance exists in a set of given observations.
Therefore, I would expect that, when comparing normally distributed data with a crude-looking sample like the examples you gave, the tests would come back true because there is a relationship between the two, i.e., they would reject the null hypothesis.
Failing to reject would mean otherwise.
Where have I gone wrong?
Adam Danz on 21 Jan 2020
Edited: Adam Danz on 21 Jan 2020
That definition from Investopedia isn't precise enough. Even Wikipedia's definition isn't precise enough: "the null hypothesis is a general statement or default position that there is nothing significantly different happening."
To be more precise, the null hypothesis is typically a specific statement, defined by the test, that there is no difference in whatever the test examines; the test then rejects it, or fails to, at some arbitrary significance level.
The key differences between this definition and the others are:
  1. The null hypothesis is not a general statement. On the contrary, it's specific to the test you're running.
  2. It doesn't test that "nothing is significant" or "no significance exists". Instead, it tests only the specific property the test is designed to test, and it judges significance against an arbitrary alpha value (i.e., alpha = 0.05).
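The arbitrariness of alpha is easy to see in code: the p-value depends only on the data and the test, while h just records whether the test statistic crossed the threshold for the alpha you picked. A sketch assuming the Statistics and Machine Learning Toolbox; the simulated data and variable names are illustrative, and lillietest may warn if p falls outside its lookup table.

```matlab
% Same data, same test, three alpha values: only h can change
rng('default');                          % reproducible illustrative data
x = 5 + 2 * randn(30, 1);                % simulated: normal, mean 5, std 2

[h05, p] = lillietest(x, 'Alpha', 0.05); % p does not depend on the chosen alpha
h01 = lillietest(x, 'Alpha', 0.01);
h20 = lillietest(x, 'Alpha', 0.20);

fprintf('p-value: %.4f\n', p);
fprintf('h (alpha = 0.01): %d\n', h01);
fprintf('h (alpha = 0.05): %d\n', h05);
fprintf('h (alpha = 0.20): %d\n', h20);
```

A larger alpha can only make rejection easier, so a rejection at 0.01 implies a rejection at 0.05 and 0.20.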
These very common misconceptions are why hundreds of scientists, statisticians, and researchers have supported a movement to stop significance testing.
Critically, the null hypothesis is whatever the test deems it to be. Look at the MATLAB documentation for kstest:
Adam Danz, thanks for the clarification on the null hypothesis and normality tests, especially in MATLAB.
The question now is: then what? Suppose I have small samples, e.g., 10 observations, and kstest() rejects that they are normally distributed, but the other two tests, lillietest() and adtest(), do not reject. Are the data then normally distributed, and can they be analyzed further with ANOVA, etc., which require normality as a prerequisite?
Sounds like the data could come from a normal distribution that isn't a standard normal distribution. Normal distributions are described by a mean and standard deviation (SD). A standard normal distribution is the special case of a normal distribution where the mean is 0 and the SD is 1.
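When the mean and SD are known in advance (not estimated from the same sample), kstest can test against any normal distribution through its 'CDF' option, or equivalently on data standardized with those known values. A sketch assuming the Statistics and Machine Learning Toolbox; mu = 5 and sigma = 2 mirror the demo in the answer, and the variable names are illustrative.

```matlab
% Testing against a normal distribution whose parameters are KNOWN in advance
rng('default');
x1 = 5 + 2 * randn(1, 10000);            % normal with mean 5 and std 2

% Default null hypothesis: standard normal N(0,1); expected to reject here
h0 = kstest(x1);

% Null hypothesis: normal with mu = 5, sigma = 2, passed as a distribution object
pd = makedist('Normal', 'mu', 5, 'sigma', 2);
[h1, p1] = kstest(x1, 'CDF', pd);

% Equivalent: standardize with the known (not estimated) mean and std
[h2, p2] = kstest((x1 - 5) / 2);

fprintf('kstest vs N(0,1):               h = %d\n', h0);
fprintf('kstest vs normal(mu=5, sigma=2): h = %d, p = %.4f\n', h1, p1);
fprintf('kstest of (x1-5)/2 vs N(0,1):    h = %d, p = %.4f\n', h2, p2);
```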
10 observations aren't much data. When you plot the distributions (using histogram, for example), do the test results make sense?
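With only about 10 observations, a histogram has very few bins, so a normal probability plot is often easier to judge by eye. A sketch assuming the Statistics and Machine Learning Toolbox (normplot); the simulated data are illustrative, and qqplot(x) is a similar alternative.

```matlab
% Visual checks for a small sample (illustrative simulated data)
rng('default');
x = randn(10, 1);                  % 10 observations

figure
subplot(1, 2, 1)
histogram(x)                       % with n = 10, bin choice strongly affects the shape
title('Histogram, n = 10')

subplot(1, 2, 2)
normplot(x)                        % points close to the reference line suggest normality
title('Normal probability plot')
```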


Asked: 20 Jan 2020
Commented: 3 May 2020
