Kolmogorov Smirnov test help?

Question

0 votes

I have the test data below. The kstest(x) function compares the distribution of the data against a standard normal distribution (mean of 0 and std of 1). Is it better to simply call the function as kstest(x), or to first correct the data so that its mean and standard deviation are 0 and 1 respectively?

Also, when doing so, do you get a probability of 0.1267 for the uncorrected data and 0.6506 for the corrected data?

I ask because I got significantly different values earlier.

Another question: are the probabilities realistic? When I plot the values in Excel, the graphs look more or less normally distributed, yet they don't pass at the 5% significance level.

Thanks

1.481336

-0.15023

2.253639

-3.44891

-2.06993

-0.54504

3.077467

-0.49623

-0.23977

0.098674

0.237035

-5.38399

1.753639

-1.65023

0.644677

1.407635

0.077467

-0.66607

1.981336

2.644677

-0.12763

4.035716

-1.18049

-1.04504

0.614422

1.345996

1.224973

-3.49454

-4.23659

0.223383

0.907635

0.724973


Answer 1

Adam Danz on 20 Jan 2020

Edited: Adam Danz on 20 Jan 2020

1 vote

"Is it better to simply call the function as kstest(x) or correct the data so that its standard deviation and mean is 1 and 0 respectively"

The one-sample Kolmogorov-Smirnov test, as called by kstest(x) with its default settings, tests the null hypothesis that the data come from a standard normal distribution (mean 0, std 1). If you correct your data so that it does have a mean of 0 and std of 1, what's the point of testing it?

If you want a more general test that your data come from a normal distribution with any mean or std, use the Anderson-Darling test or the Lilliefors test.
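To make the comparison concrete, here is a minimal sketch that runs the three options on the data posted in the question. It assumes the Statistics and Machine Learning Toolbox is available; the variable names are only illustrative, and no particular p-values are claimed here. One point worth keeping in mind: standardizing with the sample's own mean and std and then calling kstest tends to overstate the p-value, because the parameters were estimated from the same data. lillietest is designed to account for that.

```matlab
% Data copied from the question (32 values).
x = [ 1.481336 -0.15023   2.253639 -3.44891  -2.06993  -0.54504 ...
      3.077467 -0.49623  -0.23977   0.098674  0.237035 -5.38399 ...
      1.753639 -1.65023   0.644677  1.407635  0.077467 -0.66607 ...
      1.981336  2.644677 -0.12763   4.035716 -1.18049  -1.04504 ...
      0.614422  1.345996  1.224973 -3.49454  -4.23659   0.223383 ...
      0.907635  0.724973 ];

% 1) Raw data against a standard normal (mean 0, std 1).
[hRaw, pRaw] = kstest(x);

% 2) Data standardized with its own mean and std, then kstest.
%    The p-value is optimistic because the parameters come from x itself.
z = (x - mean(x)) / std(x);
[hStd, pStd] = kstest(z);

% 3) Lilliefors test: normal with unspecified mean and std.
[hLil, pLil] = lillietest(x);

fprintf('kstest (raw):          h = %d, p = %.4f\n', hRaw, pRaw);
fprintf('kstest (standardized): h = %d, p = %.4f\n', hStd, pStd);
fprintf('lillietest:            h = %d, p = %.4f\n', hLil, pLil);
```

If the question is "are these data normal with some mean and std", option 3 is the one that matches it.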

Null hypotheses (from the documentation)

One-sample Kolmogorov-Smirnov test: the data in vector x comes from a standard normal distribution (mean 0, std 1).

Lilliefors test: the data in vector x comes from a distribution in the normal family.

Anderson-Darling test: the data in vector x is from a population with a normal distribution.

If the null hypothesis is rejected (an outcome of 1 for all three tests), the data do not come from those distributions at a 5% significance level.

Note that if there is a failure to reject the null hypothesis (an outcome of 0 for all three tests), that does not indicate that the data do come from those distributions. This is a common misunderstanding when interpreting hypothesis tests.

Here's a demonstration showing the difference between kstest and the other two.
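As a rough illustration of the outcomes described above, all three functions can also return a p-value as a second output, and the significance level can be changed with the 'Alpha' option. This is only a sketch on simulated data (the seed, sample size, and variable names are arbitrary choices), assuming the Statistics and Machine Learning Toolbox is installed.

```matlab
rng(1);                        % fixed seed so the sketch is repeatable
x = randn(1, 200);             % example data; replace with your own

% First output: 1 = reject H0, 0 = fail to reject. Second output: p-value.
[hKS, pKS] = kstest(x);        % H0: standard normal (mean 0, std 1)
[hLT, pLT] = lillietest(x);    % H0: normal, mean and std unspecified
[hAD, pAD] = adtest(x);        % H0: normal, mean and std unspecified

% The default significance level is 5%; 'Alpha' changes it.
hKS01 = kstest(x, 'Alpha', 0.01);

fprintf('kstest:     h = %d, p = %.4f\n', hKS, pKS);
fprintf('lillietest: h = %d, p = %.4f\n', hLT, pLT);
fprintf('adtest:     h = %d, p = %.4f\n', hAD, pAD);
fprintf('kstest at alpha = 0.01: h = %d\n', hKS01);
```

A result of h = 0 only says the test found no evidence against the null hypothesis at the chosen alpha; it is not a confirmation that the data are normal.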

% Create data from a normal distribution with
% mean 0 and std 1.
x0 = randn(1,10000);
% Use that exact same data to create a normal distribution
% with mean 5 and std 2.
x1 = x0*2 + 5;
% Plot both distributions
clf()
histogram(x0)
hold on
histogram(x1)

Notice how this creates two normal distributions. The blue distribution has a mean of 0 and std of 1 while the reddish distribution has a mean of 5 and std of 2 (approximately).

% Outcomes: 1 = reject the null hypothesis, 0 = fail to reject.
% The data are random, so a test whose null is true will still
% reject in about 5% of runs.
% Look at the results of the KS tests
ks0 = kstest(x0)  % fail to reject
ks1 = kstest(x1)  % reject null hyp
% Look at the results of the Lilliefors test
lt0 = lillietest(x0) % fail to reject
lt1 = lillietest(x1) % fail to reject
% Look at the results from the Anderson-Darling test
ad0 = adtest(x0) % fail to reject
ad1 = adtest(x1) % fail to reject

As you can see, kstest() does not reject the blue distribution as a standard normal distribution, and rightfully so since it has a mean of 0 and std of 1 (approximately), while it does reject the other distribution. However, neither lillietest() nor adtest() rejects normality for either distribution.
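As a side note, kstest is not limited to the standard normal: a hypothesized distribution can be passed through the 'CDF' option. The sketch below tests x1 against a normal distribution with mean 5 and std 2, built with makedist. The seed and variable names are assumptions for illustration. This is only appropriate when the mean and std are known in advance rather than estimated from the same data.

```matlab
rng(0);                                  % fixed seed for repeatability
x0 = randn(1, 10000);                    % standard normal sample
x1 = x0*2 + 5;                           % same sample rescaled: mean 5, std 2

% Hypothesized distribution for x1.
pd = makedist('Normal', 'mu', 5, 'sigma', 2);

[h0, p0] = kstest(x0);                   % H0: x0 is standard normal
[h1, p1] = kstest(x1, 'CDF', pd);        % H0: x1 is normal(5, 2)

fprintf('x0 vs N(0,1): h = %d, p = %.4f\n', h0, p0);
fprintf('x1 vs N(5,2): h = %d, p = %.4f\n', h1, p1);
```

Because x1 is an exact rescaling of x0, the two p-values should agree up to floating-point rounding.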

6 Comments

Adam Danz on 21 Jan 2020

Edited: Adam Danz on 21 Jan 2020

That definition from Investopedia isn't precise enough. Even Wikipedia's definition isn't precise enough: "the null hypothesis is a general statement or default position that there is nothing significantly different happening."

To be more precise, the null hypothesis is typically a test statement that there is no difference in what you're testing at some arbitrary significance level.

The key differences between this definition and the others are:

The null hypothesis is not a general statement. On the contrary, it's specific to the test you're running.
It doesn't test that "nothing is significant" or "no significance exists". Instead, it only tests whatever specific property the test is designed to test, and it bases the significance on an arbitrary alpha value (i.e., alpha = 0.05).

These very common misconceptions are why 100s of scientists, statisticians, and researchers have supported a movement to stop significance testing.

https://www.nature.com/articles/d41586-019-00857-9

Critically, the null hypothesis is whatever the test deems it to be. Look at the MATLAB documentation for kstest:

John TS on 3 May 2020

Adam Danz, thanks for the clarification on the null hypothesis and normality tests, esp. in MATLAB.

The question now is: then what? Suppose I have small samples, e.g. 10 observations, and kstest() rejects that they are normally distributed, but the other two tests, lillietest() and adtest(), do not reject. Is the data then normally distributed, and can it be analyzed further with ANOVA etc., which require normality as a prerequisite?

Adam Danz on 3 May 2020

Sounds like the data could come from a normal distribution that isn't a standard normal distribution. Normal distributions are described by a mean and standard deviation (SD). A standard normal distribution is the special case of a normal distribution where the mean is 0 and SD is 1.

10 observations aren't much data. When you plot the distributions (using histogram, for example), do the test results make sense?
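For a quick visual check of a small sample, something along these lines could be used. It is only a sketch: the 10 values are simulated placeholders rather than real data, and normplot assumes the Statistics and Machine Learning Toolbox is installed.

```matlab
rng(1);                          % fixed seed for repeatability
x = 3 + 2*randn(10, 1);          % placeholder sample: 10 draws, mean 3, std 2

figure
subplot(1, 2, 1)
histogram(x)                     % coarse view of the shape
title('Histogram')

subplot(1, 2, 2)
normplot(x)                      % points near the line suggest normality

% Test outcomes to compare against the plots (1 = reject, 0 = fail to reject).
hKS = kstest(x);                 % H0: standard normal (mean 0, std 1)
hLT = lillietest(x);             % H0: normal, mean and std unspecified
fprintf('kstest h = %d, lillietest h = %d\n', hKS, hLT);
```

With so few points the histogram will look rough either way, so the normal probability plot is usually the easier of the two to read.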
