
I have the test data below. The kstest(x) function compares the distribution of the data against a standard normal distribution (mean of 0 and std of 1). Is it better to simply call the function as kstest(x), or to correct the data first so that its mean and standard deviation are 0 and 1 respectively?

Also, when doing so, do you guys get a probability (p-value) of 0.1267 for the uncorrected data and 0.6506 for the corrected data?

It's just that I got significantly different values earlier.

Another question: are the probabilities realistic? When I plot the values in Excel, the graphs look more or less normally distributed, yet they don't pass at the 5% significance level.

Thanks

1.481336

-0.15023

2.253639

-3.44891

-2.06993

-0.54504

3.077467

-0.49623

-0.23977

0.098674

0.237035

-5.38399

1.753639

-1.65023

0.644677

1.407635

0.077467

-0.66607

1.981336

2.644677

-0.12763

4.035716

-1.18049

-1.04504

0.614422

1.345996

1.224973

-3.49454

-4.23659

0.223383

0.907635

0.724973

Adam Danz
on 20 Jan 2020

Edited: Adam Danz
on 20 Jan 2020

"Is it better to simply call the function as kstest(x) or correct the data so that its standard deviation and mean is 1 and 0 respectively"

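The one-sample Kolmogorov-Smirnov test tests the null hypothesis that the data come from a standard normal distribution (mean 0, std 1). If you correct your data so that it does have a mean of 0 and std of 1, what's the point of testing it? The p-value from kstest is also no longer reliable in that case, because the mean and std were estimated from the same data that is being tested.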

If you want a more general test that your data come from a normal distribution with any mean or std, use the Anderson-Darling test or the Lilliefors test.

Null hypotheses (from the documentation)

One-sample Kolmogorov-Smirnov test: the data in vector x comes from a standard normal distribution (mean 0, std 1).

Lilliefors test: the data in vector x comes from a distribution in the normal family.

Anderson-Darling test: the data in vector x is from a population with a normal distribution.

If the null hypothesis is rejected (each of the three tests signals this with an output of 1), the data are judged not to come from that distribution at the default 5% significance level.

Note that a failure to reject the null hypothesis (an output of 0 from any of the three tests) does not indicate that the data do come from those distributions. This is a common misunderstanding when interpreting hypothesis tests.
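For reference, here is a minimal sketch of how the three tests could be run on the data posted in the question. The variable names are my own, and I'm deliberately not quoting any outputs here; run it to see the h and p values for yourself.

```matlab
% Data copied from the question (32 values).
x = [ 1.481336, -0.15023,  2.253639, -3.44891, -2.06993, -0.54504, ...
      3.077467, -0.49623, -0.23977,  0.098674, 0.237035, -5.38399, ...
      1.753639, -1.65023,  0.644677, 1.407635, 0.077467, -0.66607, ...
      1.981336,  2.644677, -0.12763, 4.035716, -1.18049, -1.04504, ...
      0.614422,  1.345996, 1.224973, -3.49454, -4.23659,  0.223383, ...
      0.907635,  0.724973];

% One-sample K-S test against the standard normal (mean 0, std 1).
[hKS, pKS] = kstest(x);

% Tests of normality where the mean and std are left unspecified.
[hLT, pLT] = lillietest(x);
[hAD, pAD] = adtest(x);

% h = 1: reject the null hypothesis at the default 5% level.
% h = 0: failed to reject, which is not proof that the data are normal.
fprintf('kstest:     h = %d, p = %.4f\n', hKS, pKS);
fprintf('lillietest: h = %d, p = %.4f\n', hLT, pLT);
fprintf('adtest:     h = %d, p = %.4f\n', hAD, pAD);
```

Requesting the second output (the p-value) is usually more informative than the 0/1 decision alone.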

Here's a demonstration showing the difference between kstest and the two other tests.

% Create data from a normal distribution with

% mean 0 and std 1.

x0 = randn(1,10000);

% Use that same exact data to create a normal distribution

% with mean 5 and std 2

x1 = x0*2 + 5;

% Plot both distributions

clf()

histogram(x0)

hold on

histogram(x1)

Notice how this creates two normal distributions. The blue distribution has a mean of 0 and std of 1 while the reddish distribution has a mean of 5 and std of 2 (approximately).

% Look at the results of the ks-tests

ks0 = kstest(x0) % fail to reject

ks1 = kstest(x1) % reject null hyp

% Look at the results of the Lilliefors test

lt0 = lillietest(x0) % fail to reject

lt1 = lillietest(x1) % fail to reject

% Look at the results from the Anderson-Darling test

ad0 = adtest(x0) % fail to reject

ad1 = adtest(x1) % fail to reject

As you can see, kstest() does not reject the blue distribution as a standard normal distribution, and rightfully so since it has a mean of 0 and std of 1 (approximately), while it does reject the other distribution. However, both distributions are consistent with a normal distribution according to both lillietest() and adtest().
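If the question is whether data match one particular non-standard normal distribution, kstest can also be given the hypothesized distribution through its 'CDF' option. Below is a minimal sketch, assuming the mean and std are fixed in advance (here 5 and 2, the values used to build x1) rather than estimated from the data. The fixed seed and the name pd are my own additions for illustration.

```matlab
rng(1)                     % assumption: fixed seed, only so the sketch is repeatable
x0 = randn(1,10000);       % standard normal sample, as in the demo
x1 = x0*2 + 5;             % same data rescaled to mean 5, std 2

% Hypothesized distribution, specified up front (not fitted to x1).
pd = makedist('Normal', 'mu', 5, 'sigma', 2);

% Default kstest: compares x1 with the standard normal.
[hStd, pStd] = kstest(x1);

% kstest with an explicit reference distribution: compares x1 with N(5, 2).
[hRef, pRef] = kstest(x1, 'CDF', pd);

fprintf('vs N(0,1): h = %d, p = %.4g\n', hStd, pStd);
fprintf('vs N(5,2): h = %d, p = %.4g\n', hRef, pRef);
```

This is different from standardizing with the sample mean and std: the reference parameters here do not come from the data being tested.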

Adam Danz
on 20 Jan 2020

In case you haven't seen the update in my answer, I copy-pasted the wrong comments next to the normality tests. They are corrected now.

When any of the tests output a value of 1 (true) that means you are rejecting the null hypothesis (that the values come from those distributions).

Since the data in my demo are indeed good fits to normal distributions, the tests fail to reject the null hypothesis, except for kstest() on the reddish distribution, which clearly isn't a standard normal distribution.

Adam Danz
on 21 Jan 2020

That definition from Investopedia isn't precise enough. Even Wikipedia's definition isn't precise enough: "the null hypothesis is a general statement or default position that there is nothing significantly different happening."

To be more precise, the null hypothesis is typically a test statement that there is no difference in what you're testing at some arbitrary significance level.

The key differences between this definition and the others are

- The null hypothesis is not a general statement. On the contrary, it's specific to the test you're running.
- It doesn't test that "nothing is significant" or "no significance exists". Instead, it only tests whatever specific property the test is designed to test, and it bases the significance on an arbitrary alpha value (e.g., alpha = 0.05).
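To illustrate the point about the arbitrary alpha value, here is a small sketch that runs kstest twice on the same sample with different 'Alpha' settings. The sample itself is hypothetical (made up with randn and a fixed seed) and only serves the illustration.

```matlab
rng(0)                     % assumption: fixed seed, only for repeatability
x = 1.3*randn(1,40);       % hypothetical sample, a bit wider than N(0,1)

[h05, p05] = kstest(x, 'Alpha', 0.05);
[h01, p01] = kstest(x, 'Alpha', 0.01);

% The p-value depends on the data and the test, not on alpha.
fprintf('p = %.4f (alpha 0.05), p = %.4f (alpha 0.01)\n', p05, p01);

% Only the reject / fail-to-reject decision depends on the chosen alpha.
fprintf('h = %d at alpha 0.05, h = %d at alpha 0.01\n', h05, h01);
```

The same p-value can therefore lead to different decisions depending on which alpha you picked beforehand.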

These very common misconceptions are why hundreds of scientists, statisticians, and researchers have supported a movement to stop significance testing.

Critically, the null hypothesis is whatever the test deems it to be. Look at the MATLAB documentation for kstest:
