Shapiro-Wilk parametric hypothesis test of composite normality, for sample sizes 3 <= n <= 5000. Based on Royston's R94 algorithm.
This function also performs the Shapiro-Francia normality test, which it selects for leptokurtic samples (kurtosis > 3).
Mindaugas: to add an option that always forces Shapiro-Wilk instead of Shapiro-Francia, you can easily change the following:
function [H, pValue, W] = swtest(x, alpha, wilk)
if kurtosis(x) > 3 && ~wilk
Now you should be able to force the Shapiro-Wilk test by setting wilk to true in the input arguments.
wilk must be logical; if you want, you can add further error checking for this.
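One caveat with this change: once wilk is part of the signature, a call such as swtest(x, alpha) can reach ~wilk with the argument undefined. Below is a minimal sketch of the extra checking, assuming a default of false is wanted. Inside swtest the check would use nargin; the cell array args is only a hypothetical stand-in so that the snippet runs on its own as a script.

```matlab
% Stand-in for the arguments a caller passed to swtest (hypothetical).
args = {[2.1 3.4 1.9 5.0 4.2], 0.05};   % only x and alpha were given

% Default: keep the automatic Shapiro-Wilk / Shapiro-Francia choice.
if numel(args) < 3 || isempty(args{3})
    wilk = false;
else
    wilk = args{3};
end

% Reject anything that is not a single true/false value.
if ~(islogical(wilk) && isscalar(wilk))
    error('swtest:invalidWilk', 'WILK must be a logical scalar.');
end
```

With this in place, numeric inputs such as 1 or 0 are rejected instead of being silently accepted.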
It would be nice to support normality testing not only for vectors but also for 2-D arrays (or even arrays with more dimensions), like 'ttest' with its 'Dim' option.
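Until such an option exists, the test can be applied slice by slice from the outside. The following is only a sketch of that idea: testFun is a placeholder that always returns false so the snippet runs without swtest.m, and the variable names are made up for illustration.

```matlab
X   = randn(20, 4, 3);   % example data: 20 observations along dimension 1
dim = 1;                 % dimension along which to test

% Placeholder for the real test. With swtest.m on the path one could use
%   testFun = @(v) swtest(v, 0.05);
testFun = @(v) false;

% Move DIM to the front and flatten all other dimensions into columns.
order = [dim, setdiff(1:ndims(X), dim)];
Xp    = permute(X, order);
sz    = size(Xp);
cols  = reshape(Xp, sz(1), []);

% Run the test once per column.
H = false(1, size(cols, 2));
for k = 1:size(cols, 2)
    H(k) = testFun(cols(:, k));
end

% Restore the original layout, with a singleton in place of DIM.
H = ipermute(reshape(H, [1, sz(2:end)]), order);
disp(size(H));
```

For the 20-by-4-by-3 input above, H has size 1-by-4-by-3, which mirrors how functions with a 'Dim' option collapse the tested dimension.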
Thanks, but you must add an option to always force the Shapiro-Wilk test.
Wanna try this code.
Thanks in advance~~
Thanks for this implementation. I use it a lot in my current project.
I found a problem with samples that are uniform, e.g.:
Error using erfc
Input must be real and full.
I traced this back to line 211
W = (weights' * x) ^2 / ((x - mean(x))' * (x - mean(x)));
W can become NaN or Inf if the sample is uniform. I'm not sure how to treat this correctly. Should the code just check whether W is NaN/Inf and, if so, reject the null hypothesis?
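One possible guard, sketched under the assumption that the failing samples are those whose values are all identical, so that the denominator of W is zero. Returning NaN instead of rejecting the null hypothesis is a choice made for this example, not something the original function does.

```matlab
x = 5 * ones(10, 1);    % example: every observation is identical

% Same expression as the denominator of W in the line quoted above.
denom = (x - mean(x))' * (x - mean(x));

if denom == 0
    % Assumption: report the result as undefined rather than as a decision.
    H = NaN; pValue = NaN; W = NaN;
    fprintf('All values are identical: W is undefined.\n');
else
    % ... continue with the usual computation of W and the p-value ...
end
```

Placing such a check before W is computed means erfc never receives a NaN or complex argument.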
To answer all questions:
- The kurtosis (line 119) does not modify the power of the test; it is only used to help choose between the Shapiro-Wilk and Shapiro-Francia methods. Moreover, it's better to use the sample kurtosis 'kurtosis(x)'.
- When setting x=norminv((1:9)/10), x is not normally distributed; it represents the inverse of the CDF, which is not normal by definition. So if you want to test the power, you can generate x=normrnd(mu, sigma, n, 1), choosing the size of your sample (n), and then perform the test.
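As a small illustration of the second point, the snippet below draws a random normal sample. It uses randn from base MATLAB, which gives the same distribution as normrnd(mu, sigma, n, 1); the values of mu, sigma and n are arbitrary, and the swtest call is left as a comment because it assumes swtest.m is on the path.

```matlab
mu = 2; sigma = 3; n = 200;    % arbitrary example values

% Random draws from N(mu, sigma^2) without the Statistics Toolbox.
x = mu + sigma * randn(n, 1);

% With swtest.m on the path, the test would then be run as, for example:
%   [H, pValue, W] = swtest(x, 0.05);
fprintf('n = %d, sample mean = %.3f, sample std = %.3f\n', n, mean(x), std(x));
```

Repeating this many times and counting rejections gives an estimate of the rejection rate under the null hypothesis.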
Can you explain the following?
x is normally distributed.
If I perform a two-tailed test, your function rejects the null hypothesis.
x = norminv((1:9)/10);
Sorry, what is the purpose of the tail option? When should I use the values 1, 0 or -1?
Thank you for your help.
I am unable to test the code with my data, and I don't know why.
I tested this code but I have a doubt. In line 119, the kurtosis computation seems to be for a population (kurtosis(x)) and not for a sample (kurtosis(x,0)). So, in line 119 shouldn't it be "if kurtosis(x,0)> 3" (flag=0)?
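To see how much the flag can matter, here is a small worked example. According to the MATLAB documentation, kurtosis(x) (flag 1) is the uncorrected moment ratio and kurtosis(x,0) applies a bias correction. Both are computed by hand below so the snippet does not need the Statistics Toolbox; the data vector is made up.

```matlab
x = [1 2 3 4 10]';     % made-up sample
n = numel(x);
d = x - mean(x);

m2 = mean(d.^2);       % second central moment
m4 = mean(d.^4);       % fourth central moment

% Uncorrected kurtosis, as returned by kurtosis(x) or kurtosis(x,1).
k1 = m4 / m2^2;

% Bias-corrected kurtosis, as returned by kurtosis(x,0).
k0 = (n-1) / ((n-2)*(n-3)) * ((n+1)*k1 - 3*(n-1)) + 3;

fprintf('k1 = %.3f, k0 = %.3f\n', k1, k0);   % prints k1 = 2.788, k0 = 6.152
```

For this sample the two values fall on opposite sides of 3, so the choice of flag would change which of the two tests is run.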
I tested this code for sample sizes as small as 4 and as large as 4096. At all levels tested, the p-values were fairly uniform, so this test appears to maintain the nominal rejection rate very well. (The Anderson-Darling test has similarly good performance for small sample sizes.)
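A check of this kind can be sketched as follows. To keep the snippet runnable without swtest.m, it uses a two-sided z-test as a stand-in (this is NOT the Shapiro-Wilk test); the number of runs, the sample size and the level are arbitrary choices for the example.

```matlab
nRuns = 5000; n = 16; alpha = 0.1;   % arbitrary simulation settings
p = zeros(nRuns, 1);

for r = 1:nRuns
    x = randn(n, 1);                 % data generated under the null hypothesis
    % Stand-in test: two-sided z-test of mean 0 with known unit variance.
    % With swtest.m on the path one would instead use something like
    %   [~, p(r)] = swtest(x, alpha);
    z    = sqrt(n) * mean(x);
    p(r) = erfc(abs(z) / sqrt(2));
end

% Empirical rejection rate; it should be close to alpha if the test is exact.
empAlpha = mean(p < alpha);

% Share of p-values in each tenth of [0, 1); roughly 0.1 each if uniform.
decileShare = arrayfun(@(lo) mean(p >= lo & p < lo + 0.1), 0:0.1:0.9);

fprintf('empirical alpha = %.3f\n', empAlpha);
disp(decileShare);
```

Uniform p-values under the null hypothesis and a rejection rate close to alpha are two views of the same property.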
Has anyone other than LARS tried to validate this?
Thanks for implementing the code for the Shapiro-Wilk and Shapiro-Francia normality tests.
Good comments, easy to use, and a reference for the algorithm is included.
I have the impression that this implementation is more liberal than it should be according to the literature. Over 1000 runs at the 0.1 level with various sample sizes, I calculated an average empirical alpha of 0.11.
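For a sense of scale, the Monte Carlo uncertainty of such an estimate can be worked out directly. The sketch below assumes 1000 independent runs in total; the comment does not say how the runs were split across sample sizes, so the numbers are only indicative.

```matlab
alphaNominal  = 0.10;    % nominal level
alphaObserved = 0.11;    % reported empirical rejection rate
nRuns         = 1000;    % assumed total number of independent runs

% Binomial standard error of a rejection rate estimated from nRuns runs.
se = sqrt(alphaNominal * (1 - alphaNominal) / nRuns);

% Distance between observed and nominal level, in standard errors.
z = (alphaObserved - alphaNominal) / se;

fprintf('standard error = %.4f, deviation = %.2f standard errors\n', se, z);
```

Under that assumption the standard error is about 0.0095, so 0.11 lies roughly one standard error above 0.1.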
Sorry, the file seems ok. My problem was that I had an out-of-date copy of the distchck.m file on my computer.
I just tried it on some test data (n=16) and it crashed because the value of the variable newSWstatistic was imaginary.
- Improved precision for sample size = 3;
Change the value in line 136 to 0.26758 instead of 0.026758 (Shapiro-Francia) to correct the significance level. Thanks to Kent Parsons for his remarks.