Shapiro-Wilk parametric hypothesis test of composite normality, for sample size 3 <= n <= 5000. Based on Royston's R94 algorithm.
This function also performs the Shapiro-Francia normality test for leptokurtic samples (sample kurtosis greater than 3).
Ahmed BenSaïda (2021). Shapiro-Wilk and Shapiro-Francia normality tests (https://www.mathworks.com/matlabcentral/fileexchange/13964-shapiro-wilk-and-shapiro-francia-normality-tests), MATLAB Central File Exchange.
Thanks for the submission! I changed as proposed by Philipp Dehnen.
So I've been using this script (thanks Ahmed) for a while now, testing for normality with small sample sizes (n < 10). Works great, but it seems to report a false positive rate of .06 when the target I use to test is .05. I am doing large scale randomizations, and so it's pretty consistent whether I use 4 <= n <= 10 or 25 <= n <= 35. When I forced the SW test, it gave me the correct FP rate of .05. I fear that there is some sort of weird issue with the SF test.
Lars reports a similar issue (Mar 2008). I tried to see if it was the kurtosis calculation (see Luis Dec 2010) but it didn't fix the issue.
So I recommend forcing SW and avoiding the slightly higher FP rate. Just my opinion.
Mindaugas: to add an option that always forces Wilk instead of Francia, you can easily change the following:
function [H, pValue, W] = swtest(x, alpha, wilk)
if kurtosis(x) > 3 && ~wilk
Now you should be able to force the Wilk test by setting wilk to true in the input arguments.
wilk must be logical. If you want, you can add further error-checking lines for this.
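One way to add that error checking is sketched below. It assumes the modified three-argument signature shown above; the helper name parseWilkFlag and the error identifier are hypothetical and not part of the original submission.

```matlab
function wilk = parseWilkFlag(varargin)
% parseWilkFlag  Hypothetical helper: validate the optional 'wilk' flag.
%   wilk = parseWilkFlag()      returns false (keep the automatic choice).
%   wilk = parseWilkFlag(flag)  returns flag as a logical scalar.
if nargin < 1 || isempty(varargin{1})
    wilk = false;                 % default: let the kurtosis check decide
    return
end
flag = varargin{1};
% Accept true/false or numeric 0/1, and nothing else.
if ~isscalar(flag) || ~(islogical(flag) || (isnumeric(flag) && any(flag == [0 1])))
    error('swtest:BadWilkFlag', 'WILK must be a logical scalar (true or false).');
end
wilk = logical(flag);
end
```

Inside swtest, this could be called as wilk = parseWilkFlag(varargin{:}) if the third input is collected with varargin, so that omitting the argument keeps the original behavior.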
It would be nice to implement testing normality not only for vectors, but also for 2D (or even for arrays with more dimensions), like 'ttest' with 'Dim' option.
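Until something like a 'Dim' option exists, a small wrapper can apply the test column by column. This is only a sketch: swtestColumns is a hypothetical name, and it assumes swtest.m is on the path and accepts the two-argument call swtest(x, alpha).

```matlab
function [H, pValue, W] = swtestColumns(X, alpha)
% swtestColumns  Hypothetical wrapper: run swtest on each column of X.
%   X is an n-by-m matrix; the outputs are 1-by-m row vectors.
nCols  = size(X, 2);
H      = false(1, nCols);
pValue = nan(1, nCols);
W      = nan(1, nCols);
for k = 1:nCols
    % Each column is tested independently as its own sample.
    [H(k), pValue(k), W(k)] = swtest(X(:, k), alpha);
end
end
```

To test along rows instead, pass the transpose: swtestColumns(X', alpha).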
Thanks, but you should add an option to always force the Shapiro-Wilk test.
Wanna try this code.
Thanks in advance~~
Thanks for this implementation. I use it a lot in my current project.
I found a problem for samples which are uniformly distributed, e.g.:
Error using erfc
Input must be real and full.
I traced this back to line 211
W = (weights' * x) ^2 / ((x - mean(x))' * (x - mean(x)));
W can become NaN or Inf if the sample is uniform. I'm not sure how to treat this correctly. Just check whether W is NaN/Inf and, if so, reject the null hypothesis?
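A guard along these lines could catch the problem before the value reaches erfc. It is a sketch built around the expression quoted from line 211; safeSWStatistic is a hypothetical helper, and whether the caller should reject the null hypothesis or return NaN for such a sample is left open here.

```matlab
function [W, degenerate] = safeSWStatistic(weights, x)
% safeSWStatistic  Hypothetical wrapper around the W computation quoted above.
%   weights and x are column vectors of equal length.
xc = x - mean(x);            % centered sample
ss = xc' * xc;               % sum of squared deviations (denominator of W)
W  = (weights' * x)^2 / ss;
% W is NaN (0/0) or Inf (nonzero/0) when the denominator is zero,
% i.e. when all observations are identical.
degenerate = ~isfinite(W);
if degenerate
    W = NaN;                 % signal "statistic undefined" to the caller
end
end
```

The caller can then test the degenerate flag and skip the p-value computation, for example by returning pValue = NaN together with a warning.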
To answer all questions:
- The kurtosis (line 119) does not modify the power of the test; it is merely used to help choose between the Shapiro-Wilk and Shapiro-Francia methods. Moreover, it's better to use the sample kurtosis 'kurtosis(x)'.
- When setting x = norminv((1:9)/10), x is not normally distributed: it represents the inverse of the CDF, which is not normal by definition. So if you want to test the power, you can compute x = normrnd(mu, sigma, n, 1), where you choose the size of your sample (n), and then perform the test.
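The suggestion above can be tried with a few lines. This is a minimal sketch that assumes swtest.m is on the MATLAB path, that the two-argument call swtest(x, alpha) is accepted, and that the Statistics Toolbox (normrnd) is installed; the values of mu, sigma, n, and alpha are illustrative only.

```matlab
% Draw a normal sample and run the test on it.
rng(1);                          % fix the seed so the run is repeatable
mu = 0; sigma = 1; n = 50; alpha = 0.05;
x = normrnd(mu, sigma, n, 1);    % n-by-1 sample from N(mu, sigma^2)
[H, pValue, W] = swtest(x, alpha);
fprintf('H = %d, p = %.4f, W = %.4f\n', H, pValue, W);
```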
Can you explain the following?
x is normally distributed.
If I perform a 2 tailed test, your function rejects the null hypothesis.
x = norminv((1:9)/10);
Sorry, what is the purpose of the tail option? When should I use the values 1, 0, or -1?
Thank you for help
I am unable to test the code with my data, and I don't know why.
I tested this code but I have a doubt. In line 119, the kurtosis computation seems to be for a population (kurtosis(x)) and not for a sample (kurtosis(x,0)). So, in line 119 shouldn't it be "if kurtosis(x,0)> 3" (flag=0)?
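The difference between the two calls can be checked directly. The snippet below is a small sketch using the Statistics Toolbox kurtosis function, where the default flag of 1 applies no bias correction and a flag of 0 applies one; the sample size of 20 is an arbitrary choice.

```matlab
% Compare the two kurtosis estimators discussed above.
rng(1);
x  = randn(20, 1);
k1 = kurtosis(x);        % default flag = 1: no bias correction
k0 = kurtosis(x, 0);     % flag = 0: bias-corrected estimate
fprintf('kurtosis(x) = %.4f, kurtosis(x,0) = %.4f\n', k1, k0);
% For small n the two values can fall on opposite sides of 3, which would
% change whether the Shapiro-Wilk or Shapiro-Francia branch is taken.
```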
I tested this code for sample sizes as small as 4 and as large as 4096. At all levels tested, the p-values were fairly uniform. Thus it appears this test maintains the nominal rejection rate very well. (The Anderson-Darling test has similarly good performance for small sample sizes.)
Has anyone other than Lars tried to validate this?
Thanks for implementing the code for the Shapiro-Wilk and Shapiro-Francia normality tests.
Good comments, easy to use, and a reference for the algorithm is provided.
I have the impression that this implementation is more liberal than it should be according to the literature. Over 1000 runs at the 0.1 level with various sample sizes, I calculated an average empirical alpha of 0.11.
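A check of this kind can be reproduced with a short simulation. The sketch below assumes swtest.m is on the path and accepts swtest(x, alpha); nRuns, n, and alpha are illustrative values, not the ones used in the comment above.

```matlab
% Estimate the empirical rejection rate under the null hypothesis.
rng(1);
nRuns = 1000; n = 20; alpha = 0.10;
rejected = false(nRuns, 1);
for k = 1:nRuns
    x = randn(n, 1);                     % sample drawn from a true normal
    rejected(k) = swtest(x, alpha);      % H = 1 means "reject normality"
end
empiricalAlpha = mean(rejected);
% Binomial standard error of the estimate if the true rate equals alpha.
se = sqrt(alpha * (1 - alpha) / nRuns);  % about 0.0095 for these values
fprintf('empirical alpha = %.3f (nominal %.2f, SE %.4f)\n', ...
        empiricalAlpha, alpha, se);
```

With 1000 runs at the 0.1 level the standard error is roughly 0.0095, so a single estimate of 0.11 lies about one standard error from the nominal value. More runs would be needed to tell a real excess from simulation noise.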
Sorry, the file seems ok. My problem was that I had an out-of-date copy of the distchck.m file on my computer.
I just tried it on some test data (n=16) and it crashed because the value of the variable newSWstatistic was imaginary.