The Shapiro-Wilk goodness-of-fit test for normality
This functionality does not run in MATLAB.
stats::swGOFT(x1, x2, …)
stats::swGOFT([x1, x2, …])
stats::swGOFT(s, <c>)
stats::swGOFT([x1, x2, …]) applies the Shapiro-Wilk goodness-of-fit test for the null hypothesis: "the data x1, x2, … are normally distributed (with unknown mean and variance)". The sample size must not be larger than 5000 and not smaller than 3.
An error is raised by stats::swGOFT if any of the data cannot be converted to a real floating-point number or if the sample size is too large or too small.
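The input checks described above can be illustrated with a minimal Python sketch. It is not MuPAD code and not the implementation behind stats::swGOFT; the helper name `validate_sample`, the exception type, and the rejection of non-finite values are assumptions made for illustration. Only the bounds 3 and 5000 come from the text.

```python
import math

MIN_SIZE = 3      # smallest admissible sample size (from the text)
MAX_SIZE = 5000   # largest admissible sample size (from the text)


def validate_sample(data):
    """Return the data as a list of floats, or raise ValueError.

    Hypothetical helper that mirrors the documented checks.
    """
    values = []
    for x in data:
        try:
            v = float(x)
        except (TypeError, ValueError):
            raise ValueError(f"cannot convert {x!r} to a real float") from None
        # Assumption: infinities and NaN are not acceptable data points.
        if not math.isfinite(v):
            raise ValueError(f"{x!r} is not a finite real number")
        values.append(v)
    if not MIN_SIZE <= len(values) <= MAX_SIZE:
        raise ValueError(
            f"sample size {len(values)} is outside [{MIN_SIZE}, {MAX_SIZE}]"
        )
    return values


print(validate_sample([1, "2.5", 3]))
```

A sample with fewer than 3 or more than 5000 entries, or with an entry such as `"x"`, makes the helper raise `ValueError`.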
Let y1, …, yn be the input data x1, …, xn arranged in ascending order. stats::swGOFT returns the list [PValue = p, StatValue = w] containing the following information:
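w is the attained value of the Shapiro-Wilk statistic
$$W = \frac{\left(\sum_{i=1}^{n} a_i y_i\right)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} = \frac{\left(\sum_{i=1}^{n} a_i y_i\right)^2}{(n-1)\,S^2}.$$
Here, the a_i are the Shapiro-Wilk coefficients, \bar{y} is the sample mean, and S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2 is the statistical variance of the sample.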
p is the observed significance level of the Shapiro-Wilk statistic W.
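As an illustration of how the statistic is formed from the ordered data, here is a small Python sketch, not MuPAD and not the stats::swGOFT implementation. It uses the standard definition of W as the squared weighted sum of the ordered data divided by the sum of squared deviations from the mean. The function name `w_statistic` is hypothetical, and the coefficients must be supplied by the caller. For n = 3 the exact coefficients are (-1/√2, 0, 1/√2), which is what the example uses.

```python
import math


def w_statistic(data, coeffs):
    """Shapiro-Wilk statistic W for given coefficients a_1, ..., a_n.

    Hypothetical helper: the coefficients are an input here, they are
    not computed by Royston's approximation.
    """
    y = sorted(float(v) for v in data)   # y_1 <= ... <= y_n
    if len(y) != len(coeffs):
        raise ValueError("need exactly one coefficient per data point")
    mean = sum(y) / len(y)
    ss = sum((v - mean) ** 2 for v in y)  # equals (n - 1) * S^2
    if ss == 0.0:
        raise ValueError("W is undefined for constant data")
    weighted = sum(a * v for a, v in zip(coeffs, y))
    return weighted ** 2 / ss


# Exact Shapiro-Wilk coefficients for a sample of size 3.
A3 = (-math.sqrt(0.5), 0.0, math.sqrt(0.5))

print(w_statistic([1, 2, 3], A3))  # equally spaced data: W close to 1
print(w_statistic([0, 0, 1], A3))  # close to 0.75, the minimum of W for n = 3
```

Values of W near 1 are compatible with normality; small values speak against it.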
The observed significance level PValue = p returned by stats::swGOFT has to be interpreted in the following way: If p is smaller than a given significance level α<<1, the null hypothesis may be rejected at level α. If p is larger than α, the null hypothesis should not be rejected at level α.
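The decision rule above can be written down as a minimal Python sketch (not MuPAD). The function name `decide` and the default level 0.05 are assumptions made for illustration. The text does not say what to do when p equals α exactly; the sketch treats that case as "do not reject".

```python
def decide(p_value, alpha=0.05):
    """Apply the decision rule to an observed significance level.

    Hypothetical helper; alpha = 0.05 is only a conventional default.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    # p < alpha: the null hypothesis of normality may be rejected.
    # Otherwise (including p == alpha, an assumption): do not reject.
    return "reject" if p_value < alpha else "do not reject"


print(decide(0.001))  # reject
print(decide(0.40))   # do not reject
```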
The function is sensitive to the environment variable DIGITS which determines the numerical working precision.
We test a list of random data that purport to be a sample of normally distributed numbers:
f := stats::normalRandom(0, 1, Seed = 123):
data := [f() $ i = 1..400]:
stats::swGOFT(data)
The observed significance level is not small. Consequently, one should not reject the null hypothesis that the data are normally distributed.
Next, we contaminate the data with some uniformly distributed deviates:
impuredata := data . [frandom() $ i = 1..101]:
stats::swGOFT(impuredata)
The contaminated data may be rejected as a sample of normal deviates at significance levels as small as the PValue returned by this call.
delete f, data, impuredata:
We create a sample consisting of one string column and two non-string columns:
s := stats::sample([["1996", 1242, PI - 1/2],
                    ["1997", 1353, PI + 0.3],
                    ["1998", 1142, PI + 0.5],
                    ["1999", 1201, PI - 1],
                    ["2001", 1201, PI]])
"1996"  1242  PI - 1/2
"1997"  1353  PI + 0.3
"1998"  1142  PI + 0.5
"1999"  1201    PI - 1
"2001"  1201        PI
We check whether the data of the third column are normally distributed:
stats::swGOFT(s, 3)
The observed significance level returned by the test is not small: the test provides no evidence against the hypothesis that the data are normally distributed.
x1, x2, …
The statistical data: real numerical values
s
A sample of domain type stats::sample
c
An integer representing a column index of the sample s. This column provides the data x1, x2, etc. There is no need to specify a column number c if the sample has only one column.
List of two equations [PValue = p, StatValue = w] with floating-point values p and w. See the 'Details' section for the interpretation of these values.
The implemented algorithm for the computation of the Shapiro-Wilk coefficients, the Shapiro-Wilk statistic, and the observed significance level is based on: Patrick Royston, "Algorithm AS R94", Applied Statistics, Vol. 44, No. 4 (1995).
Following Royston, the Shapiro-Wilk coefficients a_i are computed by an approximation of
$$a = (a_1, \dots, a_n) = \frac{M^T V^{-1}}{\sqrt{M^T V^{-1} V^{-1} M}},$$
where M denotes the vector of expected values of the standard normal order statistics for a sample of size n, V is the corresponding covariance matrix, and M^T is the transpose of M.
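To give a feeling for such coefficients, the following Python sketch (not MuPAD) computes a deliberately simplified variant. It is not Royston's approximation and not what stats::swGOFT does. Two simplifications are assumed: the expected order statistics M are replaced by Blom's approximation Φ⁻¹((i − 3/8)/(n + 1/4)), and the covariance matrix V is replaced by the identity, so that the coefficient vector reduces to M divided by its Euclidean norm (coefficients of Shapiro-Francia type). The function name `approx_coefficients` is hypothetical.

```python
import math
from statistics import NormalDist


def approx_coefficients(n):
    """Simplified coefficients a_1, ..., a_n (Shapiro-Francia type).

    Assumptions: Blom's approximation for the expected normal order
    statistics, and V replaced by the identity matrix.
    """
    if n < 3:
        raise ValueError("need a sample size of at least 3")
    std_normal = NormalDist()  # standard normal distribution
    m = [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    norm = math.sqrt(sum(v * v for v in m))
    return [v / norm for v in m]


a = approx_coefficients(5)
print(a)
print(sum(v * v for v in a))  # the coefficients are normalized to length 1
```

The resulting vector is antisymmetric (a_i = −a_{n+1−i}) and has unit length, two properties it shares with the exact Shapiro-Wilk coefficients. For n = 3 it reproduces the exact values (−1/√2, 0, 1/√2); for larger n it differs from them.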