stats::swGOFT

The Shapiro-Wilk goodness-of-fit test for normality

Use only in the MuPAD Notebook Interface.

This functionality does not run in MATLAB.

Syntax

stats::swGOFT(x1, x2, …)
stats::swGOFT([x1, x2, …])
stats::swGOFT(s, <c>)

Description

stats::swGOFT([x1, x2, …]) applies the Shapiro-Wilk goodness-of-fit test for the null hypothesis: "the data x1, x2, … are normally distributed (with unknown mean and variance)". The sample size must be at least 3 and at most 5000.

External statistical data stored in an ASCII file can be imported into a MuPAD® session via import::readdata. In particular, see Example 1 of the corresponding help page.
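As a minimal sketch of this workflow, assume a hypothetical ASCII file named measurements.txt that holds whitespace-separated real numbers. The file name and the variable names raw and data are illustrative only. import::readdata returns one sublist per line of the file, so the sublists are flattened before the test is applied:

```mupad
// "measurements.txt" is a hypothetical file name used only for illustration
raw := import::readdata("measurements.txt"):  // nested list: one sublist per line
data := map(raw, op):                         // flatten the sublists into one list
stats::swGOFT(data);
delete raw, data:
```

The flattened list must still contain between 3 and 5000 values, otherwise stats::swGOFT raises an error.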

An error is raised by stats::swGOFT if any of the data cannot be converted to a real floating-point number or if the sample size is too large or too small.

Let y1, …, yn be the input data x1, …, xn arranged in ascending order. stats::swGOFT returns the list [PValue = p, StatValue = w] containing the following information:

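  • w is the attained value of the Shapiro-Wilk statistic

    \[ W = \frac{\left(\sum_{i=1}^{n} a_i\, y_i\right)^2}{(n-1)\, S^2}, \qquad S^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2 . \]

    Here, the a_i are the Shapiro-Wilk coefficients, \bar{y} is the mean of the sample, and S^2 is the statistical variance of the sample.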

  • p is the observed significance level of the Shapiro-Wilk statistic W.

The observed significance level PValue = p returned by stats::swGOFT is interpreted as follows: if p is smaller than a given significance level α ≪ 1, the null hypothesis may be rejected at level α. If p is larger than α, the null hypothesis should not be rejected at level α.
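The decision rule can be sketched in a few lines. The data values and the level alpha = 0.05 below are hypothetical and chosen only for illustration. The sketch relies on the documented return format [PValue = p, StatValue = w], so the first list entry is the equation PValue = p:

```mupad
// hypothetical data and significance level, for illustration only
data := [2.1, 1.9, 2.4, 2.0, 1.7, 2.3, 2.2, 1.8]:
alpha := 0.05:
result := stats::swGOFT(data):
p := rhs(result[1]):   // right-hand side of the equation PValue = p
if p < alpha then
   print(Unquoted, "reject the null hypothesis at level alpha")
else
   print(Unquoted, "do not reject the null hypothesis at level alpha")
end_if:
delete data, alpha, result, p:
```

The statistic itself can be read off in the same way with rhs(result[2]).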

Environment Interactions

The function is sensitive to the environment variable DIGITS which determines the numerical working precision.
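A minimal sketch of changing the working precision around a call, using hypothetical data values. Note that the underlying algorithm is itself an approximation, so a larger value of DIGITS affects the floating-point arithmetic but need not yield a more accurate significance level:

```mupad
// hypothetical data, for illustration only
data := [2.1, 1.9, 2.4, 2.0, 1.7, 2.3, 2.2, 1.8]:
DIGITS := 20:          // raise the numerical working precision
stats::swGOFT(data);
delete DIGITS:         // restore the default precision
delete data:
```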

Examples

Example 1

We test a list of random data that purport to be a sample of normally distributed numbers:

f := stats::normalRandom(0, 1, Seed = 123):
data := [f() $ i = 1..400]:
stats::swGOFT(data)

The observed significance level is not small. Consequently, one should not reject the null hypothesis that the data are normally distributed.

Next, we contaminate the data with some uniformly distributed deviates:

impuredata := data . [frandom() $ i = 1..101]:
stats::swGOFT(impuredata)

The contaminated data may be rejected as a sample of normal deviates at any significance level larger than the PValue returned by this call.

delete f, data, impuredata:

Example 2

We create a sample consisting of one string column and two non-string columns:

s := stats::sample(
   [["1996", 1242, PI - 1/2],
    ["1997", 1353, PI + 0.3], 
    ["1998", 1142, PI + 0.5], 
    ["1999", 1201, PI - 1], 
    ["2001", 1201, PI]
   ])
"1996"  1242  PI - 1/2
"1997"  1353  PI + 0.3
"1998"  1142  PI + 0.5
"1999"  1201    PI - 1
"2001"  1201        PI

We check whether the data of the third column are normally distributed:

stats::swGOFT(s, 3)

The observed significance level returned by the test is not small: the test gives no evidence against the hypothesis that the data are normally distributed.

delete s:
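When the sample has only one column, the column index can be omitted. The following sketch uses a hypothetical one-column sample s1 whose values are chosen only for illustration:

```mupad
// hypothetical one-column sample, for illustration only
s1 := stats::sample([[0.5], [1.7], [2.3], [0.9], [1.4]]):
stats::swGOFT(s1);     // no column index needed for a single column
delete s1:
```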

Parameters

x1, x2, …

The statistical data: real numerical values

s

A sample of domain type stats::sample

c

An integer representing a column index of the sample s. This column provides the data x1, x2 etc. There is no need to specify a column number c if the sample has only one column.

Return Values

List of two equations [PValue = p, StatValue = w] with floating-point values p and w. See the Description section above for the interpretation of these values.

Algorithms

The implemented algorithm for the computation of the Shapiro-Wilk coefficients, the Shapiro-Wilk statistic and the observed significance level is based on: Patrick Royston, "Remark AS R94: A Remark on Algorithm AS 181: The W-test for Normality", Applied Statistics, Vol. 44, No. 4 (1995).

Following Royston, the Shapiro-Wilk coefficients a_i are computed by an approximation of

\[ a = (a_1, \dots, a_n) = \frac{M^T V^{-1}}{\left(M^T V^{-1} V^{-1} M\right)^{1/2}}, \]

where M denotes the vector of expected values of the standard normal order statistics for a sample of size n, V is the corresponding covariance matrix, and M^T is the transpose of M.
