Perform Chi-Square Test

For the classical chi-square goodness-of-fit test, MuPAD® provides the stats::csGOFT function. This function lets you test data against an arbitrary distribution function. You can define that function by using any of the cumulative distribution functions, probability density functions, and discrete probability functions available in the MuPAD Statistics library, or you can supply your own distribution function. For example, create the data sequence x that contains 1000 random entries drawn from the normal distribution with mean 0 and variance 1/2:

reset()
f := stats::normalRandom(0, 1/2):
x := f() $ k = 1..1000:

Suppose you want to test whether the entries of that sequence are normally distributed with mean 0 and variance 1/2. The classical chi-square test uses the following three-step approach:

  1. Divide the line of real values into several intervals (also called bins or cells).

  2. Compute the number of data elements in each interval.

  3. Compare those numbers with the numbers expected for the specified distribution.
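The three steps above can be sketched outside MuPAD in a few lines of Python. This is a minimal illustration of the general technique, not the implementation of stats::csGOFT. The function name chi_square_statistic, the cell convention (half-open cells closed on the right), and the boundaries in the usage lines are assumptions made for this sketch. Note that Python's NormalDist takes the standard deviation, so variance 1/2 becomes sigma = sqrt(1/2).

```python
import random
from bisect import bisect_left
from math import sqrt
from statistics import NormalDist


def chi_square_statistic(data, boundaries, cdf):
    """Pearson chi-square statistic for cells split at sorted interior boundaries."""
    n = len(data)
    # Step 1: boundaries b_1 < ... < b_m split the real line into m + 1 cells:
    # (-inf, b_1], (b_1, b_2], ..., (b_m, +inf).
    k = len(boundaries) + 1

    # Step 2: count the data elements that fall into each cell.
    observed = [0] * k
    for value in data:
        observed[bisect_left(boundaries, value)] += 1

    # Step 3: expected counts are n times the cell probabilities under the
    # hypothesized CDF; compare them with the observed counts.
    edges = [0.0] + [cdf(b) for b in boundaries] + [1.0]
    expected = [n * (edges[i + 1] - edges[i]) for i in range(k)]

    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, observed, expected


# Hypothetical usage: 1000 draws with mean 0 and variance 1/2.
null = NormalDist(mu=0.0, sigma=sqrt(0.5))
rng = random.Random(0)  # fixed seed only to make the sketch repeatable
sample = [rng.gauss(0.0, sqrt(0.5)) for _ in range(1000)]
statistic, observed, expected = chi_square_statistic(
    sample, [-0.5, 0.0, 0.5], null.cdf
)
print(statistic, observed, expected)
```

A small statistic means the observed counts sit close to the expected ones; turning the statistic into a p-value additionally requires the chi-square distribution, which this sketch leaves out.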

When you use the stats::csGOFT function, specify the cell boundaries as an argument. You must specify at least three cells. The recommended minimum number of cells for a sample of n data elements is approximately 2n^(2/5), which gives about 32 cells for n = 1000. The recommended method for defining the cells is to use the stats::equiprobableCells function. This function creates equiprobable cells when the underlying distribution is continuous:

q := stats::normalQuantile(0, 1/2):
cells := stats::equiprobableCells(40, q):
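To illustrate what "equiprobable" means here, the following Python sketch places the interior cell boundaries at the quantiles 1/k, 2/k, ..., (k-1)/k of the hypothesized distribution, so that each of the k cells has probability 1/k. The helper equiprobable_boundaries is hypothetical and returns only the interior boundaries; it is not meant to reproduce the return format of stats::equiprobableCells.

```python
from math import sqrt
from statistics import NormalDist


def equiprobable_boundaries(k, quantile):
    """Interior boundaries that split the real line into k equally probable cells."""
    return [quantile(i / k) for i in range(1, k)]


# Normal distribution with mean 0 and variance 1/2 (sigma = sqrt(1/2)).
dist = NormalDist(mu=0.0, sigma=sqrt(0.5))
boundaries = equiprobable_boundaries(40, dist.inv_cdf)

# Each cell has probability 1/40 under dist, so with n = 1000 data elements
# every expected cell frequency is 1000 / 40 = 25.
print(len(boundaries), boundaries[0], boundaries[-1])
```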

Now call the stats::csGOFT function to test the data sequence x. For example, compare x with the normal cumulative distribution function that has the same mean and variance. stats::csGOFT returns a large p-value for this test, so the null hypothesis (x is normally distributed with mean 0 and variance 1/2) is not rejected. Besides the p-value, stats::csGOFT returns the observed value of the chi-square statistic and the minimum of the expected cell frequencies:

stats::csGOFT(x, cells, CDF = stats::normalCDF(0, 1/2))
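For reference, the quantities reported by this call can be written out as follows, assuming stats::csGOFT follows the standard Pearson form of the test with k - 1 degrees of freedom. The symbols O_i, E_i, p_i, and k are notation introduced here for illustration; they do not come from the MuPAD documentation.

```latex
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i},
\qquad E_i = n\,p_i,
\qquad p\text{-value} = P\!\left(\chi^2_{k-1} \ge \chi^2_{\mathrm{observed}}\right)
```

Here O_i is the number of data elements observed in cell i, p_i is the probability of cell i under the hypothesized distribution, and n is the sample size. With k = 40 equiprobable cells and n = 1000, every expected frequency is E_i = 1000/40 = 25, so the minimum of the expected cell frequencies is 25.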

stats::csGOFT lets you test the data against any distribution function. For example, testing the sequence x against the probability density function gives the same result:

stats::csGOFT(x, cells, PDF = stats::normalPDF(0, 1/2))
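The two calls agree because a cell's probability is the same whether it is obtained as a difference of CDF values or as the integral of the PDF over the cell. The Python sketch below checks this numerically for one illustrative cell. How stats::csGOFT evaluates the PDF internally is not stated in this topic; the Simpson-rule helper and the cell [0, 0.5] are assumptions made only for this illustration.

```python
from math import sqrt
from statistics import NormalDist


def integrate_simpson(func, a, b, steps=200):
    """Composite Simpson rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = func(a) + func(b)
    for i in range(1, steps):
        # Odd interior nodes get weight 4, even interior nodes weight 2.
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3


dist = NormalDist(mu=0.0, sigma=sqrt(0.5))  # mean 0, variance 1/2
a, b = 0.0, 0.5  # one illustrative cell

prob_from_pdf = integrate_simpson(dist.pdf, a, b)
prob_from_cdf = dist.cdf(b) - dist.cdf(a)
print(prob_from_pdf, prob_from_cdf)
```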

If you test the same data sequence x against a normal distribution with a different variance (1 instead of 1/2; the mean stays 0), stats::csGOFT returns a p-value below the typical significance level 0.05, and the null hypothesis is rejected:

stats::csGOFT(x, cells, CDF = stats::normalCDF(0, 1))
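A rough way to see why this test fails is to look at the expected cell frequencies. The cells were built to be equiprobable for variance 1/2, so under the hypothesis of variance 1 they are no longer equiprobable: the outer cells are expected to hold far more data elements and the central cells fewer. The Python sketch below computes both sets of expected counts; the variable names are illustrative, and the code is not part of the MuPAD workflow.

```python
from math import sqrt
from statistics import NormalDist

n, k = 1000, 40
fitted = NormalDist(mu=0.0, sigma=sqrt(0.5))  # variance 1/2, used to build the cells
tested = NormalDist(mu=0.0, sigma=1.0)        # variance 1, the hypothesis under test

# Interior boundaries of 40 cells that are equiprobable under `fitted`.
boundaries = [fitted.inv_cdf(i / k) for i in range(1, k)]


def expected_counts(dist):
    """Expected number of data elements per cell if the data followed dist."""
    edges = [0.0] + [dist.cdf(b) for b in boundaries] + [1.0]
    return [n * (edges[i + 1] - edges[i]) for i in range(k)]


expected_fitted = expected_counts(fitted)
expected_tested = expected_counts(tested)

print(min(expected_fitted), max(expected_fitted))    # all close to n / k
print(expected_tested[0], expected_tested[k // 2])   # outer cell vs. central cell
```

Because the sample was drawn with variance 1/2, its observed counts stay near 25 per cell, so the mismatch with the expected counts under variance 1 inflates the chi-square statistic and drives the p-value down.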
