Divide the real line into equiprobable intervals
This functionality does not run in MATLAB.
stats::equiprobableCells(k, q, <NoWarning>)
stats::equiprobableCells is a utility function for the classical chi-square test implemented by stats::csGOFT. The call stats::equiprobableCells(k, q) creates a list of intervals ("cells") that are equiprobable with respect to the statistical distribution corresponding to the quantile function q.
The chi-square goodness-of-fit test needs a cell partitioning of the real line to compare the empirical frequencies of data falling into the cells with the expected frequencies corresponding to a hypothesized statistical distribution. It is recommended to use equiprobable cells in this test. stats::equiprobableCells is a utility function to compute such a partitioning.
The cell boundaries b_{i} of the returned cell partitioning [[b_{0}, b_{1}], …, [b_{k - 1}, b_{k}]] are computed via b_{i} = q(i/k) for i = 0, …, k. Mathematically, each cell [b_{i - 1}, b_{i}] corresponds to a semi-open interval (b_{i - 1}, b_{i}].
If q is the quantile function of a continuous statistical distribution, all cells have the same cell probability Pr(b_{i - 1} < X ≤ b_{i}) = 1/k.
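The equiprobability of the cells can be verified in one line. The following is a sketch under the assumption that the distribution has a continuous CDF F (a symbol introduced here for illustration) satisfying F(q(p)) = p, and that the boundaries are b_{i} = q(i/k):

```latex
\Pr(b_{i-1} < X \le b_i) = F(b_i) - F(b_{i-1})
  = F\bigl(q(\tfrac{i}{k})\bigr) - F\bigl(q(\tfrac{i-1}{k})\bigr)
  = \frac{i}{k} - \frac{i-1}{k} = \frac{1}{k}, \qquad i = 1, \dots, k.
```

For a discrete distribution, F(q(p)) = p fails in general, which is why the cells are then only approximately equiprobable.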
The function q can be a quantile procedure provided by the MuPAD® stats library.
Quantile functions not provided by the stats package can be implemented easily by the user. A user-defined quantile procedure q can correspond to any statistical distribution. Quantile functions must accept one numerical floating-point parameter x satisfying 0.0 ≤ x ≤ 1.0. The call q(x) must produce a real value. In particular, the return values q(0.0) = -infinity and q(1.0) = infinity are allowed.
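As a minimal sketch of such a user-defined quantile procedure, consider the exponential distribution with rate 1, whose quantile function is q(x) = -ln(1 - x). Both the choice of distribution and the name expQuantile are illustrative assumptions that do not appear elsewhere on this page. The endpoint x = 1.0 is handled explicitly so that the procedure returns infinity instead of evaluating ln(0.0):

```mupad
expQuantile := proc(x)
begin
  // q(1.0) = infinity is an allowed return value
  if x >= 1.0 then
    return(infinity)
  else
    // quantile of the exponential distribution with rate 1
    // (illustrative assumption, not part of the stats library call below)
    return(-ln(1.0 - x))
  end_if
end_proc:
cells := stats::equiprobableCells(4, expQuantile)
```

The procedure accepts one floating-point argument between 0.0 and 1.0 and is monotonically increasing, as required above.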
Quantile functions must be monotonically increasing. stats::equiprobableCells issues warnings if the computed quantile values are neither real nor ±infinity, or if these values do not increase monotonically.
stats::equiprobableCells also accepts quantile functions of discrete distributions such as stats::empiricalQuantile(data) or stats::binomialQuantile(n, p).
Note:
In general, there are no equiprobable cell partitionings for discrete distributions. Consequently, equiprobability of the cells returned by stats::equiprobableCells holds only approximately for such distributions.
In particular, for large k it may happen that the boundary q((i - 1)/k) coincides with q(i/k), i.e., the corresponding cell is empty. This always happens when k exceeds the number of possible discrete values the random variable can attain.
In such a case, a warning is issued. Passing such a cell partitioning to stats::csGOFT raises an error.
In addition to the examples on this help page, see the examples on the help page of stats::csGOFT.
The function is sensitive to the environment variable DIGITS, which determines the numerical working precision.
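A short sketch of this dependence, using the standard normal quantile function. The value 20 is an arbitrary illustrative choice, and no particular output is claimed here; delete DIGITS restores the default precision:

```mupad
q := stats::normalQuantile(0, 1):
// compute the cell boundaries with 20 significant digits
DIGITS := 20:
stats::equiprobableCells(4, q);
// restore the default working precision and recompute
delete DIGITS:
stats::equiprobableCells(4, q);
delete q:
```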
We divide the real line into 4 intervals that are equiprobable with respect to the standard normal distribution:
k:= 4: q := stats::normalQuantile(0, 1): cells := stats::equiprobableCells(k, q)
We check equiprobability by applying the function stats::normalCDF(0, 1) to the cell boundaries:
cdf := stats::normalCDF(0, 1): p := map(cells, map, cdf)
The cell probabilities are given by the differences of the CDF values at the cell boundaries:
(p[i][2] - p[i][1]) $ i = 1..k
We use these cells for a chi-square test for normality of some random data:
r := stats::normalRandom(0, 1, Seed = 0): data := [r() $ i = 1..1000]: stats::csGOFT(data, cells, CDF = cdf)
The observed significance level is not small, so the data pass this test well. We experiment with other equiprobable cell partitionings:
for k in [20, 30, 40, 50] do cells := stats::equiprobableCells(k, q); print(stats::csGOFT(data, cells, CDF = cdf)); end_for:
delete k, cells, p, cdf, r, data:
We create a sample of 1000 random integers between 0 and 100:
SEED := 10^2: r := random(0 .. 100): data := [r() $ i = 1..1000]:
We construct an "equiprobable" cell partitioning of 10 cells using the (discrete) empirical distribution of the data, i.e., each of the following cells should contain approximately the same number of data points from the random sample:
k := 10: quantile := stats::empiricalQuantile(data): cells := stats::equiprobableCells(k, quantile)
For discrete distributions, "equiprobability" can only be achieved approximately. We compute the cell probabilities with respect to the empirical cumulative distribution function (CDF) by subtracting the CDF value of the left boundary from the CDF value of the right boundary:
cdf := stats::empiricalCDF(data): map(cells, cell -> cdf(cell[2]) - cdf(cell[1]))
The actual empirical frequency of the data in each cell is the cell probability times the sample size (1000):
map(cells, cell -> 1000*(cdf(cell[2]) - cdf(cell[1])))
When computing the probability of the cell [b_{i - 1}, b_{i}] via cdf(b_{i}) - cdf(b_{i - 1}), the cell is regarded mathematically as the semi-open interval (b_{i - 1}, b_{i}]. For this reason, the data points 0 contained in the sample are not counted, and the cell frequencies do not quite add up to the sample size:
_plus(op(%))
For the χ² test, this does not matter because it replaces the left boundary of the first cell by -infinity anyway. The observed significance level is large, so the data pass the test for a uniform distribution even at high significance levels:
stats::csGOFT(data, cells, CDF = stats::uniformCDF(0, 100))
We test whether the data fit a normal distribution with the empirical mean and variance:
[m, v] := [stats::mean(data), stats::variance(data)]; stats::csGOFT(data, cells, CDF = stats::normalCDF(m, v))
The observed significance level is very small, so the hypothesis of a normal distribution clearly has to be rejected.
delete r, data, k, quantile, cells, cdf, m, v:
We consider a binomial distribution with "trial parameter" n = 100 and "probability parameter" p = 1/2. It is the distribution of the number of successes in n = 100 independent Bernoulli experiments, each with success probability p = 1/2. This random variable can attain the discrete values 0, 1, …, 100. We create a cell partitioning of 4 cells:
n := 100: p := 1/2: quantile := stats::binomialQuantile(n, p): cells := stats::equiprobableCells(4, quantile)
Because of discreteness, an exact equiprobable cell partitioning does not exist. We compute the expected cell frequencies in the same way as in the previous example:
cdf := stats::binomialCDF(n, p): map(cells, cell -> n*(cdf(cell[2]) - cdf(cell[1])))
We create a random sample and apply the χ² test:
r := stats::binomialRandom(n, p, Seed = 123): data := [r() $ i = 1..100]: stats::csGOFT(data, cells, CDF = cdf)
The observed significance level is not small, i.e., the data pass the test well.
The "trial parameter" n = 100 is large enough for the binomial distribution to be approximated by a normal distribution with mean n p and variance n p (1 - p). The data pass the test for a normal distribution, too:
cdf := stats::normalCDF(n*p, n*p*(1 - p)): stats::csGOFT(data, cells, CDF = cdf)
We repeat the test with another cell partitioning:
quantile := stats::normalQuantile(n*p, n*p*(1 - p)): cells := stats::equiprobableCells(4, quantile)
stats::csGOFT(data, cells, CDF = cdf)
delete n, p, quantile, cells, cdf, r, data:
We demonstrate user-defined quantile functions. We consider the following distribution of a random variable X supported on the interval [0, 1]:
Pr(X ≤ x) = 0 for x ≤ 0, Pr(X ≤ x) = x^2 for 0 < x ≤ 1, Pr(X ≤ x) = 1 for x > 1.
The quantile function q is given by q(x) = sqrt(x) for 0 ≤ x ≤ 1:
quantile := x -> sqrt(x):
We test the hypothesis that the following data are distributed as defined above.
cells := stats::equiprobableCells(6, quantile)
data := [sqrt(frandom()) $ i = 1..10^3]: cdf := proc(x) begin if x <= 0 then return(0) elif x <= 1 then return(x^2) else return(1) end_if end_proc: stats::csGOFT(data, cells, CDF = cdf)
The data pass the test well. In fact, for a uniform deviate Y on the interval [0, 1] (as produced by frandom), the cumulative distribution function of X = sqrt(Y) is indeed given by cdf.
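This can be checked in one line, assuming that Y is uniformly distributed on [0, 1] and that X = sqrt(Y):

```latex
\Pr(X \le x) = \Pr\bigl(\sqrt{Y} \le x\bigr) = \Pr(Y \le x^2) = x^2, \qquad 0 \le x \le 1.
```

Solving x^2 = p for x gives the quantile function sqrt(p), which agrees with the procedure quantile used above.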
delete quantile, cells, data, cdf:

k: The number of cells, a positive integer

q: A procedure representing a quantile function of a statistical distribution. Typically, a quantile procedure provided by the stats library.

List of k "cells" [[b_{0}, b_{1}], …, [b_{k - 1}, b_{k}]] with floating-point boundaries b_{i}. This cell partitioning is suitable as an input parameter for stats::csGOFT.