# In the chi-square test, how do I determine the number of estimated parameters, and hence the correct number of degrees of freedom, without using the chi2gof function?

Sim on 21 Jun 2023
Commented: Sim on 23 Jun 2023
I have noticed that the number of degrees of freedom is computed slightly differently in one MATLAB answer and in the chi2gof function.
Option 1: "df = nbins - 1"
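Population = [996, 749, 370, 53, 9, 3, 1, 0];
Sample = [647, 486, 100, 22, 0, 0, 0, 0];
% Pool the sparse tail bins (4 to 8) into a single bin
Population2 = [996, 749, 370, sum(Population(4:8))];
Sample2 = [647, 486, 100, sum(Sample(4:8))];
% Scale the population counts to the sample size to get expected counts
Expected2 = Population2 * sum(Sample2) / sum(Population2);
chi2stat = sum((Sample2-Expected2).^2./Expected2);
df = length(Population2)-1; % no parameters estimated from the sample
pcrit = .05; % significance level
chi2crit = chi2inv(1-pcrit,df); % upper-tail critical value
h2 = chi2stat > chi2crit;
p2 = 1 - chi2cdf(chi2stat,df);
fprintf('h=%d, p=%.3f df=%d\n',h2,p2,df);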
h=1, p=0.000 df=3
Option 2: "df = nbins - 1 - nparams"
"chi2gof compares the value of the test statistic to a chi-square distribution with degrees of freedom equal to nbins - 1 - nparams, where nbins is the number of bins used for the data pooling and nparams is the number of estimated parameters used to determine the expected counts."
bins = 0:5;
obsCounts = [6 16 10 12 4 2];
n = sum(obsCounts);
pd = fitdist(bins','Poisson','Frequency',obsCounts');
expCounts = n * pdf(pd,bins);
[h,p,st] = chi2gof(bins,'Ctrs',bins,...
'Frequency',obsCounts, ...
'Expected',expCounts,...
'NParams',1)
h = 0
p = 0.4654
st = struct with fields:
    chi2stat: 2.5550
          df: 3
       edges: [-0.5000 0.5000 1.5000 2.5000 3.5000 5.5000]
           O: [6 16 10 12 6]
           E: [7.0429 13.8041 13.5280 8.8383 6.0284]
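For comparison, here is a minimal sketch of the same test done by hand, without chi2gof. It assumes the Statistics and Machine Learning Toolbox for poisspdf and chi2cdf, and it hard-codes the pooling of the last two bins to mirror the O and E vectors in the output above rather than deriving it. The names lambdaHat and nParams are illustrative.

```matlab
% Manual version of the chi2gof call above (a sketch, not chi2gof's internals).
bins      = 0:5;
obsCounts = [6 16 10 12 4 2];
n         = sum(obsCounts);

% One parameter (the Poisson mean) is estimated from the data itself.
lambdaHat = sum(bins .* obsCounts) / n;   % sample mean, the Poisson MLE
nParams   = 1;

expCounts = n * poisspdf(bins, lambdaHat);

% Pool the last two bins by hand, as in the O and E vectors reported above.
O = [obsCounts(1:4), sum(obsCounts(5:6))];
E = [expCounts(1:4), sum(expCounts(5:6))];

chi2stat = sum((O - E).^2 ./ E);
df       = numel(O) - 1 - nParams;        % 5 - 1 - 1 = 3
p        = 1 - chi2cdf(chi2stat, df);

fprintf('chi2stat=%.4f, df=%d, p=%.4f\n', chi2stat, df, p);
```

If the pooling matches, this should give the same chi2stat, df and p as the chi2gof output above.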

dpb on 21 Jun 2023
Although you specified 'Ctrs',bins, chi2gof created only 5 bins because the expected counts for the last two bins (about 4.3 and 1.7) are each below the default minimum of 5 (the 'EMin' option), so it pooled them. Hence the DOF for the chi-square test statistic is 5-1-1 --> 3 instead of the 6-1-1 --> 4 that you may have been expecting.
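To make the pooling step concrete, here is a rough sketch that merges right-tail bins until the last expected count reaches a threshold. It is a simplification for illustration, not chi2gof's actual algorithm. The variable eMin is a hypothetical name that mirrors the default of 5, and the Poisson probabilities are written out in base MATLAB.

```matlab
% Simplified right-tail pooling (illustration only, not chi2gof's implementation).
k      = 0:5;
O      = [6 16 10 12 4 2];                 % observed counts from the question
n      = sum(O);
lambda = sum(k .* O) / n;                  % Poisson mean estimated from the data
E      = n * exp(-lambda) .* lambda.^k ./ factorial(k);   % expected counts
eMin   = 5;                                % hypothetical threshold, mirrors the 'EMin' default

% Merge the last bin into its neighbour while its expected count is too small.
while numel(E) > 1 && E(end) < eMin
    E(end-1) = E(end-1) + E(end);  E(end) = [];
    O(end-1) = O(end-1) + O(end);  O(end) = [];
end

nbins   = numel(O);                        % bins left after pooling
nParams = 1;                               % lambda was estimated from the data
df      = nbins - 1 - nParams;
fprintf('nbins=%d, df=%d\n', nbins, df);
```

With these counts only one merge is needed, which leaves 5 bins and df = 3.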
dpb on 22 Jun 2023
Edited: dpb on 23 Jun 2023
Not sure what the remaining puzzle is, so I don't know what to add that I haven't already said.
The correction to the number of DOF, beyond the count based solely on the number of (collapsed) bins, is simply the number of distribution parameters that were estimated from the data itself in order to calculate the expected counts per bin. It applies only IF ("the big if") the theoretical distribution's parameter values are based on the input data.
If you test against counts from a theoretical distribution that is obtained from other considerations, then you've not estimated any further parameters from the count data itself and nParams=0.
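As a sketch of that second case, suppose the Poisson mean had been fixed in advance instead of fitted. The value lambda0 = 2 below is purely hypothetical, chosen for illustration. The counts are the ones from the question, with the last two bins pooled by hand as before, and chi2cdf is assumed to be available from the Statistics and Machine Learning Toolbox.

```matlab
% Same counts, but tested against a Poisson mean fixed beforehand.
bins      = 0:5;
obsCounts = [6 16 10 12 4 2];
n         = sum(obsCounts);

lambda0 = 2;       % hypothetical value, NOT estimated from obsCounts
nParams = 0;       % nothing was estimated from the count data

expCounts = n * exp(-lambda0) .* lambda0.^bins ./ factorial(bins);

% Pool the last two bins by hand so no expected count is below 5.
O = [obsCounts(1:4), sum(obsCounts(5:6))];
E = [expCounts(1:4), sum(expCounts(5:6))];

chi2stat = sum((O - E).^2 ./ E);
df       = numel(O) - 1 - nParams;   % 5 - 1 - 0 = 4
p        = 1 - chi2cdf(chi2stat, df);

fprintf('chi2stat=%.4f, df=%d, p=%.4f\n', chi2stat, df, p);
```

The only change from the fitted case is nParams, so the same 5 pooled bins now give df = 4 rather than 3.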
Sim on 23 Jun 2023
Thanks @dpb! :-) :-)