- Create an x vector and a vector of bin edges, so that the count in each cell comes out to the values in your vector "Sample". Use those as inputs to chi2gof().
- Compute the chi-squared test statistic yourself and compare it to a critical value, using the correct degrees of freedom.
using chi2gof to determine sample representativeness
Hello,
I am trying to use the chi2gof function to test whether the collected sample data is representative of the population data. Say we have 8 bins, and for each bin we have the population count and the sample count. Is this the correct way to do the test?
Population = [996, 749, 370, 53, 9, 3, 1, 0];
Sample = [647, 486, 100, 22, 0, 0, 0, 0];
[h,p,k]=chi2gof(Sample,'Expected',Population);
Answers (1)
William Rose
on 26 Oct 2022
Your code does not work as written because chi2gof expects a vector x containing the observed values of the variable, not the count of how many observations fall in each cell, which is what you have provided.
There are (at least) 2 solutions.
Furthermore: cells with an expected count of 0 cause a division by zero in the chi-squared statistic, so it blows up. Cells with fewer than 4-5 expected counts should be combined as needed, until every cell has at least 4-5 expected. Therefore combine cells 4-8 into a single cell:
Population = [996, 749, 370, 53, 9, 3, 1, 0];
Sample = [647, 486, 100, 22, 0, 0, 0, 0];
pop2 = [996, 749, 370, sum(Population(4:8))]
sample2 = [647, 486, 100, sum(Sample(4:8))]
Now let's try method 1 above:
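% Build x: bin index i repeated sample2(i) times, so bin i receives sample2(i) observations
x=[];
for i=1:length(sample2), x=[x,i*ones(1,sample2(i))]; end
% Edges at 0.5, 1.5, ..., so that each integer bin index falls in its own bin
edges=.5+(0:length(sample2));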
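As a cross-check on the construction above, here is a minimal sketch that builds the same x vector without a loop and confirms the bin counts before running the test. It assumes a MATLAB release that provides repelem and histcounts (both are base MATLAB functions); the variable name counts is introduced here only for illustration.

```matlab
% Pooled observed counts from the thread (cells 4-8 merged into one)
sample2 = [647, 486, 100, 22];

% Bin index i repeated sample2(i) times, same result as the loop
x = repelem(1:numel(sample2), sample2);

% Edges at 0.5, 1.5, ..., 4.5 so each integer index sits in its own bin
edges = 0.5 + (0:numel(sample2));

% Count the observations per bin and confirm they reproduce sample2
counts = histcounts(x, edges);
assert(isequal(counts, sample2));
```

If the assertion passes, x and edges reproduce the intended cell counts.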
Now do the chi2 test using chi2gof(). The third output, k, is a structure holding the test details (chi2stat, df, edges, O, E), so we inspect it to make sure the observed counts ("O") are what we want them to be.
[h,p,k]=chi2gof(x,'Expected',pop2,'Edges',edges)
The observed vector "O" has the values in the "sample2" vector. That means our x vector and the edges vector worked as desired.
h=1 means the null hypothesis (which is that the sample data matches the population) is rejected at the default 5% significance level.
The low p value means it is highly improbable to get the observed data from this population.
Method 2: Compute the chi2 test statistic ourselves, then compare it to the critical value with the correct degrees of freedom.
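For reference, the statistic in Method 2 can be written out as follows. This is a sketch using notation not in the original post: $O_i$ for the observed count in cell $i$ (sample2), $E_i$ for the expected count (pop2), $m$ for the number of cells and $\nu$ for the degrees of freedom. The arithmetic uses the pooled counts 66 and 22 for the last cell.

```latex
\chi^2 = \sum_{i=1}^{m} \frac{(O_i - E_i)^2}{E_i}, \qquad \nu = m - 1 = 3

\chi^2 = \frac{(647-996)^2}{996} + \frac{(486-749)^2}{749}
       + \frac{(100-370)^2}{370} + \frac{(22-66)^2}{66}
       \approx 122.29 + 92.35 + 197.03 + 29.33 \approx 441.0
```

The null hypothesis is rejected when $\chi^2$ exceeds the upper-tail critical value for $\nu = 3$, which is about 7.81 at the 5% level.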
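chi2stat=sum((sample2-pop2).^2./pop2)   % chi-squared statistic: sum of (O-E)^2/E
df=length(pop2)-1; pcrit=.05;           % degrees of freedom and significance level
chi2crit=chi2inv(1-pcrit,df);           % upper-tail critical value, hence 1-pcrit
h2=chi2stat>chi2crit; p2=1-chi2cdf(chi2stat,df);
fprintf('h=%d, p=%.3f\n',h2,p2);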
The chi squared statistic and h and p match the test statistic and h and p we found above with Method 1.
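One point worth checking: pop2 sums to 2181 while sample2 sums to 1255, so using the raw population counts as expected values also penalizes the difference in total size. The sketch below is a variant, not the method used above. It assumes the intended null hypothesis is that the sample follows the population proportions, and therefore rescales the expected counts to the sample size. The names expected and alpha are introduced here for illustration.

```matlab
% Pooled counts from the thread (cells 4-8 merged into one)
pop2    = [996, 749, 370, 66];
sample2 = [647, 486, 100, 22];

% Assumption: expected counts = population proportions times the sample size
expected = pop2 / sum(pop2) * sum(sample2);

% Chi-squared statistic against the rescaled expected counts
chi2stat = sum((sample2 - expected).^2 ./ expected);

df       = numel(sample2) - 1;          % no parameters estimated from the data
alpha    = 0.05;                        % significance level (assumed)
chi2crit = chi2inv(1 - alpha, df);      % upper-tail critical value
h        = chi2stat > chi2crit;         % true means reject the null hypothesis
p        = 1 - chi2cdf(chi2stat, df);   % upper-tail p value

fprintf('chi2=%.2f, df=%d, h=%d, p=%.3g\n', chi2stat, df, h, p);
```

With these counts the rescaled statistic is smaller than the unscaled one, but it is still far above the critical value, so the conclusion that the sample does not match the population is unchanged.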