Extra note: The mean values I am using are all currently in the edges field. The value of 13.5667 is not one of the means I was using.
Edges and Observed values of Chi2gof
Nathan
on 26 Dec 2025 at 17:16
Commented: William Rose
on 28 Dec 2025 at 6:40
I am trying to do a chi-squared test on the means of this data. I have already sorted the data, so that is not the issue here. However, my observed values are just four ones, even though I input the means I calculated. Does anyone know why this happens and how to solve it? Your help would be appreciated.
a=readmatrix("DIP Project- Population.xlsx");
population=a(:,1);
production=a(:,7);
gr1gen36=a(61:90,7);
gr1gen36(isnan(gr1gen36))=0;
gr1gen36;
m1=mean(gr1gen36);
gr2gen36=a(151:180,7);
gr2gen36(isnan(gr2gen36))=0;
gr2gen36;
m2=mean(gr2gen36);
gr3gen36=a(241:270,7);
gr3gen36(isnan(gr3gen36))=0;
m3=mean(gr3gen36);
gr4gen36=a(331:360,7);
gr4gen36(isnan(gr4gen36))=0;
gr4gen36;
m4=mean(gr4gen36);
gen36=[m1,m2,m3,m4];
expected=mean(gen36);
ex=[expected,expected,expected,expected];
[h,p,tbl]=chi2gof(gen36,'Expected',ex,'Alpha',0.05)
tbl =
chi2stat: 66.0168
df: 3
edges: [9.3667 13.5667 17.7667 21.9667 26.1667]
O: [1 1 1 1]
E: [18.4500 18.4500 18.4500 18.4500]
Accepted Answer
the cyclist
on 26 Dec 2025 at 21:04
Edited: the cyclist
on 26 Dec 2025 at 21:40
Here's what is happening. chi2gof() is expecting the raw, observed data. It is not expecting you to have precalculated the means. Therefore, what is it doing with your inputs?
It thinks that all of your observed data points are
x = [21.7000 26.1667 16.5667 9.3667]; % Total of four data points, to be binned
You then effectively tell it that you have four bins, because that is the length of the vector ex.
So, what does chi2gof do with this information? It puts one value of x into each of the four bins. This is why tbl.O = [1 1 1 1]. And then it (correctly) calculates the chi^2 stat, based on one observation in each bin, when it was told to expect 18.45 observations per bin.
You should have fed chi2gof() the raw data (the individual observations), not the precalculated means.
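The binning described above can be reproduced by hand. This is only a minimal sketch, not what chi2gof does internally: it assumes four equal-width bins spanning the minimum to the maximum of x (which is consistent with the edges shown in tbl) and uses the base MATLAB function histcounts in place of chi2gof. The variable names are illustrative.

```matlab
% The four means that chi2gof treated as four raw observations
x = [21.7000 26.1667 16.5667 9.3667];

% Assumption: one bin per element of 'Expected', equally spaced from min(x) to max(x)
nbins = 4;
edges = linspace(min(x), max(x), nbins + 1);

% Count how many "observations" fall in each bin
O = histcounts(x, edges);

% Expected count per bin, as supplied in the question (the mean of the means)
E = repmat(mean(x), 1, nbins);

% Pearson chi-squared statistic
chi2stat = sum((O - E).^2 ./ E);

disp(edges)
disp(O)
disp(chi2stat)
```

Under these assumptions the interior edges (such as 13.5667) are bin boundaries rather than any of the means, each bin receives exactly one value, and the statistic comes out close to the 66.0168 reported in tbl.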
12 Comments
the cyclist
on 27 Dec 2025 at 2:21
Well, a complication is that the test is for categorical or nominal variables, and yours is continuous. So in some ways it is just not the right test. You could bin the data to put them into a contingency table, and then it looks like you can use crosstab to report on the test. Maybe an AI can help you write the code. Depending on how critical the result is, you might want to solicit someone with greater expertise, if you can. I don't want you to rely on my quick thoughts on a Friday evening.
That being said, a bigger issue is that the appropriate research question and statistical tests should really be determined before the results are seen. I think ANOVA (or perhaps kruskalwallis), by itself, is likely the best test, full stop.
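For reference, a rough sketch of what the ANOVA and Kruskal-Wallis comparisons might look like. The data here are randomly generated placeholders, not the values from the spreadsheet, and the names Y, groupMeans, pAnova and pKW are hypothetical. It assumes the Statistics and Machine Learning Toolbox and four groups of 30 raw observations, one group per column.

```matlab
% Placeholder data only: replace Y with the raw observations,
% e.g. Y = [gr1gen36, gr2gen36, gr3gen36, gr4gen36];
rng(0);                                    % make the placeholder values reproducible
groupMeans = [21.7 26.2 16.6 9.4];         % illustrative group centers, not real results
Y = 3*randn(30, 4) + repmat(groupMeans, 30, 1);

% One-way ANOVA: each column of Y is treated as one group
pAnova = anova1(Y, [], 'off');             % 'off' suppresses the table and box plot

% Nonparametric alternative: Kruskal-Wallis test on the same layout
pKW = kruskalwallis(Y, [], 'off');

disp(pAnova)
disp(pKW)
```

Both functions take the raw observations rather than the group means, which is the same point made about chi2gof in the accepted answer.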
William Rose
on 28 Dec 2025 at 6:40
@Nathan, I think @the cyclist is providing you with excellent advice: do the ANOVA or Kruskal-Wallis, then stop. And 100% for "the appropriate research question and statistical tests should really be determined before the results are seen".
