Testing goodness of fit: P-value

Yasamin H. T.
Yasamin H. T. on 12 Jun 2015
Answered: Aditya on 31 Jan 2024
I have a continuous data set and I'm trying to test its goodness of fit with the chi-square test.
I use [h,p,stats] = chi2gof(x,'CDF',pd,'NBins',nb) to test the null hypothesis.
The result shows h = 0 (the null hypothesis is not rejected), but the p-value comes back as NaN. I tried changing the number of bins, and the p-value is still NaN. Any ideas why?
Thanks!

Answers (1)

Aditya
Aditya on 31 Jan 2024
When chi2gof in MATLAB returns NaN (Not a Number) for the p-value together with h = 0, it usually means the test could not be carried out, not that the fit is good. According to the chi2gof help, p is NaN when there are not enough degrees of freedom. The degrees of freedom are (number of bins after pooling) - 1 - (number of estimated parameters), so this can happen for several reasons:
  1. Too Few Bins for the Number of Parameters: If NBins is small relative to the number of parameters estimated for pd, the degrees of freedom are zero or negative. A normal distribution with two fitted parameters, for example, needs at least 4 bins. Note that observed frequencies matching the expected frequencies closely is not a cause: a chi-square statistic near zero gives a p-value near 1, not NaN.
  2. Insufficient Data: If there are too few data points, or if the number of bins (NBins) is too large for the amount of data you have, some bins end up with low expected frequencies. chi2gof pools tail bins whose expected count is below 'EMin' (5 by default) with their neighbors, so the test may run on fewer bins than you requested. This may be why changing NBins alone did not help.
  3. Inappropriate Distribution: A poorly fitting pd by itself gives a small p-value rather than NaN. However, if pd puts most of its probability away from the data, or is not defined properly, many bins get expected frequencies near zero and are pooled, which can again leave too few degrees of freedom.
Here are some steps you can take to troubleshoot the NaN p-value:
  • Check the stats Output: Look at the stats structure returned by chi2gof. stats.df holds the degrees of freedom, stats.O and stats.E hold the observed and expected counts per bin after pooling, and stats.edges holds the bin edges. If stats.df is not positive, the p-value cannot be computed.
  • Adjust the Number of Bins: Choose NBins so that, after pooling, more bins remain than 1 + the number of estimated parameters, while each bin still has an expected count of at least 5. You can start with a small number of bins and increase it gradually, checking numel(stats.O) each time.
  • Verify the Distribution Fit: Ensure that the probability distribution (pd) you are using is appropriate for your data. You can plot a histogram of the data against the probability density function (PDF) of pd, or the empirical CDF against the cumulative distribution function (CDF) of pd, to visually inspect the fit. If the parameter count that chi2gof assumes is not the one you intend, you can set it explicitly with 'NParams'.
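The checks above can be sketched as follows. This is only an illustration under assumptions: the data x, the normal model, and the bin counts are made up, and 'NParams' is passed explicitly as 2 so the degrees-of-freedom arithmetic does not depend on what chi2gof infers from pd. It needs the Statistics and Machine Learning Toolbox.

```matlab
% Hypothetical data and model; replace x and pd with your own.
rng(0);                           % reproducible illustration
x  = normrnd(10, 2, 200, 1);      % made-up continuous sample
pd = fitdist(x, 'Normal');        % mu and sigma are estimated from x

% Deliberately few bins: 3 bins - 1 - 2 parameters leaves no degrees of freedom.
[h, p, stats] = chi2gof(x, 'CDF', pd, 'NBins', 3, 'NParams', 2);
fprintf('bins used = %d, df = %d, p = %g\n', numel(stats.O), stats.df, p);
disp([stats.O(:) stats.E(:)]);    % observed vs. expected counts per bin

% More bins: some tail bins may be pooled, but enough should remain.
[h2, p2, stats2] = chi2gof(x, 'CDF', pd, 'NBins', 10, 'NParams', 2);
fprintf('bins used = %d, df = %d, p = %g\n', numel(stats2.O), stats2.df, p2);
```

If the first call reports a NaN p-value and the second does not, the NaN came from the degrees of freedom rather than from the data or the distribution itself.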
