How do I fit a histogram properly?
I have a vector of data, and I need to build a histogram and fit a normal distribution to it (the data are supposed to be normal). The fit looks good, but the chi-square test keeps failing.
I tried the following, after loading the data from DATA into the variable e:
%first fit
fit=e;
media=mean(fit)
sig=std(fit)
w=sig/3;
nbin=round((max(fit)-min(fit))/(w))
% rebin (if the fit is bad, try removing the data outside 3 sigma)
% clear fit
% fit=e(e>=(media-3*sig) & e<=(media+3*sig));
% media=mean(fit)
% sig=std(fit)
% w=sig/3;
% nbin=round((max(fit)-min(fit))/(w))
figure('Name','both eyes')
histfit(fit, nbin); % plot the histogram and overlay the fitted Gaussian
fitBoth=fitdist(fit,'Normal'); % fit the distribution to get the parameters
% not sure whether fitdist uses the nbin provided, or how to pass the value
mu=fitBoth.mu; %get the fit parameters
sigma=fitBoth.sigma;
str= ['\mu=' num2str(mu) newline '\sigma=' num2str(sigma)];
annotation('textbox', [0.785773044110552 0.757296497913367 0.108809663250367 0.141321044546851],'String',str,'FitBoxToText','on', 'FontSize', 18,'EdgeColor','red');
[h,p,st]=chi2gof(fit, 'NBins',nbin, 'CDF',fitBoth) % should use the expected values from fitdist, right?

The resulting mu and sigma are compatible with an older study in which the data were normal. However, the chi-square test keeps rejecting the hypothesis.
The code shown is my latest attempt. I also tried doing it "manually", getting the counts in each bin with histcounts, but I got stuck trying to get the "expected" values from the fit.
Lastly, the mu and sigma from the fit are exactly the same as the values I got from the mean and std functions, which seems suspicious, and once again I don't understand how such a "good" fit could make the test fail.
Thank you in advance
Answers (1)
T1 = readtable('https://www.mathworks.com/matlabcentral/answers/uploaded_files/924499/DATA.txt', 'VariableNamingRule','preserve')
Var2_NotEmpty = nnz(~ismember(T1{:,2},{' '}))
[h,p,stats] = chi2gof(T1{:,1})
This appears to me to confirm that the data are normally distributed.
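For the "manual" route mentioned in the question, the expected count in each bin can be taken as the sample size times the fitted probability mass between the bin edges. The following is only a minimal sketch under stated assumptions: it uses synthetic data as a stand-in for DATA.txt, and the names x, nbins, cdfEdges, observed, expected and chi2stat are illustrative, not taken from the original code.

```matlab
% Synthetic stand-in for the posted data (assumption: NOT the real DATA.txt)
rng(0);
x = 5 + 2*randn(1000,1);

% fitdist works on the raw data; no binning is involved in the fit itself
pd = fitdist(x,'Normal');

nbins = 10;                                   % illustrative choice
edges = linspace(min(x), max(x), nbins+1);
observed = histcounts(x, edges);              % observed counts per bin

% Expected counts: N times the fitted probability in each bin.
% The outer edges are pushed to -Inf/+Inf so the expected counts sum to N.
cdfEdges = edges;
cdfEdges(1) = -Inf;
cdfEdges(end) = Inf;
expected = numel(x) * diff(cdf(pd, cdfEdges));

% Pearson chi-square statistic and p-value
chi2stat = sum((observed - expected).^2 ./ expected);
df = nbins - 1 - 2;                           % 2 parameters (mu, sigma) estimated from the data
p = 1 - chi2cdf(chi2stat, df);
fprintf('chi2 = %.3f, df = %d, p = %.3f\n', chi2stat, df, p);
```

Unlike this sketch, chi2gof by default pools bins whose expected count is below 5, so its statistic and degrees of freedom can differ from the manual values when many narrow bins are used. The same edges and expected counts can also be handed to chi2gof through its 'Edges', 'Expected' and 'NParams' options.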
3 Comments
Andrea Carobbi
on 12 Mar 2022
I do not understand rejecting the hypothesis that the data are normally distributed. Every other analysis I can think of indicates that assuming the data are normally-distributed is appropriate.
T1 = readtable('https://www.mathworks.com/matlabcentral/answers/uploaded_files/924499/DATA.txt', 'VariableNamingRule','preserve')
Var2_NotEmpty = nnz(~ismember(T1{:,2},{' '}))
figure
histfit(T1{:,1})
[h,p,stats] = chi2gof(T1{:,1})
pd = fitdist(T1{:,1},'Normal')
figure
probplot(T1{:,1});
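On the point in the question that fitdist returns exactly the values given by mean and std: to my knowledge this is expected for 'Normal' with uncensored data, because fitdist estimates mu as the sample mean and sigma as the square root of the unbiased variance estimate, which is what std computes by default. A quick check on synthetic data (an assumption, not the posted DATA.txt; the variable names are illustrative):

```matlab
% Synthetic data, used only to illustrate the comparison
rng(1);
x = 10 + 3*randn(500,1);

pd = fitdist(x,'Normal');

% Differences between the fitted parameters and the sample statistics
dMu = abs(pd.mu - mean(x));
dSigma = abs(pd.sigma - std(x));
fprintf('|mu - mean| = %g, |sigma - std| = %g\n', dMu, dSigma);
```

Both differences should be at the level of floating-point round-off, so identical values say nothing about whether the data are actually normal.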
Andrea Carobbi
on 14 Mar 2022
