Histogram Fit: Scaling and offset
I have one-dimensional data (~12,500 entries) with values ranging from ~135 to ~1150 that form three peaks (see attachment).
Now I want to create a histogram showing the data distribution, together with a fitted curve, and run a chi-squared goodness-of-fit test.
So far I have the following:
load Data.mat
bins = round(sqrt(length(Data))); % Number of bins
[f, x] = hist(Data,bins); % Calculate histogram
pd = fitdist(x','Kernel'); % Calculate fit
y = pdf(pd,f); % Calculate pdf
figure(1)
dx = diff(x(1:2));
bar(x, f/sum(f*dx)); % Normalizing and plotting
hold on
plot(x,y,'Linewidth',2) % Plot fit
hold off
[h,p] = chi2gof(x,'CDF',pd,'Alpha',0.05); % Chi squared test
While my chi2gof test yields the expected result (h = 0, p = 0.9983), my plot doesn't look too good:

The scale of the fitted curve seems to be way off for all three peaks. Additionally, I'd expect the curve to get much closer to 0 at very low and very high values.
Thanks in advance for any suggestions on how to improve/fix my code!
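A minimal corrected sketch, assuming the problems come from three lines in the script above: `fitdist` is given the bin centers `x` instead of the raw `Data`, `pdf` is evaluated at the bin counts `f` instead of at x-values, and `chi2gof` also tests `x` instead of `Data`. Because the bin centers are evenly spaced, a kernel fitted to them likely describes an almost uniform spread rather than three peaks, and evaluating it at the counts gives a curve that is not a density over x at all. The high p-value most likely reflects that the test compares the bin centers with a density fitted to those same bin centers. The sketch also swaps the manual `bar` normalization for `histogram(..., 'Normalization', 'pdf')`. Since Data.mat is not available here, a hypothetical three-peak sample stands in for it.

```matlab
% Hypothetical stand-in for Data.mat: three normal peaks, 12500 values.
% Replace these two lines with "load Data.mat" to use the real data.
rng(0);
Data = [300 + 40*randn(4000,1); 600 + 60*randn(5000,1); 950 + 50*randn(3500,1)];

nbins = round(sqrt(numel(Data)));            % square-root rule, as in the question

pd = fitdist(Data, 'Kernel');                % fit the kernel to the raw data, not the bin centers

xgrid = linspace(min(Data), max(Data), 500); % fine grid of x-values for a smooth curve
y = pdf(pd, xgrid);                          % evaluate the density at x-values, not at counts

figure
histogram(Data, nbins, 'Normalization', 'pdf'); % bar areas sum to 1, same units as y
hold on
plot(xgrid, y, 'LineWidth', 2)               % fitted density on top of the histogram
hold off

% Test the raw data against the fitted CDF (pd is a probability distribution object).
[h, p] = chi2gof(Data, 'CDF', pd, 'Alpha', 0.05);
```

With both the bars and the curve expressed as densities, their heights should be directly comparable. Note that testing `Data` against a kernel estimated from the same `Data` is still likely to be optimistic, since the test does not account for how flexible the fitted density is.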
2 Comments
Jeff Miller
on 26 Feb 2019
Regarding the scaling problem, what is sum(y*dx)?
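To make this check concrete: for a pdf evaluated on an evenly spaced grid, the Riemann sum `sum(y*dx)` should come out close to 1. In the question, `y = pdf(pd,f)` is evaluated at the counts `f` rather than at the grid `x`, so there is no reason to expect the check to hold there. The sketch below again uses a hypothetical three-peak sample in place of Data.mat, fits the kernel to the raw data, and evaluates it on a grid from -500 to 2000 with spacing 1.

```matlab
% Hypothetical stand-in for Data.mat (three normal peaks); use "load Data.mat" instead.
rng(0);
Data = [300 + 40*randn(4000,1); 600 + 60*randn(5000,1); 950 + 50*randn(3500,1)];

pd = fitdist(Data, 'Kernel');       % kernel fitted to the raw data

% Evenly spaced grid that also covers the tails.
x  = linspace(-500, 2000, 2501);    % spacing dx = 1
dx = x(2) - x(1);
y  = pdf(pd, x);                    % density evaluated at x-values

area = sum(y*dx);                   % Riemann sum; close to 1 for a valid pdf
fprintf('sum(y*dx) = %.4f\n', area);
```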
Regarding the tails of the estimated pdf staying above 0: do they drop off when you compute the density over a wider range, e.g. -500 to 2000? If so, the problem may be that the kernel bandwidth is not optimal. By default, the bandwidth is chosen to be optimal for estimating a normal distribution. Your data look much more like a mixture of three different distributions, so MATLAB's default bandwidth may be far from optimal.
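A sketch of both suggestions, again on a hypothetical three-peak stand-in for Data.mat: plot the density over the wider range -500 to 2000, and compare the default kernel bandwidth with a narrower, hand-picked one (the value 15 is an arbitrary example, not a recommendation). As a swapped-in alternative that the comment itself does not propose, it also fits a three-component Gaussian mixture with `fitgmdist`. `BandWidth` is the property of the fitted kernel object holding the bandwidth MATLAB chose; `fitdist`, `fitgmdist` and `chi2gof` all require the Statistics and Machine Learning Toolbox.

```matlab
% Hypothetical stand-in for Data.mat (three normal peaks); use "load Data.mat" instead.
rng(0);
Data = [300 + 40*randn(4000,1); 600 + 60*randn(5000,1); 950 + 50*randn(3500,1)];

xgrid = linspace(-500, 2000, 2501)';    % wide range; column vector for gmdistribution/pdf

% 1) Kernel fit with the default bandwidth (normal-reference rule).
pdDefault = fitdist(Data, 'Kernel');
fprintf('Default bandwidth: %.1f\n', pdDefault.BandWidth);

% 2) Kernel fit with a narrower, hypothetical bandwidth of 15.
pdNarrow = fitdist(Data, 'Kernel', 'Bandwidth', 15);

% 3) Alternative: three-component Gaussian mixture, several EM restarts.
gm = fitgmdist(Data, 3, 'Replicates', 5, 'Options', statset('MaxIter', 500));
yMix = pdf(gm, xgrid);

figure
histogram(Data, round(sqrt(numel(Data))), 'Normalization', 'pdf');
hold on
plot(xgrid, pdf(pdDefault, xgrid), 'LineWidth', 2)
plot(xgrid, pdf(pdNarrow, xgrid), 'LineWidth', 2)
plot(xgrid, yMix, '--', 'LineWidth', 2)
hold off
xlim([-500 2000])
legend('Data', 'Kernel, default bandwidth', 'Kernel, bandwidth = 15', '3-component GMM')
```

A narrower bandwidth follows the peaks more closely but becomes noisier. The mixture model has only eight free parameters (three means, three variances, two independent weights), which makes it easier to state the degrees of freedom for a later `chi2gof` call through its `'NParams'` option.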
Marcel Dorer
on 27 Feb 2019
Edited: Marcel Dorer
on 27 Feb 2019
Accepted Answer