A distribution binning problem

Hi,
I have a problem in which I need to count the number of (probabilistic) occurrences falling in each non-uniform interval of a distribution.
While the final goal itself can apparently be achieved with histcounts, my problem is upstream: the population to be binned is not given as a sample, but only through some known parameters of the (entire) population. Here is an example, using packages and their weights to illustrate the problem, which is more general. N.B. I do have the Statistics and Machine Learning Toolbox, but I'm not an expert statistician myself.
I have a set of N = 100 packages, and their total weight, W = 1000 kg. Let's say that we know how the weight of packages is distributed (about the mean), and that the variance is also a known, exogenous, parameter. The minimum and maximum weight of the packages in the lot is also known. To recap:
Number of packages N = 100;
Total weight W = 1000 kg;
minimum weight of package wmin = 2 kg;
maximum weight of package wmax = 20 kg;
mean weight = mu = W/N = 1000 kg/100 = 10 kg
variance = sigmacap = 4 (exogenously determined)
distribution of weights about the mean = N(mu,sigmacap) in case of normal distribution
With the above input, how should I proceed to obtain a (probabilistic) count of how many packages will fall in unequally spaced weight intervals such as 2-5, 5-10, 10-12, 12-16 and 16-20 kilograms?
Thank you very much for any help or lead you can offer.
Daniele
  5 Comments
Daniele Rocchetta on 7 Nov 2019
Now, this is a lot more cut down to a size I think I can handle, rather than going through half the probability theory just to have one isolated problem solved. I'll take it up as a challenge, and I'll put my head into it starting tonight, after work. Whatever knowledge of stats I had, it has been left to rust for far too many years.
Thanks for your time and good leads; will revert once through.
Kind regards
Daniele
Daniele Rocchetta on 7 Nov 2019
John,
thanks to your clues I managed to put down the code I needed to answer my question. It was, after all, a good idea to ask. I'm putting it in a separate answer below for anyone interested.
If you have any further observation, it is of course welcome.
Thanks again.
All the best
Daniele


Accepted Answer

Daniele Rocchetta on 7 Nov 2019
Inspired and encouraged by John D'Errico's advice above, I post below the code that solves the distribution problem I submitted.
N = 100; % <- number of packages
W = 1000; % <- total weight in kg
mu = W/N; % <- average weight
sigmacap = 4; % <- variance (exogenously determined)
wmin = 2; % <- minimum weight of package in kg
wmax = 20; % <- maximum weight of package in kg
interValues = [wmin 5 10 12 16 wmax]; % edges of the weight bins
pd = makedist('Normal',mu,sqrt(sigmacap)); % <- create a normal distribution with the given parameters;
pdt = truncate(pd,wmin,wmax); % <- truncate the distribution to exclude packages < 2 kg or > 20 kg;
% each element of packCount is the expected number of packages in the weight bin [interValues(i), interValues(i+1)]
packCount = NaN(1,numel(interValues)-1);
for i = 1:numel(packCount)
    % probability mass of bin i under the truncated distribution, scaled by N and rounded
    packCount(i) = round(diff([cdf(pdt,interValues(i)), cdf(pdt,interValues(i+1))])*N);
end
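For anyone without the toolbox, here is a minimal sketch of the same calculation written without a loop and using only erfc from base MATLAB. It assumes the same normal model truncated to [wmin, wmax]: the probability of a bin is the difference of the normal CDF at its two edges, divided by the CDF difference between wmax and wmin. The names Phi, edges, binProb and expectedCount are mine, chosen for illustration, and are not part of the code above.

```matlab
% Sketch under the same assumptions as above: normal weights, truncated to [wmin, wmax].
N     = 100;                 % number of packages
mu    = 1000 / N;            % mean weight in kg (W/N)
sigma = sqrt(4);             % standard deviation (variance = 4)
edges = [2 5 10 12 16 20];   % bin edges, from wmin to wmax

% Normal CDF expressed through erfc, so no toolbox call is needed
Phi = @(x) 0.5 * erfc(-(x - mu) ./ (sigma * sqrt(2)));

cdfAtEdges = Phi(edges);

% Truncation: renormalise by the probability mass lying between wmin and wmax
binProb = diff(cdfAtEdges) ./ (cdfAtEdges(end) - cdfAtEdges(1));

expectedCount = N * binProb;          % unrounded expected number of packages per bin
packCount     = round(expectedCount); % rounded, as in the loop version
```

Keeping expectedCount unrounded is probably the safer quantity to carry forward: the bin probabilities sum to 1 by construction, but after rounding the counts are not guaranteed to add up to N for every choice of edges.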
