
Hi all,

I'm interested in comparing different bivariate histograms to an underlying 2D probability density function.

Additional background (feel free to skip this if you're short on time):

My aim is to find the optimal bin size and smoothing for a histogram so that it best represents the known density function. In my field this is a common problem without a clear solution: there are many ways to estimate the optimal bin size, but I can't find any that also take smoothing into account. Furthermore, the histogram I want to evaluate is actually the ratio of two histograms generated with the same parameters but from very different underlying distributions, and I haven't found any method for optimising parameters in that situation either. My ultimate aim is to generate histograms using a variety of approaches and smoothing levels to find the 'best' one, or at least the best one for different scenarios.
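The search described above (choosing a bin width and a smoothing level against a known PDF) could be sketched as a simple grid search. This is a minimal illustration under stated assumptions, not an established method: it assumes Gaussian smoothing applied with `conv2` and a hand-built kernel, scores each combination by the integrated squared error (ISE) on the density scale, and uses arbitrary candidate values (`binWidths` and `smoothSDs` are hypothetical names chosen for this example). It uses `histcounts2` with explicit edges instead of the `hist3` call in the question, and it does not cover the ratio-of-histograms case.

```matlab
% Sketch: score combinations of bin width and Gaussian smoothing by the
% ISE against the known PDF (assumed setup, single simulated sample).
rng(1);                                    % reproducible example
mu = [100 100];
sigma = [60 50; 50 80];
N = 100;                                   % number of points
pts = mvnrnd(mu, sigma, N);

binWidths = [2 5 10];                      % hypothetical candidate bin widths (data units)
smoothSDs = [0 0.5 1 2];                   % hypothetical Gaussian SDs, measured in bins
ise = nan(numel(binWidths), numel(smoothSDs));

for iw = 1:numel(binWidths)
    w = binWidths(iw);
    edges = 0:w:200;                       % same edges in x and y
    ctrs = edges(1:end-1) + w/2;           % bin centers
    % histcounts2 returns rows = x bins, columns = y bins; transpose so rows = y
    counts = histcounts2(pts(:,1), pts(:,2), edges, edges)';
    dens = counts / (N * w^2);             % counts -> density (per unit area)
    [X, Y] = meshgrid(ctrs, ctrs);
    f = reshape(mvnpdf([X(:) Y(:)], mu, sigma), size(X));  % true PDF at bin centers
    for js = 1:numel(smoothSDs)
        s = smoothSDs(js);
        if s > 0
            r = ceil(3*s);                 % kernel half-width of 3 SDs
            [u, v] = meshgrid(-r:r);
            K = exp(-(u.^2 + v.^2) / (2*s^2));
            K = K / sum(K(:));             % unit-sum kernel; conv2 zero-pads, so mass leaks at grid edges
            est = conv2(dens, K, 'same');
        else
            est = dens;                    % no smoothing
        end
        ise(iw, js) = sum((est(:) - f(:)).^2) * w^2;   % Riemann-sum approximation of the ISE
    end
end

[~, k] = min(ise(:));
[iBest, jBest] = ind2sub(size(ise), k);
fprintf('lowest ISE: bin width %g, smoothing SD %g bins\n', ...
    binWidths(iBest), smoothSDs(jBest));
```

A single sample of 100 points gives a noisy ISE, so any ranking from this loop should be averaged over many simulated samples (an estimate of the MISE) before drawing conclusions. Because `smoothSDs` is measured in bins, the same value corresponds to wider physical smoothing at larger bin widths; measuring it in data units instead is an equally valid design choice.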

My first approach was to generate the histogram and then correlate the result with the PDF sampled at the same points (i.e. the histogram bin centers). After reading more of the literature, I think I should use the mean squared error (MSE) instead, but I'm not sure whether this is a) appropriate or b) meaningful. Also, the Wikipedia page for MSE lists two equations and I'm not sure which one applies in this situation. I'm also worried that I should be calculating the mean integrated squared error (MISE) instead, but I don't know how to do that for a discrete histogram versus a continuous PDF, both of which are 2D. I have MATLAB R2018b and all the toolboxes.
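For reference, the standard definitions in this setting can be written as follows, assuming the histogram counts $n_{ij}$ from $N$ points are first converted to a density on bins of size $\Delta x \times \Delta y$ centered at $(x_i, y_j)$, and $f$ is the known PDF. The "predictor" form of the MSE on Wikipedia is an average over points (here, over bins); the "estimator" form is an expectation over repeated samples (variance plus squared bias), and the MISE is that expectation integrated over the plane.

```latex
\begin{align*}
\hat f_{ij} &= \frac{n_{ij}}{N\,\Delta x\,\Delta y}\\
\mathrm{MSE}_{\mathrm{grid}} &= \frac{1}{B}\sum_{i,j}\bigl(\hat f_{ij} - f(x_i, y_j)\bigr)^2\\
\mathrm{ISE} &= \iint \bigl(\hat f(x,y) - f(x,y)\bigr)^2\,dx\,dy
  \approx \sum_{i,j}\bigl(\hat f_{ij} - f(x_i, y_j)\bigr)^2\,\Delta x\,\Delta y\\
\mathrm{MISE} &= \mathbb{E}\bigl[\mathrm{ISE}\bigr]
  \approx \frac{1}{R}\sum_{r=1}^{R}\mathrm{ISE}_r
\end{align*}
```

Here $B$ is the number of bins and $R$ the number of independently simulated datasets. On a fixed grid, $\mathrm{MSE}_{\mathrm{grid}}$ and the Riemann-sum ISE differ only by the constant factor $B\,\Delta x\,\Delta y$ (the area covered by the grid), so they rank estimates identically. On the per-bin probability scale, squared differences scale with $(\Delta x\,\Delta y)^2$, so they are not comparable across bin sizes; the density scale avoids this. The Riemann sum evaluates $f$ only at bin centers, so it is an approximation that improves as the bins shrink.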

Here is the code I have so far:

% generate a distribution of points, make a histogram of them and get the actual PDF underlying them
mu = [100 100];
sigma = [60 50; 50 80];
num = 100;
pos1 = mvnrnd(mu,sigma,num); % the points

% in this example there is just one distribution, but in the real data there are multiple such distributions all summed together,
% which makes fitting a continuous function to the real data nearly impossible
bcx = 0:5:200; % bin centers in x
bcy = 0:5:200; % bin centers in y
[x,y] = meshgrid(bcx,bcy); % the grid over which to generate the histogram or evaluate the PDF (rows = y, columns = x)
bcents = [x(:) y(:)];
map1 = mvnpdf(bcents,mu,sigma); % the PDF
map1 = reshape(map1,size(x));
% the histogram: hist3 returns rows = x bins and columns = y bins, so transpose it to match map1 (rows = y)
map2 = hist3(pos1,'Ctrs',{bcx(:) bcy(:)})';

% plot all three
figure
subplot(1,3,1)
plot(pos1(:,1),pos1(:,2),'ko')
axis([0 200 0 200])
axis square xy
title('Points')

subplot(1,3,2)
imagesc(bcx,bcy,map1) % pass the bin centers so the axes are in data units
axis square xy
title('PDF')

subplot(1,3,3)
imagesc(bcx,bcy,map2)
axis square xy
title('Histogram')

% calculate MSE
map_pdf = map1 .* 25; % multiply by the bin area (5 x 5) so the sum is approximately unity (probability per bin, a Riemann-sum approximation)
map_hist = map2 ./ sum(map2(:)); % scale so the sum is unity (probability per bin)
mse = mean((map_pdf(:) - map_hist(:)).^2) % same as sum(...) / numel(map_pdf)
cor = corr(map_pdf(:),map_hist(:),'rows','pairwise')
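To estimate the MISE itself, one option is Monte Carlo: simulate many independent datasets from the known distribution, compute the ISE of each histogram on the density scale, and average. The sketch below assumes the same `mu`, `sigma`, `num` and 5-unit grid as the code above, no smoothing, and an arbitrary `R = 200` repetitions; it illustrates the definition rather than a validated procedure, and because it relies on knowing the true PDF it only applies to simulated scenarios like this one.

```matlab
% Sketch: Monte Carlo estimate of the MISE for one histogram configuration.
rng(2);                                  % reproducible example
mu = [100 100];
sigma = [60 50; 50 80];
num = 100;                               % points per simulated dataset
w = 5;                                   % bin width in x and y
bcx = 0:w:200;                           % bin centers in x
bcy = 0:w:200;                           % bin centers in y
[x, y] = meshgrid(bcx, bcy);             % rows = y, columns = x
f = reshape(mvnpdf([x(:) y(:)], mu, sigma), size(x));   % true density at bin centers

R = 200;                                 % number of simulated datasets (arbitrary)
ise = zeros(R, 1);
for r = 1:R
    pts = mvnrnd(mu, sigma, num);
    counts = hist3(pts, 'Ctrs', {bcx bcy})';   % transpose so rows = y, columns = x
    fhat = counts / (num * w^2);         % counts -> density (per unit area)
    ise(r) = sum((fhat(:) - f(:)).^2) * w^2;   % Riemann-sum ISE for this dataset
end

mise = mean(ise);                        % average ISE over datasets approximates the MISE
mcse = std(ise) / sqrt(R);               % Monte Carlo standard error of that average
fprintf('MISE ~ %.3g (MC s.e. %.2g)\n', mise, mcse);
```

Replacing `fhat` with a smoothed version, or changing `w`, and rerunning gives MISE values that can be compared across configurations; the Monte Carlo standard error indicates whether a difference between two configurations is larger than the simulation noise.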
