File Exchange

image thumbnail

Histogram Binwidth Optimization

version 1.4.0.0 (2.29 KB) by Hideaki Shimazaki
Function `sshist' returns optimal number of bins in a histogram used for density estimation.

8 Downloads

Updated 22 Jan 2010

View Version History

View License

function [optN, C, N] = sshist(x,N)
% [optN, C, N] = sshist(x,N)
%
% Function `sshist' returns the optimal number of bins in a histogram
% used for density estimation.
% Optimization principle is to minimize expected L2 loss function between
% the histogram and an unknown underlying density function.
% An assumption made is merely that samples are drawn from the density
% independently each other.
%
% The optimal binwidth D* is obtained as a minimizer of the formula,
% (2K-V) / D^2,
% where K and V are mean and variance of sample counts across bins with width D.
% Optimal number of bins is given as (max(x) - min(x)) / D*.
%
% For more information, visit
% http://2000.jukuin.keio.ac.jp/shimazaki/res/histogram.html
%
% Original paper:
% Hideaki Shimazaki and Shigeru Shinomoto
% A method for selecting the bin size of a time histogram
% Neural Computation 19(6), 1503-1527, 2007
% http://dx.doi.org/10.1162/neco.2007.19.6.1503
%
% Example usage:
% optN = sshist(x); hist(x,optN);
%
% Input argument
% x: Sample data vector.
% N (optinal):
% A vector that specifies the number of bins to be examined.
% The optimal number of bins is selected from the elements of N.
% Default value is N = 2:50.
% * Do not search binwidths smaller than a sampling resolution of data.
%
% Output argument
% optN: Optimal number of bins.
% N: Bin numbers examined.
% C: Cost function of N.
%
% See also SSKERNEL
%
% Copyright (c) 2009 2010, Hideaki Shimazaki All rights reserved.
% http://2000.jukuin.keio.ac.jp/shimazaki

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Parameters Setting
x = reshape(x,1,numel(x));
x_min = min(x);
x_max = max(x);

if nargin < 2
buf = abs(diff(sort(x)));
dx = min(buf(logical(buf ~= 0)));
N_MIN = 2; % Minimum number of bins (integer)
% N_MIN must be more than 1 (N_MIN > 1).
N_MAX = min(floor((x_max - x_min)/(2*dx)),50);
% Maximum number of bins (integer)
N = N_MIN:N_MAX; % # of Bins
end

SN = 30; % # of partitioning positions for shift average
D = (x_max - x_min) ./ N; % Bin Size Vector

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Computation of the Cost Function
Cs = zeros(length(N),SN);
for i = 1: length(N)

shift = linspace(0,D(i),SN);
for p = 1 : SN
edges = linspace(x_min+shift(p)-D(i)/2,...
x_max+shift(p)-D(i)/2,N(i)+1); % Bin edges

ki = histc(x,edges); % Count # of events in bins
ki = ki(1:end-1);

k = mean(ki); % Mean of event count
v = sum( (ki-k).^2 )/N(i); % Variance of event count

Cs(i,p) = ( 2*k - v ) / D(i)^2; % The Cost Function
end

end
C = mean(Cs,2);

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Optimal Bin Size Selectioin
[Cmin idx] = min(C);
optN = N(idx); % Optimal number of bins
%optD = D(idx); % *Optimal binwidth
%edges = linspace(x_min,x_max,N(idx)); % Optimal segmentation

Cite As

Hideaki Shimazaki (2021). Histogram Binwidth Optimization (https://www.mathworks.com/matlabcentral/fileexchange/24913-histogram-binwidth-optimization), MATLAB Central File Exchange. Retrieved .

Comments and Ratings (3)

tsan toso

For a large dataset (i.e. 3200) it takes quite a while to run the function. Is there a way to speed up the function?

KAE

Useful. Would be nice if it picked the best human-readable bins (i.e. not 1014.37:119.27:2057.63).

Il

MATLAB Release Compatibility
Created with R2009a
Compatible with any release
Platform Compatibility
Windows macOS Linux

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!