
Histogram Binwidth Optimization

version 1.4.0.0 (2.29 KB) by Hideaki Shimazaki
Function `sshist' returns the optimal number of bins in a histogram used for density estimation.
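
The quantity that sshist minimizes can be shown on toy numbers. For one candidate binwidth D, the cost is (2k - v) / D^2, where k is the mean bin count and v is the variance of the bin counts computed with divisor N (the number of bins), as in the MATLAB loop below. The following is a minimal Python sketch of that single evaluation. The function name `bin_cost` and the example counts are illustrative only and are not part of the submission.

```python
def bin_cost(counts, width):
    """Cost (2k - v) / width**2 for one histogram.

    k is the mean bin count and v the variance with divisor len(counts),
    mirroring `v = sum((ki-k).^2)/N(i)` in the MATLAB code.
    """
    n = len(counts)
    k = sum(counts) / n                          # mean count per bin
    v = sum((c - k) ** 2 for c in counts) / n    # biased variance of counts
    return (2 * k - v) / width ** 2


# Three bins of width 1 holding 2, 4 and 6 samples:
# k = 4 and v = (4 + 0 + 4) / 3 = 8/3, so the cost is 8 - 8/3 = 16/3.
print(bin_cost([2, 4, 6], 1.0))
```

Lower is better: sshist evaluates this cost for every candidate number of bins and keeps the minimizer, so only the ranking of costs across candidates on the same data matters.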

Updated 22 Jan 2010

function [optN, C, N] = sshist(x,N)
% [optN, C, N] = sshist(x,N)
%
% Function `sshist' returns the optimal number of bins in a histogram
% used for density estimation.
% The optimization principle is to minimize the expected L2 loss between
% the histogram and the unknown underlying density function.
% The only assumption is that the samples are drawn from the density
% independently of each other.
%
% The optimal binwidth D* is obtained as a minimizer of the formula,
% (2K-V) / D^2,
% where K and V are the mean and variance of the sample counts across bins of width D.
% The optimal number of bins is then (max(x) - min(x)) / D*.
%
% For more information, visit
% http://2000.jukuin.keio.ac.jp/shimazaki/res/histogram.html
%
% Original paper:
% Hideaki Shimazaki and Shigeru Shinomoto
% A method for selecting the bin size of a time histogram
% Neural Computation 19(6), 1503-1527, 2007
% http://dx.doi.org/10.1162/neco.2007.19.6.1503
%
% Example usage:
% optN = sshist(x); hist(x,optN);
%
% Input arguments
% x: Sample data vector.
% N (optional):
% A vector that specifies the numbers of bins to be examined.
% The optimal number of bins is selected from the elements of N.
% Default value is N = 2:50, truncated so that the binwidth is never
% smaller than twice the sampling resolution of the data.
% * Do not search binwidths smaller than the sampling resolution of the data.
%
% Output arguments
% optN: Optimal number of bins.
% N: Bin numbers examined.
% C: Cost function of N.
%
% See also SSKERNEL
%
% Copyright (c) 2009 2010, Hideaki Shimazaki All rights reserved.
% http://2000.jukuin.keio.ac.jp/shimazaki

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Parameters Setting
x = reshape(x,1,numel(x));
x_min = min(x);
x_max = max(x);

if nargin < 2
    buf = abs(diff(sort(x)));
    dx = min(buf(buf ~= 0));   % Sampling resolution: smallest nonzero gap between samples
    N_MIN = 2;                 % Minimum number of bins (integer, N_MIN > 1)
    N_MAX = min(floor((x_max - x_min)/(2*dx)),50); % Maximum number of bins (integer)
    N = N_MIN:N_MAX;           % # of bins
end

SN = 30; % # of partitioning positions for shift average
D = (x_max - x_min) ./ N; % Bin Size Vector

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Computation of the Cost Function
Cs = zeros(length(N),SN);
for i = 1:length(N)

    shift = linspace(0,D(i),SN);
    for p = 1:SN
        edges = linspace(x_min+shift(p)-D(i)/2,...
            x_max+shift(p)-D(i)/2,N(i)+1);   % Bin edges

        ki = histc(x,edges);   % Count # of events in bins
        ki = ki(1:end-1);      % Drop the extra bin that only counts x == edges(end)

        k = mean(ki);                   % Mean of event count
        v = sum( (ki-k).^2 )/N(i);      % Variance of event count

        Cs(i,p) = ( 2*k - v ) / D(i)^2; % The Cost Function
    end

end
C = mean(Cs,2);                         % Average the cost over shifted partitions

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Optimal Bin Size Selection
[Cmin, idx] = min(C);
optN = N(idx);    % Optimal number of bins
%optD = D(idx);   % *Optimal binwidth
%edges = linspace(x_min,x_max,N(idx)+1); % Optimal segmentation (N(idx) bins need N(idx)+1 edges)
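
For readers working outside MATLAB, the procedure above can be sketched in plain Python. This is a hypothetical port, not code from the submission: the names `sshist`, `candidates` and `n_shifts` are illustrative, bin counts are taken by binary search (`bisect`) on the pre-sorted sample in place of `histc`, and edge positions may differ from MATLAB's `linspace` by floating-point rounding.

```python
import bisect
import math


def sshist(x, candidates=None, n_shifts=30):
    """Hypothetical port of the MATLAB sshist; returns (opt_n, costs, candidates).

    n_shifts must be at least 2 (number of shifted partitions averaged).
    """
    xs = sorted(x)
    x_min, x_max = xs[0], xs[-1]

    if candidates is None:
        # Sampling resolution: smallest nonzero gap between sorted samples.
        dx = min(b - a for a, b in zip(xs, xs[1:]) if b != a)
        n_max = min(math.floor((x_max - x_min) / (2 * dx)), 50)
        candidates = list(range(2, n_max + 1))
    else:
        candidates = list(candidates)

    costs = []
    for n in candidates:
        d = (x_max - x_min) / n  # binwidth for n bins
        total = 0.0
        for p in range(n_shifts):
            shift = d * p / (n_shifts - 1)  # shift grid of linspace(0, d, n_shifts)
            start = x_min + shift - d / 2
            edges = [start + j * d for j in range(n + 1)]
            # Count samples in [edges[j], edges[j+1]); as with histc(...)(1:end-1),
            # samples outside the edges are ignored.
            left = [bisect.bisect_left(xs, e) for e in edges]
            counts = [left[j + 1] - left[j] for j in range(n)]
            k = sum(counts) / n                        # mean count
            v = sum((c - k) ** 2 for c in counts) / n  # variance with divisor n
            total += (2 * k - v) / d ** 2
        costs.append(total / n_shifts)  # average over shifted partitions

    # First minimizer, matching MATLAB's min on ties.
    best = min(range(len(costs)), key=costs.__getitem__)
    return candidates[best], costs, candidates


# Usage: a deterministic toy sample with two clusters.
data = [i * 0.1 for i in range(50)] + [10 + i * 0.1 for i in range(50)]
opt_n, costs, tried = sshist(data)
print(opt_n, len(tried))
```

Sorting once and locating each edge with `bisect` keeps every histogram evaluation proportional to the number of bins times log(n) rather than the sample size, which is one possible design choice when many binwidths and shifts are tried.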

Cite As

Hideaki Shimazaki (2020). Histogram Binwidth Optimization (https://www.mathworks.com/matlabcentral/fileexchange/24913-histogram-binwidth-optimization), MATLAB Central File Exchange. Retrieved .

Comments and Ratings (3)

tsan toso

For a large dataset (e.g., 3200 samples) the function takes quite a while to run. Is there a way to speed it up?

KAE

Useful. Would be nice if it picked the best human-readable bins (i.e. not 1014.37:119.27:2057.63).

Il

Updates

1.4.0.0

minor change: manual updated

1.3.0.0

minor change

1.2.0.0

minor change

1.1.0.0

minor change

MATLAB Release Compatibility
Created with R2009a
Compatible with any release
Platform Compatibility
Windows macOS Linux