5.0

5.0 | 2 ratings Rate this file 22 Downloads (last 30 days) File Size: 2.29 KB File ID: #24913
image thumbnail

Histogram Binwidth Optimization

by Hideaki Shimazaki

 

03 Aug 2009 (Updated 22 Jan 2010)

Function `sshist' returns optimal number of bins in a histogram used for density estimation.

| Watch this File

File Information
Description

function [optN, C, N] = sshist(x,N)
% [optN, C, N] = sshist(x,N)
%
% Function `sshist' returns the optimal number of bins in a histogram
% used for density estimation.
% Optimization principle is to minimize expected L2 loss function between
% the histogram and an unknown underlying density function.
% An assumption made is merely that samples are drawn from the density
% independently each other.
%
% The optimal binwidth D* is obtained as a minimizer of the formula,
% (2K-V) / D^2,
% where K and V are mean and variance of sample counts across bins with width D.
% Optimal number of bins is given as (max(x) - min(x)) / D*.
%
% For more information, visit
% http://2000.jukuin.keio.ac.jp/shimazaki/res/histogram.html
%
% Original paper:
% Hideaki Shimazaki and Shigeru Shinomoto
% A method for selecting the bin size of a time histogram
% Neural Computation 19(6), 1503-1527, 2007
% http://dx.doi.org/10.1162/neco.2007.19.6.1503
%
% Example usage:
% optN = sshist(x); hist(x,optN);
%
% Input argument
% x: Sample data vector.
% N (optinal):
% A vector that specifies the number of bins to be examined.
% The optimal number of bins is selected from the elements of N.
% Default value is N = 2:50.
% * Do not search binwidths smaller than a sampling resolution of data.
%
% Output argument
% optN: Optimal number of bins.
% N: Bin numbers examined.
% C: Cost function of N.
%
% See also SSKERNEL
%
% Copyright (c) 2009 2010, Hideaki Shimazaki All rights reserved.
% http://2000.jukuin.keio.ac.jp/shimazaki

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Parameters Setting
x = reshape(x,1,numel(x));
x_min = min(x);
x_max = max(x);

if nargin < 2
    buf = abs(diff(sort(x)));
    dx = min(buf(logical(buf ~= 0)));
    N_MIN = 2; % Minimum number of bins (integer)
                    % N_MIN must be more than 1 (N_MIN > 1).
    N_MAX = min(floor((x_max - x_min)/(2*dx)),50);
                    % Maximum number of bins (integer)
    N = N_MIN:N_MAX; % # of Bins
end
    
SN = 30; % # of partitioning positions for shift average
D = (x_max - x_min) ./ N; % Bin Size Vector

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Computation of the Cost Function
Cs = zeros(length(N),SN);
for i = 1: length(N)

       shift = linspace(0,D(i),SN);
       for p = 1 : SN
               edges = linspace(x_min+shift(p)-D(i)/2,...
                    x_max+shift(p)-D(i)/2,N(i)+1); % Bin edges

               ki = histc(x,edges); % Count # of events in bins
               ki = ki(1:end-1);

               k = mean(ki); % Mean of event count
               v = sum( (ki-k).^2 )/N(i); % Variance of event count

               Cs(i,p) = ( 2*k - v ) / D(i)^2; % The Cost Function
       end

end
C = mean(Cs,2);

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Optimal Bin Size Selectioin
[Cmin idx] = min(C);
optN = N(idx); % Optimal number of bins
%optD = D(idx); % *Optimal binwidth
%edges = linspace(x_min,x_max,N(idx)); % Optimal segmentation

MATLAB release MATLAB 7.8 (R2009a)
Tags for This File  
Everyone's Tags
Tags I've Applied
Add New Tags Please login to tag files.
Comments and Ratings (2)
28 Aug 2009 Il  
04 Jan 2011 Kathleen

Useful. Would be nice if it picked the best human-readable bins (i.e. not 1014.37:119.27:2057.63).

Please login to add a comment or rating.
Updates
07 Aug 2009

minor change

07 Aug 2009

minor change

07 Aug 2009

minor change

22 Jan 2010

minor change: manual updated

Tag Activity for this File
Tag Applied By Date/Time
statistics Hideaki Shimazaki 03 Aug 2009 11:47:28
density estimation Hideaki Shimazaki 03 Aug 2009 11:47:28
optimization Hideaki Shimazaki 03 Aug 2009 11:47:28
histogram Hideaki Shimazaki 03 Aug 2009 11:47:28
mean integrated squared error Hideaki Shimazaki 03 Aug 2009 11:47:28
binwidth Hideaki Shimazaki 03 Aug 2009 11:47:28
bin Hideaki Shimazaki 03 Aug 2009 11:47:28
binwidth Andrey Kan 04 Nov 2011 02:40:06

Contact us at files@mathworks.com