Kernel Density Estimator

Version 1.5.0.0 (5.5 KB) by Zdravko Botev

Reliable and extremely fast kernel density estimator for one-dimensional data

29.9K Downloads

Updated 30 Dec 2015

This is a reliable and extremely fast kernel density estimator for one-dimensional data.
A Gaussian kernel is assumed and the bandwidth is chosen automatically.
Unlike many other implementations, this one is immune to problems
caused by multimodal densities with widely separated modes (see the example). The
estimate does not deteriorate for multimodal densities because the method never assumes
a parametric model for the data, as rule-of-thumb bandwidth selectors do.
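
To illustrate why a parametric assumption hurts on multimodal data, the sketch below applies the well-known normal-reference rule of thumb, h = 1.06*sigma*n^(-1/5), to a sample with two widely separated modes. The rule of thumb, the sample, and the variable names h_rot and h_mode are assumptions chosen for illustration; none of this is part of kde.

```matlab
% Illustration only: the normal-reference rule of thumb is not part of kde.
rng(0);                                   % fix the seed for reproducibility
data = [randn(500,1); randn(500,1)+50];   % two unit-variance modes, 50 apart
n = numel(data);

% Rule of thumb: assumes the data come from a single normal distribution.
h_rot = 1.06*std(data)*n^(-1/5);

% Same rule applied to one mode alone (unit standard deviation, n/2 points).
h_mode = 1.06*1*(n/2)^(-1/5);

fprintf('rule of thumb on all data: %.2f\n', h_rot);
fprintf('rule of thumb on one mode: %.2f\n', h_mode);
```

The pooled standard deviation is dominated by the distance between the modes, so the rule-of-thumb bandwidth comes out far wider than either mode and would smear both of them.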
INPUTS:
data - a vector of data from which the density estimate is constructed;
n - the number of mesh points used in the uniform discretization of the
interval [MIN, MAX]; n should be a power of two; if it is not, then
n is rounded up to the next power of two, i.e., n is set to n=2^ceil(log2(n));
the default value of n is n=2^12;
MIN, MAX - define the interval [MIN,MAX] on which the density estimate is constructed;
the default values of MIN and MAX are:
MIN=min(data)-Range/10 and MAX=max(data)+Range/10, where Range=max(data)-min(data).
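
The defaults described above can be reproduced by hand. This is a minimal sketch that only restates the formulas from the input description; the sample data and the requested value n = 1000 are assumptions for illustration.

```matlab
data = [randn(100,1); randn(100,1)*2+35];   % any data vector

n = 1000;                      % requested number of mesh points
n = 2^ceil(log2(n));           % rounded up to the next power of two

Range = max(data) - min(data);
MIN = min(data) - Range/10;    % default lower end of the interval
MAX = max(data) + Range/10;    % default upper end of the interval
```

With these defaults the interval [MIN, MAX] is 20% wider than the range of the data, which leaves room for the tails of the estimate on both sides.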
OUTPUTS:
bandwidth - the optimal bandwidth (Gaussian kernel assumed);
density - column vector of length 'n' with the values of the density
estimate at the grid points;
xmesh - the grid over which the density estimate is computed;
cdf - column vector of length 'n' with the values of the cdf estimate.
If no output is requested, then the code automatically plots a graph of
the density estimate.
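
A possible way to capture and inspect the outputs is sketched below. It assumes that kde.m from this submission is on the MATLAB path and that the outputs are returned in the order listed above, i.e. [bandwidth,density,xmesh,cdf]; the sample data and the variable name area are illustrative.

```matlab
% Assumes kde.m from this submission is on the MATLAB path.
data = [randn(100,1); randn(100,1)*2+35];

if exist('kde','file') == 2
    [bandwidth,density,xmesh,cdf] = kde(data, 2^12);

    % A density estimate should integrate to approximately one.
    area = trapz(xmesh(:), density(:));
    fprintf('bandwidth = %g, area under density = %g\n', bandwidth, area);

    subplot(2,1,1); plot(xmesh, density); title('density estimate');
    subplot(2,1,2); plot(xmesh, cdf);     title('cdf estimate');
else
    warning('kde.m not found on the path; skipping the example.');
end
```

Checking the area under the density with trapz is a quick sanity test that the chosen interval [MIN, MAX] covers essentially all of the probability mass.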

Reference:
Kernel density estimation via diffusion
Z. I. Botev, J. F. Grotowski, and D. P. Kroese (2010)
Annals of Statistics, Volume 38, Number 5, pages 2916-2957
doi:10.1214/10-AOS799
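Example (run in the command window):
% three widely separated modes centered at 0, 35 and 55
data=[randn(100,1);randn(100,1)*2+35 ;randn(100,1)+55];
% no output is requested, so the density estimate is plotted
kde(data,2^14,min(data)-5,max(data)+5);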

Cite As

Zdravko Botev (2024). Kernel Density Estimator (https://www.mathworks.com/matlabcentral/fileexchange/14034-kernel-density-estimator), MATLAB Central File Exchange. Retrieved July 27, 2024.

MATLAB Release Compatibility

Created with R2015a

Compatible with any release

Platform Compatibility

Windows macOS Linux

Acknowledgements

Inspired: SimOutUtils, h-coefficient, Kernel Density Estimator for High Dimensions, synctest( X,varargin )

kde(data,n,MIN,MAX)

Version	Published	Release Notes
1.5.0.0	30 Dec 2015	Corrected the title back to "kernel density estimator"; updated the published reference. Bug fixes: 1) in some rare cases with small 'n', fzero used to fail; the code now handles these failures; 2) the density output is forced to be positive (it could be small and negative due to round-off errors, which confused some users). The updated version additionally provides a cdf estimator as an output argument and is designed not to crash for a small number of data points, e.g., kde(rand(1,5)).
1.4.0.0	7 Mar 2010	Published in the Annals of Statistics, 2010, see Section 5. Works on old versions of MATLAB without nested functions. Plots a graph when no output is requested.
1.3.0.0	28 Jun 2009	As pointed out by Dazhi Jiang in the comments section, the header line "function [bandwidth,density,xmesh]=kde(data,n,MIN,MAX)" was missing. This version corrects this editing mistake.
1.1.0.0	26 May 2009	Updated the reference, which is now a journal paper submitted to the Annals of Statistics.
1.0.0.0	17 Oct 2007	Uses higher-order asymptotic approximations to achieve superior estimation accuracy for problems with few data points.