kernel density estimation

Version 1.3.0.0 (7.58 KB) by Zdravko Botev

fast and accurate state-of-the-art bivariate kernel density estimator

4.4

(40)

24.3K Downloads

Updated 30 Dec 2015

View License

fast and accurate state-of-the-art bivariate kernel density estimator with diagonal bandwidth matrix.
The kernel is assumed to be Gaussian. The two bandwidth parameters are chosen optimally without ever
using/assuming a parametric model for the data or any "rules of thumb". Unlike many other procedures, this one
is immune to accuracy failures in the estimation of multimodal densities with widely separated modes (see examples).
INPUTS:
data - an N by 2 array with continuous data
n - size of the n by n grid over which the density is computed
n has to be a power of 2, otherwise n=2^ceil(log2(n)); the default value is 2^8;
MIN_XY,MAX_XY - limits of the bounding box over which the density is computed;
the format is:
MIN_XY=[lower_Xlim,lower_Ylim]
MAX_XY=[upper_Xlim,upper_Ylim].

The dafault limits are computed as:
MAX=max(data,[],1); MIN=min(data,[],1); Range=MAX-MIN;
MAX_XY=MAX+Range/4; MIN_XY=MIN-Range/4;
OUTPUT:
bandwidth - a row vector with the two optimal bandwidths for a bivaroate Gaussian kernel;
the format is:
bandwidth=[bandwidth_X, bandwidth_Y];
density - an 'n' by 'n' matrix containing the density values over the 'n' by 'n' grid;
density is not computed unless the function is asked for such an output;
X,Y - the meshgrid over which the variable "density" has been computed;
the intended usage is as follows:
surf(X,Y,density)
Example (simple Gaussian mixture)
clear all
% generate a Gaussian mixture with distant modes
data=[randn(500,2);
randn(500,1)+3.5, randn(500,1);];
% call the routine
[bandwidth,density,X,Y]=kde2d(data);
% plot the data and the density estimate
contour3(X,Y,density,50), hold on
plot(data(:,1),data(:,2),'r.','MarkerSize',5)

Example (Gaussian mixture with distant modes):

clear all
% generate a Gaussian mixture with distant modes
data=[randn(100,1), randn(100,1)/4;
randn(100,1)+18, randn(100,1);
randn(100,1)+15, randn(100,1)/2-18;];
% call the routine
[bandwidth,density,X,Y]=kde2d(data);
% plot the data and the density estimate
surf(X,Y,density,'LineStyle','none'), view([0,60])
colormap hot, hold on, alpha(.8)
set(gca, 'color', 'blue');
plot(data(:,1),data(:,2),'w.','MarkerSize',5)

Example (Sinusoidal density):

clear all
X=rand(1000,1); Y=sin(X*10*pi)+randn(size(X))/3; data=[X,Y];
% apply routine
[bandwidth,density,X,Y]=kde2d(data);
% plot the data and the density estimate
surf(X,Y,density,'LineStyle','none'), view([0,70])
colormap hot, hold on, alpha(.8)
set(gca, 'color', 'blue');
plot(data(:,1),data(:,2),'w.','MarkerSize',5)

Reference:
Kernel density estimation via diffusion
Z. I. Botev, J. F. Grotowski, and D. P. Kroese (2010)
Annals of Statistics, Volume 38, Number 5, pages 2916-2957
doi:10.1214/10-AOS799

Cite As

Zdravko Botev (2024). kernel density estimation (https://www.mathworks.com/matlabcentral/fileexchange/17204-kernel-density-estimation), MATLAB Central File Exchange. Retrieved April 23, 2024.

MATLAB Release Compatibility

Created with R2015a

Compatible with any release

Platform Compatibility

Windows macOS Linux

Tags Add Tags

Acknowledgements

Inspired: synctest( X,varargin ), Cornerplot, Kernel Density Estimator for High Dimensions

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

kde2d(data,n,MIN_XY,MAX_XY)

Version	Published	Release Notes
1.3.0.0	30 Dec 2015	The algorithm uses the FFT for speed. Sometimes round-off computational errors due to using the FFT result in vanishingly small density values (e.g. -10^-17). Any such values are now set to machine precision. use old title "kernel density estimation"; update reference	Download
1.2.0.0	26 May 2009	updated reference and added new license as requested by Matlab	Download
1.0.0.0	29 Oct 2007		Download