4.3

4.3 | 10 ratings Rate this file 180 downloads (last 30 days) File Size: 3.8 KB File ID: #17204

kernel density estimation

by Zdravko Botev

 

28 Oct 2007 (Updated 26 May 2009)

Code covered by the BSD License  

fast and accurate state-of-the-art bivariate kernel density estimator

Download Now | Watch this File

File Information
Description

% fast and accurate state-of-the-art
% bivariate kernel density estimator
% with diagonal bandwidth matrix.
% The kernel is assumed to be Gaussian.
% The two bandwidth parameters are
% chosen optimally without ever
% using/assuming a parametric model for the data or any "rules of thumb".
% Unlike many other procedures, this one
% is immune to accuracy failures in the estimation of
% multimodal densities with widely separated modes (see examples).
% INPUTS: data - an N by 2 array with continuous data
% n - size of the n by n grid over which the density is computed
% n has to be a power of 2, otherwise n=2^ceil(log2(n));
% the default value is 2^8;
% MIN_XY,MAX_XY- limits of the bounding box over which the density is computed;
% the format is:
% MIN_XY=[lower_Xlim,lower_Ylim]
% MAX_XY=[upper_Xlim,upper_Ylim].
% The dafault limits are computed as:
% MAX=max(data,[],1); MIN=min(data,[],1); Range=MAX-MIN;
% MAX_XY=MAX+Range/4; MIN_XY=MIN-Range/4;
% OUTPUT: bandwidth - a row vector with the two optimal
% bandwidths for a bivaroate Gaussian kernel;
% the format is:
% bandwidth=[bandwidth_X, bandwidth_Y];
% density - an n by n matrix containing the density values over the n by n grid;
% density is not computed unless the function is asked for such an output;
% X,Y - the meshgrid over which the variable "density" has been computed;
% the intended usage is as follows:
% surf(X,Y,density)
% Example (simple Gaussian mixture)
% clear all
% % generate a Gaussian mixture with distant modes
% data=[randn(500,2);
% randn(500,1)+3.5, randn(500,1);];
% % call the routine
% [bandwidth,density,X,Y]=kde2d(data);
% % plot the data and the density estimate
% contour3(X,Y,density,50), hold on
% plot(data(:,1),data(:,2),'r.','MarkerSize',5)
%
% Example (Gaussian mixture with distant modes):
%
% clear all
% % generate a Gaussian mixture with distant modes
% data=[randn(100,1), randn(100,1)/4;
% randn(100,1)+18, randn(100,1);
% randn(100,1)+15, randn(100,1)/2-18;];
% % call the routine
% [bandwidth,density,X,Y]=kde2d(data);
% % plot the data and the density estimate
% surf(X,Y,density,'LineStyle','none'), view([0,60])
% colormap hot, hold on, alpha(.8)
% set(gca, 'color', 'blue');
% plot(data(:,1),data(:,2),'w.','MarkerSize',5)
%
% Example (Sinusoidal density):
%
% clear all
% X=rand(1000,1); Y=sin(X*10*pi)+randn(size(X))/3; data=[X,Y];
% % apply routine
% [bandwidth,density,X,Y]=kde2d(data);
% % plot the data and the density estimate
% surf(X,Y,density,'LineStyle','none'), view([0,70])
% colormap hot, hold on, alpha(.8)
% set(gca, 'color', 'blue');
% plot(data(:,1),data(:,2),'w.','MarkerSize',5)
%
% Notes: If you have a more accurate density estimator
% (as measured by which routine attains the smallest
% L_2 distance between the estimate and the true density) or you have
% problems running this code, please email me at botev@maths.uq.edu.au

% Reference: Z. I. Botev, J. F. Grotowski and D. P. Kroese
% "KERNEL DENSITY ESTIMATION VIA DIFFUSION" ,Submitted to the
% Annals of Statistics, 2009

MATLAB release MATLAB 7.4 (R2007a)
Zip File Content  
Other Files kde2d.m,
license.txt
Tags for This File  
Everyone's Tags
Tags I've Applied
Add New Tags Please login to tag files.
Comments and Ratings (13)
29 Oct 2007 Thomas T.

Easy to use bivariate kernel density estimation procedure. Thanks for sharing the code.

07 Nov 2007 dan p

This is some really nice code. It runs very quickly and is easy to get up and running. The author also provides plenty of comments and examples.

01 Jan 2008 AMin Gheibi

It's easy to use and good organized code. I suggest it.

17 Apr 2008 jagabandhu paul

good.

17 Jun 2008 Alexander SepĂșlveda  
05 Oct 2008 Ying Lin

Maximum recursion limit of 500 reached. Use set(0,'RecursionLimit',N)
to change the limit.

16 Dec 2008 Nate Yoder  
25 Jan 2009 Michael Jordan  
13 Jun 2009 Nikola Toljic

Just what I needed.

12 Nov 2009 Matt

Should this work when there are multiple points at one location?
If I input a vector or points, with s number of identical points, for example:
>> y=rand(400, 2); y(371:400, 1)=y(370,1); y(370:400,2)=y(370,1);
>> [bandwidth, density, X, Y]=kde2d(y);
I find that the minimum density value is negative, although the bandwidth is still real and positive.

28 Dec 2009 Jianguang

I sum the 'density' and find sum(sum(denstiy)) > 1. This is an obivous bug and i don't know why. Should the density be normalizated to unit ?

29 Dec 2009 Dazhi Jiang

Hi, Jiangguang,

I don't think the sum of the density values is necessary to be 1. In fact, the volume is.

By the way, even a single value of density function can be over than 1.

Thanks

07 Jan 2010 David

i have the same question, and posted it on the other kernel estimator.

Please login to add a comment or rating.
Updates
26 May 2009

updated reference and added new license as requested by Matlab

Tag Activity for this File
Tag Applied By Date/Time
statistics Zdravko Botev 22 Oct 2008 09:33:07
probability Zdravko Botev 22 Oct 2008 09:33:07
optimal bandwidth selection Zdravko Botev 22 Oct 2008 09:33:07
kernel density estimator Zdravko Botev 22 Oct 2008 09:33:07
optimal bandwidth selection Bartolomeu Rabacal 23 Nov 2009 08:35:26
 

MATLAB Central Terms of Use

NOTICE: Any content you submit to MATLAB Central, including personal information, is not subject to the protections which may be afforded information collected under other sections of The MathWorks, Inc. Web site. You are entirely responsible for all content that you upload, post, e-mail, transmit or otherwise make available via MATLAB Central. The MathWorks does not control the content posted by visitors to MATLAB Central and, does not guarantee the accuracy, integrity, or quality of such content. Under no circumstances will The MathWorks be liable in any way for any content not authored by The MathWorks, or any loss or damage of any kind incurred as a result of the use of any content posted, e-mailed, transmitted or otherwise made available via MATLAB Central. Read the complete Terms prior to use.

Contact us at files@mathworks.com