Code covered by the BSD License  

Highlights from
kernel density estimation

4.22222

4.2 | 20 ratings Rate this file 159 Downloads (last 30 days) File Size: 3.8 KB File ID: #17204
image thumbnail

kernel density estimation

by Zdravko Botev

 

28 Oct 2007 (Updated 26 May 2009)

fast and accurate state-of-the-art bivariate kernel density estimator

| Watch this File

File Information
Description

% fast and accurate state-of-the-art
% bivariate kernel density estimator
% with diagonal bandwidth matrix.
% The kernel is assumed to be Gaussian.
% The two bandwidth parameters are
% chosen optimally without ever
% using/assuming a parametric model for the data or any "rules of thumb".
% Unlike many other procedures, this one
% is immune to accuracy failures in the estimation of
% multimodal densities with widely separated modes (see examples).
% INPUTS: data - an N by 2 array with continuous data
% n - size of the n by n grid over which the density is computed
% n has to be a power of 2, otherwise n=2^ceil(log2(n));
% the default value is 2^8;
% MIN_XY,MAX_XY- limits of the bounding box over which the density is computed;
% the format is:
% MIN_XY=[lower_Xlim,lower_Ylim]
% MAX_XY=[upper_Xlim,upper_Ylim].
% The dafault limits are computed as:
% MAX=max(data,[],1); MIN=min(data,[],1); Range=MAX-MIN;
% MAX_XY=MAX+Range/4; MIN_XY=MIN-Range/4;
% OUTPUT: bandwidth - a row vector with the two optimal
% bandwidths for a bivaroate Gaussian kernel;
% the format is:
% bandwidth=[bandwidth_X, bandwidth_Y];
% density - an n by n matrix containing the density values over the n by n grid;
% density is not computed unless the function is asked for such an output;
% X,Y - the meshgrid over which the variable "density" has been computed;
% the intended usage is as follows:
% surf(X,Y,density)
% Example (simple Gaussian mixture)
% clear all
% % generate a Gaussian mixture with distant modes
% data=[randn(500,2);
% randn(500,1)+3.5, randn(500,1);];
% % call the routine
% [bandwidth,density,X,Y]=kde2d(data);
% % plot the data and the density estimate
% contour3(X,Y,density,50), hold on
% plot(data(:,1),data(:,2),'r.','MarkerSize',5)
%
% Example (Gaussian mixture with distant modes):
%
% clear all
% % generate a Gaussian mixture with distant modes
% data=[randn(100,1), randn(100,1)/4;
% randn(100,1)+18, randn(100,1);
% randn(100,1)+15, randn(100,1)/2-18;];
% % call the routine
% [bandwidth,density,X,Y]=kde2d(data);
% % plot the data and the density estimate
% surf(X,Y,density,'LineStyle','none'), view([0,60])
% colormap hot, hold on, alpha(.8)
% set(gca, 'color', 'blue');
% plot(data(:,1),data(:,2),'w.','MarkerSize',5)
%
% Example (Sinusoidal density):
%
% clear all
% X=rand(1000,1); Y=sin(X*10*pi)+randn(size(X))/3; data=[X,Y];
% % apply routine
% [bandwidth,density,X,Y]=kde2d(data);
% % plot the data and the density estimate
% surf(X,Y,density,'LineStyle','none'), view([0,70])
% colormap hot, hold on, alpha(.8)
% set(gca, 'color', 'blue');
% plot(data(:,1),data(:,2),'w.','MarkerSize',5)
%
% Notes: If you have a more accurate density estimator
% (as measured by which routine attains the smallest
% L_2 distance between the estimate and the true density) or you have
% problems running this code, please email me at botev@maths.uq.edu.au

% Reference: Z. I. Botev, J. F. Grotowski and D. P. Kroese
% "KERNEL DENSITY ESTIMATION VIA DIFFUSION" ,Submitted to the
% Annals of Statistics, 2009

MATLAB release MATLAB 7.4 (R2007a)
Tags for This File  
Everyone's Tags
Tags I've Applied
Add New Tags Please login to tag files.
Comments and Ratings (28)
29 Oct 2007 Thomas T.

Easy to use bivariate kernel density estimation procedure. Thanks for sharing the code.

07 Nov 2007 dan p

This is some really nice code. It runs very quickly and is easy to get up and running. The author also provides plenty of comments and examples.

01 Jan 2008 AMin Gheibi

It's easy to use and good organized code. I suggest it.

17 Apr 2008 jagabandhu paul

good.

17 Jun 2008 Alexander SepĂșlveda  
05 Oct 2008 Ying Lin

Maximum recursion limit of 500 reached. Use set(0,'RecursionLimit',N)
to change the limit.

16 Dec 2008 Nate Yoder  
25 Jan 2009 Michael Jordan  
13 Jun 2009 Nikola Toljic

Just what I needed.

12 Nov 2009 Matt

Should this work when there are multiple points at one location?
If I input a vector or points, with s number of identical points, for example:
>> y=rand(400, 2); y(371:400, 1)=y(370,1); y(370:400,2)=y(370,1);
>> [bandwidth, density, X, Y]=kde2d(y);
I find that the minimum density value is negative, although the bandwidth is still real and positive.

28 Dec 2009 Jianguang

I sum the 'density' and find sum(sum(denstiy)) > 1. This is an obivous bug and i don't know why. Should the density be normalizated to unit ?

29 Dec 2009 Dazhi Jiang

Hi, Jiangguang,

I don't think the sum of the density values is necessary to be 1. In fact, the volume is.

By the way, even a single value of density function can be over than 1.

Thanks

07 Jan 2010 David

i have the same question, and posted it on the other kernel estimator.

16 Feb 2010 Sue C

It's easy to use .

but i got a problem;I am trying to do density estimation.
here is the m-file

%--------------(density estimation)------------------------------
clear all
data=[118,50;151,13;152,17;156,202;208,43;256,402;260,204;263,415;] ;
     [bandwidth,density,X,Y]=kde2d(data);
     contour3(X,Y,density,50), hold on
     plot(data(:,1),data(:,2),'r.','MarkerSize',5)
%------------(density estimation)------------------------------(end)

and an error happened

??? Error using ==> fzero at 293
The function values at the interval endpoints must
differ in sign

Does anyone got the same error?

08 Jun 2010 Anton Haug

I have tried this subroutine and found that it gives me a biased result. For samples from a mixture distribution of two Gaussians with known means, the kernel density is offset from the true mean in one dimension.

14 Jun 2010 Anton Haug

The author of this subroutine contacted me and when I exchanged my code with him he pointed out where I had made an error and helped me to correct my code. His subroutine is not biased. The bias was createrd by my lack of understanding of how the subroutine worked. I thank the author for his help!!

21 Aug 2010 gwideman

Is it possible to use this procedure when the input data are weighted? That is to say, input data have an X and Y value, and should not simply count as "1" at that location, but as some variable amount?
Thanks.

14 Mar 2011 Mi Matthew

My god,you are so excellent.Thanks a lot!

22 Apr 2011 Nick  
14 Jul 2011 Gil Tidhar

The minimum density values returned are NEGATIVE. This cannot be true for a density estimator and does not happen with other kernel type estimators that use positive definite Kernel functions.

16 Aug 2011 Alexander

Great code. One small improvements that I came across:
When data across one of the dimensions is constant, auto scaling is zero which leads to an error. Moreover, in that case it's better to increase/decrease max/min value, so that resulting meshgrid would not be 1-dimensional (which can cause wrong behavior in interpolation, for example)
Smth like this:
for ind=1:2
    if MAX_XY(ind)==MIN_XY(ind)
        MAX_XY(ind)=MAX_XY(ind)*1.1;
        MIN_XY(ind)=MIN_XY(ind)/1.1;
    end
end

23 Aug 2011 Paulo

I'm having the same problem reported above by Sue C (16 Feb 2010). A similar problem was found for the 1-d version of this code and, apparently, was only fixed there: http://www.mathworks.com/matlabcentral/fileexchange/14034-kernel-density-estimator

15 Sep 2011 Arthur Allen

I have exactly the same question as gwideman (21 Aug 2010) "Is it possible to use this procedure when the input data are weighted? " The 1-D ksdensity.m function has this ability. Thanks.

11 Oct 2011 JERRY

I am having the same problem as Gil Tidhar. When using the WAFO kde function, the minimum is 0 (which is what I expected), but your function returns negative values.

test code:
>> [temp, pab] = kde2d(rand(10000, 2),...
>> 2^4, [-3, -3], [5, 5]);
>> min(min(pab))
returns -0.0026

25 Nov 2011 Alexander

Hi. For the 1-dimensional estimation you've fixed the "fzero at 293:The function values at the interval endpoints must" error with a try-catch block using :

t_star=.28*N^(-2/5);

What is the correct expression for kde2d?

I'm guessing the power is (-1/3), but what is the coefficient (any where is it coming from)?

29 Nov 2011 El-ad David Amir  
09 Dec 2011 name

Hi,

The 1d version of the kernel estimator also provides cdf values at the representative points. I would like to use these for 2d too. I don't have a strong background on this and I was not able to compute it.
Does anyone know how to compute cdf for 2d kernel?

08 Jan 2012 John

Hi,
I just wanted to verify that the volume under the output curve is 1. Right?
Thanks

Please login to add a comment or rating.
Updates
26 May 2009

updated reference and added new license as requested by Matlab

Tag Activity for this File
Tag Applied By Date/Time
statistics Zdravko Botev 22 Oct 2008 09:33:07
probability Zdravko Botev 22 Oct 2008 09:33:07
kernel density estimator Zdravko Botev 22 Oct 2008 09:33:07
optimal bandwidth selection Zdravko Botev 22 Oct 2008 09:33:07
optimal bandwidth selection Bartolomeu Rabacal 23 Nov 2009 08:35:26
kernel density estimator Andrey Kan 18 May 2011 22:03:41
kernel density estimator Krishna Kumar Kottakki 26 Oct 2011 02:31:37
probability Jose Paredes 31 Oct 2011 16:36:40

Contact us at files@mathworks.com