Code covered by the BSD License

Download apps, toolboxes, and other File Exchange content using Add-On Explorer in MATLAB.

### Highlights from kernel density estimation

4.25806
4.3 | 35 ratings Rate this file 87 Downloads (last 30 days) File Size: 7.58 KB File ID: #17204 Version: 1.3

# kernel density estimation

### Zdravko Botev (view profile)

28 Oct 2007 (Updated )

fast and accurate state-of-the-art bivariate kernel density estimator

File Information
Description

fast and accurate state-of-the-art bivariate kernel density estimator with diagonal bandwidth matrix.
The kernel is assumed to be Gaussian. The two bandwidth parameters are chosen optimally without ever
using/assuming a parametric model for the data or any "rules of thumb". Unlike many other procedures, this one
is immune to accuracy failures in the estimation of multimodal densities with widely separated modes (see examples).
INPUTS:
data - an N by 2 array with continuous data
n - size of the n by n grid over which the density is computed
n has to be a power of 2, otherwise n=2^ceil(log2(n)); the default value is 2^8;
MIN_XY,MAX_XY - limits of the bounding box over which the density is computed;
the format is:
MIN_XY=[lower_Xlim,lower_Ylim]
MAX_XY=[upper_Xlim,upper_Ylim].

The dafault limits are computed as:
MAX=max(data,[],1); MIN=min(data,[],1); Range=MAX-MIN;
MAX_XY=MAX+Range/4; MIN_XY=MIN-Range/4;
OUTPUT:
bandwidth - a row vector with the two optimal bandwidths for a bivaroate Gaussian kernel;
the format is:
bandwidth=[bandwidth_X, bandwidth_Y];
density - an 'n' by 'n' matrix containing the density values over the 'n' by 'n' grid;
density is not computed unless the function is asked for such an output;
X,Y - the meshgrid over which the variable "density" has been computed;
the intended usage is as follows:
surf(X,Y,density)
Example (simple Gaussian mixture)
clear all
% generate a Gaussian mixture with distant modes
data=[randn(500,2);
randn(500,1)+3.5, randn(500,1);];
% call the routine
[bandwidth,density,X,Y]=kde2d(data);
% plot the data and the density estimate
contour3(X,Y,density,50), hold on
plot(data(:,1),data(:,2),'r.','MarkerSize',5)

Example (Gaussian mixture with distant modes):

clear all
% generate a Gaussian mixture with distant modes
data=[randn(100,1), randn(100,1)/4;
randn(100,1)+18, randn(100,1);
randn(100,1)+15, randn(100,1)/2-18;];
% call the routine
[bandwidth,density,X,Y]=kde2d(data);
% plot the data and the density estimate
surf(X,Y,density,'LineStyle','none'), view([0,60])
colormap hot, hold on, alpha(.8)
set(gca, 'color', 'blue');
plot(data(:,1),data(:,2),'w.','MarkerSize',5)

Example (Sinusoidal density):

clear all
X=rand(1000,1); Y=sin(X*10*pi)+randn(size(X))/3; data=[X,Y];
% apply routine
[bandwidth,density,X,Y]=kde2d(data);
% plot the data and the density estimate
surf(X,Y,density,'LineStyle','none'), view([0,70])
colormap hot, hold on, alpha(.8)
set(gca, 'color', 'blue');
plot(data(:,1),data(:,2),'w.','MarkerSize',5)

Reference:
Kernel density estimation via diffusion
Z. I. Botev, J. F. Grotowski, and D. P. Kroese (2010)
Annals of Statistics, Volume 38, Number 5, pages 2916-2957
doi:10.1214/10-AOS799

Acknowledgements

This file inspired Cornerplot and Kernel Density Estimator For High Dimensions.

MATLAB release MATLAB 8.5 (R2015a)
MATLAB Search Path
`/`
Tags for This File   Please login to tag files.
 Please login to add a comment or rating.
Comments and Ratings (50)
06 Oct 2015 Aslak Grinsted

### Aslak Grinsted (view profile)

Thank you. It would be great if you could add a parameter for EffectiveSampleSize being smaller than N. I want to use it to analyze the output of MCMC chains where the individual samples in data are not independent (using cornerplots). I've added the feature to my personal version.

21 Sep 2015 Chris Williams

### Chris Williams (view profile)

Super, thanks for thos

21 Jul 2015 leo nidas

### leo nidas (view profile)

Thank you for sharing the code. Is there a way of obtaining the corresponding bivariate cdf as well as the marginal cdfs and the inverse marginal cdfs? Could you provide an example? Thank you.

Comment only
01 May 2015 Pearl

### Pearl (view profile)

18 Mar 2015 Arpad Rozsas

### Arpad Rozsas (view profile)

06 Feb 2015 Aslak Grinsted

### Aslak Grinsted (view profile)

24 Dec 2014 yaq zhou

### yaq zhou (view profile)

I ran into the same error as described by Oliver，how can I fix it？

28 Nov 2014 Lee

### Lee (view profile)

22 Oct 2014 Felipe Uribe

### Felipe Uribe (view profile)

16 Aug 2014 Jakob Nikolas

### Jakob Nikolas (view profile)

If you need to use a fixed bandwidth, just overwrite t_x and t_y in the main function. Thanks to Zdravko for this hint.

16 Aug 2014 Jakob Nikolas

### Jakob Nikolas (view profile)

I ran into the same error as described by Oliver and fixed it by replacing

t_star=fzero( @(t)(t-evolve(t)),[0,0.1]);

with

options = optimset('FunValCheck','off')
try
t_star=fzero( @(t)(t-evolve(t)),[0,0.1],options);
catch
defaultError('override error, be cautious!');
t_star=0
end

30 Jan 2014 Oliver

### Oliver (view profile)

I am running the code perfectly on sets of 45 points (each has a X and Y coordinate). But when I try to only use a subset of those 45 points I get:
Error using fzero (line 274)
The function values at the interval endpoints must differ in sign.

Error in kde2d (line 101)
t_star=fzero(@(t)(t-evolve(t)),[0,0.1]);

Is code not able to handle the low sample size? Not sure what is happening. It seems to not be able to find a bandwidth.

Comment only
26 Jan 2014 sepideh

22 May 2013 Asha

### Asha (view profile)

Hi, I am using kde2d package for whale distribution data. I would like to plot the 50% and 95% kernels to look at where the core of the sightings are being made. Is this possible with this package? I am also not sure what the current legend means. I like the colour gradation that shows highly dense areas vs. lower density areas but it would be great if I could figure out why the numbers are as high as they are. I would appreciate any help on this matter please.
Thanks
Asha

Comment only
05 Apr 2013 Simon

### Simon (view profile)

hey there
I have a problem: Im having a vector of log returns (1189 rows) whose distribution i want to estimate with a kernel distribution.
So i just type in: "ksdensity(returnvector)". I get a curve, which looks ok, but on the y axis, I get values up to 60 (nothing normalised). And the mean (by clicking on the curve, choosing tools and data statistics) is completely different from the normal mean calculated as "mean(returnvector)". I dont know what is wrong here, which mean do i need to chose and which std??

Thx

13 Feb 2013 Arnault

### Arnault (view profile)

A quick question about the bandwidth parameter. Does it have a unit? Does it correspond the the larger of the kernel related to the size of the grid cell?

Comment only
16 Nov 2012 Harish S

21 Sep 2012 Syed

### Syed (view profile)

I keep getting negative probability densities, which is surprising as KDE is an addition of strictly positive numbers ...

31 Aug 2012 Tobias

### Tobias (view profile)

Hi,
I need to input an own bandwidth for the density estimation.
Is it possible to make changes to the code that allow either the input of another bandwidth or to change the data over which the density is computed after bandwidth estimation?

Thanks

Comment only
09 May 2012 Luca Baglivo

### Luca Baglivo (view profile)

20 Mar 2012 Rachel

### Rachel (view profile)

No need to answer, I managed to do it by calculating the bin width from the [X, Y] (meshgrid) output.

Comment only
19 Mar 2012 Rachel

### Rachel (view profile)

First of all, thank you for the code, it works great.

I want to get the marginal PDFs from the joint distribution, and am therefore doing:

[~,pdf_joint,X,Y]=kde2d(data);
m1 = sum(pdf_joint, 2); %marginal for signal 1 (column vector)
m2 = sum(pdf_joint, 1); %marginal for signal 2 (row vector)

How can I normalize m1 and m2 so that their total area is equal to unity?

Thank you.

Comment only
08 Jan 2012 John

### John (view profile)

Hi,
I just wanted to verify that the volume under the output curve is 1. Right?
Thanks

Comment only
09 Dec 2011 name

### name (view profile)

Hi,

The 1d version of the kernel estimator also provides cdf values at the representative points. I would like to use these for 2d too. I don't have a strong background on this and I was not able to compute it.
Does anyone know how to compute cdf for 2d kernel?

Comment only
29 Nov 2011 El-ad David Amir

### El-ad David Amir (view profile)

25 Nov 2011 Alexander Stepanov

### Alexander Stepanov (view profile)

Hi. For the 1-dimensional estimation you've fixed the "fzero at 293:The function values at the interval endpoints must" error with a try-catch block using :

t_star=.28*N^(-2/5);

What is the correct expression for kde2d?

I'm guessing the power is (-1/3), but what is the coefficient (any where is it coming from)?

11 Oct 2011 JERRY

### JERRY (view profile)

I am having the same problem as Gil Tidhar. When using the WAFO kde function, the minimum is 0 (which is what I expected), but your function returns negative values.

test code:
>> [temp, pab] = kde2d(rand(10000, 2),...
>> 2^4, [-3, -3], [5, 5]);
>> min(min(pab))
returns -0.0026

Comment only
15 Sep 2011 Arthur Allen

### Arthur Allen (view profile)

I have exactly the same question as gwideman (21 Aug 2010) "Is it possible to use this procedure when the input data are weighted? " The 1-D ksdensity.m function has this ability. Thanks.

Comment only
23 Aug 2011 Paulo

### Paulo (view profile)

I'm having the same problem reported above by Sue C (16 Feb 2010). A similar problem was found for the 1-d version of this code and, apparently, was only fixed there: http://www.mathworks.com/matlabcentral/fileexchange/14034-kernel-density-estimator

16 Aug 2011 Alexander Stepanov

### Alexander Stepanov (view profile)

Great code. One small improvements that I came across:
When data across one of the dimensions is constant, auto scaling is zero which leads to an error. Moreover, in that case it's better to increase/decrease max/min value, so that resulting meshgrid would not be 1-dimensional (which can cause wrong behavior in interpolation, for example)
Smth like this:
for ind=1:2
if MAX_XY(ind)==MIN_XY(ind)
MAX_XY(ind)=MAX_XY(ind)*1.1;
MIN_XY(ind)=MIN_XY(ind)/1.1;
end
end

14 Jul 2011 Gil Tidhar

### Gil Tidhar (view profile)

The minimum density values returned are NEGATIVE. This cannot be true for a density estimator and does not happen with other kernel type estimators that use positive definite Kernel functions.

22 Apr 2011 Nick

### Nick (view profile)

14 Mar 2011 Mi Matthew

### Mi Matthew (view profile)

My god,you are so excellent.Thanks a lot!

21 Aug 2010 gwideman

### gwideman (view profile)

Is it possible to use this procedure when the input data are weighted? That is to say, input data have an X and Y value, and should not simply count as "1" at that location, but as some variable amount?
Thanks.

Comment only
14 Jun 2010 Anton Haug

### Anton Haug (view profile)

The author of this subroutine contacted me and when I exchanged my code with him he pointed out where I had made an error and helped me to correct my code. His subroutine is not biased. The bias was createrd by my lack of understanding of how the subroutine worked. I thank the author for his help!!

08 Jun 2010 Anton Haug

### Anton Haug (view profile)

I have tried this subroutine and found that it gives me a biased result. For samples from a mixture distribution of two Gaussians with known means, the kernel density is offset from the true mean in one dimension.

16 Feb 2010 Sue C

### Sue C (view profile)

It's easy to use .

but i got a problem;I am trying to do density estimation.
here is the m-file

%--------------(density estimation)------------------------------
clear all
data=[118,50;151,13;152,17;156,202;208,43;256,402;260,204;263,415;] ;
[bandwidth,density,X,Y]=kde2d(data);
contour3(X,Y,density,50), hold on
plot(data(:,1),data(:,2),'r.','MarkerSize',5)
%------------(density estimation)------------------------------(end)

and an error happened

??? Error using ==> fzero at 293
The function values at the interval endpoints must
differ in sign

Does anyone got the same error?

07 Jan 2010 David

### David (view profile)

i have the same question, and posted it on the other kernel estimator.

Comment only
29 Dec 2009 Dazhi Jiang

### Dazhi Jiang (view profile)

Hi, Jiangguang,

I don't think the sum of the density values is necessary to be 1. In fact, the volume is.

By the way, even a single value of density function can be over than 1.

Thanks

Comment only
28 Dec 2009 Jianguang

### Jianguang (view profile)

I sum the 'density' and find sum(sum(denstiy)) > 1. This is an obivous bug and i don't know why. Should the density be normalizated to unit ?

12 Nov 2009 Matt

### Matt (view profile)

Should this work when there are multiple points at one location?
If I input a vector or points, with s number of identical points, for example:
>> y=rand(400, 2); y(371:400, 1)=y(370,1); y(370:400,2)=y(370,1);
>> [bandwidth, density, X, Y]=kde2d(y);
I find that the minimum density value is negative, although the bandwidth is still real and positive.

Comment only
13 Jun 2009 Nikola Toljic

### Nikola Toljic (view profile)

Just what I needed.

25 Jan 2009 Michael Jordan

### Michael Jordan (view profile)

16 Dec 2008 Nathanael Yoder

### Nathanael Yoder (view profile)

05 Oct 2008 Ying Lin

Maximum recursion limit of 500 reached. Use set(0,'RecursionLimit',N)
to change the limit.

17 Jun 2008 Alexander Sepúlveda
17 Apr 2008 jagabandhu paul

good.

01 Jan 2008 AMin Gheibi

It's easy to use and good organized code. I suggest it.

07 Nov 2007 dan p

This is some really nice code. It runs very quickly and is easy to get up and running. The author also provides plenty of comments and examples.

29 Oct 2007 Thomas T.

Easy to use bivariate kernel density estimation procedure. Thanks for sharing the code.

Updates
26 May 2009 1.2

updated reference and added new license as requested by Matlab

13 Jun 2015 1.3

The algorithm uses the FFT for speed. Sometimes round-off computational errors due to using the FFT result in vanishingly small density values (e.g. -10^-17). Any such values are now set to machine precision.

30 Dec 2015 1.3

use old title "kernel density estimation"; update reference

Contact us