Kernel Density Estimator for High Dimensions

version (4.85 KB) by Zdravko Botev
fast multivariate kernel density estimation for high dimensions


Updated 21 Jul 2016

Fast adaptive kernel density estimation in high dimensions in one m-file.
Provides optimal accuracy/speed trade-off, controlled via a parameter "gam".
To increase speed for "big data" applications, use a small "gam".
Typically gam=n^(1/2), where "n" is the number of points.

USAGE: [pdf,X1,X2]=akde(X,grid,gam)
X - data as an 'n' by 'd' matrix ('n' points of dimension 'd');
grid - 'm' points of dimension 'd' over which pdf is computed;
       default provided only for 2-dimensional data;
       see example below on how to construct it in higher dimensions;
gam - (optional) cost/accuracy trade-off parameter, where gam<n;
      default value is gam=ceil(n^(1/2)); larger values
      may result in better accuracy, but reduce speed;
      to speed up the code, use smaller "gam";

pdf - the value of the estimated density at 'grid';
X1,X2 - default grid (used only for 2-dimensional data);
        see example on how to construct grid in higher dimensions
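
As a rough illustration of the call signature above, the following sketch uses 2-dimensional data, so that the default grid is returned in X1 and X2. The variable names gam_default and gam_fast and the choice of halving "gam" are assumptions made for illustration only; the akde calls are guarded so the snippet still runs when akde.m is not on the path.

```matlab
% Minimal usage sketch; the akde calls assume akde.m is on the MATLAB path.
X = [randn(500,2); randn(500,2)/2 + 2];   % n-by-d data, here n = 1000, d = 2
n = size(X,1);
gam_default = ceil(sqrt(n));              % documented default, gam = ceil(n^(1/2))
gam_fast    = ceil(gam_default/2);        % hypothetical smaller value: faster, possibly less accurate

if exist('akde','file') == 2
    [pdf,X1,X2] = akde(X);                % 2-dimensional data: default grid returned in X1,X2
    pdf = reshape(pdf,size(X1));          % match the grid shape for plotting
    contour(X1,X2,pdf,30)
    pdf_fast = akde(X,[X1(:),X2(:)],gam_fast);   % same grid points, cheaper setting
end
```

Comparing pdf with pdf_fast on the same grid is one simple way to judge whether a smaller "gam" is acceptable for a given data set.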


EXAMPLE:
data=[randn(10^3,3);randn(10^3,3)/2+2]; % three dimensional data
[n,d]=size(data); ng=100; % total grid points = ng^d
MAX=max(data,[],1); MIN=min(data,[],1); scaling=MAX-MIN; % data range per dimension
% create meshgrid in 3-dimensions (ng equally spaced points per axis)
[X1,X2,X3]=meshgrid(linspace(MIN(1),MAX(1),ng),...
    linspace(MIN(2),MAX(2),ng),linspace(MIN(3),MAX(3),ng));
grid=reshape([X1(:),X2(:),X3(:)],ng^d,d); % create points for plotting
pdf=akde(data,grid); % run adaptive kde
pdf=reshape(pdf,size(X1)); % reshape pdf for use with meshgrid
for iso=[0.005:0.005:0.015] % isosurfaces with pdf = 0.005,0.01,0.015
    isosurface(X1,X2,X3,pdf,iso),view(3),alpha(.3),box on,hold on,colormap cool
end
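
The same grid construction can be generalized to any dimension 'd'. The sketch below is one possible way to do it with ndgrid and cell arrays; the names ax, G and gridPoints, and the 4-dimensional test data, are illustrative assumptions and not part of akde.m. Note that ndgrid orders the first two dimensions differently from meshgrid, which matters only if the result is later reshaped for plotting.

```matlab
% Sketch: build an ng^d-by-d matrix of grid points for any dimension d.
data = [randn(200,4); randn(200,4)/2 + 2];   % illustrative 4-dimensional data
[n,d] = size(data);
ng = 8;                                      % points per axis; total = ng^d, so keep ng small for large d
MIN = min(data,[],1); MAX = max(data,[],1);

ax = cell(1,d);
for k = 1:d
    ax{k} = linspace(MIN(k),MAX(k),ng);      % ng equally spaced points on axis k
end
G = cell(1,d);
[G{:}] = ndgrid(ax{:});                      % d arrays, each of size ng x ... x ng
gridPoints = zeros(ng^d,d);
for k = 1:d
    gridPoints(:,k) = G{k}(:);               % one column per coordinate
end
% pdf = akde(data,gridPoints);               % pass as the 'grid' argument (requires akde.m)
```

Because the number of grid points grows as ng^d, memory rather than akde itself is likely to be the limiting factor for a full grid once 'd' exceeds four or five.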

Kernel density estimation via diffusion
Z. I. Botev, J. F. Grotowski, and D. P. Kroese (2010)
Annals of Statistics, Volume 38, Number 5, pages 2916-2957.

Could anyone provide any resources explaining this method? There doesn't seem to be any mention of it in the linked paper, nor have I been able to find it elsewhere.


Hi Botev,

I have 2D data (83 rows x 92 columns), which is a temperature map. I need to produce a map of hotspot areas by considering different numbers of grids.

When I apply the example provided in the code for 2-dimensional data, I get the error: “Output argument "X1" (and maybe others) not assigned during call to "akde".” The example works only for 2-column data. However, my data have 83 rows x 92 columns.

Please help me with how I can adapt the code.



MATLAB Release Compatibility
Created with R2016a
Compatible with any release
Platform Compatibility
Windows macOS Linux

