## Kernel Density Estimator for High Dimensions

Fast multivariate kernel density estimation for high dimensions

Fast adaptive kernel density estimation in high dimensions in one m-file.
Provides optimal accuracy/speed trade-off, controlled via the parameter "gam".
To increase speed for "big data" applications, use a small "gam".
Typically gam=n^(1/2), where "n" is the number of points.
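
As a rough illustration of the "gam" choice described above, the sketch below computes the stated default and a smaller, faster alternative. The data matrix, the variable names `gam_default` and `gam_fast`, and the halving factor are assumptions made for illustration only; they are not part of akde itself.

```matlab
% Hypothetical data: n points in d dimensions (values are illustrative only).
X = randn(2500, 3);
n = size(X, 1);

% Default stated in the description: gam = ceil(n^(1/2)).
gam_default = ceil(sqrt(n));

% A smaller value trades accuracy for speed. The factor 1/2 is an
% arbitrary choice for illustration, not a recommendation by the author.
gam_fast = max(1, floor(gam_default / 2));

% Both values satisfy the documented constraint gam < n.
assert(gam_default < n && gam_fast < n);

% Either value would be passed as the third argument, as in the USAGE line:
%   pdf = akde(X, grid, gam_fast);
```

How much accuracy is lost with the smaller value depends on the data, so it is worth comparing the two estimates on a subsample before committing to one.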

USAGE: [pdf,X1,X2]=akde(X,grid,gam)
INPUTS:
X - data as an 'n' by 'd' matrix;
grid - 'm' points of dimension 'd' over which pdf is computed;
default provided only for 2-dimensional data;
see example below on how to construct it in higher dimensions;
gam - (optional) cost/accuracy trade-off parameter, where gam<n;
default value is gam=ceil(n^(1/2)); larger values
may result in better accuracy, but reduce speed;
to speed up the code, use a smaller "gam";

OUTPUT:
pdf - the value of the estimated density at 'grid'
X1,X2 - default grid (used only for 2 dimensional data)
see the example below on how to construct a grid in higher dimensions

EXAMPLE IN 2 DIMENSIONS:
L=chol([1,-0.999;-0.999,1],'lower');L1=chol([1,0.999;0.999,1],'lower');
data=[(L1*randn(10^3,2)')';(L*randn(10^3,2)')'*2;rand(10^4,2)*5-2.5];
[pdf,X1,X2]=akde(data);pdf=reshape(pdf,size(X1));contour(X1,X2,pdf,20)

EXAMPLE IN 3 DIMENSIONS:
data=[randn(10^3,3);randn(10^3,3)/2+2]; % three dimensional data
[n,d]=size(data); ng=100; % total grid points = ng^d
MAX=max(data,[],1); MIN=min(data,[],1); scaling=MAX-MIN;
% create meshgrid in 3-dimensions
[X1,X2,X3]=meshgrid(MIN(1):scaling(1)/(ng-1):MAX(1),...
MIN(2):scaling(2)/(ng-1):MAX(2),MIN(3):scaling(3)/(ng-1):MAX(3));
grid=reshape([X1(:),X2(:),X3(:)],ng^d,d); % create points for plotting
pdf=akde(data,grid); % run adaptive kde on the grid (see USAGE)
pdf=reshape(pdf,size(X1)); % reshape pdf for use with meshgrid
for iso=[0.005:0.005:0.015] % isosurfaces with pdf = 0.005,0.01,0.015
isosurface(X1,X2,X3,pdf,iso),view(3),alpha(.3),box on,hold on,colormap cool
end
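
The 3-dimensional example hard-codes three meshgrid outputs. For a general dimension 'd', one possible approach (a sketch, not taken from the original file) is to build the grid with the standard MATLAB function ndgrid and cell arrays. The names `gridAxes` and `G`, the choice d=4, and ng=10 are assumptions made for illustration.

```matlab
% Hypothetical data; any n-by-d matrix works.
data = [randn(500, 4); randn(500, 4)/2 + 2];
[n, d] = size(data);
ng = 10;                          % points per axis; total grid points = ng^d

MAX = max(data, [], 1);
MIN = min(data, [], 1);

% One vector of ng equally spaced values per dimension.
gridAxes = cell(1, d);
for k = 1:d
    gridAxes{k} = linspace(MIN(k), MAX(k), ng);
end

% ndgrid accepts any number of axes; collect its d outputs in a cell array.
G = cell(1, d);
[G{:}] = ndgrid(gridAxes{:});

% Stack the coordinates into an (ng^d)-by-d list of grid points.
grid = zeros(ng^d, d);
for k = 1:d
    grid(:, k) = G{k}(:);
end

% The grid would then be used as in the USAGE line:
%   pdf = akde(data, grid);
%   pdf = reshape(pdf, size(G{1}));   % back to the ndgrid layout
```

Note that ndgrid and meshgrid order the first two dimensions differently, so a pdf reshaped to the ndgrid layout should be plotted against the ndgrid coordinate arrays `G{k}` rather than against meshgrid outputs. The total number of grid points grows as ng^d, so ng usually has to shrink as 'd' increases.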

Reference:
Kernel density estimation via diffusion
Z. I. Botev, J. F. Grotowski, and D. P. Kroese (2010)
Annals of Statistics, Volume 38, Number 5, pages 2916-2957.

Pavel Junker

Could anyone provide any resources explaining this method? There doesn't seem to be any mention of it in the linked paper, nor have I been able to find it elsewhere.

maomao

Kurt Ehlert

Engdaw Chane

Hi Botev,

Thank you for providing the code. I am using it to apply kernel density estimation to maps that have lat/lon coordinates.
I have 2D data (83 rows x 92 columns), which is a temperature map. I need to produce a map of hotspot areas by considering different numbers of grids.

When I apply the example provided in the code for 2-dimensional data, I get the error: “Output argument "X1" (and maybe others) not assigned during call to "akde".” The example works only for two-column data. However, my data have 83 rows x 92 columns.

Kindly,

Chane,

Hendrik Schulte

Hou

Joseph Armstrong

Andres Jacome Garcia

Ivan Botev

