version 1.3.0.0 (11.5 KB) by
Hideaki Shimazaki

Kernel density estimation with bandwidths locally adapted to data.

First thing to do:

Run a tutorial code, tutorial.m

Documentation:

http://2000.jukuin.keio.ac.jp/shimazaki/res/kernel.html

see also sskernel for optimization of a fixed kernel bandwidth and sshist for histogram optimization.

% [y,t,optw,gs,C,confb95,yb] = ssvkernel(x,t,W)

%

% Function `ssvkernel' returns an optimized kernel density estimate

% using a Gauss kernel function with bandwidths locally adapted to the data.

%

% Examples:

% >> x = 0.5-0.5*log(rand(1,1e3)); t = linspace(0,3,500);

% >> [y,t,optw] = ssvkernel(x,t);

% This example produces a vector of kernel density estimates, y, at points

% specified in a vector t, using locally adaptive bandwidths, optw

% (a standard deviation of a normal density function).

%

% >> ssvkernel(x);

% By calling the function without output arguments, the estimated density

% is displayed.

%

% Input arguments:

% x: Sample data vector.

% tin (optinal):

% Points at which estimation are computed.

% W (optinal):

% A vector of kernel bandwidths.

% If W is provided, the optimal bandwidth is selected from the

% elements of W.

% * Do not search bandwidths smaller than a sampling resolution of data.

% If W is not provided, the program searches the optimal bandwidth

% using a golden section search method.

%

% Output arguments:

% y: Estimated density

% t: Points at which estimation was computed.

% The same as tin if tin is provided.

% (If the sampling resolution of tin is smaller than the sampling

% resolution of the data, x, the estimation was done at smaller

% number of points than t. The results, t and y, are obtained by

% interpolating the low resolution sampling points.)

% optw: Optimal kernel bandwidth.

% gs: Stiffness constants of the variable bandwidth examined.

% The stifness constant is defined as a ratio of the optimal fixed

% bandwidth to a length of a local interval in which a fixed-kernel

% bandwidth optimization was performed.

% C: Cost functions of stiffness constants.

% conf95:

% Bootstrap confidence intervals.

% yb: Booststrap samples.

%

%

% Usage:

% >> [y,t,optw] = ssvkernel(x);

% When t is not given in the input arguments, i.e., the output argument t

% is generated automatically.

%

% >> W = linspace(0.01,1,20);

% >> [y,t,optw] = ssvkernel(x,t,W);

% The optimal bandwidth is selected from the elements of W.

%

% >> [y,t,optw,confb95,yb] = ssvkernel(x);

% This additionally computes 95% bootstrap confidence intervals, confb95.

% The bootstrap samples are provided as yb.

%

%

% Optimization principle:

% The optimization is based on a principle of minimizing

% expected L2 loss function between the kernel estimate and an unknown

% underlying density function. An assumption is merely that samples

% are drawn from the density independently each other.

%

% The locally adaptive bandwidth is obtained by iteratively computing

% optimal fixed-size bandwidths wihtihn local intervals. The optimal

% bandwidths are selected such that they are selected in the intervals

% that are \gamma times larger than the optimal bandwidths themselves.

% The paramter \gamma was optimized by minimizing the L2 risk estimate.

%

% The method is described in

% Hideaki Shimazaki and Shigeru Shinomoto

% Kernel Bandwidth Optimization in Spike Rate Estimation

% Journal of Computational Neuroscience 2010

% http://dx.doi.org/10.1007/s10827-009-0180-4

%

% * Instead of a Gaussian window described in the paper, a boxcar window is

% used in this program.

%

% For more information, please visit

% http://2000.jukuin.keio.ac.jp/shimazaki/res/kernel.html

%

% See also SSKERNEL, SSHIST

%

%

% Hideaki Shimazaki

% http://2000.jukuin.keio.ac.jp/shimazaki

Hideaki Shimazaki (2020). ssvkernel(x,tin) (https://www.mathworks.com/matlabcentral/fileexchange/37374-ssvkernel-x-tin), MATLAB Central File Exchange. Retrieved .

Created with
R2009a

Compatible with any release

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!Create scripts with code, output, and formatted text in a single executable document.

David WalwarkI'm not a statistician but it seems to work well. Some parameters are hidden in the function, so check inside before doing a lot of computing. Make sure to parfor if applying it to many vectors, it parallelizes well. This isn't rigorous but what I miss from this (and from the other AKDE function on File Exchange) is the ability to tweak the function for better ad hoc results. What I mean is that after the algorithm has minimized its cost function and found what it thinks is the 'optimum' bandwidth for each neighborhood, I'd like the ability to step in there and massage the numbers a little. I know, parameterizing a non-parametric thing is asking for trouble, but being able to control how much the bandwidth varies over the distribution and to scale all the bandwidths up or down as needed would be helpful in cases where I need the algorithm to apply my judgement to a large dataset.

Thomas JohnDear Mr. Shimazaki

really a great contribution. Here I have a small script which described my problem an a (partial) solution.

% script to estimate the probability function(pdf) in case of periodic boundary conditions

% it uses the great "Locally Adaptive Kernel Density Estimation" from Hideaki Shimazaki

%

% Problem statement: we want to estimate the pdf of random directions

% direction phi=[0..2pi] e.g. uniform -> phi = 2*pi*rand(n,1)

% the option 'support' in ksdensity of the matlab statistics toolbox uses a

% log transformation: ty = log(yData-L) - log(U-yData);

% L = lower bound U = upper bound

% this nonlinear transformation under/over-estimates the values closed to the boundries :-(

%

% idea: extend the dataset with same data in both directions, estimate

% the pdf and use only the original support

clear variables; close all;

nrand = 500; npoint = 100;

L = 0; U = 2*pi;

phi = L + (U-L).*rand(nrand,1);

xi=linspace(L,U,npoint);

% classical histogram

b = histc(phi,xi);

bar(xi,b/sum(b)/(xi(2)-xi(1)),1);

h = findobj(gca,'Type','patch');

set(h,'FaceColor',.7*[1 1 1],'EdgeColor',0.8*[1 1 1]);

%

% classical kernel estimate with finite support

[pdf,xi] = ksdensity(phi,xi,'support',[L U]);

% adaptive kernel estimate without periodic support

[pdfadapt,xiadapt] = ssvkernel(phi,xi);

hold on;plot(xi,pdf,'-b',xiadapt,pdfadapt,'-r','LineWidth',2);

% extended dataset

phiext=[phi-(U-L) phi phi+(U-L)];

% extended x-points too

xiext=[ xi-(U-L) xi xi+(U-L)] ;

% but remove double entries from begin and end

xiext=unique(xiext);

% again adaptive kernel estimate

[pdfextadapt,xiextadapt] = ssvkernel(phiext,xiext);

% normalisation to integral(pdf)[L..U] = 1;

pdfextadapt=3*pdfextadapt(npoint:2*npoint);

% also back to original support

xiext=xiext(npoint:2*npoint);

%

plot(xiext,pdfextadapt,'-m','LineWidth',3);xlim([L U]);

title({'pdf of uniform distributed, random numbers';...

'with ~periodic~ boundaries'})

legend('histogram',['ksdensity with support [', num2str(L), '...', ...

num2str(U), ']'], 'adaptive estimation',...

'adaptive + periodic assumption');

fprintf(' The Integral is %f\n',trapz(pdfadapt)*(U-L)/npoint);

% comment: this approach is straight forward and gives already good results

% but it is not really a periodic boundary solution, to do this, a deeper change

% in the kernel estimation calculation with the mod function is required

% may be later, Thomas

Lee