KERNEL: Mean Integrated Squared Error - Bandwidth Selection

Hello all,
I have a data set and I estimated its density using a kernel, but the bandwidth has to be estimated from the data to get a correct density. I set the width to 0.2 as a starting point so I could experiment with the bandwidth before looking into a proper method. The kernel estimate did not work with width = 0.2, although it did work for another data set. A more principled way to pick the best bandwidth for given data is to use the mean integrated squared error (MISE). Is there a built-in function for this in MATLAB? I couldn't find one, and I am not sure whether there is one in a toolbox that is not available to me. I would also like to know why a width of 0.2 does not work in my code.
Thank you all,
sample1 = [6.52689332414481E7
6.52693837402845E7
6.5270203713004336E7
6.527122138667133E7
6.52717237415096E7
6.527173346449997E7
6.527211590239384E7
6.5272540473269284E7
6.527282568117965E7
6.527314005807114E7
];
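x = sample1.';
% ksdensity returns the density estimates first, then the evaluation points
[f,xi] = ksdensity(x,'width',0.2);
plot(xi,f);
% mark each observation with a short green vertical tick (a rug plot)
line(repmat(x,2,1),repmat([0;0.1*max(f)],1,length(x)),'color','g');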

Accepted Answer

Ilya on 29 Aug 2011
The "right" width depends on your assumptions about the fitted distribution. MATLAB does not choose the bandwidth "randomly": by default it computes the bandwidth that is optimal if the data come from a normal distribution, as the help explains:
help ksdensity
[snip]
[F,XI,U]=ksdensity(...) also returns the bandwidth of the kernel smoothing window.
[snip]
'width' The bandwidth of the kernel smoothing window. The default is optimal for estimating normal densities, but you may want to choose a smaller value to reveal features such as multiple modes.
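For a sense of scale, here is a Python sketch (not MATLAB, and not MathWorks' code) of a common normal-reference rule of thumb for a Gaussian kernel, h = sigma * (4/(3n))^(1/5), with sigma estimated robustly as MAD/0.6745. I am assuming this is close to what the ksdensity default does; the help text above only says the default is optimal for normal densities. The function name normal_reference_bandwidth is hypothetical.

```python
import numpy as np

# sample1 from the question; the values are of order 6.5e7.
sample1 = np.array([
    6.52689332414481e7, 6.52693837402845e7, 6.5270203713004336e7,
    6.527122138667133e7, 6.52717237415096e7, 6.527173346449997e7,
    6.527211590239384e7, 6.5272540473269284e7, 6.527282568117965e7,
    6.527314005807114e7,
])


def normal_reference_bandwidth(x):
    """Rule-of-thumb Gaussian-kernel bandwidth, optimal if the data were normal.

    h = sigma * (4 / (3 n)) ** (1/5), with sigma estimated robustly as
    MAD / 0.6745. Assumed to resemble the ksdensity default; not MATLAB's code.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    sigma = np.median(np.abs(x - np.median(x))) / 0.6745
    return sigma * (4.0 / (3.0 * n)) ** 0.2


h = normal_reference_bandwidth(sample1)
print(f"rule-of-thumb bandwidth: {h:.1f}")
print(f"ratio to width=0.2: {h / 0.2:.0f}")
```

On sample1 this rule gives a bandwidth of roughly 950, several thousand times larger than 0.2.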
If you look at the Wikipedia article on kernel density estimation, note this paragraph:
Neither the AMISE nor the hAMISE formulas are able to be used directly since they involve the unknown density function ƒ or its second derivative ƒ'', so a variety of automatic, data-based methods have been developed for selecting the bandwidth. Many review studies have been carried out to compare their efficacities,[6][7][8][9][10] with the general consensus that the plug-in selectors[11] and cross validation selectors[12][13][14] are the most useful over a wide range of data sets.
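For reference, these are the standard definitions behind that paragraph, for a kernel estimate with kernel K and bandwidth h (the shorthand R(g) and m_2(K) is mine). The last two lines show where the normal-reference rule comes from: plug in a Gaussian kernel and a normal f with standard deviation sigma, for which R(K) = 1/(2 sqrt(pi)), m_2(K) = 1 and R(f'') = 3/(8 sqrt(pi) sigma^5).

```latex
\begin{aligned}
\hat f_h(x) &= \frac{1}{nh}\sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right) \\
\mathrm{MISE}(h) &= \mathbb{E}\int \left(\hat f_h(x) - f(x)\right)^2 dx \\
\mathrm{AMISE}(h) &= \frac{R(K)}{nh} + \frac{1}{4}\, m_2(K)^2\, h^4\, R(f''),
\qquad R(g) = \int g(x)^2\,dx,\quad m_2(K) = \int x^2 K(x)\,dx \\
h_{\mathrm{AMISE}} &= \left(\frac{R(K)}{m_2(K)^2\, R(f'')\, n}\right)^{1/5} \\
h_{\mathrm{AMISE}}\big|_{\text{Gaussian } K,\ \text{normal } f}
  &= \left(\frac{4\sigma^5}{3n}\right)^{1/5} \approx 1.06\,\sigma\, n^{-1/5}
\end{aligned}
```

The unknown R(f'') is exactly why the formula cannot be used directly: plug-in selectors estimate it, while cross-validation avoids it.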
I suggest that you choose the optimal bandwidth by cross-validation using the ksdensity and crossval functions. Often the approximation based on the normal distribution (which you get by default from ksdensity) is good enough. -Ilya
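I have not reproduced the ksdensity/crossval route here. As a swapped-in alternative, this Python sketch implements least-squares (unbiased) cross-validation. It picks h by minimizing an unbiased estimate of MISE(h) minus a term that does not depend on h, so it is a direct, data-based stand-in for the MISE criterion the question asks about. It assumes a Gaussian kernel. The names gauss, lscv_score and lscv_bandwidth and the search grid are hypothetical choices, not library API.

```python
import numpy as np


def gauss(d, s):
    """Normal density with mean 0 and standard deviation s, evaluated at d."""
    return np.exp(-0.5 * (d / s) ** 2) / (s * np.sqrt(2.0 * np.pi))


def lscv_score(x, h):
    """Least-squares cross-validation score for a Gaussian-kernel KDE.

    Returns  integral(fhat_h^2) - (2/n) * sum_i fhat_{h,-i}(x_i),
    an unbiased estimate of MISE(h) minus a constant that does not depend on h.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    d = x[:, None] - x[None, :]                  # all pairwise differences x_i - x_j
    # Two N(0, h^2) kernels convolve to N(0, 2h^2), giving integral(fhat^2) exactly.
    term1 = gauss(d, np.sqrt(2.0) * h).sum() / n**2
    # Leave-one-out estimate at each x_i: drop the j == i (diagonal) terms.
    k = gauss(d, h)
    loo = (k.sum(axis=1) - np.diag(k)) / (n - 1)
    return term1 - 2.0 * loo.mean()


def lscv_bandwidth(x, grid):
    """Return the bandwidth on `grid` with the smallest LSCV score, plus all scores."""
    scores = np.array([lscv_score(x, h) for h in grid])
    return grid[np.argmin(scores)], scores


# Usage on sample1 from the question.
sample1 = np.array([
    6.52689332414481e7, 6.52693837402845e7, 6.5270203713004336e7,
    6.527122138667133e7, 6.52717237415096e7, 6.527173346449997e7,
    6.527211590239384e7, 6.5272540473269284e7, 6.527282568117965e7,
    6.527314005807114e7,
])
grid = np.linspace(50.0, 5000.0, 200)            # candidate bandwidths (assumed range)
h_cv, scores = lscv_bandwidth(sample1, grid)
print(f"LSCV bandwidth: {h_cv:.1f}")
```

With only ten observations the LSCV curve can be flat or have several local minima. It is worth plotting scores against grid and comparing the result with the normal-reference value rather than trusting the minimizer blindly.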

More Answers (1)

the cyclist on 28 Aug 2011
In your case, your data are of order 1e7 and spread over a range of roughly 4,200, but you chose a width of 0.2, so it is much, much too small. I suspect you do not have a very good understanding of what kernel density estimation is doing, so you might want to read some basic articles to understand the technique better. This is not a bad place to start:
The easiest thing to do is to not include the 'width' parameter at all, and let MATLAB choose it for you:
[f,xi] = ksdensity(x);   % f: density estimates, xi: evaluation points
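To see why width = 0.2 shows essentially nothing, compare it with the spacing of the data. This Python sketch (an illustration, not MATLAB code) assumes the density is evaluated on 100 equally spaced points across the data range. I believe that roughly matches the ksdensity default, but that is an assumption, not something checked against MATLAB's source.

```python
import numpy as np

# sample1 from the question, already in ascending order.
x = np.array([
    6.52689332414481e7, 6.52693837402845e7, 6.5270203713004336e7,
    6.527122138667133e7, 6.52717237415096e7, 6.527173346449997e7,
    6.527211590239384e7, 6.5272540473269284e7, 6.527282568117965e7,
    6.527314005807114e7,
])
h = 0.2                                  # the width used in the question

min_gap = np.diff(np.sort(x)).min()      # distance between the closest pair of points
grid_step = (x.max() - x.min()) / 99     # spacing of an assumed 100-point evaluation grid

# Height of a Gaussian bump one grid step from its centre, relative to its peak.
relative_height = np.exp(-0.5 * (grid_step / h) ** 2)

print(f"closest pair is {min_gap / h:.0f} bandwidths apart")
print(f"grid step is {grid_step / h:.0f} bandwidths")
print(f"relative bump height one grid step away: {relative_height}")
```

The closest two observations are roughly 50 bandwidths apart, so the bumps never overlap, and under the 100-point assumption the grid steps over about 200 bandwidths at a time. Each observation therefore becomes an isolated narrow spike that the evaluation grid mostly misses.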
Susan on 28 Aug 2011
Thank you, Cyclist. I do understand how the kernel works: it places a Gaussian curve at each data point and sums them to get the peaks. What I had not realized is that the width has to be reasonable relative to the scale of my data, but now I understand that. I know that MATLAB will choose it for me randomly, but for my work I need a technique that ensures I am picking the right width, so that the estimate is not over-smoothed, etc. Is there a built-in function or another way in MATLAB to achieve this? I read that the mean integrated squared error is the best criterion for bandwidth selection. Any idea how it works, or can you recommend a MATLAB source to look at?
Thank you,
Susan
