Due to numerical round-off error from the fft.m function, it is possible to get density values of -1.38e-018 (instead of 0) and cdf values slightly larger than 1.
If this is a problem, one can correct the output from kde by overwriting:
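The overwriting code itself is not reproduced here. A minimal sketch of such a correction, assuming the outputs are held in vectors named `density` and `cdf` (the names and the sample values are illustrative, not taken from kde.m):

```matlab
% Illustrative values only: tiny round-off violations of the kind described above.
density = [-1.38e-18, 0.2, 0.5, 0.3];
cdf     = [0, 0.2, 0.7, 1 + 2e-16];

% Overwrite negative density values with zero.
density(density < 0) = 0;

% Clamp the cdf into the interval [0, 1].
cdf = min(max(cdf, 0), 1);
```

Both corrections only touch values that violate the bounds, so entries that are already valid are left unchanged.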
Dear George, the kde function works as it should. There is no problem with the kde. What you call a problem is actually one of the main strengths of the routine.
By typing data = [d1;d1;d1;d1;d1;d2;d3];
you are creating DISCRETE data, because you create ties (the same values appear multiple times). For truly continuous data, there can be no ties or repeated values!!!
If you have ties, then the data CANNOT be continuous by definition.
The kde.m CORRECTLY recognizes that the data you have provided is perfectly discrete and since discrete data does not need smoothing, the selected bandwidth should be zero. kde.m is the only routine I am aware of that does this correctly, every other routine fails this BASIC theoretical test.
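One quick way to see whether a sample contains ties is to compare the number of distinct values with the sample size. A minimal sketch, in which `d1`, `d2` and `d3` are stand-in random vectors (their real contents are not given above):

```matlab
% Stand-in data: d1, d2, d3 are hypothetical column vectors.
d1 = randn(10, 1);
d2 = randn(10, 1);
d3 = randn(10, 1);

% Repeating d1 five times creates ties, as in the example above.
data = [d1; d1; d1; d1; d1; d2; d3];

% Ties exist whenever there are fewer distinct values than observations.
nDistinct = numel(unique(data));
hasTies   = nDistinct < numel(data);   % 70 values, at most 30 distinct
fprintf('%d values, %d distinct, ties: %d\n', numel(data), nDistinct, hasTies);
```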
Zdravko's kernel density estimator works a lot quicker than traditional methods, although I am getting spurious artifacts because the selected bandwidth of 0.02 is too low (a third smaller than when I used another selector, which minimised the expected L2 loss between the estimate and the underlying density). The latter bandwidth gives a smooth estimate but takes a bit longer. Also, I get negative densities at the outliers, so I adjusted the min/max boundaries. Is there a way to alter the estimator to avoid this issue?
Hi, it's a really fast and robust script. I have a question about the time complexity in terms of the data size n: is it O(n) or O(n^2)? Could someone provide a time complexity analysis? Many thanks~
The 1d version of the kernel estimator also provides cdf values at the representative points. I would like to use these for 2d too. I don't have a strong background in this and I was not able to compute it.
Does anyone know how to compute the cdf for the 2d kernel estimator?
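One possible approach, offered as a sketch rather than a feature of the routine: if the 2d density is available on a regular grid, the joint cdf F(x,y) = P(X <= x, Y <= y) can be approximated by a cumulative sum along both dimensions, scaled by the cell area. The example below uses a synthetic Gaussian density in place of an actual estimate; the variable names and the grid are assumptions.

```matlab
% Regular grid (assumed); rows vary with y, columns with x, as with meshgrid.
x = linspace(-5, 5, 201);
y = linspace(-5, 5, 201);
[X, Y] = meshgrid(x, y);

% Synthetic stand-in for an estimated density: standard bivariate normal.
density = exp(-(X.^2 + Y.^2) / 2) / (2 * pi);

% Grid spacing in each direction.
dx = x(2) - x(1);
dy = y(2) - y(1);

% Riemann-sum approximation of F(x, y):
% accumulate down the rows (y), then across the columns (x).
cdf2d = cumsum(cumsum(density, 1), 2) * dx * dy;

% Discretisation and round-off can leave the last value slightly off 1,
% so renormalise by the total mass.
cdf2d = cdf2d / cdf2d(end, end);
```

With this layout, `cdf2d(i, j)` approximates the probability that X <= x(j) and Y <= y(i). The accuracy depends on how fine the grid is, and the renormalisation assumes that essentially all of the probability mass lies inside the grid.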