Thread Subject: kernel smooth example

Subject: kernel smooth example

From: leo nidas

Date: 19 Nov, 2009 15:37:21

Message: 1 of 4


Hi there,

I am trying to smooth some data using kernel (without using ksdensity, just trying to understand the theory). I have a code (found it in a book actually as a simple example) that smooths via the normal kernel with some bandwidth h, and I understand it. I tried then to modify it so as it can accomodate right censoring. But my results are different (in the case of censoring) from ksdensity, and of course I am wrong because the density does not integrates to one.

I understand that I need to find the jumps (s) of the empirical cdf at each datum.
So s(i)=0 if there is a censored datum, else s(i)=jump. For this I made the function expandcdf which does what ecdf does, but calculates the f at each datum and not only at the events.

Aftrewards I multiply the kernel with s. That's what I hane understand but obviously I am wrong. Any help??

P.S. I tried "edit ksdensity" but this is too complicated programming for me and can't figure out what is happening..

clear all
close all
clc

%------------------- First kernel smooth without censoring:
% Generate standard normal random variables.
n = 3; data = randn(1,n); %
% We will get the density estimate at these x values.
x = linspace(-5,5,100); fhat = zeros(size(x));
h = 1.06*n^(-1/5);
hold on
for i=1:n
% get each kernel function evaluated at x
% centered at data
f = exp(-(1/(2*h^2))*(x-data(i)).^2)/sqrt(2*pi)/h;
plot(x,f/(n*h));
fhat = fhat+f/(n);
end
plot(x,fhat);
area=trapz(x,fhat)
plot(data,zeros(1,n),'o')

[f xi]=ksdensity(data,'width',h);
plot(xi,f,'r')
hold off

%-------end of kernel density with no censoring

%---------------------------Start the same but with right censoring---------

clear all
figure

n=5; data=randn(1,n); c=randn(1,n); status=(data>c); data=min(data,c);
% We will get the density estimate at these x values.
x = linspace(-7,7,100); fhat = zeros(size(x));
h = 1.06*n^(-1/5);
[xi fi s]=expandcdf(data,status);
% s(i) is the jump at the datum data(i), s(i)=0 if data(i) is censored
hold on

for i=1:n
% get each kernel function evaluated at x
% centered at data
f = s(i)*exp(-(1/(2*h^2))*(x-data(i)).^2)/sqrt(2*pi)/h;
plot(x,f/(n*h));
fhat = (fhat+f/(n));
end
plot(x,fhat);
area=trapz(x,fhat)
plot(data(status==0),zeros(1,length(data(status==0))),'o')
plot(data(status==1),zeros(1,length(data(status==1))),'or')

[fk xk]=ksdensity(data,'width',h,'censoring',status);
plot(xk,fk,'r')
hold off
hold off


%---------------------------------------------------------

function [xi,fexp,s]=expandcdf(xcen,status)

%does the same thing with ecdf, but calculates f at all points
%not only the events. (I.e. an expanded version of f and xi).
%Also yields the s, which is the jumps at each x.
%If x is censored the s=0, else s is the jump.

sizex=size(xcen);
sizes=size(status);
if sizex(1)==1; xcen=xcen';end
if sizes(1)==1; status=status';end
M=[xcen status];
M=sortrows([M(:,1) M(:,2)]);
xcen=M(:,1); status=M(:,2);
[fi xi]=ecdf(xcen,'censoring',status);

k=2;
n=length(xcen);
for i=1:n
    if status(i)==1 && i~=1
        ff(i)=ff(i-1);
    elseif status(i)==0
        ff(i)=fi(k);
        k=k+1;
    elseif status(i)==1 && i==1
        ff(i)=0;
    end
end

out=[xcen ff'];
xi=out(:,1);fexp=out(:,2);
s=[fexp(1);diff(fexp)];
end

Subject: kernel smooth example

From: Tom Lane

Date: 20 Nov, 2009 20:19:27

Message: 2 of 4

> I understand that I need to find the jumps (s) of the empirical cdf at
> each datum.
> So s(i)=0 if there is a censored datum, else s(i)=jump. For this I made
> the function expandcdf which does what ecdf does, but calculates the f at
> each datum and not only at the events.
>
> Aftrewards I multiply the kernel with s. That's what I hane understand but
> obviously I am wrong. Any help??

Leo, I can't say that I understand your code, but there are two things I
wonder about:

f = s(i)*exp(-(1/(2*h^2))*(x-data(i)).^2)/sqrt(2*pi)/h;
plot(x,f/(n*h));
fhat = (fhat+f/(n));

1. You divide f by h when you plot it. You also have a 1/h factor in the
definition of f. Do you want this both places?

2. You also divide f by n when you sum it and when you plot it. But doesn't
s take the place of the 1/n factor? Shouldn't I think of s as having the
value 1/n at each of n uncensored points? For censored data, it seems that s
is similar.

-- Tom

Subject: kernel smooth example

From: leo nidas

Date: 21 Nov, 2009 11:39:05

Message: 3 of 4


> Leo, I can't say that I understand your code, but there are two things I
> wonder about:
>
> f = s(i)*exp(-(1/(2*h^2))*(x-data(i)).^2)/sqrt(2*pi)/h;
> plot(x,f/(n*h));
> fhat = (fhat+f/(n));
>
> 1. You divide f by h when you plot it. You also have a 1/h factor in the
> definition of f. Do you want this both places?
>
> 2. You also divide f by n when you sum it and when you plot it. But doesn't
> s take the place of the 1/n factor? Shouldn't I think of s as having the
> value 1/n at each of n uncensored points? For censored data, it seems that s
> is similar.
>
> -- Tom
>


Thanx for your time Tom,

Well I modified a little but let me mention two things about the theory so as to let me know if I get this wrong.

The normal kernel is K(t)=1/(sqrt(2*pi))*exp(-t^2/2).

So K(( x-X(i) ) / h)=1/sqrt(2*pi)* exp( (x-X(i) )^2/(2*h^2) ) = let K
where h is the bandwidth.
The density estimate is then fhat=1/(n*h)*Σ( K(i) )

and in the presence of censoring fhat=s(i)/h*Σ(Κ(i)), where s(i) is the jump of the ecdf at X(i). So the code is below and seems to agree with ksdensity when I plot them together (ksdensity is with red and mine with blue).

My problem now is that I only seem to agree with ksdensity when the last datum is an event, so I am making a mistake again. If the last obsevation is censored I thought that my code would behave like ksdensity and would "cut to the ground". Not to mention that in this case, before that last censored datum, my estimator and the ksdensity's are different. Just run 2-3 times the code. The P(censored value)=0.5 with n=5, so it will hapen.

Thanx again Tom, I really appreciate your help and loking forward for your answer!


clear all

n=5; data=randn(1,n)+30; c=randn(1,n)+30; status=(data>c); data=min(data,c);
% We will get the density estimate at these x values.
x = linspace(25,35,100); fhat = zeros(size(x));
h = 1.06*n^(-1/5);
[xi fi s]=expandcdf(data,status); %expandcdf calculates cdf at all data, not only the events
% s(i) is the jump of the cdf at the datum data(i), s(i)=0 if data(i) is censored
hold on

for i=1:n
% get each kernel function evaluated at x
% centered at data
f = (s(i)/(h)) * exp(-(1/(2*h^2))*(x-xi(i)).^2)/(sqrt(2*pi));
fhat = fhat+f;
end
plot(x,fhat);
area=trapz(x,fhat)
plot(data(status==0),zeros(1,length(data(status==0))),'o')
plot(data(status==1),zeros(1,length(data(status==1))),'or')

[fk xk]=ksdensity(data,'width',h,'censoring',status');
plot(xk,fk,'r')
hold off
 

Subject: kernel smooth example

From: Tom Lane

Date: 23 Nov, 2009 18:56:08

Message: 4 of 4

> My problem now is that I only seem to agree with ksdensity when the last
> datum is an event, so I am making a mistake again. If the last obsevation
> is censored I thought that my code would behave like ksdensity and would
> "cut to the ground". Not to mention that in this case, before that last
> censored datum, my estimator and the ksdensity's are different. Just run
> 2-3 times the code. The P(censored value)=0.5 with n=5, so it will hapen.

Leo, the ksdensity function has treats this as a special case. It's not
really possible to estimate the density beyond the last censoring point C.

Pick a value C, and x and y values so that

    x < C < y

If there were no censoring, then the kernel density estimate f(x) would have
contributions from kernels near x and y.

With censoring, we could just throw away kernels centered above C. But then
the density estimate f(x) would be biased because it's missing the
contribution from y. And the density estimate f(y) would come only from
points like x whose kernels spill some probability past C. The density would
not be forced to "cut to the ground" as you describe it.

To adjust for this, ksdensity tries to reduce the bias by folding the kernel
at x back across the censoring point C. If the density is relatively flat
here, that will have the effect of reducing some bias. Some experimentation
shows that this adjustment works better than just throwing out the kernel
from y and allowing the kernel from x to spill across C.

-- Tom

Tags for this Thread

Add a New Tag:

Separated by commas
Ex.: root locus, bode

What are tags?

A tag is like a keyword or category label associated with each thread. Tags make it easier for you to find threads of interest.

Anyone can tag a thread. Tags are public and visible to everyone.

rssFeed for this Thread

Contact us at files@mathworks.com