|
Hi there,
I am trying to smooth some data using kernel (without using ksdensity, just trying to understand the theory). I have a code (found it in a book actually as a simple example) that smooths via the normal kernel with some bandwidth h, and I understand it. I tried then to modify it so as it can accomodate right censoring. But my results are different (in the case of censoring) from ksdensity, and of course I am wrong because the density does not integrates to one.
I understand that I need to find the jumps (s) of the empirical cdf at each datum.
So s(i)=0 if there is a censored datum, else s(i)=jump. For this I made the function expandcdf which does what ecdf does, but calculates the f at each datum and not only at the events.
Aftrewards I multiply the kernel with s. That's what I hane understand but obviously I am wrong. Any help??
P.S. I tried "edit ksdensity" but this is too complicated programming for me and can't figure out what is happening..
clear all
close all
clc
%------------------- First kernel smooth without censoring:
% Generate standard normal random variables.
n = 3; data = randn(1,n); %
% We will get the density estimate at these x values.
x = linspace(-5,5,100); fhat = zeros(size(x));
h = 1.06*n^(-1/5);
hold on
for i=1:n
% get each kernel function evaluated at x
% centered at data
f = exp(-(1/(2*h^2))*(x-data(i)).^2)/sqrt(2*pi)/h;
plot(x,f/(n*h));
fhat = fhat+f/(n);
end
plot(x,fhat);
area=trapz(x,fhat)
plot(data,zeros(1,n),'o')
[f xi]=ksdensity(data,'width',h);
plot(xi,f,'r')
hold off
%-------end of kernel density with no censoring
%---------------------------Start the same but with right censoring---------
clear all
figure
n=5; data=randn(1,n); c=randn(1,n); status=(data>c); data=min(data,c);
% We will get the density estimate at these x values.
x = linspace(-7,7,100); fhat = zeros(size(x));
h = 1.06*n^(-1/5);
[xi fi s]=expandcdf(data,status);
% s(i) is the jump at the datum data(i), s(i)=0 if data(i) is censored
hold on
for i=1:n
% get each kernel function evaluated at x
% centered at data
f = s(i)*exp(-(1/(2*h^2))*(x-data(i)).^2)/sqrt(2*pi)/h;
plot(x,f/(n*h));
fhat = (fhat+f/(n));
end
plot(x,fhat);
area=trapz(x,fhat)
plot(data(status==0),zeros(1,length(data(status==0))),'o')
plot(data(status==1),zeros(1,length(data(status==1))),'or')
[fk xk]=ksdensity(data,'width',h,'censoring',status);
plot(xk,fk,'r')
hold off
hold off
%---------------------------------------------------------
function [xi,fexp,s]=expandcdf(xcen,status)
%does the same thing with ecdf, but calculates f at all points
%not only the events. (I.e. an expanded version of f and xi).
%Also yields the s, which is the jumps at each x.
%If x is censored the s=0, else s is the jump.
sizex=size(xcen);
sizes=size(status);
if sizex(1)==1; xcen=xcen';end
if sizes(1)==1; status=status';end
M=[xcen status];
M=sortrows([M(:,1) M(:,2)]);
xcen=M(:,1); status=M(:,2);
[fi xi]=ecdf(xcen,'censoring',status);
k=2;
n=length(xcen);
for i=1:n
if status(i)==1 && i~=1
ff(i)=ff(i-1);
elseif status(i)==0
ff(i)=fi(k);
k=k+1;
elseif status(i)==1 && i==1
ff(i)=0;
end
end
out=[xcen ff'];
xi=out(:,1);fexp=out(:,2);
s=[fexp(1);diff(fexp)];
end
|