Thread Subject: calculate probabilities ksdensity trapz

Subject: calculate probabilities ksdensity trapz

From: leo nidas

Date: 14 Nov, 2009 21:09:02

Message: 1 of 8


Hi there,

I would like to use the ksdensity function in order to nonparametrically estimate the density of some (positive supported) censored data.

Then I am interested in calculating probabilities of the form P(X<x), which amounts to integrating the density from 0 to x. I do this with trapz, but I am not sure if I am doing it correctly. Specifically, I am worried that if we order the data such that x(1)< x(2)< ..., then the integration will begin from x(1) and not from 0?

My goal is to calculate the probabilities P(X<x(1)), P(X<x(2)), ... Could you please check the for loop below? I calculate them all except for the first, due to the problem I mention in the parentheses below. But I am not sure; I think that I actually calculate P(x(1)<X<x).

(Now about trapz: trapz(2,3)=0, which is OK; trapz(2.3,3)=0, which is OK again; trapz(2.3,3.3) yields an error, while it should give zero, right?)

Am I doing something wrong? Is there another way?

Thanx in advance for any answers!

n=200;

x=exprnd(3,n,1);
c=exprnd(3,n,1); %censoring variable
xcen=min(x,c); %my data
status=(x>c); %censoring indicator: 1 = censored, 0 = observed

[xcen,idx]=sort(xcen); %sort data
status=status(idx); %keep the indicator aligned with the sorted data

[f,xi]=ksdensity(xcen,xcen,'censoring',status,'support','positive'); %density at the data points
plot(xi,f)

%xcen and xi are the same

p=zeros(n,1);
for i=2:n % I would like to say for i=1:n
    points=1:i;
    p(i)=trapz(xcen(points),f(points));
end
p

Subject: calculate probabilities ksdensity trapz

From: Peter Perkins

Date: 15 Nov, 2009 08:57:57

Message: 2 of 8

leo nidas wrote:

> I would like to use the ksdensity function in order to nonparametrically estimate the density of some (positive supported) censored data.
>
> Then I am interested in calculating probabilities of the form P(X<x) which is like integrating the density from 0 to x.

KSDENSITY accepts a 'Function' parameter, and one of the choices is 'CDF'. That would seem to do what you want.
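
For illustration, a minimal sketch of that call. The simulated data mirrors the first message; the seed and the variable name p are assumptions made only for this example, and the censoring indicator is re-ordered together with the sorted data.

```matlab
% Sketch only: simulated censored data as in the original post.
rng(0);                              % assumed seed, for reproducibility
n = 200;
x = exprnd(3, n, 1);                 % latent event times
c = exprnd(3, n, 1);                 % censoring times
status = x > c;                      % 1 = censored, 0 = observed
[xcen, idx] = sort(min(x, c));       % observed data, sorted
status = status(idx);                % keep indicator aligned with xcen

% Kernel estimate of the cdf at every data point, including the first,
% so no manual integration from 0 is needed.
p = ksdensity(xcen, xcen, 'function', 'cdf', ...
              'censoring', status, 'support', 'positive');
```

Here p(i) estimates P(X<xcen(i)) for i=1:n, so the loop over trapz is not required for plain probabilities.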

Subject: calculate probabilities ksdensity trapz

From: leo nidas

Date: 15 Nov, 2009 16:36:02

Message: 3 of 8


> > I would like to use the ksdensity function in order to nonparametrically estimate the density of some (positive supported) censored data.
> >
> > Then I am interested in calculating probabilities of the form P(X<x) which is like integrating the density from 0 to x.
>
> KSDENSITY accepts a 'Function' parameter, and one of the choices is 'CDF'. That would seem to do what you want.


--------------------------
Thanx for your answer Tom!

That was exactly what I wanted. Now, if I go one step further:
What if I wanted to calculate an integral from 0 to x(i) of the form g(x)*f(x), where g(x) has a closed form, let's say g(x)=exp(x), and f(x) is the kernel estimate of the pdf?

Then wouldn't I need to calculate the density at each point x(i) and have the same problem again? (I.e. I could not integrate from 0, only from x(1) to x(i).)

for i=2:n % I would like to say for i=1:n
    points=1:i;
    p(i)=trapz(xcen(points),exp(xcen(points)).*f(points)); % evaluate g at the sorted data xcen, not the unsorted x
end
p

Subject: calculate probabilities ksdensity trapz

From: Tom Lane

Date: 16 Nov, 2009 15:42:11

Message: 4 of 8

> Thanx for your answer Tom!

Leo, that was Peter, but I'll try to answer your next question.

> What if I wanted to calculate an integral from 0 to x(i) of a form
> g(x)*f(x) where g(x) has a closed form let's say g(x)=exp(x), and f(x) is
> the kernel estimate of the pdf.
>
> Then wouldn't I need to calculate the density at each point of x(i) and
> have the same problem again? (I.e. could not integrate from 0 but from
> x(1) to x(i)).

You can pass in the places at which you want to evaluate the density as the
second input to ksdensity. You can evaluate it at something like
linspace(0,5.6). This sort of grid is more typical for trapezoidal
integration, rather than the scattered data values.
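
A minimal sketch of the grid idea, under stated assumptions: the data is simulated as in the first message, the upper limit x0 and the grid size of 200 points are arbitrary illustrative choices, and g(x)=exp(x) is the example from the question.

```matlab
% Sketch only: integrate g(x)*f(x) from 0 to x0 on a regular grid.
rng(0);                              % assumed seed
n = 200;
x = exprnd(3, n, 1);
c = exprnd(3, n, 1);
status = x > c;                      % 1 = censored
[xcen, idx] = sort(min(x, c));
status = status(idx);

x0 = median(xcen);                   % hypothetical upper limit of integration
xg = linspace(0, x0, 200);           % regular grid that starts at 0
fg = ksdensity(xcen, xg, 'censoring', status, 'support', 'positive');

% Trapezoidal rule for the integral of exp(x)*f(x) over [0, x0]
I = trapz(xg(:), exp(xg(:)) .* fg(:));
```

Because the grid starts at 0, the integral no longer begins at the smallest observation.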

For censored data, it's possible that the kernel smooth density doesn't
integrate to 1. This happens when the last observed failure time comes
before the last censoring time. This is the same condition that causes the
Kaplan-Meier estimate of the cdf not to reach 1. If this happens with your
data, you won't be able to estimate distribution properties like the mean
that require knowing the entire distribution. You may be able to estimate
such properties as the median, though.

Here's another idea that may or may not be useful. The way you phrase the
problem above, you are computing the expected value of g(x) under the
distribution with density f(x). An alternative way to estimate things like
this is to compute a random sample x from f, and estimate the expected value
as the sample mean of g(x). This is more tractable under some conditions,
but may require lots of samples to get accuracy comparable to what you would
get from ordinary numerical integration. But this, too, won't work if you
can't generate the entire distribution.
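
A rough sketch of the sampling idea, under several assumptions that are not from the thread: the data is uncensored so that the whole distribution is available, draws are generated by inverting the kernel cdf with the 'icdf' option of ksdensity, and g is swapped for the bounded hypothetical function exp(-x) so that the sample mean is stable.

```matlab
% Sketch only: Monte Carlo estimate of E[g(X)] under the kernel estimate.
rng(0);                              % assumed seed
x = exprnd(3, 200, 1);               % uncensored sample (assumption)
g = @(t) exp(-t);                    % hypothetical bounded g, not the thread's exp(x)

m = 2000;                            % number of Monte Carlo draws (arbitrary)
u = rand(m, 1);                      % uniform probabilities in (0,1)

% Inverse-cdf sampling: evaluate the kernel estimate of the inverse cdf at u
xs = ksdensity(x, u, 'function', 'icdf', 'support', 'positive');

mcEstimate = mean(g(xs));            % sample mean approximates E[g(X)]
```

The accuracy depends on m; with censored data whose estimated cdf stops short of 1, this approach is not expected to work, as noted above.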

-- Tom

Subject: calculate probabilities ksdensity trapz

From: leo nidas

Date: 19 Nov, 2009 08:43:06

Message: 5 of 8


>
> > What if I wanted to calculate an integral from 0 to x(i) of a form
> > g(x)*f(x) where g(x) has a closed form let's say g(x)=exp(x), and f(x) is
> > the kernel estimate of the pdf.
> >
> > Then wouldn't I need to calculate the density at each point of x(i) and
> > have the same problem again? (I.e. could not integrate from 0 but from
> > x(1) to x(i)).
>
> You can pass in the places at which you want to evaluate the density as the
> second input to ksdensity. You can evaluate it at something like
> linspace(0,5.6). This sort of grid is more typical for trapezoidal
> integration, rather than the scattered data values.
>

So if I understand correctly, if I have n=200 data points and want to evaluate P(X<x(i)) for each x(i), then I should re-smooth the density f(x) 200 times, each time with linspace(0,x(i))?
(It is feasible since ksdensity gets the job done fast; just asking.)


> For censored data, it's possible that the kernel smooth density doesn't
> integrate to 1. This happens when the last observed failure time comes
> before the last censoring time. This is the same condition that causes the
> Kaplan-Meier estimate of the cdf not to reach 1.

I am aware of this problem if the last datum is censored. Well, let's consider the problem of calculating probabilities P(X>x(i)). But let's say we are interested in calculating these probabilities for all x's except the last. Then we can find the probabilities P(X<x(i)) the same way as before, so as to get 1-P(X<x(i)). This way we don't care about the fact that the kernel density is cut off at the last censoring time, right?


Another question: is there any function that smooths f(x,y)? Remember that x is subject to censoring.

Thanx in advance for any answers!!

Subject: calculate probabilities ksdensity trapz

From: Tom Lane

Date: 20 Nov, 2009 20:08:51

Message: 6 of 8

> So if I understand correctly if I have n=200 data, and want to evaluate
> P(X<x(i)), for each x, then I should re-smooth with linspace(0,x(i)) 200
> times the density f(x)?
> (It is feasible since ksdensity gets fast the job done, just asking)

Leo, I only intended to suggest that you can evaluate the density over a
grid rather than at irregularly-spaced data points. You could compute a
separate grid for each upper limit of integration. Maybe you could also try
defining a single grid and using cumulative sums to compute the cdf.
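
One way the single-grid variant could look, as a sketch under assumptions: the data is simulated as in the first message, the grid size of 1000 is arbitrary, cumtrapz stands in for the cumulative sums, and interp1 reads the result off at the data points.

```matlab
% Sketch only: one grid, cumulative trapezoidal sums, then interpolation.
rng(0);                              % assumed seed
n = 200;
x = exprnd(3, n, 1);
c = exprnd(3, n, 1);
status = x > c;                      % 1 = censored
[xcen, idx] = sort(min(x, c));
status = status(idx);

xg = linspace(0, max(xcen), 1000);   % single grid covering all upper limits
fg = ksdensity(xcen, xg, 'censoring', status, 'support', 'positive');

Fg = cumtrapz(xg(:), fg(:));         % running integral of f from 0
p  = interp1(xg(:), Fg, xcen);       % P(X < xcen(i)) for i = 1:n

% Same pattern for the running integral of g(x)*f(x) with g(x) = exp(x)
Gg = cumtrapz(xg(:), exp(xg(:)) .* fg(:));
q  = interp1(xg(:), Gg, xcen);
```

The density is smoothed only once, and every upper limit x(i), including the first, is handled by the same cumulative integral.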

> I am aware of this problem if the last datum is censored. Well lets
> consider the problem of calculating probabilities P(X>x(i)). But let's say
> we are interested in calculating these probabilities for all x's except
> the last. Then we can find again the same way as before the probs
> P(X<x(i)) so as to get 1-P(X<x(i)). This way we don't care about the fact
> that the kernel density is cut at last censoring, right?

You can compute a smaller set of probabilities. I assumed you wanted to
compute the expectation of g, but I now see you only wrote your integral to
go up to a finite value. Maybe this won't be a problem, then.

> Another question, is there any function that smooth the f(x,y)? Remember
> that x is suffering from censoring.

I don't know what f(x,y) is. The kernel cdf is a smoothed version of the
empirical cdf, and the kernel density is a smoothed version of the empirical
discrete probability function, but those are functions of a single argument.

-- Tom

Subject: calculate probabilities ksdensity trapz

From: leo nidas

Date: 21 Nov, 2009 11:47:01

Message: 7 of 8


> I don't know what f(x,y) is. The kernel cdf is a smoothed version of the
> empirical cdf, and the kernel density is a smoothed version of the empirical
> discrete probability function, but those are functions of a single argument.
>
> -- Tom
>


Thanx Tom, I really appreciate your help once again.

What I meant by f(x,y) was the joint density of the random variables X, Y. So is there any function that could smooth that using kernels, so as to get a 3D graph (x,y,f), in the presence of censoring?

I saw kde2d on the FEX, but it does not allow censoring.

Thanx again!

Subject: calculate probabilities ksdensity trapz

From: Tom Lane

Date: 23 Nov, 2009 18:57:01

Message: 8 of 8

> What I meant with f(x,y) , was the joint distribution of the random
> variables X, Y. So is there any function that could smooth that by using
> kernels, so as to get a 3d graph (x,y,f) , in the presence of censoring?

Leo, I haven't seen any code for bivariate kernel density estimation with
censoring.

-- Tom
