Thread Subject:
Expected value from an empirical cdf that is a step function

Subject: Expected value from an empirical cdf that is a step function

From: Economist

Date: 22 Oct, 2011 18:50:14

Message: 1 of 3

Given an empirical cumulative distribution function, i.e. calculated based on real data and thus a step function, how would you calculate the expected value, i.e. in this case, average spell length or expected spell length?

The complicating factor is that the empirical cdf never reaches 100% because the data consists of spell lengths and some of the spells never end. Thus, I have a step cdf that flattens out at 60-70%, and I would like to know the average length of the spells that do end at some stage.

Subject: Expected value from an empirical cdf that is a step function

From: TideMan

Date: 24 Oct, 2011 02:59:21

Message: 2 of 3

On Oct 23, 7:50 am, "Economist "
<starfaic...@dunflimblag.mailexpire.com> wrote:
> Given an empirical cumulative distribution function, i.e. calculated based on real data and thus a step function, how would you calculate the expected value, i.e. in this case, average spell length or expected spell length?
>
> The complicating factor is that the empirical cdf never reaches 100% because the data consists of spell lengths and some of the spells never end. Thus, I have a step cdf that flats out at 60-70% and I would like to know the average of the spells that end at some stage.

I'm afraid this post makes no sense whatever to me.
I use empirical CDFs all the time, but I've never seen one that is a
step function, nor one that never reaches 100%.

Most people calculate empirical CDFs as follows:
1. Make a histogram using hist or histc, paying attention to the bin sizes.
2. Calculate the PDF (strictly, the probability in each bin) by dividing the bin counts by the number of data points.
3. Integrate the PDF using cumsum to produce the cumulative probabilities. By definition, these will go from 0 to 1 (or 0 to 100%).
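
The three steps above can be sketched as follows. This is only an illustration: the sample x and the bin edges are made-up values, not data from this thread. histc is used because it is the function named above; newer MATLAB releases recommend histcounts instead.

```matlab
% Hypothetical sample of spell lengths (illustrative data only)
x = [2 3 3 5 8 8 8 13];

edges  = 0:2:14;              % bin edges; the bin width is a choice, not a given
counts = histc(x, edges);     % step 1: number of observations in each bin
p      = counts / numel(x);   % step 2: probability mass in each bin
F      = cumsum(p);           % step 3: cumulative probabilities

disp(F(end))                  % 1 when every observation lies within the edges
```

If the counts were instead divided by a total that includes spells that have not ended, F would level off below 1, which appears to be the situation described in the original question.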

Subject: Expected value from an empirical cdf that is a step function

From: Roger Stafford

Date: 24 Oct, 2011 05:35:31

Message: 3 of 3

"Economist" wrote in message <j7v396$ocf$1@newscl01ah.mathworks.com>...
> Given an empirical cumulative distribution function, i.e. calculated based on real data and thus a step function, how would you calculate the expected value, i.e. in this case, average spell length or expected spell length?
>
> The complicating factor is that the empirical cdf never reaches 100% because the data consists of spell lengths and some of the spells never end. Thus, I have a step cdf that flats out at 60-70% and I would like to know the average of the spells that end at some stage.
- - - - - - - - - - -
  Given a step function for the cumulative distribution function, you can easily calculate an expected value by simply multiplying each of the step amounts - that is, the probability of that step (using the 'diff' function) - by the corresponding value ("spell length") and summing. This is simply the integral of the "spell length" taken with respect to the cumulative distribution function, which is a valid method of computing any expected value. If your true distribution is actually continuous, this result would be an approximation to the true expected value. All that seems straightforward.
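
A minimal sketch of that sum, assuming a step cdf that does reach 1. The vectors t (spell lengths at which the cdf jumps) and F (cdf value just after each jump) are hypothetical values chosen for illustration.

```matlab
% Hypothetical step cdf that reaches 1 (illustrative values only)
t = [1 2 4 7];             % spell lengths at which the cdf jumps
F = [0.2 0.5 0.9 1.0];     % cdf value just after each jump

p  = diff([0 F]);          % step amounts = probability of each spell length
EX = sum(t .* p);          % expected spell length: sum of value times probability
```

Prepending 0 to F before calling diff makes the first step amount equal to F(1), so p has one entry per spell length.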

  It is your statement in the second paragraph that is possibly disturbing. You say that you have data only for up to the 60-70% cdf level, and presumably the missing 30-40% pertain to longer "spell lengths" that have not yet occurred or that may never occur. All you can do with such incomplete data is to compute a conditional expected value, given that these "spell lengths" do not exceed whatever maximum level you have had time to wait for. This is a perfectly valid concept in conditional probability theory. It means you are restricting your probability space to those events that have a "spell length" within some specified maximum length. To accomplish this you would have to divide each of the step probabilities by the observed cdf maximum - .7 or .6 or whatever - in order to obtain the correct conditional probability values. These corrected conditional cdf values would then automatically range over a full 100%. The point is that to obtain a valid expected value, whatever cumulative distribution you use must range from zero to one.
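
The rescaling described above might look like the sketch below. The cdf here is hypothetical and levels off at 0.65; t, F, and the other variable names are illustrative assumptions, not values from the original post.

```matlab
% Hypothetical step cdf that levels off below 1 (illustrative values only)
t = [1 2 4 7];                 % observed spell lengths (spells that ended)
F = [0.15 0.30 0.55 0.65];     % cdf just after each jump; never reaches 1

Fmax = F(end);                 % observed cdf maximum
p    = diff([0 F]);            % unconditional step probabilities
pc   = p / Fmax;               % conditional probabilities, given the spell ended
Fc   = cumsum(pc);             % conditional cdf, which now ends at 1
EXc  = sum(t .* pc);           % conditional expected spell length
```

Dividing by Fmax is the same as conditioning on the event that the spell ended within the observation window, so EXc says nothing about the spells that are still running.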

  However, such a computed value could presumably be exceedingly sensitive to whatever spell length amount you stop at. If you wait for a longer time, this expected value could be greatly increased if a substantial number of longer lengths were encountered. To take an extreme example, if a final one percent are seen to have "spell lengths" of a hundred years or more, that would make drastic alterations in your computed value if you were somehow to wait that long. It seems to me that such an expected value is a comparatively meaningless concept, depending critically as it would on how long you decide to wait. Am I making sense here?
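
To put a rough number on that sensitivity, here is a small sketch using made-up values. Spell lengths are taken to be in days, and the "late" case adds one percent of spells ending after about a hundred years (36500 days). The helper condMean and all of the data are hypothetical.

```matlab
% Conditional mean of a step cdf: sum of value times step, divided by cdf maximum
condMean = @(t, F) sum(t .* diff([0 F])) / F(end);

% Hypothetical data observed up to a short cutoff (spell lengths in days)
tEarly = [1 2 4 7];
FEarly = [0.15 0.30 0.55 0.65];

% Same data after a much longer wait: 1% of spells end at about 100 years
tLate = [tEarly 36500];
FLate = [FEarly 0.66];

early = condMean(tEarly, FEarly);   % roughly 3.3 days
late  = condMean(tLate,  FLate);    % roughly 556 days
```

With these numbers, a one percent tail moves the conditional mean by more than two orders of magnitude, which is the kind of dependence on the cutoff described above.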

Roger Stafford
