"Economist" wrote in message <j7v396$ocf$1@newscl01ah.mathworks.com>...
> Given an empirical cumulative distribution function, i.e. calculated based on real data and thus a step function, how would you calculate the expected value, i.e. in this case, average spell length or expected spell length?
>
> The complicating factor is that the empirical cdf never reaches 100% because the data consists of spell lengths and some of the spells never end. Thus, I have a step cdf that flattens out at 60-70% and I would like to know the average of the spells that end at some stage.
          
Given a step function for the cumulative distribution function, you can easily calculate an expected value by multiplying each step height (that is, the probability of that step, obtained with the 'diff' function after prepending a zero to the cdf values so that the first step is included) by the corresponding value ("spell length") and summing. This is simply the integral of the "spell length" taken with respect to the cumulative distribution function, which is a valid method of computing any expected value. If your true distribution is actually continuous, this result would be an approximation to the true expected value. All that seems straightforward.
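As a rough illustration of the sum just described, here is a minimal sketch in Python rather than MATLAB. The function names and the sample numbers are made up for the example, and it assumes the cdf values are listed in increasing order, one per distinct spell length, with the last value equal to one.

```python
def step_probabilities(cdf):
    """Jump heights of a step cdf; the first jump is measured from zero.

    This plays the role of diff([0 cdf]) in MATLAB.
    """
    previous = [0.0] + list(cdf[:-1])
    return [hi - lo for lo, hi in zip(previous, cdf)]


def expected_value(values, cdf):
    """Sum of value * step probability over all steps of the cdf."""
    probs = step_probabilities(cdf)
    return sum(x * p for x, p in zip(values, probs))


# Hypothetical data: spell lengths and the cdf evaluated at each of them.
spell_lengths = [1, 2, 3, 4]
cdf = [0.1, 0.4, 0.8, 1.0]

mean_length = expected_value(spell_lengths, cdf)
print(mean_length)
```

With these made-up numbers the step probabilities are 0.1, 0.3, 0.4 and 0.2, so the sum is 2.7 up to floating-point rounding.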
It is your statement in the second paragraph that is possibly disturbing. You say that you have data only for up to the 60-70% cdf level, and presumably the missing 30-40% pertain to longer "spell lengths" that have not yet occurred or that may never occur. All you can do with such incomplete data is to compute a conditional expected value, given that these "spell lengths" do not exceed whatever maximum level you have had time to wait for. This is a perfectly valid concept in conditional probability theory. It means you are restricting your probability space to those events that have a "spell length" within some specified maximum length. To accomplish this you would have to divide each of the step probabilities by the observed cdf maximum (.7 or .6 or whatever) in order to obtain the correct conditional probability values. These corrected conditional cdf values would then automatically range over a full 100%. The point is that to obtain a valid expected value, whatever cumulative distribution you use must range from zero to one.
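The rescaling described above might look like the following Python sketch. The function name and data are hypothetical, and it assumes the last cdf value is the observed maximum (0.7 in this made-up case) and that the cdf values are in increasing order.

```python
def conditional_expected_value(values, cdf):
    """Expected value given that the spell ends within the observed range.

    Each step probability is divided by the observed cdf maximum so that
    the conditional probabilities sum to one.
    """
    cdf_max = cdf[-1]
    if cdf_max <= 0:
        raise ValueError("the observed cdf maximum must be positive")
    previous = [0.0] + list(cdf[:-1])
    steps = [hi - lo for lo, hi in zip(previous, cdf)]
    cond_probs = [s / cdf_max for s in steps]
    return sum(x * p for x, p in zip(values, cond_probs))


# Hypothetical data: the cdf flattens out at 70% because some spells never end.
spell_lengths = [1, 2, 3, 4]
censored_cdf = [0.07, 0.28, 0.56, 0.70]

conditional_mean = conditional_expected_value(spell_lengths, censored_cdf)
print(conditional_mean)
```

Here the raw steps 0.07, 0.21, 0.28 and 0.14 become 0.1, 0.3, 0.4 and 0.2 after division by 0.7, so the conditional mean is about 2.7.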
However, such a computed value could be exceedingly sensitive to whatever spell length you stop at. If you wait for a longer time, this expected value could be greatly increased if a substantial number of longer lengths were encountered. To take an extreme example, if a final one percent are seen to have "spell lengths" of a hundred years or more, that would drastically alter your computed value if you were somehow to wait that long. It seems to me that such an expected value is a comparatively meaningless concept, depending critically as it does on how long you decide to wait. Am I making sense here?
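To make the sensitivity concrete, here is a small Python sketch that recomputes the conditional mean for two different cutoffs. Everything in it is hypothetical: the function name, the cutoffs, and the data, in which a single long spell length of 100 carries ten percent of the probability. It assumes the spell lengths are sorted in increasing order.

```python
def conditional_mean_up_to(values, cdf, cutoff):
    """Conditional mean of the spell length, given that it is <= cutoff."""
    kept = [(x, F) for x, F in zip(values, cdf) if x <= cutoff]
    if not kept:
        raise ValueError("no spell lengths at or below the cutoff")
    kept_values = [x for x, _ in kept]
    kept_cdf = [F for _, F in kept]
    cdf_max = kept_cdf[-1]  # cdf level reached at the cutoff
    previous = [0.0] + kept_cdf[:-1]
    steps = [hi - lo for lo, hi in zip(previous, kept_cdf)]
    return sum(x * s / cdf_max for x, s in zip(kept_values, steps))


# Hypothetical data with one very long spell length in the tail.
spell_lengths = [1, 2, 3, 100]
cdf = [0.2, 0.4, 0.6, 0.7]

mean_short_wait = conditional_mean_up_to(spell_lengths, cdf, cutoff=3)
mean_long_wait = conditional_mean_up_to(spell_lengths, cdf, cutoff=100)
print(mean_short_wait, mean_long_wait)
```

With these numbers, stopping at a length of 3 gives a conditional mean of about 2, while waiting long enough to see the length-100 spells raises it to about 16.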
Roger Stafford
