Empirical cumulative distribution function
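ecdf computes the empirical cumulative distribution function (cdf) of the data in a vector y. In survival and reliability analysis, this empirical cdf is called the Kaplan-Meier estimate, and the data might correspond to survival or failure times.
Optional Name,Value pair arguments modify the computation. For example, you can specify the type of function to evaluate or which data is censored.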
Compute the Kaplan-Meier estimate of the cumulative distribution function (cdf) for simulated survival data.
Generate survival data from a Weibull distribution with parameters 3 and 1.
rng('default') % for reproducibility
failuretime = random('wbl',3,1,15,1);
Compute the Kaplan-Meier estimate of the cdf for survival data.
[f,x] = ecdf(failuretime);
[f,x]
ans =

         0    0.0895
    0.0667    0.0895
    0.1333    0.1072
    0.2000    0.1303
    0.2667    0.1313
    0.3333    0.2718
    0.4000    0.2968
    0.4667    0.6147
    0.5333    0.6684
    0.6000    1.3749
    0.6667    1.8106
    0.7333    2.1685
    0.8000    3.8350
    0.8667    5.5428
    0.9333    6.1910
    1.0000    6.9825
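In this output the first column is f and the second is x. The first row pairs f = 0 with the smallest observation, which is then repeated in the second row at the first jump. The following base-MATLAB sketch builds the same kind of output by hand for uncensored data. It does not call ecdf. The sample y is hypothetical and includes one tie, which shows how repeated values are handled.

```matlab
% Hand-built empirical cdf for uncensored data (sketch; ecdf itself is not called).
y = [3.2; 1.5; 4.8; 1.5; 2.0];        % hypothetical sample with one tied value
[xs, ~, idx] = unique(y);             % sorted distinct values; y equals xs(idx)
counts = accumarray(idx, 1);          % number of observations at each distinct value
f = [0; cumsum(counts) / numel(y)];   % cdf starts at 0, then jumps at each distinct value
x = [xs(1); xs];                      % first point repeated, matching the layout of the ecdf output
disp([f x])
```

A tied value produces a single jump whose height is proportional to its multiplicity. In this sample, f rises from 0 to 0.4 at x = 1.5.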
Plot the estimated cdf.
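One way to draw the estimate is a stairstep graph of f against x. The sketch below assumes the Statistics and Machine Learning Toolbox and re-creates the simulated data so that it runs on its own. The stairs function is called with output arguments, so it returns only the staircase vertices, and plot then draws them. Calling ecdf(failuretime) with no output arguments also plots the empirical cdf directly.

```matlab
% Stairstep plot of the Kaplan-Meier cdf estimate (sketch; needs Statistics and Machine Learning Toolbox).
rng('default')                         % for reproducibility
failuretime = random('wbl',3,1,15,1);  % same simulated data as in this example
[f,x] = ecdf(failuretime);
[xb,yb] = stairs(x,f);                 % staircase vertices only; nothing is drawn yet
plot(xb,yb)                            % draw the staircase
xlabel('x')
ylabel('F(x)')
```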
Compute and plot the cumulative hazard function of simulated right-censored survival data.
Generate failure times from a Birnbaum-Saunders distribution.
rng('default') % for reproducibility
failuretime = random('birnbaumsaunders',0.3,1,100,1);
Assume that the study ends at time 0.9. Create a logical vector that marks the simulated failure times larger than 0.9 as censored.
T = 0.9;
cens = (failuretime>T);
Plot the empirical cumulative hazard function for the data, with confidence bounds.
ecdf(failuretime,'function','cumulative hazard',...
    'censoring',cens,'bounds','on');
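A common estimator of the cumulative hazard for right-censored data is the Nelson-Aalen estimator. It sums d_i/r_i over the distinct failure times t_i, where d_i is the number of failures at t_i and r_i is the number still at risk. The base-MATLAB sketch below computes this estimator by hand for a small hypothetical sample. We assume, but have not confirmed, that ecdf uses exactly this estimator, so read the sketch as an illustration of the technique rather than a reimplementation.

```matlab
% Nelson-Aalen cumulative hazard for right-censored data (illustrative sketch).
t    = [2; 3; 3; 5; 7; 8];                 % hypothetical observed times
cens = logical([0; 0; 1; 0; 0; 1]);        % true = right-censored, false = failure observed
tf = unique(t(~cens));                     % distinct failure times, sorted
d = arrayfun(@(u) sum(t == u & ~cens), tf);    % failures at each failure time
r = arrayfun(@(u) sum(t >= u), tf);            % number at risk at each failure time
H = cumsum(d ./ r);                        % cumulative hazard just after each failure time
disp([tf H])
```

The censored observation at time 3 is still counted at risk at time 3, following the usual convention that censoring happens after failures at tied times.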
Generate right-censored survival data and compare the empirical cumulative distribution function (cdf) with the known cdf.
Generate failure times from an exponential distribution with mean failure time of 15.
rng('default')
y = exprnd(15,75,1);
Generate drop-out times from an exponential distribution with a mean drop-out time of 30.
d = exprnd(30,75,1);
Compute the observed times. Each observed time is the minimum of the generated failure time and the drop-out time.
t = min(y,d);
Create a logical array that indicates which generated failure times are larger than the corresponding drop-out times. Those observations are censored.
censored = (y>d);
Compute the empirical cdf and confidence bounds.
[f,x,flo,fup] = ecdf(t,'censoring',censored);
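The Kaplan-Meier (product-limit) estimate of the survivor function multiplies the conditional survival probabilities (1 - d_i/r_i) over the distinct failure times, and the cdf estimate is one minus that product. The sketch below does this calculation by hand in base MATLAB for a small hypothetical sample. It illustrates the technique and does not reproduce every detail of the ecdf output format, such as the repeated first point.

```matlab
% Product-limit (Kaplan-Meier) cdf for right-censored data (illustrative sketch).
t        = [2; 3; 3; 5; 7; 8];             % hypothetical observed times
censored = logical([0; 0; 1; 0; 0; 1]);    % true = right-censored
tf = unique(t(~censored));                 % distinct failure times
d = arrayfun(@(u) sum(t == u & ~censored), tf);   % failures at each failure time
r = arrayfun(@(u) sum(t >= u), tf);               % number at risk at each failure time
S = cumprod(1 - d ./ r);                   % Kaplan-Meier survivor estimate
F = 1 - S;                                 % corresponding cdf estimate
disp([tf F])
```

The last observation is censored, so F stays below 1 here, at 7/9. Heavy censoring in the tail has the same effect on ecdf output.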
Plot the cdf and confidence bounds.
figure()
ecdf(t,'censoring',censored,'bounds','on');
hold on
Superimpose a plot of the known population cdf.
xx = 0:.1:max(t);
yy = 1-exp(-xx/15);
plot(xx,yy,'g-','LineWidth',2)
axis([0 50 0 1])
legend('Empirical','LCB','UCB','Population',...
    'Location','SE')
hold off
Generate survival data and plot the empirical survivor function with 99% confidence bounds.
Generate lifetime data from a Weibull distribution with parameters 100 and 2.
rng('default') % For reproducibility
R = wblrnd(100,2,100,1);
Plot the survivor function for the data with 99% confidence bounds.
ecdf(R,'function','survivor','alpha',0.01,'bounds','on')
hold on
Superimpose the survivor function of the Weibull distribution that generated the data.
x = 1:1:250;
wblsurv = 1-cdf('weibull',x,100,2);
plot(x,wblsurv,'g-','LineWidth',2)
legend('Empirical','LCB','UCB','Population',...
    'Location','NE')
The survivor function based on the actual distribution is within the confidence bounds.
Input data y, specified as a column vector. For example, in survival or reliability analysis, y might contain survival or failure times for each item or individual.
Data Types: single | double
Axes handle for the axes into which ecdf plots, specified as a handle.
For instance, if h is an axes handle, pass it as the first input argument, before the data, so that ecdf plots into those axes.
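Here is a minimal sketch of that call. It assumes the Statistics and Machine Learning Toolbox, and the figure and the axes handle h are created only for illustration.

```matlab
% Plot the empirical cdf into a specific axes object (sketch; needs Statistics and Machine Learning Toolbox).
rng('default')                              % for reproducibility
failuretime = random('wbl',3,1,15,1);       % simulated data from the first example
fig = figure;
h = axes('Parent',fig);                     % h is the axes handle passed to ecdf
ecdf(h,failuretime)                         % draw into h rather than the current axes
title(h,'Empirical cdf')
```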
Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.
Example: 'censoring',c,'function','cumulative hazard','alpha',0.025,'bounds','on' specifies that ecdf returns the cumulative hazard function and plots the 97.5% confidence bounds, accounting for the censored data specified by vector c.
Indicator of censored data, specified as the comma-separated pair consisting of 'censoring' and a Boolean array of the same size as y. Enter 1 for observations that are right-censored and 0 for observations that are fully observed. By default, all observations are fully observed.
For instance, if the vector cdata stores the censoring information, you can specify it as 'censoring',cdata.
Data Types: logical
Frequency of observations, specified as the comma-separated pair consisting of 'frequency' and a vector of nonnegative integer counts. This vector is the same size as the data vector y. The jth element of this vector gives the number of times the jth element of y was observed. By default, each element of y counts as one observation.
For instance, if failurefreq is a vector of frequencies, you can specify it as 'frequency',failurefreq.
Data Types: single | double
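Because the jth count gives the number of times the jth value was observed, passing frequencies should match passing the data with each value repeated that many times. The sketch below checks this. It assumes the Statistics and Machine Learning Toolbox, and v and the counts in failurefreq are hypothetical.

```matlab
% Frequency weights versus expanded data (sketch; needs Statistics and Machine Learning Toolbox).
v = [1.2; 2.5; 4.0];                       % hypothetical distinct observations
failurefreq = [3; 1; 2];                   % v(j) was observed failurefreq(j) times
[f1,x1] = ecdf(v,'frequency',failurefreq);
[f2,x2] = ecdf(repelem(v,failurefreq));    % same data written out one observation at a time
disp([f1 f2])
```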
Significance level for the confidence interval of the evaluated function, specified as the comma-separated pair consisting of 'alpha' and a scalar value in the range (0,1). For a given value alpha, the confidence level is 100(1-alpha)%. The default is 0.05, for 95% confidence.
For instance, for a 99% confidence interval, specify 'alpha',0.01.
Data Types: single | double
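For pointwise bounds based on a normal approximation, alpha enters through the two-sided critical value z = Φ⁻¹(1 - alpha/2), which multiplies the estimated standard error. The base-MATLAB sketch below computes z with erfinv, so no toolbox function is needed. We assume that ecdf forms its bounds this way, as the estimate ± z times the standard error. The sketch only shows how alpha controls the interval width.

```matlab
% Two-sided normal critical value for a given alpha (base MATLAB only).
alpha = [0.05; 0.025; 0.01];          % 95%, 97.5%, and 99% confidence
z = sqrt(2) * erfinv(1 - alpha);      % equals norminv(1 - alpha/2)
disp([100*(1 - alpha) z])
```

Moving from 95% to 99% confidence raises z from about 1.96 to about 2.58, so the bounds become roughly 31% wider.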
Type of function that ecdf evaluates and returns, specified as the comma-separated pair consisting of 'function' and one of the following.
| Value | Description |
|---|---|
| 'cdf' | Default. Cumulative distribution function. |
| 'survivor' | Survivor function. |
| 'cumulative hazard' | Cumulative hazard function. |
Example: 'function','cumulative hazard'
Data Types: char
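The 'cdf' and 'survivor' options describe the same estimate from opposite sides, because the survivor function is one minus the cdf. The sketch below checks this relationship on simulated Weibull lifetimes. It assumes the Statistics and Machine Learning Toolbox, and the sample size of 50 is arbitrary.

```matlab
% Relationship between the 'cdf' and 'survivor' outputs (sketch; needs Statistics and Machine Learning Toolbox).
rng('default')                               % for reproducibility
y = wblrnd(100,2,50,1);                      % simulated lifetimes
[F,xF] = ecdf(y);                            % default: 'function','cdf'
[S,xS] = ecdf(y,'function','survivor');
disp(max(abs(F + S - 1)))                    % survivor = 1 - cdf at every evaluation point
```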
Function values evaluated at the points in x, returned as a column vector.
Distinct observed points in data vector y, returned as a column vector.
Lower confidence bound for the evaluated function, returned as a column vector. ecdf computes the confidence bounds using Greenwood's formula. These are pointwise, not simultaneous, confidence bounds.
Greenwood's formula is an approximation for the variance of the Kaplan-Meier estimator.
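The variance estimate is given by

$$\widehat{\mathrm{Var}}\big(\hat{S}(t)\big) = \hat{S}(t)^2 \sum_{i:\, t_i \le t} \frac{d_i}{r_i\,(r_i - d_i)},$$

where $\hat{S}(t)$ is the Kaplan-Meier estimate of the survivor function, $r_i$ is the number at risk at time $t_i$, and $d_i$ is the number of failures at time $t_i$. Because the cdf estimate is $1 - \hat{S}(t)$, it has the same variance.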
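Greenwood's formula translates directly into cumulative sums over the distinct failure times. The base-MATLAB sketch below applies it to a small hypothetical right-censored sample. It computes only the variance, not the bounds that ecdf returns.

```matlab
% Greenwood variance of the Kaplan-Meier estimate (illustrative sketch).
t    = [2; 3; 3; 5; 7; 8];                     % hypothetical observed times
cens = logical([0; 0; 1; 0; 0; 1]);            % true = right-censored
tf = unique(t(~cens));                         % distinct failure times t_i
d = arrayfun(@(u) sum(t == u & ~cens), tf);    % d_i: failures at t_i
r = arrayfun(@(u) sum(t >= u), tf);            % r_i: number at risk at t_i
S = cumprod(1 - d ./ r);                       % Kaplan-Meier survivor estimate
V = S.^2 .* cumsum(d ./ (r .* (r - d)));       % Greenwood's formula at each t_i
disp([tf S V])
```

If the largest observation is a failure, then r_i = d_i at that time. The term d_i/(r_i(r_i - d_i)) becomes infinite there while the estimate of S is zero, so the formula is undefined at that point. The sample above ends with a censored time to avoid that case.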