Empirical cumulative distribution function
Compute the Kaplan-Meier estimate of the cumulative distribution function (cdf) for simulated survival data.
Generate survival data from a Weibull distribution with parameters 3 and 1.
rng default; % for reproducibility
failuretime = random('wbl',3,1,15,1);
Compute the Kaplan-Meier estimate of the cdf for survival data.
[f,x] = ecdf(failuretime);
[f,x]
ans =
         0    0.0895
    0.0667    0.0895
    0.1333    0.1072
    0.2000    0.1303
    0.2667    0.1313
    0.3333    0.2718
    0.4000    0.2968
    0.4667    0.6147
    0.5333    0.6684
    0.6000    1.3749
    0.6667    1.8106
    0.7333    2.1685
    0.8000    3.8350
    0.8667    5.5428
    0.9333    6.1910
    1.0000    6.9825
Plot the estimated cdf.
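The plotting command for this step is not shown here. One way to draw the estimate, offered as a minimal sketch rather than the original command, is to pass the returned x and f to stairs, since the empirical cdf is a step function. The axis labels are illustrative choices.

```matlab
rng default                               % same seed as the example above
failuretime = random('wbl',3,1,15,1);     % regenerate the simulated failure times
[f,x] = ecdf(failuretime);                % Kaplan-Meier estimate of the cdf

figure
stairs(x,f)                               % draw the estimate as a step function
xlabel('x')                               % illustrative axis labels
ylabel('F(x)')
```

Calling ecdf(failuretime) with no output arguments also produces a plot of the estimate.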
Compute and plot the cumulative hazard function of simulated right-censored survival data.
Generate failure times from a Birnbaum-Saunders distribution.
rng default % for reproducibility
failuretime = random('birnbaumsaunders',0.3,1,100,1);
Assuming that the study ends at time 0.9, create a logical vector that marks the simulated failure times larger than 0.9 as censored.
T = 0.9;
cens = (failuretime>T);
Plot the empirical cumulative hazard function for the data, with confidence bounds.
ecdf(failuretime,'function','cumulative hazard',...
    'censoring',cens,'bounds','on');
Generate right-censored survival data and compare the empirical cumulative distribution function (cdf) with the known cdf.
Generate failure times from an exponential distribution with mean failure time of 15.
rng default % for reproducibility
y = exprnd(15,75,1);
Generate drop-out times from an exponential distribution with a mean drop-out time of 30.
d = exprnd(30,75,1);
Generate the observed failure times. They are the minimum of the generated failure times and the drop-out times.
t = min(y,d);
Create a logical array that indicates generated failure times that are larger than the drop-out times. The data for which this is true are censored.
censored = (y>d);
Compute the empirical cdf and confidence bounds.
[f,x,flo,fup] = ecdf(t,'censoring',censored);
Plot the cdf and confidence bounds.
figure()
ecdf(t,'censoring',censored,'bounds','on');
hold on
Superimpose a plot of the known population cdf.
xx = 0:.1:max(t);
yy = 1-exp(-xx/15);
plot(xx,yy,'g-','LineWidth',2)
axis([0 50 0 1])
legend('Empirical','LCB','UCB','Population',...
    'Location','SE')
hold off
Generate survival data and plot the empirical survivor function with 99% confidence bounds.
Generate lifetime data from a Weibull distribution with parameters 100 and 2.
rng default % for reproducibility
R = wblrnd(100,2,100,1);
Plot the survivor function for the data with 99% confidence bounds.
ecdf(R,'function','survivor','alpha',0.01,'bounds','on')
hold on
Superimpose a plot of the known Weibull survivor function.
x = 1:1:250;
wblsurv = 1-cdf('weibull',x,100,2);
plot(x,wblsurv,'g-','LineWidth',2)
legend('Empirical','LCB','UCB','Population',...
    'Location','NE')
The survivor function based on the actual distribution is within the confidence bounds.
y - Input data
column vector
Input data, specified as a column vector. For example, in survival or reliability analysis, data might be survival or failure times for each item or individual.
ax - Axes handle
handle
Handle of the axes that ecdf plots to, specified as a handle. For instance, if h is a handle for a set of axes, ecdf plots into those axes when you pass h as the first input argument.
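As a minimal sketch of plotting into specific axes, the following creates a figure with two panels and draws the empirical cdf in the right-hand one. The sample data, the subplot layout, and the title are hypothetical choices made only for illustration.

```matlab
rng default
y = exprnd(15,50,1);      % hypothetical sample data

figure
h = subplot(1,2,2);       % handle to the right-hand axes
ecdf(h,y)                 % plot the empirical cdf into those axes
title(h,'Empirical cdf')
```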
Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.
Example: 'censoring',c,'function','cumulative hazard','alpha',0.025,'bounds','on' specifies that ecdf returns the cumulative hazard function and plots the 97.5% confidence bounds, accounting for the censored data specified by vector c.
'censoring' - Indicator of censored data
array of 0s (default) | vector of 0s and 1s
Indicator of censored data, specified as the comma-separated pair consisting of 'censoring' and a Boolean array of the same size as y. Use 1 for observations that are right-censored and 0 for observations that are fully observed. By default, all observations are fully observed.
For instance, if vector cdata stores the censored data information, you can enter the censoring information as 'censoring',cdata.
'frequency' - Frequency of observations
array of 1s (default) | vector of nonnegative scalars
Frequency of observations, specified as the comma-separated pair consisting of 'frequency' and a vector containing nonnegative integer counts. This vector is the same size as the vector y. The jth element of this vector gives the number of times the jth element of y is observed. The default is one observation per element of y.
For instance, if failurefreq is a vector of frequencies, you can enter it as 'frequency',failurefreq.
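To illustrate what the frequency vector means, the sketch below compares two ways of describing the same small sample: once with a repeated value listed twice, and once as distinct values plus counts. The data values and the variable names yfull and yuniq are hypothetical; the two calls are expected to describe the same step function.

```matlab
% Hypothetical data: the value 2 is observed twice
yfull = [1; 2; 2; 3];
[f1,x1] = ecdf(yfull);

% The same sample written as distinct values plus counts
yuniq = [1; 2; 3];
failurefreq = [1; 2; 1];
[f2,x2] = ecdf(yuniq,'frequency',failurefreq);

% Display both results side by side for comparison
disp([x1 f1])
disp([x2 f2])
```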
'alpha' - Significance level
0.05 (default) | scalar value in the range (0,1)
Significance level for the confidence interval of the evaluated function, specified as the comma-separated pair consisting of 'alpha' and a scalar value in the range (0,1). The default is 0.05 for 95% confidence. For a given value alpha, the confidence level is 100(1-alpha)%.
For instance, for a 99% confidence interval, you can specify the alpha value as 'alpha',0.01.
'function' - Type of function returned
Type of function that ecdf evaluates and returns, specified as the comma-separated pair consisting of 'function' and one of the following.
|'cdf'|Default. Cumulative distribution function.|
|'survivor'|Survivor function.|
|'cumulative hazard'|Cumulative hazard function.|
'bounds' - Indicator for including bounds
Indicator for including bounds, specified as the comma-separated pair consisting of 'bounds' and one of the following.
|'off'|Default. Specify to omit bounds.|
|'on'|Specify to include bounds.|
Note: This name-value argument is used only for plotting.
f - Function values
column vector
Function values evaluated at the points in x, returned as a column vector.
x - Distinct observed points
column vector
Distinct observed points in data vector y, returned as a column vector.
flo - Lower confidence bound
column vector
Lower confidence bound for the evaluated function, returned as a column vector. ecdf computes the confidence bounds using Greenwood's formula. They are not simultaneous confidence bounds.
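Greenwood's formula is an approximation for the variance of the Kaplan-Meier estimator. The variance estimate is given by
\[ \hat{V}\bigl(\hat{S}(t)\bigr) = \hat{S}^2(t) \sum_{t_i \le t} \frac{d_i}{r_i (r_i - d_i)}, \]
where \hat{S}(t) is the Kaplan-Meier estimate of the survivor function, r_i is the number at risk at time t_i, and d_i is the number of failures at time t_i.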
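As a rough illustration of Greenwood's formula, the sketch below computes the Kaplan-Meier survivor estimate S and the variance estimate V = S^2 * sum(d_i/(r_i*(r_i - d_i))) by hand for a small right-censored sample. The data set and all variable names are hypothetical, and the loop is written for readability. It is not a description of how ecdf is implemented internally.

```matlab
% Hypothetical right-censored sample (1 = censored, 0 = observed failure)
t    = [2; 3; 3; 5; 7; 8; 8; 9];
cens = [0; 0; 1; 0; 0; 1; 0; 1];

ti   = unique(t(cens == 0));      % distinct failure times t_i
S    = zeros(size(ti));           % Kaplan-Meier survivor estimate at each t_i
V    = zeros(size(ti));           % Greenwood variance estimate at each t_i
Scur = 1;                         % running product for the survivor estimate
gsum = 0;                         % running Greenwood sum

for i = 1:numel(ti)
    r = sum(t >= ti(i));                  % r_i: number at risk at t_i
    d = sum(t == ti(i) & cens == 0);      % d_i: number of failures at t_i
    Scur = Scur*(1 - d/r);                % update the survivor estimate
    gsum = gsum + d/(r*(r - d));          % add the Greenwood term for t_i
    S(i) = Scur;
    V(i) = Scur^2*gsum;
end

disp(table(ti,S,V))
```

The largest observation in this sample is censored, so r_i - d_i never reaches zero. If the largest observation were a failure, the last term of the sum would divide by zero and the variance at that point would be undefined.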
 Cox, D. R., and D. Oakes. Analysis of Survival Data. London: Chapman & Hall, 1984.
 Lawless, J. F. Statistical Models and Methods for Lifetime Data. 2nd ed., Hoboken, NJ: John Wiley & Sons, Inc., 2003.