ecdf

Empirical cumulative distribution function

Syntax

[f,x] = ecdf(y)
[f,x] = ecdf(y,Name,Value)
[f,x,flo,fup] = ecdf(___)
ecdf(___)
ecdf(ax,___)

Description

[f,x] = ecdf(y) returns the empirical cumulative distribution function (cdf), f, evaluated at the points in x, using the data in the vector y.

In survival and reliability analysis, this empirical cdf is called the Kaplan-Meier estimate, and the data might correspond to survival or failure times.
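
As a rough illustration of what the uncensored estimate represents, the following sketch computes an empirical cdf by hand using only base MATLAB functions. The sample values and the variable names (xs, counts, x_manual, f_manual) are hypothetical, and this is not necessarily how ecdf is implemented internally.

```matlab
% Hypothetical sample; any numeric column vector works.
y = [2; 5; 5; 7; 9];

% Distinct observed values and the number of observations at each one.
[xs,~,idx] = unique(y);          % xs is sorted in ascending order
counts = accumarray(idx,1);      % occurrences of each distinct value

% Empirical cdf: fraction of observations less than or equal to each xs.
fs = cumsum(counts)/numel(y);

% Prepend the starting point (smallest value, cdf = 0), mirroring the
% layout of the [f,x] output shown in the examples on this page.
x_manual = [xs(1); xs];
f_manual = [0; fs];
```

For this sample, f_manual steps from 0 to 1 in increments of 1/5 per observation, with a jump of 2/5 at the repeated value 5.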

[f,x] = ecdf(y,Name,Value) returns the empirical function values, f, evaluated at the points in x, with additional options specified by one or more Name,Value pair arguments.

For example, you can specify the type of function to evaluate or which observations are censored.

[f,x,flo,fup] = ecdf(___) also returns the 95% lower and upper confidence bounds for the evaluated function values. You can use any of the input arguments in the previous syntaxes.

ecdf computes the confidence bounds using Greenwood's formula. They are not simultaneous confidence bounds.

ecdf(___) plots the evaluated function.

ecdf(ax,___) plots the evaluated function into the axes with handle ax instead of the current axes returned by gca.

Examples

Compute Empirical Cumulative Distribution Function

Compute the Kaplan-Meier estimate of the cumulative distribution function (cdf) for simulated survival data.

Generate survival data from a Weibull distribution with scale parameter 3 and shape parameter 1.

rng default;  % for reproducibility
failuretime = random('wbl',3,1,15,1);

Compute the Kaplan-Meier estimate of the cdf for the survival data.

[f,x] = ecdf(failuretime);
[f,x]
ans =

         0    0.0895
    0.0667    0.0895
    0.1333    0.1072
    0.2000    0.1303
    0.2667    0.1313
    0.3333    0.2718
    0.4000    0.2968
    0.4667    0.6147
    0.5333    0.6684
    0.6000    1.3749
    0.6667    1.8106
    0.7333    2.1685
    0.8000    3.8350
    0.8667    5.5428
    0.9333    6.1910
    1.0000    6.9825

Plot the estimated cdf.

figure()
plot(x,f)

Empirical Cumulative Hazard Function of Right-Censored Data

Compute and plot the cumulative hazard function of simulated right-censored survival data.

Generate failure times from a Birnbaum-Saunders distribution.

rng default  % for reproducibility
failuretime = random('birnbaumsaunders',0.3,1,100,1);

Assuming that the study ends at time 0.9, create a logical vector that flags the simulated failure times larger than 0.9 as censored.

T = 0.9;
cens = (failuretime>T);

Plot the empirical cumulative hazard function for the data, including confidence bounds.

ecdf(failuretime,'function','cumulative hazard',...
'censoring',cens,'bounds','on');

Compare Empirical Cumulative Distribution Function (CDF) with Known CDF

Generate right-censored survival data and compare the empirical cumulative distribution function (cdf) with the known cdf.

Generate failure times from an exponential distribution with mean failure time of 15.

rng default  % for reproducibility
y = exprnd(15,75,1);

Generate drop-out times from an exponential distribution with mean drop-out time of 30.

d = exprnd(30,75,1);

Generate the observed failure times. They are the minimum of the generated failure times and the drop-out times.

t = min(y,d);

Create a logical array that indicates generated failure times that are larger than the drop-out times. The data for which this is true are censored.

censored = (y>d);

Compute the empirical cdf and confidence bounds.

[f,x,flo,fup] = ecdf(t,'censoring',censored);

Plot the cdf and confidence bounds.

figure()
ecdf(t,'censoring',censored,'bounds','on');
hold on

Superimpose a plot of the known population cdf.

xx = 0:.1:max(t);
yy = 1-exp(-xx/15);
plot(xx,yy,'g-','LineWidth',2)
axis([0 50 0 1])
legend('Empirical','LCB','UCB','Population',...
       'Location','SE')
hold off

Empirical Survivor Function with 99% Confidence Bounds

Generate survival data and plot the empirical survivor function with 99% confidence bounds.

Generate lifetime data from a Weibull distribution with scale parameter 100 and shape parameter 2.

rng default  % for reproducibility
R = wblrnd(100,2,100,1);

Plot the survivor function for the data with 99% confidence bounds.

ecdf(R,'function','survivor','alpha',0.01,'bounds','on')
hold on

Superimpose a plot of the known Weibull survivor function.

x = 1:1:250;
wblsurv = 1-cdf('weibull',x,100,2);
plot(x,wblsurv,'g-','LineWidth',2)
legend('Empirical','LCB','UCB','Population',...
'Location','NE')

The survivor function based on the actual distribution is within the confidence bounds.

Input Arguments

y — Input data
column vector

Input data, specified as a column vector. For example, in survival or reliability analysis, data might be survival or failure times for each item or individual.

Data Types: single | double

ax — Axes handle
handle

Axes handle for the axes that ecdf plots to, specified as a handle.

For instance, if h is a handle for an axes object, then ecdf can plot the data y to those axes as follows.

Example: ecdf(h,y)
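
A minimal sketch of directing plots to specific axes, assuming two subplot axes created for the purpose. The sample data and the handle names ax1 and ax2 are hypothetical.

```matlab
rng default                      % for reproducibility
y = exprnd(10,50,1);             % hypothetical sample data

figure
ax1 = subplot(1,2,1);            % left axes
ecdf(ax1,y)
title(ax1,'Empirical cdf')

ax2 = subplot(1,2,2);            % right axes
ecdf(ax2,y,'function','survivor')
title(ax2,'Empirical survivor function')
```

Passing the handle explicitly avoids depending on whichever axes happen to be current when ecdf is called.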

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'censoring',c,'function','cumulative hazard','alpha',0.025,'bounds','on' specifies that ecdf returns the cumulative hazard function and plots the 97.5% confidence bounds, accounting for the censored data specified by vector c.

'censoring' — Indicator of censored data
array of 0s (default) | vector of 0s and 1s

Indicator of censored data, specified as the comma-separated pair consisting of 'censoring' and a Boolean array of the same size as y. Enter 1 for observations that are right-censored and 0 for observations that are fully observed. By default, all observations are treated as fully observed.

For instance, if the vector cdata stores the censoring information, you can enter it as follows.

Example: 'censoring',cdata

Data Types: logical

'frequency' — Frequency of observations
array of 1s (default) | vector of nonnegative integer counts

Frequency of observations, specified as the comma-separated pair consisting of 'frequency' and a vector containing nonnegative integer counts. This vector is the same size as the vector y. The jth element of this vector gives the number of times the jth element of y was observed. The default is one observation per element of y.

For instance, if failurefreq is a vector of frequencies, then you can enter it as follows.

Example: 'frequency',failurefreq

Data Types: single | double
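
The following sketch illustrates the meaning of the counts by comparing a call that uses 'frequency' with a call on the same data written out in full. The values in vals and freq are hypothetical, and the expansion uses the base MATLAB function repelem.

```matlab
% Hypothetical distinct values and how often each one was observed.
vals = [1; 2; 3];
freq = [2; 1; 3];

% Estimate from the compact representation.
[f1,x1] = ecdf(vals,'frequency',freq);

% Estimate from the expanded data, where each value is repeated
% according to its count: [1;1;2;3;3;3].
yfull = repelem(vals,freq);
[f2,x2] = ecdf(yfull);

% Largest discrepancy between the two estimates.
maxdiff = max(abs(f1 - f2));
```

Under this reading of the argument, the two calls should describe the same step function, so maxdiff is expected to be zero up to rounding.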

'alpha' — Significance level
0.05 (default) | scalar value in the range (0,1)

Significance level for the confidence interval of the evaluated function, specified as the comma-separated pair consisting of 'alpha' and a scalar value in the range (0,1). The default is 0.05, for 95% confidence. For a given value alpha, the confidence level is 100(1-alpha)%.

For instance, for a 99% confidence interval, you can specify the alpha value as follows.

Example: 'alpha',0.01

Data Types: single | double
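
As a hedged illustration of how alpha maps to a confidence level, this sketch compares the width of the default 95% bounds with 99% bounds on the same data. The sample data and the variable names are hypothetical. Points where a bound is NaN are excluded from the comparison.

```matlab
rng default                      % for reproducibility
y = exprnd(15,60,1);             % hypothetical sample data

alpha = 0.01;
confLevel = 100*(1 - alpha);     % confidence level in percent (99)

[~,~,lo95,up95] = ecdf(y);                 % default alpha = 0.05
[~,~,lo99,up99] = ecdf(y,'alpha',alpha);   % 99% bounds

% Compare interval widths wherever both sets of bounds are defined.
w95 = up95 - lo95;
w99 = up99 - lo99;
valid = ~isnan(w95) & ~isnan(w99);
isWider = all(w99(valid) >= w95(valid) - 1e-12);
```

A smaller alpha requests more confidence, so the 99% intervals should be at least as wide as the 95% intervals at every valid point.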

'function' — Type of function returned
'cdf' (default) | 'survivor' | 'cumulative hazard'

Type of function that ecdf evaluates and returns, specified as the comma-separated pair consisting of 'function' and one of the following.

'cdf'                 Default. Cumulative distribution function.
'survivor'            Survivor function.
'cumulative hazard'   Cumulative hazard function.

Example: 'function','cumulative hazard'

Data Types: char
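
The cdf and survivor options describe the same estimate from two directions, because the survivor function is one minus the cdf. The sketch below checks that relationship numerically on hypothetical data. It makes no claim about how the cumulative hazard is derived.

```matlab
rng default                      % for reproducibility
y = exprnd(15,50,1);             % hypothetical sample data

[F,xF] = ecdf(y);                          % cumulative distribution function
[S,xS] = ecdf(y,'function','survivor');    % survivor function

% Both calls evaluate at the same points, and S should equal 1 - F.
samePoints = isequal(xF,xS);
maxdiff = max(abs(S - (1 - F)));
```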

'bounds' — Indicator for including bounds
'off' (default) | 'on'

Indicator for including bounds, specified as the comma-separated pair consisting of 'bounds' and one of the following.

'off'   Default. Omit the bounds.
'on'    Include the bounds.

Note: This name-value argument is used only for plotting.

Example: 'bounds','on'

Data Types: char

Output Arguments

f — Function values
column vector

Function values evaluated at the points in x, returned as a column vector.

x — Distinct observed points
column vector

Distinct observed points in data vector y, returned as a column vector.

flo — Lower confidence bound
column vector

Lower confidence bound for the evaluated function, returned as a column vector. ecdf computes the confidence bounds using Greenwood's formula. They are not simultaneous confidence bounds.

fup — Upper confidence bound
column vector

Upper confidence bound for the evaluated function, returned as a column vector. ecdf computes the confidence bounds using Greenwood's formula. They are not simultaneous confidence bounds.

More About

Greenwood's Formula

Approximation for the variance of the Kaplan-Meier estimator.

The variance estimate is given by

$$V(S(t)) = S^2(t) \sum_{t_i < T} \frac{d_i}{r_i (r_i - d_i)},$$

where $r_i$ is the number at risk at time $t_i$, and $d_i$ is the number of failures at time $t_i$.
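
The formula can be sketched by hand on a small right-censored sample using only base MATLAB functions. The data and variable names here are hypothetical. The sketch also assumes that the sum runs over all distinct failure times up to and including the time at which the estimate is evaluated, which is one reading of the summation limit above. It is an illustration of the formula rather than the internal implementation of ecdf.

```matlab
% Hypothetical right-censored sample (1 = censored, 0 = observed failure).
t    = [3; 5; 5; 8; 10; 12];
cens = [0; 0; 1; 0;  1;  1];

% Distinct times at which a failure was observed.
ti = unique(t(cens == 0));

% Number at risk r_i and number of failures d_i at each t_i.
r = arrayfun(@(s) sum(t >= s), ti);
d = arrayfun(@(s) sum(t == s & cens == 0), ti);

% Kaplan-Meier (product-limit) estimate of the survivor function.
S = cumprod(1 - d./r);

% Greenwood's variance estimate, accumulated through each t_i.
V = S.^2 .* cumsum(d ./ (r .* (r - d)));
```

At the first failure time no censoring has occurred yet, so the estimate reduces to the binomial variance p(1-p)/n with p = 5/6 and n = 6, which equals 5/216. When the largest observation is a failure, r - d is zero at the last point and the variance is undefined there, which is why this sample ends with a censored observation.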
