# ecdf

Empirical cumulative distribution function

## Syntax

`[f,x] = ecdf(y)`
`[f,x] = ecdf(y,Name,Value)`
`[f,x,flo,fup] = ecdf(___)`
`ecdf(___)`
`ecdf(ax,___)`

## Description

`[f,x] = ecdf(y)` returns the empirical cumulative distribution function (cdf), `f`, evaluated at the points in `x`, using the data in the vector `y`. In survival and reliability analysis, this empirical cdf is called the Kaplan-Meier estimate, and the data might correspond to survival or failure times.

`[f,x] = ecdf(y,Name,Value)` returns the empirical function values, `f`, evaluated at the points in `x`, with additional options specified by one or more `Name,Value` pair arguments. For example, you can specify the type of function to evaluate or which observations are censored.

`[f,x,flo,fup] = ecdf(___)` also returns the 95% lower and upper confidence bounds for the evaluated function values. You can use any of the input arguments in the previous syntaxes. `ecdf` computes the confidence bounds using Greenwood's formula. They are not simultaneous confidence bounds.

`ecdf(___)` plots the evaluated function.

`ecdf(ax,___)` plots the evaluated function into the axes specified by the handle `ax` instead of the current axes returned by `gca`.

## Examples


Compute the Kaplan-Meier estimate of the cumulative distribution function (cdf) for simulated survival data.

Generate survival data from a Weibull distribution with parameters 3 and 1.

```
rng('default') % for reproducibility
failuretime = random('wbl',3,1,15,1);
```

Compute the Kaplan-Meier estimate of the cdf for survival data.

```
[f,x] = ecdf(failuretime);
[f,x]
```
```
ans = 16×2

         0    0.0895
    0.0667    0.0895
    0.1333    0.1072
    0.2000    0.1303
    0.2667    0.1313
    0.3333    0.2718
    0.4000    0.2968
    0.4667    0.6147
    0.5333    0.6684
    0.6000    1.3749
      ⋮
```

Plot the estimated cdf.

`ecdf(failuretime)`
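For data with no censoring, the empirical cdf at a point is the fraction of observations less than or equal to that point. The following is a minimal sketch of that calculation on a small made-up sample (the vector `y` and the names `xs` and `F` are illustrative, not part of `ecdf`). It uses only base MATLAB functions and does not reproduce the extra leading row that `ecdf` returns, where `f` is 0 at the smallest observation, as in the output above.

```matlab
% Hypothetical uncensored sample (illustrative values only)
y  = [3; 1; 4; 1; 5];

% Distinct observed points, sorted in ascending order
xs = unique(y);

% Fraction of observations less than or equal to each distinct point
F  = arrayfun(@(s) mean(y <= s), xs);

disp([F xs])
```

Because the value 1 occurs twice in this sample, the first step of the estimate is twice as high as the others.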

Compute and plot the cumulative hazard function of simulated right-censored survival data.

Generate failure times from a Birnbaum-Saunders distribution.

```
rng default % for reproducibility
failuretime = random('birnbaumsaunders',0.3,1,100,1);
```

Assuming that the end of the study is at time 0.9, generate a logical array that indicates simulated failure times that are larger than 0.9 as censored data, and store this information in a vector.

```
T = 0.9;
cens = (failuretime>T);
```

Plot the empirical cumulative hazard function for the data, with confidence bounds.

```
ecdf(failuretime,'function','cumulative hazard',...
    'censoring',cens,'bounds','on');
```
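To work with the cumulative hazard values rather than a plot, the same name-value pairs can be combined with output arguments. The sketch below regenerates the data from this example and adds a simple sanity check; the variable names `H` and `isMonotone` are chosen here for illustration.

```matlab
% Regenerate the simulated data used in this example
rng default % for reproducibility
failuretime = random('birnbaumsaunders',0.3,1,100,1);
cens = (failuretime > 0.9);

% Return the cumulative hazard values instead of plotting them
[H,x] = ecdf(failuretime,'function','cumulative hazard', ...
    'censoring',cens);

% A cumulative hazard should never decrease as x increases
isMonotone = all(diff(H(~isnan(H))) >= 0);
```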

Generate right-censored survival data and compare the empirical cumulative distribution function (cdf) with the known cdf.

Generate failure times from an exponential distribution with mean failure time of 15.

```
rng default % for reproducibility
y = exprnd(15,75,1);
```

Generate drop-out times from an exponential distribution with a mean drop-out time of 30.

`d = exprnd(30,75,1);`

Generate the observed failure times. They are the minimum of the generated failure times and the drop-out times.

`t = min(y,d);`

Create a logical array that indicates generated failure times that are larger than the drop-out times. The data for which this is true are censored.

`censored = (y>d);`

Compute the empirical cdf and confidence bounds.

`[f,x,flo,fup] = ecdf(t,'censoring',censored);`

Plot the cdf and confidence bounds.

```
figure()
ecdf(t,'censoring',censored,'bounds','on');
hold on
```

Superimpose a plot of the known population cdf.

```
xx = 0:.1:max(t);
yy = 1-exp(-xx/15);
plot(xx,yy,'g-','LineWidth',2)
axis([0 50 0 1])
legend('Empirical','LCB','UCB','Population',...
    'Location','SE')
hold off
```

Generate survival data and plot the empirical survivor function with 99% confidence bounds.

Generate lifetime data from a Weibull distribution with parameters 100 and 2.

```
rng default % for reproducibility
R = wblrnd(100,2,100,1);
```

Plot the survivor function for the data with 99% confidence bounds.

```
ecdf(R,'function','survivor','alpha',0.01,'bounds','on')
hold on
```

Superimpose a plot of the known Weibull survivor function.

```
x = 1:1:250;
wblsurv = 1-cdf('weibull',x,100,2);
plot(x,wblsurv,'g-','LineWidth',2)
legend('Empirical','LCB','UCB','Population',...
    'Location','NE')
```

The survivor function based on the actual distribution is within the confidence bounds.
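The survivor function is the complement of the cdf, so the two `'function'` options should agree up to rounding. A brief sketch that checks this on the same simulated data (the names `F`, `S`, and `maxDiff` are illustrative):

```matlab
rng default % for reproducibility
R = wblrnd(100,2,100,1);

% Empirical cdf and empirical survivor function of the same data
[F,xF] = ecdf(R);
[S,xS] = ecdf(R,'function','survivor');

% Largest deviation of S from 1 - F; expected to be at rounding level
maxDiff = max(abs(S - (1 - F)));
```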

## Input Arguments


Input data, specified as a column vector. For example, in survival or reliability analysis, data might be survival or failure times for each item or individual.

Data Types: `single` | `double`

Axes handle, `ax`, for the axes that `ecdf` plots to, specified as a handle.

For instance, if `h` is a handle for a set of axes, then `ecdf` can plot to those axes as follows.

Example: `ecdf(h,x)`
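As an illustration of the axes argument, the sketch below draws two different functions of the same sample side by side. The sample and the handle names `ax1` and `ax2` are made up for this example.

```matlab
rng default % for reproducibility
y = exprnd(10,50,1);   % hypothetical sample

figure
ax1 = subplot(1,2,1);
ax2 = subplot(1,2,2);

% Direct each plot to a specific set of axes
ecdf(ax1,y)
ecdf(ax2,y,'function','survivor')
```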

### Name-Value Pair Arguments

Specify optional comma-separated pairs of `Name,Value` arguments. `Name` is the argument name and `Value` is the corresponding value. `Name` must appear inside quotes. You can specify several name and value pair arguments in any order as `Name1,Value1,...,NameN,ValueN`.

Example: `'censoring',c,'function','cumulative hazard','alpha',0.025,'bounds','on'` specifies that `ecdf` returns the cumulative hazard function and plots the 97.5% confidence bounds, accounting for the censored data specified by vector `c`.

Indicator of censored data, specified as the comma-separated pair consisting of `'censoring'` and a Boolean array of the same size as `y`. Enter `1` for observations that are right-censored and `0` for observations that are fully observed. By default, all observations are fully observed.

For instance, if vector `cdata` stores the censored data information, you can enter the censoring information as follows.

Example: `'censoring',cdata`

Data Types: `logical`

Frequency of observations, specified as the comma-separated pair consisting of `'frequency'` and a vector containing nonnegative integer counts. This vector is the same size as the vector `y`. The `j`th element of this vector gives the number of times the `j`th element of `y` was observed. The default is one observation per element of `y`.

For instance, if `failurefreq` is a vector of frequencies, then you can enter it as follows.

Example: `'frequency',failurefreq`
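A frequency vector is a compact way to describe repeated observations. The sketch below, using made-up values, checks that supplying counts gives the same estimate as listing each repeated observation explicitly. The names `freq` and `sameResult` are illustrative.

```matlab
% Hypothetical data: the value 1 is observed twice, 2 once, 3 three times
y    = [1; 2; 3];
freq = [2; 1; 3];

% Estimate from values plus counts
[f1,x1] = ecdf(y,'frequency',freq);

% Estimate from the fully expanded sample [1;1;2;3;3;3]
[f2,x2] = ecdf(repelem(y,freq));

sameResult = isequal(x1,x2) && max(abs(f1 - f2)) < 1e-12;
```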

Data Types: `single` | `double`

Significance level for the confidence interval of the evaluated function, specified as the comma-separated pair consisting of `'alpha'` and a scalar value in the range (0,1). The default is 0.05 for 95% confidence. For a given value `alpha`, the confidence level is `100(1-alpha)`%.

For instance, for a 99% confidence interval, you can specify the alpha value as follows.

Example: `'alpha',0.01`

Data Types: `single` | `double`
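A smaller `alpha` corresponds to a higher confidence level and therefore to bounds that are at least as wide. The following sketch compares the default 95% bounds with 99% bounds on simulated data. The variable names are illustrative, and entries where a bound is `NaN` are skipped as a precaution.

```matlab
rng default % for reproducibility
R = wblrnd(100,2,100,1);

[~,~,lo95,up95] = ecdf(R);               % default alpha = 0.05
[~,~,lo99,up99] = ecdf(R,'alpha',0.01);  % 99% confidence bounds

w95 = up95 - lo95;
w99 = up99 - lo99;

% Compare widths only where both sets of bounds are defined
valid   = ~isnan(w95) & ~isnan(w99);
isWider = all(w99(valid) >= w95(valid) - 1e-12);
```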

Type of function that `ecdf` evaluates and returns, specified as the comma-separated pair consisting of `'function'` and one of the following.

| Value | Description |
| --- | --- |
| `'cdf'` | Default. Cumulative distribution function. |
| `'survivor'` | Survivor function. |
| `'cumulative hazard'` | Cumulative hazard function. |

Example: `'function','cumulative hazard'`

Indicator for including bounds, specified as the comma-separated pair consisting of `'bounds'` and one of the following.

| Value | Description |
| --- | --- |
| `'off'` | Default. Specify to omit bounds. |
| `'on'` | Specify to include bounds. |

### Note

This name-value argument is used only for plotting.

Example: `'bounds','on'`

## Output Arguments


Function values evaluated at the points in `x`, returned as a column vector.

Distinct observed points in data vector `y`, returned as a column vector.

Lower confidence bound for the evaluated function, returned as a column vector. `ecdf` computes the confidence bounds using Greenwood's formula. They are not simultaneous confidence bounds.

Upper confidence bound for the evaluated function, returned as a column vector. `ecdf` computes the confidence bounds using Greenwood's formula. They are not simultaneous confidence bounds.


### Greenwood’s Formula

Approximation for the variance of the Kaplan-Meier estimator.

The variance estimate is given by

$$V\left(S(t)\right) = S^{2}(t)\sum_{t_i < t}\frac{d_i}{r_i\left(r_i - d_i\right)}$$

where $r_i$ is the number at risk at time $t_i$, and $d_i$ is the number of failures at time $t_i$.
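To make the quantities concrete, here is a hand calculation on a tiny made-up data set. It assumes the standard form of Greenwood's formula, in which each term of the sum is $d_i/(r_i(r_i-d_i))$. It is a sketch of the arithmetic, not the internal implementation of `ecdf`, and the names `ti`, `r`, `d`, `S`, and `V` are illustrative. Only base MATLAB functions are used.

```matlab
% Hypothetical toy data: five observed times; the third is right-censored
t    = [1; 2; 3; 4; 5];
cens = logical([0; 0; 1; 0; 0]);

% Distinct failure times t_i (censored times are not failure times)
ti = unique(t(~cens));

% r_i: number still at risk at t_i; d_i: number of failures at t_i
r = arrayfun(@(s) sum(t >= s), ti);
d = arrayfun(@(s) sum(t == s & ~cens), ti);

% Kaplan-Meier survivor estimate just after each failure time
S = cumprod(1 - d./r);

% Greenwood variance estimate just after each failure time.
% The last entry is NaN here because r_i - d_i = 0 at the final failure.
V = S.^2 .* cumsum(d ./ (r .* (r - d)));

disp([ti r d S V])
```

In this example the censored observation at time 3 contributes no failure, but it does reduce the number at risk from 4 to 2 between the failures at times 2 and 4.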

## References

[1] Cox, D. R., and D. Oakes. Analysis of Survival Data. London: Chapman & Hall, 1984.

[2] Lawless, J. F. Statistical Models and Methods for Lifetime Data. 2nd ed., Hoboken, NJ: John Wiley & Sons, Inc., 2003.