Nonparametric and Empirical Probability Distributions


In some situations, you cannot accurately describe a data sample using a parametric distribution. Instead, the probability density function (pdf) or cumulative distribution function (cdf) must be estimated from the data. Several options exist in Statistics Toolbox™ for estimating the pdf or cdf from sample data.

Kernel Distribution

Kernel Distribution produces a nonparametric probability density estimate that adapts itself to the data, rather than selecting a density with a particular parametric form and estimating its parameters. A kernel distribution is defined by two things: a smoothing function (the kernel), which determines the shape of the curve used to generate the pdf, and a bandwidth value, which controls the smoothness of the resulting density curve.
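To make the roles of the kernel and the bandwidth concrete, the following is a minimal sketch in plain Python, not the toolbox implementation. It assumes a Gaussian kernel and a bandwidth chosen by hand; the function name gaussian_kde_pdf and the sample values are made up for illustration.

```python
import math

def gaussian_kde_pdf(sample, bandwidth):
    """Return a function that evaluates a Gaussian kernel density estimate.

    Assumption: the kernel is Gaussian and the bandwidth is supplied by the caller.
    """
    n = len(sample)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def pdf(x):
        # Sum one Gaussian bump per observation, each centered on that observation.
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample
        )

    return pdf

# Made-up data with two loose clusters and a gap near x = 3.
sample = [1.2, 1.9, 2.1, 2.4, 3.8, 4.0, 4.3]

narrow = gaussian_kde_pdf(sample, bandwidth=0.2)  # follows the data closely
wide = gaussian_kde_pdf(sample, bandwidth=1.0)    # smooths over the gap
```

With the small bandwidth the estimate drops close to zero in the gap between the clusters, while the large bandwidth fills the gap in. Both estimates still integrate to 1.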

Piecewise Linear Distribution

A piecewise distribution is a probability distribution formed by estimating the pdf or cdf in pieces from the sample data. Piecewise Linear Distribution estimates an overall cdf by computing the cdf value at each individual point in the sample data and then connecting these values linearly to form a continuous curve.
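The construction described above can be sketched as linear interpolation between knots. This is an illustration in plain Python, not the toolbox implementation. It assumes that the knots are the distinct sorted sample values and that the cdf rises in equal steps from 0 at the smallest value to 1 at the largest; the name piecewise_linear_cdf is hypothetical.

```python
from bisect import bisect_right

def piecewise_linear_cdf(sample):
    """Return a cdf that linearly interpolates between cdf values at the data points."""
    xs = sorted(set(sample))  # distinct sorted values serve as knots
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two distinct sample values")
    # Assumed convention: equal steps from 0 at the smallest knot to 1 at the largest.
    fx = [i / (n - 1) for i in range(n)]

    def cdf(x):
        if x <= xs[0]:
            return 0.0
        if x >= xs[-1]:
            return 1.0
        j = bisect_right(xs, x)  # xs[j - 1] <= x < xs[j]
        t = (x - xs[j - 1]) / (xs[j] - xs[j - 1])
        return fx[j - 1] + t * (fx[j] - fx[j - 1])

    return cdf

cdf = piecewise_linear_cdf([1, 2, 4, 8, 16])  # made-up data
```

Unlike a step function, this cdf has a value strictly between the knot values at every point between two observations, so it can be inverted to generate random numbers.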

Empirical Cumulative Distribution Function

An empirical cumulative distribution function (ecdf) estimates the cdf of a random variable by assigning equal probability to each observation in a sample. It creates an exact match between the empirical cdf and the distribution of the sample data. However, the empirical cdf is a step function rather than a smooth curve, and it is a coarse estimate in the tails, where data might be sparse. In this case, you can use Pareto tails to smooth the distribution in the tails.

For more information, see ecdf, paretotails, and coxphfit.
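As an illustration of "equal probability to each observation", the sketch below builds an empirical cdf in plain Python. It is not the toolbox ecdf function; the Python function name and the data are made up, and censoring is not handled.

```python
from bisect import bisect_right

def empirical_cdf(sample):
    """Return the empirical cdf: the fraction of observations <= x."""
    xs = sorted(sample)
    n = len(xs)

    def cdf(x):
        # Each observation contributes a jump of 1/n at its own value.
        return bisect_right(xs, x) / n

    return cdf

cdf = empirical_cdf([3, 1, 2, 2])  # made-up data with one repeated value
```

The result is a step function: it is flat between observations and jumps at each one, with a jump of 2/n where a value occurs twice.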

Pareto Tails

Pareto tails use a piecewise approach to improve the fit of a nonparametric cdf or pdf by smoothing the tails of the distribution. You can fit a kernel distribution, piecewise linear distribution, or empirical cdf to the middle data values, then fit generalized Pareto distributions (GPDs) to the tails. This technique is especially useful when the sample data is sparse in the tails.

For more information, see paretotails.
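The piecewise idea can be sketched for the upper tail only: an empirical cdf below a threshold, and a generalized Pareto tail above it, scaled so that the two pieces meet at the threshold. This is an illustration in plain Python under several assumptions, not the paretotails implementation. In particular, the GPD parameters are estimated here by the method of moments for brevity, which is not necessarily how the toolbox fits them, and all function names and data are hypothetical.

```python
import math
from bisect import bisect_right

def fit_gpd_moments(exceedances):
    """Method-of-moments estimates (shape, scale) for a generalized Pareto distribution.

    Assumption: at least two exceedances with nonzero variance are available.
    """
    n = len(exceedances)
    mean = sum(exceedances) / n
    var = sum((y - mean) ** 2 for y in exceedances) / (n - 1)
    ratio = mean * mean / var
    shape = 0.5 * (1.0 - ratio)
    scale = 0.5 * mean * (1.0 + ratio)
    return shape, scale

def ecdf_with_upper_pareto_tail(sample, upper_prob):
    """Empirical cdf up to a threshold, GPD tail above it (upper tail only)."""
    xs = sorted(sample)
    n = len(xs)
    # Threshold: the order statistic at which the empirical cdf reaches upper_prob.
    threshold = xs[math.ceil(upper_prob * n) - 1]
    body_prob = bisect_right(xs, threshold) / n
    tail_prob = 1.0 - body_prob
    shape, scale = fit_gpd_moments([x - threshold for x in xs if x > threshold])

    def cdf(x):
        if x <= threshold:
            return bisect_right(xs, x) / n  # empirical cdf in the body
        z = (x - threshold) / scale
        if shape == 0.0:
            survival = math.exp(-z)  # exponential limit of the GPD
        else:
            base = 1.0 + shape * z
            # For negative shape the GPD has a finite upper endpoint.
            survival = base ** (-1.0 / shape) if base > 0.0 else 0.0
        return 1.0 - tail_prob * survival

    return cdf

# Made-up data: a dense body and four sparse values in the upper tail.
sample = list(range(1, 17)) + [20, 25, 40, 80]
cdf = ecdf_with_upper_pareto_tail(sample, upper_prob=0.8)
```

Above the threshold the cdf is a smooth curve that keeps increasing past the largest observation, instead of jumping to 1 there as the empirical cdf does. A lower tail would be handled the same way with the data reflected about the lower threshold.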

Triangular Distribution

Triangular Distribution provides a simple representation of the probability distribution when limited sample data is available. It uses the minimum, mode, and maximum values of the sample data as parameters, and linearly connects these points to estimate the pdf.
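The following sketch shows the resulting pdf in plain Python; it is an illustration, not the toolbox implementation. The parameter names a, b, and c (lower limit, peak, upper limit) and the function name are assumptions, and the sample mode is taken with statistics.mode, which is only meaningful here because the made-up data contains a repeated value.

```python
import statistics

def triangular_pdf(a, b, c):
    """Return the triangular pdf with lower limit a, peak b, and upper limit c."""
    if not (a <= b <= c and a < c):
        raise ValueError("require a <= b <= c and a < c")
    height = 2.0 / (c - a)  # pdf value at the peak, so the total area is 1

    def pdf(x):
        if x < a or x > c:
            return 0.0
        if x < b:
            return height * (x - a) / (b - a)  # rising edge from (a, 0) to (b, height)
        if x > b:
            return height * (c - x) / (c - b)  # falling edge from (b, height) to (c, 0)
        return height

    return pdf

sample = [2, 3, 3, 3, 4, 5, 7]  # made-up data
pdf = triangular_pdf(min(sample), statistics.mode(sample), max(sample))
```

For continuous data without repeated values, the peak would have to be chosen another way, for example from a histogram or from prior knowledge.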
