fitdist - Fit probability distribution to data

Syntax

PD = fitdist(X, DistName)
[PDCA, GN, GL] = fitdist(X, DistName, 'By', GroupVar)
... = fitdist(..., param1, val1, param2, val2, ...)

Description

PD = fitdist(X, DistName) fits the probability distribution specified by DistName to the data in the column vector X, and returns PD, an object representing the fitted distribution.

[PDCA, GN, GL] = fitdist(X, DistName, 'By', GroupVar) takes a grouping variable, GroupVar, fits the specified distribution to the data in X from each group, and returns PDCA, a cell array of the fitted probability distribution objects. GroupVar can also be a cell array of multiple grouping variables. GN is a cell array of group labels. GL is a cell array of grouping variable levels, with one column for each grouping variable.
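
As a rough illustration of the grouped syntax, the sketch below fits a normal distribution to each of two groups. The data and the variable names x and g are made up for this example; only the fitdist call and the mean method come from the toolbox.

```matlab
% Synthetic data: two groups with clearly different centers (illustrative only).
x = [1; 2; 3; 4; 10; 11; 12; 13];
g = {'a'; 'a'; 'a'; 'a'; 'b'; 'b'; 'b'; 'b'};

% Fit a normal distribution to the data in each group.
[pdca, gn, gl] = fitdist(x, 'normal', 'By', g);

% Report the fitted mean next to each group label.
for k = 1:numel(pdca)
    fprintf('group %s: fitted mean = %g\n', gn{k}, mean(pdca{k}));
end
```

Each element of pdca is a fitted distribution object, so any method that works on PD from the single-group syntax can be applied to pdca{k}.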

... = fitdist(..., param1, val1, param2, val2, ...) specifies optional parameter name/value pairs, as described in the Parameter/Values table. Parameter and value names are case insensitive.

Inputs

X

A column vector of data.

    Note   Any NaN values in X are ignored by the fitting calculations. Additionally, any NaN values in the censoring vector or frequency vector will cause the corresponding values in X to be ignored by the fitting calculations.
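
One way to check the NaN behavior described in the note is to compare a fit on data containing a NaN with a fit on the same data after removing it by hand. This is a small sketch on made-up data; the variable names are hypothetical.

```matlab
% Made-up data with one missing value.
x = [1; 2; NaN; 3; 4];

% Fit once with the NaN left in place, and once after removing it by hand.
pdAll   = fitdist(x, 'normal');
pdClean = fitdist(x(~isnan(x)), 'normal');

% The two fits should agree if the NaN is ignored.
sameMean = abs(mean(pdAll) - mean(pdClean)) < 1e-12;
sameStd  = abs(std(pdAll)  - std(pdClean))  < 1e-12;
```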

DistName

A string specifying a distribution. Choices are:

  • 'kernel' — To fit a nonparametric kernel-smoothing distribution.

  • Any of the following to fit a parametric distribution:

    • 'beta'

    • 'binomial'

    • 'birnbaumsaunders'

    • 'exponential'

    • 'extreme value' or 'ev'

    • 'gamma'

    • 'generalized extreme value' or 'gev'

    • 'generalized pareto' or 'gp'

    • 'inversegaussian'

    • 'logistic'

    • 'loglogistic'

    • 'lognormal'

    • 'nakagami'

    • 'negative binomial' or 'nbin'

    • 'normal'

    • 'poisson'

    • 'rayleigh'

    • 'rician'

    • 'tlocationscale'

    • 'weibull' or 'wbl'

    For more information on these parametric distributions, see Distribution Reference.

GroupVar

A grouping variable or a cell array of multiple grouping variables. For more information on grouping variables, see Grouped Data.

Parameter/Values

'censoring'

A Boolean vector the same size as X, containing 1s when the corresponding elements in X are right-censored observations and 0s when the corresponding elements are exact observations. Default is a vector of 0s.

    Note   Any NaN values in this censoring vector are ignored by the fitting calculations. Additionally, any NaN values in X or the frequency vector will cause the corresponding values in the censoring vector to be ignored by the fitting calculations.
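
The following sketch shows one way the 'censoring' parameter might be used, with an exponential fit chosen because its result is easy to verify by hand: with right-censored data, the maximum likelihood estimate of the exponential mean is the sum of all observed times divided by the number of uncensored observations. The data and the names t and cens are invented for the example.

```matlab
% Hypothetical lifetimes; the third unit was still running when observation stopped.
t    = [1; 2; 3; 4];
cens = [0; 0; 1; 0];      % 1 marks a right-censored observation

% Fit an exponential distribution that accounts for the censoring.
pd = fitdist(t, 'exponential', 'censoring', cens);

% Hand calculation of the censored MLE: total time over number of exact observations.
muExpected = sum(t) / sum(cens == 0);
muFitted   = mean(pd);
```

Ignoring the censoring flag would give the plain sample mean of 2.5, which understates the mean lifetime because the censored unit lasted at least 3 time units.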

'frequency'

A vector the same size as X, containing nonnegative integers specifying the frequencies for the corresponding elements in X. Default is a vector of 1s.

    Note   Any NaN values in this frequency vector are ignored by the fitting calculations. Additionally, any NaN values in X or the censoring vector will cause the corresponding values in the frequency vector to be ignored by the fitting calculations.
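
As a sketch of what the 'frequency' parameter does, the example below compares a fit on tabulated values with a fit on the same data written out in full. The data are made up, and the exponential distribution is used only because its fitted mean is simple to check.

```matlab
% Tabulated data: the value 1 occurs twice, 2 once, and 3 three times.
x    = [1; 2; 3];
freq = [2; 1; 3];
pdFreq = fitdist(x, 'exponential', 'frequency', freq);

% The same observations written out one per row.
xFull  = [1; 1; 2; 3; 3; 3];
pdFull = fitdist(xFull, 'exponential');

% Both fits should give the same mean, 13/6.
gap = abs(mean(pdFreq) - mean(pdFull));
```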

'options'

A structure created by the statset function to specify control parameters for the iterative fitting algorithm.

'n'

For 'binomial' distributions only, a positive integer specifying the N parameter (number of trials).

'theta'

For 'generalized pareto' distributions only, a value specifying the theta (threshold) parameter. Default is 0.

'kernel'

For 'kernel' distributions only, a string specifying the type of kernel smoother to use. Choices are:

  • 'normal' (default)

  • 'box'

  • 'triangle'

  • 'epanechnikov'

'support'

For 'kernel' distributions only, any of the following to specify the support:

  • 'unbounded' — Default. The density can extend over the whole real line.

  • 'positive' — Restricts the density to positive values.

  • A two-element vector giving finite lower and upper limits for the support of the density.

'width'

For 'kernel' distributions only, a value specifying the bandwidth of the kernel smoothing window. The default is optimal for estimating normal densities, but you may want to choose a smaller value to reveal features such as multiple modes.
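
To illustrate the effect of the bandwidth, the sketch below fits two kernel distributions to the same made-up two-cluster data, one with a small width and one with a large width, and evaluates each density in the gap between the clusters and at one cluster center. The widths 0.3 and 2 are arbitrary choices for this example.

```matlab
% Made-up data with two clusters, centered near -2 and 2.
x = [-2.2; -2.1; -2.0; -1.9; -1.8; 1.8; 1.9; 2.0; 2.1; 2.2];

% Fit kernel distributions with a narrow and a wide smoothing window.
pdNarrow = fitdist(x, 'kernel', 'width', 0.3);
pdWide   = fitdist(x, 'kernel', 'width', 2);

% Evaluate both densities in the gap (0) and at a cluster center (2).
pts     = [0; 2];
fNarrow = pdf(pdNarrow, pts);
fWide   = pdf(pdWide, pts);
```

With the narrow window the density at 0 is far below the density at 2, so both modes are visible. With the wide window the two clusters are smoothed into a single hump that is highest near 0.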

Outputs

PD

An object in either the ProbDistUnivKernel class or the ProbDistUnivParam class, which are derived from the ProbDist class.
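
Once fitted, the returned object can be passed to methods such as mean, pdf, cdf, and icdf. The following is a minimal sketch on made-up data, using a normal fit so that the results are easy to check: the cdf at the mean is 0.5, and the 0.5 quantile is the mean.

```matlab
% Made-up sample.
x  = [2; 4; 4; 4; 5; 5; 7; 9];
pd = fitdist(x, 'normal');

mu    = mean(pd);          % fitted mean
pAtMu = cdf(pd, mu);       % probability of a value at or below the mean
med   = icdf(pd, 0.5);     % median of the fitted distribution
dens  = pdf(pd, mu);       % density at the mean
```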

PDCA

A cell array of the fitted probability distribution objects.

GN

A cell array of group labels.

GL

A cell array of grouping variable levels, with one column for each grouping variable.

Examples

Creating a ProbDistUnivKernel Object

  1. Load a MAT-file, included with the Statistics Toolbox software, which contains MPG, a column vector of data.

    load carsmall
    
  2. Create a ProbDistUnivKernel object by fitting a nonparametric kernel-smoothing distribution to the data:

    ksd = fitdist(MPG,'kernel')
    
    ksd = 
    
    kernel distribution
    
        Kernel = normal
        Bandwidth = 4.11428
        Support = unbounded

Creating a ProbDistUnivParam Object

  1. Load a MAT-file, included with the Statistics Toolbox software, which contains MPG, a column vector of data, and Origin, a grouping variable that gives each car's country of origin.

    load carsmall
    
  2. Create a cell array of ProbDistUnivParam objects by fitting a Weibull distribution to the data, grouped by Origin. Because the data contain only one car from Italy, fitdist reports an error for that group: you cannot fit a distribution to a single observation.

    wd = fitdist(MPG,'weibull','by',Origin)

Algorithm

The fitdist function fits most distributions using maximum likelihood. Two exceptions are the normal and lognormal distributions with uncensored data. For the uncensored normal distribution, the estimated value of the sigma parameter is the square root of the unbiased estimate of the variance. For the uncensored lognormal distribution, the estimated value of the sigma parameter is the square root of the unbiased estimate of the variance of the log of the data.
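
The difference between the unbiased estimate and the maximum likelihood estimate can be seen on a small made-up sample. In the sketch below, std(x) normalizes by n-1 (the unbiased variance) and std(x,1) normalizes by n (the maximum likelihood form); the fitted sigma is read back through the std method of the returned object.

```matlab
% Made-up sample: mean 5, sum of squared deviations 32, n = 8.
x  = [2; 4; 4; 4; 5; 5; 7; 9];
pd = fitdist(x, 'normal');

sigmaFit      = std(pd);      % sigma of the fitted normal distribution
sigmaUnbiased = std(x);       % sqrt(32/7), normalized by n-1
sigmaMLE      = std(x, 1);    % sqrt(32/8) = 2, normalized by n
```

For uncensored data, sigmaFit should match sigmaUnbiased rather than sigmaMLE.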

References

[1] Johnson, N. L., S. Kotz, and N. Balakrishnan. Continuous Univariate Distributions. Vol. 1, Hoboken, NJ: Wiley-Interscience, 1993.

[2] Johnson, N. L., S. Kotz, and N. Balakrishnan. Continuous Univariate Distributions. Vol. 2, Hoboken, NJ: Wiley-Interscience, 1994.

[3] Bowman, A. W., and A. Azzalini. Applied Smoothing Techniques for Data Analysis. New York: Oxford University Press, 1997.

Alternatives

dfittool — Opens a graphical user interface for displaying distributions fit to data, or for importing data from the workspace, fitting distributions to it, and displaying the fits over plots of the empirical distributions.

See Also

disttool
randtool
statset — Function that creates a structure that specifies control parameters for the iterative fitting algorithm
ProbDist class
ProbDistUnivKernel class
ProbDistUnivParam class
Distribution Reference — For more information on parametric distributions
Grouped Data — For more information on grouping variables
  



© 1984-2009 The MathWorks, Inc.