# fitdist

Fit probability distribution object to data

## Syntax

• `pd = fitdist(x,distname)`
• `pd = fitdist(x,distname,Name,Value)`
• `[pdca,gn,gl] = fitdist(x,distname,'By',groupvar)`
• `[pdca,gn,gl] = fitdist(x,distname,'By',groupvar,Name,Value)`

## Description

`pd = fitdist(x,distname)` creates a probability distribution object by fitting the distribution specified by `distname` to the data in column vector `x`.

`pd = fitdist(x,distname,Name,Value)` creates the probability distribution object with additional options specified by one or more name-value pair arguments. For example, you can indicate censored data or specify control parameters for the iterative fitting algorithm.

`[pdca,gn,gl] = fitdist(x,distname,'By',groupvar)` creates probability distribution objects by fitting the distribution specified by `distname` to the data in `x` based on the grouping variable `groupvar`. It returns a cell array of fitted probability distribution objects, `pdca`, a cell array of group labels, `gn`, and a cell array of grouping variable levels, `gl`.

`[pdca,gn,gl] = fitdist(x,distname,'By',groupvar,Name,Value)` returns the above output arguments using additional options specified by one or more name-value pair arguments. For example, you can indicate censored data or specify control parameters for the iterative fitting algorithm.

## Examples

### Fit a Normal Distribution to Data

Load the sample data. Create a vector containing the patients' weight data.

```
load hospital
x = hospital.Weight;
```

Create a normal distribution object by fitting it to the data.

```
pd = fitdist(x,'Normal')
```

```
pd = 
  NormalDistribution

  Normal distribution
       mu =     154   [148.728, 159.272]
    sigma = 26.5714   [23.3299, 30.8674]
```

Plot the pdf of the distribution.

```
x_values = 50:1:250;
y = pdf(pd,x_values);
plot(x_values,y,'LineWidth',2)
```

### Fit a Kernel Distribution to Data

Load the sample data. Create a vector containing the patients' weight data.

```
load hospital
x = hospital.Weight;
```

Create a kernel distribution object by fitting it to the data. Use the Epanechnikov kernel function.

```
pd = fitdist(x,'Kernel','Kernel','epanechnikov')
```

```
pd = 
  KernelDistribution

    Kernel = epanechnikov
    Bandwidth = 14.3792
    Support = unbounded
```

Plot the pdf of the distribution.

```
x_values = 50:1:250;
y = pdf(pd,x_values);
plot(x_values,y)
```

### Fit Normal Distributions to Grouped Data

Load the sample data. Create a vector containing the patients' weight data.

```
load hospital
x = hospital.Weight;
```

Create normal distribution objects by fitting them to the data, grouped by patient gender.

```
gender = hospital.Sex;
[pdca,gn,gl] = fitdist(x,'Normal','By',gender)
```

```
pdca = 
    [1x1 prob.NormalDistribution]    [1x1 prob.NormalDistribution]

gn = 
    'Female'
    'Male'

gl = 
    'Female'
    'Male'
```

The cell array `pdca` contains two probability distribution objects, one for each gender group. The cell array `gn` contains the two group labels as strings. The cell array `gl` contains the two group levels as strings.

View each distribution in the cell array `pdca` to compare the mean, `mu`, and the standard deviation, `sigma`, grouped by patient gender.

```
female = pdca{1}  % Distribution for females
```

```
female = 
  NormalDistribution

  Normal distribution
       mu = 130.472   [128.183, 132.76]
    sigma = 8.30339   [6.96947, 10.2736]
```

```
male = pdca{2}  % Distribution for males
```

```
male = 
  NormalDistribution

  Normal distribution
       mu = 180.532   [177.833, 183.231]
    sigma = 9.19322   [7.63933, 11.5466]
```

Compute the pdf of each distribution.

```
x_values = 50:1:250;
femalepdf = pdf(female,x_values);
malepdf = pdf(male,x_values);
```

Plot the pdfs for a visual comparison of weight distribution by gender.

```
figure
plot(x_values,femalepdf,'LineWidth',2)
hold on
plot(x_values,malepdf,'Color','r','LineStyle',':','LineWidth',2)
legend(gn,'Location','NorthEast')
hold off
```

### Fit Kernel Distributions to Grouped Data

Load the sample data. Create a vector containing the patients' weight data.

```
load hospital
x = hospital.Weight;
```

Create kernel distribution objects by fitting them to the data, grouped by patient gender. Use a triangular kernel function.

```
gender = hospital.Sex;
[pdca,gn,gl] = fitdist(x,'Kernel','By',gender,'Kernel','triangle');
```

View each distribution in the cell array `pdca` to see the kernel distributions for each gender.

```
female = pdca{1}  % Distribution for females
```

```
female = 
  KernelDistribution

    Kernel = triangle
    Bandwidth = 4.25894
    Support = unbounded
```

```
male = pdca{2}  % Distribution for males
```

```
male = 
  KernelDistribution

    Kernel = triangle
    Bandwidth = 5.08961
    Support = unbounded
```

Compute the pdf of each distribution.

```
x_values = 50:1:250;
femalepdf = pdf(female,x_values);
malepdf = pdf(male,x_values);
```

Plot the pdfs for a visual comparison of weight distribution by gender.

```
figure
plot(x_values,femalepdf,'LineWidth',2)
hold on
plot(x_values,malepdf,'Color','r','LineStyle',':','LineWidth',2)
legend(gn,'Location','NorthEast')
hold off
```

## Input Arguments


### `x` — Input data

column vector

Input data, specified as a column vector. `fitdist` ignores `NaN` values in `x`. Additionally, any `NaN` values in the censoring vector or frequency vector cause `fitdist` to ignore the corresponding values in `x`.

Data Types: `single` | `double`
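
The `NaN` handling described above can be checked with a small sketch. The data here is synthetic (generated with `normrnd`, an assumption made for illustration), and the variable names are hypothetical. The fit on the vector containing `NaN` values is expected to match the fit on the same vector with those entries removed by hand.

```matlab
rng(0);                               % fix the seed so the sketch is repeatable
x = normrnd(10,2,100,1);              % synthetic column vector (illustrative data)
xWithNaN = x;
xWithNaN([5 20 77]) = NaN;            % mark three observations as missing

pdAll   = fitdist(xWithNaN,'Normal');               % NaN entries are ignored
pdClean = fitdist(x(~isnan(xWithNaN)),'Normal');    % same data, NaNs removed manually

% Compare the two sets of estimates side by side
disp([pdAll.mu pdClean.mu; pdAll.sigma pdClean.sigma])
```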

### `distname` — Distribution name

string

Distribution name, specified as one of the following strings. The distribution specified by `distname` determines the class type of the returned probability distribution object.

| Distribution Name | Description | Distribution Class |
| --- | --- | --- |
| `'Beta'` | Beta distribution | `prob.BetaDistribution` |
| `'Binomial'` | Binomial distribution | `prob.BinomialDistribution` |
| `'BirnbaumSaunders'` | Birnbaum-Saunders distribution | `prob.BirnbaumSaundersDistribution` |
| `'Burr'` | Burr distribution | `prob.BurrDistribution` |
| `'Exponential'` | Exponential distribution | `prob.ExponentialDistribution` |
| `'ExtremeValue'` | Extreme Value distribution | `prob.ExtremeValueDistribution` |
| `'Gamma'` | Gamma distribution | `prob.GammaDistribution` |
| `'GeneralizedExtremeValue'` | Generalized Extreme Value distribution | `prob.GeneralizedExtremeValueDistribution` |
| `'GeneralizedPareto'` | Generalized Pareto distribution | `prob.GeneralizedParetoDistribution` |
| `'InverseGaussian'` | Inverse Gaussian distribution | `prob.InverseGaussianDistribution` |
| `'Kernel'` | Kernel distribution | `prob.KernelDistribution` |
| `'Logistic'` | Logistic distribution | `prob.LogisticDistribution` |
| `'Loglogistic'` | Loglogistic distribution | `prob.LoglogisticDistribution` |
| `'Lognormal'` | Lognormal distribution | `prob.LognormalDistribution` |
| `'Multinomial'` | Multinomial distribution | `prob.MultinomialDistribution` |
| `'Nakagami'` | Nakagami distribution | `prob.NakagamiDistribution` |
| `'NegativeBinomial'` | Negative Binomial distribution | `prob.NegativeBinomialDistribution` |
| `'Normal'` | Normal distribution | `prob.NormalDistribution` |
| `'Poisson'` | Poisson distribution | `prob.PoissonDistribution` |
| `'Rayleigh'` | Rayleigh distribution | `prob.RayleighDistribution` |
| `'Rician'` | Rician distribution | `prob.RicianDistribution` |
| `'tLocationScale'` | t Location-Scale distribution | `prob.tLocationScaleDistribution` |
| `'Weibull'` | Weibull distribution | `prob.WeibullDistribution` |

### `groupvar` — Grouping variable

categorical array | logical or numeric vector | cell array of strings

Grouping variable, specified as a categorical array, logical or numeric vector, or cell array of strings. Each unique value in a grouping variable defines a group.

For example, if `Gender` is a cell array of strings with values `'Male'` and `'Female'`, you can use `Gender` as a grouping variable to fit a distribution to your data by gender.

To use more than one grouping variable, specify a cell array of grouping variables. Observations are placed in the same group if they have common values of all specified grouping variables.

For example, if `Smoker` is a logical vector with values `0` for nonsmokers and `1` for smokers, then specifying the cell array `{Gender,Smoker}` divides observations into four groups: Male Smoker, Male Nonsmoker, Female Smoker, and Female Nonsmoker.

Example: `{Gender,Smoker}`

Data Types: `single` | `double` | `logical` | `cell` | `char`
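
As a rough sketch of grouping by two variables at once, the following uses synthetic data. `Gender`, `Smoker`, and the weights in `x` are made up for illustration and do not come from a shipped data set. With both levels of each variable present in every combination, four fitted objects are expected.

```matlab
rng(1);                                     % repeatable synthetic data
n = 200;
Gender = repmat({'Male';'Female'},n/2,1);   % cell array of strings, alternating
Smoker = rand(n,1) > 0.5;                   % logical vector: 1 = smoker
x = normrnd(150,20,n,1);                    % illustrative measurements

% Group by every combination of Gender and Smoker
[pdca,gn,gl] = fitdist(x,'Normal','By',{Gender,Smoker});

numel(pdca)    % number of fitted groups
gl             % one column of levels per grouping variable
```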

### Name-Value Pair Arguments

Specify optional comma-separated pairs of `Name,Value` arguments. `Name` is the argument name and `Value` is the corresponding value. `Name` must appear inside single quotes (`' '`). You can specify several name and value pair arguments in any order as `Name1,Value1,...,NameN,ValueN`.

Example: `fitdist(x,'Kernel','Kernel','triangle')` fits a kernel distribution object to the data in `x` using a triangular kernel function.

### `'Censoring'` — Logical flag for censored data

`0` (default) | vector of logical values

Logical flag for censored data, specified as the comma-separated pair consisting of `'Censoring'` and a vector of logical values that is the same size as input vector `x`. The value is `1` when the corresponding element in `x` is a right-censored observation and `0` when the corresponding element is an exact observation. The default is a vector of `0`s, indicating that all observations are exact.

`fitdist` ignores any `NaN` values in this censoring vector. Additionally, any `NaN` values in `x` or the frequency vector cause `fitdist` to ignore the corresponding values in the censoring vector.

Data Types: `logical`
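
A minimal sketch of a right-censored fit, assuming a hypothetical lifetime study that stops observing at a fixed cutoff. The failure times, the cutoff value, and the variable names are all illustrative assumptions.

```matlab
rng(2);                          % repeatable synthetic data
t = wblrnd(100,2,150,1);         % hypothetical true failure times
cutoff = 120;                    % observation stops here (assumed study design)

censored = t > cutoff;           % 1 = right-censored, 0 = exact observation
x = min(t,cutoff);               % censored items are recorded at the cutoff

% Fit a Weibull distribution that accounts for the censored observations
pd = fitdist(x,'Weibull','Censoring',censored)
```

Passing `censored` tells the fit that those entries of `x` are lower bounds on the true value rather than exact measurements.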

### `'Frequency'` — Observation frequency

`1` (default) | vector of nonnegative integer values

Observation frequency, specified as the comma-separated pair consisting of `'Frequency'` and a vector of nonnegative integer values that is the same size as input vector `x`. Each element of the frequency vector specifies the frequency of the corresponding element in `x`. The default is a vector of `1`s, indicating that each value in `x` appears only once.

`fitdist` ignores any `NaN` values in this frequency vector. Additionally, any `NaN` values in `x` or the censoring vector cause `fitdist` to ignore the corresponding values in the frequency vector.

Data Types: `single` | `double`
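
The following sketch shows a frequency vector used with tabulated count data. The values and counts are invented for illustration. Because the maximum likelihood estimate of the Poisson mean is the sample mean, the fitted `lambda` is expected to equal the frequency-weighted mean of the values.

```matlab
values = [0; 1; 2; 3; 4];        % distinct observed counts (illustrative)
counts = [10; 25; 30; 20; 15];   % how many times each value was observed

% Fit using the compact (value, frequency) representation
pd = fitdist(values,'Poisson','Frequency',counts);

% Frequency-weighted mean, for comparison with the fitted parameter
weightedMean = sum(values.*counts)/sum(counts);
disp([pd.lambda weightedMean])
```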

### `'Options'` — Control parameters

structure

Control parameters for the iterative fitting algorithm, specified as the comma-separated pair consisting of `'Options'` and a structure you create using `statset`.

Data Types: `struct`
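
As a sketch of passing control parameters, the following builds an options structure with `statset` and hands it to a gamma fit. The particular option values and the synthetic data are assumptions chosen for illustration, not recommended settings.

```matlab
rng(3);                               % repeatable synthetic data
x = gamrnd(2,3,200,1);                % illustrative positive data

% Raise the iteration limit and tighten the parameter tolerance
opts = statset('MaxIter',500,'TolX',1e-8,'Display','off');

pd = fitdist(x,'Gamma','Options',opts)
```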

### `'NTrials'` — Number of trials

positive integer value

Number of trials for the binomial distribution, specified as the comma-separated pair consisting of `'NTrials'` and a positive integer value. You must specify `distname` as `'Binomial'` to use this option.

Data Types: `single` | `double`
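
A short sketch of a binomial fit with a known number of trials. The trial count, success probability, and sample size are illustrative assumptions.

```matlab
rng(4);                           % repeatable synthetic data
nTrials = 20;                     % trials per observation (assumed known)
x = binornd(nTrials,0.3,100,1);   % number of successes in each set of trials

% The number of trials is supplied, so only p is estimated
pd = fitdist(x,'Binomial','NTrials',nTrials)
```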

### `'Theta'` — Threshold parameter

`0` (default) | scalar value

Threshold parameter for the generalized Pareto distribution, specified as the comma-separated pair consisting of `'Theta'` and a scalar value. You must specify `distname` as `'GeneralizedPareto'` to use this option.

Data Types: `single` | `double`
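
The sketch below fits a generalized Pareto distribution to synthetic exceedances over a nonzero threshold. The shape, scale, and threshold values are illustrative assumptions. The data is generated above the threshold so that it lies in the support of the fitted distribution.

```matlab
rng(5);                              % repeatable synthetic data
theta = 10;                          % threshold (assumed known)
x = gprnd(0.2,2,theta,300,1);        % shape 0.2, scale 2, threshold 10

% Fit shape and scale with the threshold fixed at theta
pd = fitdist(x,'GeneralizedPareto','Theta',theta)
```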

### `'Kernel'` — Kernel smoother type

`'normal'` (default) | `'box'` | `'triangle'` | `'epanechnikov'`

Kernel smoother type, specified as the comma-separated pair consisting of `'Kernel'` and one of the following:

• `'normal'`

• `'box'`

• `'triangle'`

• `'epanechnikov'`

You must specify `distname` as `'Kernel'` to use this option.

### `'Support'` — Kernel density support

`'unbounded'` (default) | `'positive'` | two-element vector

Kernel density support, specified as the comma-separated pair consisting of `'Support'` and a string or two-element vector. The string must be one of the following.

| Value | Description |
| --- | --- |
| `'unbounded'` | Density can extend over the whole real line. |
| `'positive'` | Density is restricted to positive values. |

Alternatively, you can specify a two-element vector giving finite lower and upper limits for the support of the density.

You must specify `distname` as `'Kernel'` to use this option.

Data Types: `single` | `double`
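
A sketch of both ways to restrict the support, using synthetic positive data. The data and the chosen limits are illustrative assumptions. The two-element limits are picked so that every observation lies strictly inside them.

```matlab
rng(6);                              % repeatable synthetic data
x = exprnd(2,200,1);                 % strictly positive illustrative data

% Restrict the density to positive values
pdPos = fitdist(x,'Kernel','Support','positive');

% Or give explicit finite limits that contain all of the data
limits = [0 max(x)+1];
pdLim = fitdist(x,'Kernel','Support',limits);

% Compare the two estimates on a grid inside the limits
xi = linspace(0.01,max(x),200);
plot(xi,pdf(pdPos,xi),xi,pdf(pdLim,xi),'--')
legend('positive','finite limits')
```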

### `'Width'` — Bandwidth of kernel smoothing window

scalar value

Bandwidth of the kernel smoothing window, specified as the comma-separated pair consisting of `'Width'` and a scalar value. The default value used by `fitdist` is optimal for estimating normal densities, but you might want to choose a smaller value to reveal features such as multiple modes. You must specify `distname` as `'Kernel'` to use this option.

Data Types: `single` | `double`
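
To illustrate the effect of a smaller bandwidth, the following sketch fits a kernel distribution to synthetic bimodal data twice, once with the default width and once with a narrower one. The mixture used to generate the data and the width of `0.3` are assumptions chosen for illustration.

```matlab
rng(7);                                          % repeatable synthetic data
x = [normrnd(0,1,150,1); normrnd(4,1,150,1)];    % two well-separated modes

pdDefault = fitdist(x,'Kernel');                 % default bandwidth
pdNarrow  = fitdist(x,'Kernel','Width',0.3);     % smaller, user-chosen bandwidth

% Plot both estimates; the narrower window follows local structure more closely
xi = linspace(-4,8,300);
plot(xi,pdf(pdDefault,xi),xi,pdf(pdNarrow,xi),'--')
legend('default width','Width = 0.3')
```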

## Output Arguments


### `pd` — Probability distribution

probability distribution object

Probability distribution, returned as a probability distribution object. The distribution specified by `distname` determines the class type of the returned probability distribution object.

### `pdca` — Probability distribution objects

cell array

Probability distribution objects of the type specified by `distname`, returned as a cell array.

### `gn` — Group labels

cell array of strings

Group labels, returned as a cell array of strings.

### `gl` — Grouping variable levels

cell array of strings

Grouping variable levels, returned as a cell array of strings containing one column for each grouping variable.
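
One way to work with the grouped outputs is to loop over `pdca` and pair each fitted object with its label in `gn`. This is a sketch on synthetic data; the group names, sizes, and parameter values are illustrative assumptions.

```matlab
rng(9);                                              % repeatable synthetic data
x = [normrnd(130,8,60,1); normrnd(180,9,60,1)];      % two illustrative groups
g = [repmat({'Female'},60,1); repmat({'Male'},60,1)];

[pdca,gn,gl] = fitdist(x,'Normal','By',g);

% Report the fitted parameters for each group next to its label
for k = 1:numel(pdca)
    fprintf('%s: mu = %.2f, sigma = %.2f\n',gn{k},pdca{k}.mu,pdca{k}.sigma);
end
```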

## Alternative Functionality

### App

The Distribution Fitting app opens a graphical user interface for you to import data from the workspace and interactively fit a probability distribution to that data. You can then save the distribution to the workspace as a probability distribution object. Open the Distribution Fitting app using `dfittool`, or click Distribution Fitting on the Apps tab.


### Algorithms

The `fitdist` function fits most distributions using maximum likelihood estimation. Two exceptions are the normal and lognormal distributions with uncensored data.

• For the uncensored normal distribution, the estimated value of the sigma parameter is the square root of the unbiased estimate of the variance.

• For the uncensored lognormal distribution, the estimated value of the sigma parameter is the square root of the unbiased estimate of the variance of the log of the data.
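
The normal-distribution exception can be checked numerically with a short sketch on synthetic data (the data is an illustrative assumption). The fitted `sigma` is expected to match `std(x)`, which normalizes by `n-1`, rather than `std(x,1)`, which normalizes by `n` as a maximum likelihood estimate would.

```matlab
rng(8);                          % repeatable synthetic data
x = normrnd(5,2,50,1);           % uncensored illustrative sample

pd = fitdist(x,'Normal');

sigmaUnbiased = std(x);          % square root of the unbiased variance estimate
sigmaMLE      = std(x,1);        % maximum likelihood version, normalized by n

% The first two values are expected to agree; the third is slightly smaller
disp([pd.sigma sigmaUnbiased sigmaMLE])
```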
