Fit Custom Distributions

This example shows how to fit a custom distribution to univariate data by using the mle function.

You can use the mle function to compute maximum likelihood parameter estimates and to estimate their precision for built-in distributions and custom distributions. To fit a custom distribution, you need to define a function for the custom distribution in a file or by using an anonymous function. In the simplest cases, you can write code to compute the probability density function (pdf) or logarithm of pdf for the distribution that you want to fit, and then call mle to fit the distribution. This example covers the following cases using the pdf or logarithm of pdf:

Fitting a distribution for truncated data
Fitting a mixture of two distributions
Fitting a weighted distribution
Finding accurate confidence intervals of parameter estimates for small samples using parameter transformation

Note that you can use the TruncationBounds name-value argument of mle for truncated data instead of defining a custom function. Also, for a mixture of two normal distributions, you can use the fitgmdist function. This example uses the mle function and a custom function for these cases.

Fit Zero-Truncated Poisson Distribution

Count data is often modeled using a Poisson distribution, and you can use the poissfit or fitdist function to fit a Poisson distribution. However, in some situations, counts that are zero are not recorded in the data, so fitting a Poisson distribution is not straightforward because of the missing zeros. In this case, fit a Poisson distribution to zero-truncated data by using the mle function and a custom distribution function.

First, generate some random Poisson data.

rng(18,'twister') % For reproducibility
lambda = 1.75;
n = 75;
x1 = poissrnd(lambda,n,1);

Next, remove all the zeros from the data to simulate the truncation.

x1 = x1(x1 > 0);

Check the number of samples in x1 after truncation.

length(x1)

ans = 65

Plot a histogram of the simulated data.

histogram(x1,0:1:max(x1)+1)

The data looks like a Poisson distribution except it contains no zeros. You can use a custom distribution that is identical to a Poisson distribution on the positive integers, but has no probability at zero. By using a custom distribution, you can estimate the Poisson parameter lambda while accounting for the missing zeros.

You need to define the zero-truncated Poisson distribution by its probability mass function (pmf). Create an anonymous function to compute the probability for each point in x1, given a value for the Poisson distribution's mean parameter lambda. The pmf for a zero-truncated Poisson distribution is the Poisson pmf renormalized so that it sums to one over the positive integers. With zero truncation, the normalization is 1 - Probability(x1 = 0), which is the probability that a Poisson count is positive.

pf_truncpoiss = @(x1,lambda) poisspdf(x1,lambda)./(1-poisscdf(0,lambda));

For simplicity, assume that all the x1 values given to this function are positive integers, with no checks. For error checking or a more complicated distribution that takes more than a single line of code, you must define the function in a separate file.
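
Written out, the pmf that pf_truncpoiss evaluates is the standard zero-truncated Poisson pmf. The symbols X (a Poisson count) and x (an observed value) are notation introduced here for illustration only.

```latex
P(X = x \mid X > 0)
  = \frac{P(X = x)}{1 - P(X = 0)}
  = \frac{\lambda^{x} e^{-\lambda} / x!}{1 - e^{-\lambda}},
  \qquad x = 1, 2, 3, \ldots
```

The denominator corresponds to 1-poisscdf(0,lambda) in the anonymous function, because poisscdf(0,lambda) is the probability of a zero count.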

Find a reasonable rough first guess for the parameter lambda. In this case, use the sample mean.

start = mean(x1)

start = 2.2154

Provide mle with the data, custom pmf function, initial parameter value, and lower bound of the parameter. Because the mean parameter of the Poisson distribution must be positive, you also need to specify a lower bound for lambda. The mle function returns the maximum likelihood estimate of lambda, and optionally, the approximate 95% confidence intervals for the parameters.

[lambdaHat,lambdaCI] = mle(x1,'pdf',pf_truncpoiss,'Start',start, ...
    'LowerBound',0)

lambdaHat = 1.8760

lambdaCI = 2×1

    1.4990
    2.2530

The parameter estimate is smaller than the sample mean. The maximum likelihood estimate accounts for the zeros not present in the data.
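
One way to see why the estimate falls below the sample mean: the mean of a zero-truncated Poisson distribution is lambda/(1-exp(-lambda)), and setting the derivative of the log likelihood to zero shows that the maximum likelihood estimate is the lambda at which this mean equals the sample mean. The following sketch solves that equation with fzero, using the rounded sample mean displayed earlier. The names xbar, truncMean, and lambdaCheck are illustrative and are not part of the example.

```matlab
xbar = 2.2154;  % sample mean of the truncated data (rounded display value)
% Mean of a zero-truncated Poisson distribution as a function of lambda.
truncMean = @(lambda) lambda./(1-exp(-lambda));
% Solve truncMean(lambda) = xbar, starting the search at the sample mean.
lambdaCheck = fzero(@(lambda) truncMean(lambda) - xbar, xbar)
```

The root should land close to lambdaHat (about 1.876), up to rounding of the sample mean.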

Alternatively, you can specify the truncation bounds by using the TruncationBounds name-value argument.

[lambdaHat2,lambdaCI2] = mle(x1,'Distribution','Poisson', ...
    'TruncationBounds',[0 Inf])

lambdaHat2 = 1.8760

lambdaCI2 = 2×1

    1.4990
    2.2530

You can also compute a standard error estimate for lambda by using the large-sample variance approximation returned by mlecov.

avar = mlecov(lambdaHat,x1,'pdf',pf_truncpoiss);
stderr = sqrt(avar)

stderr = 0.1923

To visually check the fit, plot the fitted pmf against a normalized histogram of the raw data.

histogram(x1,'Normalization','pdf')
xgrid = min(x1):max(x1);
pmfgrid = pf_truncpoiss(xgrid,lambdaHat);
hold on
plot(xgrid,pmfgrid,'-')
xlabel('x1')
ylabel('Probability')
legend('Sample Data','Fitted pmf','Location','best')
hold off

Fit Upper-Truncated Normal Distribution

Continuous data can sometimes be truncated. For example, observations larger than some fixed value might not be recorded because of limitations in data collection.

In this case, simulate data from a truncated normal distribution. First, generate some random normal data.

n = 500;
mu = 1;
sigma = 3;
rng('default') % For reproducibility
x2 = normrnd(mu,sigma,n,1);

Next, remove any observations that fall beyond the truncation point xTrunc. Assume that xTrunc is a known value that you do not need to estimate.

xTrunc = 4;
x2 = x2(x2 < xTrunc);

Check the number of samples in x2 after truncation.

length(x2)

ans = 430

Create a histogram of the simulated data.

histogram(x2)

Fit the simulated data with a custom distribution that is identical to a normal distribution for x2 < xTrunc, but has zero probability above xTrunc. By using a custom distribution, you can estimate the normal parameters mu and sigma while accounting for the missing tail.

Define the truncated normal distribution by its pdf. Create an anonymous function to compute the probability density value for each point in x2, given values for the parameters mu and sigma. With the truncation point fixed and known, the pdf for a truncated normal distribution is the normal pdf truncated at xTrunc and then normalized so that it integrates to one. The normalization is the normal cdf evaluated at xTrunc. For simplicity, assume that all x2 values are less than xTrunc, without checking.

pdf_truncnorm = @(x2,mu,sigma) ...
    normpdf(x2,mu,sigma)./normcdf(xTrunc,mu,sigma);

Because you do not need to estimate the truncation point xTrunc, it is not included with the input distribution parameters of the custom pdf function. xTrunc is also not part of the data vector input argument. An anonymous function can access variables in the workspace, so you do not have to pass xTrunc to the anonymous function as an additional argument.
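
As a quick sanity check on the normalization, you can integrate the custom pdf over its support and confirm that the result is close to one. This is a minimal sketch that redefines the function so that it runs on its own. The parameter values 1 and 3 and the variable name area are chosen here for illustration, and normpdf and normcdf require Statistics and Machine Learning Toolbox.

```matlab
xTrunc = 4;  % known truncation point, as in the example
pdf_truncnorm = @(x,mu,sigma) ...
    normpdf(x,mu,sigma)./normcdf(xTrunc,mu,sigma);
% Integrate the density over its support (-Inf, xTrunc) for one
% illustrative parameter pair; the result should be close to 1.
area = integral(@(x) pdf_truncnorm(x,1,3),-Inf,xTrunc)
```

The check does not depend on the particular mu and sigma, because the normalization divides by the probability mass below xTrunc for whatever parameters are passed.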

Provide a rough starting guess for the parameter estimates. In this case, because the truncation is not extreme, use the sample mean and standard deviation.

start = [mean(x2),std(x2)]

start = 1×2

    0.1585    2.4125

Provide mle with the data, custom pdf function, initial parameter values, and lower bounds of the parameters. Because sigma must be positive, you also need to specify lower parameter bounds. mle returns the maximum likelihood estimates of mu and sigma as a single vector, as well as a matrix of approximate 95% confidence intervals for the two parameters.

[paramEsts,paramCIs] = mle(x2,'pdf',pdf_truncnorm,'Start',start, ...
    'LowerBound',[-Inf 0])

paramEsts = 1×2

    1.1298    3.0884

paramCIs = 2×2

    0.5713    2.7160
    1.6882    3.4607

The estimates of mu and sigma are larger than the sample mean and standard deviation. The model fit accounts for the missing upper tail of the distribution.

Alternatively, you can specify the truncation bounds by using the TruncationBounds name-value argument.

[paramEsts2,paramCIs2] = mle(x2,'Distribution','Normal', ...
    'TruncationBounds',[-Inf xTrunc])

paramEsts2 = 1×2

    1.1297    3.0884

paramCIs2 = 2×2

    0.5713    2.7160
    1.6882    3.4607

You can compute an approximate covariance matrix for the parameter estimates using mlecov. The approximation typically works well for large samples, and you can approximate the standard errors by the square roots of the diagonal elements.

acov = mlecov(paramEsts,x2,'pdf',pdf_truncnorm)

acov = 2×2

    0.0812    0.0402
    0.0402    0.0361

stderr = sqrt(diag(acov))

stderr = 2×1

    0.2849
    0.1900
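
The standard errors appear consistent with the confidence intervals that mle returned: under a large-sample normal approximation, a 95% interval is the estimate plus or minus about 1.96 standard errors. The following sketch rebuilds the intervals from the rounded values displayed above. The names est, se, z, and waldCI are illustrative, and the result matches paramCIs only up to rounding.

```matlab
est = [1.1298 3.0884];  % [mu sigma] estimates displayed above (rounded)
se = [0.2849 0.1900];   % standard errors displayed above (rounded)
z = norminv(0.975);     % two-sided 95% normal quantile, about 1.96
% Rows are lower and upper limits; columns are mu and sigma.
waldCI = [est - z*se; est + z*se]
```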

To visually check the fit, plot the fitted pdf against a normalized histogram of the raw data.

histogram(x2,'Normalization','pdf')
xgrid = linspace(min(x2),max(x2));
pdfgrid = pdf_truncnorm(xgrid,paramEsts(1),paramEsts(2));
hold on
plot(xgrid,pdfgrid,'-')
xlabel('x2')
ylabel('Probability Density')
legend('Sample Data','Fitted pdf','Location','best')
hold off

Fit Mixture of Two Normal Distributions

Some data sets exhibit bimodality, or even multimodality, and fitting a standard distribution to such data is usually not appropriate. However, a mixture of simple unimodal distributions can often model such data very well.

In this case, fit a mixture of two normal distributions to simulated data. Consider simulated data with the following constructive definition:

First, flip a biased coin.
If the coin lands on heads, pick a value at random from a normal distribution with mean $\mu_{1}$ and standard deviation $\sigma_{1}$.
If the coin lands on tails, pick a value at random from a normal distribution with mean $\mu_{2}$ and standard deviation $\sigma_{2}$.

Generate a data set from a mixture of Student's t distributions instead of using the same model that you are fitting. By using different distributions, similar to a technique used in a Monte-Carlo simulation, you can test how robust a fitting method is to departures from the assumptions of the model being fit.

rng(10) % For reproducibility
x3 = [trnd(20,1,50) trnd(4,1,100)+3];
histogram(x3)

Define the model to fit by creating an anonymous function that computes the probability density. The pdf for a mixture of two normal distributions is a weighted sum of the pdfs of the two normal components, weighted by the mixture probability. The anonymous function takes six inputs: a vector of data at which to evaluate the pdf and five distribution parameters. Each component has parameters for its mean and standard deviation.

pdf_normmixture = @(x3,p,mu1,mu2,sigma1,sigma2) ...
    p*normpdf(x3,mu1,sigma1) + (1-p)*normpdf(x3,mu2,sigma2);
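
In mathematical notation, the density that pdf_normmixture computes can be written as follows. The symbol $\phi$ for the normal pdf is notation introduced here for illustration.

```latex
f(x \mid p, \mu_1, \mu_2, \sigma_1, \sigma_2)
  = p \, \phi(x; \mu_1, \sigma_1) + (1 - p) \, \phi(x; \mu_2, \sigma_2),
\qquad
\phi(x; \mu, \sigma) = \frac{1}{\sigma \sqrt{2\pi}}
  \exp\!\left( -\frac{(x - \mu)^2}{2 \sigma^2} \right)
```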

You also need an initial guess for the parameters. Defining a starting point becomes more important as the number of model parameters increases. Here, start with an equal mixture (p = 0.5) of normal distributions, centered at the first and third quartiles of the data, with equal standard deviations. The starting value for the standard deviation comes from the formula for the variance of a mixture in terms of the mean and variance of each component.

pStart = .5;
muStart = quantile(x3,[.25 .75])

muStart = 1×2

    0.3351    3.3046

sigmaStart = sqrt(var(x3) - .25*diff(muStart).^2)

sigmaStart = 1.1602

start = [pStart muStart sigmaStart sigmaStart];
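
The expression for sigmaStart follows from the variance of a two-component mixture. As a short derivation, with X denoting a draw from the mixture (notation introduced here for illustration):

```latex
\operatorname{Var}(X)
  = p \, \sigma_1^2 + (1 - p) \, \sigma_2^2 + p (1 - p) (\mu_1 - \mu_2)^2

% With p = 1/2 and \sigma_1 = \sigma_2 = \sigma:
\operatorname{Var}(X) = \sigma^2 + \tfrac{1}{4} (\mu_1 - \mu_2)^2
\quad \Longrightarrow \quad
\sigma = \sqrt{ \operatorname{Var}(X) - \tfrac{1}{4} (\mu_1 - \mu_2)^2 }
```

Replacing Var(X) with var(x3) and the component means with the two quartiles gives the value computed for sigmaStart.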

Specify bounds of zero and one for the mixing probability, and lower bounds of zero for the standard deviations. Set the remaining elements of the bounds vectors to +Inf or -Inf, to indicate no restrictions.

lb = [0 -Inf -Inf 0 0];
ub = [1 Inf Inf Inf Inf];
paramEsts = mle(x3,'pdf',pdf_normmixture,'Start',start, ...
    'LowerBound',lb,'UpperBound',ub)

Warning: Maximum likelihood estimation did not converge.  Iteration limit exceeded.

paramEsts = 1×5

    0.3273   -0.2263    2.9914    0.9067    1.2059

The warning message indicates that the function does not converge with the default iteration settings. Display the default options.

statset('mlecustom')

ans = struct with fields:
          Display: 'off'
      MaxFunEvals: 400
          MaxIter: 200
           TolBnd: 1.0000e-06
           TolFun: 1.0000e-06
       TolTypeFun: []
             TolX: 1.0000e-06
         TolTypeX: []
          GradObj: 'off'
         Jacobian: []
        DerivStep: 6.0555e-06
      FunValCheck: 'on'
           Robust: []
     RobustWgtFun: []
           WgtFun: []
             Tune: []
      UseParallel: []
    UseSubstreams: []
          Streams: {}
        OutputFcn: []

The default maximum number of iterations for custom distributions is 200. Override the default to increase the number of iterations, using an options structure created with the statset function. Also, increase the maximum function evaluations.

options = statset('MaxIter',300,'MaxFunEvals',600);
paramEsts = mle(x3,'pdf',pdf_normmixture,'Start',start, ...
    'LowerBound',lb,'UpperBound',ub,'Options',options)

paramEsts = 1×5

    0.3273   -0.2263    2.9914    0.9067    1.2059

The final iterations to convergence are significant only in the last few digits of the result. However, a best practice is to always make sure that convergence is reached.

To visually check the fit, plot the fitted density against a probability histogram of the raw data.

histogram(x3,'Normalization','pdf')
hold on
xgrid = linspace(1.1*min(x3),1.1*max(x3),200);
pdfgrid = pdf_normmixture(xgrid, ...
    paramEsts(1),paramEsts(2),paramEsts(3),paramEsts(4),paramEsts(5));
plot(xgrid,pdfgrid,'-')
hold off
xlabel('x3')
ylabel('Probability Density')
legend('Sample Data','Fitted pdf','Location','best')

Alternatively, for a mixture of normal distributions, you can use the fitgmdist function. The estimates can be different due to initial estimates and settings of the iterative algorithm.

Mdl = fitgmdist(x3',2)

Mdl = 

Gaussian mixture distribution with 2 components in 1 dimensions
Component 1:
Mixing proportion: 0.329180
Mean:   -0.2200

Component 2:
Mixing proportion: 0.670820
Mean:    2.9975

Mdl.Sigma

ans = 
ans(:,:,1) =

    0.8274


ans(:,:,2) =

    1.4437
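
When you compare the two fits, note that the Sigma property of a Gaussian mixture model stores covariances (variances in one dimension), whereas the custom pdf is parameterized by standard deviations. The following sketch converts the rounded values displayed above. The names gmVariances and gmSigmas are illustrative.

```matlab
gmVariances = [0.8274 1.4437];  % Mdl.Sigma values displayed above (rounded)
gmSigmas = sqrt(gmVariances)    % component standard deviations
```

The resulting standard deviations, roughly 0.91 and 1.20, are close to the mle estimates 0.9067 and 1.2059.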

Fit Weighted Normal Distribution to Data with Unequal Precisions

Assume that you have 10 data points, where each point is actually the average of anywhere from one to eight observations. The original observations are not available, but the number of observations for each data point is known. The precision of each point depends on its corresponding number of observations. You need to estimate the mean and standard deviation of the raw data.

x4 = [0.25 -1.24 1.38 1.39 -1.43 2.79 3.52 0.92 1.44 1.26]';
m = [8 2 1 3 8 4 2 5 2 4]';

The variance of each data point is inversely proportional to its corresponding number of observations, so use 1/m to weight the variance of each data point in a maximum likelihood fit.

w = 1./m;

In this model, you can define the distribution by its pdf. However, using a logarithm of pdf is more suitable, because the normal pdf has the form

  c .* exp(-0.5 .* z.^2),

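and mle takes the log of the pdf to compute the loglikelihood. Evaluating the exponential and then taking its logarithm is wasted work, and the exponential can underflow to zero for points far from the mean. So, instead, create a function that computes the logarithm of the pdf directly.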

The logarithm of pdf function must compute the logarithm of the probability density for each point in x, given normal distribution parameters mu and sigma. It also needs to account for the different variance weights. Define a function named helper_logpdf_wn1 in a separate file helper_logpdf_wn1.m.

function logy = helper_logpdf_wn1(x,m,mu,sigma)
%HELPER_LOGPDF_WN1 Logarithm of pdf for a weighted normal distribution
% This function supports only the example Fit Custom Distributions 
% (customdist1demo.mlx) and might change in a future release.
v = sigma.^2 ./ m;
logy = -(x-mu).^2 ./ (2.*v) - .5.*log(2.*pi.*v);
end
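
The expression in helper_logpdf_wn1 is the normal log pdf with a separate variance for each point. In the notation below, $x_i$ and $m_i$ are the i-th data point and its number of observations, and $v_i$ corresponds to the variable v in the function. The subscripted symbols are introduced here for illustration.

```latex
\log f(x_i \mid \mu, \sigma)
  = -\frac{(x_i - \mu)^2}{2 v_i} - \frac{1}{2} \log\!\left( 2 \pi v_i \right),
\qquad v_i = \frac{\sigma^2}{m_i}
```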

Provide a rough first guess for the parameter estimates. In this case, use the unweighted sample mean and standard deviation.

start = [mean(x4),std(x4)]

start = 1×2

    1.0280    1.5490

Because sigma must be positive, you need to specify lower parameter bounds.

[paramEsts1,paramCIs1] = mle(x4,'logpdf', ...
    @(x,mu,sigma)helper_logpdf_wn1(x,m,mu,sigma), ...
    'Start',start,'LowerBound',[-Inf,0])

paramEsts1 = 1×2

    0.6244    2.8823

paramCIs1 = 2×2

   -0.2802    1.6191
    1.5290    4.1456

The estimate of mu is less than two-thirds of the sample mean. The estimate is influenced by the most reliable data points, that is, the points based on the largest number of raw observations. In this data set, those points tend to pull the estimate down from the unweighted sample mean.
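
For this particular model, setting the derivatives of the log likelihood to zero gives closed-form estimates, which offer a way to cross-check the numerical fit: mu is the mean of the data weighted by m, and sigma squared is the average of m.*(x4-mu).^2. The following is a minimal sketch. The names muClosed and sigmaClosed are illustrative and are not part of the example.

```matlab
x4 = [0.25 -1.24 1.38 1.39 -1.43 2.79 3.52 0.92 1.44 1.26]';
m = [8 2 1 3 8 4 2 5 2 4]';
% Mean weighted by the number of observations behind each point.
muClosed = sum(m.*x4)/sum(m)
% Square root of the average weighted squared deviation.
sigmaClosed = sqrt(mean(m.*(x4-muClosed).^2))
```

The two values should agree with paramEsts1 (about 0.6244 and 2.8823) to the displayed precision.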

Fit Normal Distribution Using Parameter Transformation

The mle function computes confidence intervals for the parameters using a large-sample normal approximation for the distribution of the estimators if an exact method is not available. For small sample sizes, you can improve the normal approximation by transforming one or more parameters. In this case, transform the scale parameter of a normal distribution to its logarithm.

First, define a new log pdf function named helper_logpdf_wn2 that uses a transformed parameter for sigma.

function logy = helper_logpdf_wn2(x,m,mu,logsigma)
%HELPER_LOGPDF_WN2 Logarithm of pdf for a weighted normal distribution with
% log(sigma) parameterization
% This function supports only the example Fit Custom Distributions 
% (customdist1demo.mlx) and might change in a future release.
v = exp(logsigma).^2 ./ m;
logy = -(x-mu).^2 ./ (2.*v) - .5.*log(2.*pi.*v);
end

Use the same starting point transformed to the new parameterization for sigma, that is, the log of the sample standard deviation.

start = [mean(x4),log(std(x4))]

start = 1×2

    1.0280    0.4376

Because sigma can be any positive value, log(sigma) is unbounded, and you do not need to specify lower or upper bounds.

[paramEsts2,paramCIs2] = mle(x4,'logpdf', ...
    @(x,mu,logsigma)helper_logpdf_wn2(x,m,mu,logsigma), ...
    'Start',start)

paramEsts2 = 1×2

    0.6244    1.0586

paramCIs2 = 2×2

   -0.2802    0.6203
    1.5290    1.4969

Because the parameterization uses log(sigma), you have to transform back to the original scale to get an estimate and confidence interval for sigma.

sigmaHat = exp(paramEsts2(2))

sigmaHat = 2.8823

sigmaCI = exp(paramCIs2(:,2))

sigmaCI = 2×1

    1.8596
    4.4677

The estimates for both mu and sigma are the same as in the first fit, because maximum likelihood estimates are invariant to parameterization. The confidence interval for sigma is slightly different from paramCIs1(:,2).
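
To see how the two intervals for sigma differ, you can compare how far each limit lies from the estimate. The following sketch uses the rounded values displayed above. The names ciDirect, ciLog, armsDirect, and armsLog are illustrative.

```matlab
sigmaHat = 2.8823;             % estimate of sigma (same in both fits)
ciDirect = [1.6191 4.1456];    % paramCIs1(:,2), fit on the sigma scale
ciLog = exp([0.6203 1.4969]);  % back-transformed paramCIs2(:,2)
% Distance from the estimate to the lower and upper limits.
armsDirect = abs(ciDirect - sigmaHat)
armsLog = abs(ciLog - sigmaHat)
```

The interval computed on the sigma scale is symmetric about the estimate, whereas the back-transformed interval is shorter below the estimate and longer above it, and it cannot extend below zero.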