Statistics Toolbox 

This example shows how to use some more advanced techniques with the Statistics Toolbox™ function mle to fit custom distributions to univariate data. The techniques include fitting models to censored data and handling some of the numerical issues that arise when fitting custom distributions.
See Fitting Custom Univariate Distributions for additional examples of fitting custom distributions to univariate data.
Fitting Custom Distributions with Censored Data
The extreme value distribution is often used to model failure times of mechanical parts, and experiments in such cases are sometimes run only for a fixed length of time. If not all of the experimental units have failed within that time, then the data are right-censored; that is, the values of some failure times are not known exactly, but are only known to be larger than a certain value.
The Statistics Toolbox includes the function evfit, which fits an extreme value distribution to data, including data with censoring. However, for the purposes of this example, we will ignore evfit, and show how to use mle and custom distributions to fit a model to censored data, using the extreme value distribution.
Because the values for the censored data are not known exactly, maximum likelihood estimation becomes more difficult. In particular, both the PDF and the CDF are needed to compute the loglikelihood. Therefore, you must provide mle with functions for both of those in order to fit censored data. The Statistics Toolbox includes the functions evpdf and evcdf, so for this example, the work of writing the code has already been done.
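To make the structure of that loglikelihood concrete, here is a small illustrative sketch in Python/SciPy (not part of the MATLAB example). It assumes that scipy.stats.gumbel_l is the same "minimum" type extreme value distribution that MATLAB's evpdf and evcdf implement: uncensored observations contribute log-PDF terms, and censored observations contribute log-survival (log(1-CDF)) terms.

```python
import numpy as np
from scipy.stats import gumbel_l

def censored_loglik(x, cens, mu, sigma):
    """Loglikelihood for right-censored data: log f(x) for exact failure
    times, log S(x) = log(1 - F(x)) for censored observations."""
    x = np.asarray(x, dtype=float)
    cens = np.asarray(cens, dtype=bool)
    return (gumbel_l.logpdf(x[~cens], loc=mu, scale=sigma).sum()
            + gumbel_l.logsf(x[cens], loc=mu, scale=sigma).sum())

# Simulate data analogous to the MATLAB example and censor at 7
rng = np.random.default_rng(0)
x = gumbel_l.rvs(loc=5, scale=2.5, size=50, random_state=rng)
cens = x > 7
x[cens] = 7.0
print(censored_loglik(x, cens, 5.0, 2.5))
```

Maximizing this function over mu and sigma is exactly what mle does internally when given the 'pdf', 'cdf', and 'censoring' arguments.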
We'll fit the model to simulated data. The first step is to generate some uncensored extreme value data.
rng(0,'twister');
n = 50;
mu = 5;
sigma = 2.5;
x = evrnd(mu,sigma,n,1);
Next, we censor any values that are larger than a predetermined cutoff by replacing those values with the cutoff value. This is known as Type I censoring, and with the cutoff at 7, about 12% of the original data end up being censored.
c = (x > 7); x(c) = 7; sum(c)/length(c)
ans = 0.1200
We can plot a histogram of these data, including a stacked bar to represent the censored observations.
[uncensCnts,binCtrs] = hist(x(~c));
censCnts = hist(x(c),binCtrs);
bar(binCtrs,[uncensCnts' censCnts'],'stacked');
Although there is censoring, the fraction of censored observations is relatively small, and so the method of moments can provide a reasonable starting point for the parameter estimates. We compute the values of mu and sigma that correspond to the observed mean and standard deviation of the uncensored data.
sigma0 = sqrt(6)*std(x(~c))./pi
mu0 = mean(x(~c)) - psi(1).*sigma0

sigma0 =
    2.3495
mu0 =
    3.5629
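As a quick sanity check of those method-of-moments formulas, here is an illustrative Python/SciPy cross-check (assuming scipy.stats.gumbel_l matches MATLAB's extreme value distribution): for this distribution, std = sigma*pi/sqrt(6) and mean = mu + psi(1)*sigma, where psi(1) is the digamma function at 1 (the negative of the Euler-Mascheroni constant). Inverting these relations at the exact population moments recovers mu and sigma.

```python
import numpy as np
from scipy.special import digamma
from scipy.stats import gumbel_l

mu, sigma = 5.0, 2.5
m, v = gumbel_l.stats(loc=mu, scale=sigma, moments='mv')  # exact mean, variance
sd = np.sqrt(v)

sigma0 = np.sqrt(6) * sd / np.pi   # MATLAB: sqrt(6)*std(x)./pi
mu0 = m - digamma(1) * sigma0      # MATLAB: mean(x) - psi(1).*sigma0
print(mu0, sigma0)                 # approximately (5.0, 2.5)
```

With sample moments from censored data, as in the MATLAB code above, the result is only a rough starting point, but that is all mle needs.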
In addition to passing the data, x, and handles to the PDF and CDF functions into mle, we also use the 'censoring' parameter to pass in the censoring vector, c. Because the scale parameter, sigma, must be positive, we specify lower parameter bounds. mle returns the maximum likelihood estimates of the two extreme value distribution parameters, mu and sigma, as well as approximate 95% confidence intervals.
[paramEsts,paramCIs] = mle(x, 'censoring',c, 'pdf',@evpdf, 'cdf',@evcdf, ...
                           'start',[mu0 sigma0], 'lower',[-Inf,0])

paramEsts =
    4.5530    3.0215
paramCIs =
    3.6455    2.2937
    5.4605    3.7494
Some Numerical Issues in Fitting Custom Distributions
Fitting a custom distribution requires an initial guess for the parameters, and it's often difficult to know how good or bad a starting point is a priori. In the previous example, if we had picked a starting point that was farther away from the maximum likelihood estimates, some of the observations could have been very far out in the tails of the extreme value distribution corresponding to the starting point. One of two things might then have happened.
First, one of the PDF values might have become so small that it underflowed to zero in double precision arithmetic. Second, one of the CDF values might have become so close to 1 that it rounded up in double precision. (It's also possible that a CDF value might have become so small as to underflow, but that turns out not to be a problem.)
Either of these conditions causes problems when mle computes the loglikelihood, because they lead to loglikelihood values of -Inf, and the optimization algorithm in mle cannot normally be expected to step out of such regions.
Knowing what the maximum likelihood estimates are, let's see what happens with a different starting point.
start = [1 1];
try
    [paramEsts,paramCIs] = mle(x, 'censoring',c, 'pdf',@evpdf, 'cdf',@evcdf, ...
                               'start',start, 'lower',[-Inf,0])
catch ME
    disp(ME.message)
end
The CDF function returned values greater than or equal to 1.
In this case, the second problem has occurred: Some of the CDF values at the initial parameter guess are computed as exactly 1, and so the loglikelihood is infinite. We could try setting mle's 'FunValCheck' control parameter to 'off', which would disable checking for nonfinite likelihood values, and then hope for the best. But the right way to solve this numerical problem is at its root, and in this case it's not hard to do.
Notice that the extreme value CDF is of the form
p = 1 - exp( -exp((x-mu)./sigma) )
The contribution of the censored observations to the loglikelihood is the log of their survival function (SF) values, i.e., log(1-CDF). For the extreme value distribution, the log of the SF is just -exp((x-mu)./sigma). If we could compute the loglikelihood using the log SF directly (instead of, in effect, computing log(1 - (1-exp(logSF)))), we would avoid the rounding issues with the CDF. That's because observations whose CDF values are not distinguishable from 1 in double precision have log SF values that are still easily representable as nonzero values. For example, a CDF value of (1 - 1e-20) rounds to 1 in double precision, because double precision eps is about 2e-16.
SFval = 1e-20;
CDFval = 1 - SFval
CDFval = 1
However, the log of the corresponding SF value, i.e., log(1-CDF), is easily represented.
log(SFval)
ans = -46.0517
A similar observation can be made about using the log PDF rather than the PDF itself: the contribution of uncensored observations to the loglikelihood is the log of their PDF values. Using the log PDF directly (instead of, in effect, computing log(exp(logPDF))) avoids underflow problems where the PDF is not distinguishable from zero in double precision, but the log PDF is still easily representable as a finite negative number. For example, a PDF value of 1e-400 underflows in double precision, because double precision realmin is about 2e-308.
logPDFval = -921;
PDFval = exp(logPDFval)
PDFval = 0
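The same behavior can be seen directly on a distribution's density. This illustrative Python/SciPy snippet (assuming scipy.stats.gumbel_l matches MATLAB's extreme value distribution) evaluates the PDF far out in the tail, where it underflows to exactly zero, while the log PDF is still a perfectly representable finite negative number.

```python
from scipy.stats import gumbel_l

# Far in the upper tail of the standardized minimum extreme value
# distribution: pdf = exp(x - exp(x)) underflows, logpdf = x - exp(x) does not
pdfval = gumbel_l.pdf(50.0)
logpdfval = gumbel_l.logpdf(50.0)
print(pdfval, logpdfval)   # 0.0 and a finite negative number
```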
mle provides a syntax for specifying a custom distribution using the log PDF and the log SF (rather than the PDF and CDF), via the 'logpdf' and 'logsf' parameters. Unlike the PDF and CDF, there are no built-in functions for the log PDF and log SF of the extreme value distribution, so we'll create anonymous functions that compute these values:
evlogpdf = @(x,mu,sigma) ((x - mu) ./ sigma - exp((x - mu) ./ sigma)) - log(sigma);
evlogsf = @(x,mu,sigma) -exp((x - mu) ./ sigma);
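These formulas can be cross-checked against an independent reference implementation. The illustrative Python/SciPy sketch below (assuming scipy.stats.gumbel_l is the same minimum-type extreme value distribution) re-implements the two anonymous functions and compares them with the library's own logpdf and logsf.

```python
import numpy as np
from scipy.stats import gumbel_l

def evlogpdf(x, mu, sigma):
    # log PDF: z - exp(z) - log(sigma), with z = (x - mu)/sigma
    z = (x - mu) / sigma
    return (z - np.exp(z)) - np.log(sigma)

def evlogsf(x, mu, sigma):
    # log survival function: -exp(z)
    return -np.exp((x - mu) / sigma)

x = np.linspace(-5.0, 10.0, 7)
mu, sigma = 5.0, 2.5
print(np.allclose(evlogpdf(x, mu, sigma), gumbel_l.logpdf(x, loc=mu, scale=sigma)))  # True
print(np.allclose(evlogsf(x, mu, sigma), gumbel_l.logsf(x, loc=mu, scale=sigma)))    # True
```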
Using the same starting point, the alternate logPDF/logSF specification of the extreme value distribution makes the problem solvable:
start = [1 1];
[paramEsts,paramCIs] = mle(x, 'censoring',c, 'logpdf',evlogpdf, 'logsf',evlogsf, ...
                           'start',start, 'lower',[-Inf,0])

paramEsts =
    4.5530    3.0215
paramCIs =
    3.6455    2.2937
    5.4605    3.7494
However, this strategy cannot always mitigate a poor starting point, and a careful choice of starting point is always recommended.
By default, mle uses the function fminsearch to find parameter values that maximize the loglikelihood for the data. fminsearch uses a derivative-free optimization algorithm and is often a good choice.
However, for some problems, choosing an optimization algorithm that uses the derivatives of the loglikelihood function can make the difference between converging to the maximum likelihood estimates or not, especially when the starting point is far away from the final answer. Providing the derivatives can also sometimes speed up the convergence.
If your MATLAB® installation includes the Optimization Toolbox™, mle allows you to use the function fmincon, which includes optimization algorithms that can use derivative information. To take best advantage of the algorithms in fmincon, you can specify a custom distribution using a loglikelihood function, written to return not only the loglikelihood itself, but its gradient as well. The gradient of the loglikelihood function is simply the vector of its partial derivatives with respect to its parameters.
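Before handing analytic derivatives to an optimizer, it is good practice to verify them against finite differences. The illustrative Python/NumPy sketch below mirrors the negative loglikelihood and gradient of evnegloglike.m (shown next) and checks the analytic gradient numerically; scipy.stats.gumbel_l is assumed to match MATLAB's extreme value distribution for the simulated data.

```python
import numpy as np
from scipy.stats import gumbel_l

def evnegloglike(params, x, cens):
    """Negative loglikelihood and gradient, mirroring evnegloglike.m."""
    mu, sigma = params
    z = (x - mu) / sigma
    expz = np.exp(z)
    nunc = np.sum(~cens)                      # number of uncensored points
    nll = expz.sum() - z[~cens].sum() + nunc * np.log(sigma)
    ngrad = np.array([-expz.sum() / sigma + nunc / sigma,
                      -(z * expz).sum() / sigma
                      + z[~cens].sum() / sigma + nunc / sigma])
    return nll, ngrad

rng = np.random.default_rng(0)
x = gumbel_l.rvs(loc=5, scale=2.5, size=50, random_state=rng)
cens = x > 7
x[cens] = 7.0

params = np.array([4.5, 3.0])
nll, ngrad = evnegloglike(params, x, cens)

# Central finite differences should agree closely with the analytic gradient
h = 1e-5
num = np.array([(evnegloglike(params + h*e, x, cens)[0]
                 - evnegloglike(params - h*e, x, cens)[0]) / (2*h)
                for e in np.eye(2)])
print(np.max(np.abs(num - ngrad)))
```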
This strategy requires extra preparation: writing code that computes both the loglikelihood and its gradient. For this example, we've created code to do that for the extreme value distribution in a separate file, evnegloglike.m.
type evnegloglike.m
function [nll,ngrad] = evnegloglike(params,x,cens,freq)
%EVNEGLOGLIKE Negative loglikelihood for the extreme value distribution.

%   Copyright 1984-2004 The MathWorks, Inc.

mu = params(1);
sigma = params(2);
nunc = sum(1-cens);
z = (x - mu) ./ sigma;
expz = exp(z);
nll = sum(expz) - sum(z(~cens)) + nunc.*log(sigma);
if nargout > 1
    ngrad = [-sum(expz)./sigma + nunc./sigma, ...
             -sum(z.*expz)./sigma + sum(z(~cens))./sigma + nunc./sigma];
end
Notice that the function evnegloglike returns the negatives of both the loglikelihood values and the gradient values, because mle minimizes the negative loglikelihood.
To compute the maximum likelihood estimates using a gradient-based optimization algorithm, we use the 'nloglf' parameter, specifying that we are providing a handle to a function that computes the negative loglikelihood, and the 'optimfun' parameter, specifying fmincon as the optimization function. mle will automatically detect that evnegloglike can return both the negative loglikelihood and its gradient.
start = [1 1];
[paramEsts,paramCIs] = mle(x, 'censoring',c, 'nloglf',@evnegloglike, ...
                           'start',start, 'lower',[-Inf,0], 'optimfun','fmincon')

paramEsts =
    4.5530    3.0215
paramCIs =
    3.6455    2.2937
    5.4605    3.7494