## Maximum Likelihood Estimation with Missing Data

### Introduction

Suppose that a portion of the sample data is missing, where
missing values are represented as `NaN`s. If the
missing values are missing-at-random and ignorable (Little and
Rubin [7] give precise definitions of these terms), you can use a version of the Expectation
Maximization (EM) algorithm of Dempster, Laird, and Rubin [3] to estimate the parameters of the
multivariate normal regression model. The algorithm used in Financial Toolbox™ software
is the ECM (Expectation Conditional Maximization) algorithm of Meng
and Rubin [8] with enhancements
by Sexton and Swensen [9].

Each sample $${z}_{k}$$, for *k* = 1, ..., *m*, is either complete with no missing values, empty with no observed values, or incomplete with both observed and missing values. Empty samples are ignored since they contribute no information.
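The three sample types can be illustrated with a minimal Python sketch. The function name `classify_sample` and the list-of-floats representation are assumptions made for this illustration only; they are not part of the toolbox.

```python
import math

def classify_sample(z):
    """Label one sample (a list of floats) by its pattern of NaN entries."""
    n_missing = sum(1 for v in z if math.isnan(v))
    if n_missing == 0:
        return "complete"      # no missing values
    if n_missing == len(z):
        return "empty"         # no observed values
    return "incomplete"        # both observed and missing values

nan = float("nan")
samples = [[1.0, 2.0], [nan, nan], [nan, 0.5]]
labels = [classify_sample(z) for z in samples]

# Empty samples carry no information, so they are dropped before estimation.
usable = [z for z, label in zip(samples, labels) if label != "empty"]
```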

To understand the missing-at-random and ignorable conditions, consider an example of stock
price data before an IPO. As a counterexample, censored data, in which all values
greater than some cutoff are replaced with `NaN`s, does not satisfy
these conditions, because whether a value is missing depends on the value itself.

In sample *k*, let $${x}_{k}$$ represent the missing values in $${z}_{k}$$ and $${y}_{k}$$ represent the observed values. Define a permutation matrix $${P}_{k}$$ so that

$${z}_{k}={P}_{k}\left[\begin{array}{c}{x}_{k}\\ {y}_{k}\end{array}\right]$$

for *k* = 1, ..., *m*.
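As an illustration of this partition, the following Python sketch builds such a permutation matrix from a missing-value mask. The helper names `permutation_matrix` and `matvec` are hypothetical, and the example uses known values with a separate Boolean mask (rather than actual `NaN`s) so that the identity can be checked numerically.

```python
def permutation_matrix(missing):
    """Return (P, order) where order lists missing indices first, then observed.

    P is built so that z = P @ [x; y], with x the missing block and y the
    observed block of the sample z.
    """
    n = len(missing)
    order = [i for i in range(n) if missing[i]] + \
            [i for i in range(n) if not missing[i]]
    P = [[0] * n for _ in range(n)]
    for j, i in enumerate(order):
        P[i][j] = 1            # stacked element j maps back to position i of z
    return P, order

def matvec(P, v):
    """Plain matrix-vector product for lists."""
    return [sum(p * x for p, x in zip(row, v)) for row in P]

z = [10.0, 20.0, 30.0]
missing = [False, True, False]       # pretend the second element is missing
P, order = permutation_matrix(missing)
stacked = [z[i] for i in order]      # [x; y]: missing block first
recovered = matvec(P, stacked)       # should reproduce z
```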

### ECM Algorithm

The ECM algorithm has two steps: an E, or expectation, step and a CM, or conditional
maximization, step. As with maximum likelihood estimation, the parameter estimates
evolve according to an iterative process, where estimates for the parameters after
*t* iterations are denoted as $${b}^{\left(t\right)}$$ and $${C}^{\left(t\right)}$$.

The E step forms conditional expectations for the elements of missing data
with

$$\begin{array}{l}E\left[{X}_{k}{|Y}_{k}={y}_{k};\text{\hspace{0.17em}}{b}^{\left(t\right)},\text{\hspace{0.17em}}{C}^{\left(t\right)}\right]\\ cov\left[{X}_{k}{|Y}_{k}={y}_{k};\text{\hspace{0.17em}}{b}^{\left(t\right)},\text{\hspace{0.17em}}{C}^{\left(t\right)}\right]\end{array}$$

for each sample $$k\in \left\{1,\dots ,m\right\}$$ that has missing data.
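For reference, these conditional moments have a closed form under standard multivariate normal theory. As a sketch, suppose (this block notation is introduced here for illustration and is not used elsewhere in this section) that the current estimates imply means $${\mu }_{x}$$ and $${\mu }_{y}$$ for the missing and observed parts of sample *k*, with covariance blocks $${\Sigma }_{xx}$$, $${\Sigma }_{xy}$$, and an invertible $${\Sigma }_{yy}$$. Then

$$\begin{array}{l}E\left[{X}_{k}|{Y}_{k}={y}_{k}\right]={\mu }_{x}+{\Sigma }_{xy}{\Sigma }_{yy}^{-1}\left({y}_{k}-{\mu }_{y}\right)\\ cov\left[{X}_{k}|{Y}_{k}={y}_{k}\right]={\Sigma }_{xx}-{\Sigma }_{xy}{\Sigma }_{yy}^{-1}{\Sigma }_{xy}^{T}\end{array}$$

The conditional covariance does not depend on the observed values themselves, only on which elements are observed.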

The CM step proceeds in the same manner as the maximum likelihood procedure without missing data. The main difference is that missing data moments are imputed from the conditional expectations obtained in the E step.

The E and CM steps are repeated until the log-likelihood function ceases to increase. An important property of the ECM algorithm is that it is guaranteed to find a maximum of the log-likelihood function and, under suitable conditions, this maximum can be a global maximum.
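The iteration can be sketched in Python for a deliberately simplified case. This is not the toolbox implementation: it is plain EM for the mean and covariance of a bivariate normal (no regression design), it assumes that only the first component can be missing, and the names `loglik` and `em_bivariate` are hypothetical. It does show the same pattern of an expectation step, a maximization step that reuses the complete-data formulas, and a stopping rule based on the log-likelihood.

```python
import math

def loglik(data, mx, my, sxx, sxy, syy):
    """Observed-data log-likelihood: bivariate density for complete pairs,
    marginal density of y when x is missing."""
    det = sxx * syy - sxy ** 2
    total = 0.0
    for x, y in data:
        dy = y - my
        if math.isnan(x):
            total += -0.5 * (math.log(2 * math.pi * syy) + dy * dy / syy)
        else:
            dx = x - mx
            quad = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
            total += -math.log(2 * math.pi) - 0.5 * (math.log(det) + quad)
    return total

def em_bivariate(data, tol=1e-12, max_iter=1000):
    """data: list of (x, y) pairs; x may be NaN, y is assumed always observed."""
    n = len(data)
    x_obs = [x for x, _ in data if not math.isnan(x)]
    # Start from available-value moments with zero covariance.
    mx = sum(x_obs) / len(x_obs)
    my = sum(y for _, y in data) / n          # y is complete, so my and syy
    sxx = sum((x - mx) ** 2 for x in x_obs) / len(x_obs)
    syy = sum((y - my) ** 2 for _, y in data) / n   # never change afterwards
    sxy = 0.0
    history = []
    for _ in range(max_iter):
        ll = loglik(data, mx, my, sxx, sxy, syy)
        if history and ll - history[-1] <= tol:
            history.append(ll)
            break                              # log-likelihood stopped increasing
        history.append(ll)
        # E step: expected x and x^2 given y and the current estimates.
        beta = sxy / syy
        cond_var = sxx - sxy * beta
        ex, ex2 = [], []
        for x, y in data:
            if math.isnan(x):
                xhat = mx + beta * (y - my)
                ex.append(xhat)
                ex2.append(xhat * xhat + cond_var)
            else:
                ex.append(x)
                ex2.append(x * x)
        # M step: complete-data ML formulas applied to the expected moments.
        mx = sum(ex) / n
        sxx = sum(ex2) / n - mx * mx
        sxy = sum(e * y for e, (_, y) in zip(ex, data)) / n - mx * my
    return (mx, my, sxx, sxy, syy), history

nan = float("nan")
data = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (nan, 5.0), (nan, 6.0)]
(mx, my, sxx, sxy, syy), history = em_bivariate(data)
```

Note that the E step adds the conditional variance to the expected second moment. Imputing only the conditional mean would understate the variance of the missing component.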

### Standard Errors

The negative of the expected Hessian of the log-likelihood function and the Fisher information matrix are identical if no data is missing. However, if data is missing, the Hessian, which is computed over available samples, accounts for the loss of information due to missing data. So, the Fisher information matrix provides standard errors that are a Cramér-Rao lower bound whereas the Hessian matrix provides standard errors that may be greater if there is missing data.

### Data Augmentation

The ECM functions do not “fill in” missing values as they estimate model parameters. In some cases, you may want to fill in the missing values. Although you can fill in the missing values in your data with conditional expectations, you would get optimistic and unrealistic estimates because conditional estimates are not random realizations.

Several approaches are possible, including resampling methods and multiple imputation (see Little and Rubin [7] and Schafer [10] for details). A somewhat informal sampling method for data augmentation is to form random samples for missing values based on the conditional distribution for the missing values. Given parameter estimates $$\widehat{b}$$ and $$\widehat{C}$$, each observation has moments

$$E\left[{Z}_{k}\right]={H}_{k}\widehat{b}$$

and

$$cov\left({Z}_{k}\right)={H}_{k}\widehat{C}{H}_{k}{}^{T}$$

for *k* = 1, ..., *m*, where
the parameter dependence on the left sides is dropped for notational
convenience.

For observations with missing values, partitioned into missing values $${X}_{k}$$ and observed values $${Y}_{k}={y}_{k}$$, you can form conditional estimates for any subcollection of random variables within a given observation. Thus, given estimates $$E\left[{Z}_{k}\right]$$ and $$cov\left({Z}_{k}\right)$$ based on the parameter estimates, you can create conditional estimates

$$E\left[{X}_{k}{|y}_{k}\right]$$

and

$$cov\left({X}_{k}{|y}_{k}\right)$$

using standard multivariate normal distribution theory. Given these conditional estimates, you can simulate random samples for the missing values from the conditional distribution

$${X}_{k}\sim N\left(E\left[{X}_{k}|{y}_{k}\right],\text{\hspace{0.17em}}cov\left({X}_{k}|{y}_{k}\right)\right).$$

The samples from this distribution reflect
the pattern of missing and nonmissing values for observations *k* =
1, ..., *m*. You must sample from conditional distributions
for each observation to preserve the correlation structure with the
nonmissing values at each observation.
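The sampling step can be sketched in Python for the simplest case of a bivariate normal in which only the first component is missing. The function names `conditional_moments` and `fill_in`, and the parameter values, are hypothetical and chosen only for illustration. In practice the estimates would come from an ECM fit.

```python
import math
import random

def conditional_moments(y, mx, my, sxx, sxy, syy):
    """Mean and variance of X given Y = y for a bivariate normal."""
    beta = sxy / syy
    return mx + beta * (y - my), sxx - beta * sxy

def fill_in(data, params, rng):
    """Replace each NaN x by one random draw from its conditional distribution.

    Each draw is conditioned on the y observed in the same sample, which is
    what preserves the correlation with the nonmissing values.
    """
    filled = []
    for x, y in data:
        if math.isnan(x):
            mean, var = conditional_moments(y, *params)
            x = rng.gauss(mean, math.sqrt(var))
        filled.append((x, y))
    return filled

rng = random.Random(0)                  # fixed seed for reproducibility
params = (0.0, 0.0, 2.0, 1.0, 2.0)      # hypothetical mx, my, sxx, sxy, syy
data = [(0.3, -1.0), (float("nan"), 2.0)]
filled = fill_in(data, params, rng)
```

Observed values pass through unchanged. Only the missing entry is replaced, and a different seed gives a different, equally valid, filled-in value.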

If you follow this procedure, the resultant filled-in values are random and generate mean and covariance estimates that are asymptotically equivalent to the ECM-derived mean and covariance estimates. Note, however, that the filled-in values reflect likely samples from the distribution estimated over all the data and may not reflect “true” values for a particular observation.

## See Also

`mvnrmle` | `mvnrstd` | `mvnrfish` | `mvnrobj` | `ecmmvnrmle` | `ecmmvnrstd` | `ecmmvnrfish` | `ecmmvnrobj` | `ecmlsrmle` | `ecmlsrobj` | `ecmnmle` | `ecmnstd` | `ecmnfish` | `ecmnhess` | `ecmnobj` | `convert2sur` | `ecmninit`