# Documentation

## Maximum Likelihood Estimation with Missing Data

### Introduction

Suppose that a portion of the sample data is missing, where missing values are represented as `NaN`s. If the missing values are missing-at-random and ignorable, where Little and Rubin [7] have precise definitions for these terms, it is possible to use a version of the Expectation Maximization, or EM, algorithm of Dempster, Laird, and Rubin [3] to estimate the parameters of the multivariate normal regression model. The algorithm used in Financial Toolbox™ software is the ECM (Expectation Conditional Maximization) algorithm of Meng and Rubin [8] with enhancements by Sexton and Swensen [9].

Each sample zk, for k = 1, ..., m, is either complete with no missing values, empty with no observed values, or incomplete with both observed and missing values. Empty samples are ignored, since they contribute no information.

To understand the missing-at-random and ignorable conditions, consider an example of stock price data before an IPO: the values are missing simply because the stock did not yet trade, not because of their magnitudes. For a counterexample, censored data, in which all values greater than some cutoff are replaced with `NaN`s, does not satisfy these conditions, since whether a value is missing depends on the value itself.

In sample k, let xk represent the missing values in zk and let yk represent the observed values. Define a permutation matrix Pk so that

${z}_{k}={P}_{k}\left[\begin{array}{c}{x}_{k}\\ {y}_{k}\end{array}\right]$

for k = 1, ..., m.
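As a concrete sketch of this partition (in NumPy rather than MATLAB; the sample values and fill-ins are made up), both the observed/missing split and the permutation matrix Pk follow directly from the `NaN` pattern:

```python
import numpy as np

# Hypothetical 4-series sample with missing values coded as NaN.
z = np.array([1.2, np.nan, 0.7, np.nan])

miss = np.isnan(z)                  # positions of x_k (missing values)
x_idx = np.flatnonzero(miss)        # -> [1, 3]
y_idx = np.flatnonzero(~miss)       # -> [0, 2]

# Permutation matrix P_k mapping the stacked vector [x_k; y_k] back to z_k.
n = z.size
P = np.zeros((n, n))
P[x_idx, np.arange(x_idx.size)] = 1
P[y_idx, x_idx.size + np.arange(y_idx.size)] = 1

# Once the E step supplies fill-ins for x_k (dummy values here),
# applying P_k restores the original ordering of the series.
x_fill = np.array([9.9, 8.8])       # stand-ins for conditional expectations
restored = P @ np.concatenate([x_fill, z[~miss]])
# restored -> [1.2, 9.9, 0.7, 8.8]
```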

### ECM Algorithm

The ECM algorithm has two steps: an E, or expectation, step and a CM, or conditional maximization, step. As with maximum likelihood estimation, the parameter estimates evolve through an iterative process, where the estimates after t iterations are denoted ${b}^{\left(t\right)}$ and ${C}^{\left(t\right)}$.

The E step forms conditional expectations for the elements of missing data with

$\begin{array}{l}E\left[{X}_{k}{|Y}_{k}={y}_{k};\text{\hspace{0.17em}}{b}^{\left(t\right)},\text{\hspace{0.17em}}{C}^{\left(t\right)}\right]\\ cov\left[{X}_{k}{|Y}_{k}={y}_{k};\text{\hspace{0.17em}}{b}^{\left(t\right)},\text{\hspace{0.17em}}{C}^{\left(t\right)}\right]\end{array}$

for each sample $k\in \left\{1,\dots ,m\right\}$ that has missing data.
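These conditional moments follow from standard multivariate normal theory. A NumPy sketch (illustrative only, not the toolbox code) of the computation for one sample:

```python
import numpy as np

def conditional_moments(mu, C, x_idx, y_idx, y):
    """E[X_k | Y_k = y] and cov(X_k | Y_k = y) for Z_k ~ N(mu, C),
    partitioned into missing (x_idx) and observed (y_idx) components."""
    # W holds the regression coefficients of the missing block on the observed block.
    W = C[np.ix_(x_idx, y_idx)] @ np.linalg.inv(C[np.ix_(y_idx, y_idx)])
    cond_mean = mu[x_idx] + W @ (y - mu[y_idx])
    cond_cov = C[np.ix_(x_idx, x_idx)] - W @ C[np.ix_(x_idx, y_idx)].T
    return cond_mean, cond_cov

# Toy 2-variable example: the second series is observed, the first is missing.
mu = np.array([0.0, 0.0])
C = np.array([[2.0, 1.0],
              [1.0, 1.0]])
cm, cc = conditional_moments(mu, C, np.array([0]), np.array([1]), np.array([2.0]))
# cm -> [2.0], cc -> [[1.0]]
```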

The CM step proceeds in the same manner as the maximum likelihood procedure without missing data. The main difference is that missing data moments are imputed from the conditional expectations obtained in the E step.

The E and CM steps are repeated until the log-likelihood function ceases to increase. An important property of the ECM algorithm is that it is guaranteed to find a maximum of the log-likelihood function and, under suitable conditions, this maximum can be a global maximum.
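To make the iteration concrete, here is a minimal EM-style sketch in NumPy for the special case of estimating a mean and covariance from `NaN`-coded data. This is an illustration of the E-then-maximize loop under stated simplifications, not the toolbox's ECM implementation; the initial estimates and tolerance are ad hoc choices:

```python
import numpy as np

def em_mvn(data, n_iter=200, tol=1e-8):
    """Estimate the mean and covariance of NaN-coded multivariate normal
    data by alternating expectation and maximization steps.
    Illustrative sketch only."""
    data = np.asarray(data, dtype=float)
    data = data[~np.isnan(data).all(axis=1)]   # empty samples are ignored
    miss = np.isnan(data)
    m, n = data.shape

    mu = np.nanmean(data, axis=0)              # ad hoc initial estimates
    C = np.diag(np.nanvar(data, axis=0) + 1e-6)

    for _ in range(n_iter):
        # E step: impute missing values with conditional expectations and
        # accumulate the conditional covariances of the missing blocks.
        z_hat = np.where(miss, 0.0, data)
        S = np.zeros((n, n))
        for k in range(m):
            x = np.flatnonzero(miss[k])        # missing positions in sample k
            if x.size == 0:
                continue
            y = np.flatnonzero(~miss[k])       # observed positions
            W = C[np.ix_(x, y)] @ np.linalg.inv(C[np.ix_(y, y)])
            z_hat[k, x] = mu[x] + W @ (data[k, y] - mu[y])
            S[np.ix_(x, x)] += C[np.ix_(x, x)] - W @ C[np.ix_(x, y)].T
        # Maximization step: update the mean, then the covariance.
        mu_new = z_hat.mean(axis=0)
        d = z_hat - mu_new
        C_new = (d.T @ d + S) / m
        converged = (np.abs(mu_new - mu).max() < tol
                     and np.abs(C_new - C).max() < tol)
        mu, C = mu_new, C_new
        if converged:
            break
    return mu, C
```

With no missing values, the loop reproduces the ordinary sample mean and (biased) sample covariance after one pass, which matches the maximum likelihood estimates.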

### Standard Errors

The negative of the expected Hessian of the log-likelihood function and the Fisher information matrix are identical if no data is missing. However, if data is missing, the Hessian, which is computed over the available samples, accounts for the loss of information caused by the missing data. Consequently, the Fisher information matrix provides standard errors that form a Cramér-Rao lower bound, whereas the Hessian matrix provides standard errors that may be greater if data is missing.

### Data Augmentation

The ECM functions do not "fill in" missing values as they estimate model parameters. In some cases, you may want to fill in the missing values. Although you can fill in the missing values in your data with conditional expectations, you would get optimistic and unrealistic estimates because conditional estimates are not random realizations.

Several approaches are possible, including resampling methods and multiple imputation (see Little and Rubin [7] and Shafer [10] for details). A somewhat informal sampling method for data augmentation is to form random samples for the missing values based on the conditional distribution for the missing values. Given parameter estimates $\stackrel{^}{b}$ and $\stackrel{^}{C}$, each observation has moments

$E\left[{Z}_{k}\right]={H}_{k}\stackrel{^}{b}$

and

$cov\left({Z}_{k}\right)=\stackrel{^}{C}$

for k = 1, ..., m, where the dependence on the parameter estimates is dropped from the left-hand sides for notational convenience.

For observations with missing values partitioned into missing values Xk and observed values Yk = yk, you can form conditional estimates for any subcollection of random variables within a given observation. Thus, given estimates E[ Zk ] and cov(Zk) based on the parameter estimates, you can create conditional estimates

$E\left[{X}_{k}{|y}_{k}\right]$

and

$cov\left({X}_{k}{|y}_{k}\right)$

using standard multivariate normal distribution theory. Given these conditional estimates, you can simulate random samples for the missing values from the conditional distribution

${X}_{k}\sim N\left(E\left[{X}_{k}|{y}_{k}\right],\text{\hspace{0.17em}}cov\left({X}_{k}|{y}_{k}\right)\right).$

The samples from this distribution reflect the pattern of missing and nonmissing values for observations k = 1, ..., m. You must sample from conditional distributions for each observation to preserve the correlation structure with the nonmissing values at each observation.

If you follow this procedure, the resultant filled-in values are random and generate mean and covariance estimates that are asymptotically equivalent to the ECM-derived mean and covariance estimates. Note, however, that the filled-in values are random and reflect likely samples from the distribution estimated over all the data and may not reflect "true" values for a particular observation.
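The sampling step described above can be sketched as follows (NumPy; the function name, example values, and random seed are illustrative):

```python
import numpy as np

def augment(z, mu, C, rng):
    """Replace the NaN entries of z with one random draw from the
    conditional normal N(E[X_k|y_k], cov(X_k|y_k)) implied by (mu, C)."""
    z = np.array(z, dtype=float)
    x = np.flatnonzero(np.isnan(z))     # missing positions
    if x.size == 0:
        return z                        # complete sample: nothing to fill in
    y = np.flatnonzero(~np.isnan(z))    # observed positions
    W = C[np.ix_(x, y)] @ np.linalg.inv(C[np.ix_(y, y)])
    cond_mean = mu[x] + W @ (z[y] - mu[y])
    cond_cov = C[np.ix_(x, x)] - W @ C[np.ix_(x, y)].T
    # Sampling (rather than plugging in cond_mean) keeps the fill-ins random,
    # which avoids the overly optimistic estimates noted above.
    z[x] = rng.multivariate_normal(cond_mean, cond_cov)
    return z

rng = np.random.default_rng(0)
mu = np.zeros(2)
C = np.array([[1.0, 0.9],
              [0.9, 1.0]])
filled = augment([np.nan, 1.0], mu, C, rng)   # observed value stays at 1.0
```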

### Multivariate Normal Regression Functions

Financial Toolbox software has a number of functions for multivariate normal regression with or without missing data. The toolbox functions solve four classes of regression problems with functions to estimate parameters, standard errors, log-likelihood functions, and Fisher information matrices. The four classes of regression problems are:

- Multivariate Normal Regression Without Missing Data
- Multivariate Normal Regression with Missing Data
- Least-Squares Regression with Missing Data
- Multivariate Normal Parameter Estimation with Missing Data

Additional support functions are also provided; see Support Functions.

In all functions, the MATLAB® representation for the number of observations (or samples) is `NumSamples` = m, the number of data series is `NumSeries` = n, and the number of model parameters is `NumParams` = p. The moment estimation functions have `NumSeries = NumParams`.

The collection of observations (or samples) is stored in a MATLAB matrix `Data` such that

$\text{Data}\left(k,:\right)={z}_{k}^{T}$

for `k = 1, ..., NumSamples`, where `Data` is a `NumSamples`-by-`NumSeries` matrix.

For the multivariate normal regression or least-squares functions, an additional required input is the collection of design matrices that is stored as either a MATLAB matrix or a vector of cell arrays denoted as `Design`.

If `NumSeries = 1`, `Design` can be a `NumSamples`-by-`NumParams` matrix. This is the "standard" form for regression on a single data series.

If `NumSeries > 1`, `Design` can be either a cell array with a single cell or a cell array with `NumSamples` cells. Each cell in the cell array contains a `NumSeries`-by-`NumParams` matrix such that

$\text{Design}\left\{\text{k}\right\}={H}_{k}$

for `k = 1, ..., NumSamples`. If `Design` has a single cell, it is assumed to be the same `Design` matrix for each sample such that

$\text{Design}\left\{1\right\}={H}_{1}=\dots ={H}_{m}.$

Otherwise, `Design` must contain individual design matrices for each sample.
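A sketch of the two `Design` conventions (in NumPy lists rather than MATLAB cell arrays; the dimensions, parameter values, and design matrices are made up for illustration):

```python
import numpy as np

# Hypothetical dimensions: m samples, n series, p model parameters.
m, n, p = 3, 2, 4
b = np.arange(p, dtype=float)                  # parameter vector

# "Single cell" convention: one design matrix shared by every sample.
H_shared = np.ones((n, p))
design_single = [H_shared]

# Per-sample convention: one NumSeries-by-NumParams matrix per sample.
design_full = [np.ones((n, p)) for _ in range(m)]

def mean_of_sample(design, k):
    """E[Z_k] = H_k b, reusing the single design matrix when only one is given."""
    H = design[0] if len(design) == 1 else design[k]
    return H @ b

mu_k = mean_of_sample(design_single, 2)
```

Because every sample here shares the same design matrix, the two conventions give identical sample means.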

The main distinction among the four classes of regression problems is how missing values, represented as the MATLAB value `NaN`, are handled. If a sample is ignored whenever it contains any missing values, the problem is said to be a problem "without missing data." If a sample is ignored only when every element of the sample is missing, the problem is said to be a problem "with missing data," since the estimation must account for possible `NaN` values in the data.

In general, `Data` may or may not have missing values and `Design` should have no missing values. In some cases, however, if an observation in `Data` is to be ignored, the corresponding elements in `Design` are also ignored. Consult the function reference pages for details.

### Multivariate Normal Regression Without Missing Data

You can use the following functions for multivariate normal regression without missing data.

- `mvnrmle`: Estimate model parameters, residuals, and the residual covariance.
- `mvnrstd`: Estimate standard errors of model and covariance parameters.
- `mvnrfish`: Estimate the Fisher information matrix.
- `mvnrobj`: Calculate the log-likelihood function.

The first two functions are the main estimation functions. The second two are supporting functions that can be used for more detailed analyses.

### Multivariate Normal Regression with Missing Data

You can use the following functions for multivariate normal regression with missing data.

- `ecmmvnrmle`: Estimate model parameters, residuals, and the residual covariance.
- `ecmmvnrstd`: Estimate standard errors of model and covariance parameters.
- `ecmmvnrfish`: Estimate the Fisher information matrix.
- `ecmmvnrobj`: Calculate the log-likelihood function.

The first two functions are the main estimation functions. The second two are supporting functions used for more detailed analyses.

### Least-Squares Regression with Missing Data

You can use the following functions for least-squares regression with missing data or for covariance-weighted least-squares regression with a fixed covariance matrix.

- `ecmlsrmle`: Estimate model parameters, residuals, and the residual covariance.
- `ecmlsrobj`: Calculate the least-squares objective function (pseudo log-likelihood).

To compute standard errors and estimates for the Fisher information matrix, use the multivariate normal regression functions with missing data.

- `ecmmvnrstd`: Estimate standard errors of model and covariance parameters.
- `ecmmvnrfish`: Estimate the Fisher information matrix.

### Multivariate Normal Parameter Estimation with Missing Data

You can use the following functions to estimate the mean and covariance of multivariate normal data.

- `ecmnmle`: Estimate the mean and covariance of the data.
- `ecmnstd`: Estimate standard errors of the mean and covariance of the data.
- `ecmnfish`: Estimate the Fisher information matrix.
- `ecmnhess`: Estimate the Fisher information matrix using the Hessian.
- `ecmnobj`: Calculate the log-likelihood function.

These functions behave slightly differently from the more general regression functions since they solve a specialized problem. Consult the function reference pages for details.

### Support Functions

Two support functions are included.

- `convert2sur`: Convert a multivariate normal regression model into an SUR model.
- `ecmninit`: Obtain initial estimates for the mean and covariance of a `Data` matrix.

The `convert2sur` function converts a multivariate normal regression model into a seemingly unrelated regression, or SUR, model. The second function, `ecmninit`, is a specialized function that obtains initial ad hoc estimates for the mean and covariance of a `Data` matrix with missing data. (If there are no missing values, the estimates are the maximum likelihood estimates for the mean and covariance.)