### Model

The general model is

$$Z \sim N\left(\mathit{Mean},\ \mathit{Covariance}\right),$$

where each row of `Data` is an observation of *Z*.

Each observation of *Z* is assumed to be
iid (independent, identically distributed) multivariate normal, and
missing values are assumed to be missing at random (MAR). See Little
and Rubin [1] for a precise definition of MAR.

This routine estimates the mean and covariance from given data.
If data values are missing, the routine implements the ECM algorithm
of Meng and Rubin [2] with enhancements by Sexton and Swensen [3].

If a record is empty (every value in a sample is `NaN`), this routine
ignores the record because it contributes no information. If such
records exist in the data, the number of nonempty samples used in the
estimation is less than `NumSamples`.
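As a minimal sketch of how empty records reduce the effective sample size, the following counts the nonempty rows of a small, made-up `Data` matrix. The data values and the helper name `isEmpty` are illustrative assumptions, not part of the routine.

```matlab
% Hypothetical 5-by-3 data set: rows are samples, columns are series.
Data = [1.0 2.0 3.0; NaN NaN NaN; 0.5 NaN 2.5; NaN NaN NaN; 1.5 2.5 NaN];

NumSamples = size(Data, 1);
isEmpty = all(isnan(Data), 2);   % true where every value in the row is NaN
Count = sum(~isEmpty);           % nonempty samples that carry information

fprintf('%d of %d samples are nonempty\n', Count, NumSamples);
```

Rows that are only partially missing still count as nonempty, because they contribute information about the observed series.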

The estimate for the covariance is a biased maximum likelihood
estimate (MLE). To convert to an unbiased estimate, multiply the covariance
by `Count`/(`Count` - 1), where `Count` is the number of nonempty
samples used in the estimation.
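The bias correction can be sketched as follows. To keep the example runnable without the routine itself, it uses complete (no `NaN`) made-up data, for which the biased MLE is `cov(Data, 1)`; the variable names `CovMLE` and `CovUnbiased` are assumptions for illustration.

```matlab
% Hypothetical complete data set: 4 samples, 2 series.
Data = [2 1; 4 3; 6 8; 8 4];

Count = sum(~all(isnan(Data), 2));   % number of nonempty samples
CovMLE = cov(Data, 1);               % normalizes by Count (biased MLE)

% Rescale the biased MLE to obtain the unbiased estimate.
CovUnbiased = CovMLE * Count / (Count - 1);
```

For complete data, `CovUnbiased` agrees with `cov(Data)`, which normalizes by `Count - 1`.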

### Requirements

This routine requires consistent values for `NumSamples` and
`NumSeries`, with `NumSamples` > `NumSeries`.
It must have enough nonmissing values to converge. Finally, it must
have a positive-definite covariance matrix. Although the references
provide some necessary and sufficient conditions, general conditions
for existence and uniqueness of solutions in the missing-data case
do not exist. The main failure mode is an ill-conditioned covariance
matrix estimate. Nonetheless, this routine works for most cases that
have less than 15% missing data (a typical upper bound for financial
data).
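A minimal pre-flight check along these lines might look like the sketch below. The data are made up, the 15% threshold is the rule of thumb quoted above rather than a hard limit, and passing these checks does not guarantee convergence.

```matlab
% Hypothetical 6-by-3 data set with two missing values.
Data = [1 2 3; 2 NaN 1; 3 1 2; 4 5 NaN; 5 3 4; 6 4 6];

[NumSamples, NumSeries] = size(Data);
enoughSamples = NumSamples > NumSeries;     % required by the routine
fracMissing = mean(isnan(Data(:)));         % fraction of missing values
withinTypicalRange = fracMissing < 0.15;    % rule of thumb only

fprintf('Missing: %.1f%%, NumSamples > NumSeries: %d\n', ...
    100 * fracMissing, enoughSamples);
```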

### Initialization Methods

This routine has three initialization methods that cover most
cases, each with its advantages and disadvantages. The ECM algorithm
always converges to a minimum of the observed negative log-likelihood
function. If you override the initialization methods, you must ensure
that the initial estimate for the covariance matrix is positive-definite.
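One common way to test a candidate initial covariance for positive-definiteness is to attempt a Cholesky factorization. The sketch below assumes a made-up matrix `Covar0`; the second output of `chol` is zero only when the factorization succeeds.

```matlab
% Hypothetical initial covariance estimate for three series.
Covar0 = [4 1 0; 1 3 1; 0 1 2];

isSymmetric = isequal(Covar0, Covar0');   % chol reads only one triangle
[~, p] = chol(Covar0);                    % p == 0 means positive-definite
isPosDef = isSymmetric && (p == 0);

% A symmetric but indefinite matrix fails the same test.
[~, q] = chol([1 2; 2 1]);
```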

The following is a guide to the supported initialization methods.

#### nanskip

The `nanskip` method works well with small problems (fewer than 10
series or with monotone missing data patterns). It skips over any
records that contain `NaN` values and estimates initial values from
complete-data records only. This initialization method tends to yield
the fastest convergence of the ECM algorithm. This routine switches to
the `twostage` method if it determines that a significant number of
records contain `NaN` values.
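The idea behind this method can be sketched as below. This is an illustration under stated assumptions, not the routine's internal code: the data are made up, and the use of the biased normalization `cov(..., 1)` for the initial covariance is an assumption.

```matlab
% Hypothetical data set: 6 samples, 2 series, two incomplete records.
Data = [1 2; 2 NaN; 3 6; NaN 1; 5 10; 4 7];

complete = ~any(isnan(Data), 2);          % records with no NaN at all
Mean0 = mean(Data(complete, :), 1)';      % NumSeries-by-1 initial mean
Covar0 = cov(Data(complete, :), 1);       % initial covariance (assumed MLE form)

fracIncomplete = mean(~complete);         % share of records that are skipped
```

A large `fracIncomplete` leaves few records to estimate from, which is the situation in which the text says the routine falls back to `twostage`.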

#### twostage

The `twostage` method is the best choice for large problems (more
than 10 series). It estimates the mean for each series using all
available data for that series. It then estimates the covariance
matrix with missing values treated as equal to the mean rather than
as `NaN` values. This initialization method is quite robust but tends
to result in slower convergence of the ECM algorithm.
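The two stages described above can be sketched as follows. The data, the variable names `Filled` and `isGap`, and the biased normalization `cov(..., 1)` are assumptions for illustration; the routine's own implementation may differ in detail.

```matlab
% Hypothetical data set: 5 samples, 2 series, one gap in each series.
Data = [1 2; 2 NaN; 3 5; NaN 5; 6 4];
[NumSamples, NumSeries] = size(Data);

Mean0 = zeros(NumSeries, 1);
Filled = Data;
for j = 1:NumSeries
    isGap = isnan(Data(:, j));
    Mean0(j) = mean(Data(~isGap, j));   % stage 1: mean from available values
    Filled(isGap, j) = Mean0(j);        % treat missing values as the mean
end

Covar0 = cov(Filled, 1);                % stage 2: covariance of the filled data
```

Because every record is used, this approach still produces an initial estimate when few records are complete.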

#### diagonal

The `diagonal` method is a worst-case approach that deals with
problematic data, such as disjoint series and excessive missing data
(more than 33% of data missing). Of the three initialization methods,
this method causes the slowest convergence of the ECM algorithm. If
problems occur with this method, use display mode to examine
convergence and modify either `MaxIterations` or `Tolerance`, or try
alternative initial estimates with `Mean0` and `Covar0`.
If all else fails, try

    Mean0 = zeros(NumSeries,1);
    Covar0 = eye(NumSeries,NumSeries);
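As a hedged usage sketch, the fallback estimates can be passed to the routine as shown below. The data and the values chosen for `MaxIterations` and `Tolerance` are made up for illustration, and the call assumes the documented argument order of `ecmnmle` (`Data`, `InitMethod`, `MaxIterations`, `Tolerance`, `Mean0`, `Covar0`). The call is guarded so that the script still runs where the routine is not installed.

```matlab
% Hypothetical data set: 8 samples, 2 series, two missing values.
Data = [1 2; 2 1; 3 5; 4 3; 5 NaN; 6 8; NaN 6; 8 9];
NumSeries = size(Data, 2);

Mean0 = zeros(NumSeries, 1);          % column vector, one entry per series
Covar0 = eye(NumSeries, NumSeries);   % identity is positive-definite

MaxIterations = 200;                  % illustrative value, not a recommendation
Tolerance = 1e-6;                     % illustrative value, not a recommendation

if exist('ecmnmle', 'file')
    % Empty InitMethod: the supplied Mean0 and Covar0 are used instead.
    [Mean, Covariance] = ecmnmle(Data, [], MaxIterations, Tolerance, ...
        Mean0, Covar0);
end
```

Note that `Mean0` is a `NumSeries`-by-1 vector, whereas `Covar0` is a `NumSeries`-by-`NumSeries` matrix.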

Given estimates for mean and covariance from this routine, you
can estimate standard errors with the companion routine `ecmnstd`.

### Convergence

The ECM algorithm does not work for all patterns of missing
values. Although it works in most cases, it can fail to converge if
the covariance becomes singular. If this occurs, plots of the log-likelihood
function tend to have a constant upward slope over many iterations
as the determinant of the covariance goes to zero and the negative
log-determinant term in the log-likelihood grows without bound.
In some cases, the objective fails to converge due to machine precision
errors. No general theory of missing data patterns exists to determine
these cases. An example of a known failure occurs when two time series
are proportional wherever both series contain nonmissing values.
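The proportional-series case can be illustrated without running the ECM algorithm. In the made-up data below, the second series equals twice the first wherever both are observed, so the covariance computed from the jointly observed records is singular. The variable names are assumptions for illustration.

```matlab
% Hypothetical data: series 2 equals 2 * series 1 wherever both are observed.
Data = [1 2; 3 6; 2 4; NaN 7; 5 10; 4 NaN; 4 8];

both = ~any(isnan(Data), 2);       % records where both series are observed
C = cov(Data(both, :), 1);         % covariance from jointly observed records

isSingular = rank(C) < size(C, 1); % rank-deficient covariance
detC = det(C);                     % zero up to rounding

fprintf('rank = %d, det = %g\n', rank(C), detC);
```

The jointly observed records carry no information that separates the two series, which is consistent with the covariance estimate drifting toward a singular matrix.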