# ecmlsrmle

Least-squares regression with missing data

## Syntax

``[Parameters,Covariance,Resid,Info] = ecmlsrmle(Data,Design)``
``[Parameters,Covariance,Resid,Info] = ecmlsrmle(___,MaxIterations,TolParam,TolObj,Param0,Covar0,CovarFormat)``

## Description

`[Parameters,Covariance,Resid,Info] = ecmlsrmle(Data,Design)` estimates a least-squares regression model with missing data. The model has the form

$Data_k \sim N\left(Design_k \times Parameters,\ Covariance\right)$

for samples k = 1, ... , `NUMSAMPLES`.

`ecmlsrmle` estimates a `NUMPARAMS`-by-`1` column vector of model parameters called `Parameters`, and a `NUMSERIES`-by-`NUMSERIES` matrix of covariance parameters called `Covariance`.

`ecmlsrmle(Data,Design)` with no output arguments plots the log-likelihood function for each iteration of the algorithm.
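`[Parameters,Covariance,Resid,Info] = ecmlsrmle(___,MaxIterations,TolParam,TolObj,Param0,Covar0,CovarFormat)` estimates a least-squares regression model with missing data using optional arguments that set the iteration limit, the convergence tolerances, the initial estimates, and the covariance format.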
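The basic syntax can be sketched on synthetic data as follows. This is a minimal illustration, not a reference workflow: the variable names `TrueParams`, `TrueCovar`, and `R`, the sample sizes, and the 10% missing rate are all assumptions chosen for the example, and the call requires the toolbox that provides `ecmlsrmle`.

```matlab
% Synthetic example: two series, three regression parameters (all values hypothetical).
rng(0);                                  % reproducible random numbers
NumSamples = 200; NumSeries = 2; NumParams = 3;
TrueParams = [1; -2; 0.5];               % assumed "true" coefficients
TrueCovar  = [1 0.3; 0.3 2];             % assumed residual covariance
R = chol(TrueCovar);                     % upper-triangular factor, R'*R equals TrueCovar

Design = cell(NumSamples,1);             % one design matrix per sample
Data   = zeros(NumSamples,NumSeries);
for k = 1:NumSamples
    Design{k} = randn(NumSeries,NumParams);                       % known regressors
    Data(k,:) = (Design{k}*TrueParams)' + randn(1,NumSeries)*R;   % correlated noise
end

% Mark roughly 10% of the entries as missing.
Data(rand(NumSamples,NumSeries) < 0.1) = NaN;

[Parameters,Covariance,Resid,Info] = ecmlsrmle(Data,Design);
disp(Parameters')                        % estimated coefficients
disp(Covariance)                         % estimated residual covariance
```

With a per-sample cell array, every cell must be `NUMSERIES`-by-`NUMPARAMS`, so the outputs here are a 3-by-1 `Parameters` vector and a 2-by-2 `Covariance` matrix.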

## Input Arguments

`Data` – Data sample, specified as an `NUMSAMPLES`-by-`NUMSERIES` matrix with `NUMSAMPLES` samples of a `NUMSERIES`-dimensional random vector. If a data sample has missing values, represented as `NaN`s, the sample is ignored. (Use `mvnrmle` to handle missing data.)

Data Types: `double`

`Design` – Model design, specified as a matrix or a cell array that handles two model structures:

• If `NUMSERIES = 1`, `Design` is a `NUMSAMPLES`-by-`NUMPARAMS` matrix with known values. This structure is the standard form for regression on a single series.

• If `NUMSERIES` ≥ `1`, `Design` is a cell array. The cell array contains either one or `NUMSAMPLES` cells. Each cell contains a `NUMSERIES`-by-`NUMPARAMS` matrix of known values.

If `Design` has a single cell, the same `Design` matrix is assumed for every sample. If `Design` has more than one cell, each cell contains the `Design` matrix for the corresponding sample.

These points concern how `Design` handles missing data:

• Although `Design` should not have `NaN` values, ignored samples due to `NaN` values in `Data` are also ignored in the corresponding `Design` array.

• If `Design` is a `1`-by-`1` cell array, which has a single `Design` matrix for each sample, no `NaN` values are permitted in the array. A model with this structure must have `NUMSERIES` ≥ `NUMPARAMS` with `rank(Design{1}) = NUMPARAMS`.

• `ecmlsrmle` is more strict than `mvnrmle` about the presence of `NaN` values in the `Design` array.

Data Types: `double` | `cell`
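The `Design` layouts described above can be sketched in plain MATLAB. The sizes and numeric entries below are hypothetical; only the shapes and the rank condition come from the text.

```matlab
NumSamples = 5; NumSeries = 2; NumParams = 2;    % hypothetical sizes

% Single series: Design is a NumSamples-by-NumParams matrix.
DesignSingle = [ones(NumSamples,1), (1:NumSamples)'];

% Multiple series, one design matrix shared by every sample (1-by-1 cell).
DesignShared = {eye(NumSeries)};
% The shared form needs NumSeries >= NumParams and full column rank.
assert(NumSeries >= NumParams && rank(DesignShared{1}) == NumParams);

% Multiple series, one design matrix per sample (NumSamples cells).
DesignPerSample = cell(NumSamples,1);
for k = 1:NumSamples
    DesignPerSample{k} = [1 k; 1 2*k];           % NumSeries-by-NumParams
end

% Design matrices should not contain NaN values.
HasNaN = any(cellfun(@(D) any(isnan(D(:))), DesignPerSample));
assert(~HasNaN);
```

Checking the rank and scanning for `NaN` values before estimation is a cheap guard, since these conditions are stated as requirements on `Design`.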

`MaxIterations` – (Optional) Maximum number of iterations for the estimation algorithm, specified as a numeric value. The default value is `100`.

Data Types: `double`

`TolParam` – (Optional) Convergence tolerance for the estimation algorithm based on changes in the model parameter estimates, specified as a numeric value. The default value is `sqrt(eps)`, which is about 1.5e-8 for double precision. The convergence test for changes in model parameters is

$\|Param_k - Param_{k-1}\| < TolParam \times \left(1 + \|Param_k\|\right)$

where `Param` represents the output `Parameters`, and iteration k = 2, 3, ... . Convergence is assumed when both the `TolParam` and `TolObj` conditions are satisfied. If both `TolParam` ≤ `0` and `TolObj` ≤ `0`, the algorithm performs the maximum number of iterations (`MaxIterations`), whatever the results of the convergence tests.

Data Types: `double`

`TolObj` – (Optional) Convergence tolerance for the estimation algorithm based on changes in the objective function, specified as a numeric value. The default value is `eps^(3/4)`, which is about 1.8e-12 for double precision. The convergence test for changes in the objective function is

$|Obj_k - Obj_{k-1}| < TolObj \times \left(1 + |Obj_k|\right)$

for iteration k = 2, 3, ... . Convergence is assumed when both the `TolParam` and `TolObj` conditions are satisfied. If both `TolParam` ≤ `0` and `TolObj` ≤ `0`, the algorithm performs the maximum number of iterations (`MaxIterations`), whatever the results of the convergence tests.

Data Types: `double`
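The two stopping rules can be checked by hand with a few lines of MATLAB. This sketch assumes the parameter test has the same relative form as the objective test and that the norm is the Euclidean norm; the iterate values (`ParamPrev`, `ParamCurr`, `ObjPrev`, `ObjCurr`) are made up for illustration and do not come from a real run.

```matlab
TolParam = sqrt(eps);        % default tolerance on parameter changes
TolObj   = eps^(3/4);        % default tolerance on objective changes

% Hypothetical values from two consecutive iterations, k-1 and k.
ParamPrev = [1; -2];
ParamCurr = ParamPrev + 1e-10;
ObjPrev   = -350.1234567890;
ObjCurr   = ObjPrev + 1e-13;

% Relative tests: the right-hand side scales with the current magnitude.
ParamOK = norm(ParamCurr - ParamPrev) < TolParam*(1 + norm(ParamCurr));
ObjOK   = abs(ObjCurr - ObjPrev)      < TolObj*(1 + abs(ObjCurr));

% Convergence is declared only when both conditions hold.
Converged = ParamOK && ObjOK;
```

Because both tests use a strict inequality against a nonnegative left-hand side, a tolerance of zero or less can never be satisfied, which is why nonpositive tolerances force the full `MaxIterations` iterations.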

`Param0` – (Optional) User-supplied initial estimate for the parameters of the regression model, specified as an `NUMPARAMS`-by-`1` column vector.

Data Types: `double`

`Covar0` – (Optional) User-supplied initial or known estimate for the covariance matrix of the regression residuals, specified as an `NUMSERIES`-by-`NUMSERIES` matrix.

For covariance-weighted least-squares calculations, this matrix corresponds with weights for each series in the regression. The matrix also serves as an initial guess for the residual covariance in the expectation conditional maximization (ECM) algorithm.

Data Types: `double`

`CovarFormat` – (Optional) Format for the covariance matrix, specified as a character vector. The choices are:

• `'full'` — This is the default method that computes the full covariance matrix.

• `'diagonal'` — This forces the covariance matrix to be a diagonal matrix.

Data Types: `char`
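A call that supplies every optional argument explicitly might look like the sketch below. The data, the tolerance values, and the starting points are assumptions chosen for illustration; the argument order follows the syntax line at the top of this page.

```matlab
rng(1);                                         % reproducible random numbers
NumSamples = 100; NumSeries = 2; NumParams = 2; % hypothetical sizes

% Shared design: with an identity matrix, each parameter is a series mean.
Design = {eye(NumSeries)};
Data = randn(NumSamples,NumSeries) + repmat([1 2],NumSamples,1);
Data(rand(NumSamples,NumSeries) < 0.1) = NaN;   % mark some entries as missing

MaxIterations = 200;                 % allow more than the default 100 iterations
TolParam      = 1e-8;                % tolerance on parameter changes
TolObj        = 1e-10;               % tolerance on objective changes
Param0        = zeros(NumParams,1);  % initial parameter estimate
Covar0        = eye(NumSeries);      % initial residual covariance
CovarFormat   = 'diagonal';          % force a diagonal covariance estimate

[Parameters,Covariance] = ecmlsrmle(Data,Design, ...
    MaxIterations,TolParam,TolObj,Param0,Covar0,CovarFormat);
```

With `'diagonal'`, the returned `Covariance` should have zero off-diagonal entries, which is a quick way to confirm the option took effect.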

## Output Arguments

`Parameters` – Estimates for the parameters of the regression model, returned as an `NUMPARAMS`-by-`1` column vector.

`Covariance` – Estimates for the covariance of the regression model's residuals, returned as an `NUMSERIES`-by-`NUMSERIES` matrix.

`Resid` – Residuals from the regression, returned as an `NUMSAMPLES`-by-`NUMSERIES` matrix.

`Info` – Additional information from the regression, returned as a structure. The structure has these fields:

• `Info.Obj` – A variable-extent column vector, with no more than `MaxIterations` elements, that contains the value of the objective function at each iteration of the estimation algorithm. The last value in this vector, `Info.Obj(end)`, is the terminal estimate of the objective function. If you do least squares, the objective function is the least-squares objective function.

• `Info.PrevParameters` – `NUMPARAMS`-by-`1` column vector of estimates for the model parameters from the iteration just before the terminal iteration.

• `Info.PrevCovariance` – `NUMSERIES`-by-`NUMSERIES` matrix of estimates for the covariance parameters from the iteration just before the terminal iteration.

Use the estimates in the output structure `Info` for diagnostic purposes.
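One way to use `Info` for diagnostics is to compare the final estimates with those from the previous iteration and to inspect the objective trace. The sketch below is an assumption-laden illustration: the synthetic data and the names `NumIter`, `FinalObj`, `ParamStep`, and `CovarStep` are hypothetical, and the Frobenius norm is just one reasonable choice for measuring the covariance change.

```matlab
rng(2);                                         % reproducible random numbers
NumSamples = 150; NumSeries = 2;                % hypothetical sizes
Design = {eye(NumSeries)};                      % shared design matrix
Data = randn(NumSamples,NumSeries) + repmat([0.5 -1],NumSamples,1);
Data(rand(NumSamples,NumSeries) < 0.1) = NaN;   % mark some entries as missing

[Parameters,Covariance,Resid,Info] = ecmlsrmle(Data,Design);

NumIter   = numel(Info.Obj);                    % iterations actually performed
FinalObj  = Info.Obj(end);                      % terminal objective value
ParamStep = norm(Parameters - Info.PrevParameters);        % last parameter change
CovarStep = norm(Covariance - Info.PrevCovariance,'fro');  % last covariance change

fprintf('Iterations: %d, final objective: %g\n',NumIter,FinalObj);
fprintf('Last step: parameters %g, covariance %g\n',ParamStep,CovarStep);

plot(Info.Obj,'-o');                            % objective value per iteration
xlabel('Iteration'); ylabel('Objective');
```

If `NumIter` equals `MaxIterations` and the last steps are not small, the run probably stopped on the iteration limit rather than on the convergence tests.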

## References

[1] Dempster, A. P., N. M. Laird, and D. B. Rubin. “Maximum Likelihood from Incomplete Data via the EM Algorithm.” Journal of the Royal Statistical Society. Series B, Vol. 39, No. 1, 1977, pp. 1–37.

[2] Little, Roderick J. A., and Donald B. Rubin. Statistical Analysis with Missing Data. 2nd Edition. John Wiley & Sons, Inc., 2002.

[3] Sexton, J., and Anders Rygh Swensen. “ECM Algorithms that Converge at the Rate of EM.” Biometrika. Vol. 87, No. 3, 2000, pp. 651–662.

[4] Meng, Xiao-Li, and Donald B. Rubin. “Maximum Likelihood Estimation via the ECM Algorithm.” Biometrika. Vol. 80, No. 2, 1993, pp. 267–278.

## Version History

Introduced in R2006a