ecmlsrmle - Least-squares regression with missing data

Syntax

[Parameters, Covariance, Resid, Info] = ecmlsrmle(Data, Design, 
MaxIterations, TolParam, TolObj, Param0, Covar0, CovarFormat)

Arguments

Data

NUMSAMPLES-by-NUMSERIES matrix with NUMSAMPLES samples of a NUMSERIES-dimensional random vector. Missing values are represented as NaNs. Only samples that are entirely NaNs are ignored. (To ignore samples with at least one NaN, use mvnrmle.)
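
The difference between the two missing-data rules can be checked directly on a data matrix. The following is a minimal sketch using a made-up 4-by-2 matrix; the variable names allMissing and anyMissing are illustrative only and are not part of the toolbox.

```matlab
% Hypothetical data: 4 samples of a 2-dimensional random vector.
Data = [ 1.0  2.0;
         NaN  0.5;
         NaN  NaN;
         0.3  NaN];

% Samples that are entirely NaN: the only rows ecmlsrmle ignores.
allMissing = all(isnan(Data), 2);

% Samples with at least one NaN: the rows mvnrmle would ignore.
anyMissing = any(isnan(Data), 2);

disp(find(allMissing)')   % 3
disp(find(anyMissing)')   % 2 3 4
```

In this sketch, rows 2 and 4 are partially observed, so they still contribute information under ecmlsrmle.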

Design

A matrix or a cell array that handles two model structures:

  • If NUMSERIES = 1, Design is a NUMSAMPLES-by-NUMPARAMS matrix with known values. This structure is the standard form for regression on a single series.

  • If NUMSERIES ≥ 1, Design is a cell array. The cell array contains either one or NUMSAMPLES cells. Each cell contains a NUMSERIES-by-NUMPARAMS matrix of known values.

    If Design has a single cell, the same Design matrix is assumed for every sample. If Design has more than one cell, each cell contains the Design matrix for the corresponding sample.
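
The three layouts of Design described above might be constructed as follows. This is a sketch with made-up regressors; the names DesignSingle, DesignShared, and DesignPerSample are hypothetical and chosen only for illustration.

```matlab
NUMSAMPLES = 5; NUMSERIES = 2; NUMPARAMS = 2;

% Single series: Design is a NUMSAMPLES-by-NUMPARAMS matrix.
% Columns are an intercept and a made-up trend regressor.
DesignSingle = [ones(NUMSAMPLES, 1), (1:NUMSAMPLES)'];

% Multiple series, same design for every sample: a single cell
% holding one NUMSERIES-by-NUMPARAMS matrix.
DesignShared = { eye(NUMSERIES) };

% Multiple series, sample-specific designs: NUMSAMPLES cells,
% each holding a NUMSERIES-by-NUMPARAMS matrix.
DesignPerSample = cell(NUMSAMPLES, 1);
for k = 1:NUMSAMPLES
    DesignPerSample{k} = [1  k; 1 -k];
end
```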

MaxIterations

(Optional) Maximum number of iterations for the estimation algorithm. Default value is 100.

TolParam

(Optional) Convergence tolerance for the estimation algorithm based on changes in model parameter estimates. Default value is sqrt(eps), which is about 1.0e-8 for double precision. The convergence test for changes in model parameters is

$$\lVert \mathrm{Param}_k - \mathrm{Param}_{k-1} \rVert < \mathrm{TolParam} \times \left(1 + \lVert \mathrm{Param}_k \rVert\right)$$

where Param represents the output Parameters, and iteration k = 2, 3, ... . Convergence is assumed when both the TolParam and TolObj conditions are satisfied. If both TolParam ≤ 0 and TolObj ≤ 0, the algorithm runs the maximum number of iterations (MaxIterations), whatever the results of the convergence tests.

TolObj

(Optional) Convergence tolerance for the estimation algorithm based on changes in the objective function. Default value is eps^(3/4), which is about 1.0e-12 for double precision. The convergence test for changes in the objective function is

$$\lvert \mathrm{Obj}_k - \mathrm{Obj}_{k-1} \rvert < \mathrm{TolObj} \times \left(1 + \lvert \mathrm{Obj}_k \rvert\right)$$

for iteration k = 2, 3, ... . Convergence is assumed when both the TolParam and TolObj conditions are satisfied. If both TolParam ≤ 0 and TolObj ≤ 0, the algorithm runs the maximum number of iterations (MaxIterations), whatever the results of the convergence tests.
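
The two-part stopping rule can be sketched as a pair of relative-change tests. The exact form of the tests used below (change smaller than the tolerance times one plus the current magnitude) is an assumption, as are the helper names paramTest, objTest, and converged; none of them are toolbox functions.

```matlab
% Tolerances close to the documented defaults for double precision.
tolParam = sqrt(eps);      % about 1.0e-8
tolObj   = eps^(3/4);      % about 1.0e-12

% Assumed relative-change tests on parameters and objective.
paramTest = @(pK, pPrev) norm(pK - pPrev) < tolParam * (1 + norm(pK));
objTest   = @(oK, oPrev) abs(oK - oPrev)  < tolObj   * (1 + abs(oK));

% Convergence requires both conditions at the same iteration.
converged = @(pK, pPrev, oK, oPrev) ...
    paramTest(pK, pPrev) && objTest(oK, oPrev);

tiny = converged([1; 2], [1; 2 + 1e-10], -10.5, -10.5 - 1e-14);  % true
big  = converged([1; 2], [1; 2.1],       -10.5, -10.5 - 1e-14);  % false
```

With a non-positive tolerance the corresponding strict inequality can never hold in this sketch, which is consistent with running all MaxIterations iterations when both tolerances are non-positive.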

Param0

(Optional) NUMPARAMS-by-1 column vector that contains a user-supplied initial estimate for the parameters of the regression model. Default is a zero vector.

Covar0

(Optional) NUMSERIES-by-NUMSERIES matrix that contains a user-supplied initial or known estimate for the covariance matrix of the regression residuals. Default is an identity matrix.


For covariance-weighted least-squares calculations, this matrix corresponds with weights for each series in the regression. The matrix also serves as an initial guess for the residual covariance in the expectation conditional maximization (ECM) algorithm.

CovarFormat

(Optional) String that specifies the format for the covariance matrix. The choices are:

  • 'full' - Default method. Compute the full covariance matrix.

  • 'diagonal' - Force the covariance matrix to be a diagonal matrix.

Description

[Parameters, Covariance, Resid, Info] = ecmlsrmle(Data, Design, MaxIterations, TolParam, TolObj, Param0, Covar0, CovarFormat) estimates a least-squares regression model with missing data. The model has the form

$$\mathrm{Data}_k \sim N\left(\mathrm{Design}_k \times \mathrm{Parameters},\ \mathrm{Covariance}\right)$$

for samples k = 1, ... , NUMSAMPLES.

ecmlsrmle estimates a NUMPARAMS-by-1 column vector of model parameters called Parameters, and a NUMSERIES-by-NUMSERIES matrix of covariance parameters called Covariance.

ecmlsrmle(Data, Design) with no output arguments plots the log-likelihood function for each iteration of the algorithm.
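
A call with only the two required inputs might look like the following sketch. The data are synthetic, the parameter values in trueParams are made up, and the example assumes the toolbox that provides ecmlsrmle is installed.

```matlab
% Synthetic two-series regression with a separate line per series.
NUMSAMPLES = 40;
x = linspace(0, 1, NUMSAMPLES)';
trueParams = [1; 2; -1; 0.5];            % made-up values

Design = cell(NUMSAMPLES, 1);
Data = zeros(NUMSAMPLES, 2);
for k = 1:NUMSAMPLES
    Design{k} = [1 x(k) 0 0; 0 0 1 x(k)];               % 2-by-4
    Data(k, :) = (Design{k} * trueParams)' + 0.1 * randn(1, 2);
end

% Knock out a few observations; no sample is entirely NaN.
Data(5, 1) = NaN;
Data(20, 2) = NaN;

% Estimate with default settings for all optional inputs.
[Parameters, Covariance, Resid, Info] = ecmlsrmle(Data, Design);

% With no output arguments, the function plots the log-likelihood
% function for each iteration instead of returning estimates.
ecmlsrmle(Data, Design);
```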

To summarize the outputs of ecmlsrmle:

Another output, Info, is a structure that contains additional information from the regression. The structure has these fields:

Notes

For covariance-weighted least squares (CWLS), Covar0 should usually be a diagonal matrix. Give series with greater influence smaller diagonal elements in Covar0, and series with lesser influence larger diagonal elements. For CWLS, Covar0 need not be a diagonal matrix even if CovarFormat = 'diagonal'.
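
As an illustration of the weighting convention, the sketch below estimates a single mean shared by two series, giving the less noisy series the smaller diagonal entry in Covar0. The data, the weights, and the choice to pass the documented default values explicitly for the intermediate arguments are all assumptions made for this example.

```matlab
% Two series measuring the same quantity; series 2 is noisier.
NUMSAMPLES = 50; NUMPARAMS = 1;
Data = 5 + [1.0 * randn(NUMSAMPLES, 1), 3.0 * randn(NUMSAMPLES, 1)];
Data(7, 2) = NaN;                        % one missing value

% One shared parameter: each series equals the common mean plus noise.
Design = { [1; 1] };                     % NUMSERIES-by-NUMPARAMS

% Smaller diagonal entry gives series 1 greater influence.
Covar0 = diag([1, 9]);

% Optional inputs are positional, so the documented defaults are
% passed explicitly up to Covar0.
MaxIterations = 100;
TolParam = sqrt(eps);
TolObj = 1.0e-12;
Param0 = zeros(NUMPARAMS, 1);

[Parameters, Covariance] = ecmlsrmle(Data, Design, MaxIterations, ...
    TolParam, TolObj, Param0, Covar0);
```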

You can configure Design as a matrix if NUMSERIES = 1 or as a cell array if NUMSERIES ≥ 1.

These points concern how Design handles missing data:

Use the estimates in the optional output structure Info for diagnostic purposes.

Examples

See Multivariate Normal Regression, Least-Squares Regression, Covariance-Weighted Least Squares, Feasible Generalized Least Squares, and Seemingly Unrelated Regression.

References

Roderick J. A. Little and Donald B. Rubin, Statistical Analysis with Missing Data, 2nd ed., John Wiley & Sons, Inc., 2002.

Xiao-Li Meng and Donald B. Rubin, "Maximum Likelihood Estimation via the ECM Algorithm," Biometrika, Vol. 80, No. 2, 1993, pp. 267-278.

Joe Sexton and Anders Rygh Swensen, "ECM Algorithms that Converge at the Rate of EM," Biometrika, Vol. 87, No. 3, 2000, pp. 651-662.

A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum Likelihood from Incomplete Data via the EM Algorithm," Journal of the Royal Statistical Society, Series B, Vol. 39, No. 1, 1977, pp. 1-37.

See Also

ecmlsrobj, ecmmvnrmle, mvnrmle

  


© 1984-2008 The MathWorks, Inc.