ecmlsrmle

Least-squares regression with missing data

Syntax

[Parameters,Covariance,Resid,Info] = ecmlsrmle(Data,Design)

[Parameters,Covariance,Resid,Info] = ecmlsrmle(___,MaxIterations,TolParam,TolObj,Param0,Covar0,CovarFormat)

Description

[Parameters,Covariance,Resid,Info] = ecmlsrmle(Data,Design) estimates a least-squares regression model with missing data. The model has the form

$\mathit{Data}_{k} \sim N(\mathit{Design}_{k} \times \mathit{Parameters},\ \mathit{Covariance})$

for samples k = 1, ... , NUMSAMPLES.

ecmlsrmle estimates a NUMPARAMS-by-1 column vector of model parameters called Parameters, and a NUMSERIES-by-NUMSERIES matrix of covariance parameters called Covariance.

ecmlsrmle(Data,Design) with no output arguments plots the log-likelihood function for each iteration of the algorithm.

[Parameters,Covariance,Resid,Info] = ecmlsrmle(___,MaxIterations,TolParam,TolObj,Param0,Covar0,CovarFormat) estimates a least-squares regression model with missing data using optional arguments.
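
As an illustration of the two-input syntax, the following sketch simulates a two-series regression with values missing at random and then estimates the model. The synthetic data, the helper variables (`x`, `trueParams`), and the choice of series-specific intercepts with a shared slope are assumptions made for this example only. Running it requires the toolbox that provides ecmlsrmle.

```matlab
% Hypothetical synthetic data: 2 series, 3 parameters
% (one intercept per series plus a shared slope on the regressor x).
rng(0);                                   % reproducible random numbers
numSamples = 100;
x = randn(numSamples,1);                  % assumed explanatory variable
trueParams = [0.5; -0.2; 1.5];            % assumed "true" parameters

Design = cell(numSamples,1);              % one NUMSERIES-by-NUMPARAMS matrix per sample
Data = zeros(numSamples,2);               % NUMSAMPLES-by-NUMSERIES
for k = 1:numSamples
    Design{k} = [1 0 x(k); 0 1 x(k)];
    Data(k,:) = (Design{k}*trueParams)' + 0.3*randn(1,2);
end

% Mark roughly 10% of the individual observations as missing.
Data(rand(numSamples,2) < 0.1) = NaN;

[Parameters,Covariance,Resid,Info] = ecmlsrmle(Data,Design);
```

With this setup, Parameters is 3-by-1, Covariance is 2-by-2, and Resid has the same size as Data.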

Input Arguments

`Data` — Data sample
matrix

Data sample, specified as a NUMSAMPLES-by-NUMSERIES matrix with NUMSAMPLES samples of a NUMSERIES-dimensional random vector. Missing values are represented as NaNs. Only samples that are entirely NaNs are ignored. (To ignore samples with at least one NaN, use mvnrmle.)

Data Types: double

`Design` — Model design
matrix | cell array of matrices

Model design, specified as a matrix or a cell array that handles two model structures:

If NUMSERIES = 1, Design is a NUMSAMPLES-by-NUMPARAMS matrix with known values. This structure is the standard form for regression on a single series.
If NUMSERIES ≥ 1, Design is a cell array. The cell array contains either one or NUMSAMPLES cells. Each cell contains a NUMSERIES-by-NUMPARAMS matrix of known values.
If Design has a single cell, it is assumed to have the same Design matrix for each sample. If Design has more than one cell, each cell contains a Design matrix for each sample.

These points concern how Design handles missing data:

Although Design should not have NaN values, ignored samples due to NaN values in Data are also ignored in the corresponding Design array.
If Design is a 1-by-1 cell array, which has a single Design matrix for each sample, no NaN values are permitted in the array. A model with this structure must have NUMSERIES ≥ NUMPARAMS with rank(Design{1}) = NUMPARAMS.
ecmlsrmle is more strict than mvnrmle about the presence of NaN values in the Design array.

Data Types: double | cell
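
The two Design structures described above can be built as follows. This is a minimal sketch with made-up dimensions and values; the variable names are chosen for illustration and are not part of the function's interface.

```matlab
numSamples = 5;
x = (1:numSamples)';                      % hypothetical regressor values

% Structure 1 (NUMSERIES = 1): a NUMSAMPLES-by-NUMPARAMS matrix.
DesignMatrix = [ones(numSamples,1), x];   % 5-by-2: intercept and slope

% Structure 2 with a single cell: the same NUMSERIES-by-NUMPARAMS
% matrix is used for every sample. It must have full column rank.
DesignSingle = {eye(2)};                  % NUMSERIES = 2, NUMPARAMS = 2
hasFullRank = rank(DesignSingle{1}) == size(DesignSingle{1},2);

% Structure 2 with NUMSAMPLES cells: one matrix per sample.
DesignPerSample = cell(numSamples,1);
for k = 1:numSamples
    DesignPerSample{k} = [1 0 x(k); 0 1 x(k)];   % 2-by-3
end
```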

`MaxIterations` — Maximum number of iterations for the estimation algorithm
`100` (default) | numeric

(Optional) Maximum number of iterations for the estimation algorithm, specified as a numeric. The default value is 100.

Data Types: double

`TolParam` — Convergence tolerance for estimation algorithm
`sqrt(eps)` (default) | numeric

(Optional) Convergence tolerance for estimation algorithm based on changes in model parameter estimates, specified as a numeric. The default value is sqrt(eps), which is about 1.5e-8 for double precision. The convergence test for changes in model parameters is

$\lVert \mathit{Param}_{k} - \mathit{Param}_{k-1} \rVert < \mathit{TolParam} \times (1 + \lVert \mathit{Param}_{k} \rVert)$

where Param represents the output Parameters, and k = 2, 3, ... is the iteration number. Convergence is assumed when both the TolParam and TolObj conditions are satisfied. If both TolParam ≤ 0 and TolObj ≤ 0, the algorithm runs for the maximum number of iterations (MaxIterations), regardless of the results of the convergence tests.

Data Types: double

`TolObj` — Convergence tolerance for estimation algorithm based on changes in objective function
`eps^(3/4)` (default) | numeric

(Optional) Convergence tolerance for estimation algorithm based on changes in the objective function, specified as a numeric. The default value is eps^(3/4), which is about 1.8e-12 for double precision. The convergence test for changes in the objective function is

$\lvert \mathit{Obj}_{k} - \mathit{Obj}_{k-1} \rvert < \mathit{TolObj} \times (1 + \lvert \mathit{Obj}_{k} \rvert)$

for iteration k = 2, 3, ... . Convergence is assumed when both the TolParam and TolObj conditions are satisfied. If both TolParam ≤ 0 and TolObj ≤ 0, the algorithm runs for the maximum number of iterations (MaxIterations), regardless of the results of the convergence tests.

Data Types: double
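
The two convergence tests can be written out as a short sketch. This is not the function's internal code: it assumes the Euclidean norm for the parameter test, and the iterates `p`, `pPrev`, `f`, and `fPrev` are hypothetical values chosen for illustration.

```matlab
TolParam = sqrt(eps);        % documented default, equal to 2^-26
TolObj   = eps^(3/4);        % documented default, equal to 2^-39

% Relative tests on the change between consecutive iterations.
% The choice of the 2-norm is an assumption of this sketch.
paramConverged = @(p,pPrev) norm(p - pPrev) < TolParam*(1 + norm(p));
objConverged   = @(f,fPrev) abs(f - fPrev)  < TolObj*(1 + abs(f));

% Hypothetical consecutive iterates.
pPrev = [1.0; 2.0];    p = [1.0; 2.0 + 1e-10];
fPrev = -123.456;      f = -123.456 + 1e-14;

% Convergence is declared only when both conditions hold.
isConverged = paramConverged(p,pPrev) && objConverged(f,fPrev);
```

Because each threshold scales with 1 plus the size of the current estimate, the tests behave like absolute tolerances for values near zero and like relative tolerances for large values.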

`Param0` — User-supplied initial estimate for the parameters of regression model
zero vector (default) | vector

(Optional) User-supplied initial estimate for the parameters of the regression model, specified as a NUMPARAMS-by-1 column vector.

Data Types: double

`Covar0` — User-supplied initial or known estimate for the covariance matrix of the regression residuals
identity matrix (default) | matrix

(Optional) User-supplied initial or known estimate for the covariance matrix of the regression residuals, specified as a NUMSERIES-by-NUMSERIES matrix.

For covariance-weighted least-squares calculations, this matrix corresponds with weights for each series in the regression. The matrix also serves as an initial guess for the residual covariance in the expectation conditional maximization (ECM) algorithm.

Data Types: double

`CovarFormat` — Format for covariance matrix
`'full'` (default) | character vector with value `'full'` or `'diagonal'`

(Optional) Format for the covariance matrix, specified as a character vector. The choices are:

'full' — This is the default method that computes the full covariance matrix.
'diagonal' — This forces the covariance matrix to be a diagonal matrix.

Data Types: char
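
The optional arguments are positional, so every argument up to CovarFormat is supplied when the covariance format is set. The sketch below passes the documented default values explicitly and requests a diagonal covariance. The synthetic data and the single-cell identity design (which estimates one mean per series) are assumptions for illustration, and the call requires the toolbox that provides ecmlsrmle.

```matlab
% Hypothetical data: 2 series with means 0.5 and -1.0, unit-variance noise.
rng(2);
numSamples = 80;
Data = repmat([0.5 -1.0],numSamples,1) + randn(numSamples,2);
Data(rand(numSamples,2) < 0.1) = NaN;     % about 10% missing values
Design = {eye(2)};                        % same design for every sample

% Documented defaults, written out explicitly.
MaxIterations = 100;
TolParam = sqrt(eps);
TolObj = eps^(3/4);
Param0 = zeros(2,1);                      % NUMPARAMS-by-1 initial estimate
Covar0 = eye(2);                          % NUMSERIES-by-NUMSERIES initial estimate

[Parameters,Covariance] = ecmlsrmle(Data,Design,MaxIterations, ...
    TolParam,TolObj,Param0,Covar0,'diagonal');

% With 'diagonal', the off-diagonal part is expected to be zero.
offDiagonal = Covariance - diag(diag(Covariance));
```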

Output Arguments

`Parameters` — Parameters of the regression model
vector

Parameters of the regression model, returned as a NUMPARAMS-by-1 column vector of estimates.

`Covariance` — Covariance of the regression model's residuals
matrix

Covariance of the regression model's residuals, returned as a NUMSERIES-by-NUMSERIES matrix of estimates.

`Resid` — Residuals from the regression
matrix

Residuals from the regression, returned as a NUMSAMPLES-by-NUMSERIES matrix.

`Info` — Structure containing additional information from the regression
structure

Structure containing additional information from the regression, returned as a structure. The structure has these fields:

Info.Obj – A variable-extent column vector, with no more than MaxIterations elements, that contains the value of the objective function at each iteration of the estimation algorithm. The last value in this vector, Obj(end), is the terminal estimate of the objective function. For least-squares estimation, the objective function is the least-squares objective function.
Info.PrevParameters – NUMPARAMS-by-1 column vector of estimates for the model parameters from the iteration just before the terminal iteration.
Info.PrevCovariance – NUMSERIES-by-NUMSERIES matrix of estimates for the covariance parameters from the iteration just before the terminal iteration.

Use the estimates in the output structure Info for diagnostic purposes.
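
One way to use these fields is to inspect how much the estimates changed in the final iteration and how the objective function evolved. The following is a sketch under assumed synthetic data; the step-size measures (`lastParamStep`, `lastCovarStep`) are illustrative diagnostics, not quantities returned by ecmlsrmle.

```matlab
% Hypothetical data: 2 series with means 1 and 2, about 15% missing values.
rng(1);
numSamples = 50;
Data = repmat([1 2],numSamples,1) + randn(numSamples,2);
Data(rand(numSamples,2) < 0.15) = NaN;
Design = {eye(2)};                        % one mean parameter per series

[Parameters,Covariance,~,Info] = ecmlsrmle(Data,Design);

numIterations = numel(Info.Obj);          % iterations actually performed
lastParamStep = norm(Parameters - Info.PrevParameters);
lastCovarStep = norm(Covariance - Info.PrevCovariance,'fro');

fprintf('Iterations: %d, final objective: %g\n',numIterations,Info.Obj(end));
fprintf('Last parameter step: %g, last covariance step: %g\n', ...
    lastParamStep,lastCovarStep);

% Objective function value at each iteration.
plot(1:numIterations,Info.Obj,'-o');
xlabel('Iteration');
ylabel('Objective function');
```

Small final steps together with a flat tail in the plot suggest that the algorithm stopped because the tolerances were met rather than because MaxIterations was reached.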

More About

Least Squares Regression

Least squares regression is a statistical method used to estimate the relationships between variables.

Least squares regression is commonly employed in linear regression analysis to find the best-fitting line through a set of data points. The goal is to minimize the sum of the squares of the differences (residuals) between the observed values and the values predicted by the model.
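
For a single series with complete data and a full-rank design, the least-squares estimate can be computed directly. The sketch below uses made-up numbers and the MATLAB backslash operator rather than ecmlsrmle, to show what minimizing the sum of squared residuals means in the simplest case.

```matlab
% Hypothetical observations that lie close to the line y = 1 + 2*x.
x = [0; 1; 2; 3];
y = [1.1; 2.9; 5.2; 6.8];

X = [ones(size(x)), x];        % design matrix: intercept and slope columns
b = X \ y;                     % least-squares solution of X*b = y

residuals = y - X*b;           % observed minus fitted values
sumSquares = sum(residuals.^2);

% Any other coefficient vector gives a larger sum of squared residuals.
bOther = b + [0.01; 0];
sumSquaresOther = sum((y - X*bOther).^2);
```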

References

[1] Dempster, A. P., N. M. Laird, and D. B. Rubin. “Maximum Likelihood from Incomplete Data via the EM Algorithm.” Journal of the Royal Statistical Society. Series B, Vol. 39, No. 1, 1977, pp. 1–37.

[2] Little, Roderick J. A., and Donald B. Rubin. Statistical Analysis with Missing Data. 2nd Edition. John Wiley & Sons, Inc., 2002.

[3] Sexton, Joe, and Anders Rygh Swensen. “ECM Algorithms that Converge at the Rate of EM.” Biometrika. Vol. 87, No. 3, 2000, pp. 651–662.

[4] Meng, Xiao-Li, and Donald B. Rubin. “Maximum Likelihood Estimation via the ECM Algorithm.” Biometrika. Vol. 80, No. 2, 1993, pp. 267–278.

Version History

Introduced in R2006a
