*Presample data* comes from time points
before the beginning of the observation period. In Econometrics Toolbox™,
you can specify your own presample data or use generated presample
data.

In a conditional mean model, the distribution of $${\epsilon}_{t}$$ is
conditional on historical information. Historical information includes
past responses, $${y}_{1},{y}_{2},\dots ,{y}_{t-1}$$, past innovations, $${\epsilon}_{1},{\epsilon}_{2},\dots ,{\epsilon}_{t-1}$$, and, if you include them in
the model, past and present exogenous covariates, $${x}_{1},{x}_{2},\dots ,{x}_{t-1},{x}_{t}$$.

The number of past responses and innovations that a current innovation depends on is determined by the degree of the AR or MA operators, and any differencing. For example, in an AR(2) model, each innovation depends on the two previous responses,

$${\epsilon}_{t}={y}_{t}-c-{\varphi}_{1}{y}_{t-1}-{\varphi}_{2}{y}_{t-2}.$$
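The AR(2) recursion above can be sketched in a few lines of plain Python to show why two presample responses are needed before the first innovation can be computed. This is only an illustration of the arithmetic, not the toolbox's implementation; the function name `ar2_innovations` and the numeric values are hypothetical.

```python
def ar2_innovations(y, c, phi1, phi2, presample):
    """Return the AR(2) innovations for y, given two presample responses.

    presample holds [y_{-1}, y_0], oldest value first (hypothetical layout).
    """
    if len(presample) != 2:
        raise ValueError("an AR(2) model needs exactly two presample responses")
    history = list(presample) + list(y)
    innovations = []
    for t in range(2, len(history)):
        # eps_t = y_t - c - phi1 * y_{t-1} - phi2 * y_{t-2}
        eps = history[t] - c - phi1 * history[t - 1] - phi2 * history[t - 2]
        innovations.append(eps)
    return innovations


# Made-up data: three observed responses and two presample responses.
y = [1.0, 2.0, 3.0]
eps = ar2_innovations(y, c=0.5, phi1=0.5, phi2=0.25, presample=[0.0, 4.0])
print(eps)
```

Without the two presample values, the first two innovations could not be evaluated, and only one innovation would remain for this three-point series.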

In ARIMAX models, the current innovation also depends
on the *current value* of the exogenous covariate
(unlike distributed lag models). For example, in an ARX(2) model with
one exogenous covariate and regression coefficient $$\beta$$, each
innovation depends on the previous two responses and the current value
of the covariate,

$${\epsilon}_{t}={y}_{t}-c-{\varphi}_{1}{y}_{t-1}-{\varphi}_{2}{y}_{t-2}-\beta {x}_{t}.$$

In general, the likelihood contribution of the first few innovations is conditional on historical information that might not be observable. How do you estimate the parameters without all the data? In the ARX(2) example, $${\epsilon}_{2}$$ explicitly depends on $${y}_{1},$$ $${y}_{0},$$ and $${x}_{2},$$ and $${\epsilon}_{1}$$ explicitly depends on $${y}_{0},$$ $${y}_{-1},$$ and $${x}_{1}$$. Implicitly, $${\epsilon}_{2}$$ depends on $${x}_{1}$$ and $${x}_{0},$$ and $${\epsilon}_{1}$$ depends on $${x}_{0}$$ and $${x}_{-1}.$$ However, you cannot observe $${y}_{0},$$ $${y}_{-1},$$ $${x}_{0},$$ and $${x}_{-1}.$$

The amount of presample data that you need to initialize a
model depends on the degree of the model. The property `P` of
an `arima` model specifies the number of presample
responses and exogenous data that you need to initialize the AR portion
of a conditional mean model. For example, `P = 2` in
an ARX(2) model. Therefore, you need two responses and two data points
from *each* exogenous covariate series to initialize
the model.

One option is to use the first `P` observations from
the response and exogenous covariate series as your presample, and
then fit your model to the remaining data. This results in some loss
of sample size. If you plan to compare multiple potential models,
be aware that you can only use likelihood-based measures of fit (including
the likelihood ratio test and information criteria) to compare models
fit to the same data (of the same sample size). If you specify your
own presample data, then you must use the largest required number
of presample responses across all models that you want to compare.
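As a rough sketch of this bookkeeping, the following plain-Python snippet reserves the largest required presample once, so that every candidate model is fit to the same estimation sample. The helper `split_presample`, the candidate degrees, and the data are hypothetical, and no model is actually estimated here.

```python
def split_presample(series, num_presample):
    """Split a series into (presample, estimation sample)."""
    if not 0 <= num_presample < len(series):
        raise ValueError("num_presample must be smaller than the series length")
    return series[:num_presample], series[num_presample:]


y = [0.3, -0.1, 0.4, 0.9, 0.2, -0.5, 0.7, 0.1]   # made-up responses
candidate_orders = [1, 2, 3]                      # hypothetical AR degrees to compare
max_p = max(candidate_orders)

# Reserve the largest required presample for all candidates.
presample, estimation_sample = split_presample(y, max_p)

# Each candidate uses only the most recent p presample values, so all
# candidates share one estimation sample of the same size.
presample_by_order = {p: presample[max_p - p:] for p in candidate_orders}
```

If each model instead used its own first `p` observations as presample, the AR(1) candidate would be fit to seven points and the AR(3) candidate to five, and their likelihoods would not be comparable.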

The property `Q` of an `arima` model
specifies the number of presample innovations needed to initialize
the MA portion of a conditional mean model. You can get presample
innovations by dividing your data into two parts. Fit a model to the
first part, and infer the innovations. Then, use the inferred innovations
as presample innovations for estimating the second part of the data.
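The split-sample idea can be illustrated for an MA(1) model, $${y}_{t}=c+{\epsilon}_{t}+\theta {\epsilon}_{t-1}$$, where `Q = 1`. The sketch below is plain Python, not toolbox code. It skips the fitting step and simply assumes values for `c` and `theta`; the function name `ma1_innovations` and the data are hypothetical.

```python
def ma1_innovations(y, c, theta, e0=0.0):
    """Infer MA(1) innovations recursively, starting from presample innovation e0."""
    innovations = []
    previous = e0
    for value in y:
        # eps_t = y_t - c - theta * eps_{t-1}
        current = value - c - theta * previous
        innovations.append(current)
        previous = current
    return innovations


y = [1.0, 0.5, 2.0, 1.5, 0.0, 1.0]       # made-up responses
first_part, second_part = y[:3], y[3:]

c, theta = 0.0, 0.5                      # assumed values standing in for a fitted model

# First part: no presample available, so start from a zero innovation.
e_first = ma1_innovations(first_part, c, theta)

# Second part: the last inferred innovation serves as the presample innovation.
e_second = ma1_innovations(second_part, c, theta, e0=e_first[-1])
```

With `Q` presample innovations required, you would carry over the last `Q` inferred innovations rather than just one.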

For a model with both an autoregressive and moving average component, you can specify both presample responses and innovations, one or the other, or neither.

By default, `estimate` automatically generates
presample response and innovation data. The software:

- Generates presample responses by backward forecasting.
- Sets presample innovations to zero.
- Does *not* generate presample exogenous data. One option is to backward forecast each exogenous series to generate a presample during data preprocessing.
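One simple way to backward forecast an exogenous series during preprocessing is to reverse it in time, fit a small model to the reversed series, and forecast forward. The plain-Python sketch below does this with an AR(1) fit by ordinary least squares. That is a deliberately simple stand-in chosen for illustration, not the algorithm that `estimate` uses for responses, and the function name `backcast_ar1` is hypothetical.

```python
def backcast_ar1(series, num_presample):
    """Backcast presample values by fitting an AR(1) to the time-reversed series."""
    if len(series) < 3:
        raise ValueError("need at least three observations to fit an AR(1)")
    reversed_series = series[::-1]
    lagged = reversed_series[:-1]
    current = reversed_series[1:]
    n = len(lagged)
    mean_lag = sum(lagged) / n
    mean_cur = sum(current) / n
    sxx = sum((a - mean_lag) ** 2 for a in lagged)
    if sxx == 0:
        raise ValueError("series is constant; cannot fit an AR(1)")
    sxy = sum((a - mean_lag) * (b - mean_cur) for a, b in zip(lagged, current))
    slope = sxy / sxx
    intercept = mean_cur - slope * mean_lag

    # Forecasting forward in reversed time moves backward in original time.
    backcasts = []
    last = reversed_series[-1]           # earliest observed value
    for _ in range(num_presample):
        last = intercept + slope * last
        backcasts.append(last)
    return backcasts[::-1]               # oldest value first


x = [1.0, 2.0, 4.0, 8.0]                 # made-up exogenous series
x_presample = backcast_ar1(x, 2)         # two values, as needed when P = 2
```

The returned values are ordered oldest first, so they can be placed directly in front of the observed series.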
