## Compare Alternative ARIMA Model Representations

### regARIMA to ARIMAX Model Conversion

ARIMAX models and regression models with ARIMA errors are closely related, and the choice of which to use is generally dictated by your goals for the analysis. If your objective is to fit a parsimonious model to data and forecast responses, then there is very little difference between the two models.

If you are more interested in preserving the usual interpretation of a regression coefficient as a measure of sensitivity, i.e., the effect of a unit change in a predictor variable on the response, then use a regression model with ARIMA errors. Regression coefficients in ARIMAX models do not possess that interpretation because of the dynamic dependence on the response [1].
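To see why, consider a deliberately simple case that is not part of the original discussion: an ARX(1) model with a single predictor $x_t$ and $|a| < 1$ (the symbols $a$ and $x_t$ are illustrative assumptions). Back-substituting the lagged response shows that the predictor enters through all of its lags:

```latex
\begin{aligned}
y_t &= c + \beta x_t + a\,y_{t-1} + \epsilon_t \\
    &= \frac{c}{1-a} + \beta \sum_{j=0}^{\infty} a^{j} x_{t-j}
       + \sum_{j=0}^{\infty} a^{j} \epsilon_{t-j}.
\end{aligned}
```

Under these assumptions, a one-time unit change in $x_t$ moves $y_t$ by β and $y_{t+j}$ by $a^{j}\beta$, and a permanent unit change shifts the long-run level by β/(1 – a). So β alone does not measure the total effect of the predictor on the response.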

Suppose that you have the parameter estimates from a regression model with ARIMA errors, and you want to see how the model structure compares to that of an ARIMAX model. Or, suppose you want some insight into the underlying relationship between the two models.

The ARIMAX model is (t = 1,...,T):

 $Η\left(L\right){y}_{t}=c+{X}_{t}\beta +Ν\left(L\right){\epsilon }_{t},$ (4-11)

where

• $y_t$ is the univariate response series.

• $X_t$ is row t of X, which is the matrix of concatenated predictor series. That is, $X_t$ is observation t of each predictor series.

• β is the regression coefficient.

• c is the regression model intercept.

• $Η\left(L\right)=\varphi \left(L\right){\left(1-L\right)}^{D}\Phi \left(L\right)\left(1-{L}^{s}\right)=1-{\eta }_{1}L-{\eta }_{2}{L}^{2}-...-{\eta }_{P}{L}^{P},$ which is the degree P lag operator polynomial that captures the combined effect of the seasonal and nonseasonal autoregressive polynomials, and the seasonal and nonseasonal integration polynomials. For more details on notation, see Multiplicative ARIMA Model.

• $Ν\left(L\right)=\theta \left(L\right)\Theta \left(L\right)=1+{\nu }_{1}L+{\nu }_{2}{L}^{2}+...+{\nu }_{Q}{L}^{Q},$ which is the degree Q lag operator polynomial that captures the combined effect of the seasonal and nonseasonal moving average polynomials.

• $\epsilon_t$ is a white noise innovation process.

The regression model with ARIMA errors is (t = 1,...,T):

 $\begin{array}{l}{y}_{t}=c+{X}_{t}\beta +{u}_{t}\\ A\left(L\right){u}_{t}=B\left(L\right){\epsilon }_{t},\end{array}$ (4-12)

where

• $u_t$ is the unconditional disturbance process.

• $A\left(L\right)=\varphi \left(L\right){\left(1-L\right)}^{D}\Phi \left(L\right)\left(1-{L}^{s}\right)=1-{a}_{1}L-{a}_{2}{L}^{2}-...-{a}_{P}{L}^{P},$ which is the degree P lag operator polynomial that captures the combined effect of the seasonal and nonseasonal autoregressive polynomials, and the seasonal and nonseasonal integration polynomials.

• $B\left(L\right)=\theta \left(L\right)\Theta \left(L\right)=1+{b}_{1}L+{b}_{2}{L}^{2}+...+{b}_{Q}{L}^{Q},$ which is the degree Q lag operator polynomial that captures the combined effect of the seasonal and nonseasonal moving average polynomials.

The values of the variables defined in Equation 4-12 are not necessarily equivalent to the values of the variables in Equation 4-11, even though the notation might be similar.

### Illustrate regARIMA to ARIMAX Model Conversion

Consider Equation 4-12, the regression model with ARIMA errors. Use the following operations to convert the regression model with ARIMA errors to its corresponding ARIMAX model.

1. Solve for $u_t$.

$\begin{array}{l}{y}_{t}=c+{X}_{t}\beta +{u}_{t}\\ {u}_{t}=\frac{B\left(L\right)}{A\left(L\right)}{\epsilon }_{t}.\end{array}$

2. Substitute $u_t$ into the regression equation.

$\begin{array}{c}{y}_{t}=c+{X}_{t}\beta +\frac{B\left(L\right)}{A\left(L\right)}{\epsilon }_{t}\\ A\left(L\right){y}_{t}=A\left(L\right)c+A\left(L\right){X}_{t}\beta +B\left(L\right){\epsilon }_{t}.\end{array}$

3. Solve for $y_t$.

 $\begin{array}{c}{y}_{t}=A\left(L\right)c+A\left(L\right){X}_{t}\beta +\sum _{k=1}^{P}{a}_{k}{y}_{t-k}+B\left(L\right){\epsilon }_{t}\\ =A\left(L\right)c+{Z}_{t}\Gamma +\sum _{k=1}^{P}{a}_{k}{y}_{t-k}+B\left(L\right){\epsilon }_{t}.\end{array}$ (4-13)

In Equation 4-13:

• $A(L)c = (1 - a_1 - a_2 - ... - a_P)c$. That is, the constant in the ARIMAX model is the intercept in the regression model with ARIMA errors subject to a nonlinear constraint. Applications such as simulate handle this constraint, but estimate cannot incorporate it. When estimating, the models are equivalent when you fix the intercept and constant to 0.

• In the term $A(L)X_t\beta$, the lag operator polynomial A(L) filters the T-by-1 vector $X_t\beta$, which is the linear combination of the predictors weighted by the regression coefficients. This filtering process requires P presample observations of the predictor series.

• arima constructs the matrix $Z_t$ as follows:

  • Each column of $Z_t$ corresponds to a term in A(L).

  • The first column of $Z_t$ is the vector $X_t\beta$.

  • The second column of $Z_t$ is a sequence of $d_2$ NaNs ($d_2$ is the degree of the second term in A(L)), followed by the product ${L}^{{d}_{2}}{X}_{t}\beta$. That is, the software attaches $d_2$ NaNs at the beginning of the T-by-1 column, attaches $X_t\beta$ after the NaNs, and truncates the end of that product by $d_2$ observations.

  • In general, the jth column of $Z_t$ is a sequence of $d_j$ NaNs ($d_j$ is the degree of the jth term in A(L)), followed by the product ${L}^{{d}_{j}}{X}_{t}\beta$. That is, the software attaches $d_j$ NaNs at the beginning of the T-by-1 column, attaches $X_t\beta$ after the NaNs, and truncates the end of that product by $d_j$ observations.

• Γ = [1 –$a_1$ –$a_2$ ... –$a_P$]'.

The arima converter removes all zero-valued autoregressive coefficients of the difference equation. Consequently, the arima converter does not associate zero-valued autoregressive coefficients with columns in $Z_t$, nor does it include the corresponding zero-valued coefficients in Γ.

4. Rewrite Equation 4-13 in expanded form:

${y}_{t}=\left(1-\sum _{k=1}^{P}{a}_{k}\right)c+{X}_{t}\beta -\sum _{k=1}^{P}{a}_{k}{X}_{t-k}\beta +\sum _{k=1}^{P}{a}_{k}{y}_{t-k}+{\epsilon }_{t}+\sum _{k=1}^{Q}{b}_{k}{\epsilon }_{t-k}.$
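As a quick sanity check on the steps above, it can help to write out the smallest nontrivial case. The block below is only a sketch for P = Q = 1, obtained by applying the same substitution to Equation 4-12; it is a special case chosen for illustration, not an additional result.

```latex
\begin{aligned}
y_t &= (1-a_1)c + X_t\beta - a_1 X_{t-1}\beta + a_1 y_{t-1}
       + \epsilon_t + b_1 \epsilon_{t-1} \\
    &= (1-a_1)c + Z_t\Gamma + a_1 y_{t-1} + \epsilon_t + b_1 \epsilon_{t-1},
\qquad
Z_t = \begin{bmatrix} X_t\beta & X_{t-1}\beta \end{bmatrix},
\quad
\Gamma = \begin{bmatrix} 1 \\ -a_1 \end{bmatrix}.
\end{aligned}
```

In this case the first row of $Z_t$ has a NaN in its second column, because $X_0\beta$ is a presample quantity.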

For example, consider the following regression model whose errors are ARMA(2,1):

 $\begin{array}{c}{y}_{t}=0.2+0.5{X}_{t}+{u}_{t}\\ \left(1-0.8L+0.4{L}^{2}\right){u}_{t}=\left(1+0.3L\right){\epsilon }_{t}.\end{array}$ (4-14)

The equivalent ARMAX model is:

$\begin{array}{c}{y}_{t}=0.12+\left(0.5-0.4L+0.2{L}^{2}\right){X}_{t}+0.8{y}_{t-1}-0.4{y}_{t-2}+\left(1+0.3L\right){\epsilon }_{t}\\ =0.12+{Z}_{t}\Gamma +0.8{y}_{t-1}-0.4{y}_{t-2}+\left(1+0.3L\right){\epsilon }_{t},\end{array}$

or

$\left(1-0.8L+0.4{L}^{2}\right){y}_{t}=0.12+{Z}_{t}\Gamma +\left(1+0.3L\right){\epsilon }_{t},$

where Γ = [1 –0.8 0.4]' and

${Z}_{t}=0.5\left[\begin{array}{ccc}{x}_{1}& NaN& NaN\\ {x}_{2}& {x}_{1}& NaN\\ {x}_{3}& {x}_{2}& {x}_{1}\\ ⋮& ⋮& ⋮\\ {x}_{T}& {x}_{T-1}& {x}_{T-2}\end{array}\right].$

This model is not integrated because all of the eigenvalues associated with the AR polynomial lie inside the unit circle, although the predictors might still affect the otherwise stable process. Also, because $Z_t$ contains two lags of the predictor, you need presample predictor data going back at least 2 periods to, for example, fit the model to data.
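The conversion of Equation 4-14 can also be checked numerically without any toolbox functions. The following is a minimal sketch that uses only base MATLAB (roots, filter, randn). The predictor and innovation series, the sample size, and all variable names are assumptions made for illustration, and the first two regression-model responses are reused as presample responses for the ARMAX recursion.

```matlab
% Sketch: check the ARMAX form of Equation 4-14 with base MATLAB only.
rng(0);                         % For reproducibility
T    = 200;                     % Hypothetical sample size
c    = 0.2;                     % Regression intercept
beta = 0.5;                     % Regression coefficient
a    = [0.8 -0.4];              % a_1, a_2 in A(L) = 1 - a_1*L - a_2*L^2
b    = 0.3;                     % b_1 in B(L) = 1 + b_1*L

x = randn(T,1);                 % Hypothetical predictor series
e = randn(T,1);                 % Hypothetical innovations

% Eigenvalues associated with the AR polynomial (roots of z^2 - 0.8z + 0.4)
lambda   = roots([1 -a]);
isStable = all(abs(lambda) < 1);

% Regression model with ARMA(2,1) errors, zero presample u and e
u    = filter([1 b],[1 -a],e);  % A(L)u_t = B(L)e_t
yReg = c + beta*x + u;

% ARMAX quantities: constant A(L)c, predictor matrix Z, and Gamma
cARMAX = (1 - sum(a))*c;
xb     = beta*x;
Z      = [xb, [NaN; xb(1:end-1)], [NaN; NaN; xb(1:end-2)]];
Gamma  = [1; -a(:)];

% ARMAX recursion; rows 1 and 2 of yReg act as presample responses
yARMAX = yReg;
for t = 3:T
    yARMAX(t) = cARMAX + Z(t,:)*Gamma + a*yARMAX([t-1 t-2]) ...
        + e(t) + b*e(t-1);
end

maxDiff = max(abs(yARMAX - yReg));
fprintf('Stable: %d, constant: %g, max |yARMAX - yReg|: %g\n', ...
    isStable, cARMAX, maxDiff);
```

Starting the recursion at t = 3 avoids the NaN entries in the first two rows of Z, which mirrors the need for two presample predictor observations. Any remaining difference between the two paths should be at the level of floating-point round-off.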

You can illustrate this further through simulation and estimation.

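1. Specify a regression model with ARIMA errors that uses the intercept and ARMA(2,1) error model in Equation 4-14, but with two predictors whose regression coefficients are 0.3 and –0.2.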

Mdl1 = regARIMA('Intercept',0.2,'AR',{0.8 -0.4},...
'MA',0.3,'Beta',[0.3 -0.2],'Variance',0.2);

2. Generate presample observations and predictor data.

rng(1);   % For reproducibility
T = 100;
maxPQ = max(Mdl1.P,Mdl1.Q);
numObs = T + maxPQ;   % Adjust number of observations to account for presample
X1 = randn(numObs,2); % Simulate predictor data
u0 = randn(maxPQ,1);  % Presample unconditional disturbances u(t)
e0 = randn(maxPQ,1);  % Presample innovations e(t)

3. Simulate data from Mdl1.

rng(100) % For reproducibility
[y1,e1,u1] = simulate(Mdl1,T,'U0',u0,...
'E0',e0,'X',X1);

4. Convert Mdl1 to an ARIMAX model.

[Mdl2,X2] = arima(Mdl1,'X',X1);
Mdl2

Mdl2 =

ARIMAX(2,0,1) Model:
---------------------
Distribution: Name = 'Gaussian'
P: 2
D: 0
Q: 1
Constant: 0.12
AR: {0.8 -0.4} at Lags [1 2]
SAR: {}
MA: {0.3} at Lags [1]
SMA: {}
Beta: [1 -0.8 0.4]
Variance: 0.2

5. Generate presample responses for the ARIMAX model to ensure consistency with Mdl1. Simulate data from Mdl2.

y0 = Mdl1.Intercept + X1(1:maxPQ,:)*Mdl1.Beta' + u0;
rng(100)
y2 = simulate(Mdl2,T,'Y0',y0,'E0',e0,'X',X2);

figure
plot(y1,'LineWidth',3)
hold on
plot(y2,'r:','LineWidth',2.5)
hold off
title('{\bf Simulated Paths for Both Models}')
legend('regARIMA Model','ARIMAX Model','Location','Best')


The simulated paths are equivalent because the arima converter enforces the nonlinear constraint when it converts the regression model intercept to the ARIMAX model constant.

6. Fit a regression model with ARIMA errors to the simulated data.

ToEstMdl1 = regARIMA('ARLags',[1 2],'MALags',1);
EstMdl1 = estimate(ToEstMdl1,y1,'E0',e0,'U0',u0,'X',X1);


Regression with ARIMA(2,0,1) Error Model:
------------------------------------------
Conditional Probability Distribution: Gaussian

Parameter        Value       Standard Error    t Statistic
-----------   -----------   ---------------   -------------
Intercept       0.140736       0.101405          1.38787
AR{1}           0.830611       0.137504          6.04065
AR{2}          -0.454025       0.116397         -3.90067
MA{1}           0.428031       0.151453          2.82616
Beta1           0.295519       0.0229383        12.8832
Beta2          -0.176007       0.0306069        -5.75057
Variance        0.182313       0.0277648         6.56633

7. Fit an ARIMAX model to the simulated data.

ToEstMdl2 = arima('ARLags',[1 2],'MALags',1);
EstMdl2 = estimate(ToEstMdl2,y2,'E0',e0,'Y0',...
y0,'X',X2);


ARIMAX(2,0,1) Model:
---------------------
Conditional Probability Distribution: Gaussian

Parameter        Value       Standard Error    t Statistic
-----------   -----------   ---------------   -------------
Constant        0.0849961      0.0642166         1.32359
AR{1}           0.831361       0.136345          6.09748
AR{2}          -0.455993       0.11788          -3.86828
MA{1}           0.426          0.157526          2.70431
Beta1           1.05303        0.136849          7.69485
Beta2          -0.6904         0.192617         -3.58432
Beta3           0.453993       0.153522          2.95718
Variance        0.181119       0.0288359         6.28103

8. Convert EstMdl1 to an ARIMAX model.

ConvertedMdl2 = arima(EstMdl1,'X',X1)

ConvertedMdl2 =

ARIMAX(2,0,1) Model:
---------------------
Distribution: Name = 'Gaussian'
P: 2
D: 0
Q: 1
Constant: 0.087737
AR: {0.830611 -0.454025} at Lags [1 2]
SAR: {}
MA: {0.428031} at Lags [1]
SMA: {}
Beta: [1 -0.830611 0.454025]
Variance: 0.182313


The estimated ARIMAX model constant does not equal the ARIMAX model constant converted from the regression model with ARIMA errors: EstMdl2.Constant = 0.0849961, whereas ConvertedMdl2.Constant = 0.087737. The difference arises because estimate does not enforce the nonlinear constraint that the arima converter enforces. As a result, the other estimates are close but not identical.
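The size of the discrepancy can be reproduced with plain arithmetic. The sketch below copies the estimates from the displays in steps 6 and 7 into hypothetical variables (it does not query the fitted model objects) and applies the constraints from Equation 4-13, namely that the ARIMAX constant is $(1 - a_1 - a_2)$ times the intercept and that Γ = [1 –$a_1$ –$a_2$].

```matlab
% Values copied from the estimation displays (variable names are illustrative)
intercept1 = 0.140736;                    % EstMdl1 intercept
ar1        = [0.830611 -0.454025];        % EstMdl1 AR coefficients
constant2  = 0.0849961;                   % EstMdl2 constant (unconstrained)
ar2        = [0.831361 -0.455993];        % EstMdl2 AR coefficients
beta2      = [1.05303 -0.6904 0.453993];  % EstMdl2 regression coefficients

% Constant implied by the constraint: (1 - a_1 - a_2)*intercept
convertedConstant = (1 - sum(ar1))*intercept1;

% Under the constraint, the ARIMAX regression coefficients would equal
% Gamma = [1 -a_1 -a_2]; the unconstrained fit only approximates this.
gammaConstrained = [1 -ar2];
gammaGap         = beta2 - gammaConstrained;

fprintf('Converted constant: %.6f, estimated constant: %.7f\n', ...
    convertedConstant, constant2);
disp(gammaGap)
```

The nonzero entries of gammaGap show the same effect on the regression coefficients: the unconstrained ARIMAX fit treats Beta1 through Beta3 as free parameters rather than tying them to the AR coefficients.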

## References

[1] Hyndman, R. J. (2010, October). "The ARIMAX Model Muddle." Rob J. Hyndman. Retrieved February 7, 2013 from http://robjhyndman.com/researchtips/arimax/.