# Model Seasonal Lag Effects Using Indicator Variables

This example shows two ways to estimate a seasonal ARIMA model:

- Model the seasonal effects using a multiplicative seasonal model.
- Model the seasonal effects with a regression component made up of indicator variables, called seasonal dummies.

The example then compares the forecasts from the two approaches and shows that they produce similar results. The time series is monthly international airline passenger numbers from 1949 to 1960.

## Step 1. Load the data.

Load the data set `Data_Airline`, and take the natural log of the monthly passenger counts.

    load(fullfile(matlabroot,'examples','econ','Data_Airline.mat'))
    dat = log(Data); % Transform to logarithmic scale
    T = size(dat,1);
    y = dat(1:103);  % estimation sample

`y` is the part of `dat` used for estimation, and the rest of `dat` is the holdout sample to compare the two models' forecasts.
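The split point is easier to interpret as a calendar date. The following is a minimal sketch in base MATLAB, assuming the first observation is January 1949 and that the series has 144 monthly observations (12 years, 1949 to 1960); the variable names `startYear`, `nEst`, and `nHoldout` are illustrative and not part of the original example.

```matlab
% Assumption: observation 1 is January 1949 and the data are monthly.
startYear = 1949;
T    = 144;                 % 12 years of monthly data, 1949-1960
nEst = 103;                 % size of the estimation sample, y = dat(1:103)

k = nEst - 1;                           % months elapsed since January 1949
lastEstYear  = startYear + floor(k/12); % calendar year of the last estimation observation
lastEstMonth = mod(k,12) + 1;           % calendar month (1 = January)
nHoldout     = T - nEst;                % observations left for forecast comparison

fprintf('Estimation sample ends in %d-%02d; holdout has %d months.\n', ...
    lastEstYear, lastEstMonth, nHoldout);
```

Under these assumptions the estimation sample ends in July 1957 and the holdout sample contains 41 months.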

## Step 2. Define and fit the model specifying seasonal lags.

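Create an ARIMA$(0,1,1)(0,1,1)_{12}$ model

$$(1-L)(1-L^{12})y_t = (1+\theta_1 L)(1+\Theta_{12}L^{12})\varepsilon_t,$$

where $L$ is the lag operator and $\varepsilon_t$ is an independent and identically distributed (iid) normal series with mean 0 and variance $\sigma^2$. Use `estimate` to fit `model1` to `y`.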

    model1 = arima('MALags', 1, 'D', 1, 'SMALags', 12,...
        'Seasonality',12, 'Constant', 0);
    fit1 = estimate(model1,y);

    ARIMA(0,1,1) Model Seasonally Integrated with Seasonal MA(12):
    ---------------------------------------------------------------
    Conditional Probability Distribution: Gaussian

                                     Standard            t
     Parameter          Value          Error         Statistic
    -----------    -----------    ------------     -----------
       Constant              0          Fixed           Fixed
          MA{1}      -0.357317       0.088031        -4.05899
        SMA{12}      -0.614686      0.0962493        -6.38639
       Variance     0.00130504    0.000152696         8.54666
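The `t Statistic` column of the display is the ratio of each estimate to its standard error. The short check below, written in base MATLAB, reproduces that column from the values printed above. The variable names are illustrative, and the inputs are the rounded figures from the display, so the results agree with the table only to about four decimal places.

```matlab
% Values copied from the estimation display (rounded as printed).
names  = {'MA{1}', 'SMA{12}', 'Variance'};
value  = [-0.357317, -0.614686, 0.00130504];
stdErr = [ 0.088031,  0.0962493, 0.000152696];

tStat = value ./ stdErr;    % t statistic = estimate / standard error

for k = 1:numel(names)
    fprintf('%-9s t = %8.4f\n', names{k}, tStat(k));
end
```

As a rough rule under a normal approximation, an absolute t statistic above about 2 suggests that the coefficient differs from zero at approximately the 5% level.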

The fitted model is

$$(1-L)(1-L^{12})y_t = (1-0.357L)(1-0.615L^{12})\varepsilon_t,$$

where $\varepsilon_t$ is an iid normally distributed series with mean 0 and variance 0.0013.
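To see what the multiplicative structure implies, it can help to expand the product of the two MA polynomials. In the sketch below, $\theta_1$ denotes the `MA{1}` coefficient, $\Theta_{12}$ the `SMA{12}` coefficient, and $L$ the lag operator; the numerical values are the rounded estimates from the display above.

```latex
\begin{aligned}
(1+\theta_1 L)(1+\Theta_{12}L^{12})
  &= 1 + \theta_1 L + \Theta_{12}L^{12} + \theta_1\Theta_{12}L^{13}, \\[4pt]
\hat\theta_1\hat\Theta_{12}
  &= (-0.357317)(-0.614686) \approx 0.2196, \\[4pt]
(1-L)(1-L^{12})\,y_t
  &\approx \varepsilon_t - 0.357\,\varepsilon_{t-1}
     - 0.615\,\varepsilon_{t-12} + 0.220\,\varepsilon_{t-13}.
\end{aligned}
```

The lag-13 coefficient is not a free parameter: it is the product of the two estimated coefficients. This restriction is part of what keeps the multiplicative model compact.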

## Step 3. Define and fit the model using seasonal dummies.

Create an ARIMAX(0,1,1) model with period 12 seasonal differencing and a regression component,

$$(1-L)(1-L^{12})y_t = X_t'\beta + (1+\theta_1 L)\varepsilon_t.$$

$\{X_t\}$ is a series of *T* column vectors of length 12 that indicate the month in which observation *t* was measured. A 1 in row *i* of $X_t$ indicates that the observation was measured in month *i*; the remaining elements are 0.
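To make the indicator structure concrete, the sketch below builds the stacked matrix of monthly indicators using only base functions, so it runs without `dummyvar`. It assumes *T* = 144 observations and that the first observation falls in January. The names `month`, `rankX`, and `rankWithConst` are illustrative. The sketch also checks what happens to the column rank when a column of ones is appended.

```matlab
% Assumption: 144 monthly observations, the first one in January.
T = 144;
month = mod((0:T-1)', 12) + 1;          % month index 1..12 for each observation
X = double(bsxfun(@eq, month, 1:12));   % X(t,i) = 1 if observation t is in month i

rowSums = sum(X, 2);                    % every row contains exactly one 1
rankX = rank(X);                        % rank of the 12 indicator columns
rankWithConst = rank([ones(T,1) X]);    % rank after adding a constant column

fprintf('rank(X) = %d, rank([1 X]) = %d of %d columns\n', ...
    rankX, rankWithConst, size(X,2) + 1);
```

Because the 12 indicator columns sum to a column of ones, adding a constant column increases the number of columns to 13 without increasing the rank.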

Note that if you include an additive constant in the model, then the *T* rows of the design matrix `X` are composed of the row vectors $[1\ \ X_t']$. Because the 12 indicator columns sum to the column of ones, `X` is then rank deficient, and one regression coefficient is not identifiable. A constant is left out of this example to avoid distraction from the main purpose.

    % Format the in-sample X matrix
    X  = dummyvar(repmat((1:12)', 12, 1));
    % Format the presample X matrix
    X0 = [zeros(1,11) 1 ; dummyvar((1:12)')];
    model2 = arima('MALags', 1, 'D', 1, 'Seasonality',...
        12, 'Constant', 0);
    fit2 = estimate(model2,y, 'X', [X0 ; X]);

    ARIMAX(0,1,1) Model Seasonally Integrated:
    -------------------------------------------
    Conditional Probability Distribution: Gaussian

                                     Standard            t
     Parameter          Value          Error         Statistic
    -----------    -----------    ------------     -----------
       Constant              0          Fixed           Fixed
          MA{1}      -0.407106      0.0843875        -4.82425
          Beta1    -0.00257697      0.0251683        -0.10239
          Beta2    -0.00577689      0.0318848        -0.18118
          Beta3    -0.00220339      0.0305268      -0.0721787
          Beta4    0.000947373      0.0198667       0.0476865
          Beta5     -0.0012146      0.0179806      -0.0675506
          Beta6     0.00486998       0.018374        0.265047
          Beta7    -0.00879439      0.0152852       -0.575354
          Beta8     0.00483464      0.0124836        0.387279
          Beta9     0.00143697      0.0182453       0.0787581
         Beta10     0.00927404      0.0147513        0.628693
         Beta11     0.00736654         0.0105        0.701577
         Beta12    0.000988406      0.0142945       0.0691458
       Variance     0.00177152    0.000246566         7.18475

The fitted model is

$$(1-L)(1-L^{12})y_t = X_t'\hat\beta + (1-0.407L)\varepsilon_t,$$

where $\varepsilon_t$ is an iid normally distributed series with mean 0 and variance 0.0018, and $\hat\beta$ is a column vector with the values `Beta1` through `Beta12`. Note that the `MA{1}` and `Variance` estimates differ between `model1` and `model2`.

## Step 4. Forecast using both models.

Use `forecast` to generate forecasts from both models for the 41 periods after July 1957, the end of the estimation sample. Plot the holdout sample together with these forecasts.

    yF1 = forecast(fit1,41,'Y0',y);
    yF2 = forecast(fit2,41,'Y0',y,'X0',X(1:103,:),...
        'XF',X(104:end,:));

    l1 = plot(100:T,dat(100:end),'k','LineWidth',3);
    hold on
    l2 = plot(104:144,yF1,'-r','LineWidth',2);
    l3 = plot(104:144,yF2,'-b','LineWidth',2);
    hold off
    title('Passenger Data: Actual vs. Forecasts')
    xlabel('Month')
    ylabel('Logarithm of Monthly Passenger Data')
    legend({'Actual Data','Polynomial Forecast',...
        'Regression Forecast'},'Location','NorthWest')

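Although both models overpredict the holdout observations, their forecasts are almost equivalent. One main difference between the models is that `model1` is more parsimonious than `model2`: it estimates two coefficients (`MA{1}` and `SMA{12}`) plus the innovation variance, whereas `model2` estimates 13 coefficients (`MA{1}` and `Beta1` through `Beta12`) plus the variance.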
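A visual comparison can be complemented with simple accuracy measures on the holdout sample. The sketch below defines two hypothetical helpers, `rmse` and `bias`, which are not part of the original example, and runs them on made-up toy numbers purely to show the calling pattern; it reports no results for the airline data.

```matlab
% Hypothetical helpers (not part of the original example).
rmse = @(actual, fcst) sqrt(mean((actual - fcst).^2));  % root mean squared forecast error
bias = @(actual, fcst) mean(fcst - actual);             % positive value means overprediction

% Toy numbers, made up only to illustrate how the helpers are called.
actualToy = [1; 2; 3];
fcstToy   = [1.5; 2.5; 3.5];

toyRmse = rmse(actualToy, fcstToy);
toyBias = bias(actualToy, fcstToy);
fprintf('toy RMSE = %.2f, toy bias = %.2f\n', toyRmse, toyBias);
```

With the variables from this example, the same helpers could be called as `rmse(dat(104:end), yF1)` and `rmse(dat(104:end), yF2)`. A positive `bias` for both models would agree with the overprediction visible in the plot.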
