A unit root process is a data-generating process whose first difference is stationary. In other words, a unit root process y_{t} has the form
y_{t} = y_{t–1} + stationary process.
A unit root test attempts to determine whether a given time series is consistent with a unit root process.
The next section gives more details of unit root processes, and suggests why it is important to detect them.
There are two basic models for economic data with linear growth characteristics:
Trend-stationary process (TSP): y_{t} = c + δt + stationary process
Unit root process, also called a difference-stationary process (DSP): Δy_{t} = δ + stationary process
Here Δ is the differencing operator, Δy_{t} = y_{t} – y_{t–1} = (1 – L)y_{t}, where L is the lag operator defined by L^{i}y_{t} = y_{t – i}.
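As a quick sketch of these operators in MATLAB, diff implements Δ and lagmatrix (an Econometrics Toolbox function) implements L; the example series here is arbitrary:

```matlab
y = [2; 5; 9; 14];        % Example series (illustrative values)
Ly = lagmatrix(y,1);      % Lag operator: Ly(t) = y(t-1), with NaN at t = 1
dy = diff(y);             % Differencing operator: dy(t) = y(t) - y(t-1)
% dy equals y(2:end) - Ly(2:end)
```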
The processes are indistinguishable for finite data. In other words, there are both a TSP and a DSP that fit a finite data set arbitrarily well. However, the processes are distinguishable when restricted to a particular subclass of data-generating processes, such as AR(p) processes. After fitting a model to data, a unit root test checks if the AR(1) coefficient is 1.
There are two main reasons to distinguish between these types of processes:
A TSP and a DSP produce different forecasts. Basically, shocks to a TSP return to the trend line c + δt as time increases. In contrast, shocks to a DSP might be persistent over time.
For example, consider the simple trend-stationary model
y_{1,t} = 0.9y_{1,t – 1} + 0.02t + ε_{1,t}
and the difference-stationary model
y_{2,t} = 0.2 + y_{2,t – 1} + ε_{2,t}.
In these models, ε_{1,t} and ε_{2,t} are independent innovation processes. For this example, the innovations are independent and distributed N(0,1).
Both processes grow at rate 0.2. To calculate the growth rate for the TSP, which has a linear term 0.02t, set ε_{1}(t) = 0. Then solve the model y_{1}(t) = c + δt for c and δ:
c + δt = 0.9(c + δ(t–1)) + 0.02t.
The solution is c = –1.8, δ = 0.2.
A plot for t = 1:1000 shows the TSP stays very close to the trend line, while the DSP has persistent deviations away from the trend line.
T = 1000;              % Sample size
t = (1:T)';            % Period vector
rng(5);                % For reproducibility
randm = randn(T,2);    % Innovations
y = zeros(T,2);        % Columns of y are data series

% Build trend-stationary series
y(:,1) = .02*t + randm(:,1);
for ii = 2:T
    y(ii,1) = y(ii,1) + y(ii-1,1)*.9;
end

% Build difference-stationary series
y(:,2) = .2 + randm(:,2);
y(:,2) = cumsum(y(:,2));

figure
plot(y(:,1),'b')
hold on
plot(y(:,2),'g')
plot((1:T)*0.2,'k--')
legend('Trend Stationary','Difference Stationary',...
    'Trend Line','Location','NorthWest')
hold off
Forecasts based on the two series are different. To see this difference, plot the predicted behavior of the two series using vgxpred. The following plot shows the last 100 data points in the two series and predictions of the next 100 points, including confidence bounds.
Mdl = vgxset('AR',zeros(2),'ARSolve',...
    [true false;false true],'nx',1,'Constant',...
    true,'n',2);        % Model for independent processes
tcell = cell(1000,1);   % Time as exogenous input
for i = 1:1000
    tcell{i} = [i;0];
end
MdlFitted = vgxvarx(Mdl,y,tcell);
MdlFitted = vgxset(MdlFitted,'Series',...
    {'Trend stationary','Difference stationary'});
fx = cell(100,1);
for i = 1:100
    fx{i} = [i+1000;0];  % Future times for prediction
end
[ynew,ycov] = vgxpred(MdlFitted,100,fx,y);  % Predictions for 100 time steps

figure
vgxplot(MdlFitted,y(end-100:end,:),ynew,ycov)
subplot(2,1,1)
hold on
plot((T-100:T+100)*0.2,'k--')
axis tight
subplot(2,1,2)
hold on
plot((T-100:T+100)*0.2,'k--')
axis tight
Examine the fitted parameters by executing vgxdisp(MdlFitted); you will find that vgxvarx estimates the model parameters accurately.
The TSP has confidence intervals that do not grow with time, whereas the DSP has confidence intervals that grow. Furthermore, the TSP goes to the trend line quickly, while the DSP does not tend towards the trend line y = 0.2t asymptotically.
The presence of unit roots can lead to false inferences in regressions between time series.
Suppose x_{t} and y_{t} are unit root processes with independent increments, such as random walks with drift
x_{t} = c_{1} + x_{t–1} + ε_{1}(t)
y_{t} = c_{2} + y_{t–1} + ε_{2}(t),
where ε_{i}(t) are independent innovations processes. Regressing y on x results, in general, in a nonzero regression coefficient and a significant coefficient of determination R^{2}. This result holds despite x_{t} and y_{t} being independent random walks.
If both processes have trends (c_{i} ≠ 0), there is a correlation between x and y because of their linear trends. However, even if the c_{i} = 0, the presence of unit roots in the x_{t} and y_{t} processes yields correlation. For more information on spurious regression, see Granger and Newbold [1].
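As an illustration, the following sketch simulates two independent driftless random walks and regresses one on the other; the seed and sample size are arbitrary, and fitlm requires Statistics and Machine Learning Toolbox. The reported R^{2} is typically far from 0 even though the series are independent:

```matlab
rng(1);                      % For reproducibility
T = 1000;
x = cumsum(randn(T,1));      % Random walk with c1 = 0
y = cumsum(randn(T,1));      % Independent random walk with c2 = 0
mdl = fitlm(x,y);            % Regress y on x
disp(mdl.Rsquared.Ordinary)  % Spuriously large R^2
```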
There are four Econometrics Toolbox™ tests for unit roots. These functions test for the existence of a single unit root. When there are two or more unit roots, the results of these tests might not be valid.
adftest performs the augmented Dickey-Fuller test. pptest performs the Phillips-Perron test. These two tests have a null hypothesis of a unit root process of the form
y_{t} = y_{t–1} + c + δt + ε_{t},
which the functions test against an alternative model
y_{t} = γy_{t–1} + c + δt + ε_{t},
where γ < 1. The null and alternative models for a Dickey-Fuller test are like those for a Phillips-Perron test. The difference is adftest extends the model with extra parameters accounting for serial correlation among the innovations:
y_{t} = c + δt + γy_{t–1} + ϕ_{1}Δy_{t–1} + ϕ_{2}Δy_{t–2} + ... + ϕ_{p}Δy_{t–p} + ε_{t},
where
L is the lag operator: Ly_{t} = y_{t–1}.
Δ = 1 – L, so Δy_{t} = y_{t} – y_{t–1}.
ε_{t} is the innovations process.
Phillips-Perron adjusts the test statistics to account for serial correlation.
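A minimal sketch of calling both tests on a simulated unit root process (the seed and sample size are arbitrary); h = 0 (false) means the test fails to reject the unit root null:

```matlab
rng(1);                       % For reproducibility
y = cumsum(randn(100,1));     % Simulated unit root process (random walk)
[hADF,pADF] = adftest(y);     % Augmented Dickey-Fuller, default 'AR' model
[hPP,pPP]   = pptest(y);      % Phillips-Perron, default 'AR' model
% h = 0 for both: fail to reject the unit root null, as expected
```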
There are three variants of both adftest and pptest, corresponding to the following values of the 'model' parameter:
'AR' assumes c and δ, which appear in the preceding equations, are both 0; the 'AR' alternative has mean 0.
'ARD' assumes δ is 0. The 'ARD' alternative has mean c/(1–γ).
'TS' makes no assumption about c and δ.
For information on how to choose the appropriate value of 'model', see Choose Models to Test.
The KPSS test, kpsstest, is an inverse of the Phillips-Perron test: it reverses the null and alternative hypotheses. The KPSS test uses the model:
y_{t} = c_{t} + δt + u_{t},
with
c_{t} = c_{t–1} + v_{t}.
Here u_{t} is a stationary process, and v_{t} is an i.i.d. process with mean 0 and variance σ^{2}. The null hypothesis is that σ^{2} = 0, so that the random walk term c_{t} becomes a constant intercept. The alternative is σ^{2} > 0, which introduces the unit root in the random walk.
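A sketch of kpsstest on a simulated trend-stationary series (the coefficients and seed are illustrative); because kpsstest reverses the hypotheses, h = 0 here means the test fails to reject trend stationarity:

```matlab
rng(1);                          % For reproducibility
t = (1:200)';
y = 0.05*t + randn(200,1);       % Trend-stationary series
h = kpsstest(y,'trend',true);    % Null: trend stationary
% h = 0: fail to reject stationarity around a trend
```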
The variance ratio test, vratiotest, is based on the fact that the variance of a random walk increases linearly with time. vratiotest can also account for heteroscedasticity, where the variance increases at a variable rate with time. The test has a null hypothesis of a random walk:
Δy_{t} = ε_{t}.
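A sketch of vratiotest on a simulated random walk (seed and length arbitrary); h = 0 means the test fails to reject the random-walk null:

```matlab
rng(1);                           % For reproducibility
y = cumsum(randn(500,1));         % Random walk
[h,pValue] = vratiotest(y);       % Null: random walk ('IID' is false by default)
% h = 0: fail to reject the random-walk null
```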
Transform your time series to be approximately linear before testing for a unit root. If a series has exponential growth, take its logarithm. For example, GDP and consumer prices typically have exponential growth, so test their logarithms for unit roots.
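For example, the following sketch uses a simulated exponential-growth series standing in for a series such as GDP; the growth rates and seed are illustrative:

```matlab
rng(1);                                             % For reproducibility
T = 200;
gdp = exp(0.01*(1:T)' + cumsum(0.01*randn(T,1)));   % Exponential-growth series
logGdp = log(gdp);                                  % Approximately linear
h = adftest(logGdp,'model','TS');                   % Test the log series for a unit root
```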
If you want to transform your data to be stationary instead of approximately linear, unit root tests can help you determine whether to difference your data, or to subtract a linear trend. For a discussion of this topic, see What Is a Unit Root Test?
For adftest or pptest, choose the value of model as follows:
If your data shows a linear trend, set model to 'TS'.
If your data shows no trend, but seems to have a nonzero mean, set model to 'ARD'.
If your data shows no trend and seems to have a zero mean, set model to 'AR' (the default).
For kpsstest, set trend to true (default) if the data shows a linear trend. Otherwise, set trend to false.
For vratiotest, set IID to true if you want to test for independent, identically distributed innovations (no heteroscedasticity). Otherwise, leave IID at the default value, false. Linear trends do not affect vratiotest.
Setting appropriate lags depends on the test you use:
adftest — One method is to begin with a maximum lag, such as the one recommended by Schwert [2]. Then, test down by assessing the significance of the coefficient of the term at lag p_{max}. Schwert recommends a maximum lag of
$${p}_{\mathrm{max}}=\text{maximum lag}=\lfloor 12{\left(T/100\right)}^{1/4}\rfloor,$$
where $$\lfloor x\rfloor $$ is the integer part of x. The usual t statistic is appropriate for testing the significance of coefficients, as reported in the reg output structure.
Another method is to combine a measure of fit, such as SSR, with information criteria such as AIC, BIC, and HQC. These statistics also appear in the reg output structure. Ng and Perron [3] provide further guidelines.
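The Schwert rule can be computed directly; for example, with a sample size of T = 1000 (y below stands for your own series):

```matlab
T = 1000;                        % Sample size
pMax = floor(12*(T/100)^(1/4));  % Schwert's recommended maximum lag
% 12*10^(1/4) is about 21.34, so pMax = 21
[h,~,~,~,reg] = adftest(y,'lags',pMax);  % reg contains the t statistics for testing down
```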
kpsstest — One method is to begin with few lags, and then evaluate the sensitivity of the results by adding more lags. For consistency of the Newey-West estimator, the number of lags must go to infinity as the sample size increases. Kwiatkowski et al. [4] suggest using a number of lags on the order of T^{1/2}, where T is the sample size.
For an example of choosing lags for kpsstest, see Test Time Series Data for a Unit Root.
pptest — One method is to begin with few lags, and then evaluate the sensitivity of the results by adding more lags. Another method is to look at sample autocorrelations of y_{t} – y_{t–1}; slow rates of decay require more lags. The Newey-West estimator is consistent if the number of lags is O(T^{1/4}), where T is the effective sample size, adjusted for lag and missing values. White and Domowitz [5] and Perron [6] provide further guidelines.
For an example of choosing lags for pptest, see Test Time Series Data for a Unit Root.
vratiotest does not use lags.
Run multiple tests simultaneously by entering a vector of parameters for lags, alpha, model, or test. All vector parameters must have the same length. The test expands any scalar parameter to the length of a vector parameter. For an example using this technique, see Test Time Series Data for a Unit Root.
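For instance, passing a vector for lags runs one test per element, and the outputs are vectors of the same length (the series here is a simulated random walk):

```matlab
rng(1);                              % For reproducibility
y = cumsum(randn(100,1));            % Simulated unit root process
[h,pValue] = adftest(y,'lags',0:2);  % Three tests at once: lags 0, 1, and 2
% h and pValue are 1-by-3 vectors, one element per lag choice
```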
[1] Granger, C. W. J., and P. Newbold. "Spurious Regressions in Econometrics." Journal of Econometrics. Vol. 2, 1974, pp. 111–120.
[2] Schwert, W. "Tests for Unit Roots: A Monte Carlo Investigation." Journal of Business and Economic Statistics. Vol. 7, 1989, pp. 147–159.
[3] Ng, S., and P. Perron. "Unit Root Tests in ARMA Models with Data-Dependent Methods for the Selection of the Truncation Lag." Journal of the American Statistical Association. Vol. 90, 1995, pp. 268–281.
[4] Kwiatkowski, D., P. C. B. Phillips, P. Schmidt and Y. Shin. "Testing the Null Hypothesis of Stationarity against the Alternative of a Unit Root." Journal of Econometrics. Vol. 54, 1992, pp. 159–178.
[5] White, H., and I. Domowitz. "Nonlinear Regression with Dependent Observations." Econometrica. Vol. 52, 1984, pp. 143–162.
[6] Perron, P. "Trends and Random Walks in Macroeconomic Time Series: Further Evidence from a New Approach." Journal of Economic Dynamics and Control. Vol. 12, 1988, pp. 297–332.
adftest | kpsstest | pptest | vgxpred | vratiotest