AR Order Selection with Partial Autocorrelation Sequence

This example shows how to assess the order of an autoregressive model using the partial autocorrelation sequence. For a moving average process, you can use the autocorrelation sequence to assess the order. However, for an autoregressive (AR) or autoregressive moving average (ARMA) process, the autocorrelation sequence does not help in order selection. For autoregressive processes, you can instead use the partial autocorrelation sequence to help with model order selection. For a stationary time series with values X(1),X(2),X(3),...,X(k+1), the partial autocorrelation sequence at lag k is the correlation between X(1) and X(k+1) after regressing X(1) and X(k+1) on the intervening observations X(2),X(3),X(4),...,X(k). Consider the AR(2) process defined by

X(n)+1.5X(n-1)+0.75X(n-2)=ε(n)

where ε(n) is an N(0,1) Gaussian white noise process. The following example

  • simulates a realization of the AR(2) process

  • graphically explores the correlation between lagged values of the time series

  • examines the sample autocorrelation sequence of the time series

  • fits an AR(15) model to the time series by solving the Yule-Walker equations (aryule)

  • uses the reflection coefficients returned by aryule to compute the partial autocorrelation sequence

  • examines the partial autocorrelation sequence to select the model order

Simulate a time series 1,000 samples in length from the AR(2) process defined by the difference equation. Set the random number generator to the default settings for reproducible results.

A = [1 1.5 0.75];
rng default
x = filter(1,A,randn(1000,1));
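The definition of the partial autocorrelation sequence assumes a stationary series, so it can be worth checking that the all-pole filter 1/A(z) is stable before simulating. The following is a minimal sketch of that check; the variable names poleRadius, poleAngle, and isStable are chosen here for illustration and do not appear elsewhere in the example.

```matlab
A = [1 1.5 0.75];              % AR polynomial from the difference equation
p = roots(A);                  % poles of the all-pole filter 1/A(z)
poleRadius = abs(p)            % distance of each pole from the origin
poleAngle  = abs(angle(p))     % pole angle in radians/sample
isStable   = all(poleRadius < 1)
```

Both poles have radius sqrt(0.75), about 0.866, so they lie inside the unit circle and the process is stationary. Their angle, 5*pi/6 radians/sample, is close to pi, which suggests the process concentrates its power at high frequencies.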

View the frequency response of the AR(2) process.

[H,W] = freqz(1,A);
plot(W,20*log10(abs(H)),'linewidth',2); grid on;
axis tight;
xlabel('Radians/sample'); ylabel('dB');

The AR(2) process acts like a highpass filter in this case.
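To put a number on the highpass behavior, you can locate the peak of the magnitude response. This is a small sketch that assumes a 4096-point frequency grid, which is an arbitrary choice; wPeak and wPeakTheory are illustrative names. The closed-form value comes from minimizing |A(e^{jw})|^2 with respect to cos(w).

```matlab
A = [1 1.5 0.75];
[H,W] = freqz(1,A,4096);                 % finer grid than the default
[~,idx] = max(abs(H));
wPeak = W(idx)                           % peak location on the grid, radians/sample

% Setting the derivative of |A(e^{jw})|^2 to zero gives
% cos(w) = -a1*(1+a2)/(4*a2) for A = [1 a1 a2].
wPeakTheory = acos(-A(2)*(1 + A(3))/(4*A(3)))
```

Both values are close to 2.64 radians/sample, near the upper end of the 0 to pi range.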

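Graphically examine the correlation in x by producing scatter plots of each sample against the sample 1, 2, 3, and 4 steps later. Because the series is stationary, these pairs stand in for X(1) vs. X(n) for n = 2, 3, 4, 5.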

x12 = x(1:end-1);
x21 = x(2:end);
x13 = x(1:end-2);
x31 = x(3:end);
x14 = x(1:end-3);
x41 = x(4:end);
x15 = x(1:end-4);
x51 = x(5:end);
subplot(2,2,1)
plot(x12,x21,'b*');
xlabel('X_1'); ylabel('X_2');
subplot(2,2,2)
plot(x13,x31,'b*');
xlabel('X_1'); ylabel('X_3');
subplot(2,2,3)
plot(x14,x41,'b*');
xlabel('X_1'); ylabel('X_4');
subplot(2,2,4)
plot(x15,x51,'b*');
xlabel('X_1'); ylabel('X_5');

In the scatter plots, you see a linear relationship between X(1) and X(2) and between X(1) and X(3), but not between X(1) and X(4) or between X(1) and X(5).

The points in the top row scatter plots fall approximately on a line with a negative slope in the top left panel and positive slope in the top right panel. The scatter plots in the bottom two panels do not show any apparent linear relationship.

The negative correlation between X(1) and X(2) and positive correlation between X(1) and X(3) are explained by the fact that the AR(2) process in this example acts like a highpass filter.

Find the sample autocorrelation sequence out to lag 50 and plot the result.

figure                                   % new figure so the plot does not land in the last subplot
[xc,lags] = xcorr(x,50,'coeff');
stem(lags(51:end),xc(51:end),'markerfacecolor',[0 0 1])   % keep lags 0 to 50 only
xlabel('Lag'); ylabel('ACF');
title('Sample Autocorrelation Sequence');

The sample autocorrelation sequence shows a negative value at lag 1 and a positive value at lag 2. Based on the scatter plots, this is the expected result. However, you cannot determine from the sample autocorrelation sequence what order is appropriate for the AR model, because the autocorrelation of an AR process decays gradually rather than dropping to zero after lag p.
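One way to see why the autocorrelation sequence does not reveal the order is to compute the theoretical autocorrelation of the AR(2) process from the Yule-Walker recursion and compare it with the sample values. The sketch below repeats the simulation so that it runs on its own; maxLag, rho, and sampleRho are names introduced here for illustration.

```matlab
A = [1 1.5 0.75];
rng default
x = filter(1,A,randn(1000,1));       % same simulation as above

maxLag = 10;
rho = zeros(maxLag+1,1);             % rho(k+1) holds the lag-k autocorrelation
rho(1) = 1;                          % lag 0
rho(2) = -A(2)/(1 + A(3));           % lag 1, from the Yule-Walker equations
for k = 2:maxLag
    rho(k+1) = -A(2)*rho(k) - A(3)*rho(k-1);   % lag k from lags k-1 and k-2
end

xc = xcorr(x,maxLag,'coeff');
sampleRho = xc(maxLag+1:end);        % keep lags 0 to maxLag
disp([(0:maxLag)' rho sampleRho])    % columns: lag, theoretical, sample
```

The theoretical values oscillate and shrink slowly with lag instead of vanishing after lag 2, so nothing in the sequence marks the order of the model.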

Fit an AR(15) model using aryule. Return the reflection coefficients. The negative of the reflection coefficients is the partial autocorrelation sequence.

[arcoefs,E,K] = aryule(x,15);
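An equivalent way to think about the partial autocorrelation at lag k is as the negated last coefficient of an AR(k) model fitted to the data. The sketch below fits models of order 1 through 15 with aryule and compares the result with the reflection coefficients of the single AR(15) fit. It assumes that aryule solves the Yule-Walker equations with the Levinson-Durbin recursion, in which case the two should agree to within rounding error; pacfNested and maxDifference are illustrative names.

```matlab
A = [1 1.5 0.75];
rng default
x = filter(1,A,randn(1000,1));       % same simulation as above

maxOrder = 15;
[~,~,K] = aryule(x,maxOrder);        % reflection coefficients of the AR(15) fit

pacfNested = zeros(maxOrder,1);
for k = 1:maxOrder
    ak = aryule(x,k);                % ak = [1 a(1) ... a(k)]
    pacfNested(k) = -ak(end);        % negated last coefficient of the AR(k) fit
end

maxDifference = max(abs(pacfNested - (-K(:))))   % largest disagreement between the two
```

Fitting the largest model once and reading off K is cheaper than refitting at every order, which is why the example uses the reflection coefficients directly.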

Plot the partial autocorrelation sequence along with the large-sample 95% confidence intervals. If the data are generated by an autoregressive process of order p, the values of the sample partial autocorrelation sequence for lags greater than p approximately follow an N(0,1/N) distribution, where N is the length of the time series. The 95% confidence bounds are therefore ±1.96/sqrt(N).

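pacf = -K;                               % partial autocorrelation = negated reflection coefficients
lag = 1:15;
figure                                   % new figure so the plot does not reuse earlier axes
stem(lag,pacf,'markerfacecolor',[0 0 1]);
xlabel('Lag'); ylabel('Partial Autocorrelation');
set(gca,'xtick',1:1:15)
N = length(x);                           % series length sets the confidence bounds
lconf = -1.96/sqrt(N)*ones(length(lag),1);
uconf = 1.96/sqrt(N)*ones(length(lag),1);
hold on;
line(lag,lconf,'color',[1 0 0]);
line(lag,uconf,'color',[1 0 0]);
hold off;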

The only values of the partial autocorrelation sequence outside the 95% confidence bounds occur at lags 1 and 2. This indicates that the correct model order for the AR process is 2. In this example, you generated the time series from a known AR(2) process, so the partial autocorrelation sequence only confirms what you already know. In practice, you have only the observed time series, with no a priori information about the model order. In that setting, the partial autocorrelation sequence is an important tool for selecting the model order of a stationary autoregressive time series.
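The visual check can also be written as a simple rule: take the largest lag whose partial autocorrelation falls outside the 95% bounds. The following is a rough sketch of that rule, not a formal order-selection procedure; bound, significantLags, and orderEstimate are names introduced here. With 95% bounds, about one lag in twenty beyond the true order can exceed the bounds by chance, so treat an isolated large lag with caution.

```matlab
A = [1 1.5 0.75];
rng default
x = filter(1,A,randn(1000,1));           % same simulation as above

maxOrder = 15;
[~,~,K] = aryule(x,maxOrder);
pacf = -K(:);                            % partial autocorrelation sequence
bound = 1.96/sqrt(numel(x));             % large-sample 95% bound

significantLags = find(abs(pacf) > bound)'   % lags outside the bounds
orderEstimate = max([0 significantLags])     % 0 if no lag is significant
```

For the series in this example, the rule should select the same lags identified from the plot, lags 1 and 2.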
