Documentation Center

  • Trial Software
  • Product Updates

forecast

Class: ssm

Forecast states and observations of state-space models

Syntax

  • [Y,YMSE] = forecast(Mdl,numPeriods,Y0) example
  • [Y,YMSE] = forecast(Mdl,numPeriods,Y0,Name,Value) example
  • [Y,YMSE,X,XMSE] = forecast(___) example

Description

example

[Y,YMSE] = forecast(Mdl,numPeriods,Y0) forecasts the state-space model Mdl using a numPeriods forecast horizon and in-sample observations Y0. Y contains the forecasted observations and YMSE contains the corresponding variances of the forecasted observations. If Mdl is time invariant, then the output arguments are matrices. Otherwise, the output arguments are cell vectors of numeric vectors.

example

[Y,YMSE] = forecast(Mdl,numPeriods,Y0,Name,Value) forecasts the state-space model Mdl with additional options specified by one or more Name,Value pair arguments.

For example, for state-space models that include a linear regression component in the observation model, include in-sample predictor data, predictor data for the forecast horizon, and the regression coefficient.

example

[Y,YMSE,X,XMSE] = forecast(___) additionally returns the state forecasts X and the variances of the state forecasts XMSE using any of the input arguments in the previous syntaxes.

Input Arguments

expand all

Mdl — State-space modelssm model

State-space model, specified as a model returned by ssm or estimate.

If Mdl is not fully specified (that is, Mdl contains unknown parameters), then you must set scalar values to the unknown parameters using the Name,Value pair argument Params.

numPeriods — Forecast horizonpositive integer

Forecast horizon, specified as a positive integer. That is, the software returns 1,..,numPeriods forecasts.

Data Types: double

Y0 — In-sample, observed responsescell vector of numeric vectors | matrix

In-sample, observed responses, specified as a cell vector of numeric vectors or a matrix.

  • If Mdl is time invariant, then Y0 is a T-by-n matrix, where each row corresponds to a period and each column corresponds to a particular observation in the model. Therefore, T is the sample size and m is the number of observations per period. The last row of Y contains the latest observations.

  • If Mdl is time varying with respect to the observation equation, then Y is a T-by-1 cell vector. Each element of the cell vector corresponds to a period and contains an nt-dimensional vector of observations for that period. The corresponding dimensions of the coefficient matrices in Mdl.C{t} and Mdl.D{t} must be consistent with the matrix in Y{t} for all periods. The last cell of Y contains the latest observations.

If Mdl is an estimated state-space model (that is, returned by estimate), then it is best practice to set Y0 to the same data set that you used to fit Mdl.

Data Types: double | cell

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

'A' — Forecast-horizon, state-transition, coefficient matricescell vector of matrices

Forecast-horizon, state-transition, coefficient matrices, specified as the comma-separated pair consisting of 'A' and a cell vector of matrices.

A must contain at least numPeriods cells. Each cell must contain a matrix specifying how the states transition in the forecast horizon. If the length of A is greater than numPeriods, then the software uses the first numPeriods cells. The last cell indicates the latest period in the forecast horizon.

The matrices in A cannot contain NaN values.

If Mdl is time invariant with respect to the states, then each cell of A must contain an m-by-m matrix, where m is the number of the in-sample states per period. By default, the software uses Mdl.A throughout the forecast horizon.

If Mdl is time varying with respect to the states, then the dimensions of the matrices in the cells of A may vary, but the dimensions of each matrix must be consistent with the matrices in B and C in the corresponding periods. By default, the software uses Mdl.A{end} throughout the forecast horizon.

Data Types: cell

'B' — Forecast-horizon, state-disturbance-loading, coefficient matricescell vector of matrices

Forecast-horizon, state-disturbance-loading, coefficient matrices, specified as the comma-separated pair consisting of 'B' and a cell vector of matrices.

B must contain at least numPeriods cells. Each cell must contain a matrix specifying how the states transition in the forecast horizon. If the length of B is greater than numPeriods, then the software uses the first numPeriods cells. The last cell indicates the latest period in the forecast horizon.

The matrices in B cannot contain NaN values.

If Mdl is time invariant with respect to the states and state disturbances, then each cell of B must contain an m-by-k matrix, where m is the number of the in-sample states per period, and k is the number of in-sample, state disturbances per period. By default, the software uses Mdl.B throughout the forecast horizon.

If Mdl is time varying, then the dimensions of the matrices in the cells of B may vary, but the dimensions of each matrix must be consistent with the matrices in A in the corresponding periods. By default, the software uses Mdl.B{end} throughout the forecast horizon.

Data Types: cell

'C' — Forecast-horizon, measurement-sensitivity, coefficient matricescell vector of matrices

Forecast-horizon, measurement-sensitivity, coefficient matrices, specified as the comma-separated pair consisting of 'C' and a cell vector of matrices.

C must contain at least numPeriods cells. Each cell must contain a matrix specifying how the states transition in the forecast horizon. If the length of C is greater than numPeriods, then the software uses the first numPeriods cells. The last cell indicates the latest period in the forecast horizon.

The matrices in C cannot contain NaN values.

If Mdl is time invariant with respect to the states and the observations, then each cell of C must contain an n-by-m matrix, where n is the number of the in-sample observations per period, and m is the number of in-sample states per period. By default, the software uses Mdl.C throughout the forecast horizon.

If Mdl is time varying with respect to the states or the observations, then the dimensions of the matrices in the cells of C may vary, but the dimensions of each matrix must be consistent with the matrices in A and D in the corresponding periods. By default, the software uses Mdl.C{end} throughout the forecast horizon.

Data Types: cell

'D' — Forecast-horizon, observation-innovation, coefficient matricescell vector of matrices

Forecast-horizon, observation-innovation, coefficient matrices, specified as the comma-separated pair consisting of 'D' and a cell vector of matrices.

D must contain at least numPeriods cells. Each cell must contain a matrix specifying how the states transition in the forecast horizon. If the length of D is greater than numPeriods, then the software uses the first numPeriods cells. The last cell indicates the latest period in the forecast horizon.

The matrices in D cannot contain NaN values.

If Mdl is time invariant with respect to the observations and the observation innovations, then each cell of D must contain an n-by-h matrix, where n is the number of the in-sample observations per period, and h is the number of in-sample, observation innovations per period. By default, the software uses Mdl.D throughout the forecast horizon.

If Mdl is time varying with respect to the observations or the observation innovations, then the dimensions of the matrices in the cells of D may vary, but the dimensions of each matrix must be consistent with the matrices in C in the corresponding periods. By default, the software uses Mdl.D{end} throughout the forecast horizon.

Data Types: cell

'Beta' — Regression coefficients[] (default) | matrix

Regression coefficients corresponding to predictor variables, specified as the comma-separated pair consisting of 'Beta' and a matrix.

In general, each column of Beta corresponds to an observation in the state-space model, and each row corresponds to a predictor. Therefore, Beta must have as many columns as the dimension of the observation model, and rows equal to the number of predictors.

If you specify Beta, then you must also specify Predictors0 and PredictorsF.

'Predictors0' — In-sample, predictor variables in state-space model observation equation[] (default) | matrix

In-sample, predictor variables in the state-space model observation equation, specified as the comma-separated pair consisting of 'Predictors0' and a matrix. The columns of Predictors0 correspond to individual predictor variables. Predictors0 must have T rows, where row t corresponds to the observed predictors at period t (Zt). The expanded observation equation is

that is, the software deflates the observations using the regression component. β is the time-invariant vector of regression coefficients that the software estimates with all other parameters.

If there are n observations per period, then the software regresses all predictor series onto each observation.

If you specify Predictors0, then Mdl must be time invariant. Otherwise, the software returns an error.

If you specify Predictors0, then you must also specify Beta and PredictorsF.

If Mdl is an estimated state-space model (that is, returned by estimate), then it is best practice to set Predictors0 to the same predictor data set that you used to fit Mdl.

By default, the software excludes a regression component from the state-space model.

Data Types: double

'PredictorsF' — Forecast-horizon, predictor variables in state-space model observation equation[] (default) | matrix

Forecast-horizon, predictor variables in the state-space model observation equation, specified as the comma-separated pair consisting of 'PredictorsF' and a matrix. The columns of PredictorsF correspond to individual predictor variables. PredictorsF must have numPeriods rows, where row t corresponds to the observed predictors at period t in the forecast horizon (Zt). The expanded observation equation is

that is, the software deflates the observations using the regression component. β is the time-invariant vector of regression coefficients that the software estimates with all other parameters.

If there are n observations per period, then the software regresses all predictor series onto each observation.

If you specify PredictorsF, then Mdl must be time invariant. Otherwise, the software returns an error.

If you specify PredictorsF, then you must also specify Beta and Predictors0.

By default, the software excludes a regression component from the state-space model.

Data Types: double

Output Arguments

expand all

Y — Forecasted observationsmatrix | cell vector of numeric vectors

Forecasted observations, returned as a matrix or a cell vector of numeric vectors.

If Mdl is a time-invariant, state-space model with respect to the observations, then Y is a numPeriods-by-n matrix.

If Mdl is a time-varying, state-space model with respect to the observations, then Y is a numPeriods-by-1 cell vector of numeric vectors. Cell t of Y contains an nt-by-1 numeric vector of forecasted observations for period t.

Data Types: cell | double

YMSE — Error variances of forecasted observationsmatrix | cell vector of numeric vectors

Error variances of forecasted observations, returned as a matrix or a cell vector of numeric vectors.

If Mdl is a time-invariant, state-space model with respect to the observations, then YMSE is a numPeriods-by-n matrix.

If Mdl is a time-varying, state-space model with respect to the observations, then YMSE is a numPeriods-by-1 cell vector of numeric vectors. Cell t of YMSE contains an nt-by-1 numeric vector of error variances for the corresponding forecasted observations for period t.

Data Types: cell | double

X — State forecastsmatrix | cell vector of numeric vectors

State forecasts, returned as a matrix or a cell vector of numeric vectors.

If Mdl is a time-invariant, state-space model with respect to the states, then X is a numPeriods-by-m matrix.

If Mdl is a time-varying, state-space model with respect to the states, then X is a numPeriods-by-1 cell vector of numeric vectors. Cell t of X contains an mt-by-1 numeric vector of forecasted observations for period t.

Data Types: cell | double

XMSE — Error variances of state forecastsmatrix | cell vector of numeric vectors

Error variances of state forecasts, returned as a matrix or a cell vector of numeric vectors.

If Mdl is a time-invariant, state-space model with respect to the states, then XMSE is a numPeriods-by-m matrix.

If Mdl is a time-varying, state-space model with respect to the states, then XMSE is a numPeriods-by-1 cell vector of numeric vectors. Cell t of XMSE contains an mt-by-1 numeric vector of error variances for the corresponding forecasted observations for period t.

Data Types: cell | double

Definitions

Forecasted Observations

s-step-ahead, forecasted observations are estimates of the observations at period t using all information (for example, the observed responses) up to period ts.

The nt-by-1 vector of 1-step-ahead, forecasted observations at period t is . The estimated vector of forecasted observations is

where is the mt-by-1 estimated vector of state forecasts at period t.

At period t, the 1-step-ahead, forecasted observations have variance-covariance matrix

where is the estimated variance-covariance matrix of the state forecasts at period t, given all information up to period t – 1.

In general, the s-step-ahead, vector of state forecasts is . The s-step-ahead, forecasted observation vector is

State Forecasts

s-step-ahead, state forecasts are estimates of the states at period t using all information (for example, the observed responses) up to period ts.

The mt-by-1 vector of 1-step-ahead, state forecasts at period t is . The estimated vector of state forecasts is

where is the mt – 1-by-1 filtered state vector at period t – 1.

At period t, the 1-step-ahead, state forecasts have the variance-covariance matrix

where is the estimated variance-covariance matrix of the filtered states at period t – 1, given all information up to period t – 1.

The corresponding 1-step-ahead forecasted observation is , and its variance-covariance matrix is

In general, the s-step-ahead, forecasted state vector is . The s-step-ahead, vector of state forecasts is

and the s-step-ahead, forecasted observation vector is

State-Space Model

A state-space model is a discrete-time, stochastic model that contains two sets of equations: one describing how a latent process transitions in time (the state equation), and another describing how an observer measures the latent process at each period (the observation equation).

Symbolically, you can write a linear, multivariate, Gaussian state-space model using the following system of equations

for t = 1,...,T.

  • , which is an mt-dimensional state vector describing the dynamics of some, possibly unobservable, phenomenon at period t.

  • , which is an nt-dimensional observation vector describing how the states are measured by observers at period t.

  • At is the mt-by-mt – 1 state-transition matrix describing how the states at time t transition to the states at period t – 1.

  • Bt is the mt-by-kt state-disturbance-loading matrix describing how the states at period t combine with the innovations at period t.

  • Ct is the nt-by-mt measurement-sensitivity matrix describing how the observations at period t relate to the states at period t.

  • Dt is the nt-by-ht observation-innovation matrix describing how the observations at period t combine with the observation errors at period t.

  • The matrices At, Bt, Ct, and Dt are referred to as coefficient matrices, and might contain unknown parameters.

  • , which is a kt-dimensional, Gaussian, white-noise, unit-variance vector of state disturbances at period t.

  • , which is an ht-dimensional, Gaussian, white-noise, unit-variance vector of observation innovations at period t.

  • εt and ut are uncorrelated.

To write a time-invariant state-space model, drop the t subscripts of all coefficient matrices and dimensions.

Time-Invariant Model

In a time-invariant state-space model:

  • The coefficient matrices are equivalent for all periods.

  • The number of states, state disturbances, observations, and observation innovations are the same for all periods.

For example, for all t, the following system of equations

represents a time-invariant state-space model.

Time-Varying Model

In a time-varying state-space model:

  • The coefficient matrices might change from period to period.

  • The number of states, state disturbances, observations, and observation innovations might change from period to period. For example, this might happen if there is a regime shift or one of the states or observations cannot be measured during the sampling time frame. Also, you can model seasonality using time-varying models.

To illustrate a regime shift, suppose, for t = 1,..,10

for t = 11

and for t = 1,..,T

There are three sets of state transition matrices, whereas there are only two sets of the other coefficient matrices.

Examples

expand all

Forecast Observations of a Known, Time-Invariant, State-Space Model

Suppose that a latent process is an AR(1). Subsequently, the state equation is

$$x_t = 0.5x_{t-1} + u_t,$$

where $u_t$ is Gaussian with mean 0 and standard deviation 1.

Genrate a random series of 100 observations from $x_t$ , assuming that the series starts at 1.5.

T = 100;
ARMdl = arima('AR',0.5,'Constant',0,'Variance',1);
x0 = 1.5;
rng(1); % For reproducibility
x = simulate(ARMdl,T,'Y0',x0);

Suppose further that the latent process is subject to additive measurement error. Subsequently, the observation equation is

$$y_t = x_t + \varepsilon_t,$$

where $\varepsilon_t$ is Gaussian with mean 0 and standard deviation 0.75. Together, the latent process and observation equations compose a state-space model.

Use the random latent state process (x) and the observation equation to generate observations.

y = x + 0.75*randn(T,1);

Specify the four coefficient matrices.

A = 0.5;
B = 1;
C = 1;
D = 0.75;

Specify the state-space model using the coefficient matrices.

Mdl = ssm(A,B,C,D)
Mdl = 


State vector length: 1
Observation vector length: 1
State disturbance vector length: 1
Observation innovation vector length: 1
Sample size supported by model: Unlimited

State variables: x1, x2,...
State disturbances: u1, u2,...
Observation series: y1, y2,...
Observation innovations: e1, e2,...

State equation:
x1(t) = (0.50)x1(t-1) + u1(t)

Observation equation:
y1(t) = x1(t) + (0.75)e1(t)

Initial state distribution:

Initial state means
 x1 
  0 

Initial state covariance matrix
     x1   
 x1  1.33 

State types
     x1     
 Stationary 

Mdl is an ssm model. Verify that the model is correctly specified using the display in the Command Window. The software infers that the state process is stationary. Subsequently, the software sets the initial state mean and covariance to the mean and variance of the stationary distribution of an AR(1) model.

Forecast the observations 10 periods into the future, and estimate their variances.

numPeriods = 10;
[ForecastedY,YMSE] = forecast(Mdl,numPeriods,y);

Plot the forecasts with the in-sample responses, and 95% Wald-type forecast intervals.

ForecastIntervals(:,1) = ForecastedY - 1.96*sqrt(YMSE);
ForecastIntervals(:,2) = ForecastedY + 1.96*sqrt(YMSE);

figure
plot(T-20:T,y(T-20:T),'-k',T+1:T+numPeriods,ForecastedY,'-.r',...
    T+1:T+numPeriods,ForecastIntervals,'-.b',...
    T:T+1,[y(end)*ones(3,1),[ForecastedY(1);ForecastIntervals(1,:)']],':k',...
    'LineWidth',2)
hold on
title({'Observed Responses and Their Forecasts'})
xlabel('Period')
ylabel('Responses')
legend({'Observations','Forecasted observations','95% forecast intervals'},...
    'Location','Best')
hold off

Forecast Observations of a State-Space Model That Includes a Regression Component

Suppose that the linear relationship between the change in the unemployment rate and the nominal gross national product (nGNP) growth rate is of interest. Suppose further that the first difference of the unemployment rate is an ARMA(1,1) series. Symbolically, and in state-space form, the model is

$$\begin{array}{l}
\left[ {\begin{array}{*{20}{c}}
{{x_{1,t}}}\\
{{x_{2,t}}}
\end{array}} \right] = \left[ {\begin{array}{*{20}{c}}
\phi &\theta \\
0&0
\end{array}} \right]\left[ {\begin{array}{*{20}{c}}
{{x_{1,t - 1}}}\\
{{x_{2,t - 1}}}
\end{array}} \right] + \left[ {\begin{array}{*{20}{c}}
1\\
1
\end{array}} \right]u_{1,t}\\
{y_t} - \beta {Z_t} = {x_{1,t}} + \sigma\varepsilon_t,
\end{array}$$

where:

  • $x_{1,t}$ is the change in the unempoyment rate at time t.

  • $x_{2,t}$ is a dummy state for the MA(1) effect.

  • $y_{1,t}$ is the observed change in the unemployment rate being deflated by the growth rate of nGNP ( $Z_t$ ).

  • $u_{1,t}$ is the Gaussian series of state disturbances having mean 0 and standard deviation 1.

  • $\varepsilon_t$ is the Gaussian series of observation innovations having mean 0 and standard deviation $\sigma$ .

Load the Nelson-Plosser data set, which contains the unemployment rate and nGNP series, among other things.

load Data_NelsonPlosser

Preprocess the data by taking the natural logarithm of the nGNP series, and the first difference of each series. Also, remove the starting NaN values from each series.

gnpn = Dataset.GNPN(~isnan(Dataset.GNPN)); % Remove NaNs
u = Dataset.UR(~isnan(Dataset.GNPN));      % The effective nGNP series is shorter
T = size(gnpn,1);                          % Sample size
Z = [ones(T-1,1) diff(log(gnpn))];
y = diff(u);

Though this example removes missing values, the software can accommodate series containing missing values in the Kalman filter framework.

To determine how well the model forecasts observations, remove the last 10 observations for comparison.

numPeriods = 10;                   % Forecast horizon
isY = y(1:end-numPeriods);         % In-sample observations
oosY = y(end-numPeriods+1:end);    % Out-of-sample observations
ISZ = Z(1:end-numPeriods,:);       % In-sample predictors
OOSZ = Z(end-numPeriods+1:end,:);  % Out-of-sample predictors

Specify the coefficient matrices.

A = [NaN NaN; 0 0];
B = [1; 1];
C = [1 0];
D = NaN;

Specify the state-space model using ssm.

Mdl = ssm(A,B,C,D);

Estimate the model parameters, and use a random set of initial parameter values for optimization. Specify the regression component and its initial value for optimization using the 'Predictors' and 'Beta0' name-value pair arguments, respectively. Restrict the estimate of $\sigma$ to all positive, real numbers. For numerical stability, specify the Hessian when the software computes the parameter covariance matrix, using the 'CovMethod' name-value pair argument.

rng(1);
params0 = rand(3,1);
params0(3) = abs(params0(3));
[EstMdl,estParams] = estimate(Mdl,isY,params0,'Predictors',ISZ,...
    'Beta0',randn(2,1),'lb',[-Inf,-Inf,0,-Inf,-Inf],'CovMethod','hessian');
Method: Maximum likelihood (fmincon)
Sample size: 51
Logarithmic  likelihood:     -87.2409
Akaike   info criterion:      180.482
Bayesian info criterion:      186.277
           |      Coeff       Std Err    t Stat     Prob  
----------------------------------------------------------
 c(1)      |  -0.31780       0.19429    -1.63571  0.10190 
 c(2)      |   1.21242       0.48882     2.48031  0.01313 
 c(3)      |   0.45583       0.63931     0.71301  0.47584 
 y <- z(1) |   1.32407       0.26313     5.03201   0      
 y <- z(2) | -24.48732       1.90115   -12.88024   0      
           |                                              
           |    Final State   Std Dev     t Stat    Prob  
 x(1)      |  -0.38117       0.42842    -0.88971  0.37363 
 x(2)      |   0.23402       0.66222     0.35339  0.72380 

EstMdl is an ssm model, and you can access its properties using dot notation.

Forecast observations over the forecast horizon. EstMdl does not store the data set, so you must pass it in appropriate name-value pair arguments.

[fY,yMSE] = forecast(EstMdl,numPeriods,isY,'Predictors0',ISZ,...
    'PredictorsF',OOSZ,'Beta',estParams(end-1:end));

fY is a 10-by-1 vector containing the forecasted observations, and yMSE is a 10-by-1 vector containing the variances of the forecasted observations.

Obtain 95% Wald-type forecast intervals. Plot the forecasted observations with their true values and the forecast intervals.

ForecastIntervals(:,1) = fY - 1.96*sqrt(yMSE);
ForecastIntervals(:,2) = fY + 1.96*sqrt(yMSE);

figure
h = plot(dates(end-numPeriods-9:end-numPeriods),isY(end-9:end),'-k',...
    dates(end-numPeriods+1:end),oosY,'-k',...
    dates(end-numPeriods+1:end),fY,'--r',...
    dates(end-numPeriods+1:end),ForecastIntervals,':b',...
    dates(end-numPeriods:end-numPeriods+1),...
    [isY(end)*ones(3,1),[oosY(1);ForecastIntervals(1,:)']],':k',...
    'LineWidth',2);
xlabel('Period')
ylabel('Change in the unemployment rate')
legend(h([1,3,4]),{'Observations','Forecasted responses',...
    '95% forecast intervals'})
title('Observed and Forecasted Changes in the Unemployment Rate')

This model seems to forecast the changes in the unemployment rate well.

Forecast States and Observations of a Fitted, Time-Varying, State-Space Model

This example generates data from a known model, fits a state-space model to the data, and then forecasts states and observations states.

Suppose that a set of latent processes is comprised of an AR(2) and an MA(1) model. There are 50 periods, and the MA(1) process drops out of the model for the final 25 periods. Subsequently, the state equation for the first 25 periods is

and for the last 25 periods, it is

where u1,t and u2,t are Gaussian with mean 0 and standard deviation 1.

Generate a random series of 50 observations from x1,t and x2,t, assuming that the series starts at 1.5 and 1, respectively.

T = 50;
ARMdl = arima('AR',{0.7,-0.2},'Constant',0,'Variance',1);
MAMdl = arima('MA',0.6,'Constant',0,'Variance',1);
x0 = [1.5 1; 1.5 1];
rng(1);
x = [simulate(ARMdl,T,'Y0',x0(:,1)),...
    [simulate(MAMdl,T/2,'Y0',x0(:,2));nan(T/2,1)]];

The last 25 values for the simulated MA(1) data are NaNs.

Suppose further that the latent processes are measured using

for the first 25 periods, and

for the last 25 periods, where εt is Gaussian with mean 0 and standard deviation 1.

Use the random latent state process (x) and the observation equation to generate observations.

y = 2*nansum(x')'+randn(T,1);

Together, the latent process and observation equations compose a state-space model. Supposing that the coefficients are unknown parameters, the state-space model is

for the first 25 periods,

for period 26, and

for the last 24 periods.

Create the function Ar2MAParamMap.m, which specifies how the parameters in params map to the state-space model matrices, the initial state values, and the type of state.

function [A,B,C,D,Mean0,Cov0,StateType] = AR2MAParamMap(params,T)
%AR2MAParamMap Time-variant state-space model parameter mapping function
%
% This function maps the vector params to the state-space matrices (A, B,
% C, and D), the initial state value and the initial state variance (Mean0
% and Cov0), and the type of state (StateType). From periods 1 to T/2, the
% state model is an AR(2) and an MA(1) model, and the observation model is
% the sum of the two states. From periods T/2 + 1 to T, the state model is
% just the AR(2) model.
    A1 = {[params(1) params(2) 0 0; 1 0 0 0; 0 0 0 params(3); 0 0 0 0]};
    B1 = {[1 0; 0 0; 0 1; 0 1]};
    C1 = {params(4)*[1 0 1 0]};
    Mean0 = ones(4,1);
    Cov0 = 10*eye(4);
    StateType = [0 0 0 0];
    A2 = {[params(1) params(2) 0 0; 1 0 0 0]};
    B2 = {[1; 0]};
    A3 = {[params(1) params(2); 1 0]};
    B3 = {[1; 0]};
    C3 = {params(5)*[1 0]};
    A = [repmat(A1,T/2,1);A2;repmat(A3,(T-2)/2,1)];
    B = [repmat(B1,T/2,1);B2;repmat(B3,(T-2)/2,1)];
    C = [repmat(C1,T/2,1);repmat(C3,T/2,1)];
    D = 1;
end

Assume that the state and observation models do not change for the next five periods.

Specify the state-space model by passing the function AR2MAParamMap as a function handle to ssm.

Mdl = ssm(@(params)AR2MAParamMap(params,T));

The software implicitly defines the state-space model. In most cases, you cannot verify state-space models that you implicitly define.

Pass the observed responses (y) to estimate to estimate the parameters. Set positive, random initial values for the unknown parameters.

params0 = rand(5,1);
EstMdl = estimate(Mdl,y,params0);
Method: Maximum likelihood (fminunc)
Sample size: 50
Logarithmic  likelihood:     -114.957
Akaike   info criterion:      239.913
Bayesian info criterion:      249.473
      |     Coeff       Std Err   t Stat     Prob  
---------------------------------------------------
 c(1) |  0.47870       0.26634    1.79733  0.07229 
 c(2) |  0.00809       0.27179    0.02976  0.97626 
 c(3) |  0.55735       0.80958    0.68844  0.49118 
 c(4) |  1.62679       0.41622    3.90849  0.00009 
 c(5) |  1.90022       0.49564    3.83390  0.00013 
      |                                            
      |   Final State   Std Dev    t Stat    Prob  
 x(1) | -0.81229       0.46815   -1.73511  0.08272 
 x(2) | -0.31449       0.45917   -0.68490  0.49341 

EstMdl is an ssm model containing the estimated coefficients. Likelihood surfaces of state-space models might contain local maxima. Therefore, it is good practice to try several initial parameter values, or consider using refine.

Forecast observations and states five periods into the future. Also, obtain measures of variability for the forecasts.

numPeriods = 5;
[fY,yMSE,FX,XMSE]= forecast(EstMdl,numPeriods,y);

forecast uses EstMdl.A{end}, ..., EstMdl.D{end} to forecast the state-space model. fY and yMSE are numPeriods-by-1 numeric vectors of forecasted observations and variances of the forecasted observations, respectively. FX and XMSE are numPeriods-by-2 matrices of state forecasts and variances of the state forecasts. The columns indicate the state, and the rows indicate the period. For all output arguments, the last row corresponds to the latest forecast.

Plot the observations, true states, forecasted observations, and state forecasts.

figure
plot(T-10:T,x(T-10:T,1),'-k',T+1:T+numPeriods,FX(:,1),'-r',...
    T-10:T,y(T-10:T),'--g',T+1:T+numPeriods,fY,'--b',...
    T:T+1,[y(T),fY(1);x(T,1),FX(1,1)]',':k','LineWidth',2);
xlabel('Period')
ylabel('States and Observations')
legend({'True state values','State forecasts',...
    'Observed responses','Forecasted responses'});

Forecast a Time-Varying, State-Space Model with a Regime Change in the Forecast Horizon

Suppose that you observed a multivariate process for 75 periods, and you want to forecast the process 25 periods into the future. Also, suppose that you can specify the process as a state-space model. For periods 1 through 50, the state-space model has one state: a stationary AR(2) model with a constant term. At period 51, the state-space model includes a random walk. The states are observed unbiasedly, but with additive measurement error. Symbolically, the model is

$$\begin{array}{l}&#xA;\left[ {\begin{array}{*{20}{c}}&#xA;{{x_{1,t}}}\\&#xA;{{x_{2,t}}}\\&#xA;{{x_{3,t}}}\\&#xA;{{x_{4,t}}}&#xA;\end{array}} \right] = \left[ {\begin{array}{*{20}{c}}&#xA;{0.5}&{ - 0.2}&amp;1&amp;0\\&#xA;1&amp;0&amp;0&amp;0\\&#xA;0&amp;0&amp;1&amp;0\\&#xA;0&amp;0&amp;0&amp;1&#xA;\end{array}} \right]\left[ {\begin{array}{*{20}{c}}&#xA;{{x_{1,t - 1}}}\\&#xA;{{x_{2,t - 1}}}\\&#xA;{{x_{3,t - 1}}}\\&#xA;{{x_{4,t - 1}}}&#xA;\end{array}} \right] + \left[ {\begin{array}{*{20}{c}}&#xA;{0.1}&amp;0\\&#xA;0&amp;0\\&#xA;0&amp;0\\&#xA;0&{0.5}&#xA;\end{array}} \right]\left[ {\begin{array}{*{20}{c}}&#xA;{{u_{1,t}}}\\&#xA;{{u_{2,t}}}\\&#xA;{{u_{3,t}}}\\&#xA;{{u_{4,t}}}&#xA;\end{array}} \right]\\&#xA;{y_t} = \left[ {\begin{array}{*{20}{c}}&#xA;1&amp;0&amp;0&amp;0\\&#xA;0&amp;0&amp;0&amp;1&#xA;\end{array}} \right]\left[ {\begin{array}{*{20}{c}}&#xA;{{x_{1,t}}}\\&#xA;{{x_{2,t}}}\\&#xA;{{x_{3,t}}}\\&#xA;{{x_{4,t}}}&#xA;\end{array}} \right] + \left[ {\begin{array}{*{20}{c}}&#xA;{0.3}&amp;0\\&#xA;0&{0.2}&#xA;\end{array}} \right]\left[ {\begin{array}{*{20}{c}}&#xA;{{\varepsilon _{1,t}}}\\&#xA;{{\varepsilon _{2,t}}}&#xA;\end{array}} \right]&#xA;\end{array}.$$

For periods 1 through 50, the random walk process is not in the model.

Specify the in-sample, coefficient matrices.

A1 = {[0.5 0.2 1; 1 0 0; 0 0 1]};                % A for periods 1 - 50
A2 = {[0.5 0.2 1; 1 0 0; 0 0 1; 0 0 0]};         % A for period 51
A3 = {[0.5 0.2 1 0; 1 0 0 0; 0 0 1 0; 0 0 0 1]}; % A for periods 51 - 75
A = [repmat(A1,50,1); A2; repmat(A3,24,1)];

B1 = {[0.1; 0; 0]};              % B for periods 1 - 50
B3 = {[0.1 0; 0 0; 0 0; 0 0.5]}; % B for periods 51 - 75
B = [repmat(B1,50,1); repmat(B3,25,1)];

C1 = {[1 0 0]};            % C for periods 1 - 50
C3 = {[1 0 0 0; 0 0 0 1]}; % C for periods 51 - 75
C = [repmat(C1,50,1); repmat(C3,25,1)];

D1 = {0.3};            % D for periods 1 - 50
D3 = {[0.3 0; 0 0.2]}; % D for periods 51 - 75
D = [repmat(D1,50,1); repmat(D3,25,1)];

Specify the state space model, and the initial state means and covariance matrix. It is best practice to specify the types of each state using the 'StateType' name-value pair argument. Only specify the initial state parameters for the three states that start the state-space model.

Mean0 = [1/(1-0.5-0.2); 1/(1-0.5-0.2); 1];
Cov0 = [0.02 0.01 0; 0.01 0.02 0; 0 0 0];
StateType = [0; 0; 1];
Mdl = ssm(A,B,C,D,'Mean0',Mean0,'Cov0',Cov0,'StateType',StateType)
Mdl = 


State vector length: Time-varying
Observation vector length: Time-varying
State disturbance vector length: Time-varying
Observation innovation vector length: Time-varying
Sample size supported by model: 75

State variables: x1, x2,...
State disturbances: u1, u2,...
Observation series: y1, y2,...
Observation innovations: e1, e2,...

State equations of period 1, 2, 3,..., 50:
x1(t) = (0.50)x1(t-1) + (0.20)x2(t-1) + x3(t-1) + (0.10)u1(t)
x2(t) = x1(t-1)
x3(t) = x3(t-1)

State equations of period 51:
x1(t) = (0.50)x1(t-1) + (0.20)x2(t-1) + x3(t-1) + (0.10)u1(t)
x2(t) = x1(t-1)
x3(t) = x3(t-1)
x4(t) = (0.50)u2(t)

State equations of period 52, 53, 54,..., 75:
x1(t) = (0.50)x1(t-1) + (0.20)x2(t-1) + x3(t-1) + (0.10)u1(t)
x2(t) = x1(t-1)
x3(t) = x3(t-1)
x4(t) = x4(t-1) + (0.50)u2(t)


Observation equation of period 1, 2, 3,..., 50:
y1(t) = x1(t) + (0.30)e1(t)

Observation equations of period 51, 52, 53,..., 75:
y1(t) = x1(t) + (0.30)e1(t)
y2(t) = x4(t) + (0.20)e2(t)


Initial state distribution:

Initial state means
  x1    x2   x3 
 3.33  3.33   1 

Initial state covariance matrix
     x1    x2    x3 
 x1  0.02  0.01   0 
 x2  0.01  0.02   0 
 x3   0     0     0 

State types
     x1          x2         x3    
 Stationary  Stationary  Constant 

Mdl is a time-varying, ssm model without unknown parameters. The software sets initial state means and covariane values based on the type of state.

Simulate 75 observations from Mdl.

rng(1); % For reproducibility
Y = simulate(Mdl,75);

y is a 75-by-1 cell vector. Cells 1 through 50 contain scalars, and cells 51 through 75 contain 2-by-1 numeric vectors. Cell j corresponds to the observations of period j, specified by the observation model.

Plot the simulated responses.

y1 = cell2mat(Y(51:75));              % Observations for periods 1 - 50
d1 = cell2mat(Y(51:75));
y2 = [d1(((1:25)*2)-1) d1((1:25)*2)]; % Observations for periods 51 - 75

figure
plot(1:75,[y1;y2(:,1)],'-k',1:75,[nan(50,1);y2(:,2)],'-r','LineWidth',2')
title('In-sample Observations')
ylabel('Observations')
xlabel('Period')
legend({'AR(2)','Random walk'})

Suppose that the random walk process drops out of the state space in the 20th period of the forecast horizon.

Specify the coefficient matrices for the forecast period.

A4 = {[0.5 0.2 1 0; 1 0 0 0; 0 0 1 0; 0 0 0 1]};  % A for periods 76 - 95
A5 = {[0.5 0.2 1 0; 1 0 0 0; 0 0 1 0]};           % A for period 96
A6 = {[0.5 0.2 1; 1 0 0; 0 0 1]};                 % A for periods 97 - 100
fhA = [repmat(A4,20,1); A5; repmat(A6,4,1)];

B4 = {[0.1 0; 0 0; 0 0; 0 0.5]}; % B for periods 76 - 95
B6 = {[0.1; 0; 0]};              % B for periods 96 - 100
fhB = [repmat(B4,20,1); repmat(B6,5,1)];

C4 = {[1 0 0 0; 0 0 0 1]}; % C for periods 76 - 95
C6 = {[1 0 0]};            % C for periods 96 - 100
fhC = [repmat(C4,20,1); repmat(C6,5,1)];

D4 = {[0.3 0; 0 0.2]}; % D for periods 76 - 95
D6 = {0.3};            % D for periods 96 - 100
fhD = [repmat(D4,20,1); repmat(D6,5,1)];

Forecast observations over the forecast horizon.

FY = forecast(Mdl,25,Y,'A',fhA,'B',fhB,'C',fhC,'D',fhD);

FY is a 25-by-1 cell vector. Cells 1 through 20 contain 2-by-1 numeric vectors, and cells 51 through 75 contain scalars. Cell j corresponds to the observations of period j, specified by the forecast-horizon, observation model.

Plot the forecasts with the in-sample observations.

d2 = cell2mat(FY(1:20));
FY1 = [d2(((1:20)*2)-1) d2((1:20)*2)]; % Forecasts for periods 76 - 95
FY2 = cell2mat(FY(21:25));             % Forecasts for periods 96 - 100

figure
plot(1:75,[y1;y2(:,1)],'-k',1:75,[nan(50,1);y2(:,2)],'-r',...
    76:100,[FY1(:,1); FY2],'.-k',76:100,[FY1(:,2); nan(5,1)],'.-r',...
    75:76,[y2(end,1) FY1(1,1)],':k',75:76,[y2(end,2) FY1(1,2)],':r',...
    'LineWidth',2')
title('In-sample and Forecasted Observations')
ylabel('Observations')
xlabel('Period')
xlim([50,100])
legend({'In-sample AR(2)','In-sample random walk',...
    'AR(2), forecasted observations',...
    'Random walk, forecasted observations'},'Location','Best')

Tips

  • The software accommodates missing data. Indicate missing data using NaN values in the observed responses (Y).

  • To estimate in-sample forecasts, use filter.

References

[1] Durbin J., and S. J. Koopman. Time Series Analysis by State Space Methods. 2nd ed. Oxford: Oxford University Press, 2012.

See Also

| | |

Was this topic helpful?