Documentation |
The multivariate time series models used in Econometrics Toolbox™ functions are based on linear, autoregressive models. The basic models are:
Model Name | Abbreviation | Equation |
---|---|---|
Vector Autoregressive | VAR(p) | $${y}_{t}=a+{\displaystyle \sum _{i=1}^{p}{A}_{i}{y}_{t-i}}+{\epsilon}_{t}$$ |
Vector Moving Average | VMA(q) | $${y}_{t}=a+{\displaystyle \sum _{j=1}^{q}{B}_{j}{\epsilon}_{t-j}}+{\epsilon}_{t}$$ |
Vector Autoregressive Moving Average | VARMA(p, q) | $${y}_{t}=a+{\displaystyle \sum _{i=1}^{p}{A}_{i}{y}_{t-i}}+{\displaystyle \sum _{j=1}^{q}{B}_{j}{\epsilon}_{t-j}}+{\epsilon}_{t}$$ |
Vector Autoregressive Moving Average with eXogenous inputs | VARMAX(p, q, r) | $${y}_{t}=a+{X}_{t}\cdot b+{\displaystyle \sum _{i=1}^{p}{A}_{i}{y}_{t-i}}+{\displaystyle \sum _{j=1}^{q}{B}_{j}{\epsilon}_{t-j}}+{\epsilon}_{t}$$ |
Structural Vector Autoregressive Moving Average with eXogenous inputs | SVARMAX(p, q, r) | $${A}_{0}{y}_{t}=a+{X}_{t}\cdot b+{\displaystyle \sum _{i=1}^{p}{A}_{i}{y}_{t-i}}+{\displaystyle \sum _{j=1}^{q}{B}_{j}{\epsilon}_{t-j}}+{B}_{0}{\epsilon}_{t}$$ |
The following variables appear in the equations:
y_{t} is the vector of response time series variables at time t. y_{t} has n elements.
a is a constant vector of offsets, with n elements.
A_{i} are n-by-n matrices for each i. The A_{i} are autoregressive matrices. There are p autoregressive matrices.
ε_{t} is a vector of serially uncorrelated innovations, vectors of length n. The ε_{t} are multivariate normal random vectors with a covariance matrix Q, where Q is an identity matrix, unless otherwise specified.
B_{j} are n-by-n matrices for each j. The B_{j} are moving average matrices. There are q moving average matrices.
X_{t} is an n-by-r matrix representing exogenous terms at each time t. r is the number of exogenous series. Exogenous terms are data (or other unmodeled inputs) in addition to the response time series y_{t}.
b is a constant vector of regression coefficients of size r. So the product X_{t}·b is a vector of size n.
Generally, the time series y_{t} and X_{t} are observable. In other words, if you have data, it represents one or both of these series. You do not always know the offset a, coefficient b, autoregressive matrices A_{i}, and moving average matrices B_{j}. You typically want to fit these parameters to your data. See the vgxvarx function reference page for ways to estimate unknown parameters. The innovations ε_{t} are not observable, at least in data, though they can be observable in simulations.
There is an equivalent representation of the linear autoregressive equations in terms of lag operators. The lag operator L moves the time index back by one: Ly_{t} = y_{t–1}. The operator L^{m} moves the time index back by m: L^{m}y_{t} = y_{t–m}.
In lag operator form, the equation for a SVARMAX(p, q, r) model becomes
$$\left({A}_{0}-{\displaystyle \sum _{i=1}^{p}{A}_{i}{L}^{i}}\right){y}_{t}=a+{X}_{t}b+\left({B}_{0}+{\displaystyle \sum _{j=1}^{q}{B}_{j}{L}^{j}}\right){\epsilon}_{t}.$$
This equation can be written as
$$A(L){y}_{t}=a+{X}_{t}b+B(L){\epsilon}_{t},$$
where
$$A(L)={A}_{0}-{\displaystyle \sum _{i=1}^{p}{A}_{i}{L}^{i}}\text{and}B(L)={B}_{0}+{\displaystyle \sum _{j=1}^{q}{B}_{j}{L}^{j}}.$$
$$\mathrm{det}\left({I}_{n}-{A}_{1}z-{A}_{2}{z}^{2}-\mathrm{...}-{A}_{p}{z}^{p}\right)\ne 0\text{for}\left|z\right|\le 1,$$
This condition implies that, with all innovations equal to zero, the VAR process converges to a as time goes on. See Lütkepohl [69] Chapter 2 for a discussion.
$$\mathrm{det}\left({I}_{n}+{B}_{1}z+{B}_{2}{z}^{2}+\mathrm{...}+{B}_{q}{z}^{q}\right)\ne 0\text{for}\left|z\right|\le 1.$$
This condition implies that the pure VAR representation of the process is stable. For an explanation of how to convert between VAR and VMA models, see Changing Model Representations. See Lütkepohl [69] Chapter 11 for a discussion of invertible VMA models.
A VARMA model is stable if its VAR part is stable. Similarly, a VARMA model is invertible if its VMA part is invertible.
There is no well-defined notion of stability or invertibility for models with exogenous inputs (e.g., VARMAX models). An exogenous input can destabilize a model.
To understand a multiple time series model, or multiple time series data, you generally perform the following steps:
Import and preprocess data.
Specify a model.
Specifying Models to set up a model using vgxset:
Specification Structures with Known Parameters to specify a model with known parameters
Specification Structures with No Parameter Values to specify a model when you want MATLAB^{®} to estimate the parameters
Specification Structures with Selected Parameter Values to specify a model where you know some parameters, and want MATLAB to estimate the others
Determining an Appropriate Number of Lags to determine an appropriate number of lags for your model
Fit the model to data.
Fitting Models to Data to use vgxvarx to estimate the unknown parameters in your models. This can involve:
Changing Model Representations to change your model to a type that vgxvarx handles
Estimating Structural Matrices
Analyze and forecast using the fitted model. This can involve:
Checking Stability to determine whether your model is stable and invertible
Forecasting with vgxpred to forecast directly from models
Forecasting with vgxsim to simulate a model
Generate Impulse Responses for a VAR model to calculate impulse responses, which give forecasts based on an assumed change in an input to a time series
Comparing Forecasts with Forecast Period Data to compare the results of your model's forecasts to your data
Your application need not involve all of the steps in this workflow. For example, you might not have any data, but want to simulate a parameterized model. In that case, you would perform only steps 2 and 4 of the generic workflow.
You might iterate through some of these steps.
Often, the first step in creating a multiple time series model is to obtain data. There are two types of multiple time series data:
Response data. Response data corresponds to y_{t} in the multiple time series models defined in Types of VAR Models.
Exogenous data. Exogenous data corresponds to X_{t} in the multiple time series models defined in Types of VAR Models. For details and examples on structuring preparing exogenous data for multivariate model analysis, see Multivariate Time Series Models with Regression Terms.
Before using Econometrics Toolbox functions with the data, put the data into the required form. Use standard MATLAB commands, or preprocess the data with a spreadsheet program, database program, PERL, or other utilities.
There are several freely available sources of data sets, such as the St. Louis Federal Reserve Economics Database (known as FRED): http://research.stlouisfed.org/fred2/. If you have a license, you can use Datafeed Toolbox™ functions to access data from various sources.
Response data for multiple time series models must be in the form of a matrix. Each row of the matrix represents one time, and each column of the matrix represents one time series. The earliest data is the first row, the latest data is the last row. The data represents y_{t} in the notation of Types of VAR Models. If there are T times and n time series, put the data in the form of a T-by-n matrix:
$$\left[\begin{array}{cccc}Y{1}_{1}& Y{2}_{1}& \cdots & Y{n}_{1}\\ Y{1}_{2}& Y{2}_{2}& \cdots & Y{n}_{2}\\ \vdots & \vdots & \ddots & \vdots \\ Y{1}_{T}& Y{2}_{T}& \cdots & Y{n}_{T}\end{array}\right]$$
Y1_{t} represents time series 1,..., Yn_{t} represents time series n, 1 ≤ t ≤ T.
Multiple Paths. Response time series data can have an extra dimension corresponding to separate, independent paths. For this type of data, use a three-dimensional array Y(t,j,p), where:
t is the time index of an observation, 1 ≤ t ≤ T.
j is the index of a time series, 1 ≤ j ≤ n.
p is the path index, 1 ≤ p ≤ P.
For any path p, Y(t,j,p) is a time series.
The file Data_USEconModel ships with Econometrics Toolbox software. It contains time series from the St. Louis Federal Reserve Economics Database (known as FRED).
Enter
load Data_USEconModel
to load the data into your MATLAB workspace. The following items load into the workspace:
Data, a 249-by-14 matrix containing the 14 time series,
DataTable, a 249-by-14 tabular array that packages the data,
dates, a 249-element vector containing the dates for Data,
Description, a character array containing a description of the data series and the key to the labels for each series,
series, a 1-by-14 cell array of labels for the time series.
Examine the data structures:
firstPeriod = dates(1) lastPeriod = dates(end)
firstPeriod = 711217 lastPeriod = 733863
dates is a vector containing MATLAB serial date numbers, the number of days since the putative date January 1, 0000. (This "date" is not a real date, but is convenient for making date calculations; for more information, see Date Formats in the Financial Toolbox™ User's Guide.)
The Data matrix contains 14 columns. These represent the time series labeled by the cell vector of strings series.
FRED Series | Description |
---|---|
COE | Paid compensation of employees in $ billions |
CPIAUCSL | Consumer Price Index |
FEDFUNDS | Effective federal funds rate |
GCE | Government consumption expenditures and investment in $ billions |
GDP | Gross Domestic Product |
GDPDEF | Gross domestic product in $ billions |
GDPI | Gross private domestic investment in $ billions |
GS10 | Ten-year treasury bond yield |
HOANBS | Non-farm business sector index of hours worked |
M1SL | M1 money supply (narrow money) |
M2SL | M2 money supply (broad money) |
PCEC | Personal consumption expenditures in $ billions |
TB3MS | Three-month treasury bill yield |
UNRATE | Unemployment rate |
DataTable is a tabular array containing the same data as in Data, but you can call variables using the tabular array and the name of the variable. Use dot notation to access a variable, for example, DataTable.UNRATE calls the unemployment rate time series.
Your data might have characteristics that violate assumptions for linear multiple time series models. For example, you can have data with exponential growth, or data from multiple sources at different periodicities. You must preprocess your data to convert it into an acceptable form for analysis.
For time series with exponential growth, you can preprocess the data by taking the logarithm of the growing series. In some cases you then difference the result. For an example, see Transforming Data for Stationarity.
For data from multiple sources, you must decide how best to fill in missing values. Commonly, you take the missing values as unchanged from the previous value, or as interpolated from neighboring values.
Note: If you take a difference of a series, the series becomes 1 shorter. If you take a difference of only some time series in a data set, truncate the other series so that all have the same length, or pad the differenced series with initial values. |
Testing Data for Stationarity. You can test each time series data column for stability using unit root tests. For details, see Unit Root Nonstationarity.
To fit a lagged model to data, partition the response data in up to three sections:
Presample data
Estimation data
Forecast data
The purpose of presample data is to provide initial values for lagged variables. When trying to fit a model to the estimation data, you need to access earlier times. For example, if the maximum lag in a model is 4, and if the earliest time in the estimation data is 50, then the model needs to access data at time 46 when fitting the observations at time 50. vgxvarx assumes the value 0 for any data that is not part of the presample data.
The estimation data contains the observations y_{t}. Use the estimation data to fit the model.
Use the forecast data for comparing fitted model predictions against data. You do not have to have a forecast period. Use one to validate the predictive power of a fitted model.
The following figure shows how to arrange the data in the data matrix, with j presample rows and k forecast rows.
Econometrics Toolbox functions require a model specification structure as an input before they simulate, calibrate, forecast, or perform other calculations. You can specify a model with or without time series data. If you have data, you can fit models to the data as described in VAR Model Estimation. If you do not have data, you can specify a model with parameters you provide, as described in Specification Structures with Known Parameters.
Create an Econometrics Toolbox multiple time series model specification structure using the vgxset function. Use this structure for calibrating, simulating, predicting, analyzing, and displaying multiple time series.
There are three ways to create model structures using the vgxset function:
Specification Structures with Known Parameters. Use this method when you know the values of all relevant parameters of your model.
Specification Structures with No Parameter Values. Use this method when you know the size, type, and number of lags in your model, but do not know the values of any of the AR or MA coefficients, or the value of your Q matrix.
Specification Structures with Selected Parameter Values. Use this method when you know the size, type, and number of lags in your model, and also know some, but not all, of the values of AR or MA coefficients. This method includes the case when you want certain parameters to be zero. You can specify as many parameters as you like. For example, you can specify certain parameters, request that vgxvarx estimate others, and specify other parameters with [] to indicate default values.
If you know the values of model parameters, create a model structure with the parameters. The following are the name-value pairs you can pass to vgxset for known parameter values:
Model Parameters
Name | Value |
---|---|
a | An n-vector of offset constants. The default is empty. |
AR0 | An n-by-n invertible matrix representing the zero-lag structural AR coefficients. The default is empty, which means an identity matrix. |
AR | An nAR-element cell array of n-by-n matrices of AR coefficients. The default is empty. |
MA0 | An n-by-n invertible matrix representing the zero-lag structural MA coefficients. The default is empty, which means an identity matrix. |
MA | An nMA-element cell array of n-by-n matrices of MA coefficients. The default is empty. |
b | An nX-vector of regression coefficients. The default is empty. |
Q | An n-by-n symmetric innovations covariance matrix. The default is empty, which means an identity matrix. |
ARlag | A monotonically increasing nAR-vector of AR lags. The default is empty, which means all the lags from 1 to p, the maximum AR lag. |
MAlag | A monotonically increasing nMA-vector of MA lags. The default is empty, which means all the lags from 1 to q, the maximum MA lag. |
vgxset infers the model dimensionality, given by n, p, and q in Types of VAR Models, from the input parameters. These parameters are n, nAR, and nMA respectively in vgxset syntax. For more information, see Specification Structures with No Parameter Values.
The ARlag and MAlag vectors allow you to specify which lags you want to include. For example, to specify AR lags 1 and 3 without lag 2, set ARlag to [1 3]. This setting corresponds to nAR = 2 for two specified lags, even though this is a third order model, since the maximum lag is 3.
The following example shows how to create a model structure when you have known parameters. Consider a VAR(1) model:
$${y}_{t}=a+\left[\begin{array}{ccc}.5& 0& 0\\ .1& .1& .3\\ 0& .2& .3\end{array}\right]{y}_{t-1}+{\epsilon}_{t},$$
Specifically, a = [0.05, 0, –.05]' and w_{t} are distributed as standard three-dimensional normal random variables.
Create a model specification structure with vgxset:
A1 = [.5 0 0;.1 .1 .3;0 .2 .3]; Q = eye(3); Mdl = vgxset('a',[.05;0;-.05],'AR',{A1},'Q',Q)
Mdl = Model: 3-D VAR(1) with Additive Constant n: 3 nAR: 1 nMA: 0 nX: 0 a: [0.05 0 -0.05] additive constants AR: {1x1 cell} stable autoregressive process Q: [3x3] covariance matrix
vgxset identifies this model as a stable VAR(1) model with three dimensions and additive constants.
By default, vgxvarx fits all unspecified additive (a), AR, regression coefficients (b), and Q parameters. You must specify the number of time series and the type of model you want vgxvarx to fit. The following are the name-value pairs you can pass to vgxset for unknown parameter values:
Model Orders
Name | Value |
---|---|
n | A positive integer specifying the number of time series. The default is 1. |
nAR | A nonnegative integer specifying the number of AR lags (corresponds to p in Types of VAR Models). The default is 0. |
nMA | A nonnegative integer specifying the number of MA lags (corresponds to q in Types of VAR Models). The default is 0. Currently, vgxvarx cannot fit MA matrices. Therefore, specifying an nMA greater than 0 does not yield estimated MA matrices. |
nX | A nonnegative integer specifying the number regression parameters (corresponds to r in Types of VAR Models). The default is 0. |
Constant | Additive offset logical indicator. The default is false. |
The following example shows how to specify the model in Specification Structures with Known Parameters, but without explicit parameters.
Mdl = vgxset('n',3,'nAR',1,'Constant',true)
Mdl = Model: 3-D VAR(1) with Additive Constant n: 3 nAR: 1 nMA: 0 nX: 0 a: [] AR: {} Q: []
You can create a model structure with some known parameters, and have vgxvarx fit the unknown parameters to data. Here are the name-value pairs you can pass to vgxset for requested parameter values:
Model Parameter Estimation
Name | Value |
---|---|
asolve | An n-vector of additive offset logical indicators. The default is empty, which means true(n,1). |
ARsolve | An nAR-element cell array of n-by-n matrices of AR logical indicators. The default is empty, which means an nAR-element cell array of true(n). |
AR0solve | An n-by-n matrix of AR0 logical indicators. The default is empty, which means false(n). |
MAsolve | An nMA-element cell array of n-by-n matrices of MA logical indicators. The default is empty, which means false(n). |
MA0solve | An n-by-n matrix of MA0 logical indicators. The default is empty, which means false(n). |
bsolve | An nX-vector of regression logical indicators. The default is empty, which means true(n,1). |
Qsolve | An n-by-n symmetric covariance matrix logical indicator. The default is empty, which means true(n), unless the 'CovarType' option of vgxvarx overrides it. |
Specify a logical 1 (true) for every parameter that you want vgxvarx to estimate.
Currently, vgxvarx cannot fit the AR0, MA0, or MA matrices. Therefore, vgxvarx ignores the AR0solve, MA0solve, and MAsolve indicators. However, you can examine the Example_StructuralParams.m file for an approach to estimating the AR0 and MA0 matrices. Enter help Example_StructuralParams at the MATLAB command line for information. See Lütkepohl [69] Chapter 9 for algorithms for estimating structural models.
Currently, vgxvarx also ignores the Qsolve matrix. vgxvarx can fit either a diagonal or a full Q matrix; see vgxvarx.
This example shows how to specify the model in Specification Structures with Known Parameters, but with requested AR parameters with a diagonal autoregressive structure. The dimensionality of the model is known, as is the parameter vector a, but the autoregressive matrix A1 and covariance matrix Q are not known.
Mdl = vgxset('ARsolve',{logical(eye(3))},'a',... [.05;0;-.05])
Mdl = Model: 3-D VAR(1) with Additive Constant n: 3 nAR: 1 nMA: 0 nX: 0 a: [0.05 0 -0.05] additive constants AR: {} ARsolve: {1x1 cell of logicals} autoregressive lag indicators Q: []
After you set up a model structure, you can examine it in several ways:
Call the vgxdisp function.
Double-click the structure in the MATLAB Workspace browser.
Call the vgxget function.
Enter Spec at the MATLAB command line, where Spec is the name of the model structure.
Enter Spec.ParamName at the MATLAB command line, where Spec is the name of the model structure, and ParamName is the name of the parameter you want to examine.
You can change any part of a model structure named, for example, Spec, using the vgxset as follows:
Spec = vgxset(Spec,'ParamName',value,...)
This syntax changes only the 'ParamName' parts of the Spec structure.
There are two Econometrics Toolbox functions that can help you determine an appropriate number of lags for your models:
The lratiotest function performs likelihood ratio tests to help identify the appropriate number of lags.
The aicbic function calculates the Akaike and Bayesian information criteria to determine the minimal appropriate number of required lags.
Example: Using the Likelihood Ratio Test to Calculate the Minimal Requisite Lag. lratiotest requires inputs of the loglikelihood of an unrestricted model, the loglikelihood of a restricted model, and the number of degrees of freedom (DoF). DoF is the difference in the number of active parameters between the unrestricted and restricted models. lratiotest returns a Boolean: 1 means reject the restricted model in favor of the unrestricted model, 0 means there is insufficient evidence to reject the restricted model.
In the context of determining an appropriate number of lags, the restricted model has fewer lags, and the unrestricted model has more lags. Otherwise, test models with the same type of fitted parameters (for example, both with full Q matrices, or both with diagonal Q matrices).
Obtain the loglikelihood (LLF) of a model as the third output of vgxvarx:
[EstSpec,EstStdErrors,LLF,W] = vgxvarx(...)
Obtain the number of active parameters in a model as the second output of vgxcount:
[NumParam,NumActive] = vgxcount(Spec)
For example, suppose you have four fitted models with varying lag structures. The models have loglikelihoods LLF1, LLF2, LLF3, and LLF4, and active parameter counts n1p, n2p, n3p, and n4p. Suppose model 4 has the largest number of lags. Perform likelihood ratio tests of models 1, 2, and 3 against model 4, as follows:
reject1 = lratiotest(LLF4,LLF1,n4p - n1p) reject2 = lratiotest(LLF4,LLF2,n4p - n2p) reject3 = lratiotest(LLF4,LLF3,n4p - n3p)
If reject1 = 1, you reject model 1 in favor of model 4, and similarly for models 2 and 3. If any of the models have rejectI = 0, you have an indication that you can use fewer lags than in model 4.
Example: Using Akaike Information Criterion to Calculate the Minimal Requisite Lag. aicbic requires inputs of the loglikelihood of a model and of the number of active parameters in the model. It returns the value of the Akaike information criterion. Lower values are better than higher values. aicbic accepts vectors of loglikelihoods and vectors of active parameters, and returns a vector of Akaike information criteria, which makes it easy to find the minimum.
Obtain the loglikelihood (LLF) of a model as the third output of vgxvarx:
[EstSpec,EstStdErrors,LLF,W] = vgxvarx(...)
Obtain the number of active parameters in a model as the second output of vgxcount:
[NumParam,NumActive] = vgxcount(Spec)
For example, suppose you have four fitted models with varying lag structures. The models have loglikelihoods LLF1, LLF2, LLF3, and LLF4, and active parameter counts n1p, n2p, n3p, and n4p. Calculate the Akaike information criteria as follows:
AIC = aicbic([LLF1 LLF2 LLF3 LLF4],[n1p n2p n3p n4p])
The most suitable model has the lowest value of the Akaike information criterion.
To create a model of multiple time series data, decide on a parametric form of the model, and fit parameters to the data. When you have a calibrated (fitted) model, check if the model fits the data adequately.
To fit a model to data, you must have:
Time series data, as described in Multivariate Time Series Data
At least one time series model specification structure, as described in Model Specification Structures
There are several Econometrics Toolbox functions that aid these tasks, including:
vgxvarx, which fits VARX models.
vgxar and vgxma, which convert models to pure AR or MA models; vgxar enables you to fit VARMA models with vgxvarx, as described in Fit a VARMA Model
lratiotest, lmtest, waldtest, and aicbic, which can help determine the number of lags to include in a model.
vgxqual, which examines the stability of models, as described in Checking Model Adequacy.
vgxpred, which creates forecasts that can be used to check the adequacy of the fit, as described in VAR Model Forecasting, Simulation, and Analysis
Structural Matrices. The structural matrices in SVARMAX models are the A_{0} and B_{0} matrices. See Types of VAR Models for definitions of these terms. Currently, vgxvarx cannot fit these matrices to data. However, you can examine the Example_StructuralParams.m file for an approach to estimating the AR0 and MA0 matrices. Enter help Example_StructuralParams at the MATLAB command line for information. See Lütkepohl [69] Chapter 9 for algorithms for estimating structural models.
You can convert a VARMA model to an equivalent VAR model using the vgxar function. (See Types of VAR Models for definitions of these terms.) Similarly, you can convert a VARMA model to an equivalent VMA model using the vgxma function. These functions are useful in the following situations:
Calibration of models
The vgxvarx function can calibrate only VAR and VARX models. So to calibrate a VARMA model, you could first convert it to a VAR model. However, you can ask vgxvarx to ignore VMA terms and fit just the VAR structure. See Fit a VARMA Model for a comparison of conversion versus no conversion.
Forecasting
It is straightforward to generate forecasts for VMA models. In fact, vgxpred internally converts models to VMA models to calculate forecast statistics.
Analyzing models
Sometimes it is easier to define your model using one structure, but you want to analyze it using a different structure.
The algorithm for conversion between models involves series that are, in principle, infinite. The vgxar and vgxma functions truncate these series to the maximum of nMA and nAR, introducing an inaccuracy. You can specify that the conversion give more terms, or give terms to a specified accuracy. See [69] for more information on these transformations.
Convert a VARMA Model to a VAR Model.
This example creates a VARMA model, then converts it to a pure VAR model.
Create a VARMA model specification structure.
A1 = [.2 -.1 0;.1 .2 .05;0 .1 .3]; A2 = [.3 0 0;.1 .4 .1;0 0 .2]; A3 = [.4 .1 -.1;.2 -.5 0;.05 .05 .2]; MA1 = .2*eye(3); MA2 = [.3 .2 .1;.2 .4 0;.1 0 .5]; Spec = vgxset('AR',{A1,A2,A3},'MA',{MA1,MA2})
Spec = Model: 3-D VARMA(3,2) with No Additive Constant n: 3 nAR: 3 nMA: 2 nX: 0 AR: {3x1 cell} stable autoregressive process MA: {2x1 cell} invertible moving average process Q: []
Convert the structure to a pure VAR structure:
SpecAR = vgxar(Spec)
SpecAR = Model: 3-D VAR(3) with No Additive Constant n: 3 nAR: 3 nMA: 0 nX: 0 AR: {3x1 cell} unstable autoregressive process Q: []
The converted process is unstable; see the AR row. An unstable model could yield inaccurate predictions. Taking more terms in the series gives a stable model:
specAR4 = vgxar(Spec,4)
specAR4 = Model: 3-D VAR(4) with No Additive Constant n: 3 nAR: 4 nMA: 0 nX: 0 AR: {4x1 cell} stable autoregressive process Q: []
Convert a VARMA Model to a VMA Model.
This example uses a VARMA model and converts it to a pure VMA model.
Create a VARMA model specification structure.
A1 = [.2 -.1 0;.1 .2 .05;0 .1 .3]; A2 = [.3 0 0;.1 .4 .1;0 0 .2]; A3 = [.4 .1 -.1;.2 -.5 0;.05 .05 .2]; MA1 = .2*eye(3); MA2 = [.3 .2 .1;.2 .4 0;.1 0 .5]; Spec = vgxset('AR',{A1,A2,A3},'MA',{MA1,MA2})
Spec = Model: 3-D VARMA(3,2) with No Additive Constant n: 3 nAR: 3 nMA: 2 nX: 0 AR: {3x1 cell} stable autoregressive process MA: {2x1 cell} invertible moving average process Q: []
Convert the structure to a pure VAR structure:
SpecAR = vgxar(Spec)
SpecAR = Model: 3-D VAR(3) with No Additive Constant n: 3 nAR: 3 nMA: 0 nX: 0 AR: {3x1 cell} unstable autoregressive process Q: []
Convert the model specification structure Spec to a pure MA structure:
SpecMA = vgxma(Spec)
SpecMA = Model: 3-D VMA(3) with No Additive Constant n: 3 nAR: 0 nMA: 3 nX: 0 MA: {3x1 cell} non-invertible moving average process Q: []
Notice that the pure VMA process has more MA terms than the original process. The number is the maximum of nMA and nAR, and nAR = 3.
The converted VMA model is not invertible; see the MA row. A noninvertible model can yield inaccurate predictions. Taking more terms in the series results in an invertible model.
specMA4 = vgxma(Spec,4)
specMA4 = Model: 3-D VMA(4) with No Additive Constant n: 3 nAR: 0 nMA: 4 nX: 0 MA: {4x1 cell} invertible moving average process Q: []
Converting the resulting VMA model to a pure VAR model results in the same VAR(3) model as SpecAR.
SpecAR2 = vgxar(SpecMA); vgxdisp(SpecAR,SpecAR2)
Model 1: 3-D VAR(3) with No Additive Constant Conditional mean is not AR-stable and is MA-invertible Model 2: 3-D VAR(3) with No Additive Constant Conditional mean is not AR-stable and is MA-invertible Parameter Model 1 Model 2 -------------- -------------- -------------- AR(1)(1,1) 0.4 0.4 (1,2) -0.1 -0.1 (1,3) -0 -0 (2,1) 0.1 0.1 (2,2) 0.4 0.4 (2,3) 0.05 0.05 (3,1) -0 -0 (3,2) 0.1 0.1 (3,3) 0.5 0.5 AR(2)(1,1) 0.52 0.52 (1,2) 0.22 0.22 (1,3) 0.1 0.1 (2,1) 0.28 0.28 (2,2) 0.72 0.72 (2,3) 0.09 0.09 (3,1) 0.1 0.1 (3,2) -0.02 -0.02 (3,3) 0.6 0.6 AR(3)(1,1) 0.156 0.156 (1,2) -0.004 -0.004 (1,3) -0.18 -0.18 (2,1) 0.024 0.024 (2,2) -0.784 -0.784 (2,3) -0.038 -0.038 (3,1) -0.01 -0.01 (3,2) 0.014 0.014 (3,3) -0.17 -0.17 Q(:,:) [] []
Conversion Types and Accuracy. Some conversions occur when explicitly requested, such as those initiated by calls to vgxar and vgxma. Other conversions occur automatically as needed for calculations. For example, vgxpred internally converts models to VMA models to calculate forecast statistics.
By default, conversions give terms up to the largest lag present in the model. However, for more accuracy in conversion, you can specify that the conversion use more terms. You can also specify that it continue until a residual term is below a threshold you set. The syntax is
SpecAR = vgxar(Spec,nAR,ARlag,Cutoff) SpecMA = vgxma(Spec,nMA,MAlag,Cutoff)
nMA and nAR represent the number of terms in the series.
ARlag and MAlag are vectors of particular lags that you want in the converted model.
Cutoff is a positive parameter that truncates the series if the norm of a converted term is smaller than Cutoff. Cutoff is 0 by default.
For details, see the function reference pages for vgxar and vgxma.
The vgxvarx function performs parameter estimation. vgxvarx only estimates parameters for VAR and VARX models. In other words, vgxvarx does not estimate moving average matrices, which appear, for example, in VMA and VARMA models. For definitions of these terms, see Types of VAR Models.
The vgxar function converts a VARMA model to a VAR model. Currently, it does not handle VARMAX models.
You have two choices in fitting parameters to a VARMA model or VARMAX model:
Set the vgxvarx 'IgnoreMA' parameter to 'yes' (the default is 'no'). In this case vgxvarx ignores VMA parameters, and fits the VARX parameters.
Convert a VARMA model to a VAR model using vgxar. Then fit the resulting VAR model using vgxvarx.
Each of these options is effective on some data. Try both if you have VMA terms in your model. See Fit a VARMA Model for an example showing both options.
This example uses two series: the consumer price index (CPI) and the unemployment rate (UNRATE) from the data set Data_USEconmodel.
Obtain the two time series, and convert them for stationarity:
load Data_USEconModel
cpi = DataTable.CPIAUCSL;
cpi = log(cpi);
dCPI = diff(cpi);
unem = DataTable.UNRATE;
Y = [dCPI,unem(2:end)];
Create a VAR model:
Spec = vgxset('n',2,'nAR',4,'Constant',true)
Spec = Model: 2-D VAR(4) with Additive Constant n: 2 nAR: 4 nMA: 0 nX: 0 a: [] AR: {} Q: []
Fit the model to the data using vgxvarx:
[EstSpec,EstStdErrors,logL,W] = vgxvarx(Spec,Y); vgxdisp(EstSpec)
Model : 2-D VAR(4) with Additive Constant Conditional mean is AR-stable and is MA-invertible a Constant: 0.00184568 0.315567 AR(1) Autoregression Matrix: 0.308635 -0.0032011 -4.48152 1.34343 AR(2) Autoregression Matrix: 0.224224 0.00124669 7.19015 -0.26822 AR(3) Autoregression Matrix: 0.353274 0.00287036 1.48726 -0.227145 AR(4) Autoregression Matrix: -0.0473456 -0.000983119 8.63672 0.0768312 Q Innovations Covariance: 3.51443e-05 -0.000186967 -0.000186967 0.116697
This example uses artificial data to generate a time series, then shows two ways of fitting a VARMA model to the series.
Specify the model:
AR1 = [.3 -.1 .05;.1 .2 .1;-.1 .2 .4]; AR2 = [.1 .05 .001;.001 .1 .01;-.01 -.01 .2]; Q = [.2 .05 .02;.05 .3 .1;.02 .1 .25]; MA1 = [.5 .2 .1;.1 .6 .2;0 .1 .4]; MA2 = [.2 .1 .1; .05 .1 .05;.02 .04 .2]; Spec = vgxset('AR',{AR1,AR2},'Q',Q,'MA',{MA1,MA2})
Spec = Model: 3-D VARMA(2,2) with No Additive Constant n: 3 nAR: 2 nMA: 2 nX: 0 AR: {2x1 cell} stable autoregressive process MA: {2x1 cell} invertible moving average process Q: [3x3] covariance matrix
Generate a time series using vgxsim:
YF = [100 50 20;110 52 22;119 54 23]; % starting values rng(1); % For reproducibility Y = vgxsim(Spec,190,[],YF);
Fit the data to a VAR model by calling vgxvarx with the 'IgnoreMA' option:
estSpec = vgxvarx(Spec,Y(3:end,:),[],Y(1:2,:),'IgnoreMA','yes');
Compare the estimated model with the original:
vgxdisp(Spec,estSpec)
Model 1: 3-D VARMA(2,2) with No Additive Constant Conditional mean is AR-stable and is MA-invertible Model 2: 3-D VAR(2) with No Additive Constant Conditional mean is AR-stable and is MA-invertible Parameter Model 1 Model 2 -------------- -------------- -------------- AR(1)(1,1) 0.3 0.723964 (1,2) -0.1 0.119695 (1,3) 0.05 0.10452 (2,1) 0.1 0.0828041 (2,2) 0.2 0.788177 (2,3) 0.1 0.299648 (3,1) -0.1 -0.138715 (3,2) 0.2 0.397231 (3,3) 0.4 0.748157 AR(2)(1,1) 0.1 -0.126833 (1,2) 0.05 -0.0690256 (1,3) 0.001 -0.118524 (2,1) 0.001 0.0431623 (2,2) 0.1 -0.265387 (2,3) 0.01 -0.149646 (3,1) -0.01 0.107702 (3,2) -0.01 -0.304243 (3,3) 0.2 0.0165912 MA(1)(1,1) 0.5 (1,2) 0.2 (1,3) 0.1 (2,1) 0.1 (2,2) 0.6 (2,3) 0.2 (3,1) 0 (3,2) 0.1 (3,3) 0.4 MA(2)(1,1) 0.2 (1,2) 0.1 (1,3) 0.1 (2,1) 0.05 (2,2) 0.1 (2,3) 0.05 (3,1) 0.02 (3,2) 0.04 (3,3) 0.2 Q(1,1) 0.2 0.193553 Q(2,1) 0.05 0.0408221 Q(2,2) 0.3 0.252461 Q(3,1) 0.02 0.00690626 Q(3,2) 0.1 0.0922074 Q(3,3) 0.25 0.306271
The estimated Q matrix is close to the original Q matrix. However, the estimated AR terms are not close to the original AR terms. Specifically, nearly all the AR(2) coefficients are the opposite sign, and most AR(1) coefficients are off by about a factor of 2.
Alternatively, before fitting the model, convert it to a pure AR model. To do this, specify the model and generate a time series as above. Then, convert the model to a pure AR model:
specAR = vgxar(Spec);
Fit the converted model to the data:
estSpecAR = vgxvarx(specAR,Y(3:end,:),[],Y(1:2,:));
Compare the fitted model to the original model:
vgxdisp(specAR,estSpecAR)
Model 1: 3-D VAR(2) with No Additive Constant Conditional mean is AR-stable and is MA-invertible Model 2: 3-D VAR(2) with No Additive Constant Conditional mean is AR-stable and is MA-invertible Parameter Model 1 Model 2 -------------- -------------- -------------- AR(1)(1,1) 0.8 0.723964 (1,2) 0.1 0.119695 (1,3) 0.15 0.10452 (2,1) 0.2 0.0828041 (2,2) 0.8 0.788177 (2,3) 0.3 0.299648 (3,1) -0.1 -0.138715 (3,2) 0.3 0.397231 (3,3) 0.8 0.748157 AR(2)(1,1) -0.13 -0.126833 (1,2) -0.09 -0.0690256 (1,3) -0.114 -0.118524 (2,1) -0.129 0.0431623 (2,2) -0.35 -0.265387 (2,3) -0.295 -0.149646 (3,1) 0.03 0.107702 (3,2) -0.17 -0.304243 (3,3) 0.05 0.0165912 Q(1,1) 0.2 0.193553 Q(2,1) 0.05 0.0408221 Q(2,2) 0.3 0.252461 Q(3,1) 0.02 0.00690626 Q(3,2) 0.1 0.0922074 Q(3,3) 0.25 0.306271
The model coefficients between the pure AR models are closer than between the original VARMA model and the fitted AR model. Most model coefficients are within 20% or the original. Notice, too, that estSpec and estSpecAR are identical. This is because both are AR(2) models fitted to the same data series.
How vgxvarx Works. vgxvarx finds maximum likelihood estimators of AR and Q matrices and the a and b vectors if present. For VAR models and if the response series do not contain NaN values, vgxvarx uses a direct solution algorithm that requires no iterations. For VARX models or if the response data contain missing values, vgxvarx optimizes the likelihood using the expectation-conditional-maximization (ECM) algorithm. The iterations usually converge quickly, unless two or more exogenous data streams are proportional to each other. In that case, there is no unique maximum likelihood estimator, and the iterations might not converge. You can set the maximum number of iterations with the MaxIter parameter, which has a default value of 1000. vgxvarx does not support exogenous series containing NaN values.
vgxvarx calculates the loglikelihood of the data, giving it as an output of the fitted model. Use this output in testing the quality of the model. For example, see Determining an Appropriate Number of Lags and Examining the Stability of a Fitted Model.
When you enter the name of a fitted model at the command line, you obtain a summary. This summary contains a report on the stability of the VAR part of the model, and the invertibility of the VMA part. You can also find whether a model is stable or invertible by entering:
[isStable,isInvertible] = vgxqual(Spec)
This gives a Boolean value of 1 for isStable if the model is stable, and a Boolean value of 1 for isInvertible if the model is invertible. This stability or invertibility does not take into account exogenous terms, which can disrupt the stability of a model.
Stable, invertible models give reliable results, while unstable or noninvertible ones might not.
Stability and invertibility are equivalent to all eigenvalues of the associated lag operators having modulus less than 1. In fact vgxqual evaluates these quantities by calculating eigenvalues. For more information, see the vgxqual function reference page or Hamilton [48]
When you have models with parameters (known or estimated), you can examine the predictions of the models. For information on specifying models, see Model Specification Structures. For information on calibrating models, see VAR Model Estimation.
The main methods of forecasting are:
Generating forecasts with error bounds using vgxpred
Generating simulations with vgxsim
Generating sample paths with vgxproc
These functions base their forecasts on a model specification and initial data. The functions differ in their innovations processes:
vgxpred assumes zero innovations. Therefore, vgxpred yields a deterministic forecast.
vgxsim assumes the innovations are jointly normal with covariance matrix Q. vgxsim yields pseudorandom (Monte Carlo) sample paths.
vgxproc uses a separate input for the innovations process. vgxproc yields a sample path that is deterministically based on the innovations process.
vgxpred is faster and takes less memory than generating many sample paths using vgxsim. However, vgxpred is not as flexible as vgxsim. For example, suppose you transform some time series before making a model, and want to undo the transformation when examining forecasts. The error bounds given by transforms of vgxpred error bounds are not valid bounds. In contrast, the error bounds given by the statistics of transformed simulations are valid.
This example shows how to use vgxpred to forecast a VAR model.
vgxpred enables you to generate forecasts with error estimates. vgxpred requires:
A fully-specified model (for example , impSpec in what follows)
The number of periods for the forecast (for example, FT in what follows)
vgxpred optionally takes:
An exogenous data series
A presample time series (e.g., Y(end-3:end,:) in what follows)
Extra paths
Load the Data_USEconModel data set. This example uses two time series: the logarithm of real GDP, and the real 3-month T-bill rate, both differenced to be approximately stationary. Suppose that a VAR(4) model is appropriate to describe the time series.
load Data_USEconModel DEF = log(DataTable.CPIAUCSL); GDP = log(DataTable.GDP); rGDP = diff(GDP - DEF); % Real GDP is GDP - deflation TB3 = 0.01*DataTable.TB3MS; dDEF = 4*diff(DEF); % Scaling rTB3 = TB3(2:end) - dDEF; % Real interest is deflated Y = [rGDP,rTB3];
Fit a VAR(4) model specification:
Spec = vgxset('n',2,'nAR',4,'Constant',true); impSpec = vgxvarx(Spec,Y(5:end,:),[],Y(1:4,:)); impSpec = vgxset(impSpec,'Series',... {'Transformed real GDP','Transformed real 3-mo T-bill rate'});
Predict the evolution of the time series:
FDates = datenum({'30-Jun-2009'; '30-Sep-2009'; '31-Dec-2009'; '31-Mar-2010'; '30-Jun-2010'; '30-Sep-2010'; '31-Dec-2010'; '31-Mar-2011'; '30-Jun-2011'; '30-Sep-2011'; '31-Dec-2011'; '31-Mar-2012'; '30-Jun-2012'; '30-Sep-2012'; '31-Dec-2012'; '31-Mar-2013'; '30-Jun-2013'; '30-Sep-2013'; '31-Dec-2013'; '31-Mar-2014'; '30-Jun-2014' }); FT = numel(FDates); [Forecast,ForecastCov] = vgxpred(impSpec,FT,[],... Y(end-3:end,:));
View the forecast using vgxplot:
vgxplot(impSpec,Y(end-10:end,:),Forecast,ForecastCov);
Forecast a VAR Model Using Monte Carlo Simulation.
This example shows how to use Monte Carlo simulation via vgxsim to forecast a VAR model.
vgxsim enables you to generate simulations of time series based on your model. If you have a trustworthy model structure, you can use these simulations as sample forecasts.
vgxsim requires:
A model (impSpec in what follows)
The number of periods for the forecast (FT in what follows)
vgxsim optionally takes:
An exogenous data series
A presample time series (Y(end-3:end,:) in what follows)
Presample innovations
The number of realizations to simulate (2000 in what follows)
Load the Data_USEconModel data set. This example uses two time series: the logarithm of real GDP, and the real 3-month T-bill rate, both differenced to be approximately stationary. For illustration, a VAR(4) model describes the time series.
load Data_USEconModel DEF = log(DataTable.CPIAUCSL); GDP = log(DataTable.GDP); rGDP = diff(GDP - DEF); % Real GDP is GDP - deflation TB3 = 0.01*DataTable.TB3MS; dDEF = 4*diff(DEF); % Scaling rTB3 = TB3(2:end) - dDEF; % Real interest is deflated Y = [rGDP,rTB3];
Fit a VAR(4) model specification:
Spec = vgxset('n',2,'nAR',4,'Constant',true); impSpec = vgxvarx(Spec,Y(5:end,:),[],Y(1:4,:)); impSpec = vgxset(impSpec,'Series',... {'Transformed real GDP','Transformed real 3-mo T-bill rate'});
Define the forecast horizon.
FDates = datenum({'30-Jun-2009'; '30-Sep-2009'; '31-Dec-2009'; '31-Mar-2010'; '30-Jun-2010'; '30-Sep-2010'; '31-Dec-2010'; '31-Mar-2011'; '30-Jun-2011'; '30-Sep-2011'; '31-Dec-2011'; '31-Mar-2012'; '30-Jun-2012'; '30-Sep-2012'; '31-Dec-2012'; '31-Mar-2013'; '30-Jun-2013'; '30-Sep-2013'; '31-Dec-2013'; '31-Mar-2014'; '30-Jun-2014' }); FT = numel(FDates);
Simulate the model for 10 steps, replicated 2000 times:
rng(1); %For reproducibility
Ysim = vgxsim(impSpec,FT,[],Y(end-3:end,:),[],2000);
Calculate the mean and standard deviation of the simulated series:
Ymean = mean(Ysim,3); % Calculate means Ystd = std(Ysim,0,3); % Calculate std deviations
Plot the means +/- 1 standard deviation for the simulated series:
subplot(2,1,1) plot(dates(end-10:end),Y(end-10:end,1),'k') hold('on') plot([dates(end);FDates],[Y(end,1);Ymean(:,1)],'r') plot([dates(end);FDates],[Y(end,1);Ymean(:,1)]+[0;Ystd(:,1)],'b') plot([dates(end);FDates],[Y(end,1);Ymean(:,1)]-[0;Ystd(:,1)],'b') datetick('x') title('Transformed real GDP') subplot(2,1,2) plot(dates(end-10:end),Y(end-10:end,2),'k') hold('on') axis([dates(end-10),FDates(end),-.1,.1]); plot([dates(end);FDates],[Y(end,2);Ymean(:,2)],'r') plot([dates(end);FDates],[Y(end,2);Ymean(:,2)]+[0;Ystd(:,2)],'b') plot([dates(end);FDates],[Y(end,2);Ymean(:,2)]-[0;Ystd(:,2)],'b') datetick('x') title('Transformed real 3-mo T-bill rate')
How vgxpred and vgxsim Work. vgxpred generates two quantities:
A deterministic forecast time series based on 0 innovations
Time series of forecast covariances based on the Q matrix
The simulations for models with VMA terms uses presample innovation terms. Presample innovation terms are values of ε_{t} for times before the forecast period that affect the MA terms. For definitions of the terms MA, Q, and ε_{t}, see Types of VAR Models. If you do not provide all requisite presample innovation terms, vgxpred assumes the value 0 for missing terms.
vgxsim generates random time series based on the model using normal random innovations distributed with Q covariances. The simulations of models with MA terms uses presample innovation terms. If you do not provide all requisite presample innovation terms, vgxsim assumes the value 0 for missing terms.
If you scaled any time series before fitting a model, you can unscale the resulting time series to understand its predictions more easily.
If you scaled a series with log, transform predictions of the corresponding model with exp.
If you scaled a series with diff(log), transform predictions of the corresponding model with cumsum(exp). cumsum is the inverse of diff; it calculates cumulative sums. As in integration, you must choose an appropriate additive constant for the cumulative sum. For example, take the log of the final entry in the corresponding data series, and use it as the first term in the series before applying cumsum.
You can examine the effect of impulse responses to models with the vgxproc function. An impulse response is the deterministic response of a time series model to an innovations process that has the value of one standard deviation in one component at the initial time, and zeros in all other components and times. vgxproc simulates the evolution of a time series model from a given innovations process. Therefore, vgxproc is appropriate for examining impulse responses.
The only difficulty in using vgxproc is determining exactly what is "the value of one standard deviation in one component at the initial time." This value can mean different things depending on your model.
For a structural model, B_{0} is usually a known diagonal matrix, and Q is an identity matrix. In this case, the impulse response to component i is the square root of B(i,i).
For a nonstructural model, there are several choices. The simplest choice, though not necessarily the most accurate, is to take component i as the square root of Q(i,i). Other possibilities include taking the Cholesky decomposition of Q, or diagonalizing Q and taking the square root of the diagonal matrix.
Generate Impulse Responses for a VAR model.
This example shows how to generate impulse responses of an interest rate shock on real GDP using vgxproc.
Load the Data_USEconModel data set. This example uses two time series: the logarithm of real GDP, and the real 3-month T-bill rate, both differenced to be approximately stationary. Suppose that a VAR(4) model is appropriate to describe the time series.
load Data_USEconModel DEF = log(DataTable.CPIAUCSL); GDP = log(DataTable.GDP); rGDP = diff(GDP - DEF); % Real GDP is GDP - deflation TB3 = 0.01*DataTable.TB3MS; dDEF = 4*diff(DEF); % Scaling rTB3 = TB3(2:end) - dDEF; % Real interest is deflated Y = [rGDP,rTB3];
Define the forecast horizon.
FDates = datenum({'30-Jun-2009'; '30-Sep-2009'; '31-Dec-2009'; '31-Mar-2010'; '30-Jun-2010'; '30-Sep-2010'; '31-Dec-2010'; '31-Mar-2011'; '30-Jun-2011'; '30-Sep-2011'; '31-Dec-2011'; '31-Mar-2012'; '30-Jun-2012'; '30-Sep-2012'; '31-Dec-2012'; '31-Mar-2013'; '30-Jun-2013'; '30-Sep-2013'; '31-Dec-2013'; '31-Mar-2014'; '30-Jun-2014' }); FT = numel(FDates);
Fit a VAR(4) model specification:
Spec = vgxset('n',2,'nAR',4,'Constant',true); impSpec = vgxvarx(Spec,Y(5:end,:),[],Y(1:4,:)); impSpec = vgxset(impSpec,'Series',... {'Transformed real GDP','Transformed real 3-mo T-bill rate'});
Generate the innovations processes both with and without an impulse (shock):
W0 = zeros(FT, 2); % Innovations without a shock W1 = W0; W1(1,2) = sqrt(impSpec.Q(2,2)); % Innovations with a shock
Generate the processes with and without the shock:
Yimpulse = vgxproc(impSpec,W1,[],Y); % Process with shock Ynoimp = vgxproc(impSpec,W0,[],Y); % Process with no shock
Undo the scaling for the GDP processes:
Yimp1 = exp(cumsum(Yimpulse(:,1))); % Undo scaling
Ynoimp1 = exp(cumsum(Ynoimp(:,1)));
Compute and plot the relative difference between the calculated GDPs:
RelDiff = (Yimp1 - Ynoimp1) ./ Yimp1; plot(FDates,100*RelDiff);dateaxis('x',12) title(... 'Impact of Interest Rate Shock on Real Gross Domestic Product') ylabel('% Change')
The graph shows that an increased interest rate causes a dip in the real GDP for a short time. Afterwards the real GDP begins to climb again, reaching its former level in about 1 year.
Incorporate feedback from exogenous predictors, or study their linear associations to the response series by including a regression component in multivariate time series models. By order of increasing complexity, examples of multivariate, time series, regression models include:
Modeling the effects of an intervention or to include shared intercepts among several responses. In these cases, the exogenous series are indicator variables.
Modeling the contemporaneous linear associations between a subset of exogenous series to each response. Applications include CAPM analysis and studying the effects of prices of items on their demand. These applications are examples of seemingly unrelated regression (SUR). For more details, see Implement Seemingly Unrelated Regression Analyses and Estimate the Capital Asset Pricing Model Using SUR.
Modeling the linear associations between contemporaneous and lagged, exogenous series and the response as part of a multivariate, distributed lag model. Applications include determining how a change in monetary growth affects real gross domestic product (GDP) and gross national income (GNI).
Any combination of SUR and the distributed lag model that includes the lagged effects of responses, also known as simultaneous equation models. VARMAX modeling is an example (see Types of VAR Models).
The general equation for a multivariate, time series, regression model is
$${y}_{t}=a+{X}_{t}\cdot b+{\displaystyle \sum _{i=1}^{p}{A}_{i}{y}_{t-i}}+{\displaystyle \sum _{j=1}^{q}{B}_{j}{\epsilon}_{t-j}}+{\epsilon}_{t},$$
where, in particular,
X_{t} is an n-by-r design matrix.
Row j of X_{t} contains the observations of the regression variables that correspond to the period t observation of response series j.
Column k of X_{t} corresponds to the period t observations of regression variable k. (There are r regression variables composed from the exogenous series. For details, see Design Matrix Structure for Including Exogenous Data.)
X_{t} can contain lagged exogenous series.
b is an r-by-1 vector of regression coefficients corresponding to the r regression variables. The column entries of X_{t} share a common regression coefficient for all t. That is, the regression component of the response series (y_{t} = [y_{1t},y_{2t},...,y_{nt}]′) for period t is
$$\left[\begin{array}{c}X{(1,1)}_{t}{b}_{1}+\cdots +X{(1,r)}_{t}{b}_{r}\\ X{(2,1)}_{t}{b}_{1}+\cdots +X{(2,r)}_{t}{b}_{r}\\ \vdots \\ X{(n,1)}_{t}{b}_{1}+\cdots +X{(n,r)}_{t}{b}_{r}\end{array}\right].$$
a is an n-by-1 vector of intercepts corresponding to the n response series.
Overview. For maximum flexibility, construct a design matrix that linearly associates the exogenous series to each response series. It helps to think of the design matrix as a vector of T smaller, block design matrices. The rows of block design matrix t correspond to observation t of the response series, and the columns correspond to the regression coefficients of the regression variables.
vgxvarx estimates the regression component of multivariate time series models using the Statistics Toolbox™ function mvregress. Therefore, you must pass the design matrix as a T-by-1 cell vector, where cell t is the n-by-r numeric, block, design matrix at period t, n is the number of response series, and r is the number of regression variables in the design. That is, the structure of the entire design matrix is
$$\begin{array}{c}\text{Regressionvariables}{X}_{t}\\ \left\{\begin{array}{c}\left[\begin{array}{ccc}X{(1,1)}_{1}& \cdots & X{(1,r)}_{1}\\ \vdots & \ddots & \vdots \\ X{(n,1)}_{1}& \cdots & X{(n,r)}_{1}\end{array}\right]\\ \left[\begin{array}{ccc}X{(1,1)}_{2}& \cdots & X{(1,r)}_{2}\\ \vdots & \ddots & \vdots \\ X{(n,1)}_{2}& \cdots & X{(n,r)}_{2}\end{array}\right]\\ \vdots \\ \left[\begin{array}{ccc}X{(1,1)}_{T}& \cdots & X{(1,r)}_{T}\\ \vdots & \ddots & \vdots \\ X{(n,1)}_{T}& \cdots & X{(n,r)}_{T}\end{array}\right]\end{array}\right\}\end{array}$$
At each time t, the n-by-r matrix X_{t} multiplies the r-by-1 vector b, yielding an n-by-1 vector of linear combinations. This setup implies suggests that:
The number of regression variables might differ from the number of exogenous series. That is, you can associate different sets of exogenous series among response series.
Each block design matrix in the cell vector must have the same dimensionality. That is, the multivariate time series framework does not accommodate time-varying models. The state-space framework does accommodate time-varying, multivariate time series models. For details, see ssm.
vgxinfer, vgxpred, vgxproc, and vgxsim accommodate multiple response paths. You can associate a common design matrix for all response paths by passing in a cell vector of design matrices. You can also associate a different design matrix to each response path by passing in a T-by-M cell matrix of design matrices, where M is the number of response paths and cell (t,m) is an n-by-r numeric, design matrix at period t (denoted X_{t}^{(m)}). That is, the structure of the entire design matrix for all paths is
$$\begin{array}{c}\begin{array}{ccc}\text{Path}1\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}& \cdots \text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}& \text{Path}M\end{array}\\ \left\{\begin{array}{ccc}\left[\begin{array}{ccc}X{(1,1)}_{1}^{(1)}& \cdots & X{(1,r)}_{1}^{(1)}\\ \vdots & \ddots & \vdots \\ X{(n,1)}_{1}^{(1)}& \cdots & X{(n,r)}_{1}^{(1)}\end{array}\right],& \cdots & ,\left[\begin{array}{ccc}X{(1,1)}_{1}^{(M)}& \cdots & X{(1,r)}_{1}^{(M)}\\ \vdots & \ddots & \vdots \\ X{(n,1)}_{1}^{(M)}& \cdots & X{(n,r)}_{1}^{(M)}\end{array}\right]\\ \left[\begin{array}{ccc}X{(1,1)}_{2}^{(1)}& \cdots & X{(1,r)}_{2}^{(1)}\\ \vdots & \ddots & \vdots \\ X{(n,1)}_{2}^{(1)}& \cdots & X{(n,r)}_{2}^{(1)}\end{array}\right],& \cdots & ,\left[\begin{array}{ccc}X{(1,1)}_{2}^{(M)}& \cdots & X{(1,r)}_{2}^{(M)}\\ \vdots & \ddots & \vdots \\ X{(n,1)}_{2}^{(M)}& \cdots & X{(n,r)}_{2}^{(M)}\end{array}\right]\\ \vdots & \ddots & \vdots \\ \left[\begin{array}{ccc}X{(1,1)}_{T}^{(1)}& \cdots & X{(1,r)}_{T}^{(1)}\\ \vdots & \ddots & \vdots \\ X{(n,1)}_{T}^{(1)}& \cdots & X{(n,r)}_{T}^{(1)}\end{array}\right],& \cdots & ,\left[\begin{array}{ccc}X{(1,1)}_{T}^{(M)}& \cdots & X{(1,r)}_{T}^{(M)}\\ \vdots & \ddots & \vdots \\ X{(n,1)}_{T}^{(M)}& \cdots & X{(n,r)}_{T}^{(M)}\end{array}\right]\end{array}\right\}\end{array}.$$
For more details on how to structure design matrices for mvregress, see Set Up Multivariate Regression Problems.
Examples of Design Matrix Structures.
Intervention model — Suppose a tariff is imposed over some time period. You suspect that this tariff affects GNP and three other economic time series. To determine the effects of the tariff, use an intervention model, where the response series are the four econometric series, and the exogenous, regression variables are indicator variables representing the presence of the tariff in the system. Here are two ways of including the exogenous tariffs.
Responses share a regression coefficient — Each block design matrix (or cell) consists of either ones(4,1) or zeros(4,1), where a 1 indicates that the tariff is in the system, and 0 otherwise. That is, at period t, a cell of the entire design matrix contains one of
$$\left[\begin{array}{c}1\\ 1\\ 1\\ 1\end{array}\right]\text{or}\left[\begin{array}{c}0\\ 0\\ 0\\ 0\end{array}\right].$$
Responses do not share regression coefficients — Each block matrix (or cell) consists of either eye(4) or zeros(4). That is, at period t, a cell of the entire design matrix contains one of
$$\left[\begin{array}{cccc}1& 0& 0& 0\\ 0& 1& 0& 0\\ 0& 0& 1& 0\\ 0& 0& 0& 1\end{array}\right]\text{or}\left[\begin{array}{cccc}0& 0& 0& 0\\ 0& 0& 0& 0\\ 0& 0& 0& 0\\ 0& 0& 0& 0\end{array}\right].$$
In this case, the sole exogenous, indicator variable expands to four regression variables. The advantage of the larger (replicated) formulation is that it allows for vgxvarx to estimate the influence of the tariff on each response series separately. The resulting estimated regression coefficient vector $$\widehat{b}$$ can have differing values for each component. The different values reflect the different direct influences of the tariff on each time series.
Once you have the entire design matrix (denoted Design), you must put each block design matrix that composes Design into the respective cells of a T-by-1 cell vector (or cell matrix for multiple paths). To do this, use mat2cell. Specify to break up Design into T, 4-by-size(Design,2) block design matrices using
DesignCell = mat2cell(Design,4*ones(T,1),size(Design,2))
DesignCell is the properly structured regression variables that you can now pass into vgxvarx to estimate the model parameters.
SUR that associates all exogenous series to each response series — If the columns of a X are exogenous series, then, to associate all exogenous series to each response,
Create the entire design matrix by expanding X using its Kronecker product with the n-by-n identity matrix, e.g., if there are four responses, then the entire design matrix is
Design = kron(X,eye(4));
Put each block design matrix into the cells of a T-by-1 cell vector using mat2cell. Each block matrix has four rows and size(Design,2) columns.
Linear trend — You can model linear trends in your data by including the exogenous matrix eye(n)*t in cell t of the entire design matrix.
Before you estimate a multivariate, time series, regression model using vgxvarx, specify the number of regression variables in the created model. (For details on specifying a multivariate time series model using vgxset, see Model Specification Structures). Recall from Design Matrix Structure for Including Exogenous Data that the number of regression variables in the model is the number of columns in each design matrix denoted r. You can indicate the number of regression variables several ways:
For a new model,
Specify the nX name-value pair argument as the number of regression variables when you create the model using vgxset.
Specify the bsolve name-value pair argument as the logical vector true(r,1)
vgxset creates a multivariate time series model object, and fills in the appropriate properties. In what follows, Mdl denotes a created multivariate time aeries model in the Workspace.
For a model in the Workspace, set either of the nX or bsolve properties to r or true(r,1), respectively, using dot notation, e.g., Mdl.nX = r.
You can also exclude a subset of regression coefficient from being estimated. For example, to exclude the first regression coefficient, set 'bsolve',[false(1);true(r-1,1)]. Be aware that if the model is new (i.e, Mdl.b = []), then the software sets any coefficient it doesn't estimate to 0. To fix a coefficient to a value:
Enter
Mdl.b = ones(r,1);
Specify values for the elements you want to hold fixed during estimation in the b property. For example, to specify that the first regression coefficient should be held at 2 during estimation, enter
Mdl.b(1) = 2;
Enter
Mdl.bsolve = [false(1);true(r-1,1)];
The software does not estimate regression intercepts (a) by default. To include a different regression intercept for each response series, specify 'Constant',true when you create the model using vgxset, or set the Constant property of a model in the Workspace to true using dot notation. Alternatively, you can specify 'asolve',true(n,1) or set the asolve property to true(n,1). To exclude a regression intercept from estimation, follow the same steps as for excluding a regression coefficient.
To estimate the regression coefficients, pass the model, response data, and the cell vector of design matrices (see Design Matrix Structure for Including Exogenous Data) to vgxvarx. For details on how vgxvarx works when it estimates regression coefficients, see How vgxvarx Works.
Be aware that the presence of exogenous series in a multivariate time series model might destabilized the fitted model.
This example shows how to prepare exogenous data for several seemingly unrelated regression (SUR) analyses. The response and exogenous series are random paths from a standard Gaussian distribution.
In seemingly unrelated regression (SUR), each response variable is a function of a subset of the exogenous series, but not of any endogenous variable. That is, for and , the model for response at period is
The indices of the regression coefficients and exogenous predictors indicate that:
You can associate each response with a different subset of exogenous predictors.
The response series might not share intercepts or regression coefficients.
SUR accommodates intra-period innovation heteroscedasticity and correlation, but inter-period innovation independence and homoscedasticity, i.e.,
Simulate Data from the True Model
Suppose that the true model is
where , are multivariate Gaussian random variables each having mean zero and jointly having covariance matrix
Suppose that the paths represent different econometric measurements, e.g. stock returns.
Simulate four exogenous predictor paths from the standard Gaussian distribution.
rng(1); % For reproducibility n = 3; % Number of response series nExo = 4; % Number of exogenous series T = 100; X = randn(100,nExo);
The multivariate time series analysis functions of Econometrics Toolbox™ require you to input the exogenous data in a T-by-1 cell vector. Cell of the cell vector is a design matrix indicating the linear relationship of the exogenous variables with each response series at period . Specifically, each design matrix in the cell array:
Has rows, each corresponding to a response series.
Has columns since, in this example, all exogenous variables are in the regression component of each response series.
To create the cell vector of design matrices for this case, first expand the exogenous predictor data by finding its Kronecker product with the -by- identity matrix.
ExpandX1 = kron(X,eye(n));
r1 = size(ExpandX1,2); % Number of regression variables
ExpandX1 is an -by- numeric matrix formed by multiplying each element of X to the -by- identity matrix, and then putting the product in the corresponding position of a -by-1 block matrix of -by- matrices.
Create the cell vector of design matrices by putting each consecutive -by- block matrices of ExpandX1 into the cells of a -by-1 cell vector. Verify that one of the cells contains the expected design matrix (e.g. the third cell)).
CellX1 = mat2cell(ExpandX1,n*ones(T,1)); CellX1{3} X(3,:)
ans = Columns 1 through 7 -0.7585 0 0 1.9302 0 0 1.8562 0 -0.7585 0 0 1.9302 0 0 0 0 -0.7585 0 0 1.9302 0 Columns 8 through 12 0 0 1.3411 0 0 1.8562 0 0 1.3411 0 0 1.8562 0 0 1.3411 ans = -0.7585 1.9302 1.8562 1.3411
In period 3, all observed predictors are associated with each response series.
Create a multivariate time series model object that characterizes the true model using vgxset.
aTrue = [1; -1; 0.5]; bTrue = [2; 4; -2; -1.5; 2.5; 0.5; 0.5; -1.75; -1.5; 0.75; -0.05; 0.7]; InnovCov = [1 0.5 -0.05; 0.5 1 0.25; -0.05 0.25 1]; TrueMdl = vgxset('n',n,'b',bTrue,'a',aTrue,'Q',InnovCov) Y = vgxsim(TrueMdl,100,CellX1);
TrueMdl = Model: 3-D VARMAX(0,0,12) with Additive Constant n: 3 nAR: 0 nMA: 0 nX: 12 a: [1 -1 0.5] additive constants b: [12x1] regression coefficients Q: [3x3] covariance matrix
SUR Using All Predictors for Each Response Series
Create a multivariate time series model suitable for SUR using vgxset. You must specify the number of response series ('n'), the number of regression variables ('nX'), and whether to include different regression intercepts for each response series ('Constant').
Mdl1 = vgxset('n',n,'nX',r1,'Constant',true)
Mdl1 = Model: 3-D VARMAX(0,0,12) with Additive Constant n: 3 nAR: 0 nMA: 0 nX: 12 a: [] b: [] Q: []
Mdl1 is a multivariate time series model object. Unlike TrueMdl, none of the coefficients, intercepts, and intra-period covariance matrix have values. Therefore, Mdl1 is suitable for estimation.
Estimate the regression coefficients using vgxvarx. Extract the residuals. Display the estimated model using vgxdisp
[EstMdl1,~,~,W] = vgxvarx(Mdl1,Y,CellX1); vgxdisp(EstMdl1)
Model : 3-D VARMAX(0,0,12) with Additive Constant a Constant: 0.978981 -1.06438 0.453232 b Regression Parameter: 1.76856 3.85757 -2.20089 -1.55085 2.44067 0.464144 0.69588 -1.71386 -1.6414 0.670357 -0.0564374 0.565809 Q Innovations Covariance: 1.38503 0.667301 -0.159136 0.667301 0.973123 0.216492 -0.159136 0.216492 0.993384
EstMdl is a multivariate time series model containing the estimated parameters. W is a -by- matrix of residuals. By default, vgxvarx models a full, intra-period innovations covariance matrix.
Alternatively, and in this case, you can use the backslash operator on X and Y. However, you must include a column of ones in X for the intercepts.
coeff = [ones(T,1) X]\Y
coeff = 0.9790 -1.0644 0.4532 1.7686 3.8576 -2.2009 -1.5508 2.4407 0.4641 0.6959 -1.7139 -1.6414 0.6704 -0.0564 0.5658
coeff is a nExo + 1-by- n matrix of estimated regression coefficients and intercepts. The estimated intercepts are in the first row, and the rest of the matrix contains the estimated regression coefficients
Compare all estimates to their true values.
fprintf('\n'); fprintf(' Intercepts \n'); fprintf(' True | vgxvarx | backslash\n'); fprintf('--------------------------------------\n'); for j = 1:n fprintf(' %8.4f | %8.4f | %8.4f\n',aTrue(j),EstMdl1.a(j),coeff(1,j)); end cB = coeff'; cB = cB(:); fprintf('\n'); fprintf(' Coefficients \n'); fprintf(' True | vgxvarx | backslash\n'); fprintf('--------------------------------------\n'); for j = 1:r1 fprintf(' %8.4f | %8.4f | %8.4f\n',bTrue(j),... EstMdl1.b(j),cB(n + j)); end fprintf('\n'); fprintf(' Innovations Covariance\n'); fprintf(' True | vgxvarx\n'); fprintf('----------------------------------------------------------\n'); for j = 1:n fprintf('%8.4f %8.4f %8.4f | %8.4f %8.4f %8.4f\n',... InnovCov(j,:),EstMdl1.Q(j,:)); end
Intercepts True | vgxvarx | backslash -------------------------------------- 1.0000 | 0.9790 | 0.9790 -1.0000 | -1.0644 | -1.0644 0.5000 | 0.4532 | 0.4532 Coefficients True | vgxvarx | backslash -------------------------------------- 2.0000 | 1.7686 | 1.7686 4.0000 | 3.8576 | 3.8576 -2.0000 | -2.2009 | -2.2009 -1.5000 | -1.5508 | -1.5508 2.5000 | 2.4407 | 2.4407 0.5000 | 0.4641 | 0.4641 0.5000 | 0.6959 | 0.6959 -1.7500 | -1.7139 | -1.7139 -1.5000 | -1.6414 | -1.6414 0.7500 | 0.6704 | 0.6704 -0.0500 | -0.0564 | -0.0564 0.7000 | 0.5658 | 0.5658 Innovations Covariance True | vgxvarx ---------------------------------------------------------- 1.0000 0.5000 -0.0500 | 1.3850 0.6673 -0.1591 0.5000 1.0000 0.2500 | 0.6673 0.9731 0.2165 -0.0500 0.2500 1.0000 | -0.1591 0.2165 0.9934
The estimates from implementing vgxvarx and the backslash operator are the same, and are fairly close to their corresponding true values.
One way to check the relationship strength between the predictors and responses is to compute the coefficient of determination (i.e., the fraction of variation explained by the predictors), which is
where is the estimated variance of residual series , and is the estimated variance of response series .
R2 = 1 - sum(diag(cov(W)))/sum(diag(cov(Y)))
R2 = 0.9118
The SUR model and predictor data explain approximately 91% of the variation in the response data.
SUR Using a Unique Predictor for Each Response Series
For each period , create block design matrices such that response series is linearly associated to predictor series , . Put the block design matrices in cells of a -by-1 cell vector in chronological order.
CellX2 = cell(T,1); for j = 1:T CellX2{j} = diag(X(j,1:n)); end r2 = size(CellX2{1},2);
Create a multivariate time series model by using vgxset and specifying the number of response series, the number of regression variables, and whether to include different regression intercepts for each response series.
Mdl2 = vgxset('n',n,'nX',r2,'Constant',true);
Estimate the regression coefficients using vgxvarx. Display the estimated parameters. Compute the coefficient of determination.
[EstMdl2,~,~,W2] = vgxvarx(Mdl2,Y,CellX2); vgxdisp(EstMdl2) R2 = 1 - sum(diag(cov(W2)))/sum(diag(cov(Y)))
Model : 3-D VARMAX(0,0,3) with Additive Constant a Constant: 1.07752 -1.43445 0.674376 b Regression Parameter: 1.01491 3.83837 -2.71834 Q Innovations Covariance: 4.96205 4.91571 -1.86546 4.91571 20.8263 -11.0945 -1.86546 -11.0945 7.75392 R2 = 0.1177
The model and predictors explain approximately 12% of the variation in the response series. This should not be surprising since the model is not the same as the response-generating process.
SUR Using a Shared Intercept for All Response Series
Create block design matrices such that each response series is linearly associated to all predictor series . Prepend the resulting design matrix with a vector of ones representing the common intercept.
ExpandX3 = [ones(T*n,1) kron(X,eye(n))]; r3 = size(ExpandX3,2);
Put the block design matrices into the cells of a -by-1 cell vector in chronological order.
CellX3 = mat2cell(ExpandX3,n*ones(T,1));
Create a multivariate time series model by using vgxset and specifying the number of response series and the number of regression variables. By default, vgxset excludes regression intercepts.
Mdl3 = vgxset('n',n,'nX',r3);
Estimate the regression coefficients using vgxvarx. Display the estimated parameters. Compute the coefficient of determination.
[EstMdl3,~,~,W3] = vgxvarx(Mdl3,Y,CellX3); vgxdisp(EstMdl3) a = EstMdl3.b(1) R2 = 1 - sum(diag(cov(W3)))/sum(diag(cov(Y)))
Model : 3-D VARMAX(0,0,13) with No Additive Constant b Regression Parameter: 0.388833 1.73468 3.94099 -2.20458 -1.56878 2.48483 0.462187 0.802394 -1.97614 -1.62978 0.63972 0.0190058 0.562466 Q Innovations Covariance: 1.72265 -0.164059 -0.122294 -0.164059 3.02031 0.12577 -0.122294 0.12577 0.997404 a = 0.3888 R2 = 0.9099
The shared, estimated regression intercept is 0.389, and the other coefficients are similar to the first SUR implementation. The model and predictors explain approximately 91% of the variation in the response series. This should not be surprising since the model almost the same as the response-generating process.
This example shows how to implement the capital asset pricing model (CAPM) using the Econometric Toolbox™ multivariate time series framework.
The CAPM model characterizes comovements between asset and market prices. Under this framework, individual asset returns are linearly associated with the return of the whole market (for details, see [1], [2], and [3]). That is, given the return series of all stocks in a market ( ) and the return of a riskless asset ( ), the CAPM model for return series ( ) is
for all assets in the market.
is an -by-1 vector of asset alphas that should be zero, and it is of interest to investigate assets whose asset alphas are significantly away from zero. is a -by-1 vector of asset betas that specify the degree of comovement between the asset being modeled and the market. An interpretation of element of is
If , then asset moves in the same direction and with the same volatility as the market, i.e., is positively correlated with the market .
If , then asset moves in the opposite direction, but with the same volatility as the market, i.e., is negatively correlated with the market.
If , then asset is uncorrelated with the market.
In general:
determines the direction that the asset is moving relative to the market as described in the previous bullets.
is the factor that determines how much more or less volatile asset is relative to the market. For example, if , then asset is 10 times as volatile as the market.
Load the CAPM data set included in the Financial Toolbox™.
load CAPMuniverse
varWithNaNs = Assets(any(isnan(Data),1))
dateRange = datestr([Dates(1) Dates(end)])
varWithNaNs = 'AMZN' 'GOOG' dateRange = 03-Jan-2000 07-Nov-2005
The variable Data is a 1471-by-14 numeric matrix containing the daily returns of a set of 12 stocks (columns 1 through 12), one rickless asset (column 13), and the return of the whole market (column 14). The returns were measured from 03Jan2000 through 07Nov2005. AMZN and GOOG had their IPO during sampling, and so they have missing values.
Assign variables for the response and predictor series.
Y = bsxfun(@minus,Data(:,1:12),Data(:,14)); X = Data(:,13) - Data(:,14); [T,n] = size(Y)
T = 1471 n = 12
Y is a 1471-by-12 matrix of the returns adjusted by the riskless return. X is a 1471-by-1 vector of the market return adjusted by the riskless return.
Create block design matrices for each period, and put them into cells of a -by-1 cell vector. You can specify that each response series has its own intercept when you create the multivariate time series model object. Therefore, do not consider the intercept when you create the block design matrices.
Design = kron(X,eye(n)); CellX = mat2cell(Design,n*ones(T,1)); nX = size(Design,2);
Create the Multivariate Time Series Model
Create a multivariate time series model that characterizes the CAPM model. You must specify the number of response series, whether to give each response series equation an intercept, and the number of regression variables.
Mdl = vgxset('n',n,'Constant',true,'nX',nX);
Mdl is a multivariate time series model object that characterizes the desired CAPM model.
Estimate the Multivariate Time Series Model
Pass the CAPM model specification (Mdl), the response series (Y), and the cell vector of block design matrices (CellX) to vgxvarx. Request to return the estimated multivariate time series model and the estimated coefficient standard errors. vgxvarx maximizes the likelihood using the expectation-conditional-maximization (ECM) algorithm. ECM accommodates missing response values directly (i.e., without imputation), but at the cost of computation time.
[EstMdl,EstCoeffSEMdl] = vgxvarx(Mdl,Y,CellX);
EstMdl and EstCoeffSEMdl have the same structure as Mdl, but EstMdl contains the parameter estimates and EstCoeffSEMdl contains the estimated standard errors of the parameter estimates. EstCoeffSEMdl:
Contains the biased maximum likelihood standard errors.
Does not include the estimated standard errors of the intra-period covariances. To include the standard errors of the intra-period covariances, specify the name-value pair 'StdErrType','all' in vgxvarx.
Display the regression estimates, their standard errors, and their t statistics. By default, the software estimates, stores, and displays standard errors from maximum likelihood. Specify to use the unbiased least squares standard errors.
dispMdl = vgxset(EstMdl,'Q',[]) % Suppress printing covariance estimates vgxdisp(dispMdl,EstCoeffSEMdl,'DoFAdj',true)
dispMdl = Model: 12-D VARMAX(0,0,12) with Additive Constant n: 12 nAR: 0 nMA: 0 nX: 12 a: [12x1] additive constants asolve: [12x1 logical] additive constant indicators b: [12x1] regression coefficients bsolve: [12x1 logical] regression coefficient indicators Q: [] Qsolve: [12x12 logical] covariance matrix indicators Model : 12-D VARMAX(0,0,12) with Additive Constant Standard errors with DoF adjustment (least-squares) Parameter Value Std. Error t-Statistic -------------- -------------- -------------- -------------- a(1) 0.00116454 0.000869904 1.3387 a(2) 0.000715822 0.00121752 0.587932 a(3) -0.000223753 0.000806185 -0.277546 a(4) -2.44513e-05 0.000689289 -0.0354732 a(5) 0.00140469 0.00101676 1.38153 a(6) 0.00412219 0.000910392 4.52793 a(7) 0.000116143 0.00068952 0.168441 a(8) -1.37697e-05 0.000456934 -0.0301351 a(9) 0.000110279 0.000710953 0.155114 a(10) -0.000244727 0.000521036 -0.469693 a(11) 3.2336e-05 0.000861501 0.0375346 a(12) 0.000128267 0.00103773 0.123603 b(1) 1.22939 0.0741875 16.5714 b(2) 1.36728 0.103833 13.1681 b(3) 1.5653 0.0687534 22.7669 b(4) 1.25942 0.0587843 21.4245 b(5) 1.34406 0.0867116 15.5003 b(6) 0.617321 0.0776404 7.95103 b(7) 1.37454 0.0588039 23.375 b(8) 1.08069 0.0389684 27.7326 b(9) 1.60024 0.0606318 26.3928 b(10) 1.1765 0.0444352 26.4767 b(11) 1.50103 0.0734709 20.4303 b(12) 1.65432 0.0885002 18.6928
To determine whether the parameters are significantly away from zero, suppose that a t statistic of 3 or more indicates significance.
Response series 6 has a significant asset alpha.
sigASymbol = Assets(6)
sigASymbol = 'GOOG'
As a result, GOOG has exploitable economic properties.
All asset betas are greater than 3. This indicates that all assets are significantly correlated with the market.
However, GOOG has an asset beta of approximately 0.62, whereas all other asset betas are greater than 1. This indicates that the magnitude of the volatility of GOOG is approximately 62% of the market volatility. The reason for this is that GOOG steadily and almost consistently appreciated in value while the market experienced volatile horizontal movements.
For more details and an alternative analysis, see Capital Asset Pricing Model with Missing Data.
This example shows how to estimate a multivariate time series model that contains lagged endogenous and exogenous variables, and how to simulate responses. The response series are the quarterly:
Changes in real gross domestic product (rGDP) rates ( )
Real money supply (rM1SL) rates ( )
Short-term interest rates (i.e., three-month treasury bill yield, )
from March, 1959 through March, 2009 . The exogenous series is the quarterly changes in the unemployment rates ( ).
Suppose that a model for the responses is this VARX(4,3) model
Simulate Out-Of-Sample Response Paths Using the Same Exogenous Data per Path
Simulate Out-Of-Sample Response Paths Using Random Exogenous Data
Load the U.S. macroeconomic data set. Flag the series and their periods that contain missing values (indicated by NaN values).
load Data_USEconModel varNaN = any(ismissing(DataTable),1); % Variables containing NaN values seriesWithNaNs = series(varNaN)
seriesWithNaNs = Columns 1 through 3 '(FEDFUNDS) Effec...' '(GS10) Ten-year ...' '(M1SL) M1 money ...' Columns 4 through 5 '(M2SL) M2 money ...' '(UNRATE) Unemplo...'
In this data set, the variables that contain missing values entered the sample later than the other variables. There are no missing values after sampling started for a particular variable.
vgxvarx accommodates missing values for responses, but not for regression variables. Flag all periods corresponding to a missing regression variable value.
idx = ~isnan(DataTable.UNRATE);
For the rest of the example, consider only those values that of the series indicated by a true in idx.
Compute rGDP and rM1SL, and the growth rates of rGDP, rM1SL, short-term interest rates, and the unemployment rate. Description contains a description of the data and the variable names. Reserve the last three years of data to investigate the out-of-sample performance of the estimated model.
rGDP = DataTable.GDP(idx)./(DataTable.GDPDEF(idx)/100); rM1SL = DataTable.M1SL(idx)./(DataTable.GDPDEF(idx)/100); dLRGDP = diff(log(rGDP)); % rGDP growth rate dLRM1SL = diff(log(rM1SL)); % rM1SL growth rate d3MTB = diff(DataTable.TB3MS(idx)); % Change in short-term interest rate (3MTB) dUNRATE = diff(DataTable.UNRATE(idx)); % Change in unemployment rate T = numel(d3MTB); % Total sample size oosT = 12; % Out-of-sample size estT = T - oosT; % Estimation sample size estIdx = 1:estT; % Estimation sample indices oosIdx = (T - 11):T; % Out-of-sample indices dates = dates((end - T + 1):end); EstY = [dLRGDP(estIdx) dLRM1SL(estIdx) d3MTB(estIdx)]; % In-sample responses estX = dUNRATE(estIdx); % In-sample exogenous data n = size(EstY,2); OOSY = [dLRGDP(oosIdx) dLRM1SL(oosIdx) d3MTB(oosIdx)]; % Out-of-sample responses oosX = dUNRATE(oosIdx); % Out-of-sample exogenous data
Create an estT-by-1 cell vector of block design matrices that associate the predictor series with each response such that the responses do not share a coefficient.
EstExpandX = kron(estX,eye(n)); EstCellX = mat2cell(EstExpandX,n*ones(estT,1)); nX = size(EstExpandX,2);
Specify a multivariate time series model object that characterizes the VARX(4,3) model using vgxset.
Mdl = vgxset('n',n,'nAR',4,'nX',nX,'Constant',true);
Estimate the parameters of the VARX(4,3) model using vgxvarx. Display the parameter estimates.
EstMdl = vgxvarx(Mdl,EstY,EstCellX); vgxdisp(EstMdl)
Model : 3-D VARMAX(4,0,3) with Additive Constant Conditional mean is AR-stable and is MA-invertible a Constant: 0.00811792 0.000709263 0.0465824 b Regression Parameter: -0.0162116 -0.00163933 -1.50115 AR(1) Autoregression Matrix: -0.0375772 -0.0133236 0.00108218 -0.00519697 0.177963 -0.00501432 -0.873992 -6.89049 -0.165888 AR(2) Autoregression Matrix: 0.0753033 0.0775643 -0.001049 0.00282857 0.29064 -0.00159574 4.00724 0.465046 -0.221024 AR(3) Autoregression Matrix: -0.0927688 -0.0240239 -0.000549057 0.0627837 0.0686179 -0.00212185 -7.52241 10.247 0.227121 AR(4) Autoregression Matrix: 0.0646951 -0.0792765 -0.000176166 0.0276958 0.00922231 -0.000183861 1.38523 -11.8774 0.0518154 Q Innovations Covariance: 3.57524e-05 7.05807e-06 -4.23542e-06 7.05807e-06 9.67992e-05 -0.00188786 -4.23542e-06 -0.00188786 0.777151
EstMdl is a multivariate time series model object containing the estimated parameters.
Simulate Out-Of-Sample Response Paths Using the Same Exogenous Data per Path
Simulate 1000, 3 year response series paths from the estimated model assuming that the exogenous unemployment rate is a fixed series. Since the model contains 4 lags per endogenous variable, specify the last 4 observations in the estimation sample as presample data.
OOSExpandX = kron(oosX,eye(n));
OOSCellX = mat2cell(OOSExpandX,n*ones(oosT,1));
numPaths = 1000;
Y0 = EstY((end-3):end,:);
rng(1); % For reproducibility
YSim = vgxsim(EstMdl,oosT,OOSCellX,Y0,[],numPaths);
YSim is a 12-by-3-by-1000 numeric array of simulated responses. The rows of YSim correspond to out-of-sample periods, the columns correspond to the response series, and the leaves correspond to paths.
Plot the response data and the simulated responses. Identify the 5%, 25%, 75% and 95% percentiles, and the mean and median of the simulated series at each out-of-sample period.
YSimBar = mean(YSim,3); YSimQrtl = quantile(YSim,[0.05 0.25 0.5 0.75 0.95],3); RepDates = repmat(dates(oosIdx),1,1000); respNames = {'dLRGDP' 'dLRM1SL' 'd3MTB'}; figure; for j = 1:n; subplot(3,1,j); h1 = plot(dates(oosIdx),squeeze(YSim(:,j,:)),'Color',0.75*ones(3,1)); hold on; h2 = plot(dates(oosIdx),YSimBar(:,j),'.-k','LineWidth',2); h3 = plot(dates(oosIdx),squeeze(YSimQrtl(:,j,:)),':r','LineWidth',1.5); h4 = plot(dates((end - 30):end),[EstY((end - 18):end,j);OOSY(:,j)],... 'b','LineWidth',2); title(sprintf('%s',respNames{j})); datetick; axis tight; hold off; end legend([h1(1) h2(1) h3(1) h4],{'Simulated Series','Simulation Mean',... 'Simulation Quartile','Data'},'Location',[0.4 0.1 0.01 0.01],... 'FontSize',8);
Simulate Out-Of-Sample Response Paths Using Random Exogenous Data
Suppose that the change in the unemployment rate is an AR(4) model, and fit the model to the estimation sample data.
MdlUNRATE = arima('ARLags',1:4); EstMdlUNRATE = estimate(MdlUNRATE,estX,'Display','off');
EstMdlUNRATE is an arima model object containing the parameter estimates.
Simulate 1000, 3 year paths from the estimated AR(4) model for the change in unemployment rate. Since the model contains 4 lags, specify the last 4 observations in the estimation sample as presample data.
XSim = simulate(EstMdlUNRATE,oosT,'Y0',estX(end-3:end),... 'NumPaths',numPaths);
XSim is a 12-by-1000 numeric matrix of simulated exogenous paths. The rows correspond to periods and the columns correspond to paths.
Create a cell matrix of block design matrices to organize the exogenous data, where each column corresponds to a path.
ExpandXSim = kron(XSim,eye(n)); size(ExpandXSim) CellXSim = mat2cell(ExpandXSim,n*ones(oosT,1),n*ones(1,numPaths)); size(CellXSim) CellXSim{1,1}
ans = 36 3000 ans = 12 1000 ans = 0.7901 0 0 0 0.7901 0 0 0 0.7901
ExpandXSim is a 36-by-3000 numeric matrix, and CellXSim is a 12-by-1000 cell matrix of mutually exclusive, neighboring, 3-by-3 block matrices in ExpandXSim.
Simulate 3 years of 1000 future response series paths from the estimated model using the simulated exogenous data. Since the model contains 4 lags per endogenous variable, specify the last 4 observations in the estimation sample as presample data.
YSimRX = vgxsim(EstMdl,oosT,CellXSim,Y0,[],numPaths);
YSimRX is a 12-by-3-by-1000 numeric array of simulated responses.
Plot the response data and the simulated responses. Identify the 5%, 25%, 75% and 95% percentiles, and the mean and median of the simulated series at each out-of-sample period.
YSimBarRX = mean(YSimRX,3); YSimQrtlRX = quantile(YSimRX,[0.05 0.25 0.5 0.75 0.95],3); figure; for j = 1:n; subplot(3,1,j); h1 = plot(dates(oosIdx),squeeze(YSimRX(:,j,:)),'Color',0.75*ones(3,1)); hold on; h2 = plot(dates(oosIdx),YSimBarRX(:,j),'.-k','LineWidth',2); h3 = plot(dates(oosIdx),squeeze(YSimQrtlRX(:,j,:)),':r','LineWidth',1.5); h4 = plot(dates((end - 30):end),[EstY((end - 18):end,j);OOSY(:,j)],... 'b','LineWidth',2); title(sprintf('%s with Simulated Unemployment Rate',respNames{j})); datetick; axis tight; hold off; end legend([h1(1) h2(1) h3(1) h4],{'Simulated Series','Simulation Mean',... 'Simulation Quartile','Data'},'Location',[0.4 0.1 0.01 0.01],... 'FontSize',8)
This section contains an example of the workflow described in Building VAR Models. The example uses three time series: GDP, M1 money supply, and the 3-month T-bill rate. The example shows:
Loading and transforming the data for stationarity
Partitioning the transformed data into presample, estimation, and forecast intervals to support a backtesting experiment
Making several models
Fitting the models to the data
Deciding which of the models is best
Making forecasts based on the best model
Loading Data. The file Data_USEconModel ships with Econometrics Toolbox software. The file contains time series from the St. Louis Federal Reserve Economics Database. This example uses three of the time series: GDP, M1 money supply (M1SL), and 3-month T-bill rate (TB3MS). Load the data into a time series matrix Y as follows:
load Data_USEconModel
gdp = DataTable.GDP;
m1 = DataTable.M1SL;
tb3 = DataTable.TB3MS;
Y = [gdp,m1,tb3];
Transforming Data for Stationarity. Plot the data to look for trends:
figure subplot(3,1,1) plot(dates,Y(:,1),'r'); title('GDP') datetick('x') grid on subplot(3,1,2); plot(dates,Y(:,2),'b'); title('M1') datetick('x') grid on subplot(3,1,3); plot(dates, Y(:,3), 'k') title('3-mo T-bill') datetick('x') grid on hold off
Not surprisingly, both the GDP and M1 data appear to grow, while the T-bill returns show no long-term growth. To counter the trends in GDP and M1, take a difference of the logarithms of the data. Taking a difference shortens the time series, as described in Transforming Data for Stationarity. Therefore, truncate the T-bill series and date series X so that the Y data matrix has the same number of rows for each column:
Y = [diff(log(Y(:,1:2))), Y(2:end,3)]; % Transformed data X = dates(2:end); figure subplot(3,1,1) plot(X,Y(:,1),'r'); title('GDP') datetick('x') grid on subplot(3,1,2); plot(X,Y(:,2),'b'); title('M1') datetick('x') grid on subplot(3,1,3); plot(X, Y(:,3),'k'), title('3-mo T-bill') datetick('x') grid on
You see that the scale of the first two columns is about 100 times smaller than the third. Multiply the first two columns by 100 so that the time series are all roughly on the same scale. This scaling makes it easy to plot all the series on the same plot. More importantly, this type of scaling makes optimizations more numerically stable (for example, maximizing loglikelihoods).
Y(:,1:2) = 100*Y(:,1:2); figure plot(X,Y(:,1),'r'); hold on plot(X,Y(:,2),'b'); datetick('x') grid on plot(X,Y(:,3),'k'); legend('GDP','M1','3-mo T-bill'); hold off
Selecting Models. You can choose many different models for the data. This example rather arbitrarily chooses four models:
VAR(2) with diagonal autoregressive and covariance matrices
VAR(2) with full autoregressive and covariance matrices
VAR(4) with diagonal autoregressive and covariance matrices
VAR(4) with full autoregressive and covariance matrices
Make the series the same length, and transform them to be stationary and on a similar scale.
dGDP = 100*diff(log(gdp(49:end))); dM1 = 100*diff(log(m1(49:end))); dT3 = diff(tb3(49:end)); Y = [dGDP dM1 dT3];
Create the four models as follows:
dt = logical(eye(3)); VAR2diag = vgxset('ARsolve',repmat({dt},2,1),... 'asolve',true(3,1),'Series',{'GDP','M1','3-mo T-bill'}); VAR2full = vgxset(VAR2diag,'ARsolve',[]); VAR4diag = vgxset(VAR2diag,'nAR',4,'ARsolve',repmat({dt},4,1)); VAR4full = vgxset(VAR2full,'nAR',4);
The matrix dt is a diagonal logical matrix. dt specifies that both the autoregressive matrices for VAR2diag and VAR4diag are diagonal. In contrast, the specifications for VAR2full and VAR4full have empty matrices instead of dt. Therefore, vgxvarx fits the defaults, which are full matrices for autoregressive and correlation matrices.
Choosing Presample, Estimation, and Forecast Periods. To assess the quality of the models, divide the response data Y into three periods: presample, estimation, and forecast. Fit the models to the estimation data, using the presample period to provide lagged data. Compare the predictions of the fitted models to the forecast data. The estimation period is in-sample, and the forecast period is out-of-sample (also known as backtesting).
For the two VAR(4) models, the presample period is the first four rows of Y. Use the same presample period for the VAR(2) models so that all the models are fit to the same data. This is necessary for model fit comparisons. For both models, the forecast period is the final 10% of the rows of Y. The estimation period for the models goes from row 5 to the 90% row. The code for defining these data periods follows:
Ypre = Y(1:4,:); T = ceil(.9*size(Y,1)); Yest = Y(5:T,:); YF = Y((T+1):end,:); TF = size(YF,1);
Fitting with vgxvarx. Now that the models and time series exist, you can easily fit the models to the data:
[EstSpec1,EstStdErrors1,LLF1,W1] = ... vgxvarx(VAR2diag,Yest,[],Ypre,'CovarType','Diagonal'); [EstSpec2,EstStdErrors2,LLF2,W2] = ... vgxvarx(VAR2full,Yest,[],Ypre); [EstSpec3,EstStdErrors3,LLF3,W3] = ... vgxvarx(VAR4diag,Yest,[],Ypre,'CovarType','Diagonal'); [EstSpec4,EstStdErrors4,LLF4,W4] = ... vgxvarx(VAR4full,Yest,[],Ypre);
The EstSpec structures are the fitted models.
The EstStdErrors structures contain the standard errors of the fitted models.
The LLF are the loglikelihoods of the fitted models. Use these to help select the best model, as described in Checking Model Adequacy.
The W are the estimated innovations (residuals) processes, the same size as Yest.
The specification for EstSpec1 and EstSpec3 includes diagonal covariance matrices.
Checking Stability. You can check whether the estimated models are stable and invertible with the vgxqual function. (There are no MA terms in these models, so the models are necessarily invertible.) The test shows that all the estimated models are stable:
[isStable1,isInvertible1] = vgxqual(EstSpec1); [isStable2,isInvertible2] = vgxqual(EstSpec2); [isStable3,isInvertible3] = vgxqual(EstSpec3); [isStable4,isInvertible4] = vgxqual(EstSpec4); [isStable1,isStable2,isStable3,isStable4]
ans = 1 1 1 1
You can also look at the estimated specification structures. Each contains a line stating whether the model is stable:
EstSpec4
EstSpec4 = Model: 3-D VAR(4) with Additive Constant Series: {'GDP' 'M1' '3-mo T-bill'} n: 3 nAR: 4 nMA: 0 nX: 0 a: [0.524224 0.106746 -0.671714] additive constants asolve: [1 1 1 logical] additive constant indicators AR: {4x1 cell} stable autoregressive process ARsolve: {4x1 cell of logicals} autoregressive lag indicators Q: [3x3] covariance matrix Qsolve: [3x3 logical] covariance matrix indicators
Likelihood Ratio Tests. You can compare the restricted (diagonal) AR models to their unrestricted (full) counterparts using the lratiotest function. The test rejects or fails to reject the hypothesis that the restricted models are adequate, with a default 5% tolerance. This is an in-sample test. To perform the test:
Count the parameters in the models using the vgxcount function:
[n1,n1p] = vgxcount(EstSpec1); [n2,n2p] = vgxcount(EstSpec2); [n3,n3p] = vgxcount(EstSpec3); [n4,n4p] = vgxcount(EstSpec4);
Compute the likelihood ratio tests:
reject1 = lratiotest(LLF2,LLF1,n2p - n1p) reject3 = lratiotest(LLF4,LLF3,n4p - n3p) reject4 = lratiotest(LLF4,LLF2,n4p - n2p)
reject1 = 1 reject3 = 1 reject4 = 0
The 1 results indicate that the likelihood ratio test rejected both the restricted models, AR(1) and AR(3), in favor of the corresponding unrestricted models. Therefore, based on this test, the unrestricted AR(2) and AR(4) models are preferable. However, the test does not reject the unrestricted AR(2) model in favor of the unrestricted AR(4) model. (This test regards the AR(2) model as an AR(4) model with restrictions that the autoregression matrices AR(3) and AR(4) are 0.) Therefore, it seems that the unrestricted AR(2) model is best among the models fit.
Akaike Information Criterion. To find the best model among a set, minimize the Akaike information criterion. This is an in-sample calculation. Here is how to calculate the criterion for the four models:
AIC = aicbic([LLF1 LLF2 LLF3 LLF4],[n1p n2p n3p n4p])
AIC = 1.0e+03 * 1.5129 1.4462 1.5122 1.4628
The best model according to this criterion is the unrestricted AR(2) model. Notice, too, that the unrestricted AR(4) model has lower Akaike information than either of the restricted models. Based on this criterion, the unrestricted AR(2) model is best, with the unrestricted AR(4) model coming next in preference.
Comparing Forecasts with Forecast Period Data. To compare the predictions of the four models against the forecast data YF, use the vgxpred function. This function returns both a prediction of the mean time series, and an error covariance matrix that gives confidence intervals about the means. This is an out-of-sample calculation.
[FY1,FYCov1] = vgxpred(EstSpec1,TF,[],Yest); [FY2,FYCov2] = vgxpred(EstSpec2,TF,[],Yest); [FY3,FYCov3] = vgxpred(EstSpec3,TF,[],Yest); [FY4,FYCov4] = vgxpred(EstSpec4,TF,[],Yest);
A plot shows the predictions in the shaded region to the right:
figure vgxplot(EstSpec2,Yest,FY2,FYCov2)
It is now straightforward to calculate the sum of squares error between the predictions and the data, YF:
error1 = YF - FY1; error2 = YF - FY2; error3 = YF - FY3; error4 = YF - FY4; SSerror1 = error1(:)' * error1(:); SSerror2 = error2(:)' * error2(:); SSerror3 = error3(:)' * error3(:); SSerror4 = error4(:)' * error4(:); figure bar([SSerror1 SSerror2 SSerror3 SSerror4],.5) ylabel('Sum of squared errors') set(gca,'XTickLabel',... {'AR2 diag' 'AR2 full' 'AR4 diag' 'AR4 full'}) title('Sum of Squared Forecast Errors')
The predictive performance of the four models is similar.
The full AR(2) model seems to be the best and most parsimonious fit. Its model parameters are:
vgxdisp(EstSpec2)
Model : 3-D VAR(2) with Additive Constant Conditional mean is AR-stable and is MA-invertible Series : GDP Series : M1 Series : 3-mo T-bill a Constant: 0.687401 0.3856 -0.915879 AR(1) Autoregression Matrix: 0.272374 -0.0162214 0.0928186 0.0563884 0.240527 -0.389905 0.280759 -0.0712716 -0.32747 AR(2) Autoregression Matrix: 0.242554 0.140464 -0.177957 0.00130726 0.380042 -0.0484981 0.260414 0.024308 -0.43541 Q Innovations Covariance: 0.632182 0.105925 0.216806 0.105925 0.991607 -0.155881 0.216806 -0.155881 1.00082
This example shows two ways to make predictions or forecasts based on the EstSpec4 fitted model:
Running vgxpred based on the last few rows of YF.
Simulating several time series with the vgxsim function.
In both cases, transform the forecasts so they are directly comparable to the original time series.
Forecasting with vgxpred. This example shows predictions 10 steps into the future.
Generate the prediction time series from the fitted model beginning at the latest times:
[ypred,ycov] = vgxpred(EstSpec2,10,[],YF);
Transform the predictions by undoing the scaling and differencing applied to the original data. Make sure to insert the last observation at the beginning of the time series before using cumsum to undo the differencing. And, since differencing occurred after taking logarithms, insert the logarithm before using cumsum:
yfirst = [gdp,m1,tb3]; yfirst = yfirst(49:end,:); % Remove NaNs dates = dates(49:end); endpt = yfirst(end,:); endpt(1:2) = log(endpt(1:2)); ypred(:,1:2) = ypred(:,1:2)/100; % Rescale percentage ypred = [endpt; ypred]; % Prepare for cumsum ypred(:,1:3) = cumsum(ypred(:,1:3)); ypred(:,1:2) = exp(ypred(:,1:2)); lastime = dates(end); timess = lastime:91:lastime+910; % Insert forecast horizon figure subplot(3,1,1) plot(timess,ypred(:,1),':r') hold on plot(dates,yfirst(:,1),'k') datetick('x') grid on title('GDP') subplot(3,1,2); plot(timess,ypred(:,2),':r') hold on plot(dates,yfirst(:,2),'k') datetick('x') grid on title('M1') subplot(3,1,3); plot(timess,ypred(:,3),':r') hold on plot(dates,yfirst(:,3),'k') datetick('x') grid on title('3-mo T-bill') hold off
The plot shows the extrapolations in dotted red, and the original data series in solid black.
Look at the last few years in this plot to get a sense of how the predictions relate to the latest data points:
ylast = yfirst(170:end,:); timeslast = dates(170:end); figure subplot(3,1,1) plot(timess,ypred(:,1),'--r') hold on plot(timeslast,ylast(:,1),'k') datetick('x') grid on title('GDP') subplot(3,1,2); plot(timess,ypred(:,2),'--r') hold on plot(timeslast,ylast(:,2),'k') datetick('x') grid on title('M1') subplot(3,1,3); plot(timess,ypred(:,3),'--r') hold on plot(timeslast,ylast(:,3),'k') datetick('x') grid on title('3-mo T-bill') hold off
The forecast shows increasing GDP, little growth in M1, and a slight decline in the interest rate. However, the forecast has no error bars. For a forecast with errors, see Forecasting with vgxsim.
Forecasting with vgxsim. This example shows forecasting 10 steps into the future, with a simulation replicated 2000 times, and generates the means and standard deviations.
Simulate a time series from the fitted model beginning at the latest times:
rng(1); % For reproducibility
ysim = vgxsim(EstSpec2,10,[],YF,[],2000);
Transform the predictions by undoing the scaling and differencing applied to the original data. Make sure to insert the last observation at the beginning of the time series before using cumsum to undo the differencing. And, since differencing occurred after taking logarithms, insert the logarithm before using cumsum:
yfirst = [gdp,m1,tb3]; endpt = yfirst(end,:); endpt(1:2) = log(endpt(1:2)); ysim(:,1:2,:) = ysim(:,1:2,:)/100; ysim = [repmat(endpt,[1,1,2000]);ysim]; ysim(:,1:3,:) = cumsum(ysim(:,1:3,:)); ysim(:,1:2,:) = exp(ysim(:,1:2,:));
Compute the mean and standard deviation of each series, and plot the results. The plot has the mean in black, with ±1 standard deviation in red.
ymean = mean(ysim,3); ystd = std(ysim,0,3); figure subplot(3,1,1) plot(timess,ymean(:,1),'k') datetick('x') grid on hold on plot(timess,ymean(:,1)+ystd(:,1),'--r') plot(timess,ymean(:,1)-ystd(:,1),'--r') title('GDP') subplot(3,1,2); plot(timess,ymean(:,2),'k') hold on datetick('x') grid on plot(timess,ymean(:,2)+ystd(:,2),'--r') plot(timess,ymean(:,2)-ystd(:,2),'--r') title('M1') subplot(3,1,3); plot(timess,ymean(:,3),'k') hold on datetick('x') grid on plot(timess,ymean(:,3)+ystd(:,3),'--r') plot(timess,ymean(:,3)-ystd(:,3),'--r') title('3-mo T-bill') hold off
The series show increasing growth in GDP, moderate to little growth in M1, and uncertainty about the direction of T-bill rates.
[1] Sharpe, W. F. "Capital Asset Prices: A Theory of Market Equilibrium under Conditions of Risk." Journal of Finance.Vol. 19, 1964, pp. 425–442.
[2] Linter, J. "The Valuation of Risk Assets and the Selection of Risky Investments in Stocks." Review of Economics and Statistics. Vol. 14, 1965, pp. 13–37.
[3] Jarrow, A. Finance Theory. Prentice-Hall, Inc., 1988.
vgxinfer | vgxplot | vgxpred | vgxproc | vgxset | vgxvarx