# Implement Seemingly Unrelated Regression Analyses

This example will be removed in a future release.

## Contents

This example shows how to prepare exogenous data for several seemingly unrelated regression (SUR) analyses. The response and exogenous series are random paths from a standard Gaussian distribution.

In seemingly unrelated regression (SUR), each response variable is a function of a subset of the exogenous series, but not of any endogenous variable. That is, for and , the model for response at period is

The indices of the regression coefficients and exogenous predictors indicate that:

- You can associate each response with a different subset of exogenous predictors.
- The response series might not share intercepts or regression coefficients.

SUR accommodates intra-period innovation heteroscedasticity and correlation, but inter-period innovation independence and homoscedasticity, i.e.,

## Simulate Data from the True Model

Suppose that the true model is

where , are multivariate Gaussian random variables each having mean zero and jointly having covariance matrix

Suppose that the paths represent different econometric measurements, e.g. stock returns.

Simulate four exogenous predictor paths from the standard Gaussian distribution.

rng(1); % For reproducibility n = 3; % Number of response series nExo = 4; % Number of exogenous series T = 100; X = randn(100,nExo);

The multivariate time series analysis functions of Econometrics Toolbox™ require you to input the exogenous data in a `T`-by-1 cell vector. Cell of the cell vector is a design matrix indicating the linear relationship of the exogenous variables with each response series at period . Specifically, each design matrix in the cell array:

- Has rows, each corresponding to a response series.
- Has columns since, in this example, all exogenous variables are in the regression component of each response series.

To create the cell vector of design matrices for this case, first expand the exogenous predictor data by finding its Kronecker product with the -by- identity matrix.

```
ExpandX1 = kron(X,eye(n));
r1 = size(ExpandX1,2); % Number of regression variables
```

`ExpandX1` is an -by- numeric matrix formed by multiplying each element of `X` to the -by- identity matrix, and then putting the product in the corresponding position of a -by-1 block matrix of -by- matrices.

Create the cell vector of design matrices by putting each consecutive -by- block matrices of `ExpandX1` into the cells of a -by-1 cell vector. Verify that one of the cells contains the expected design matrix (e.g. the third cell)).

CellX1 = mat2cell(ExpandX1,n*ones(T,1)); CellX1{3} X(3,:)

ans = Columns 1 through 7 -0.7585 0 0 1.9302 0 0 1.8562 0 -0.7585 0 0 1.9302 0 0 0 0 -0.7585 0 0 1.9302 0 Columns 8 through 12 0 0 1.3411 0 0 1.8562 0 0 1.3411 0 0 1.8562 0 0 1.3411 ans = -0.7585 1.9302 1.8562 1.3411

In period 3, all observed predictors are associated with each response series.

Create a multivariate time series model object that characterizes the true model using vgxset.

aTrue = [1; -1; 0.5]; bTrue = [2; 4; -2; -1.5; 2.5; 0.5; 0.5; -1.75; -1.5; 0.75; -0.05; 0.7]; InnovCov = [1 0.5 -0.05; 0.5 1 0.25; -0.05 0.25 1]; TrueMdl = vgxset('n',n,'b',bTrue,'a',aTrue,'Q',InnovCov) Y = vgxsim(TrueMdl,100,CellX1);

Warning: VGXSET will be removed in a future release. Use VARM model instead. TrueMdl = Model: 3-D VARMAX(0,0,12) with Additive Constant n: 3 nAR: 0 nMA: 0 nX: 12 a: [1 -1 0.5] additive constants b: [12x1] regression coefficients Q: [3x3] covariance matrix Warning: VGXSIM will be removed in a future release. Use SIMULATE function of VARM model instead.

## SUR Using All Predictors for Each Response Series

Create a multivariate time series model suitable for SUR using `vgxset`. You must specify the number of response series (`'n'`), the number of regression variables (`'nX'`), and whether to include different regression intercepts for each response series (`'Constant'`).

Mdl1 = vgxset('n',n,'nX',r1,'Constant',true)

Warning: VGXSET will be removed in a future release. Use VARM model instead. Mdl1 = Model: 3-D VARMAX(0,0,12) with Additive Constant n: 3 nAR: 0 nMA: 0 nX: 12 a: [] b: [] Q: []

`Mdl1` is a multivariate time series model object. Unlike `TrueMdl`, none of the coefficients, intercepts, and intra-period covariance matrix have values. Therefore, `Mdl1` is suitable for estimation.

Estimate the regression coefficients using vgxvarx. Extract the residuals. Display the estimated model using docid:econ_ug.broukyg-1

[EstMdl1,~,~,W] = vgxvarx(Mdl1,Y,CellX1); vgxdisp(EstMdl1)

Warning: VGXVARX will be removed in a future release. Use ESTIMATE function of VARM model instead. Warning: VGXDISP will be removed in a future release. Use SUMMARIZE function of VARM model instead. Model : 3-D VARMAX(0,0,12) with Additive Constant a Constant: 0.978981 -1.06438 0.453232 b Regression Parameter: 1.76856 3.85757 -2.20089 -1.55085 2.44067 0.464144 0.69588 -1.71386 -1.6414 0.670357 -0.0564374 0.565809 Q Innovations Covariance: 1.38503 0.667301 -0.159136 0.667301 0.973123 0.216492 -0.159136 0.216492 0.993384

`EstMdl` is a multivariate time series model containing the estimated parameters. `W` is a -by- matrix of residuals. By default, `vgxvarx` models a full, intra-period innovations covariance matrix.

Alternatively, and in this case, you can use the backslash operator on `X` and `Y`. However, you must include a column of ones in `X` for the intercepts.

coeff = [ones(T,1) X]\Y

coeff = 0.9790 -1.0644 0.4532 1.7686 3.8576 -2.2009 -1.5508 2.4407 0.4641 0.6959 -1.7139 -1.6414 0.6704 -0.0564 0.5658

`coeff` is a `nExo + 1`-by- `n` matrix of estimated regression coefficients and intercepts. The estimated intercepts are in the first row, and the rest of the matrix contains the estimated regression coefficients

Compare all estimates to their true values.

fprintf('\n'); fprintf(' Intercepts \n'); fprintf(' True | vgxvarx | backslash\n'); fprintf('--------------------------------------\n'); for j = 1:n fprintf(' %8.4f | %8.4f | %8.4f\n',aTrue(j),EstMdl1.a(j),coeff(1,j)); end cB = coeff'; cB = cB(:); fprintf('\n'); fprintf(' Coefficients \n'); fprintf(' True | vgxvarx | backslash\n'); fprintf('--------------------------------------\n'); for j = 1:r1 fprintf(' %8.4f | %8.4f | %8.4f\n',bTrue(j),... EstMdl1.b(j),cB(n + j)); end fprintf('\n'); fprintf(' Innovations Covariance\n'); fprintf(' True | vgxvarx\n'); fprintf('----------------------------------------------------------\n'); for j = 1:n fprintf('%8.4f %8.4f %8.4f | %8.4f %8.4f %8.4f\n',... InnovCov(j,:),EstMdl1.Q(j,:)); end

Intercepts True | vgxvarx | backslash -------------------------------------- 1.0000 | 0.9790 | 0.9790 -1.0000 | -1.0644 | -1.0644 0.5000 | 0.4532 | 0.4532 Coefficients True | vgxvarx | backslash -------------------------------------- 2.0000 | 1.7686 | 1.7686 4.0000 | 3.8576 | 3.8576 -2.0000 | -2.2009 | -2.2009 -1.5000 | -1.5508 | -1.5508 2.5000 | 2.4407 | 2.4407 0.5000 | 0.4641 | 0.4641 0.5000 | 0.6959 | 0.6959 -1.7500 | -1.7139 | -1.7139 -1.5000 | -1.6414 | -1.6414 0.7500 | 0.6704 | 0.6704 -0.0500 | -0.0564 | -0.0564 0.7000 | 0.5658 | 0.5658 Innovations Covariance True | vgxvarx ---------------------------------------------------------- 1.0000 0.5000 -0.0500 | 1.3850 0.6673 -0.1591 0.5000 1.0000 0.2500 | 0.6673 0.9731 0.2165 -0.0500 0.2500 1.0000 | -0.1591 0.2165 0.9934

The estimates from implementing `vgxvarx` and the backslash operator are the same, and are fairly close to their corresponding true values.

One way to check the relationship strength between the predictors and responses is to compute the coefficient of determination (i.e., the fraction of variation explained by the predictors), which is

where is the estimated variance of residual series , and is the estimated variance of response series .

R2 = 1 - sum(diag(cov(W)))/sum(diag(cov(Y)))

R2 = 0.9118

The SUR model and predictor data explain approximately 91% of the variation in the response data.

## SUR Using a Unique Predictor for Each Response Series

For each period , create block design matrices such that response series is linearly associated to predictor series , . Put the block design matrices in cells of a -by-1 cell vector in chronological order.

CellX2 = cell(T,1); for j = 1:T CellX2{j} = diag(X(j,1:n)); end r2 = size(CellX2{1},2);

Create a multivariate time series model by using `vgxset` and specifying the number of response series, the number of regression variables, and whether to include different regression intercepts for each response series.

Mdl2 = vgxset('n',n,'nX',r2,'Constant',true);

Warning: VGXSET will be removed in a future release. Use VARM model instead.

Estimate the regression coefficients using `vgxvarx`. Display the estimated parameters. Compute the coefficient of determination.

[EstMdl2,~,~,W2] = vgxvarx(Mdl2,Y,CellX2); vgxdisp(EstMdl2) R2 = 1 - sum(diag(cov(W2)))/sum(diag(cov(Y)))

Warning: VGXVARX will be removed in a future release. Use ESTIMATE function of VARM model instead. Warning: VGXDISP will be removed in a future release. Use SUMMARIZE function of VARM model instead. Model : 3-D VARMAX(0,0,3) with Additive Constant a Constant: 1.07752 -1.43445 0.674376 b Regression Parameter: 1.01491 3.83837 -2.71834 Q Innovations Covariance: 4.96205 4.91571 -1.86546 4.91571 20.8263 -11.0945 -1.86546 -11.0945 7.75392 R2 = 0.1177

The model and predictors explain approximately 12% of the variation in the response series. This should not be surprising since the model is not the same as the response-generating process.

## SUR Using a Shared Intercept for All Response Series

Create block design matrices such that each response series is linearly associated to all predictor series . Prepend the resulting design matrix with a vector of ones representing the common intercept.

ExpandX3 = [ones(T*n,1) kron(X,eye(n))]; r3 = size(ExpandX3,2);

Put the block design matrices into the cells of a -by-1 cell vector in chronological order.

CellX3 = mat2cell(ExpandX3,n*ones(T,1));

Create a multivariate time series model by using `vgxset` and specifying the number of response series and the number of regression variables. By default, `vgxset` excludes regression intercepts.

Mdl3 = vgxset('n',n,'nX',r3);

Warning: VGXSET will be removed in a future release. Use VARM model instead.

Estimate the regression coefficients using `vgxvarx`. Display the estimated parameters. Compute the coefficient of determination.

[EstMdl3,~,~,W3] = vgxvarx(Mdl3,Y,CellX3); vgxdisp(EstMdl3) a = EstMdl3.b(1) R2 = 1 - sum(diag(cov(W3)))/sum(diag(cov(Y)))

Warning: VGXVARX will be removed in a future release. Use ESTIMATE function of VARM model instead. Warning: VGXDISP will be removed in a future release. Use SUMMARIZE function of VARM model instead. Model : 3-D VARMAX(0,0,13) with No Additive Constant b Regression Parameter: 0.388833 1.73468 3.94099 -2.20458 -1.56878 2.48483 0.462187 0.802394 -1.97614 -1.62978 0.63972 0.0190058 0.562466 Q Innovations Covariance: 1.72265 -0.164059 -0.122294 -0.164059 3.02031 0.12577 -0.122294 0.12577 0.997404 a = 0.3888 R2 = 0.9099

The shared, estimated regression intercept is `0.389`, and the other coefficients are similar to the first SUR implementation. The model and predictors explain approximately 91% of the variation in the response series. This should not be surprising since the model almost the same as the response-generating process.