Documentation

This is machine translation

Translated by Microsoft
Mouseover text to see original. Click the button below to return to the English verison of the page.

Note: This page has been translated by MathWorks. Please click here
To view all translated materals including this page, select Japan from the country navigator on the bottom of this page.

Implement Seemingly Unrelated Regression Analyses

This example will be removed in a future release.

This example shows how to prepare exogenous data for several seemingly unrelated regression (SUR) analyses. The response and exogenous series are random paths from a standard Gaussian distribution.

In seemingly unrelated regression (SUR), each response variable is a function of a subset of the exogenous series, but not of any endogenous variable. That is, for and , the model for response at period is

The indices of the regression coefficients and exogenous predictors indicate that:

  • You can associate each response with a different subset of exogenous predictors.

  • The response series might not share intercepts or regression coefficients.

SUR accommodates intra-period innovation heteroscedasticity and correlation, but inter-period innovation independence and homoscedasticity, i.e.,

Simulate Data from the True Model

Suppose that the true model is

where , are multivariate Gaussian random variables each having mean zero and jointly having covariance matrix

Suppose that the paths represent different econometric measurements, e.g. stock returns.

Simulate four exogenous predictor paths from the standard Gaussian distribution.

rng(1);   % For reproducibility
n = 3;    % Number of response series
nExo = 4; % Number of exogenous series
T = 100;
X = randn(100,nExo);

The multivariate time series analysis functions of Econometrics Toolbox™ require you to input the exogenous data in a T-by-1 cell vector. Cell of the cell vector is a design matrix indicating the linear relationship of the exogenous variables with each response series at period . Specifically, each design matrix in the cell array:

  • Has rows, each corresponding to a response series.

  • Has columns since, in this example, all exogenous variables are in the regression component of each response series.

To create the cell vector of design matrices for this case, first expand the exogenous predictor data by finding its Kronecker product with the -by- identity matrix.

ExpandX1 = kron(X,eye(n));
r1 = size(ExpandX1,2); % Number of regression variables

ExpandX1 is an -by- numeric matrix formed by multiplying each element of X to the -by- identity matrix, and then putting the product in the corresponding position of a -by-1 block matrix of -by- matrices.

Create the cell vector of design matrices by putting each consecutive -by- block matrices of ExpandX1 into the cells of a -by-1 cell vector. Verify that one of the cells contains the expected design matrix (e.g. the third cell)).

CellX1 = mat2cell(ExpandX1,n*ones(T,1));
CellX1{3}
X(3,:)
ans =

  Columns 1 through 7

   -0.7585         0         0    1.9302         0         0    1.8562
         0   -0.7585         0         0    1.9302         0         0
         0         0   -0.7585         0         0    1.9302         0

  Columns 8 through 12

         0         0    1.3411         0         0
    1.8562         0         0    1.3411         0
         0    1.8562         0         0    1.3411


ans =

   -0.7585    1.9302    1.8562    1.3411

In period 3, all observed predictors are associated with each response series.

Create a multivariate time series model object that characterizes the true model using vgxset.

aTrue = [1; -1; 0.5];
bTrue = [2; 4; -2; -1.5; 2.5; 0.5; 0.5; -1.75; -1.5; 0.75; -0.05; 0.7];
InnovCov = [1 0.5 -0.05; 0.5 1 0.25; -0.05 0.25 1];
TrueMdl = vgxset('n',n,'b',bTrue,'a',aTrue,'Q',InnovCov)
Y = vgxsim(TrueMdl,100,CellX1);
Warning: VGXSET will be removed in a future release. Use VARM model instead. 

TrueMdl = 

      Model: 3-D VARMAX(0,0,12) with Additive Constant
          n: 3
        nAR: 0
        nMA: 0
         nX: 12
          a: [1 -1 0.5] additive constants
          b: [12x1] regression coefficients
          Q: [3x3] covariance matrix

Warning: VGXSIM will be removed in a future release. Use SIMULATE function of
VARM model instead. 

SUR Using All Predictors for Each Response Series

Create a multivariate time series model suitable for SUR using vgxset. You must specify the number of response series ('n'), the number of regression variables ('nX'), and whether to include different regression intercepts for each response series ('Constant').

Mdl1 = vgxset('n',n,'nX',r1,'Constant',true)
Warning: VGXSET will be removed in a future release. Use VARM model instead. 

Mdl1 = 

      Model: 3-D VARMAX(0,0,12) with Additive Constant
          n: 3
        nAR: 0
        nMA: 0
         nX: 12
          a: []
          b: []
          Q: []

Mdl1 is a multivariate time series model object. Unlike TrueMdl, none of the coefficients, intercepts, and intra-period covariance matrix have values. Therefore, Mdl1 is suitable for estimation.

Estimate the regression coefficients using vgxvarx. Extract the residuals. Display the estimated model using vgxdisp

[EstMdl1,~,~,W] = vgxvarx(Mdl1,Y,CellX1);
vgxdisp(EstMdl1)
Warning: VGXVARX will be removed in a future release. Use ESTIMATE function of
VARM model instead. 
Warning: VGXDISP will be removed in a future release. Use SUMMARIZE function of
VARM model instead. 
  Model  : 3-D VARMAX(0,0,12) with Additive Constant
  a Constant:
        0.978981
        -1.06438
        0.453232
  b Regression Parameter:
         1.76856
         3.85757
        -2.20089
        -1.55085
         2.44067
        0.464144
         0.69588
        -1.71386
         -1.6414
        0.670357
      -0.0564374
        0.565809
  Q Innovations Covariance:
         1.38503       0.667301      -0.159136 
        0.667301       0.973123       0.216492 
       -0.159136       0.216492       0.993384 

EstMdl is a multivariate time series model containing the estimated parameters. W is a -by- matrix of residuals. By default, vgxvarx models a full, intra-period innovations covariance matrix.

Alternatively, and in this case, you can use the backslash operator on X and Y. However, you must include a column of ones in X for the intercepts.

coeff = [ones(T,1) X]\Y
coeff =

    0.9790   -1.0644    0.4532
    1.7686    3.8576   -2.2009
   -1.5508    2.4407    0.4641
    0.6959   -1.7139   -1.6414
    0.6704   -0.0564    0.5658

coeff is a nExo + 1-by- n matrix of estimated regression coefficients and intercepts. The estimated intercepts are in the first row, and the rest of the matrix contains the estimated regression coefficients

Compare all estimates to their true values.

fprintf('\n');
fprintf('               Intercepts      \n');
fprintf('     True    |   vgxvarx  |  backslash\n');
fprintf('--------------------------------------\n');
for j = 1:n
    fprintf('  %8.4f   |  %8.4f  | %8.4f\n',aTrue(j),EstMdl1.a(j),coeff(1,j));
end

cB = coeff';
cB = cB(:);
fprintf('\n');
fprintf('              Coefficients      \n');
fprintf('     True    |   vgxvarx  |  backslash\n');
fprintf('--------------------------------------\n');
for j = 1:r1
    fprintf('  %8.4f   |  %8.4f  | %8.4f\n',bTrue(j),...
        EstMdl1.b(j),cB(n + j));
end

fprintf('\n');
fprintf('                 Innovations Covariance\n');
fprintf('            True            |             vgxvarx\n');
fprintf('----------------------------------------------------------\n');
for j = 1:n
    fprintf('%8.4f %8.4f %8.4f  |   %8.4f %8.4f %8.4f\n',...
        InnovCov(j,:),EstMdl1.Q(j,:));
end
               Intercepts      
     True    |   vgxvarx  |  backslash
--------------------------------------
    1.0000   |    0.9790  |   0.9790
   -1.0000   |   -1.0644  |  -1.0644
    0.5000   |    0.4532  |   0.4532

              Coefficients      
     True    |   vgxvarx  |  backslash
--------------------------------------
    2.0000   |    1.7686  |   1.7686
    4.0000   |    3.8576  |   3.8576
   -2.0000   |   -2.2009  |  -2.2009
   -1.5000   |   -1.5508  |  -1.5508
    2.5000   |    2.4407  |   2.4407
    0.5000   |    0.4641  |   0.4641
    0.5000   |    0.6959  |   0.6959
   -1.7500   |   -1.7139  |  -1.7139
   -1.5000   |   -1.6414  |  -1.6414
    0.7500   |    0.6704  |   0.6704
   -0.0500   |   -0.0564  |  -0.0564
    0.7000   |    0.5658  |   0.5658

                 Innovations Covariance
            True            |             vgxvarx
----------------------------------------------------------
  1.0000   0.5000  -0.0500  |     1.3850   0.6673  -0.1591
  0.5000   1.0000   0.2500  |     0.6673   0.9731   0.2165
 -0.0500   0.2500   1.0000  |    -0.1591   0.2165   0.9934

The estimates from implementing vgxvarx and the backslash operator are the same, and are fairly close to their corresponding true values.

One way to check the relationship strength between the predictors and responses is to compute the coefficient of determination (i.e., the fraction of variation explained by the predictors), which is

where is the estimated variance of residual series , and is the estimated variance of response series .

R2 = 1 - sum(diag(cov(W)))/sum(diag(cov(Y)))
R2 =

    0.9118

The SUR model and predictor data explain approximately 91% of the variation in the response data.

SUR Using a Unique Predictor for Each Response Series

For each period , create block design matrices such that response series is linearly associated to predictor series , . Put the block design matrices in cells of a -by-1 cell vector in chronological order.

CellX2 = cell(T,1);
for j = 1:T
    CellX2{j} = diag(X(j,1:n));
end
r2 = size(CellX2{1},2);

Create a multivariate time series model by using vgxset and specifying the number of response series, the number of regression variables, and whether to include different regression intercepts for each response series.

Mdl2 = vgxset('n',n,'nX',r2,'Constant',true);
Warning: VGXSET will be removed in a future release. Use VARM model instead. 

Estimate the regression coefficients using vgxvarx. Display the estimated parameters. Compute the coefficient of determination.

[EstMdl2,~,~,W2] = vgxvarx(Mdl2,Y,CellX2);
vgxdisp(EstMdl2)
R2 = 1 - sum(diag(cov(W2)))/sum(diag(cov(Y)))
Warning: VGXVARX will be removed in a future release. Use ESTIMATE function of
VARM model instead. 
Warning: VGXDISP will be removed in a future release. Use SUMMARIZE function of
VARM model instead. 
  Model  : 3-D VARMAX(0,0,3) with Additive Constant
  a Constant:
         1.07752
        -1.43445
        0.674376
  b Regression Parameter:
         1.01491
         3.83837
        -2.71834
  Q Innovations Covariance:
         4.96205        4.91571       -1.86546 
         4.91571        20.8263       -11.0945 
        -1.86546       -11.0945        7.75392 

R2 =

    0.1177

The model and predictors explain approximately 12% of the variation in the response series. This should not be surprising since the model is not the same as the response-generating process.

SUR Using a Shared Intercept for All Response Series

Create block design matrices such that each response series is linearly associated to all predictor series . Prepend the resulting design matrix with a vector of ones representing the common intercept.

ExpandX3 = [ones(T*n,1) kron(X,eye(n))];
r3 = size(ExpandX3,2);

Put the block design matrices into the cells of a -by-1 cell vector in chronological order.

CellX3 = mat2cell(ExpandX3,n*ones(T,1));

Create a multivariate time series model by using vgxset and specifying the number of response series and the number of regression variables. By default, vgxset excludes regression intercepts.

Mdl3 = vgxset('n',n,'nX',r3);
Warning: VGXSET will be removed in a future release. Use VARM model instead. 

Estimate the regression coefficients using vgxvarx. Display the estimated parameters. Compute the coefficient of determination.

[EstMdl3,~,~,W3] = vgxvarx(Mdl3,Y,CellX3);
vgxdisp(EstMdl3)
a = EstMdl3.b(1)
R2 = 1 - sum(diag(cov(W3)))/sum(diag(cov(Y)))
Warning: VGXVARX will be removed in a future release. Use ESTIMATE function of
VARM model instead. 
Warning: VGXDISP will be removed in a future release. Use SUMMARIZE function of
VARM model instead. 
  Model  : 3-D VARMAX(0,0,13) with No Additive Constant
  b Regression Parameter:
        0.388833
         1.73468
         3.94099
        -2.20458
        -1.56878
         2.48483
        0.462187
        0.802394
        -1.97614
        -1.62978
         0.63972
       0.0190058
        0.562466
  Q Innovations Covariance:
         1.72265      -0.164059      -0.122294 
       -0.164059        3.02031        0.12577 
       -0.122294        0.12577       0.997404 

a =

    0.3888


R2 =

    0.9099

The shared, estimated regression intercept is 0.389, and the other coefficients are similar to the first SUR implementation. The model and predictors explain approximately 91% of the variation in the response series. This should not be surprising since the model almost the same as the response-generating process.

See Also

Related Topics

Was this topic helpful?