MATLAB Examples

Implement Seemingly Unrelated Regression

This example shows how to include exogenous data for several seemingly unrelated regression (SUR) analyses. The exogenous series are random paths from a standard Gaussian distribution, and the response series are simulated from a known model with Gaussian innovations.

In seemingly unrelated regression (SUR), each response variable is a function of a subset of the exogenous series, but not of any endogenous variable. That is, for $j = 1,...,n$ and $t = 1,...,T$, the model for response $j$ at period $t$ is

$$y_{jt} = a_{j} + b_{j1}x_{k_1t} + b_{j2}x_{k_2t} + ... + b_{jk_j}x_{k_jt} + \varepsilon_{jt}$$

The indices of the regression coefficients and exogenous predictors indicate that:

  • You can associate each response with a different subset of exogenous predictors.
  • The response series might not share intercepts or regression coefficients.

SUR accommodates intra-period innovation correlation but assumes inter-period innovation independence, i.e.,

$$E\left( {{\varepsilon _{it}}{\varepsilon _{js}}|X} \right) = \left\{ {\begin{array}{*{20}{c}}
0;&t \ne s\\
\sigma _{ij};&i \ne j,\;t = s\\
{\sigma _i^2 > 0};&i = j,\;t = s
\end{array}} \right..$$
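
In matrix form, these conditions say that the innovation vector of every period has the same covariance matrix, and that innovation vectors from different periods are uncorrelated. As a sketch, assume the stacked notation $\varepsilon_t = (\varepsilon_{1t},...,\varepsilon_{nt})'$ (introduced here for illustration) and let $\Sigma$ be the $n$-by-$n$ matrix with off-diagonal entries $\sigma_{ij}$ and diagonal entries $\sigma_i^2$. Then

$$E\left( \varepsilon_t \varepsilon_s' | X \right) = \left\{ \begin{array}{ll}
\Sigma; & t = s\\
0; & t \ne s
\end{array} \right.$$

Under the same assumptions, stacking all periods into $\varepsilon = (\varepsilon_1',...,\varepsilon_T')'$ gives the block-diagonal covariance

$$\mathrm{Cov}\left( \varepsilon | X \right) = I_T \otimes \Sigma,$$

where $I_T$ is the $T$-by-$T$ identity matrix and $\otimes$ is the Kronecker product.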

Simulate Data from True Model

Suppose that the true model is

$$\begin{array}{l}
{y_{1t}} = 1 + 2{x_{1t}} - 1.5{x_{2t}} + 0.5{x_{3t}} + 0.75{x_{4t}} + \varepsilon_{1t}\\
{y_{2t}} =  - 1 + 4{x_{1t}} + 2.5{x_{2t}} - 1.75{x_{3t}} - 0.05{x_{4t}} + \varepsilon_{2t}\\
{y_{3t}} = 0.5 - 2{x_{1t}} + 0.5{x_{2t}} - 1.5{x_{3t}} + 0.7{x_{4t}} + \varepsilon_{3t}
\end{array},$$

where $\varepsilon_{jt}$, $j = 1,...,n$ are multivariate Gaussian random variables each having mean zero and jointly having covariance matrix

$$\Sigma  = \left[ {\begin{array}{*{20}{c}}
1&{0.5}&{ - 0.05}\\
{0.5}&1&{0.25}\\
{ - 0.05}&{0.25}&1
\end{array}} \right]$$
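
To see what jointly Gaussian innovations with this covariance look like, the following minimal sketch draws them by using a Cholesky factor. It is an illustration only and is not a description of how simulate generates innovations. The names Sigma, R, Z, innov, and sampleCov are hypothetical and do not appear elsewhere in this example.

```matlab
% Sketch: draw T innovation vectors with covariance Sigma (base MATLAB only).
rng(1);                                   % For reproducibility of this sketch
T = 100;                                  % Number of periods
Sigma = [1 0.5 -0.05; 0.5 1 0.25; -0.05 0.25 1];
R = chol(Sigma);                          % Upper triangular factor, R'*R equals Sigma
Z = randn(T,size(Sigma,1));               % Independent standard Gaussian draws
innov = Z*R;                              % Each row of innov has covariance Sigma
sampleCov = cov(innov);                   % Sample estimate, close to Sigma for large T
```

Rows of innov are independent of each other, which matches the inter-period independence assumption, while the columns within a row are correlated according to Sigma.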

Suppose that the paths represent different econometric measurements, e.g. stock returns.

Simulate four exogenous predictor paths from the standard Gaussian distribution.

rng(1);   % For reproducibility
n = 3;    % Number of response series
nExo = 4; % Number of exogenous series
T = 100;  % Number of periods
X = randn(T,nExo);

mvregress, the workhorse of estimate, requires you to input the exogenous data in a T-by-1 cell vector. Cell $t$ of the cell vector is a design matrix indicating the linear relationship of the exogenous variables with each response series at period $t$. However, estimate associates every predictor with every response. As a result, estimate requires the predictor data in a T-by-nExo matrix instead.
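
The cell-vector layout can be sketched as follows. This sketch assumes that Statistics and Machine Learning Toolbox is available for mvregress, and it uses stand-in data with uncorrelated noise rather than the responses simulated later in this example. The names B, Xcell, beta, and coeffMv are hypothetical.

```matlab
% Sketch: build the T-by-1 cell vector of design matrices for mvregress.
rng(1);                                    % For reproducibility of this sketch
T = 100; n = 3; nExo = 4;
X = randn(T,nExo);                         % Stand-in predictor data
% (nExo+1)-by-n coefficients: first row intercepts, then one row per predictor
B = [1 -1 0.5; 2 4 -2; -1.5 2.5 0.5; 0.5 -1.75 -1.5; 0.75 -0.05 0.7];
Y = [ones(T,1) X]*B + randn(T,n);          % Stand-in responses

Xcell = cell(T,1);
for t = 1:T
    % n-by-n*(nExo+1) block-diagonal design: every response gets its own
    % intercept and its own coefficient on every predictor at period t
    Xcell{t} = kron(eye(n),[1 X(t,:)]);
end

beta = mvregress(Xcell,Y);                 % Stacked as [a1; b11; ...; b14; a2; ...]
coeffMv = reshape(beta,nExo+1,n);          % Column j: intercept and slopes of response j
```

Because each block of the design matrix contains the full set of predictors, this layout reproduces the case that estimate handles with a plain matrix. Restricting a response to a subset of predictors would mean changing the columns of its block.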

Create a VAR model object that characterizes the true model. Simulate a length 100 path of responses from the model.

aTrue = [1; -1; 0.5];
bTrue = [[2; 4; -2] [-1.5; 2.5; 0.5] [0.5; -1.75; -1.5] [0.75; -0.05; 0.7]];
InnovCov = [1 0.5 -0.05; 0.5 1 0.25; -0.05 0.25 1];
TrueMdl = varm('Beta',bTrue,'Constant',aTrue,'Covariance',InnovCov)

Y = simulate(TrueMdl,T,'X',X);
TrueMdl = 

  varm with properties:

     Description: "3-Dimensional VARX(0) Model with 4 Predictors"
     SeriesNames: "Y1"  "Y2"  "Y3" 
       NumSeries: 3
               P: 0
        Constant: [1 -1 0.5]'
              AR: {}
           Trend: [3×1 vector of zeros]
            Beta: [3×4 matrix]
      Covariance: [3×3 matrix]

SUR Using All Predictors for Each Response Series

Create a VAR model suitable for SUR using the shorthand syntax of varm.

Mdl1 = varm(n,0);

Mdl1 is a varm model object template representing a three-dimensional VAR(0) model. Unlike TrueMdl, the coefficients, intercepts, and intra-period covariance matrix of Mdl1 do not have values. Therefore, Mdl1 is suitable for estimation.

Estimate the regression coefficients using estimate. Extract the residuals. Display the estimated model using summarize.

[EstMdl1,~,~,E] = estimate(Mdl1,Y,'X',X);
summarize(EstMdl1)
 
   3-Dimensional VARX(0) Model with 4 Predictors
 
    Effective Sample Size: 100
    Number of Estimated Parameters: 15
    LogLikelihood: -412.026
    AIC: 854.052
    BIC: 893.129
 
                     Value      StandardError    TStatistic      PValue   
                   _________    _____________    __________    ___________

    Constant(1)      0.97898    0.11953            8.1902       2.6084e-16
    Constant(2)      -1.0644    0.10019           -10.623       2.3199e-26
    Constant(3)      0.45323    0.10123            4.4772       7.5611e-06
    Beta(1,1)         1.7686    0.11994            14.745       3.2948e-49
    Beta(2,1)         3.8576    0.10054             38.37      4.1502e-322
    Beta(3,1)        -2.2009    0.10158           -21.667      4.1715e-104
    Beta(1,2)        -1.5508    0.12345           -12.563       3.3861e-36
    Beta(2,2)         2.4407    0.10348            23.587      5.2666e-123
    Beta(3,2)        0.46414    0.10455            4.4395       9.0156e-06
    Beta(1,3)        0.69588    0.13491            5.1583       2.4922e-07
    Beta(2,3)        -1.7139    0.11308           -15.156       6.8911e-52
    Beta(3,3)        -1.6414    0.11425           -14.367       8.3713e-47
    Beta(1,4)        0.67036    0.12731            5.2654        1.399e-07
    Beta(2,4)      -0.056437    0.10672          -0.52885          0.59691
    Beta(3,4)        0.56581    0.10782            5.2476       1.5406e-07

 
   Innovations Covariance Matrix:
    1.3850    0.6673   -0.1591
    0.6673    0.9731    0.2165
   -0.1591    0.2165    0.9934

 
   Innovations Correlation Matrix:
    1.0000    0.5748   -0.1357
    0.5748    1.0000    0.2202
   -0.1357    0.2202    1.0000

EstMdl1 is a varm model object containing the estimated parameters. E is a $T$-by-$n$ matrix of residuals.

Alternatively, because every response series uses the same predictors in this case, you can use the backslash operator on X and Y. However, you must include a column of ones in X for the intercepts.

coeff = ([ones(T,1) X]\Y)
coeff =

    0.9790   -1.0644    0.4532
    1.7686    3.8576   -2.2009
   -1.5508    2.4407    0.4641
    0.6959   -1.7139   -1.6414
    0.6704   -0.0564    0.5658

coeff is an (nExo + 1)-by-n matrix of estimated intercepts and regression coefficients. The estimated intercepts are in the first row, and the remaining rows contain the estimated regression coefficients, one row per predictor and one column per response series.
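
To make the orientation concrete, this minimal sketch splits a backslash solution into an intercept vector and a coefficient matrix that have the same shapes as the Constant and Beta properties of a varm object. It uses stand-in data, and the names aHat and bHat are hypothetical.

```matlab
% Sketch: unpack the backslash solution into intercepts and slopes.
rng(1);                          % For reproducibility of this sketch
T = 100; n = 3; nExo = 4;
X = randn(T,nExo);               % Stand-in predictor data
Y = randn(T,n);                  % Stand-in response data
coeff = [ones(T,1) X]\Y;         % (nExo+1)-by-n least-squares solution
aHat = coeff(1,:)';              % n-by-1 intercepts (first row of coeff)
bHat = coeff(2:end,:)';          % n-by-nExo slopes, row j belongs to response j
```

After the transpose, bHat(j,k) is the coefficient of predictor k in the equation for response j, which is the same indexing as Beta(j,k) in the estimation summary.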

Compare all estimates to their true values.

fprintf('\n');
fprintf('               Intercepts      \n');
fprintf('     True    |   estimate  |  backslash\n');
fprintf('--------------------------------------\n');
for j = 1:n
    fprintf('  %8.4f   |  %8.4f  | %8.4f\n',aTrue(j),EstMdl1.Constant(j),coeff(1,j));
end

cB = coeff';
cB = cB(:);
fprintf('\n');
fprintf('              Coefficients      \n');
fprintf('     True    |   estimate  |  backslash\n');
fprintf('--------------------------------------\n');
for j = 1:numel(EstMdl1.Beta)
    fprintf('  %8.4f   |  %8.4f  | %8.4f\n',bTrue(j),...
        EstMdl1.Beta(j),cB(n + j));
end

fprintf('\n');
fprintf('                 Innovations Covariance\n');
fprintf('            True            |             estimate\n');
fprintf('----------------------------------------------------------\n');
for j = 1:n
    fprintf('%8.4f %8.4f %8.4f  |   %8.4f %8.4f %8.4f\n',...
        InnovCov(j,:),EstMdl1.Covariance(j,:));
end
               Intercepts      
     True    |   estimate  |  backslash
--------------------------------------
    1.0000   |    0.9790  |   0.9790
   -1.0000   |   -1.0644  |  -1.0644
    0.5000   |    0.4532  |   0.4532

              Coefficients      
     True    |   estimate  |  backslash
--------------------------------------
    2.0000   |    1.7686  |   1.7686
    4.0000   |    3.8576  |   3.8576
   -2.0000   |   -2.2009  |  -2.2009
   -1.5000   |   -1.5508  |  -1.5508
    2.5000   |    2.4407  |   2.4407
    0.5000   |    0.4641  |   0.4641
    0.5000   |    0.6959  |   0.6959
   -1.7500   |   -1.7139  |  -1.7139
   -1.5000   |   -1.6414  |  -1.6414
    0.7500   |    0.6704  |   0.6704
   -0.0500   |   -0.0564  |  -0.0564
    0.7000   |    0.5658  |   0.5658

                 Innovations Covariance
            True            |             estimate
----------------------------------------------------------
  1.0000   0.5000  -0.0500  |     1.3850   0.6673  -0.1591
  0.5000   1.0000   0.2500  |     0.6673   0.9731   0.2165
 -0.0500   0.2500   1.0000  |    -0.1591   0.2165   0.9934

The estimates from estimate and the backslash operator are the same, and are fairly close to their corresponding true values.

One way to check the relationship strength between the predictors and responses is to compute the coefficient of determination (i.e., the fraction of variation explained by the predictors), which is

$$R^2 = 1 - \frac{\sum_{j=1}^n\hat\sigma^2_{\varepsilon j}}{\sum_{j=1}^n\hat\sigma^2_{Y j}},$$

where $\hat\sigma^2_{\varepsilon j}$ is the estimated variance of residual series $j$, and $\hat\sigma^2_{Y j}$ is the estimated variance of response series $j$.

R2 = 1 - sum(diag(cov(E)))/sum(diag(cov(Y)))
R2 =

    0.9118

The SUR model and predictor data explain approximately 91% of the variation in the response data.
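
The pooled measure can hide differences between response series. As a hedged sketch, the same idea can be applied series by series. The code uses stand-in data and least-squares residuals rather than the E returned by estimate, and the names Z, R2all, and R2each are hypothetical.

```matlab
% Sketch: pooled and per-series coefficients of determination.
rng(1);                                    % For reproducibility of this sketch
T = 100; n = 3; nExo = 4;
X = randn(T,nExo);                         % Stand-in predictor data
% (nExo+1)-by-n coefficients: first row intercepts, then one row per predictor
B = [1 -1 0.5; 2 4 -2; -1.5 2.5 0.5; 0.5 -1.75 -1.5; 0.75 -0.05 0.7];
Y = [ones(T,1) X]*B + randn(T,n);          % Stand-in responses
Z = [ones(T,1) X];                         % Predictors with intercept column
E = Y - Z*(Z\Y);                           % Least-squares residuals, T-by-n
R2all = 1 - sum(var(E))/sum(var(Y));       % Pooled measure, as in the formula
R2each = 1 - var(E)./var(Y);               % 1-by-n, one value per response
```

sum(var(E)) equals sum(diag(cov(E))), so R2all follows the formula given earlier, and R2each shows how much each response series contributes to it.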