MATLAB Examples

Specify Gradient for HMC Sampler

This example shows how to set up a Bayesian linear regression model for efficient posterior sampling using the Hamiltonian Monte Carlo (HMC) sampler. The prior distribution of the coefficients is a multivariate t, and the disturbance variance has an inverse gamma prior.

Consider the multiple linear regression model that predicts U.S. real gross national product (GNPR) using a linear combination of industrial production index (IPI), total employment (E), and real wages (WR).

$$\texttt{GNPR}_t=\beta_0+\beta_1\texttt{IPI}_t+\beta_2\texttt{E}_t+\beta_3\texttt{WR}_t+\varepsilon_t.$$

For all $t$, $\varepsilon_t$ is a series of independent Gaussian disturbances with a mean of 0 and variance $\sigma^2$. Assume these prior distributions.

  • $\beta_j\vert\sigma^2$ is a 4-D t distribution with 30 degrees of freedom for each component, correlation matrix C, location ct, and scale st.
  • $\sigma^2 \sim IG(10r_1,10r_2)$, with shape $10r_1$ and scale $10r_2$.

bayeslm treats these assumptions and the data likelihood as if the corresponding posterior is analytically intractable.

Declare a MATLAB® function that:

  • Accepts values of $\beta$ and $\sigma^2$ together in a column vector, and accepts values of the hyperparameters.
  • Returns the log of the joint prior distribution, $\pi\left(\beta,\sigma^2\right)$, given the values of $\beta$ and $\sigma^2$, and returns the gradient of the log prior density with respect to $\beta$ and $\gamma$.
function [logPDF,gradient] = priorMVTIGHMC(params,ct,st,dof,C,a,b)
%priorMVTIGHMC Log density of multivariate t times inverse gamma and
%gradient
%   priorMVTIGHMC passes params(1:end-1) to the multivariate t density
%   function with dof degrees of freedom for each component and positive
%   definite correlation matrix C, and passes params(end) to the inverse
%   gamma distribution with shape a and scale b.  priorMVTIG returns the
%   log of the product of the two evaluated densities and its gradient.
%   After applying the log, terms involving constants only are dropped from
%   logPDF.
%   
%   params: Parameter values at which the densities are evaluated, an
%           m-by-1 numeric vector.
%
%       ct: Multivariate t distribution component centers, an (m-1)-by-1
%           numeric vector.  Elements correspond to the first m-1 elements
%           of params.
%
%       st: Multivariate t distribution component scales, an (m-1)-by-1
%           numeric vector.  Elements correspond to the first m-1 elements 
%           of params.
%
%      dof: Degrees of freedom for the multivariate t distribution, a
%           numeric scalar or (m-1)-by-1 numeric vector. priorMVTIG expands
%           scalars such that dof = dof*ones(m-1,1). Elements of dof
%           correspond to the elements of params(1:end-1).
%
%        C: correlation matrix for the multivariate t distribution, an
%           (m-1)-by-(m-1) symmetric, positive definite matrix. Rows and
%           columns correspond to the elements of params(1:end-1).
% 
%        a: Inverse gamma shape parameter, a positive numeric scalar.
%
%        b: Inverse gamma scale parameter, a positive scalar.
%
beta = params(1:(end-1));
numCoeffs = numel(beta);
sigma2 = params(end);

tVal = (beta - ct)./st;
expo = -(dof + numCoeffs)/2;
logmvtDensity = expo*log(1 + (1/dof)*(tVal'/C*tVal));
logigDensity = (-a - 1)*log(sigma2) - 1/(sigma2*b);
logPDF = logmvtDensity + logigDensity;

grad_beta = (expo/(1 + (1/dof)*(tVal'/C*tVal)))*((2/dof)*(tVal'/C)'./st);
grad_sigma = (-a - 1)/sigma2 + 1/(sigma2^2*b);
gradient = [grad_beta; grad_sigma];
end

Create an anonymous function that operates like priorMVTIGHMC, but accepts the parameter values only and holds the hyperparameter values fixed to arbitrarily chosen values.

rng(1); % For reproducibility
dof = 30;
V = rand(4,4);
Sigma = V'*V;
st = sqrt(diag(Sigma));
C = Sigma./(st*st');
ct = -10*rand(4,1);
a = 10*rand;
b = 10*rand;
logPDF = @(params)priorMVTIGHMC(params,ct,st,dof,C,a,b);

Create a custom joint prior model for the linear regression parameters. Specify the number of predictors, p. Also, specify the function handle for priorMVTIGHMC and the variable names.

p = 3;
PriorMdl = bayeslm(p,'ModelType','custom','LogPDF',logPDF,...
    'VarNames',["IPI" "E" "WR"]);

PriorMdl is a customblm Bayesian linear regression model object representing the prior distribution of the regression coefficients and disturbance variance.

Load the Nelson-Plosser data set. Create variables for the response and predictor series.

load Data_NelsonPlosser
X = DataTable{:,PriorMdl.VarNames(2:end)};
y = DataTable{:,'GNPR'};

Estimate the marginal posterior distributions of $\beta$ and $\sigma^2$ using the HMC sampler. To monitor the progress of the sampler, specify a verbosity level of 2. For numerical stability, specify reparameterization of the disturbance variance.

options = sampleroptions('Sampler','hmc','VerbosityLevel',2)
PosteriorMdl = estimate(PriorMdl,X,y,'Options',options,'Reparameterize',true);
options = 

  struct with fields:

                        Sampler: 'HMC'
           StepSizeTuningMethod: 'dual-averaging'
         MassVectorTuningMethod: 'iterative-sampling'
    NumStepSizeTuningIterations: 100
          TargetAcceptanceRatio: 0.6500
                  NumStepsLimit: 2000
                 VerbosityLevel: 2
                       NumPrint: 100

o Tuning mass vector using method: iterative-sampling 

o Initial step size for dual-averaging = 0.00012207

|==================================================================================|
|   ITER   |    LOG PDF    |  STEP SIZE  |  NUM STEPS  |  ACC RATIO  |  DIVERGENT  |
|==================================================================================|
|      100 | -3.335909e+02 |   2.280e-04 |         150 |   6.300e-01 |          22 |
Finished mass vector tuning iteration 1 of 5. 

o Initial step size for dual-averaging = 2

|==================================================================================|
|   ITER   |    LOG PDF    |  STEP SIZE  |  NUM STEPS  |  ACC RATIO  |  DIVERGENT  |
|==================================================================================|
|      100 | -3.358213e+02 |   1.232e+00 |           4 |   6.300e-01 |           0 |

|==================================================================================|
|   ITER   |    LOG PDF    |  STEP SIZE  |  NUM STEPS  |  ACC RATIO  |  DIVERGENT  |
|==================================================================================|
|      100 | -3.338032e+02 |   8.900e-02 |           3 |   8.050e-01 |           0 |
Finished mass vector tuning iteration 2 of 5. 

o Initial step size for dual-averaging = 0.5

|==================================================================================|
|   ITER   |    LOG PDF    |  STEP SIZE  |  NUM STEPS  |  ACC RATIO  |  DIVERGENT  |
|==================================================================================|
|      100 | -3.353511e+02 |   1.712e-01 |          29 |   6.200e-01 |           1 |

|==================================================================================|
|   ITER   |    LOG PDF    |  STEP SIZE  |  NUM STEPS  |  ACC RATIO  |  DIVERGENT  |
|==================================================================================|
|      100 | -3.345878e+02 |   1.541e-01 |           7 |   7.750e-01 |           0 |
|      200 | -3.334264e+02 |   6.323e-02 |          26 |   8.333e-01 |           0 |
Finished mass vector tuning iteration 3 of 5. 

o Initial step size for dual-averaging = 0.0625

|==================================================================================|
|   ITER   |    LOG PDF    |  STEP SIZE  |  NUM STEPS  |  ACC RATIO  |  DIVERGENT  |
|==================================================================================|
|      100 | -3.338346e+02 |   5.204e-02 |          96 |   6.400e-01 |           5 |

|==================================================================================|
|   ITER   |    LOG PDF    |  STEP SIZE  |  NUM STEPS  |  ACC RATIO  |  DIVERGENT  |
|==================================================================================|
|      100 | -3.376211e+02 |   2.622e-03 |           9 |   8.050e-01 |           0 |
|      200 | -3.335348e+02 |   4.683e-02 |          92 |   8.633e-01 |           0 |
Finished mass vector tuning iteration 4 of 5. 

o Initial step size for dual-averaging = 0.125

|==================================================================================|
|   ITER   |    LOG PDF    |  STEP SIZE  |  NUM STEPS  |  ACC RATIO  |  DIVERGENT  |
|==================================================================================|
|      100 | -3.335993e+02 |   5.472e-02 |          91 |   6.500e-01 |           3 |

|==================================================================================|
|   ITER   |    LOG PDF    |  STEP SIZE  |  NUM STEPS  |  ACC RATIO  |  DIVERGENT  |
|==================================================================================|
|      100 | -3.357536e+02 |   1.579e-02 |          20 |   8.000e-01 |           0 |
|      200 | -3.323844e+02 |   4.925e-02 |          24 |   8.567e-01 |           0 |
Finished mass vector tuning iteration 5 of 5. 

o Tuning step size using method: dual-averaging. Target acceptance ratio = 0.65

o Initial step size for dual-averaging = 0.125

|==================================================================================|
|   ITER   |    LOG PDF    |  STEP SIZE  |  NUM STEPS  |  ACC RATIO  |  DIVERGENT  |
|==================================================================================|
|      100 | -3.361831e+02 |   5.985e-02 |          84 |   6.400e-01 |           0 |

Method: MCMC sampling with 10000 draws
Number of observations: 62
Number of predictors:   4
 
           |   Mean      Std          CI95         Positive  Distribution 
--------------------------------------------------------------------------
 Intercept | -23.8912  3.0874  [-30.040, -17.582]    0.000     Empirical  
 IPI       |   4.3810  0.1079   [ 4.168,  4.593]     1.000     Empirical  
 E         |   0.0011  0.0002   [ 0.001,  0.002]     1.000     Empirical  
 WR        |   2.5003  0.3362   [ 1.834,  3.168]     1.000     Empirical  
 Sigma2    |  40.0373  6.7743   [28.935, 55.248]     1.000     Empirical  
 

PosteriorMdl is an empiricalblm model object storing the draws from the posterior distributions.

View trace plots and autocorrelation function (ACF) plots of the draws from the posteriors.

for j = 1:4
    figure;
    subplot(2,1,1)
    plot(PosteriorMdl.BetaDraws(j,:));
    title(sprintf('Trace plot - Beta %d',j));
    xlabel('MCMC draw')
    ylabel('Simulation index')
    subplot(2,1,2)
    autocorr(PosteriorMdl.BetaDraws(j,:))
end

figure;
subplot(2,1,1)
plot(PosteriorMdl.Sigma2Draws);
title('Trace plot - Disturbance Variance');
xlabel('MCMC draw')
ylabel('Simulation index')
subplot(2,1,2)
autocorr(PosteriorMdl.Sigma2Draws)

The trace plots show adequate mixing and no transient behavior. The autocorrelation plots show low correlation among subsequent samples.