Documentation

This is machine translation

Translated by Microsoft
Mouseover text to see original. Click the button below to return to the English verison of the page.

Note: This page has been translated by MathWorks. Please click here
To view all translated materals including this page, select Japan from the country navigator on the bottom of this page.

simulate

Simulate regression coefficients and disturbance variance of Bayesian linear regression model

Syntax

[BetaSim,sigma2Sim] = simulate(Mdl)
[BetaSim,sigma2Sim] = simulate(Mdl,X,y)
[BetaSim,sigma2Sim] = simulate(___,Name,Value)

Description

example

[BetaSim,sigma2Sim] = simulate(Mdl) returns a random vector of regression coefficients (BetaSim) and a random disturbance variance (sigma2Sim) drawn from the Bayesian linear regression model Mdl of β and σ2.

  • If Mdl is a joint prior model (returned by bayeslm), then simulate draws from the joint prior distribution.

  • If Mdl is a joint posterior model (returned by estimate), then simulate draws from the joint posterior distributions.

example

[BetaSim,sigma2Sim] = simulate(Mdl,X,y) draws from the marginal posterior distributions produced or updated by incorporating the predictor data X and corresponding response data y.

  • If Mdl is a joint prior model, then simulate produces the marginal posterior distributions by “updating” the prior model with information about the parameters that it gleans from the data.

  • If Mdl is a marginal posterior model, then simulate updates the posteriors with information about the parameters that it gleans from the additional data. That is, the complete data likelihood is composed of the additional data X and y, and the data that created Mdl.

NaNs in the data indicate missing values, which simulate removes using list-wise deletion.

example

[BetaSim,sigma2Sim] = simulate(___,Name,Value) uses any of the input arguments in the previous syntaxes and additional options specified by one or more Name,Value pair arguments. For example, you can specify a value for one of β or σ2 to simulate from the conditional posterior distribution of one parameter given the specified value of the other parameter.

Examples

collapse all

Consider the multiple linear regression model that predicts U.S. real gross national product (GNPR) using a linear combination of industrial production index (IPI), total employment (E), and real wages (WR).

For all , is a series of independent Gaussian disturbances with a mean of 0 and variance .

Assume that the prior distributions are:

  • . is a 4-by-1 vector of means and is a scaled 4-by-4 positive definite covariance matrix.

  • . and are the shape and scale, respectively, of an inverse gamma distribution.

These assumptions and the data likelihood imply a normal-inverse-gamma conjugate model.

Load the Nelson-Plosser data set. Create variables for the response and predictor series.

load Data_NelsonPlosser
varNames = {'IPI' 'E' 'WR'};
X = DataTable{:,varNames};
y = DataTable{:,'GNPR'};

Create a normal-inverse-gamma conjugate prior model for the linear regression parameters. Specify the number of predictors, p, and the variable names.

p = 3;
PriorMdl = bayeslm(p,'ModelType','conjugate','VarNames',varNames);

Mdl is a conjugateblm Bayesian linear regression model object representing the prior distribution of the regression coefficients and disturbance variance.

Simulate a set of regression coefficients and a value of the disturbance variance form the prior distribution.

rng(1); % For reproducibility
[betaSimPrior,sigma2SimPrior] = simulate(PriorMdl)
betaSimPrior =

  -33.5917
  -49.1445
  -37.4492
  -25.3632


sigma2SimPrior =

    0.1962

betaSimPrior is the randomly drawn 4-by-1 vector of regression coefficients corresponding to the names in PriorMdl.VarNames. sigma2SimPrior is the randomly drawn scalar disturbance variance.

Estimate the posterior distribution.

PosteriorMdl = estimate(PriorMdl,X,y);
Method: Analytic posterior distributions
Number of observations: 62
Number of predictors:   4
Log marginal likelihood     -259.348
 
           |   Mean      Std          CI95        Positive       Distribution      
-----------------------------------------------------------------------------------
 Intercept | -24.2494  8.7821  [-41.514, -6.985]    0.003   t (-24.25, 8.65^2, 68) 
 IPI       |   4.3913  0.1414   [ 4.113,  4.669]    1.000   t (4.39, 0.14^2, 68)   
 E         |   0.0011  0.0003   [ 0.000,  0.002]    1.000   t (0.00, 0.00^2, 68)   
 WR        |   2.4683  0.3490   [ 1.782,  3.154]    1.000   t (2.47, 0.34^2, 68)   
 Sigma2    |  44.1347  7.8020   [31.427, 61.855]    1.000   IG(34.00, 0.00069)     
 

PosteriorMdl is a conjugateblm Bayesian linear regression model object representing the posterior distribution of the regression coefficients and disturbance variance.

Simulate a set of regression coefficients and a value of the disturbance variance form the posterior distribution.

[betaSimPost,sigma2SimPost] = simulate(PosteriorMdl)
betaSimPost =

  -25.9351
    4.4379
    0.0012
    2.4072


sigma2SimPost =

   41.9575

betaSimPost and sigma2SimPost are commensurate with betaSimPrior and sigma2SimPrior, but are drawn from the posterior instead.

Consider the model in Simulate Parameter Value from Prior and Posterior Distributions.

Load the data, create a conjugate prior model, and then estimate it.

load Data_NelsonPlosser
varNames = {'IPI' 'E' 'WR'};
X = DataTable{:,varNames};
y = DataTable{:,'GNPR'};
p = 3;
PriorMdl = bayeslm(p,'ModelType','conjugate','VarNames',varNames);
[PosteriorMdl,~,~,~,~,summaryTbl] = estimate(PriorMdl,X,y,'Display',false);

Although, the marginal and conditional posterior distributions of and are analytically tractable, the goal of this example is to illustrate how to implement the Gibbs sampler to reproduce known results.

Estimate the model again, but use a Gibbs sampler instead. That is, alternate between sampling from the conditional posterior distributions of the parameters. Sample 10000 times, and create variables for preallocation. Start the sampler by drawing from the conditional posterior of given .

m = 1e4;
BetaDraws = zeros(p + 1,m);
sigma2Draws = zeros(1,m + 1);
sigma2Draws(1) = 2;
rng(1); % For reproducibility
for j = 1:m
    BetaDraws(:,j) = simulate(PriorMdl,X,y,'Sigma2',sigma2Draws(j));
    [~,sigma2Draws(j + 1)] = simulate(PriorMdl,X,y,'Beta',BetaDraws(:,j));
end
sigma2Draws = sigma2Draws(2:end); % Remove initial value from MCMC sample

Graph trace plots of the parameters.

figure;
for j = 1:(p + 1);
    subplot(2,2,j);
    plot(BetaDraws(j,:))
    ylabel('MCMC draw')
    xlabel('Simulation index')
    title(sprintf('Trace plot -- %s',PriorMdl.VarNames{j}));
end
figure;
plot(sigma2Draws)
ylabel('MCMC draw')
xlabel('Simulation index')
title('Trace plot -- Sigma2')

The MCMC samples appear to have converged and are mixing well.

Apply a burn in period of 1000 draws, and then compute the means and standard deviations of the MCMC samples. Compare them with estimates from estimate.

bp = 1000;
postBetaMean = mean(BetaDraws(:,(bp + 1):end),2);
postSigma2Mean = mean(sigma2Draws(:,(bp + 1):end));
postBetaStd = std(BetaDraws(:,(bp + 1):end),[],2);
postSigma2Std = std(sigma2Draws((bp + 1):end));
[summaryTbl(:,1:2),table([postBetaMean; postSigma2Mean],...
    [postBetaStd; postSigma2Std],'VariableNames',{'GibbsMean','GibbsStd'})]
ans =

  5x4 table

                   Mean          Std        GibbsMean     GibbsStd 
                 _________    __________    _________    __________

    Intercept      -24.249        8.7821      -24.293         8.748
    IPI             4.3913        0.1414       4.3917       0.13941
    E            0.0011202    0.00032931    0.0011229    0.00032875
    WR              2.4683       0.34895       2.4654       0.34364
    Sigma2          44.135         7.802       44.011        7.7816

The estimates are very close. MCMC variations account for the differences.

Consider a Bayesian linear regression model containing a one predictor, a t distributed disturbance variance with a profiled degrees of freedom parameter . That is, let:

  • .

These assumptions imply:

is vector of latent scale parameters that attributes low precision to observations that are far from the regression line. is a hyperparameter controlling the influence of on the observations.

For this problem, the Gibbs sampler is well-suited to estimate the coefficients because you can simulate the parameters of a Bayesian linear regression model conditioned on , and then simulate from its conditional distribution.

Generate responses from where and .

rng('default');
n = 100;
x = linspace(0,2,n)';
b0 = 1;
b1 = 2;
sigma = 0.5;
e = randn(n,1);
y = b0 + b1*x + sigma*e;

Introduce outlying responses by inflating all responses below by a factor of 3.

y(x < 0.25) = y(x < 0.25)*3;

Fit a linear model to the data. Plot the data and the fitted regression line.

Mdl = fitlm(x,y)
figure;
plot(Mdl);
hl = legend;
hold on;
Mdl = 


Linear regression model:
    y ~ 1 + x1

Estimated Coefficients:
                   Estimate      SE       tStat       pValue  
                   ________    _______    ______    __________

    (Intercept)     2.6814     0.28433    9.4304    2.0859e-15
    x1             0.78974     0.24562    3.2153     0.0017653


Number of observations: 100, Error degrees of freedom: 98
Root Mean Squared Error: 1.43
R-squared: 0.0954,  Adjusted R-Squared 0.0862
F-statistic vs. constant model: 10.3, p-value = 0.00177

The simulated outliers appear to influence the fitted regression line.

Implement this Gibbs sampler.

  1. Draw parameters from the posterior distribution of . To do this, deflate the observations by , create a diffuse prior model with two regression coefficients, draw a set of parameters from the posterior. The first regression coefficient corresponds to the intercept, so specify that bayeslm should not include one.

  2. Compute residuals.

  3. Draw values from the conditional posterior of .

Run the Gibbs sampler for 20000 iterations and apply a burn in period of 5000. Specify , preallocate for the posterior draws, initialize to a vector of ones.

m = 20000;
nu = 1;
burnin = 5000;
lambda = ones(n,m + 1);
estBeta = zeros(2,m + 1);
estSigma2 = zeros(1,m + 1);
for j = 1:m
    yDef = y./sqrt(lambda(:,j));
    xDef = [ones(n,1) x]./sqrt(lambda(:,j));
    PriorMdl = bayeslm(2,'Model','diffuse','Intercept',false);
    [estBeta(:,j + 1),estSigma2(1,j + 1)] = simulate(PriorMdl,xDef,yDef);
    ep = y - [ones(n,1) x]*estBeta(:,j + 1);
    sp = (nu + 1)/2;
    sc =  2./(nu + ep.^2/estSigma2(1,j + 1));
    lambda(:,j + 1) = 1./gamrnd(sp,sc);
end

It is good practice to diagnose the MCMC sampler by viewing trace plot. This example skips this important task for brevity.

Compute the mean of the draws from the posterior of the regression coefficients. Remove the burn-in-period draws.

postEstBeta = mean(estBeta(:,(burnin+1):end),2)
postEstBeta =

    1.3971
    1.7051

The estimate of the intercept is lower, and the slope is higher, than estimates returned by fitlm.

Plot the robust regression line with the regression line fitted by least squares.

h = gca;
xlim = h.XLim';
plotY = [ones(2,1) xlim]*postEstBeta;
plot(xlim,plotY,'LineWidth',2);
hl.String{4} = 'Robust Bayes';

The regression line fit using robust Bayesian regression appears to be a better fit.

The maximum a posteriori probability (MAP) estimate is the posterior mode, that is, the parameter value that yields the maximum of the posterior probability distribution function. If the posterior is analytically intractable, then you can use Monte Carlo sampling to estimate the MAP.

This example is based on Simulate Parameter Value from Prior and Posterior Distributions.

Load the Nelson-Plosser data set. Create variables for the response and predictor series.

load Data_NelsonPlosser
varNames = {'IPI' 'E' 'WR'};
X = DataTable{:,varNames};
y = DataTable{:,'GNPR'};

Create a normal-inverse-gamma conjugate prior model for the linear regression parameters. Specify the number of predictors, p, and the variable names.

p = 3;
PriorMdl = bayeslm(p,'ModelType','conjugate','VarNames',varNames)
PriorMdl = 

 
           |  Mean     Std            CI95         Positive       Distribution     
-----------------------------------------------------------------------------------
 Intercept |  0      70.7107  [-141.273, 141.273]    0.500   t (0.00, 57.74^2,  6) 
 IPI       |  0      70.7107  [-141.273, 141.273]    0.500   t (0.00, 57.74^2,  6) 
 E         |  0      70.7107  [-141.273, 141.273]    0.500   t (0.00, 57.74^2,  6) 
 WR        |  0      70.7107  [-141.273, 141.273]    0.500   t (0.00, 57.74^2,  6) 
 Sigma2    | 0.5000   0.5000    [ 0.138,  1.616]     1.000   IG(3.00,    1)        
 

Estimate the posterior distribution. Return posterior estimates of the regression coefficients.

[PosteriorMdl,estBetaMean,EstBetaCov] = estimate(PriorMdl,X,y);
Method: Analytic posterior distributions
Number of observations: 62
Number of predictors:   4
Log marginal likelihood     -259.348
 
           |   Mean      Std          CI95        Positive       Distribution      
-----------------------------------------------------------------------------------
 Intercept | -24.2494  8.7821  [-41.514, -6.985]    0.003   t (-24.25, 8.65^2, 68) 
 IPI       |   4.3913  0.1414   [ 4.113,  4.669]    1.000   t (4.39, 0.14^2, 68)   
 E         |   0.0011  0.0003   [ 0.000,  0.002]    1.000   t (0.00, 0.00^2, 68)   
 WR        |   2.4683  0.3490   [ 1.782,  3.154]    1.000   t (2.47, 0.34^2, 68)   
 Sigma2    |  44.1347  7.8020   [31.427, 61.855]    1.000   IG(34.00, 0.00069)     
 

The display includes the marginal posterior distributions. estBetaMean is a 4-by-1 vector representing the mean of the marginal posterior of . estBetaCov is a 4-by-4 matrix representing the covariance matrix of the posterior of .

Draw 10000 parameter values from the posterior distribution.

rng(1); % For reproducibility
[BetaSim,sigma2Sim] = simulate(PosteriorMdl,'NumDraws',1e5);

BetaSim is a 4-by-10000 matrix of randomly drawn regression coefficients. sigma2Sim is a 1-by-10000 vector of randomly drawn disturbance variances.

Transpose and standardize the matrix of regression coefficients. Compute the correlation matrix of the regression coefficients.

estBetaStd = sqrt(diag(EstBetaCov)');
BetaSim = BetaSim';
BetaSimStd = (BetaSim - estBetaMean')./estBetaStd;
BetaCorr = corrcov(EstBetaCov);
BetaCorr = (BetaCorr + BetaCorr')/2; % Enforce symmetry

Because the marginal posterior distributions are known, evaluate the posterior pdf at all simulated values.

betaPDF = mvtpdf(BetaSimStd,BetaCorr,68);
a = 34;
b = 0.00069;
igPDF = @(x,ap,bp)1./(gamma(ap).*bp.^ap).*x.^(-ap-1).*exp(-1./(x.*bp));...
    % Inverse gamma pdf
sigma2PDF = igPDF(sigma2Sim,a,b);

Find the simulated values that maximize the respective pdfs, that is, the posterior modes.

[~,idxMAPBeta] = max(betaPDF);
[~,idxMAPSigma2] = max(sigma2PDF);
betaMAP = BetaSim(idxMAPBeta,:);
sigma2MAP = sigma2Sim(idxMAPSigma2);

betaMAP and sigma2MAP are the MAP estimates.

Because the posterior of is symmetric and unimodal, the posterior mean and MAP should be the same. Compare the MAP estimate of with its posterior mean.

table(betaMAP',PosteriorMdl.Mu,'VariableNames',{'MAP','Mean'},...
    'RowNames',PriorMdl.VarNames)
ans =

  4x2 table

                    MAP         Mean   
                 _________    _________

    Intercept      -24.559      -24.249
    IPI             4.3964       4.3913
    E            0.0011389    0.0011202
    WR              2.4473       2.4683

The estimates are fairly close to one another.

Estimate the analytical mode of the posterior of . Compare it to the estimated MAP of .

igMode = 1/(b*(a+1))
sigma2MAP
igMode =

   41.4079


sigma2MAP =

   41.4075

The estimates are also fairly close.

Input Arguments

collapse all

Bayesian linear regression model, specified as an object in this table.

ValueBayesian Linear Regression Model Description
conjugateblm model objectDependent, normal-inverse-gamma conjugate model returned by bayeslm or estimate
semiconjugateblm model objectIndependent, normal-inverse-gamma semiconjugate model returned by bayeslm
diffuseblm model objectDiffuse prior model returned by bayeslm
empiricalblm model objectSamples from prior distributions characterize prior model returned by bayeslm or estimate
customblm model objectPrior distribution function that you declare returned by bayeslm

Typically, model objects returned by estimate represent marginal posterior distributions. When you estimate a posterior using estimate, if you specify estimation of a conditional posterior, then estimate returns the prior model.

If Mdl is a diffuseblm model, then you must also supply X and y because simulate cannot draw from an improper prior distribution.

Predictor data for the multiple linear regression model, specified as a numObservations-by-PriorMdl.NumPredictors numeric matrix. numObservations is the number of observations, and must be equal to the length of y.

If Mdl is a posterior distribution, then the columns of X must correspond to the columns of the predictor data used to estimate the posterior.

Data Types: double

Response data for the multiple linear regression model, specified as a numeric vector with numObservations elements.

Data Types: double

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'Sigma2',2 specifies simulating from the conditional posterior distribution of the regression coefficients given the data and that the specified disturbance variance is 2.

Options for All Models

collapse all

Number of draws to sample from the distribution Mdl, specified as the comma-separated pair consisting of 'NumDraws' and a positive integer.

Tip

If Mdl is an empiricalblm or a customblm, then it is good practice to specify a burn-in period using BurnIn and a thinning multiplier using Thin. For details on the adjusted sample size, see Algorithms.

Example: 'NumDraws',1e7

Data Types: double

Options for All Models Except Empirical

collapse all

Value of the regression coefficients for simulation from conditional distribution of the disturbance variance, specified as the comma-separated pair consisting of 'Beta' and a (Mdl.Intercept + Mdl.NumPredictors)-by-1 numeric vector. That is, when using a posterior distribution, simulate draws from π(σ2|y,X,β = Beta), where y is y, X is X, and Beta is the value of 'Beta'. If Mdl.Intercept is true, then Beta(1) corresponds to the model intercept. All other values correspond to the predictor variables composing the columns of X.

You cannot specify Beta and Sigma2 simultaneously.

By default, simulate does not draw from the conditional posterior of σ2.

Example: 'Beta',1:3

Data Types: double

Value of the disturbance variance for simulation from the conditional distribution of the regression coefficients, specified as the comma-separated pair consisting of 'Sigma2' and a positive numeric scalar. That is, when using a posterior distribution, simulate draws from π(β|y,X,Sigma2), where y is y, X is X, and Sigma2 is the value of 'Sigma2'.

You cannot specify Beta and Sigma2 simultaneously.

By default, simulate does not draw from the conditional posterior of β.

Example: 'Sigma2',1

Data Types: double

Options for Semiconjugate and Custom Prior Distributions

collapse all

Number of draws to remove from beginning of sample to reduce transient effects, specified as the comma-separated pair consisting of 'BurnIn' and nonnegative scalar. For details on how simulate reduces the full sample, see Algorithms.

Tip

To help you specify the appropriate burn-in period size, determine the extent of the transient behavior in the sample by specifying 'BurnIn',0, simulating a few thousand observations using simulate, and then plotting the paths.

Example: 'BurnIn',0

Data Types: double

Adjusted sample size multipler, specified as the comma-separated pair consisting of 'Thin' and a positive integer.

The actual sample size is BurnIn + NumDraws*Thin. After discarding the burn-in, simulate discards every Thin1 draws, and then retains the next. For more details on how simulate reduces the full sample, see Algorithms.

Tip

To reduce potential large serial correlation in the Monte Carlo sample or to reduce the memory consumption of the draws stored in PosteriorMdl, specify a large value of Thin.

Example: 'Thin',5

Data Types: double

Starting values of the regression coefficients for the sample, specified as the comma-separated pair consisting of 'BetaStart' and a numeric column vector with (Mdl.Intercept + Mdl.NumPredictors) elements. By default, BetaStart is the ordinary least squares estimate.

Tip

It is good practice to run simulate many times using different parameter starting values. Verify that your estimates from each run converge to similar values.

Example: 'BetaStart',[1; 2; 3]

Data Types: double

Starting values of the disturbance variance for the sample, specified as the comma-separated pair consisting of 'Sigma2Start' and a positive numeric scalar. By default, Sigma2Start is the residual mean squared error of the OLS estimator.

Tip

It is good practice to run simulate many times using different parameter starting values. Verify that your estimates from each run converge to similar values.

Example: 'Sigma2Start',4

Data Types: double

Options for Custom Prior Distributions

collapse all

Reparameterize σ2 as log(σ2) during posterior estimation and simulation, specified as the comma-separated pair consisting of 'Reparameterize' and a value in this table.

ValueDescription
falsesimulate does not reparameterize σ2.
truesimulate reparameterizes σ2 as log(σ2). simulate converts results back to the original scale, and does not change the functional form of PriorMdl.LogPDF.

Tip

If you experience numerical instabilities during the posterior estimation or simulation of σ2, then specify 'Reparameterize',true.

Example: 'Reparameterize',true

Data Types: logical

MCMC sampler, specified as the comma-separated pair consisting of 'Sampler' and a value in this table.

ValueDescription
'slice'Slice sampler
'metropolis'Random walk Metropolis sampler
'hmc'Hamiltonian Monte Carlo (HMC) sampler

Tip

  • To increase the quality of the MCMC draws, tune the sampler.

    1. Before calling simulate, specify the tuning parameters and their values by using sampleroptions. For example, to specify the slice sampler width width, use

      options = sampleroptions('Sampler',"slice",'Width',width);

    2. Specify the object containing the tuning-parameter specifications returned by sampleroptions by using the 'Options' name-value pair argument. For example, to use the tuning-parameter specifications in options, specify

      'Options',options

  • If you specify the HMC sampler, then a best practice is to provide the gradient for some variables, at least. simulate resorts the numerical computation of any missing partial derivatives (NaN values) in the gradient vector.

Example: 'Sampler',"hmc"

Data Types: string

Sampler options, specified as the comma-separated pair consisting of 'Options' and a structure array returned by sampleroptions. Use 'Options' to specify the MCMC sampler and its tuning parameter values.

Example: 'Options',sampleroptions('Sampler',"hmc")

Data Types: struct

Typical sampling-interval width around the current value in the marginal distributions for the slice sampler, specified as the comma-separated pair consisting of 'Width' and a positive numeric scalar or a (PriorMdl.Intercept + PriorMdl.NumPredictors + 1)-by-1 numeric vector of positive values. The first element corresponds to the model intercept, if one exists in the model. The next PriorMdl.NumPredictors elements correspond to the coefficients of the predictor variables ordered by the predictor data columns. The last element corresponds to the model variance.

  • If Width is a scalar, then simulate applies Width to all PriorMdl.NumPredictors + PriorMdl.Intercept + 1 marginal distributions.

  • If Width is a numeric vector, then simulate applies the first element to the intercept (if one exists), the next PriorMdl.NumPredictors elements to the regression coefficients corresponding to the predictor variables in X, and the last element to the disturbance variance.

  • If the sample size (size(X,1)) is less than 100, then Width is 10 by default.

  • If the sample size is at least 100, then, by default, simulate sets Width to the vector of corresponding posterior standard deviations, assuming a diffuse prior model (diffuseblm).

simulate dispatches Width to the slicesample function. For more details, see slicesample.

Tip

  • For maximum flexibility, specify the slice sampler width width by using the 'Options' name-value pair argument. For example:

    'Options',sampleroptions('Sampler',"slice",'Width',width)

  • The typical width of the slice sampler does not affect convergence of the MCMC sample. However, it does affect the number of required function evaluations, that is, the efficiency of the algorithm. If the width is too small, then the algorithm can implement an excessive number of function evaluations to determine the appropriate sampling width. If the width is too large, then the algorithm might have to decrease the width to an appropriate size, which requires function evaluations.

Example: 'Width',[100*ones(3,1);10]

Output Arguments

collapse all

Simulated regression coefficients, returned as a (Mdl.Intercept + Mdl.NumPredictors)-by-NumDraws numeric matrix. The first row corresponds to the intercept, if one exists in the model, and all other rows correspond the regression coefficients of the predictor variables composing the columns of X. Columns correspond to individual, successive, independent draws from the distribution.

Simulated disturbance variance, returned as a 1-by-NumDraws numeric vector of positive values. Columns are commensurate with the columns of BetaSim.

Limitations

  • simulate cannot draw values from an improper distribution, that is, a distribution whose density does not integrate to 1.

  • If Mdl is an empiricalblm model object, then you cannot specify Beta or Sigma2. That is, you cannot simulate from the conditional posterior distributions using an empirical distribution.

More About

collapse all

Bayesian Linear Regression Model

A Bayesian linear regression model treats the parameters β and σ2 in the multiple linear regression (MLR) model yt = xtβ + εt as random variables.

For times t = 1,...,T:

  • yt is the observed response.

  • xt is a 1-by-(p + 1) row vector of observed values of p predictors. To accommodate a model intercept, x1t = 1 for all t.

  • β is a (p + 1)-by-1 column vector of regression coefficients corresponding to the variables composing the columns of xt.

  • εt is the random disturbance having a mean of zero and Cov(ε) = σ2IT×T, while ε is a T-by-1 vector containing all disturbances. These assumptions imply that the data likelihood is

    (β,σ2|y,x)=t=1Tϕ(yt;xtβ,σ2).

    ϕ(yt;xtβ,σ2) is the Gaussian probability density with mean xtβ and variance σ2 evaluated at yt;.

Before considering the data, a joint prior distribution assumption is imposed on (β,σ2). In a Bayesian analysis, the beliefs about the distribution of the parameters are updated using information about the parameters gleaned from the likelihood of the data. The result is the joint posterior distribution of (β,σ2) or the conditional posterior distributions of the parameters.

Algorithms

  • Whenever simulate must estimate a posterior distribution (for example, when Mdl represents a prior distribution and you supply X and y) and the posterior is analytically tractable, simulate simulates directly from the posterior. Otherwise, simulate resorts to Monte Carlo simulation to estimate the posterior. For more details, see Posterior Estimation and Inference.

  • If Mdl is a joint posterior model, then simulate simulates data from it differently than if Mdl is a joint prior model and you supply X and y. Hence, if you set the same random seed, and then generate random values both ways, then you will not obtain the same values. However, corresponding empirical distributions based on a sufficient number of draws are effectively equivalent.

  • This figure describes how simulate reduces the sample using the values of NumDraws, Thin, and BurnIn.

    Rectangles represent successive draws from the distribution. simulate removes the white rectangles from the sample. The remaining NumDraws black rectangles compose the sample.

  • If Mdl is a semiconjugateblm model object, then simulate samples from the posterior distribution by applying the Gibbs sampler. Specifically and by default:

    1. simulate uses the default value of Sigma2Start for σ2, and draws a value of β from π(β|σ2,X,y).

    2. simulate draws a value of σ2 from π(σ2|β,X,y) using the previously generated value of β.

    3. Repeat steps 1 and 2 until convergence. To assess convergence, draw a trace plot of the sample.

    If you specify BetaStart, then simulate draws a value of σ2 from π(σ2|β,X,y) to start the Gibbs sampler. simulate does not return this generated value of σ2.

  • If Mdl is an empiricalblm model object and you do not supply X and y, then simulate draws from Mdl.BetaDraws and Mdl.Sigma2Draws. If NumDraws is fewer than or equal to numel(Mdl.Sigma2Draws), then simulate returns the first NumDraws elements of Mdl.BetaDraws and Mdl.Sigma2Draws as random draws for the corresponding parameter. Otherwise, simulate randomly resamples NumDraws elements from Mdl.BetaDraws and Mdl.Sigma2Draws.

  • If Mdl is a customblm model object, then simulate uses an MCMC sampler to draw from the posterior distribution. At each iteration, the software concatenates the current values of the regression coefficients and disturbance variance into a (Mdl.Intercept + Mdl.NumPredictors + 1)-by-1 vector, and passes it to Mdl.LogPDF. The value of the disturbance variance is the last element of this vector.

  • The HMC sampler requires both the log density and its gradient. The gradient should be a (NumPredictors+Intercept+1)-by-1 vector. If the derivatives of certain parameters are difficult to compute, then, in the corresponding locations of the gradient, supply NaN values instead. simulate replaces NaN values with numerical derivatives.

Introduced in R2017a

Was this topic helpful?