# Documentation

### This is machine translation

Translated by
Mouseover text to see original. Click the button below to return to the English verison of the page.

# empiricalblm

Bayesian linear regression model with samples from prior or posterior distributions

## Description

The Bayesian linear regression model object `empiricalblm` contains samples from the prior distributions of β and σ2, which MATLAB® uses to characterize the prior or posterior distributions.

The data likelihood is $\prod _{t=1}^{T}\varphi \left({y}_{t};{x}_{t}\beta ,{\sigma }^{2}\right),$ where ϕ(yt;xtβ,σ2) is the Gaussian probability density evaluated at yt with mean xtβ and variance σ2. Because the form of the prior distribution functions are unknown, the resulting posterior distributions are not analytically tractable. Hence, to estimate or simulate from posterior distributions, MATLAB implements sampling importance resampling.

You can create a Bayesian linear regression model with an empirical prior directly using `bayeslm` or `empiricalblm`. However, for empirical priors, estimating the posterior distribution requires that the prior closely resemble the posterior. Hence, empirical models are better suited for updating posterior distributions estimated using Monte Carlo sampling (e.g., semiconjugate and custom prior models) given new data.

## Creation

### Empirical Model Objects Returned by `estimate`

For semiconjugate, empirical, or custom prior models, `estimate` estimates the posterior distribution using Monte Carlo sampling. That is, `estimate` characterizes the posterior distribution by a large number of draws from that distribution. `estimate` stores the draws in the `BetaDraws` and `Sigma2Draws` properties of the returned Bayesian linear regression model object. Hence, when you estimate `semiconjugateblm`, `empiricalblm`, or `customblm` model objects, `estimate` returns an `empiricalblm` model object.

### Direct Empirical Model Creation

If you want to update an estimated posterior distribution using new data, and you have draws from the posterior distribution of β and σ2, then you can create an empirical model using `empiricalblm`.

## Syntax

#### Description

example

````PriorMdl = empiricalblm(NumPredictors,'BetaDraws',BetaDraws,'Sigma2Draws',Sigma2Draws)` creates a Bayesian linear regression model object (`PriorMdl`) composed of `NumPredictors` predictors and an intercept. The random samples from the prior distributions of β and σ2, `BetaDraws` and `Sigma2Draws`, respectively, characterize the prior distributions. `PriorMdl` is a template defining the prior distributions and dimensionality of β.```

example

````PriorMdl = empiricalblm(NumPredictors,'BetaDraws',BetaDraws,'Sigma2Draws',Sigma2Draws,Name,Value)` uses additional options specified by one or more `Name,Value` pair arguments. `Name` is a property name, except `NumPredictors`, and `Value` is the corresponding value. `Name` must appear inside single quotes (`''`). You can specify several `Name,Value` pair arguments in any order as `Name1,Value1,...,NameN,ValueN`.```

## Properties

expand all

You can set property values when you create the model object using name-value pair argument syntax, or after model creation using dot notation. For example, to specify that there is no model intercept in `PriorMdl`, a Bayesian linear regression model containing three model coefficients, enter

`PriorMdl.Intercept = false;`

Number of predictor variables in the Bayesian multiple linear regression model, specified as a nonnegative integer.

`NumPredictors` must be the same as the number of columns in your predictor data, which you specify during model estimation or simulation.

When specifying `NumPredictors`, exclude any intercept term for the value.

After creating a model, if you change the value `NumPredictors` using dot notation, then `VarNames` reverts to its default value.

Example: `'NumPredictors',2`

Data Types: `double`

Indicate whether to include regression model intercept, specified as the comma-separated pair consisting of `'Intercept'` and a value in this table.

ValueDescription
`false`Exclude an intercept from the regression model. Hence, β is a `p`-dimensional vector, where `p` is the value of the `NumPredictors` property.
`true`Include an intercept in the regression model. Hence, β is a (`p` + 1)-dimensional vector. MATLAB prepends a T-by-1 vector of ones to the predictor data during estimation and simulation.

If you include a column of ones in the predictor data for an intercept term, then set `Intercept` to `false`.

Example: `'Intercept',false`

Predictor variable names for displays, specified as a string vector or cell vector of character vectors. `VarNames` must contain `NumPredictors` elements. `VarNames(j)` is the name of variable in column `j` of the predictor data set, which you specify during estimation, simulation, and forecasting.

The default is `{'Beta(1)','Beta(2),...,Beta(p)}`, where `p` is the value of `NumPredictors`.

Example: `'VarNames',["UnemploymentRate"; "CPI"]`

Data Types: `string` | `cell` | `char`

Random sample from the prior distribution of β, specified as a (`Intercept` + `NumPredictors`)-by-`NumDraws` numeric matrix. Rows correspond to regression coefficients; the first row corresponds to the intercept and the following rows correspond to columns in the predictor data. Columns correspond to successive draws from the prior distribution.

`BetaDraws` and `SigmaDraws` must have the same number of columns.

`NumDraws` should be reasonably large.

Data Types: `double`

Random sample from the prior distribution of σ2, specified as a 1-by-`NumDraws` numeric matrix. Columns correspond to successive draws from the prior distribution.

`BetaDraws` and `Sigma2Draws` must have the same number of columns.

`NumDraws` should be reasonably large.

Data Types: `double`

## Object Functions

 `estimate` Fit parameters of Bayesian linear regression model to data `simulate` Simulate regression coefficients and disturbance variance of Bayesian linear regression model `forecast` Forecast responses of Bayesian linear regression model `plot` Visualize prior and posterior densities of Bayesian linear regression model parameters `summarize` Distribution summary statistics of Bayesian linear regression model

## Examples

expand all

Consider the multiple linear regression model that predicts U.S. real gross national product (`GNPR`) using a linear combination of industrial production index (`IPI`), total employment (`E`), and real wages (`WR`).

For all , is a series of independent Gaussian disturbances with a mean of 0 and variance .

Assume that the prior distributions are:

• . is a 4-by-1 vector of means and is a scaled 4-by-4 positive definite covariance matrix.

• . and are the shape and scale, respectively, of an inverse gamma distribution.

These assumptions and the data likelihood imply a normal-inverse-gamma semiconjugate model. That is, the conditional posteriors are conjugate to the prior with respect to the data likelihood, but the marginal posterior is analytically intractable.

Create a normal-inverse-gamma semiconjugate prior model for the linear regression parameters. Specify the number of predictors, `p`.

```p = 3; PriorMdl = bayeslm(p,'ModelType','semiconjugate') ```
```PriorMdl = | Mean Std CI95 Positive Distribution ------------------------------------------------------------------------------- Intercept | 0 100 [-195.996, 195.996] 0.500 N (0.00, 100.00^2) Beta(1) | 0 100 [-195.996, 195.996] 0.500 N (0.00, 100.00^2) Beta(2) | 0 100 [-195.996, 195.996] 0.500 N (0.00, 100.00^2) Beta(3) | 0 100 [-195.996, 195.996] 0.500 N (0.00, 100.00^2) Sigma2 | 0.5000 0.5000 [ 0.138, 1.616] 1.000 IG(3.00, 1) ```

`Mdl` is a `semiconjugateblm` Bayesian linear regression model object representing the prior distribution of the regression coefficients and disturbance variance. At the command window, `bayeslm` displays a summary of the prior distributions.

Load the Nelson-Plosser data set. Create variables for the response and predictor series.

```load Data_NelsonPlosser VarNames = {'IPI'; 'E'; 'WR'}; X = DataTable{:,VarNames}; y = DataTable{:,'GNPR'}; ```

Estimate the marginal posterior distributions of and .

```rng(1); % For reproducibility PosteriorMdl = estimate(PriorMdl,X,y); ```
```Method: Gibbs sampling with 10000 draws Number of observations: 62 Number of predictors: 4 | Mean Std CI95 Positive Distribution ------------------------------------------------------------------------- Intercept | -23.9922 9.0520 [-41.734, -6.198] 0.005 Empirical Beta(1) | 4.3929 0.1458 [ 4.101, 4.678] 1.000 Empirical Beta(2) | 0.0011 0.0003 [ 0.000, 0.002] 0.999 Empirical Beta(3) | 2.4711 0.3576 [ 1.762, 3.178] 1.000 Empirical Sigma2 | 46.7474 8.4550 [33.099, 66.126] 1.000 Empirical ```

`PosteriorMdl` is an `empiricalblm` model object storing draws from the posterior distributions of and given the data. `estimate` displays a summary of the marginal posterior distributions to the command window. Rows of the summary correspond to regression coefficients and the disturbance variance, and columns to characteristics of the posterior distribution. The characteristics include:

• `CI95`, which contains the 95% Bayesian equitailed credible intervals for the parameters. For example, the posterior probability that the regression coefficient of `WR` is in [1.762, 3.178] is 0.95.

• `Positive`, which contains the posterior probability that the parameter is greater than 0. For example, the probability that the intercept is greater than 0 is 0.005.

In this case, the marginal posterior is analytically intractable. Hence, `estimate` uses Gibbs sampling to draw from the posterior and estimate the posterior characteristics.

This example is based on Create Empirical Prior Model.

Create a normal-inverse-gamma semiconjugate prior model for the linear regression parameters. Specify the number of predictors, `p`, and the names of the regression coefficients.

```p = 3; PriorMdl = bayeslm(p,'ModelType','semiconjugate','VarNames',["IPI" "E" "WR"]); ```

Load the Nelson-Plosser data set. Partition the data by reserving the last 5 periods in the series.

```load Data_NelsonPlosser X0 = DataTable{1:end-5,PriorMdl.VarNames(2:end)}; y0 = DataTable{1:end-5,'GNPR'}; X1 = DataTable{(end-4):end,PriorMdl.VarNames(2:end)}; y1 = DataTable{(end-4):end,'GNPR'}; ```

Estimate the marginal posterior distributions of and .

```rng(1); % For reproducibility PosteriorMdl0 = estimate(PriorMdl,X0,y0); ```
```Method: Gibbs sampling with 10000 draws Number of observations: 57 Number of predictors: 4 | Mean Std CI95 Positive Distribution --------------------------------------------------------------------------- Intercept | -34.3887 10.5218 [-55.350, -13.615] 0.001 Empirical IPI | 3.9076 0.2786 [ 3.356, 4.459] 1.000 Empirical E | 0.0011 0.0003 [ 0.000, 0.002] 0.999 Empirical WR | 3.2146 0.4967 [ 2.228, 4.196] 1.000 Empirical Sigma2 | 45.3098 8.5597 [31.620, 64.972] 1.000 Empirical ```

`PosteriorMdl0` is an `empiricalblm` model object storing the Gibbs-sampling draws from the posterior distribution.

Update the posterior distribution based on the last 5 periods of data by passing those observations and the posterior distribution to `estimate`.

```PosteriorMdl1 = estimate(PosteriorMdl0,X1,y1); ```
```Method: Importance sampling/resampling with 10000 draws Number of observations: 5 Number of predictors: 4 | Mean Std CI95 Positive Distribution ------------------------------------------------------------------------- Intercept | -24.3152 9.3408 [-41.163, -5.301] 0.008 Empirical IPI | 4.3893 0.1440 [ 4.107, 4.658] 1.000 Empirical E | 0.0011 0.0004 [ 0.000, 0.002] 0.998 Empirical WR | 2.4763 0.3694 [ 1.630, 3.170] 1.000 Empirical Sigma2 | 46.5211 8.2913 [33.646, 65.402] 1.000 Empirical ```

To update the posterior distributions based on draws, `estimate` uses sampling importance resampling.

This example is based on Estimate Marginal Posterior Distribution.

Create and estimate the marginal posterior distributions. Extract posterior distribution summary statistics for .

```p = 3; PriorMdl = bayeslm(p,'ModelType','semiconjugate','VarNames',["IPI" "E" "WR"]); load Data_NelsonPlosser X = DataTable{:,PriorMdl.VarNames(2:end)}; y = DataTable{:,'GNPR'}; rng(1); % For reproducibility [PosteriorMdl,estBeta,estBetaCov] = estimate(PriorMdl,X,y); ```
```Method: Gibbs sampling with 10000 draws Number of observations: 62 Number of predictors: 4 | Mean Std CI95 Positive Distribution ------------------------------------------------------------------------- Intercept | -23.9922 9.0520 [-41.734, -6.198] 0.005 Empirical IPI | 4.3929 0.1458 [ 4.101, 4.678] 1.000 Empirical E | 0.0011 0.0003 [ 0.000, 0.002] 0.999 Empirical WR | 2.4711 0.3576 [ 1.762, 3.178] 1.000 Empirical Sigma2 | 46.7474 8.4550 [33.099, 66.126] 1.000 Empirical ```

Suppose that if the coefficient of real wages is below 2.5, then a policy is enacted. Although the posterior distribution of `WR` is known, and so you can calculate probabilities directly, you can estimate the probability using Monte Carlo simulation instead.

Draw `1e6` samples from the marginal posterior distribution of .

```NumDraws = 1e6; rng(1); BetaSim = simulate(PosteriorMdl,'NumDraws',NumDraws); ```

`BetaSim` is a 4-by- `1e6` matrix containing the draws. Rows correspond to the regression coefficient and columns to successive draws.

Isolate the draws corresponding to the coefficient of real wages, and then identify which draws are less than 2.5.

```isWR = PosteriorMdl.VarNames == "WR"; wrSim = BetaSim(isWR,:); isWRLT2p5 = wrSim < 2.5; ```

Find the marginal posterior probability that the regression coefficient of `WR` is below 2.5 by computing the proportion of draws that are less than 2.5.

```probWRLT2p5 = mean(isWRLT2p5) ```
```probWRLT2p5 = 0.5283 ```

The posterior probability that the coefficient of real wages is less than 2.5 is about `0.53`.

This example is based on Estimate Marginal Posterior Distribution.

Create and estimate the marginal posterior distributions. Hold out the last 10 periods of data from estimation so that they can be used for forecasting real GNP. Turn the estimation display off.

```p = 3; PriorMdl = bayeslm(p,'ModelType','semiconjugate','VarNames',["IPI" "E" "WR"]); load Data_NelsonPlosser fhs = 10; % Forecast horizon size X = DataTable{1:(end - fhs),PriorMdl.VarNames(2:end)}; y = DataTable{1:(end - fhs),'GNPR'}; XF = DataTable{(end - fhs + 1):end,PriorMdl.VarNames(2:end)}; % Future predictor data yFT = DataTable{(end - fhs + 1):end,'GNPR'}; % True future responses rng(1); % For reproducibility PosteriorMdl = estimate(PriorMdl,X,y,'Display',false);```

Forecast responses using the posterior predictive distribution and using the future predictor data `XF`. Plot the true values of the response and the forecasted values.

```yF = forecast(PosteriorMdl,XF); figure; plot(dates,DataTable.GNPR); hold on plot(dates((end - fhs + 1):end),yF) h = gca; hp = patch([dates(end - fhs + 1) dates(end) dates(end) dates(end - fhs + 1)],... h.YLim([1,1,2,2]),[0.8 0.8 0.8])```
```hp = Patch with properties: FaceColor: [0.8000 0.8000 0.8000] FaceAlpha: 1 EdgeColor: [0 0 0] LineStyle: '-' Faces: [1 2 3 4] Vertices: [4x2 double] Show all properties ```
```uistack(hp,'bottom'); legend('True GNPR','Forecasted GNPR','Forecast Horizon','Location','NW') title('Real Gross National Product: 1909 - 1970'); ylabel('rGNP'); xlabel('Year'); hold off```

`yF` is a 10-by-1 vector of future values of real GNP corresponding to the future predictor data.

Estimate the forecast root mean squared error (RMSE).

`frmse = sqrt(mean((yF - yFT).^2))`
```frmse = 25.1938 ```

Forecast RMSE is a relative measure of forecast accuracy. Specifically, you estimate several models using different assumptions. The model with the lowest forecast RMSE is the best performing model of the ones being compared.

expand all

## Algorithms

• After implementing sampling importance resampling to sample from the posterior distribution, `estimate`, `simulate`, and `forecast` compute the effective sample size (ESS), which is the number of samples required to yield reasonable posterior statistics and inferences. Its formula is

`$ESS=\frac{1}{{\sum }_{j}{w}_{j}^{2}}.$`
If ESS < `0.01*NumDraws`, then MATLAB throws a warning. The warning implies that, given the sample from the prior distribution, the sample from the proposal distribution is too small to yield good quality posterior statistics and inferences.

• If the effective sample size is too small, then:

• Increase the sample size of the draws from the prior distributions.

• Adjust the prior distribution hyperparameters, and then resample from them.

• Specify `BetaDraws` and `Sigma2Draws` as samples from informative prior distributions. That is, if the proposal draws come from nearly flat distributions, then the algorithm can be inefficient.