# Documentation

### This is machine translation

Translated by
Mouseover text to see original. Click the button below to return to the English verison of the page.

# forecast

Forecast responses of Bayesian linear regression model

## Syntax

``yF = forecast(Mdl,XF)``
``yF = forecast(Mdl,XF,X,y)``
``yF = forecast(___,Name,Value)``
``````[yF,YFCov] = forecast(___)``````

## Description

example

````yF = forecast(Mdl,XF)` returns `numPeriods` forecasted responses from the Bayesian linear regression model `Mdl` given the predictor data in `XF`, a matrix with `numPeriods` rows.To estimate the forecast, `forecast` uses the mean of the `numPeriods`-dimensional posterior predictive distribution. If `Mdl` is a joint prior model (returned by `bayeslm`), then `forecast` uses only the joint prior distribution and the innovation distribution to form the predictive distribution.If `Mdl` is a posterior model (returned by `estimate`), then `forecast` uses the posterior predictive distribution. `NaN`s in the data indicate missing values, which `forecast` removes using list-wise deletion.```

example

````yF = forecast(Mdl,XF,X,y)` forecasts using the posterior predictive distribution produced or updated by incorporating the predictor data `X` and corresponding response data `y`. If `Mdl` is a joint prior model, then `forecast` produces the posterior predictive distribution by “updating” the prior model with information about the parameters that it gleans from the data.If `Mdl` is a posterior model, then `forecast` updates the posteriors with information about the parameters that it gleans from the additional data. That is, the complete data likelihood is composed of the additional data `X` and `y`, and the data that created `Mdl`. ```

example

````yF = forecast(___,Name,Value)` uses any of the input arguments in the previous syntaxes and additional options specified by one or more `Name,Value` pair arguments. For example, you can specify a value for one of β or σ2 to forecast from the conditional predictive distribution of one parameter given the specified value of the other parameter.```

example

``````[yF,YFCov] = forecast(___)``` additionally returns the covariance matrix of the `numPeriods`-dimensional posterior predictive distribution. Standard deviations of the forecasts are the square roots of the diagonal elements.```

## Examples

collapse all

Consider the multiple linear regression model that predicts U.S. real gross national product (`GNPR`) using a linear combination of industrial production index (`IPI`), total employment (`E`), and real wages (`WR`).

For all , is a series of independent Gaussian disturbances with a mean of 0 and variance .

Assume that the prior distributions are:

• . is a 4-by-1 vector of means and is a scaled 4-by-4 positive definite covariance matrix.

• . and are the shape and scale, respectively, of an inverse gamma distribution.

These assumptions and the data likelihood imply a normal-inverse-gamma conjugate model.

Create a normal-inverse-gamma conjugate prior model for the linear regression parameters. Specify the number of predictors, `p`, and the variable names.

```p = 3; VarNames = ["IPI" "E" "WR"]; PriorMdl = bayeslm(p,'ModelType','conjugate','VarNames',VarNames); ```

`Mdl` is a `conjugateblm` Bayesian linear regression model object representing the prior distribution of the regression coefficients and disturbance variance.

Load the Neloson-Plosser data set. Create variables for the predictor and response data. Hold out the last 10 periods of data from estimation so that they can be used for forecasting real GNP.

```load Data_NelsonPlosser fhs = 10; % Forecast horizon size X = DataTable{1:(end - fhs),PriorMdl.VarNames(2:end)}; y = DataTable{1:(end - fhs),'GNPR'}; XF = DataTable{(end - fhs + 1):end,PriorMdl.VarNames(2:end)}; % Future predictor data yFT = DataTable{(end - fhs + 1):end,'GNPR'}; % True future responses ```

Estimate the marginal posterior distributions. Turn the estimation display off.

```PosteriorMdl = estimate(PriorMdl,X,y,'Display',false); ```

`PosteriorMdl` is a `conjugateblm` model object that contains the posetrior distributions of and .

Forecast responses using the posterior predictive distribution and using the future predictor data `XF`. Plot the true values of the response and the forecasted values.

```yF = forecast(PosteriorMdl,XF); figure; plot(dates,DataTable.GNPR); hold on plot(dates((end - fhs + 1):end),yF) h = gca; p = patch([dates(end - fhs + 1) dates(end) dates(end) dates(end - fhs + 1)],... h.YLim([1,1,2,2]),[0.8 0.8 0.8]); uistack(p,'bottom'); legend('True GNPR','Forecasted GNPR','Forecast Horizon','Location','NW') title('Real Gross National Product: 1909 - 1970'); ylabel('rGNP'); xlabel('Year'); hold off ```

`yF` is a 10-by-1 vector of future values of real GNP corresponding to the future predictor data.

Estimate the forecast root mean squared error (RMSE).

```frmse = sqrt(mean((yF - yFT).^2)) ```
```frmse = 25.5397 ```

Forecast RMSE is a relative measure of forecast accuracy. Specifically, you estimate several models using different assumptions. The model with the lowest forecast RMSE is the best performing model of the ones being compared.

This example is based on Forecast Responses Using Posterior Predictive Distribution.

Create a normal-inverse-gamma semiconjugate prior model for the linear regression parameters. Specify the number of predictors, `p`, and the names of the regression coefficients.

```p = 3; PriorMdl = bayeslm(p,'ModelType','semiconjugate','VarNames',["IPI" "E" "WR"]); ```

Load the Nelson-Plosser data set. Create variables for the response and predictor series.

```load Data_NelsonPlosser X = DataTable{:,PriorMdl.VarNames(2:end)}; y = DataTable{:,'GNPR'}; ```

Hold out the last 10 periods of data from estimation so that they can be used for forecasting real GNP. Turn the estimation display off.

```fhs = 10; % Forecast horizon size X = DataTable{1:(end - fhs),PriorMdl.VarNames(2:end)}; y = DataTable{1:(end - fhs),'GNPR'}; XF = DataTable{(end - fhs + 1):end,PriorMdl.VarNames(2:end)}; % Future predictor data yFT = DataTable{(end - fhs + 1):end,'GNPR'}; % True future responses ```

Forecast responses using the posterior predictive distribution and using the future predictor data `XF`. Specify the in-sample observations `X` and `y` (that is, the observations from which MATLAB® composes the posterior).

```yF = forecast(PriorMdl,XF,X,y) ```
```yF = 491.5404 518.1725 539.0625 566.7594 597.7005 633.4666 644.7270 672.7937 693.5321 678.2268 ```

This example is based on Forecast Responses Using Posterior Predictive Distribution.

Create a normal-inverse-gamma semiconjugate prior model for the linear regression parameters. Specify the number of predictors, `p`, and the names of the regression coefficients.

```p = 3; PriorMdl = bayeslm(p,'ModelType','semiconjugate','VarNames',["IPI" "E" "WR"]);```

Load the Nelson-Plosser data set. Create variables for the response and predictor series.

```load Data_NelsonPlosser X = DataTable{:,PriorMdl.VarNames(2:end)}; y = DataTable{:,'GNPR'};```

Hold out the last 10 periods of data from estimation so that they can be used for forecasting real GNP. Turn the estimation display off.

```fhs = 10; % Forecast horizon size X = DataTable{1:(end - fhs),PriorMdl.VarNames(2:end)}; y = DataTable{1:(end - fhs),'GNPR'}; XF = DataTable{(end - fhs + 1):end,PriorMdl.VarNames(2:end)}; % Future predictor data yFT = DataTable{(end - fhs + 1):end,'GNPR'}; % True future responses```

Forecast responses using the conditional posterior predictive distribution of beta given and using the future predictor data `XF`. Specify the in-sample observations `X` and `y` (that is, the observations from which MATLAB® composes the posterior). Plot the true values of the response and the forecasted values.

```yF = forecast(PriorMdl,XF,X,y,'Sigma2',2); figure; plot(dates,DataTable.GNPR); hold on plot(dates((end - fhs + 1):end),yF) h = gca; hp = patch([dates(end - fhs + 1) dates(end) dates(end) dates(end - fhs + 1)],... h.YLim([1,1,2,2]),[0.8 0.8 0.8])```
```hp = Patch with properties: FaceColor: [0.8000 0.8000 0.8000] FaceAlpha: 1 EdgeColor: [0 0 0] LineStyle: '-' Faces: [1 2 3 4] Vertices: [4x2 double] Show all properties ```
```uistack(hp,'bottom'); legend('True GNPR','Forecasted GNPR','Forecast Horizon','Location','NW') title('Real Gross National Product: 1909 - 1970'); ylabel('rGNP'); xlabel('Year'); hold off```

This example is based on Forecast Responses Using Posterior Predictive Distribution.

Create a normal-inverse-gamma semiconjugate prior model for the linear regression parameters. Specify the number of predictors, `p`, and the names of the regression coefficients.

```p = 3; PriorMdl = bayeslm(p,'ModelType','semiconjugate','VarNames',["IPI" "E" "WR"]); ```

Load the Nelson-Plosser data set. Create variables for the response and predictor series.

```load Data_NelsonPlosser X = DataTable{:,PriorMdl.VarNames(2:end)}; y = DataTable{:,'GNPR'}; ```

Hold out the last 10 periods of data from estimation so that they can be used for forecasting real GNP. Turn the estimation display off.

```fhs = 10; % Forecast horizon size X = DataTable{1:(end - fhs),PriorMdl.VarNames(2:end)}; y = DataTable{1:(end - fhs),'GNPR'}; XF = DataTable{(end - fhs + 1):end,PriorMdl.VarNames(2:end)}; % Future predictor data yFT = DataTable{(end - fhs + 1):end,'GNPR'}; % True future responses ```

Forecast responses and their covariance matrix using the posterior predictive distribution and using the future predictor data `XF`. Specify the in-sample observations `X` and `y` (that is, the observations from which MATLAB® composes the posterior).

```[yF,YFCov] = forecast(PriorMdl,XF,X,y); ```

Because the predictive posterior distribution is not analytical, a reasonable approximation to a set of 95% credible intervals is

for all in the forecast horizon. Estimate 95% credible intervals for the forecasts using this formula.

```n = sum(all(~isnan([X y]'))); cil = yF - norminv(0.975)*sqrt(diag(YFCov)); ciu = yF + norminv(0.975)*sqrt(diag(YFCov)); ```

Plot the data, forecasts, and forecast intervals.

```figure; plot(dates(end-30:end),DataTable.GNPR(end-30:end)); hold on h = gca; plot(dates((end - fhs + 1):end),yF) plot(dates((end - fhs + 1):end),[cil ciu],'k--') hp = patch([dates(end - fhs + 1) dates(end) dates(end) dates(end - fhs + 1)],... h.YLim([1,1,2,2]),[0.8 0.8 0.8]); uistack(hp,'bottom'); legend('True GNPR','Forecasted GNPR','Forecast horizon',... 'Credible interval','Location','NW') title('Real Gross National Product: 1909 - 1970'); ylabel('rGNP'); xlabel('Year'); hold off ```

## Input Arguments

collapse all

Bayesian linear regression model, specified as an object in this table.

ValueBayesian Linear Regression Model Description
`conjugateblm` model objectDependent, normal-inverse-gamma conjugate model returned by `bayeslm` or `estimate`
`semiconjugateblm` model objectIndependent, normal-inverse-gamma semiconjugate model returned by `bayeslm`
`diffuseblm` model objectDiffuse prior model returned by `bayeslm`
`empiricalblm` model objectSamples from prior distributions characterize prior model returned by `bayeslm` or `estimate`
`customblm` model objectPrior distribution function that you declare returned by `bayeslm`

Typically, model objects returned by `estimate` represent marginal posterior distributions. When you estimate a posterior using `estimate`, if you specify estimation of a conditional posterior, then `estimate` returns the prior model.

Forecast-horizon predictor data, specified as a `numPeriods`-by-`PriorMdl.NumPredictors` numeric matrix. `numPeriods` determines the length of the forecast horizon. Columns of `XF` must be commensurate with the columns of any other predictor data set, i.e., `X` or those that went into forming the posterior distribution in `Mdl`.

Data Types: `double`

Predictor data for the multiple linear regression model, specified as a `numObservations`-by-`PriorMdl.NumPredictors` numeric matrix. `numObservations` is the number of observations, and must be equal to the length of `y`.

If `Mdl` is a posterior distribution, then the columns of `X` must correspond to the columns of the predictor data used to estimate the posterior.

Data Types: `double`

Response data for the multiple linear regression model, specified as a numeric vector with `numObservations` elements.

Data Types: `double`

### Name-Value Pair Arguments

Specify optional comma-separated pairs of `Name,Value` arguments. `Name` is the argument name and `Value` is the corresponding value. `Name` must appear inside single quotes (`' '`). You can specify several name and value pair arguments in any order as `Name1,Value1,...,NameN,ValueN`.

Example: `'Sigma2',2` specifies forcasting from the conditional predictive distribution given that the specified disturbance variance is `2`.

#### Options for All Models Except Empirical

collapse all

Value of the regression coefficients for forecasting from the conditional predictive distribution given the regression coefficients, specified as the comma-separated pair consisting of `'Beta'` and a (`Mdl.Intercept` + `Mdl.NumPredictors`)-by-1 numeric vector. That is, when using a posterior distribution, `forecast` forecasts from $\pi \left(\stackrel{^}{y}|y,X,\beta =Beta\right)$, where `y` is `y`, `X` is `X`, and `Beta` is the value of `'Beta'`. If `Mdl.Intercept` is `true`, then `Beta(1)` corresponds to the model intercept. All other values correspond to the predictor variables composing the columns of `X`.

You cannot specify `Beta` and `Sigma2` simultaneously.

By default, `forecast` does not forecast from the conditional predictive distribution given β.

Example: `'Beta',1:3`

Data Types: `double`

Value of the disturbance variance forecasting from the conditional predictive distribution given the disturbance variance, specified as the comma-separated pair consisting of `'Sigma2'` and a positive numeric scalar. That is, when using a posterior distribution, `forecast` draws from $\pi \left(\stackrel{^}{y}|y,X,{\sigma }^{2}=Sigma2\right)$, where `y` is `y`, `X` is `X`, and `Sigma2` is the value of `'Sigma2'`.

You cannot specify `Beta` and `Sigma2` simultaneously.

By default, `forecast` does not draw from the conditional predictive posterior of σ2.

Example: `'Sigma2',1`

Data Types: `double`

#### Options for Semiconjugate, Empirical, and Custom Prior Distributions

collapse all

Monte Carlo simulation adjusted sample size, specified as the comma-separated pair consisting of `'NumDraws'` and a positive integer. `forecast` actually draws `BurnIn``NumDraws*``Thin` samples. Hence, `forecast` bases the estimates off of `NumDraws` samples. For details on how `forecast` reduces the full Monte Carlo sample, see Algorithms.

If `Mdl` is a `semiconjugateblm` model and you specify `Beta` or `Sigma2`, then MATLAB® ignores `NumDraws`.

Example: `'NumDraws',1e7`

Data Types: `double`

#### Options for Semiconjugate and Custom Prior Distributions

collapse all

Number of draws to remove from beginning of Monte Carlo sample to reduce transient effects, specified as the comma-separated pair consisting of `'BurnIn'` and nonnegative scalar. For details on how `forecast` reduces the full Monte Carlo sample, see Algorithms.

### Tip

To help you specify the appropriate burn-in period size, determine the extent of the transient behavior in the Monte Carlo sample by specifying `'BurnIn',0`, simulating a few thousand observations using `simulate`, and then plotting the paths.

Example: `'BurnIn',0`

Data Types: `double`

Monte Carlo adjusted sample size multipler, specified as the comma-separated pair consisting of `'Thin'` and a positive integer.

The actual Monte Carlo sample size is `BurnIn` + `NumDraws``*Thin`. After discarding the burn-in, `forecast` discards every `Thin``1` draws, and then retains the next. For details on how `forecast` reduces the full Monte Carlo sample, see Algorithms.

### Tip

To reduce potential large serial correlation in the Monte Carlo sample or to reduce the memory consumption of the draws stored in `Mdl`, specify a large value of `Thin`.

Example: `'Thin',5`

Data Types: `double`

Starting values of the regression coefficients for the Markov Chain Monte Carlo (MCMC) sample, specified as the comma-separated pair consisting of `'BetaStart'` and a numeric column vector with (`PriorMdl.Intercept` + `PriorMdl.NumPredictors`) elements. By default, `BetaStart` is the ordinary least squares estimate.

### Tip

It is good practice to run `forecast` many times using different parameter starting values. Verify that the solutions from each run converge to similar values.

Example: `'BetaStart',[1; 2; 3]`

Data Types: `double`

Starting values of the disturbance variance for the MCMC sample, specified as the comma-separated pair consisting of `'Sigma2Start'` and a positive numeric scalar. By default, `Sigma2Start` is the residual mean squared error of the OLS estimator.

### Tip

It is good practice to run `forecast` many times using different parameter starting values. Verify that the solutions from each run converge to similar values.

Example: `'Sigma2Start',4`

Data Types: `double`

#### Options for Custom Prior Distributions

collapse all

Reparameterize σ2 as log(σ2) during posterior estimation and simulation, specified as the comma-separated pair consisting of `'Reparameterize'` and a value in this table.

ValueDescription
`false``forecast` does not reparameterize σ2.
`true``forecast` reparameterizes σ2 as log(σ2). `forecast` converts results back to the original scale, and does not change the functional form of `PriorMdl.LogPDF`.

### Tip

If you experience numerical instabilities during the posterior estimation or simulation of σ2, then specify `'Reparameterize',true`.

Example: `'Reparameterize',true`

Data Types: `logical`

MCMC sampler, specified as the comma-separated pair consisting of `'Sampler'` and a value in this table.

ValueDescription
`'slice'`Slice sampler
`'metropolis'`Random walk Metropolis sampler
`'hmc'`Hamiltonian Monte Carlo (HMC) sampler

### Tip

• To increase the quality of the MCMC draws, tune the sampler.

1. Before calling `forecast`, specify the tuning parameters and their values by using `sampleroptions`. For example, to specify the slice sampler width `width`, use

`options = sampleroptions('Sampler',"slice",'Width',width);`

2. Specify the object containing the tuning-parameter specifications returned by `sampleroptions` by using the `'Options'` name-value pair argument. For example, to use the tuning-parameter specifications in `options`, specify

``'Options',options``

• If you specify the HMC sampler, then a best practice is to provide the gradient for some variables, at least. `forecast` resorts the numerical computation of any missing partial derivatives (`NaN` values) in the gradient vector.

Example: `'Sampler',"hmc"`

Data Types: `string`

Sampler options, specified as the comma-separated pair consisting of `'Options'` and a structure array returned by `sampleroptions`. Use `'Options'` to specify the MCMC sampler and its tuning parameter values.

Example: `'Options',sampleroptions('Sampler',"hmc")`

Data Types: `struct`

Typical sampling-interval width around the current value in the marginal distributions for the slice sampler, specified as the comma-separated pair consisting of `'Width'` and a positive numeric scalar or a (`PriorMdl.Intercept` + `PriorMdl.NumPredictors` + `1`)-by-1 numeric vector of positive values. The first element corresponds to the model intercept, if one exists in the model. The next `PriorMdl.NumPredictors` elements correspond to the coefficients of the predictor variables ordered by the predictor data columns. The last element corresponds to the model variance.

• If `Width` is a scalar, then `forecast` applies `Width` to all `PriorMdl.NumPredictors` + `PriorMdl.Intercept` + `1` marginal distributions.

• If `Width` is a numeric vector, then `forecast` applies the first element to the intercept (if one exists), the next `PriorMdl.NumPredictors` elements to the regression coefficients corresponding to the predictor variables in `X`, and the last element to the disturbance variance.

• If the sample size (`size(X,1)`) is less than 100, then `Width` is `10` by default.

• If the sample size is at least 100, then, by default, `forecast` sets `Width` to the vector of corresponding posterior standard deviations, assuming a diffuse prior model (`diffuseblm`).

`forecast` dispatches `Width` to the `slicesample` function. For more details, see `slicesample`.

### Tip

• For maximum flexibility, specify the slice sampler width `width` by using the `'Options'` name-value pair argument. For example:

`'Options',sampleroptions('Sampler',"slice",'Width',width)`

• The typical width of the slice sampler does not affect convergence of the MCMC sample. However, it does affect the number of required function evaluations, that is, the efficiency of the algorithm. If the width is too small, then the algorithm can implement an excessive number of function evaluations to determine the appropriate sampling width. If the width is too large, then the algorithm might have to decrease the width to an appropriate size, which requires function evaluations.

Example: `'Width',[100*ones(3,1);10]`

## Output Arguments

collapse all

Forecasted responses, that is, the mean of the predictive distribution, returned as a `numPeriods`-by-1 numeric vector. Rows correspond to the rows of `XF`.

Covariance matrix of the predictive distribution, returned as a `numPeriods`-by-`numPeriods` numeric, symmetric, positive definite matrix. Rows and columns correspond to the rows of `yF`.

To obtain a vector of standard deviations for the forecasted responses, enter `sqrt(diag(YFCov))`.

## Limitations

If `Mdl` is an `empiricalblm` model object, then you cannot specify `Beta` or `Sigma2`. That is, you cannot forecast from conditional predictive distributions using an empirical prior distribution.

collapse all

### Bayesian Linear Regression Model

A Bayesian linear regression model treats the parameters β and σ2 in the multiple linear regression (MLR) model yt = xtβ + εt as random variables.

For times t = 1,...,T:

• yt is the observed response.

• xt is a 1-by-(p + 1) row vector of observed values of p predictors. To accommodate a model intercept, x1t = 1 for all t.

• β is a (p + 1)-by-1 column vector of regression coefficients corresponding to the variables composing the columns of xt.

• εt is the random disturbance having a mean of zero and Cov(ε) = σ2IT×T, while ε is a T-by-1 vector containing all disturbances. These assumptions imply that the data likelihood is

`$\ell \left(\beta ,{\sigma }^{2}|y,x\right)=\prod _{t=1}^{T}\varphi \left({y}_{t};{x}_{t}\beta ,{\sigma }^{2}\right).$`
ϕ(yt;xtβ,σ2) is the Gaussian probability density with mean xtβ and variance σ2 evaluated at yt;.

Before considering the data, a joint prior distribution assumption is imposed on (β,σ2). In a Bayesian analysis, the beliefs about the distribution of the parameters are updated using information about the parameters gleaned from the likelihood of the data. The result is the joint posterior distribution of (β,σ2) or the conditional posterior distributions of the parameters.

## Tips

• Monte Carlo simulation is subject to variation. That is, if `forecast` uses Monte Carlo simulation, then estimates and inferences might vary when you call `forecast` multiple times under seemingly equivalent conditions. To reproduce estimation results, set a random number seed using `rng` before calling `forecast`.

• If `forecast` throws an error while estimating the posterior distribution using a custom prior model, then try adjusting initial parameter values using `BetaStart` or `Sigma2Start`, or try adjusting the declared log prior function and then reconstructing the model. The error can indicate that the log of the prior distribution is `-Inf` at the specified initial values.

## Algorithms

• Whenever `forecast` must estimate a posterior distribution (for example, when `Mdl` represents a prior distribution and you supply `X` and `y`) and the posterior is analytically tractable, `forecast` evaluates the closed form solutions to Bayes estimators. Otherwise, `forecast` resorts to Monte Carlo simulation to forecast using the posterior predictive distribution. For more details, see Posterior Estimation and Inference.

• This figure describes how `forecast` reduces the Monte Carlo sample using the values of `NumDraws`, `Thin`, and `BurnIn`.

Rectangles represent successive draws from the distribution. `forecast` removes the white rectangles from the Monte Carlo sample. The remaining `NumDraws` black rectangles compose the Monte Carlo sample.