# Bayesian Lasso Regression

This example shows how to perform variable selection using Bayesian lasso regression.

Lasso regression is a linear regression technique that combines regularization and variable selection. Regularization helps prevent overfitting by decreasing the magnitude of the regression coefficients. What sets lasso regression apart from other regularization techniques, such as ridge regression, is that lasso sets the coefficients of unimportant predictors exactly to 0.

Consider a simple linear regression model

$$y = \beta_0 1_n + X\beta + \varepsilon,$$

where $y$ is a vector of *n* responses, $X$ is an *n*-by-*p* matrix of corresponding observed predictor variables, $\beta$ is a vector of *p* regression coefficients, $\beta_0$ is the intercept, $1_n$ is a length-*n* column vector of ones, and $\varepsilon$ is a random vector of i.i.d. Gaussian disturbances with a mean of 0 and variance of $\sigma^2$. The objective function for lasso regression is

$$f\left(\beta_0,\beta\right) = \frac{1}{2n}\sum_{i=1}^{n}\left(y_i - \beta_0 - x_i\beta\right)^{2} + \lambda\sum_{j=1}^{p}\left|\beta_j\right|,$$

where $x_i$ is row $i$ of $X$. $\lambda$ is the penalty term; unlike the other parameters, it is not fit to the data. You must choose a value for it before estimation, and you might have to tune it. After you choose a value for $\lambda$, the objective $f\left(\beta_0,\beta\right)$ is minimized with respect to the intercept and regression coefficients. The resulting coefficients are the lasso estimates.

For more details on lasso regression, see docid:econ_ug.bvmwdog.

## Load Data

Load the acetylene data set. Create a variable for all predictors.

```
load acetylene
X = [x1 x2 x3];
```

The sample size is 16, and there are 3 predictors. For more details, enter `Description` at the command line.

## Fit Simple Linear Regression Model

Fit a simple linear regression model to the data. Extract and display the coefficients and the MSE.

```
SLRMdl = fitlm(X,y)
slrBeta = SLRMdl.Coefficients.Estimate
slrMSE = SLRMdl.MSE
```

```
SLRMdl = 

Linear regression model:
    y ~ 1 + x1 + x2 + x3

Estimated Coefficients:
                   Estimate       SE        tStat      pValue  
                   ________    ________    ________    ________

    (Intercept)     -121.27      55.436     -2.1876    0.049222
    x1              0.12685    0.042182      3.0073    0.010918
    x2              0.34816     0.17702      1.9668    0.072763
    x3              -19.022      107.93    -0.17624     0.86304

Number of observations: 16, Error degrees of freedom: 12
Root Mean Squared Error: 3.77
R-squared: 0.92,  Adjusted R-Squared 0.9
F-statistic vs. constant model: 45.9, p-value = 7.52e-07

slrBeta =

 -121.2696
    0.1269
    0.3482
  -19.0217

slrMSE =

   14.1908
```

`SLRMdl` is a `LinearModel` model object. An estimation summary appears at the command line. The *t* statistics suggest that, at the 5% level of significance, `x1` is an important variable for prediction, but `x2` and `x3` might not be as important.

## Fit Lasso Regression Model

Fit a lasso regression model to the data. By default, `lasso` standardizes the predictor data; specify `'Standardize',false` to refrain from standardization.

```
[LassoBetaEstimates,FitInfo] = lasso(X,y,'Standardize',false);
```

By default, `lasso` fits the lasso regression model 100 times using a sequence of values for $\lambda$. Hence, `LassoBetaEstimates` is a 3-by-100 matrix of regression coefficient values. Rows correspond to the predictor variables in `X`, and columns correspond to the values of $\lambda$. `FitInfo` is a structure that includes the values of $\lambda$ (`FitInfo.Lambda`) and the mean squared error for each fit (`FitInfo.MSE`).

Plot the magnitude of the regression coefficients with respect to the $L^1$ norm of the coefficient vector.

```
lassoPlot(LassoBetaEstimates);
legend('\beta_1','\beta_2','\beta_3','Location','NW');
```

Choose and display the lasso regression estimates corresponding to the smallest value of $\lambda$. Also, choose and display the corresponding MSE.

```
lassoBeta = LassoBetaEstimates(:,1);
lassoBeta = [FitInfo.Intercept(:,1); lassoBeta]
lassoMSE = FitInfo.MSE(:,1)
```

```
lassoBeta =

 -130.6923
    0.1340
    0.3481
         0

lassoMSE =

   10.6709
```

The lasso plot suggests that `x3` is not important because $\beta_3$ is zero for all regularization values. It appears that `x1` is the most important predictor because $\beta_1$ has a nonzero magnitude for most regularization values. The MSE from lasso regression is less than the MSE from simple linear regression.

## Fit Bayesian Linear Regression Model

A Bayesian view of lasso uses these prior assumptions:

- $\beta_0 \mid \sigma^2 \sim N\left(\mu, \sigma^2 v\right)$, where $\mu$ and $v$ are hyperparameters.
- $\sigma^2 \sim \mathrm{IG}\left(a, b\right)$, the inverse gamma distribution, where the shape $a$ and scale $b$ are hyperparameters.
- $\beta_j \mid \sigma^2, \psi_j \sim N\left(0, \sigma^2 \psi_j\right)$ for $j = 1,\ldots,p$. $\psi_j$ is a latent Gaussian scale parameter for each regression coefficient.
- $\psi_j \sim \mathrm{Exp}\left(\lambda^2/2\right)$, where the penalty $\lambda$ is a constant.

Consequently,

- $\beta_j \mid \sigma^2 \sim \mathrm{Laplace}\left(0, \sqrt{\sigma^2}/\lambda\right)$, where $\mathrm{Laplace}$ is the double exponential distribution.
- $1/\psi_j \mid \beta_j, \sigma^2 \sim \mathrm{InvGauss}\left(\lambda\sqrt{\sigma^2}/\lvert\beta_j\rvert,\ \lambda^2\right)$, where $\mathrm{InvGauss}$ is the inverse Gaussian (Wald) distribution with the specified mean and shape.
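The first of these consequences is the classical scale-mixture representation of the double exponential distribution. As a sketch (not part of the original example), integrating the conditional Gaussian prior on $\beta_j$ against the exponential density of its latent scale gives

```latex
\int_0^{\infty}
  \underbrace{\frac{1}{\sqrt{2\pi\sigma^{2}\psi_j}}
    \exp\!\left(-\frac{\beta_j^{2}}{2\sigma^{2}\psi_j}\right)}_{\text{density of }\beta_j \mid \sigma^{2},\,\psi_j}
  \,
  \underbrace{\frac{\lambda^{2}}{2}
    \exp\!\left(-\frac{\lambda^{2}\psi_j}{2}\right)}_{\text{density of }\psi_j}
  \,\mathrm{d}\psi_j
  \;=\;
  \frac{\lambda}{2\sqrt{\sigma^{2}}}
  \exp\!\left(-\frac{\lambda\,\lvert\beta_j\rvert}{\sqrt{\sigma^{2}}}\right),
```

which is the double exponential density with location 0 and scale $\sqrt{\sigma^2}/\lambda$. Its negative log is the lasso penalty term up to an additive constant, which is why the latent-scale hierarchy is equivalent to placing a lasso-type prior directly on the coefficients.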

For more details, see docid:econ_ug.bvmwdr5.

You can estimate or sample from the posterior distribution by using one of these samplers: the slice, random walk Metropolis, Hamiltonian Monte Carlo (HMC), or Gibbs sampler.

For the HMC sampler, you must create a custom Bayesian linear regression model that specifies the prior distributions of $\beta_0$, $\beta$, and $\sigma^2$. `estimate` sets up the HMC sampler for you, but, if you need to, you can tune its parameters.

You must set the Gibbs sampler up yourself.

## Fit Using HMC Sampler

Declare the function `priorLasso.m`, which returns the log pdf of the joint prior distribution of the intercept, regression coefficients, and disturbance variance.

```
function logPDF = priorLasso(params,mu,v,a,b,lambda)
%priorLasso Log density of Bayesian lasso regression prior distributions
%   priorLasso assumes:
%
%   * params(1) is the intercept and is normally distributed with a mean
%     of mu and scaled variance v.
%
%   * params(2:end - 1) are the regression coefficients and
%     individually have double exponential distributions with location
%     parameter 0 and scale sqrt(params(end))/lambda.
%
%   * params(end) is the disturbance variance and has an inverse gamma
%     distribution with shape a and scale b.
%
%   priorLasso returns the log of the product of all densities.
%
%   params: Parameter values at which the densities are evaluated, an
%           m-by-1 numeric vector.
%
%   mu: Center of the normal distribution for the intercept, a numeric
%       scalar.
%
%   v: Scale of the normal distribution for the intercept, a positive
%      numeric scalar.
%
%   a: Inverse gamma shape parameter, a positive numeric scalar.
%
%   b: Inverse gamma scale parameter, a positive numeric scalar.
%
%   lambda: Lasso penalty term, a nonnegative numeric scalar.

sigma2PDF = params(end)^(-a-1)*exp(-1/(params(end)*b))/(gamma(a)*b^a);
interceptPDF = normpdf(params(1),mu,sqrt(params(end)*v));
betaPDF = lambda/(2*sqrt(params(end)))*...
    exp(-lambda*abs(params(2:end-1))/sqrt(params(end)));

logPDF = log(sigma2PDF) + log(interceptPDF) + sum(log(betaPDF));
end
```

You must set values for the hyperparameters. Set `mu` to 0, `v` to `1e5`, `a` to 3, `b` to 1, and `lambda` to 15. Create an anonymous function that passes the hyperparameter values to `priorLasso`, but leaves `params` as an argument.

```
mu = 0;
v = 1e5;
a = 3;
b = 1;
lambda = 15;
logPDF = @(params)priorLasso(params,mu,v,a,b,lambda);
```

Ideally, you should use a range of values for $\lambda$. For simplicity, this example uses one value representing a high penalty.

Create a Bayesian linear regression model with the custom prior. Specify that there are three predictor variables.

```
p = 3;
PriorMdl = bayeslm(p,'ModelType','custom','LogPDF',logPDF);
```

`PriorMdl` is a `customblm` model object.

Estimate the posterior distribution and return estimates. Specify the HMC sampler, reparameterization of the disturbance variance, and 5000 draws.

```
rng(1); % For reproducibility
[EstMdl,customBLMBeta,~,customBLMMSE] = estimate(PriorMdl,X,y,...
    'Sampler',"hmc",'NumDraws',5000,'Reparameterize',true);
customBLMBeta
customBLMMSE
```

```
Method: MCMC sampling with 5000 draws
Number of observations: 16
Number of predictors:   4
 
           |   Mean      Std            CI95         Positive  Distribution 
------------------------------------------------------------------------------
 Intercept | -130.3866  14.4624  [-158.817, -101.690]   0.000    Empirical  
 Beta(1)   |   0.1348    0.0122  [ 0.110,  0.159]       1.000    Empirical  
 Beta(2)   |   0.2470    0.1650  [-0.056,  0.584]       0.942    Empirical  
 Beta(3)   |   0.0022    0.3881  [-0.774,  0.782]       0.509    Empirical  
 Sigma2    |  14.3424    6.1225  [ 6.691, 29.655]       1.000    Empirical  

customBLMBeta =

 -130.3866
    0.1348
    0.2470
    0.0022

customBLMMSE =

   14.3424
```

`EstMdl` is an `empiricalblm` model object that contains the HMC sample draws from the posterior distribution. The estimate of $\beta_3$ receives most of the penalty, while the values of the other coefficients are close to those from lasso regression.

Plot the traces of the HMC sampler draws.

```
figure;
for j = 1:(p + 1)
    subplot(2,2,j)
    plot(EstMdl.BetaDraws(j,:))
    title(sprintf('%s',EstMdl.VarNames{j}));
    xlabel('Simulation index');
    ylabel('HMC sampler draw')
end

figure;
plot(EstMdl.Sigma2Draws)
title('Sigma2');
xlabel('Simulation index');
ylabel('HMC sampler draw');
```

The trace plots indicate adequate mixing of the MCMC sample.

## Fit Using Gibbs Sampler

Preallocate a 4-by-10000 matrix of zeros, a 1-by-10000 vector of zeros, and a 3-by-10000 matrix of ones for the posterior draws of the regression coefficients, disturbance variance, and latent scales, respectively. Initialize the latent scales for the first draw.

```
m = 10000;
GibbsBetaDraws = zeros(p+1,m);
gibbsSigma2Draws = zeros(1,m);
GibbsPsiDraws = ones(p,m);
psi = GibbsPsiDraws(:,1);
```

Implement the Gibbs sampler by following these steps for each draw.

- Create a conjugate prior model for $\beta_0$, $\beta$, and $\sigma^2$, conditional on the current values of the latent scales $\psi_j$.
- Sample $\beta_0$, $\beta$, and $\sigma^2$ from their joint posterior, using the current value of $\psi_j$ for each $j$.
- Sample each latent scale $\psi_j$ from its full conditional distribution, using the current values of $\beta_j$, $\sigma^2$, and $\lambda$.
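The last step relies on a standard Bayesian lasso result (Park and Casella, 2008): conditional on the current draws, the reciprocal latent scales have inverse Gaussian full conditionals,

```latex
\frac{1}{\psi_j} \,\Big|\, \beta_j,\, \sigma^{2}
  \;\sim\;
  \mathrm{InvGauss}\!\left(
    \mu' = \frac{\lambda\sqrt{\sigma^{2}}}{\lvert\beta_j\rvert},\;
    \lambda' = \lambda^{2}
  \right),
  \qquad j = 1,\ldots,p,
```

where $\mu'$ is the mean and $\lambda'$ is the shape parameter. The sampler therefore draws $1/\psi_j$ from this distribution and inverts the draw to update $\psi_j$.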

```
for k = 1:m
    muGibbs = [mu; 0; 0; 0];
    VGibbs = diag([v; psi]);
    TmpMdl = bayeslm(p,'ModelType','conjugate','Mu',muGibbs,'V',VGibbs,...
        'A',a,'B',b);
    [GibbsBetaDraws(:,k),gibbsSigma2Draws(k)] = simulate(TmpMdl,X,y);
    muIGPsi = abs(lambda*sqrt(gibbsSigma2Draws(k))./GibbsBetaDraws(2:end,k));
    vIGPsi = lambda^2;
    GibbsPsiDraws(:,k) = 1./random('inversegaussian',muIGPsi,vIGPsi);
    psi = GibbsPsiDraws(:,k);
end
```

Plot the traces of the Gibbs sample draws.

```
figure;
for j = 1:(p + 1)
    subplot(2,2,j)
    plot(GibbsBetaDraws(j,:))
    title(sprintf('%s',EstMdl.VarNames{j}));
    xlabel('Simulation index');
    ylabel('Gibbs draw');
end

figure;
plot(gibbsSigma2Draws)
title('Sigma2');
xlabel('Simulation index');
ylabel('Gibbs draw');

figure;
for j = 1:p
    subplot(2,2,j)
    plot(GibbsPsiDraws(j,:))
    title(sprintf('psi %d',j));
    xlabel('Simulation index');
    ylabel('Gibbs draw')
end
```

The trace plots indicate that the MCMC sample is mixing well.

Estimate the posterior means of the regression coefficients and the disturbance variance (MSE). Treat the first 5000 draws as a burn-in sample.

```
gibbsBeta = mean(GibbsBetaDraws(:,5001:end),2)
gibbsMSE = mean(gibbsSigma2Draws(5001:end))
```

```
gibbsBeta =

 -130.8014
    0.1350
    0.2553
    0.0004

gibbsMSE =

    9.7098
```

The results from Gibbs sampling are similar to the results using the custom Bayesian linear regression model.

The Bayesian linear regression estimates are fairly close to those from lasso. Discrepancies come from two sources:

- The Bayesian estimates are the means of the posterior distributions, rather than the modes.
- The Bayesian model setup implies that $\lambda_f = \lambda_B\sqrt{\sigma^2}/n$, where $\lambda_f$ is the penalty for the frequentist method and $\lambda_B$ is the penalty for the Bayesian method.
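A sketch of where the penalty correspondence comes from: conditional on $\sigma^2$ (and treating the intercept prior as diffuse), the negative log posterior of the coefficients is, up to an additive constant,

```latex
-\log \pi\!\left(\beta_0, \beta \mid \sigma^{2}, y\right)
  \;=\;
  \frac{1}{2\sigma^{2}}\,\bigl\lVert y - \beta_0 1_n - X\beta \bigr\rVert_2^{2}
  \;+\;
  \frac{\lambda_B}{\sqrt{\sigma^{2}}} \sum_{j=1}^{p} \lvert \beta_j \rvert
  \;+\; \mathrm{const}.
```

Multiplying through by $\sigma^2/n$, which does not change the minimizer, yields a lasso-type objective with penalty $\lambda_B\sqrt{\sigma^2}/n$. This matches the conversion `lambda*sqrt(gibbsMSE)/numel(y)` used later in this example.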

Plot histograms of the posterior draws after the burn-in period.

```
figure;
for j = 1:(p + 1)
    subplot(2,2,j)
    histogram(GibbsBetaDraws(j,5001:end),'Normalization','pdf')
    title(sprintf('%s',EstMdl.VarNames{j}));
end
```

The posterior distributions of the coefficients are approximately bell shaped, so the posterior mean is a good approximation of the mode.

Compute the corresponding value of the frequentist penalty $\lambda_f$, and then implement lasso regression using it. For $\sigma^2$, use a reasonable estimate, such as the posterior mean from the Gibbs sample.

```
lambda1 = lambda*sqrt(gibbsMSE)/numel(y);
[lassoBeta2,FitInfo2] = lasso(X,y,'Lambda',lambda1,'Standardize',false);
lassoBeta2 = [FitInfo2.Intercept(:,1); lassoBeta2]
lassoMSE2 = FitInfo2.MSE(:,1)
```

```
lassoBeta2 =

 -130.7732
    0.1351
    0.2504
         0

lassoMSE2 =

   10.9617
```

Display all estimates in a table for comparison.

```
table([slrBeta; slrMSE],[lassoBeta2; lassoMSE2],[customBLMBeta; customBLMMSE],...
    [gibbsBeta; gibbsMSE],'VariableNames',{'SLR' 'Lasso' 'CustomBLM' 'Gibbs'},...
    'RowNames',[EstMdl.VarNames; 'MSE'])
```

```
ans =

  5x4 table

                   SLR        Lasso     CustomBLM     Gibbs  
                 _______    _______    _________    _______

    Intercept    -121.27    -130.77      -130.39      -130.8
    Beta(1)      0.12685    0.13506      0.13475     0.13504
    Beta(2)      0.34816    0.25035      0.24703     0.25532
    Beta(3)      -19.022          0    0.0021912      0.0004
    MSE           14.191     10.962       14.342      9.7098
```