Simulate 10000 observations from this model:

$$y={x}_{100}+2{x}_{200}+e.$$

$$X=\{{x}_{1},...,{x}_{1000}\}$$ is a 10000-by-1000 sparse matrix with 10% nonzero standard normal elements.

*e* is random normal error with mean 0 and standard deviation 0.3.
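A minimal sketch of the simulation. The seed and variable names are illustrative choices, not part of the original example:

```matlab
rng(1)                          % illustrative seed, for reproducibility
n = 10000;                      % number of observations
d = 1000;                       % number of predictors
X = sprandn(n,d,0.1);           % sparse matrix, ~10% nonzero standard normal elements
y = X(:,100) + 2*X(:,200) + 0.3*randn(n,1);
```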

Create a set of 15 logarithmically spaced regularization strengths from $$10^{-4}$$ through $$10^{-1}$$.
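One way to create these strengths, assuming the vector is named `Lambda` as referenced later:

```matlab
Lambda = logspace(-4,-1,15);    % 15 logarithmically spaced strengths, 1e-4 to 1e-1
```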

Hold out 30% of the data for testing. Identify the test-sample indices.
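A sketch using `cvpartition`; the variable names are illustrative:

```matlab
n = 10000;                            % number of observations, as simulated above
cvp = cvpartition(n,'HoldOut',0.30);  % hold out 30% of the data for testing
idxTest = test(cvp);                  % logical indices of the test sample
```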

Train a linear regression model using lasso penalties with the strengths in `Lambda`. Specify the regularization strengths, optimize the objective function using SpaRSA, and use the data partition. To increase execution speed, transpose the predictor data and specify that the observations are in columns.
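A sketch of the training call, assuming the data, `Lambda`, and partition `cvp` from the earlier steps; the name-value pairs shown are one plausible configuration. With a partition specified, `fitrlinear` returns a cross-validated model, so the linear model is taken from its `Trained` property:

```matlab
X = X';                                 % observations in columns, for speed
CVMdl = fitrlinear(X,y,'ObservationsIn','columns','Lambda',Lambda, ...
    'CVPartition',cvp,'Learner','leastsquares', ...
    'Solver','sparsa','Regularization','lasso');
Mdl1 = CVMdl.Trained{1}                 % model trained on the training fold
```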

`Mdl1` is a `RegressionLinear` model. Because `Lambda` is a 15-dimensional vector of regularization strengths, you can think of `Mdl1` as 15 trained models, one for each regularization strength.

Estimate the test-sample mean squared error for each regularized model.
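Because the predictor data is transposed, the test observations are selected as columns. A sketch, assuming `Mdl1` and `idxTest` from the previous steps:

```matlab
% Returns a 1-by-15 vector: one test-sample MSE per regularization strength
mse = loss(Mdl1,X(:,idxTest),y(idxTest),'ObservationsIn','columns');
```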

Higher values of `Lambda` lead to predictor variable sparsity, which is a good quality of a regression model. Retrain the model using the entire data set and all options used previously, except the data-partition specification. Determine the number of nonzero coefficients per model.
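A sketch of the retraining step; `Mdl` is an illustrative name for the model trained on the full data set:

```matlab
Mdl = fitrlinear(X,y,'ObservationsIn','columns','Lambda',Lambda, ...
    'Learner','leastsquares','Solver','sparsa','Regularization','lasso');
numNZCoeff = sum(Mdl.Beta ~= 0);   % nonzero coefficients per regularization strength
```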

In the same figure, plot the MSE and frequency of nonzero coefficients for each regularization strength. Plot all variables on the log scale.
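One way to produce the plot, assuming `mse` and `numNZCoeff` from the previous steps; `yyaxis` puts both series in the same figure:

```matlab
figure
yyaxis left
plot(log10(Lambda),log10(mse),'o-')          % test-sample MSE, log scale
ylabel('log_{10} MSE')
yyaxis right
plot(log10(Lambda),log10(numNZCoeff),'o-')   % nonzero-coefficient counts, log scale
ylabel('log_{10} nonzero-coefficient frequency')
xlabel('log_{10} Lambda')
title('Test-Sample MSE and Sparsity vs. Regularization Strength')
```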

Select the index or indices of `Lambda` that balance minimal prediction error and predictor-variable sparsity (for example, `Lambda(11)`).
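A sketch of the selection step, assuming the retrained full-data model (here called `Mdl`); `selectModels` extracts the models at the chosen regularization strengths:

```matlab
idxFinal = 11;                         % index balancing low MSE and sparsity
MdlFinal = selectModels(Mdl,idxFinal)
```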

`MdlFinal` is a trained `RegressionLinear` model object that uses `Lambda(11)` as a regularization strength.