
# lasso

Regularized least-squares regression using lasso or elastic net algorithms

## Syntax

B = lasso(X,Y)
[B,FitInfo] = lasso(X,Y)
[B,FitInfo] = lasso(X,Y,Name,Value)

## Description

B = lasso(X,Y) returns fitted least-squares regression coefficients for a set of regularization coefficients Lambda.

[B,FitInfo] = lasso(X,Y) returns a structure containing information about the fits.

[B,FitInfo] = lasso(X,Y,Name,Value) fits regularized regressions with additional options specified by one or more Name,Value pair arguments.

## Input Arguments

X

Numeric matrix with n rows and p columns. Each row represents one observation, and each column represents one predictor (variable).

Y

Numeric vector of length n, where n is the number of rows of X. Y(i) is the response to row i of X.

### Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

'Alpha'

Scalar value from 0 to 1 (excluding 0) representing the weight of lasso (L1) versus ridge (L2) optimization. Alpha = 1 represents lasso regression, Alpha close to 0 approaches ridge regression, and other values represent elastic net optimization. See Definitions.

Default: 1

'CV'

Method lasso uses to estimate mean squared error:

• K, a positive integer: lasso uses K-fold cross validation.

• cvp, a cvpartition object: lasso uses the cross-validation method expressed in cvp. You cannot use a 'leaveout' partition with lasso.

• 'resubstitution': lasso uses X and Y to fit the model and to estimate the mean squared error, without cross validation.

Default: 'resubstitution'

'DFmax'

Maximum number of nonzero coefficients in the model. lasso returns results only for Lambda values that satisfy this criterion.

Default: Inf

'Lambda'

Vector of nonnegative Lambda values. See Definitions.

• If you do not supply Lambda, lasso calculates the largest value of Lambda that gives a nonnull model. In this case, LambdaRatio gives the ratio of the smallest to the largest value of the sequence, and NumLambda gives the length of the vector.

• If you supply Lambda, lasso ignores LambdaRatio and NumLambda.

Default: Geometric sequence of NumLambda values, the largest just sufficient to produce B = 0

'LambdaRatio'

Positive scalar, the ratio of the smallest to the largest Lambda value when you do not set Lambda. If you set LambdaRatio = 0, lasso generates a default sequence of Lambda values, and replaces the smallest one with 0.

Default: 1e-4

'MCReps'

Positive integer, the number of Monte Carlo repetitions for cross validation.

• If CV is 'resubstitution' or a cvpartition of type 'resubstitution', MCReps must be 1.

• If CV is a cvpartition of type 'holdout', MCReps must be greater than 1.

Default: 1

'NumLambda'

Positive integer, the number of Lambda values lasso uses when you do not set Lambda. lasso can return fewer than NumLambda fits if the residual error of the fits drops below a threshold fraction of the variance of Y.

Default: 100

'Options'

Structure that specifies whether to cross validate in parallel, and specifies the random stream or streams. Create the Options structure with statset. Option fields:

• UseParallel: Set to true to compute in parallel. Default is false.

• UseSubstreams: Set to true to compute in parallel in a reproducible fashion. To compute reproducibly, set Streams to a type allowing substreams: 'mlfg6331_64' or 'mrg32k3a'. Default is false.

• Streams: A RandStream object or cell array consisting of one such object. If you do not specify Streams, lasso uses the default stream.

'PredictorNames'

Cell array of strings representing names of the predictor variables, in the order in which they appear in X.

Default: {}

'RelTol'

Convergence threshold for the coordinate descent algorithm (see Friedman, Tibshirani, and Hastie [3]). The algorithm terminates when successive estimates of the coefficient vector differ in the L2 norm by a relative amount less than RelTol.

Default: 1e-4

'Standardize'

Boolean value specifying whether lasso scales X before fitting the models.

Default: true

'Weights'

Observation weights, a nonnegative vector of length n, where n is the number of rows of X. lasso scales Weights to sum to 1.

Default: 1/n * ones(n,1)
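As a rough illustration of combining several name-value pairs, the following sketch fits an elastic net with cross validation on synthetic data. The data, the seed, and the particular values chosen for 'Alpha', 'CV', and 'NumLambda' are assumptions made only for this example.

```matlab
% Synthetic data (illustrative only): two informative predictors out of five
rng(0);                                   % fix the seed so the run is repeatable
X = randn(100,5);
Y = X*[0;2;0;-3;0] + 0.1*randn(100,1);

% Elastic net (Alpha = 0.5), 5-fold cross validation, at most 20 Lambda values
[B,FitInfo] = lasso(X,Y,'Alpha',0.5,'CV',5,'NumLambda',20);

size(B)          % p-by-L, where L can be smaller than NumLambda
FitInfo.Alpha    % the Alpha value used for the fit
```

Name-value pairs can appear in any order, so the same call could equally list 'CV' before 'Alpha'.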

## Output Arguments

B

Fitted coefficients, a p-by-L matrix, where p is the number of predictors (columns) in X, and L is the number of Lambda values.

FitInfo

Structure containing information about the model fits.

| Field in FitInfo | Description |
| --- | --- |
| Intercept | Intercept term $\beta_0$ for each linear model, a 1-by-L vector |
| Lambda | Lambda parameters in ascending order, a 1-by-L vector |
| Alpha | Value of Alpha parameter, a scalar |
| DF | Number of nonzero coefficients in B for each value of Lambda, a 1-by-L vector |
| MSE | Mean squared error (MSE), a 1-by-L vector |

If you set the CV name-value pair to cross validate, the FitInfo structure contains additional fields.

| Field in FitInfo | Description |
| --- | --- |
| SE | The standard error of MSE for each Lambda, as calculated during cross validation, a 1-by-L vector |
| LambdaMinMSE | The Lambda value with minimum MSE, a scalar |
| Lambda1SE | The largest Lambda such that MSE is within one standard error of the minimum, a scalar |
| IndexMinMSE | The index of Lambda with value LambdaMinMSE, a scalar |
| Index1SE | The index of Lambda with value Lambda1SE, a scalar |
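The cross-validation fields are typically used to pick one column of B. The sketch below, on synthetic data assumed only for illustration, selects the sparser model at Index1SE and uses it, together with the matching intercept, to compute fitted values. The variable names idx, coef, b0, and Yhat are hypothetical.

```matlab
% Synthetic data (illustrative only)
rng(1);
X = randn(100,5);
Y = X*[0;2;0;-3;0] + 0.1*randn(100,1);

[B,FitInfo] = lasso(X,Y,'CV',10);

idx  = FitInfo.Index1SE;        % column of B for the one-standard-error model
coef = B(:,idx);                % coefficients of the selected model
b0   = FitInfo.Intercept(idx);  % matching intercept term

Yhat = X*coef + b0;             % fitted responses from the selected model
```

Using IndexMinMSE instead of Index1SE would select the model with the lowest cross-validated MSE, which generally keeps at least as many predictors.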

## Examples


### Remove Redundant Predictors

Construct a data set with redundant predictors, and identify those predictors using cross-validated lasso.

Create a matrix X of 100 five-dimensional normal variables and a response vector Y from just two components of X, with small added noise.

```
X = randn(100,5);
r = [0;2;0;-3;0]; % only two nonzero coefficients
Y = X*r + randn(100,1)*.1; % small added noise
```

Construct the default lasso fit.

`B = lasso(X,Y);`

Find the coefficient vector for the 25th value in B.

`B(:,25)`
```
ans =

         0
    1.6093
         0
   -2.5865
         0
```

lasso identifies and removes the redundant predictors.
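One way to see how many predictors survive at each level of regularization is to count the nonzero rows in every column of B and compare the result with the DF field of FitInfo. The following is a minimal sketch on freshly generated data; the seed and the variable name nz are assumptions for this example.

```matlab
% Regenerate data of the same form as above (illustrative only)
rng(3);
X = randn(100,5);
Y = X*[0;2;0;-3;0] + 0.1*randn(100,1);

[B,FitInfo] = lasso(X,Y);

nz = sum(B ~= 0, 1);        % number of nonzero coefficients per Lambda value
isequal(nz, FitInfo.DF)     % DF reports the same counts
```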

### Plot a Regularized Fit with Cross Validation

Visually examine the cross-validated error of various levels of regularization.

Load the acetylene data and prepare the data with interactions for fitting.

```
load acetylene
Xs = [x1 x2 x3];
X = x2fx(Xs,'interaction');
X(:,1) = []; % No constant term
```

Construct the lasso fit using ten-fold cross validation. Include the FitInfo output so you can plot the result.

`[B,FitInfo] = lasso(X,y,'CV',10);`

Plot the cross-validated fits.

`lassoPlot(B,FitInfo,'PlotType','CV');`
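A complementary view is the coefficient trace plot, which shows each coefficient as a function of Lambda. The sketch below repeats the data preparation so that it runs on its own; the choice of a logarithmic Lambda axis is an assumption made for readability.

```matlab
load acetylene
X = x2fx([x1 x2 x3],'interaction');
X(:,1) = [];                               % no constant term

[B,FitInfo] = lasso(X,y,'CV',10);

% Trace of each coefficient against Lambda on a log scale
lassoPlot(B,FitInfo,'PlotType','Lambda','XScale','log');
```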

## Definitions

### Lasso

For a given value of λ, a nonnegative parameter, lasso solves the problem

$\underset{{\beta }_{0},\beta }{\mathrm{min}}\left(\frac{1}{2N}\sum _{i=1}^{N}{\left({y}_{i}-{\beta }_{0}-{x}_{i}^{T}\beta \right)}^{2}+\lambda \sum _{j=1}^{p}|{\beta }_{j}|\right),$

where

• N is the number of observations.

• $y_i$ is the response at observation i.

• $x_i$ is data, a vector of p values at observation i.

• λ is a nonnegative regularization parameter corresponding to one value of Lambda.

• The parameters $\beta_0$ and β are a scalar and a p-vector, respectively.

As λ increases, the number of nonzero components of β decreases.

The lasso penalty is the L1 norm of β, whereas the elastic net penalty also includes the squared L2 norm of β.
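To make the objective concrete, the following sketch evaluates it directly for one fitted column of B and for the all-zero coefficient vector at the same λ. It assumes 'Standardize' is set to false so that the returned coefficients are on the same scale as the formula, and it relies on the default equal observation weights. The function handle lassoObj and the other variable names are hypothetical.

```matlab
% Synthetic data (illustrative only)
rng(2);
X = randn(100,5);
Y = X*[0;2;0;-3;0] + 0.1*randn(100,1);
N = size(X,1);

% Fit without standardization so B matches the formula's scale
[B,FitInfo] = lasso(X,Y,'Standardize',false);

% Lasso objective: residual term plus lambda times the L1 norm of beta
lassoObj = @(b0,b,lam) sum((Y - b0 - X*b).^2)/(2*N) + lam*sum(abs(b));

k     = 1;                                 % smallest Lambda in the path
fFit  = lassoObj(FitInfo.Intercept(k), B(:,k), FitInfo.Lambda(k));
fZero = lassoObj(mean(Y), zeros(5,1), FitInfo.Lambda(k));

fFit < fZero    % the fitted coefficients give a lower objective value
```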

### Elastic Net

For an α strictly between 0 and 1, and a nonnegative λ, elastic net solves the problem

$\underset{{\beta }_{0},\beta }{\mathrm{min}}\left(\frac{1}{2N}\sum _{i=1}^{N}{\left({y}_{i}-{\beta }_{0}-{x}_{i}^{T}\beta \right)}^{2}+\lambda {P}_{\alpha }\left(\beta \right)\right),$

where

${P}_{\alpha }\left(\beta \right)=\frac{\left(1-\alpha \right)}{2}{‖\beta ‖}_{2}^{2}+\alpha {‖\beta ‖}_{1}=\sum _{j=1}^{p}\left(\frac{\left(1-\alpha \right)}{2}{\beta }_{j}^{2}+\alpha |{\beta }_{j}|\right).$

Elastic net is the same as lasso when α = 1. As α shrinks toward 0, elastic net approaches ridge regression. For other values of α, the penalty term $P_\alpha(\beta)$ interpolates between the L1 norm of β and the squared L2 norm of β.
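The penalty term can be written as a short function handle to check the special cases numerically. This is only a sketch of the formula above; the handle name P and the test vector are assumptions chosen for illustration.

```matlab
% Elastic net penalty as defined above
P = @(beta,alpha) (1-alpha)/2 * sum(beta.^2) + alpha * sum(abs(beta));

beta = [0; 2; 0; -3; 0];

P(beta,1)       % alpha = 1 reduces to the L1 norm: 5
P(beta,0.5)     % 0.25*13 + 0.5*5 = 5.75
```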

## References

[1] Tibshirani, R. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, Vol. 58, No. 1, pp. 267–288, 1996.

[2] Zou, H. and T. Hastie. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society, Series B, Vol. 67, No. 2, pp. 301–320, 2005.

[3] Friedman, J., R. Tibshirani, and T. Hastie. Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, Vol. 33, No. 1, 2010. http://www.jstatsoft.org/v33/i01

[4] Hastie, T., R. Tibshirani, and J. Friedman. The Elements of Statistical Learning, 2nd edition. Springer, New York, 2008.