Lasso is a regularization technique. Use lassoglm to:
Reduce the number of predictors in a generalized linear model.
Identify important predictors.
Select among redundant predictors.
Produce shrinkage estimates with potentially lower predictive errors than ordinary least squares.
Elastic net is a related technique. Use it when you have several
highly correlated variables. lassoglm provides
elastic net regularization when you set the Alpha name-value
pair to a number strictly between 0 and 1.
For details about lasso and elastic net computations and algorithms, see Generalized Linear Model Lasso and Elastic Net. For a discussion of generalized linear models, see What Are Generalized Linear Models?.
Lasso is a regularization technique for estimating generalized linear models. Lasso includes a penalty term that constrains the size of the estimated coefficients. Therefore, it resembles Ridge Regression. Lasso is a shrinkage estimator: it generates coefficient estimates that are biased to be small. Nevertheless, a lasso estimator can have smaller error than an ordinary maximum likelihood estimator when you apply it to new data.
Unlike ridge regression, as the penalty term increases, the lasso technique sets more coefficients to zero. This means that the lasso estimator is a smaller model, with fewer predictors. As such, lasso is an alternative to stepwise regression and other model selection and dimensionality reduction techniques.
Elastic net is a related technique. Elastic net is akin to a hybrid of ridge regression and lasso regularization. Like lasso, elastic net can generate reduced models by generating zero-valued coefficients. Empirical studies suggest that the elastic net technique can outperform lasso on data with highly correlated predictors.
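To illustrate why lasso can set coefficients exactly to zero while ridge regression only shrinks them, consider the simplified case of a single coefficient with squared-error loss. The following is a minimal sketch in plain Python, not lassoglm itself; the function names, the penalty scaling, and the value of z are assumptions made for illustration.

```python
def lasso_shrink(z, lam):
    """Minimizer of 0.5*(b - z)**2 + lam*abs(b): soft thresholding."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0  # the estimate is exactly zero once lam >= abs(z)


def ridge_shrink(z, lam):
    """Minimizer of 0.5*(b - z)**2 + 0.5*lam*b**2: proportional shrinkage."""
    return z / (1.0 + lam)  # never exactly zero unless z is zero


if __name__ == "__main__":
    z = 0.8  # hypothetical unpenalized coefficient estimate
    for lam in (0.0, 0.5, 1.0, 2.0):
        print(lam, lasso_shrink(z, lam), ridge_shrink(z, lam))
```

In this one-coefficient setting the lasso estimate reaches zero as soon as the penalty is at least as large as the unpenalized estimate, whereas the ridge estimate stays nonzero for every finite penalty.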
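For a nonnegative value of λ,
lassoglm solves the problem

$$\min_{\beta_0,\beta}\left(\frac{1}{N}\,\mathrm{Deviance}(\beta_0,\beta) + \lambda\sum_{j=1}^{p}\lvert\beta_j\rvert\right).$$
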
The function Deviance in this equation is the deviance of the model fit to the
responses using the intercept β0 and the
predictor coefficients β. The formula for Deviance depends on the
distr parameter you supply to
lassoglm. Minimizing the λ-penalized deviance is
equivalent to maximizing the λ-penalized loglikelihood.
N is the number of observations.
λ is a nonnegative regularization
parameter corresponding to one value of Lambda.
The parameters β0 and β are a scalar and a vector of length p, respectively.
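The equivalence between minimizing the penalized deviance and maximizing the penalized loglikelihood can be sketched as follows. This assumes the standard definition of deviance as twice the gap between the loglikelihood of the saturated model and that of the fitted model, with unit dispersion, and an objective of the form (1/N)·Deviance plus λ times the L1 norm of β. The symbols ℓ and ℓ_sat are notation introduced here, not taken from the original text.

```latex
\begin{aligned}
\mathrm{Deviance}(\beta_0,\beta) &= 2\bigl(\ell_{\mathrm{sat}} - \ell(\beta_0,\beta)\bigr),\\
\frac{1}{N}\,\mathrm{Deviance}(\beta_0,\beta) + \lambda\sum_{j=1}^{p}\lvert\beta_j\rvert
 &= \frac{2}{N}\,\ell_{\mathrm{sat}}
  - \frac{2}{N}\Bigl(\ell(\beta_0,\beta) - \frac{N\lambda}{2}\sum_{j=1}^{p}\lvert\beta_j\rvert\Bigr).
\end{aligned}
```

Because ℓ_sat does not depend on β0 or β, minimizing the left side is the same as maximizing the loglikelihood minus an L1 penalty with weight Nλ/2.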
As λ increases, the number of nonzero components of β decreases.
The lasso problem involves the L1 norm of β, as contrasted with the elastic net algorithm.
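The effect of λ on the number of nonzero coefficients can be seen in a small worked case. The sketch below handles only the normal distribution, where the deviance is the residual sum of squares, and uses a basic coordinate descent loop. It is an illustrative plain-Python sketch under those assumptions, not the lassoglm implementation; the function names and the toy data are hypothetical.

```python
def soft_threshold(z, t):
    """Return sign(z) * max(abs(z) - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_normal(X, y, lam, n_sweeps=100):
    """Minimize (1/N)*sum((y - b0 - X b)**2) + lam*sum(abs(b_j)).

    X is a list of rows, y a list of responses. Returns (b0, b).
    """
    n, p = len(X), len(X[0])
    b0, b = sum(y) / n, [0.0] * p
    for _ in range(n_sweeps):
        # The intercept is not penalized: mean of the residuals without b0.
        b0 = sum(y[i] - sum(X[i][k] * b[k] for k in range(p)) for i in range(n)) / n
        for j in range(p):
            # Correlation of predictor j with the residual that omits predictor j.
            rho = sum(
                X[i][j] * (y[i] - b0 - sum(X[i][k] * b[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            # With the (1/N) scaling of the loss, the threshold is lam / 2.
            b[j] = soft_threshold(rho, lam / 2.0) / z
    return b0, b


if __name__ == "__main__":
    # Toy orthogonal design; y = 1 + 2*x1 + 0.5*x2, and x3 is irrelevant.
    X = [[-1, -1, 1], [-1, 1, -1], [1, -1, -1], [1, 1, 1]]
    y = [-1.5, -0.5, 2.5, 3.5]
    for lam in (0.0, 0.5, 2.0, 5.0):
        b0, b = lasso_normal(X, y, lam)
        print(lam, sum(1 for v in b if v != 0.0), b)
```

On this toy design the count of nonzero coefficients falls from two to one to zero as λ grows. Distributions other than the normal need a different deviance and a more involved fitting procedure, which this sketch does not attempt.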
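For α strictly between 0 and 1, and nonnegative λ, elastic net solves the problem

$$\min_{\beta_0,\beta}\left(\frac{1}{N}\,\mathrm{Deviance}(\beta_0,\beta) + \lambda P_\alpha(\beta)\right),$$

where

$$P_\alpha(\beta) = \frac{1-\alpha}{2}\lVert\beta\rVert_2^2 + \alpha\lVert\beta\rVert_1 = \sum_{j=1}^{p}\left(\frac{1-\alpha}{2}\beta_j^2 + \alpha\lvert\beta_j\rvert\right).$$

Elastic net is the same as lasso when α = 1. For other values of α,
the penalty term Pα(β)
interpolates between the L1 norm
of β and the squared L2 norm
of β. As α shrinks
toward 0, elastic net approaches ridge regression.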
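As a quick check on how the penalty interpolates between the two norms, the sketch below evaluates Pα(β) for a few values of α. It assumes the usual form of the penalty, (1 − α)/2 times the squared L2 norm plus α times the L1 norm. The function name and the example vector are hypothetical, and α = 0 is evaluated only to show the limiting value.

```python
def elastic_net_penalty(beta, alpha):
    """P_alpha(beta) = (1 - alpha)/2 * ||beta||_2^2 + alpha * ||beta||_1."""
    l1 = sum(abs(b) for b in beta)
    l2_squared = sum(b * b for b in beta)
    return 0.5 * (1.0 - alpha) * l2_squared + alpha * l1


if __name__ == "__main__":
    beta = [3.0, -4.0, 0.0]  # L1 norm is 7, squared L2 norm is 25
    for alpha in (1.0, 0.5, 0.0):
        print(alpha, elastic_net_penalty(beta, alpha))
```

At α = 1 the penalty equals the L1 norm (the lasso penalty), and at the limit α = 0 it equals half the squared L2 norm, which is a ridge-type penalty.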
 Tibshirani, R. Regression Shrinkage and Selection via the Lasso. Journal of the Royal Statistical Society, Series B, Vol. 58, No. 1, pp. 267–288, 1996.
 Zou, H. and T. Hastie. Regularization and Variable Selection via the Elastic Net. Journal of the Royal Statistical Society, Series B, Vol. 67, No. 2, pp. 301–320, 2005.
 Friedman, J., R. Tibshirani, and T. Hastie. Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, Vol. 33, No. 1, 2010.
 Hastie, T., R. Tibshirani, and J. Friedman. The Elements of Statistical Learning, 2nd edition. Springer, New York, 2008.
 McCullagh, P., and J. A. Nelder. Generalized Linear Models, 2nd edition. Chapman & Hall/CRC Press, 1989.