
Lasso is a regularization technique. Use `lassoglm` to:

- Reduce the number of predictors in a generalized linear model.
- Identify important predictors.
- Select among redundant predictors.
- Produce shrinkage estimates with potentially lower predictive errors than ordinary maximum likelihood estimates.

Elastic net is a related technique. Use it when you have several highly correlated variables. `lassoglm` provides elastic net regularization when you set the `Alpha` name-value pair to a number strictly between `0` and `1`.

For details about lasso and elastic net computations and algorithms, see Generalized Linear Model Lasso and Elastic Net. For a discussion of generalized linear models, see What Are Generalized Linear Models?.

*Lasso* is a regularization technique for
estimating generalized linear models. Lasso includes a penalty term
that constrains the size of the estimated coefficients. Therefore,
it resembles ridge
regression. Lasso is a *shrinkage estimator*:
it generates coefficient estimates that are biased to be small. Nevertheless,
a lasso estimator can have smaller error than an ordinary maximum
likelihood estimator when you apply it to new data.

Unlike ridge regression, lasso sets more coefficients exactly to zero as the penalty term increases. The resulting model is therefore smaller, with fewer predictors. As such, lasso is an alternative to stepwise regression and other model selection and dimensionality reduction techniques.
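A one-coefficient toy problem makes the contrast with ridge regression concrete. The sketch below is a simplification assumed for illustration, not how `lassoglm` works internally: it replaces the deviance with the squared distance $(b - z)^2$ to an unpenalized estimate $z$. With the lasso penalty $\lambda|b|$ the minimizer is soft thresholding at $\lambda/2$, which is exactly zero once $|z| \le \lambda/2$. With the ridge penalty $\lambda b^2$ the minimizer $z/(1+\lambda)$ shrinks toward zero but never reaches it. The function names are hypothetical.

```python
# Illustrative sketch; lasso_1d and ridge_1d are hypothetical helpers, not MATLAB APIs.


def lasso_1d(z: float, lam: float) -> float:
    """Minimizer of (b - z)**2 + lam*|b|: soft-thresholding of z at lam/2."""
    t = lam / 2.0
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0  # |z| <= lam/2: the penalty outweighs the fit, so b is exactly zero


def ridge_1d(z: float, lam: float) -> float:
    """Minimizer of (b - z)**2 + lam*b**2: proportional shrinkage, never zero for z != 0."""
    return z / (1.0 + lam)


z = 1.0  # hypothetical unpenalized estimate of the coefficient
for lam in (0.0, 0.5, 1.0, 2.0, 4.0):
    print(f"lambda={lam:3.1f}  lasso={lasso_1d(z, lam):.3f}  ridge={ridge_1d(z, lam):.3f}")
```

For `z = 1.0` the lasso column reads `0.000` from λ = 2 onward, while the ridge column is still `0.200` at λ = 4.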

*Elastic net* is a related technique. Elastic
net is akin to a hybrid of ridge regression and lasso regularization.
Like lasso, elastic net can generate reduced models by generating
zero-valued coefficients. Empirical studies suggest that the elastic
net technique can outperform lasso on data with highly correlated
predictors.

For a nonnegative value of *λ*, `lassoglm` solves the problem

$$\underset{{\beta}_{0},\beta}{\mathrm{min}}\left(\frac{1}{N}\text{Deviance}\left({\beta}_{0},\beta \right)+\lambda {\displaystyle \sum _{j=1}^{p}\left|{\beta}_{j}\right|}\right).$$

The function Deviance in this equation is the deviance of the model fit to the responses using intercept $\beta_0$ and predictor coefficients *β*. The formula for Deviance depends on the `distr` parameter you supply to `lassoglm`. Minimizing the *λ*-penalized deviance is equivalent to maximizing the *λ*-penalized log likelihood. *N* is the number of observations. *λ* is a nonnegative regularization parameter corresponding to one value of `Lambda`. The parameters $\beta_0$ and *β* are a scalar and a vector of length *p*, respectively.
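To make the objective concrete, the sketch below evaluates $\frac{1}{N}\text{Deviance}(\beta_0,\beta) + \lambda\sum_j|\beta_j|$ in plain Python for three distributions. It assumes the canonical link for each: identity for `'normal'`, logit for `'binomial'` (with 0/1 responses), and log for `'poisson'`. The strings mirror `distr` values that `lassoglm` accepts, but the functions themselves are hypothetical helpers, not MathWorks code. The intercept enters the deviance but not the penalty.

```python
# Illustrative sketch; linear_predictor, deviance, and lasso_objective are hypothetical helpers.
import math
from typing import Sequence


def linear_predictor(b0: float, beta: Sequence[float], x: Sequence[float]) -> float:
    """eta = b0 + x . beta for one observation."""
    return b0 + sum(bj * xj for bj, xj in zip(beta, x))


def deviance(distr: str, b0: float, beta: Sequence[float],
             X: Sequence[Sequence[float]], y: Sequence[float]) -> float:
    """Deviance of the fitted GLM, assuming the canonical link for each distribution."""
    total = 0.0
    for xi, yi in zip(X, y):
        eta = linear_predictor(b0, beta, xi)
        if distr == "normal":
            # identity link: the deviance is the residual sum of squares
            total += (yi - eta) ** 2
        elif distr == "binomial":
            # logit link, yi in {0, 1}: -2*[y*log(mu) + (1-y)*log(1-mu)], written in
            # terms of eta so log(0) is not evaluated when mu rounds to 0 or 1
            total += 2.0 * (yi * math.log1p(math.exp(-eta))
                            + (1.0 - yi) * math.log1p(math.exp(eta)))
        elif distr == "poisson":
            # log link: 2*[y*log(y/mu) - (y - mu)], with y*log(y/mu) taken as 0 when y = 0
            mu = math.exp(eta)
            term = yi * math.log(yi / mu) if yi > 0 else 0.0
            total += 2.0 * (term - (yi - mu))
        else:
            raise ValueError(f"unsupported distribution: {distr!r}")
    return total


def lasso_objective(distr: str, b0: float, beta: Sequence[float],
                    X: Sequence[Sequence[float]], y: Sequence[float], lam: float) -> float:
    """(1/N) * Deviance(b0, beta) + lam * sum_j |beta_j|; b0 is not penalized."""
    return deviance(distr, b0, beta, X, y) / len(y) + lam * sum(abs(bj) for bj in beta)


X = [[0.0], [1.0]]
print(lasso_objective("normal", 0.0, [1.0], X, [0.0, 1.0], lam=0.5))  # 0.5: perfect fit plus 0.5*|1.0|
```

Changing `distr` changes only the deviance term; the penalty is identical for every distribution.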

As *λ* increases, the number of nonzero
components of *β* decreases.
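The shrinking support can be checked on a toy problem. The sketch below assumes the `'normal'` distribution, so Deviance is the residual sum of squares, and minimizes the lasso objective above by cyclic coordinate descent. Coordinate descent is a standard solver for this problem, but its use here is an assumption, not a description of `lassoglm` internals. Because the objective is scaled by $\frac{1}{N}$, each coordinate update soft-thresholds at $\lambda/2$. The data are made up, the helper names are hypothetical, and no column of `X` may be all zeros. `lambda_max` computes $2\max_j\left|\frac{1}{N}\sum_i x_{ij}(y_i-\bar y)\right|$, the smallest *λ* at which every coefficient is zero.

```python
# Illustrative sketch; soft_threshold, lasso_normal, and lambda_max are hypothetical helpers.
from typing import List, Sequence, Tuple


def soft_threshold(z: float, t: float) -> float:
    """sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_normal(X: Sequence[Sequence[float]], y: Sequence[float], lam: float,
                 n_sweeps: int = 500) -> Tuple[float, List[float]]:
    """Minimize (1/N)*sum_i (y_i - b0 - x_i . beta)**2 + lam*sum_j |beta_j| by coordinate descent."""
    n, p = len(y), len(X[0])
    b0, beta = 0.0, [0.0] * p
    r = list(y)  # residuals y_i - b0 - x_i . beta, kept current after every update
    for _ in range(n_sweeps):
        # unpenalized intercept: shift it so the residuals average to zero
        shift = sum(r) / n
        b0 += shift
        r = [ri - shift for ri in r]
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            v = sum(c * c for c in col) / n
            # correlation of column j with the partial residual (beta_j's contribution added back)
            z = sum(c * (ri + c * beta[j]) for c, ri in zip(col, r)) / n
            new = soft_threshold(z, lam / 2.0) / v  # threshold lam/2 follows from the 1/N scaling
            r = [ri - c * (new - beta[j]) for c, ri in zip(col, r)]
            beta[j] = new
    return b0, beta


def lambda_max(X: Sequence[Sequence[float]], y: Sequence[float]) -> float:
    """Smallest lam for which beta = 0 (with b0 = mean(y)) solves the problem above."""
    n = len(y)
    ybar = sum(y) / n
    return 2.0 * max(abs(sum(X[i][j] * (y[i] - ybar) for i in range(n)) / n)
                     for j in range(len(X[0])))


# made-up toy data: 5 observations, 3 predictors
X = [[1.0, 0.5, -1.0], [2.0, -1.0, 0.0], [3.0, 0.0, 1.0], [4.0, 1.5, 0.5], [5.0, -0.5, -0.5]]
y = [1.1, 2.3, 2.9, 4.2, 5.1]

lmax = lambda_max(X, y)
for frac in (0.01, 0.2, 0.5, 0.8, 1.0):
    b0, beta = lasso_normal(X, y, frac * lmax)
    nonzero = sum(1 for b in beta if abs(b) > 1e-10)
    print(f"lambda = {frac:.2f} * lambda_max: {nonzero} nonzero coefficients")
```

At `lambda_max` the count is zero. The intermediate counts depend on the data, and on some data sets a coefficient can leave the lasso path and later re-enter it, so the decrease is a tendency rather than a strict rule at every step.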

The lasso penalty is the $L^1$ norm of *β*. The elastic net penalty, by contrast, combines the $L^1$ norm with the squared $L^2$ norm.

For an *α* strictly between 0 and 1,
and a nonnegative *λ*, elastic net solves the
problem

$$\underset{{\beta}_{0},\beta}{\mathrm{min}}\left(\frac{1}{N}\text{Deviance}\left({\beta}_{0},\beta \right)+\lambda {P}_{\alpha}\left(\beta \right)\right),$$

where

$${P}_{\alpha}\left(\beta \right)=\frac{(1-\alpha )}{2}{\Vert \beta \Vert}_{2}^{2}+\alpha {\Vert \beta \Vert}_{1}={\displaystyle \sum _{j=1}^{p}\left(\frac{(1-\alpha )}{2}{\beta}_{j}^{2}+\alpha \left|{\beta}_{j}\right|\right)}.$$
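The penalty $P_\alpha(\beta)$ is straightforward to evaluate, and its effect is easiest to see on a single coefficient. The sketch below is illustrative only, and the function names are hypothetical. `elastic_net_penalty` implements the formula above term by term. `elastic_net_1d` assumes a simplified one-coefficient problem, minimizing $(b - z)^2 + \lambda P_\alpha(b)$ for an unpenalized estimate $z$. Its closed-form solution soft-thresholds $z$ at $\lambda\alpha/2$ and then divides by $1 + \lambda(1-\alpha)/2$. The first step can set the coefficient exactly to zero, as lasso does; the second shrinks it proportionally, as ridge regression does. The calls with `alpha = 0` fall outside the range elastic net uses and serve only as a limiting check.

```python
# Illustrative sketch; elastic_net_penalty and elastic_net_1d are hypothetical helpers.
from typing import Sequence


def elastic_net_penalty(beta: Sequence[float], alpha: float) -> float:
    """P_alpha(beta) = sum_j ((1 - alpha)/2 * beta_j**2 + alpha * |beta_j|)."""
    return sum((1.0 - alpha) / 2.0 * b * b + alpha * abs(b) for b in beta)


def elastic_net_1d(z: float, lam: float, alpha: float) -> float:
    """Minimizer of (b - z)**2 + lam * P_alpha(b) for a single coefficient b."""
    # L1 part: soft-threshold z at lam*alpha/2 (this is what can produce an exact zero)
    shrunk = max(abs(z) - lam * alpha / 2.0, 0.0)
    signed = shrunk if z >= 0 else -shrunk
    # squared-L2 part: proportional shrinkage, as in ridge regression
    return signed / (1.0 + lam * (1.0 - alpha) / 2.0)


beta = [0.5, -2.0, 0.0]
print(elastic_net_penalty(beta, 1.0))  # 2.5: with alpha = 1 the penalty is the L1 norm
print(elastic_net_penalty(beta, 0.0))  # 2.125: with alpha = 0 it is half the squared L2 norm

for alpha in (1.0, 0.5, 0.1):
    print(alpha, elastic_net_1d(z=1.0, lam=2.0, alpha=alpha))
```

With these values the coefficient is exactly `0.0` at `alpha=1.0`. Lowering `alpha` weakens the threshold and strengthens the proportional shrinkage, so the coefficient becomes nonzero.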

Elastic net is the same as lasso when *α* = 1. For other values of *α*, the penalty term $P_\alpha(\beta)$ interpolates between the $L^1$ norm of *β* and the squared $L^2$ norm of *β*. As *α* shrinks toward 0, elastic net approaches `ridge` regression.
