lassoglm

Lasso or elastic net regularization for generalized linear model regression

Syntax

B = lassoglm(X,Y)
B = lassoglm(X,Y,distr)
B = lassoglm(X,Y,distr,Name,Value)
[B,FitInfo] = lassoglm(___)

Description

B = lassoglm(X,Y) returns penalized maximum-likelihood fitted coefficients for a generalized linear model of the response Y to the data matrix X. The values in Y are assumed to have a Gaussian probability distribution.

B = lassoglm(X,Y,distr) fits the model using the probability distribution type for Y specified in distr.

B = lassoglm(X,Y,distr,Name,Value) fits regularized generalized linear regressions with additional options specified by one or more Name,Value pair arguments.

[B,FitInfo] = lassoglm(___), for any previous input syntax, also returns a structure containing information about the fits.

Input Arguments

X

Numeric matrix with n rows and p columns. Each row represents one observation, and each column represents one predictor (variable).

Y

When distr is not 'binomial', Y is a numeric vector or categorical array of length n, where n is the number of rows of X. Y(i) is the response to row i of X.

When distr is 'binomial', Y is either:

  • A numeric vector of length n, where each entry represents success (1) or failure (0)

  • A logical vector of length n, where each entry represents success or failure

  • A categorical array of length n, where each entry represents success or failure

  • A two-column numeric matrix, where the first column contains the number of successes for each observation and the second column contains the total number of trials
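The binomial response formats listed above can be sketched on synthetic data as follows. This is only an illustration: the variable names (p, ybin, trials, ycounts) and the coefficient values are hypothetical and do not come from this page.

```matlab
rng default                       % for reproducibility
n = 200;
X = randn(n,5);
p = 1 ./ (1 + exp(-(0.5 + X(:,1) - 2*X(:,3))));   % assumed true success probabilities

% Form 1: numeric vector of 0s and 1s
ybin = double(rand(n,1) < p);
B1 = lassoglm(X, ybin, 'binomial');

% Form 2: two-column matrix [number of successes, number of trials]
trials  = 10*ones(n,1);
ycounts = [binornd(trials, p), trials];
B2 = lassoglm(X, ycounts, 'binomial');
```

In both calls, each column of the returned matrix holds the coefficients for one Lambda value.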

distr

Distributional family for the nonsystematic variation in the responses. Choices:

  • 'normal'

  • 'binomial'

  • 'poisson'

  • 'gamma'

  • 'inverse gaussian'

By default, lassoglm uses the canonical link function corresponding to distr. Specify another link function using the 'link' name-value pair.

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

'Alpha'

Scalar value from 0 to 1 (excluding 0) representing the weight of lasso (L1) versus ridge (L2) optimization. Alpha = 1 represents lasso regression, and other values represent elastic net optimization. Alpha close to 0 approaches ridge regression. See Elastic Net.

Default: 1
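As a minimal sketch of how Alpha is passed, the following fits the same synthetic Poisson data once with the default lasso penalty and once with an elastic net penalty. The value 0.5 is an arbitrary choice for illustration.

```matlab
rng default
X = randn(100,20);
y = poissrnd(exp(X(:,[5 10 15])*[.4;.2;.3] + 1));

Blasso = lassoglm(X, y, 'poisson');                % Alpha = 1 (default): lasso
Benet  = lassoglm(X, y, 'poisson', 'Alpha', 0.5);  % elastic net
```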

'CV'

Method lassoglm uses to estimate deviance:

  • K, a positive integer — lassoglm uses K-fold cross-validation.

  • cvp, a cvpartition object — lassoglm uses the cross-validation method expressed in cvp. You cannot use a 'leaveout' partition with lassoglm.

  • 'resubstitution', in which case lassoglm uses X and Y to fit the model and to estimate the deviance, without cross-validation.

Default: 'resubstitution'
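The first two CV choices might be used as sketched below, on synthetic data. The fold count of 5 is arbitrary, and the names Fit1, Fit2, and cvp are hypothetical.

```matlab
rng default
X = randn(100,20);
y = poissrnd(exp(X(:,[5 10 15])*[.4;.2;.3] + 1));

% K-fold cross-validation requested with an integer
[B1, Fit1] = lassoglm(X, y, 'poisson', 'CV', 5);

% The same idea with an explicit cvpartition object
cvp = cvpartition(numel(y), 'KFold', 5);
[B2, Fit2] = lassoglm(X, y, 'poisson', 'CV', cvp);

isfield(Fit2, 'Lambda1SE')   % cross-validation fields are present
```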

'DFmax'

Maximum number of nonzero coefficients in the model. lassoglm returns results for Lambda values that satisfy this criterion.

Default: Inf
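A short sketch of DFmax on synthetic data; the cap of 5 is an arbitrary value chosen for illustration.

```matlab
rng default
X = randn(100,20);
y = poissrnd(exp(X(:,[5 10 15])*[.4;.2;.3] + 1));

[B, FitInfo] = lassoglm(X, y, 'poisson', 'DFmax', 5);
max(FitInfo.DF)     % largest number of nonzero coefficients among the returned fits
```

According to the description above, only fits with at most 5 nonzero coefficients should be returned.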

'Lambda'

Vector of nonnegative Lambda values. See Lasso.

  • If you do not supply Lambda, lassoglm estimates the largest value of Lambda that gives a nonnull model. In this case, LambdaRatio gives the ratio of the smallest to the largest value of the sequence, and NumLambda gives the length of the vector.

  • If you supply Lambda, lassoglm ignores LambdaRatio and NumLambda.

Default: Geometric sequence of NumLambda values, the largest just sufficient to produce B = 0

'LambdaRatio'

Positive scalar, the ratio of the smallest to the largest Lambda value when you do not explicitly set Lambda.

If you set LambdaRatio = 0, lassoglm generates a default sequence of Lambda values, and replaces the smallest one with 0.

Default: 1e-4
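The two ways of controlling the Lambda sequence can be sketched as follows. The specific values (25 values, a ratio of 1e-3, and a hand-built sequence from 1e-3 to 1) are assumptions chosen only for illustration.

```matlab
rng default
X = randn(100,20);
y = poissrnd(exp(X(:,[5 10 15])*[.4;.2;.3] + 1));

% Automatic sequence: lassoglm picks the largest Lambda, then spans the given ratio
[~, FitA] = lassoglm(X, y, 'poisson', 'NumLambda', 25, 'LambdaRatio', 1e-3);

% Explicit sequence: LambdaRatio and NumLambda are ignored
lambda = logspace(-3, 0, 10);
[B, FitB] = lassoglm(X, y, 'poisson', 'Lambda', lambda);
size(B)       % one column of coefficients per value in FitB.Lambda
```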

'Link'

Specify the mapping between the mean µ of the response and the linear predictor Xb.

| Value | Description |
|---|---|
| 'comploglog' | log(-log(1 - µ)) = Xb |
| 'identity', default for the distribution 'normal' | µ = Xb |
| 'log', default for the distribution 'poisson' | log(µ) = Xb |
| 'logit', default for the distribution 'binomial' | log(µ/(1 - µ)) = Xb |
| 'loglog' | log(-log(µ)) = Xb |
| 'probit' | Φ^(-1)(µ) = Xb, where Φ is the normal (Gaussian) cumulative distribution function |
| 'reciprocal', default for the distribution 'gamma' | µ^(-1) = Xb |
| p (a number), default for the distribution 'inverse gaussian' (with p = -2) | µ^p = Xb |
| A cell array of the form {FL FD FI}, containing three function handles, created using @, that define the link (FL), the derivative of the link (FD), and the inverse link (FI), or equivalently, a structure of function handles with field Link containing FL, field Derivative containing FD, and field Inverse containing FI | User-specified link function (see Custom Link Function) |
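As a minimal sketch of the {FL FD FI} form, the log link can be written as three function handles and passed in place of the built-in name. The data are synthetic; the two fits are expected to be close, but this sketch does not check that.

```matlab
% Log link written as three function handles
FL = @(mu)  log(mu);     % link
FD = @(mu)  1 ./ mu;     % derivative of the link with respect to mu
FI = @(eta) exp(eta);    % inverse link

rng default
X = randn(100,20);
y = poissrnd(exp(X(:,[5 10 15])*[.4;.2;.3] + 1));

Bcustom  = lassoglm(X, y, 'poisson', 'Link', {FL FD FI});
Bbuiltin = lassoglm(X, y, 'poisson', 'Link', 'log');
```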

'MaxIter'

Maximum number of iterations allowed, specified as a positive integer. If the algorithm executes MaxIter iterations before reaching the convergence tolerance RelTol, the function stops iterating and returns a warning message. The function can return more than one warning when NumLambda is greater than 1.

Default: 1e4

'MCReps'

Positive integer, the number of Monte Carlo repetitions for cross-validation.

  • If CV is 'resubstitution' or a cvpartition of type 'resubstitution', MCReps must be 1.

  • If CV is a cvpartition of type 'holdout', MCReps must be greater than 1.

Default: 1
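A sketch of MCReps with a holdout partition, on synthetic data. The holdout fraction of 0.3 and the 5 repetitions are arbitrary illustrative values.

```matlab
rng default
X = randn(100,20);
y = poissrnd(exp(X(:,[5 10 15])*[.4;.2;.3] + 1));

cvp = cvpartition(numel(y), 'HoldOut', 0.3);   % 30% of the rows held out
[B, FitInfo] = lassoglm(X, y, 'poisson', 'CV', cvp, 'MCReps', 5);
```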

'NumLambda'

Positive integer, the number of Lambda values lassoglm uses when you do not set Lambda. lassoglm can return fewer than NumLambda fits if the deviance of the fits drops below a threshold fraction of the null deviance (deviance of the fit without any predictors X).

Default: 100

'Offset'

Numeric vector with the same number of rows as X. lassoglm uses Offset as an additional predictor variable, but keeps its coefficient value fixed at 1.0.
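A common use of an offset is a Poisson model with varying exposure, sketched below on synthetic data. The exposure variable and the rate model are hypothetical and serve only to illustrate the call.

```matlab
rng default
n = 200;
X = randn(n,5);
exposure = 1 + 9*rand(n,1);            % hypothetical exposure per observation
rate = exp(0.3 + 0.5*X(:,2));          % assumed events per unit exposure
y = poissrnd(rate .* exposure);

% With the log link, log(exposure) enters the linear predictor with coefficient 1
B = lassoglm(X, y, 'poisson', 'Offset', log(exposure));
```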

'Options'

Structure that specifies whether to cross-validate in parallel, and specifies the random stream or streams. Create the Options structure with statset. Option fields:

  • UseParallel — Set to true to compute in parallel. Default is false.

  • UseSubstreams — Set to true to compute in parallel in a reproducible fashion. To compute reproducibly, set Streams to a type allowing substreams: 'mlfg6331_64' or 'mrg32k3a'. Default is false.

  • Streams, a RandStream object or cell array consisting of one such object. If you do not specify Streams, lassoglm uses the default stream.
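The option fields above might be combined as in the following sketch. It assumes that Parallel Computing Toolbox is available for the parallel part; the data are synthetic and the names s and opts are hypothetical.

```matlab
rng default
X = randn(100,20);
y = poissrnd(exp(X(:,[5 10 15])*[.4;.2;.3] + 1));

s = RandStream('mlfg6331_64');            % a stream type that allows substreams
opts = statset('UseParallel', true, ...
               'UseSubstreams', true, ...
               'Streams', s);

[B, FitInfo] = lassoglm(X, y, 'poisson', 'CV', 10, 'Options', opts);
```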

'PredictorNames'

Cell array of character vectors representing names of the predictor variables, in the order in which they appear in X.

Default: {}

'RelTol'

Convergence threshold for the coordinate descent algorithm (see Friedman, Tibshirani, and Hastie [3]). The algorithm terminates when successive estimates of the coefficient vector differ in the L2 norm by a relative amount less than RelTol.

Default: 1e-4
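One way to read the RelTol criterion is sketched below. The exact formula lassoglm uses internally is not given on this page, so the form of relChange is an assumption, and the coefficient vectors are made-up values.

```matlab
bOld = [0.50; 0.00; -1.20];           % hypothetical previous estimate
bNew = [0.50005; 0.00; -1.20002];     % hypothetical current estimate
RelTol = 1e-4;

% Assumed form of "differ in the L2 norm by a relative amount"
relChange = norm(bNew - bOld) / norm(bOld);
converged = relChange < RelTol
```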

'Standardize'

Boolean value specifying whether lassoglm scales X before fitting the models. This affects whether the regularization is applied to the coefficients on the standardized scale or original scale. The results are always presented on the original scale.

Default: true

'Weights'

Observation weights, a nonnegative vector of length n, where n is the number of rows of X. At least two values must be positive.

Default: 1/n * ones(n,1)

Output Arguments

B

Fitted coefficients, a p-by-L matrix, where p is the number of predictors (columns) in X, and L is the number of Lambda values.

FitInfo

Structure containing information about the model fits.

| Field in FitInfo | Description |
|---|---|
| Alpha | Value of Alpha parameter, a scalar. |
| Deviance | Deviance of the fitted model for each value of Lambda, a 1-by-L vector. If cross-validation was performed, the values for Deviance represent the estimated expected deviance of the model applied to new data, as calculated by cross-validation. Otherwise, Deviance is the deviance of the fitted model applied to the data used to perform the fit. |
| DF | Number of nonzero coefficients in B for each Lambda value, a 1-by-L vector. |
| Intercept | Intercept term β0 for each linear model, a 1-by-L vector. |
| Lambda | Lambda parameters in ascending order, a 1-by-L vector. |

If you set the CV name-value pair to cross-validate, the FitInfo structure contains additional fields.

| Field in FitInfo | Description |
|---|---|
| IndexMinDeviance | Index of Lambda with value LambdaMinDeviance, a scalar. |
| Index1SE | Index of Lambda with value Lambda1SE, a scalar. |
| LambdaMinDeviance | Lambda value with minimum expected deviance, as calculated by cross-validation, a scalar. |
| Lambda1SE | Largest Lambda such that Deviance is within one standard error of the minimum, a scalar. |
| SE | Standard error of Deviance for each Lambda, as calculated during cross-validation, a 1-by-L vector. |
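The relationship between the index fields, the Lambda vector, and the columns of B can be sketched as follows on synthetic data.

```matlab
rng default
X = randn(100,20);
y = poissrnd(exp(X(:,[5 10 15])*[.4;.2;.3] + 1));
[B, FitInfo] = lassoglm(X, y, 'poisson', 'CV', 10);

idx = FitInfo.Index1SE;
FitInfo.Lambda(idx) == FitInfo.Lambda1SE    % the index points into FitInfo.Lambda
FitInfo.DF(idx) == nnz(B(:,idx))            % DF counts the nonzero entries in that column of B
```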

Examples

Construct data from a Poisson model, and identify the important predictors using lassoglm.

Create data with 20 predictors, and Poisson responses using just three of the predictors plus a constant.

rng default % For reproducibility
X = randn(100,20);
mu = exp(X(:,[5 10 15])*[.4;.2;.3] + 1);
y = poissrnd(mu);

Construct a cross-validated lasso regularization of a Poisson regression model of the data.

[B, FitInfo] = lassoglm(X,y,'poisson','CV',10);

Examine the cross-validation plot to see the effect of the Lambda regularization parameter.

lassoPlot(B,FitInfo,'plottype','CV');

The green circle and dashed line locate the Lambda with minimum cross-validation error. The blue circle and dashed line locate the largest Lambda whose cross-validation error is within one standard error of the minimum.

Find the nonzero model coefficients corresponding to the two identified points.

minpts = find(B(:,FitInfo.IndexMinDeviance))
minpts =

     3
     5
     6
    10
    11
    15
    16

min1pts = find(B(:,FitInfo.Index1SE))
min1pts =

     5
    10
    15

The coefficients from the minimum-plus-one standard error point are exactly those coefficients used to create the data.
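As a possible next step, the sparse model can be used for prediction with glmval, which expects the intercept as the first coefficient. This is a sketch rather than part of the original example; Xnew is a hypothetical set of new observations.

```matlab
rng default % For reproducibility
X = randn(100,20);
mu = exp(X(:,[5 10 15])*[.4;.2;.3] + 1);
y = poissrnd(mu);
[B, FitInfo] = lassoglm(X,y,'poisson','CV',10);

idx  = FitInfo.Index1SE;                      % fit at Lambda1SE
coef = [FitInfo.Intercept(idx); B(:,idx)];    % intercept first, then predictors
Xnew = randn(5,20);                           % hypothetical new observations
yhat = glmval(coef, Xnew, 'log')              % predicted Poisson means
```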

More About

Link Function

A link function f(μ) maps a distribution with mean μ to a linear model with data X and coefficient vector b using the formula

f(μ) = Xb.

Find the formulas for the link functions in the Link name-value pair description. The following table lists the link functions that are typically used for each distribution.

| Distributional Family | Default Link Function | Other Typical Link Functions |
|---|---|---|
| 'normal' | 'identity' | |
| 'binomial' | 'logit' | 'comploglog', 'loglog', 'probit' |
| 'poisson' | 'log' | |
| 'gamma' | 'reciprocal' | |
| 'inverse gaussian' | -2 | |
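To make f(μ) = Xb concrete, the following sketch applies the 'logit' link and its inverse to a small made-up design matrix. The values of X and b are hypothetical.

```matlab
b   = [0.5; -1.0];                 % hypothetical coefficient vector
X   = [1 2; 0 1; -1 0.5];          % hypothetical data matrix
eta = X*b;                         % linear predictor Xb

mu    = 1 ./ (1 + exp(-eta));      % inverse of the 'logit' link gives the mean
check = log(mu ./ (1 - mu));       % applying the link recovers Xb
max(abs(check - eta))              % close to zero, up to rounding
```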

Lasso

For a nonnegative value of λ, lasso solves the problem

$$\min_{\beta_0,\beta}\left(\frac{1}{N}\,\mathrm{Deviance}(\beta_0,\beta)+\lambda\sum_{j=1}^{p}|\beta_j|\right).$$

  • The function Deviance in this equation is the deviance of the model fit to the responses using intercept β0 and predictor coefficients β. The formula for Deviance depends on the distr parameter you supply to lassoglm. Minimizing the λ-penalized deviance is equivalent to maximizing the λ-penalized log likelihood.

  • N is the number of observations.

  • λ is a nonnegative regularization parameter corresponding to one value of Lambda.

  • Parameters β0 and β are a scalar and a vector of length p, respectively.

As λ increases, the number of nonzero components of β decreases.

The lasso problem involves the L1 norm of β, as contrasted with the elastic net algorithm.
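The penalized objective can be evaluated by hand for one fit, as sketched below for the Poisson case. The deviance expression is the standard Poisson deviance; whether it matches the internal computation in lassoglm term for term is an assumption. Standardize is set to false so that the penalty applies to the coefficients on the original scale.

```matlab
rng default
X = randn(100,20);
y = poissrnd(exp(X(:,[5 10 15])*[.4;.2;.3] + 1));
[B, FitInfo] = lassoglm(X, y, 'poisson', 'Standardize', false);

k  = ceil(numel(FitInfo.Lambda)/2);          % pick a fit from the middle of the path
mu = exp(FitInfo.Intercept(k) + X*B(:,k));   % fitted means under the log link

ylogy = y .* log(y ./ mu);
ylogy(y == 0) = 0;                           % convention: 0*log(0) = 0
dev = 2 * sum(ylogy - (y - mu));             % Poisson deviance

N = numel(y);
objective = dev/N + FitInfo.Lambda(k) * sum(abs(B(:,k)))
```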

Elastic Net

For an α strictly between 0 and 1, and a nonnegative λ, elastic net solves the problem

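$$\min_{\beta_0,\beta}\left(\frac{1}{N}\,\mathrm{Deviance}(\beta_0,\beta)+\lambda P_\alpha(\beta)\right),$$

where

$$P_\alpha(\beta)=\frac{(1-\alpha)}{2}\|\beta\|_2^2+\alpha\|\beta\|_1=\sum_{j=1}^{p}\left(\frac{(1-\alpha)}{2}\beta_j^2+\alpha|\beta_j|\right).$$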

Elastic net is the same as lasso when α = 1. For other values of α, the penalty term Pα(β) interpolates between the L1 norm of β and the squared L2 norm of β. As α shrinks toward 0, elastic net approaches ridge regression.
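The penalty term can be written directly as a function handle, which makes the interpolation easy to check numerically. The name Palpha and the vector beta are hypothetical.

```matlab
% Elastic net penalty: (1-alpha)/2 * squared L2 norm + alpha * L1 norm
Palpha = @(beta, alpha) (1 - alpha)/2 * sum(beta.^2) + alpha * sum(abs(beta));

beta = [0.4; 0; -0.2; 0.3];
Palpha(beta, 1)       % equals norm(beta,1), the lasso penalty
Palpha(beta, 0.5)     % halfway between the two penalties
Palpha(beta, 1e-6)    % close to 0.5*norm(beta)^2, the ridge penalty
```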

References

[1] Tibshirani, R. "Regression Shrinkage and Selection via the Lasso." Journal of the Royal Statistical Society. Series B, Vol. 58, No. 1, 1996, pp. 267–288.

[2] Zou, H. and T. Hastie. "Regularization and Variable Selection via the Elastic Net." Journal of the Royal Statistical Society. Series B, Vol. 67, No. 2, 2005, pp. 301–320.

[3] Friedman, J., R. Tibshirani, and T. Hastie. "Regularization Paths for Generalized Linear Models via Coordinate Descent." Journal of Statistical Software. Vol. 33, No. 1, 2010. http://www.jstatsoft.org/v33/i01

[4] Hastie, T., R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. 2nd edition. New York: Springer, 2008.

[5] Dobson, A. J. An Introduction to Generalized Linear Models. 2nd edition. New York: Chapman & Hall/CRC Press, 2002.

[6] McCullagh, P., and J. A. Nelder. Generalized Linear Models. 2nd edition. New York: Chapman & Hall/CRC Press, 1989.

[7] Collett, D. Modelling Binary Data. 2nd edition. New York: Chapman & Hall/CRC Press, 2003.

Introduced in R2012a
