This example shows how lasso identifies and discards unnecessary predictors.
Generate 200 samples of five-dimensional artificial data X from exponential distributions with various means.
rng(3,'twister') % For reproducibility
X = zeros(200,5);
for ii = 1:5
    X(:,ii) = exprnd(ii,200,1); % Column ii has mean ii
end
Generate response data Y = X*r + eps, where r has just two nonzero components, and the noise eps is normal with standard deviation 0.1.
r = [0;2;0;-3;0];
Y = X*r + randn(200,1)*.1;
Fit a cross-validated sequence of models with lasso, and plot the result.
[b,fitinfo] = lasso(X,Y,'CV',10);
lassoPlot(b,fitinfo,'PlotType','Lambda','XScale','log');
The plot shows the nonzero coefficients in the regression for various values of the Lambda regularization parameter. Larger values of Lambda appear on the left side of the graph, meaning more regularization, resulting in fewer nonzero regression coefficients.
The dashed vertical lines represent the Lambda value with minimal mean squared error (on the right), and the Lambda value with minimal mean squared error plus one standard deviation. This latter value is a recommended setting for Lambda. These lines appear only when you perform cross validation. Cross validate by setting the 'CV' name-value pair argument. This example uses 10-fold cross validation.
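The second dashed line can be reproduced from the cross-validation results. The following is a minimal sketch, assuming that Lambda1SE is the largest Lambda whose cross-validated MSE lies within one standard error of the minimum MSE, as the lasso reference page describes it. The variable names iMin, threshold, and lam1se are illustrative and not part of the original example.

```matlab
% Regenerate the example data so that the sketch is self-contained.
rng(3,'twister')
X = zeros(200,5);
for ii = 1:5
    X(:,ii) = exprnd(ii,200,1);
end
r = [0;2;0;-3;0];
Y = X*r + randn(200,1)*.1;
[b,fitinfo] = lasso(X,Y,'CV',10);

% Assumed rule: take the largest Lambda whose cross-validated MSE does not
% exceed the minimum MSE plus its standard error.
iMin = fitinfo.IndexMinMSE;
threshold = fitinfo.MSE(iMin) + fitinfo.SE(iMin);
lam1se = max(fitinfo.Lambda(fitinfo.MSE <= threshold));

fprintf('LambdaMinMSE = %g\n', fitinfo.LambdaMinMSE);
fprintf('Recomputed one-standard-error Lambda = %g\n', lam1se);
fprintf('fitinfo.Lambda1SE = %g\n', fitinfo.Lambda1SE);
```

If the assumption holds, the last two printed values agree. Choosing the larger Lambda trades a small increase in cross-validated error for a sparser model.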
The upper part of the plot shows the degrees of freedom (df), meaning the number of nonzero coefficients in the regression, as a function of Lambda. On the left, the large value of Lambda causes all but one coefficient to be 0. On the right all five coefficients are nonzero, though the plot shows only two clearly. The other three coefficients are so small that you cannot visually distinguish them from 0.
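The df values along the top of the plot can be checked directly against the coefficient matrix. This sketch counts the nonzero entries in each column of b and compares the counts with fitinfo.DF. The variable df is an illustrative name, not part of the original example.

```matlab
% Regenerate the example data so that the sketch is self-contained.
rng(3,'twister')
X = zeros(200,5);
for ii = 1:5
    X(:,ii) = exprnd(ii,200,1);
end
r = [0;2;0;-3;0];
Y = X*r + randn(200,1)*.1;
[b,fitinfo] = lasso(X,Y,'CV',10);

% Each column of b holds the coefficients for one Lambda value.
df = sum(b ~= 0, 1);

% Columns: Lambda, nonzero count from b, DF reported by lasso.
disp([fitinfo.Lambda(:) df(:) fitinfo.DF(:)])
```

Lambda is stored in ascending order, so the first rows correspond to the right side of the plot and the last rows to the left side.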
For small values of Lambda (toward the right in the plot), the coefficient values are close to the least-squares estimate.
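To see how close the fit at the smallest Lambda is to least squares, the two coefficient vectors can be compared side by side. This sketch assumes the default lasso settings, under which lasso fits an intercept, so the least-squares fit here includes a column of ones to make the comparison like for like. The names ols, lassoSmall, and gap are illustrative.

```matlab
% Regenerate the example data so that the sketch is self-contained.
rng(3,'twister')
X = zeros(200,5);
for ii = 1:5
    X(:,ii) = exprnd(ii,200,1);
end
r = [0;2;0;-3;0];
Y = X*r + randn(200,1)*.1;

% Cross validation is not needed for this comparison.
[b,fitinfo] = lasso(X,Y);

% Least squares with an intercept term: [intercept; slopes].
ols = [ones(200,1) X] \ Y;

% Lambda is in ascending order, so column 1 is the least regularized fit.
lassoSmall = [fitinfo.Intercept(1); b(:,1)];

disp([ols lassoSmall])        % side-by-side coefficients
gap = norm(ols - lassoSmall)  % Euclidean distance between the two fits
```

A small value of gap indicates that the least regularized lasso fit has nearly converged to the least-squares solution.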
Find the Lambda value of the minimal cross-validated mean squared error plus one standard deviation. Examine the MSE and coefficients of the fit at that Lambda.
lam = fitinfo.Index1SE;
fitinfo.MSE(lam)
ans = 0.1398
b(:,lam)
ans = 5×1
         0
    1.8855
         0
   -2.9367
         0
lasso did a good job finding the coefficient vector r.
For comparison, find the least-squares estimate of r.
rhat = X\Y
rhat = 5×1
   -0.0038
    1.9952
    0.0014
   -2.9993
    0.0031
The estimate b(:,lam) has a higher mean squared error (0.1398) than rhat.
res = X*rhat - Y; % Calculate residuals
MSEmin = res'*res/200 % b(:,lam) value is 0.1398
MSEmin = 0.0088
However, b(:,lam) has only two nonzero components, and therefore can provide better predictive estimates on new data.