regularize

Class: RegressionEnsemble

Find weights to minimize resubstitution error plus penalty term

Syntax

`ens1 = regularize(ens)`

`ens1 = regularize(ens,Name,Value)`

Description

`ens1 = regularize(ens)` finds optimal weights for learners in `ens` by lasso regularization. `regularize` returns a regression ensemble identical to `ens`, but with a populated `Regularization` property.

`ens1 = regularize(ens,Name,Value)` computes optimal weights with additional options specified by one or more `Name,Value` pair arguments. You can specify several name-value pair arguments in any order as `Name1,Value1,…,NameN,ValueN`.

Input Arguments

 `ens` A regression ensemble, created by `fitensemble`.

Name-Value Pair Arguments

Specify optional comma-separated pairs of `Name,Value` arguments. `Name` is the argument name and `Value` is the corresponding value. `Name` must appear inside single quotes (`' '`). You can specify several name and value pair arguments in any order as `Name1,Value1,...,NameN,ValueN`.

 `'lambda'` Vector of nonnegative regularization parameter values for lasso. For the default setting of `lambda`, `regularize` calculates the smallest value `lambda_max` for which all optimal weights for learners are `0`. The default value of `lambda` is a vector including `0` and nine exponentially spaced numbers from `lambda_max/1000` to `lambda_max`. Default: `[0 logspace(log10(lambda_max/1000),log10(lambda_max),9)]`

 `'npass'` Maximal number of passes for lasso optimization, a positive integer. Default: `10`

 `'reltol'` Relative tolerance on the regularized loss for lasso, a positive numeric scalar. Default: `1e-3`

 `'verbose'` Verbosity level, either `0` or `1`. When set to `1`, `regularize` displays more information as it runs. Default: `0`
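As a rough illustration of the options above, the following sketch builds a grid with the same shape as the default `lambda` vector and then passes explicit values for every option. The value of `lambdaMax`, the synthetic data, the seed, and the ensemble size are assumptions made only for this example; `regularize` computes the real `lambda_max` internally.

```matlab
% Shape of the default grid, using a hypothetical lambdaMax
% (regularize derives the actual lambda_max from the ensemble).
lambdaMax  = 1;
lambdaGrid = [0 logspace(log10(lambdaMax/1000),log10(lambdaMax),9)];
numel(lambdaGrid)   % 10 values: zero plus nine exponentially spaced points

% Small synthetic regression problem (assumed data, for illustration only)
rng(0);                                   % fix the seed for repeatability
X = rand(200,5);
Y = sum(X(:,1:2),2) + 0.1*randn(200,1);
ens = fitensemble(X,Y,'Bag',50,'Tree','type','regression');

% Override every option instead of relying on the defaults
lambda = [0 0.001 0.01 0.1];
ens1 = regularize(ens,'lambda',lambda,'npass',20,'reltol',1e-4,'verbose',0);

% TrainedWeights holds one column of learner weights per lambda value
size(ens1.Regularization.TrainedWeights)
```

Raising `npass` or tightening `reltol` lets the optimization run longer before it stops, which may matter for large ensembles.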

Output Arguments

 `ens1` A regression ensemble. Usually you set `ens1` to the same name as `ens`.

Definitions

Lasso

The lasso algorithm finds an optimal set of learner weights $\alpha_t$ that minimizes

$\sum _{n=1}^{N}{w}_{n}g\left(\left(\sum _{t=1}^{T}{\alpha }_{t}{h}_{t}\left({x}_{n}\right)\right),{y}_{n}\right)+\lambda \sum _{t=1}^{T}|{\alpha }_{t}|.$

Here

• λ ≥ 0 is a parameter you provide, called the lasso parameter.

• $h_t$ is a weak learner in the ensemble trained on N observations with predictors $x_n$, responses $y_n$, and weights $w_n$.

• $g(f,y) = (f - y)^2$ is the squared error.
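To make the objective concrete, the sketch below evaluates it by hand for a tiny made-up problem. The matrix `H`, the vectors `y`, `w`, and `alpha`, and the value of `lambda` are all hypothetical; this is not the code `regularize` runs internally, only a direct transcription of the formula above.

```matlab
% Hypothetical predictions of T = 2 weak learners on N = 3 observations:
% H(n,t) holds h_t(x_n).
H = [1 0;
     0 1;
     1 1];
y      = [1; 2; 3];      % responses y_n
w      = [1; 1; 1]/3;    % observation weights w_n
alpha  = [1; 2];         % candidate learner weights alpha_t
lambda = 0.1;            % lasso parameter

f = H*alpha;                            % ensemble prediction for each observation
sqErr   = sum(w .* (f - y).^2);         % weighted squared-error term
penalty = lambda*sum(abs(alpha));       % L1 penalty on the learner weights
obj = sqErr + penalty;
```

Here the weighted combination reproduces `y` exactly, so the squared-error term is zero and the objective equals the penalty, `0.1*(1 + 2) = 0.3`. Larger values of `lambda` make nonzero weights more expensive, which pushes more of them to exactly zero.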

Examples

Regularize an ensemble of bagged trees:

```
X = rand(2000,20);
Y = repmat(-1,2000,1);
Y(sum(X(:,1:5),2)>2.5) = 1;
bag = fitensemble(X,Y,'Bag',300,'Tree',...
    'type','regression');
bag = regularize(bag,'lambda',[0.001 0.1],'verbose',1);
```

`regularize` reports on its progress.

To see the resulting regularization structure:

```
bag.Regularization

ans = 
               Method: 'Lasso'
       TrainedWeights: [300x2 double]
               Lambda: [1.0000e-003 0.1000]
    ResubstitutionMSE: [0.0616 0.0812]
       CombineWeights: @classreg.learning.combiner.WeightedSum
```

See how many learners in the regularized ensemble have positive weights for each value of `Lambda` (these are the learners a shrunken ensemble would keep):

```
sum(bag.Regularization.TrainedWeights > 0)

ans =
   116    91
```

To shrink the ensemble using the weights from `Lambda = 0.1`:

```
cmp = shrink(bag,'weightcolumn',2)

cmp = 
classreg.learning.regr.CompactRegressionEnsemble:
           PredictorNames: {1x20 cell}
    CategoricalPredictors: []
             ResponseName: 'Y'
        ResponseTransform: 'none'
               NumTrained: 91
```

The shrunken ensemble has `91` members, fewer than one third of the original `300`.
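One way to check what the reduction costs in accuracy is to compare the mean squared error of the full and shrunken ensembles. The sketch below repeats the example setup so that it runs on its own; the seed is an assumption added for repeatability, and both errors are measured on the training data, so they are likely optimistic compared with held-out data.

```matlab
rng(1);                                   % assumed seed, only for repeatability
X = rand(2000,20);
Y = repmat(-1,2000,1);
Y(sum(X(:,1:5),2) > 2.5) = 1;
bag = fitensemble(X,Y,'Bag',300,'Tree','type','regression');
bag = regularize(bag,'lambda',[0.001 0.1]);
cmp = shrink(bag,'weightcolumn',2);       % keep learners with nonzero weight at Lambda = 0.1

% Mean squared error of each ensemble on the training data
mseFull   = loss(bag,X,Y);
mseShrunk = loss(cmp,X,Y);
fprintf('Full:     %d learners, MSE %.4f\n',bag.NumTrained,mseFull);
fprintf('Shrunken: %d learners, MSE %.4f\n',cmp.NumTrained,mseShrunk);
```

The exact learner counts and errors depend on the random data, so they need not match the numbers shown in the example output above.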

