# lbfgsState

State of limited-memory BFGS (L-BFGS) solver

Since R2023a

## Description

An `lbfgsState` object stores information about steps in the L-BFGS algorithm.

The L-BFGS algorithm [1] is a quasi-Newton method that approximates the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm. The L-BFGS algorithm is best suited for small networks and data sets that you can process in a single batch.

Use `lbfgsState` objects in conjunction with the `lbfgsupdate` function to train a neural network using the L-BFGS algorithm.

## Creation

### Syntax

`solverState = lbfgsState`

`solverState = lbfgsState(Name=Value)`

### Description

`solverState = lbfgsState` creates an L-BFGS state object with a history size of 10 and an initial inverse Hessian factor of 1.

`solverState = lbfgsState(Name=Value)` sets the `HistorySize` and `InitialInverseHessianFactor` properties using one or more name-value arguments.

## Properties


### L-BFGS State

**`HistorySize`**

Number of state updates to store, specified as a positive integer. Values between 3 and 20 suit most tasks.

The L-BFGS algorithm uses a history of gradient calculations to approximate the Hessian matrix recursively. For more information, see Limited-Memory BFGS.

After creating the `lbfgsState` object, this property is read-only.

Data Types: `single` | `double` | `int8` | `int16` | `int32` | `int64` | `uint8` | `uint16` | `uint32` | `uint64`

**`InitialInverseHessianFactor`**

Initial value that characterizes the approximate inverse Hessian matrix, specified as a positive scalar.

To save memory, the L-BFGS algorithm does not store and invert the dense Hessian matrix B. Instead, the algorithm uses the approximation ${B}_{k-m}^{-1}\approx {\lambda }_{k}I$, where m is the history size, the inverse Hessian factor ${\lambda }_{k}$ is a scalar, and I is the identity matrix. The algorithm stores only the scalar inverse Hessian factor and updates it at each step.
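For context, a common scaling used in L-BFGS implementations (standard in the literature, but not stated on this page) derives the factor from the most recent step ${s}_{k-1}$ and gradient difference ${y}_{k-1}$:

$$\lambda_{k} = \frac{s_{k-1}^{\top}\, y_{k-1}}{y_{k-1}^{\top}\, y_{k-1}}$$

This choice makes $\lambda_{k} I$ approximate the local curvature along the most recent step direction.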

The `InitialInverseHessianFactor` property is the value of ${\lambda }_{0}$.

After creating the `lbfgsState` object, this property is read-only.

Data Types: `single` | `double` | `int8` | `int16` | `int32` | `int64` | `uint8` | `uint16` | `uint32` | `uint64`

**`InverseHessianFactor`**

Value that characterizes the approximate inverse Hessian matrix, specified as a positive scalar.

To save memory, the L-BFGS algorithm does not store and invert the dense Hessian matrix B. Instead, the algorithm uses the approximation ${B}_{k-m}^{-1}\approx {\lambda }_{k}I$, where m is the history size, the inverse Hessian factor ${\lambda }_{k}$ is a scalar, and I is the identity matrix. The algorithm stores only the scalar inverse Hessian factor and updates it at each step.

Data Types: `single` | `double` | `int8` | `int16` | `int32` | `int64` | `uint8` | `uint16` | `uint32` | `uint64`

**`StepHistory`**

Step history, specified as a cell array.

The L-BFGS algorithm uses a history of gradient calculations to approximate the Hessian matrix recursively. For more information, see Limited-Memory BFGS.

Data Types: `cell`

**`GradientsDifferenceHistory`**

Gradients difference history, specified as a cell array.

The L-BFGS algorithm uses a history of gradient calculations to approximate the Hessian matrix recursively. For more information, see Limited-Memory BFGS.

Data Types: `cell`

**`HistoryIndices`**

History indices, specified as a row vector.

`HistoryIndices` is a 1-by-`HistorySize` vector, where `StepHistory(i)` and `GradientsDifferenceHistory(i)` correspond to iteration `HistoryIndices(i)`.

Data Types: `double`
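For example, after a few solver updates you can inspect the history properties together. This is a hedged sketch, assuming `net`, `lossFcn`, and `solverState` are set up as in the Examples section:

```matlab
% Sketch: after k updates (k <= HistorySize), the history buffers hold the
% k most recent steps and gradient differences, and HistoryIndices records
% which iteration each stored entry came from.
for k = 1:3
    [net,solverState] = lbfgsupdate(net,lossFcn,solverState);
end

solverState.HistoryIndices                       % iterations stored in the buffers
numel(solverState.StepHistory)                   % matches numel(HistoryIndices)
numel(solverState.GradientsDifferenceHistory)    % matches numel(HistoryIndices)
```

Once the number of updates exceeds `HistorySize`, the oldest entries are discarded, so the buffers always hold at most `HistorySize` elements.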

### Iteration Information

**`Loss`**

Loss, specified as a `dlarray` scalar, a numeric scalar, or `[]`.

If the state object is the output of the `lbfgsupdate` function, then `Loss` is the first output of the loss function that you pass to the `lbfgsupdate` function. Otherwise, `Loss` is `[]`.

**`Gradients`**

Gradients, specified as a `dlarray` object, a numeric array, a cell array, a structure, a table, or `[]`.

If the state object is the output of the `lbfgsupdate` function, then `Gradients` is the second output of the loss function that you pass to the `lbfgsupdate` function. Otherwise, `Gradients` is `[]`.

**`AdditionalLossFunctionOutputs`**

Additional loss function outputs, specified as a cell array.

If the state object is the output of the `lbfgsupdate` function, then `AdditionalLossFunctionOutputs` is a cell array containing additional outputs of the loss function that you pass to the `lbfgsupdate` function. Otherwise, `AdditionalLossFunctionOutputs` is a 1-by-0 cell array.

Data Types: `cell`

**`StepNorm`**

Norm of the step, specified as a `dlarray` scalar, a numeric scalar, or `[]`.

If the state object is the output of the `lbfgsupdate` function, then `StepNorm` is the norm of the step that the `lbfgsupdate` function calculates. Otherwise, `StepNorm` is `[]`.

**`GradientsNorm`**

Norm of the gradients, specified as a `dlarray` scalar, a numeric scalar, or `[]`.

If the state object is the output of the `lbfgsupdate` function, then `GradientsNorm` is the norm of the second output of the loss function that you pass to the `lbfgsupdate` function. Otherwise, `GradientsNorm` is `[]`.

**`LineSearchStatus`**

Status of the line search algorithm, specified as `""`, `"completed"`, or `"failed"`.

If the state object is the output of the `lbfgsupdate` function, then `LineSearchStatus` is one of these values:

• `"completed"` — The algorithm finds a learning rate that satisfies the `LineSearchMethod` and `MaxNumLineSearchIterations` options that the `lbfgsupdate` function uses.

• `"failed"` — The algorithm fails to find a learning rate that satisfies the `LineSearchMethod` and `MaxNumLineSearchIterations` options that the `lbfgsupdate` function uses.

Otherwise, `LineSearchStatus` is `""`.

**`LineSearchMethod`**

Method the solver uses to find a suitable learning rate, specified as `"weak-wolfe"`, `"strong-wolfe"`, `"backtracking"`, or `""`.

If the state object is the output of the `lbfgsupdate` function, then `LineSearchMethod` is the line search method that the `lbfgsupdate` function uses. Otherwise, `LineSearchMethod` is `""`.

**`MaxNumLineSearchIterations`**

Maximum number of line search iterations, specified as a nonnegative integer.

If the state object is the output of the `lbfgsupdate` function, then `MaxNumLineSearchIterations` is the maximum number of line search iterations that the `lbfgsupdate` function uses. Otherwise, `MaxNumLineSearchIterations` is `0`.

Data Types: `double`
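The property descriptions above indicate that the line search behavior is controlled through options of the `lbfgsupdate` function. As a hedged sketch (the exact name-value syntax shown here is an assumption inferred from those descriptions, not confirmed on this page), a call with line search options and a status check might look like:

```matlab
% Sketch: request a strong Wolfe line search with at most 50 iterations.
% The LineSearchMethod and MaxNumLineSearchIterations name-value arguments
% are assumed from the property descriptions above.
[net,solverState] = lbfgsupdate(net,lossFcn,solverState, ...
    LineSearchMethod="strong-wolfe", ...
    MaxNumLineSearchIterations=50);

% After the update, the state object records whether the line search
% found a learning rate satisfying those options.
if solverState.LineSearchStatus == "failed"
    disp("Line search did not find a suitable learning rate.")
end
```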

## Examples


Create an L-BFGS solver state object.

`solverState = lbfgsState`
```
solverState = 
  LBFGSState with properties:

          InverseHessianFactor: 1
                   StepHistory: {}
    GradientsDifferenceHistory: {}
                HistoryIndices: [1x0 double]

   Iteration Information

                          Loss: []
                     Gradients: []
 AdditionalLossFunctionOutputs: {1x0 cell}
                 GradientsNorm: []
                      StepNorm: []
              LineSearchStatus: ""
```

Train a neural network using the L-BFGS algorithm. First, load the iris flower data set.

`[XTrain,TTrain] = iris_dataset;`

Convert the predictors to a `dlarray` object with format `"CB"` (channel, batch).

`XTrain = dlarray(XTrain,"CB");`

Define the network architecture.

```matlab
numInputFeatures = size(XTrain,1);
numClasses = size(TTrain,1);
numHiddenUnits = 32;

layers = [
    featureInputLayer(numInputFeatures)
    fullyConnectedLayer(numHiddenUnits)
    reluLayer
    fullyConnectedLayer(numHiddenUnits)
    reluLayer
    fullyConnectedLayer(numClasses)
    softmaxLayer];

net = dlnetwork(layers);
```

Define the `modelLoss` function, listed in the Model Loss Function section of the example. This function takes as input a neural network, input data, and targets. The function returns the loss and the gradients of the loss with respect to the network learnable parameters.

The `lbfgsupdate` function requires a loss function with the syntax `[loss,gradients] = f(net)`. Create a variable that parameterizes the evaluated `modelLoss` function to take a single input argument.

`lossFcn = @(net) dlfeval(@modelLoss,net,XTrain,TTrain);`

Initialize an L-BFGS solver state object with a maximum history size of 3 and an initial inverse Hessian approximation factor of 1.1.

```matlab
solverState = lbfgsState( ...
    HistorySize=3, ...
    InitialInverseHessianFactor=1.1);
```

Train the network for 10 epochs.

```matlab
numEpochs = 10;
for i = 1:numEpochs
    [net,solverState] = lbfgsupdate(net,lossFcn,solverState);
end
```
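You can also read the iteration information from the state object during training. This hedged sketch extends the loop with monitoring, assuming `Loss` and `GradientsNorm` are `dlarray` scalars after an update (as the property descriptions indicate):

```matlab
% Sketch: track loss and gradient norm per epoch using the state object.
numEpochs = 10;
for i = 1:numEpochs
    [net,solverState] = lbfgsupdate(net,lossFcn,solverState);

    % extractdata converts the dlarray scalars to numeric values.
    fprintf("Epoch %d: loss = %.4f, gradient norm = %.4f\n", i, ...
        extractdata(solverState.Loss), extractdata(solverState.GradientsNorm));
end
```

A shrinking `GradientsNorm` or `StepNorm` is a practical signal that the solver is converging.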

### Model Loss Function

The `modelLoss` function takes as input a neural network `net`, input data `X`, and targets `T`. The function returns the loss and the gradients of the loss with respect to the network learnable parameters.

```matlab
function [loss,gradients] = modelLoss(net,X,T)

Y = forward(net,X);
loss = crossentropy(Y,T);
gradients = dlgradient(loss,net.Learnables);

end
```


## References

[1] Liu, Dong C., and Jorge Nocedal. "On the Limited Memory BFGS Method for Large Scale Optimization." *Mathematical Programming* 45, no. 1 (August 1989): 503–528. https://doi.org/10.1007/BF01589116.

## Version History

Introduced in R2023a