# lbfgsState

## Description

An `lbfgsState` object stores information about steps in the L-BFGS algorithm.

The L-BFGS algorithm [1] is a quasi-Newton method that approximates the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm. The L-BFGS algorithm is best suited for small networks and data sets that you can process in a single batch.

Use `lbfgsState` objects in conjunction with the `lbfgsupdate` function to train a neural network using the L-BFGS algorithm.

## Creation

### Description

`solverState = lbfgsState` creates an L-BFGS state object with a history size of 10 and an initial inverse Hessian factor of 1.

`solverState = lbfgsState(Name=Value)` sets the `HistorySize` and `InitialInverseHessianFactor` properties using one or more name-value arguments.
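
For example, this minimal sketch shows both creation syntaxes:

```matlab
% Create a state object with the default history size (10) and
% initial inverse Hessian factor (1).
solverState = lbfgsState;

% Set properties at creation using name-value arguments.
solverState = lbfgsState(HistorySize=5,InitialInverseHessianFactor=0.5);
```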

## Properties

### L-BFGS State

`HistorySize` — Number of state updates to store

`10` (default) | positive integer

Number of state updates to store, specified as a positive integer. Values between 3 and 20 suit most tasks.

The L-BFGS algorithm uses a history of gradient calculations to approximate the Hessian matrix recursively. For more information, see Limited-Memory BFGS.

After creating the `lbfgsState` object, this property is read-only.

**Data Types:** `single` | `double` | `int8` | `int16` | `int32` | `int64` | `uint8` | `uint16` | `uint32` | `uint64`

`InitialInverseHessianFactor` — Initial value that characterizes approximate inverse Hessian matrix

`1` (default) | positive scalar


Initial value that characterizes the approximate inverse Hessian matrix, specified as a positive scalar.

To save memory, the L-BFGS algorithm does not store and invert the dense Hessian matrix *B*. Instead, the algorithm uses the approximation $${B}_{k-m}^{-1}\approx {\lambda}_{k}I$$, where *m* is the history size, the inverse Hessian factor $${\lambda}_{k}$$ is a scalar, and *I* is the identity matrix. The algorithm stores only the scalar inverse Hessian factor and updates it at each step.

The `InitialInverseHessianFactor` property is the value of $${\lambda}_{0}$$.

For more information, see Limited-Memory BFGS.

After creating the `lbfgsState` object, this property is read-only.

**Data Types:** `single` | `double` | `int8` | `int16` | `int32` | `int64` | `uint8` | `uint16` | `uint32` | `uint64`
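
As a minimal sketch of how this property seeds the solver, creating a state object with a custom initial factor sets `InverseHessianFactor` to the same value until the first update:

```matlab
% InitialInverseHessianFactor is the value of lambda_0. Before the first
% lbfgsupdate call, InverseHessianFactor holds this same value.
solverState = lbfgsState(InitialInverseHessianFactor=0.5);
solverState.InverseHessianFactor   % 0.5 until the solver updates it
```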

`InverseHessianFactor` — Value that characterizes approximate inverse Hessian matrix

`1` (default) | positive scalar

Value that characterizes the approximate inverse Hessian matrix, specified as a positive scalar.

To save memory, the L-BFGS algorithm does not store and invert the dense Hessian matrix *B*. Instead, the algorithm uses the approximation $${B}_{k-m}^{-1}\approx {\lambda}_{k}I$$, where *m* is the history size, the inverse Hessian factor $${\lambda}_{k}$$ is a scalar, and *I* is the identity matrix. The algorithm stores only the scalar inverse Hessian factor and updates it at each step.

For more information, see Limited-Memory BFGS.

**Data Types:** `single` | `double` | `int8` | `int16` | `int32` | `int64` | `uint8` | `uint16` | `uint32` | `uint64`

`StepHistory` — Step history

`{}` (default) | cell array

Step history, specified as a cell array.

The L-BFGS algorithm uses a history of gradient calculations to approximate the Hessian matrix recursively. For more information, see Limited-Memory BFGS.

**Data Types:** `cell`

`GradientsDifferenceHistory` — Gradients difference history

`{}` (default) | cell array

Gradients difference history, specified as a cell array.

The L-BFGS algorithm uses a history of gradient calculations to approximate the Hessian matrix recursively. For more information, see Limited-Memory BFGS.

**Data Types:** `cell`

`HistoryIndices` — History indices

1-by-0 vector (default) | row vector

History indices, specified as a row vector.

`HistoryIndices` is a 1-by-`HistorySize` vector, where `StepHistory(i)` and `GradientsDifferenceHistory(i)` correspond to iteration `HistoryIndices(i)`.

For more information, see Limited-Memory BFGS.

**Data Types:** `double`
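
This minimal sketch, assuming a network `net` and loss function `lossFcn` defined as in the Update Learnable Parameters in Neural Network example below, shows how the history properties stay aligned and bounded by `HistorySize`:

```matlab
% Assumes net and lossFcn are set up as in the training example below.
solverState = lbfgsState(HistorySize=3);
for k = 1:5
    [net,solverState] = lbfgsupdate(net,lossFcn,solverState);
end

solverState.HistoryIndices        % iterations of the stored updates
numel(solverState.StepHistory)    % at most HistorySize (3 here)
```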

### Iteration Information

`Loss` — Loss

`[]` (default) | `dlarray` scalar | numeric scalar

This property is read-only.

Loss, specified as a `dlarray` scalar, a numeric scalar, or `[]`.

If the state object is the output of the `lbfgsupdate` function, then `Loss` is the first output of the loss function that you pass to the `lbfgsupdate` function. Otherwise, `Loss` is `[]`.
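
For example, this minimal sketch, assuming `net` and `lossFcn` from the training example below, reads the loss that the most recent update records:

```matlab
% Assumes net and lossFcn are set up as in the training example below.
% After the update, the state carries the loss that the last evaluation
% of lossFcn returns.
[net,solverState] = lbfgsupdate(net,lossFcn,lbfgsState);
solverState.Loss
```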

`Gradients` — Gradients

`[]` (default) | `dlarray` object | numeric array | cell array | structure | table

This property is read-only.

Gradients, specified as a `dlarray` object, a numeric array, a cell array, a structure, a table, or `[]`.

If the state object is the output of the `lbfgsupdate` function, then `Gradients` is the second output of the loss function that you pass to the `lbfgsupdate` function. Otherwise, `Gradients` is `[]`.

`AdditionalLossFunctionOutputs` — Additional loss function outputs

1-by-0 cell array (default) | cell array

This property is read-only.

Additional loss function outputs, specified as a cell array.

If the state object is the output of the `lbfgsupdate` function, then `AdditionalLossFunctionOutputs` is a cell array containing additional outputs of the loss function that you pass to the `lbfgsupdate` function. Otherwise, `AdditionalLossFunctionOutputs` is a 1-by-0 cell array.

**Data Types:** `cell`

`StepNorm` — Norm of step

`[]` (default) | `dlarray` scalar | numeric scalar

This property is read-only.

Norm of the step, specified as a `dlarray` scalar, a numeric scalar, or `[]`.

If the state object is the output of the `lbfgsupdate` function, then `StepNorm` is the norm of the step that the `lbfgsupdate` function calculates. Otherwise, `StepNorm` is `[]`.

`GradientsNorm` — Norm of gradients

`[]` (default) | `dlarray` scalar | numeric scalar

This property is read-only.

Norm of the gradients, specified as a `dlarray` scalar, a numeric scalar, or `[]`.

If the state object is the output of the `lbfgsupdate` function, then `GradientsNorm` is the norm of the second output of the loss function that you pass to the `lbfgsupdate` function. Otherwise, `GradientsNorm` is `[]`.

`LineSearchStatus` — Status of line search algorithm

`""` (default) | `"completed"` | `"failed"`

This property is read-only.

Status of the line search algorithm, specified as `""`

,
`"completed"`

, or `"failed"`

.

If the state object is the output of the `lbfgsupdate` function, then `LineSearchStatus` is one of these values:

`"completed"`

— The algorithm finds a learning rate that satisfies the`LineSearchMethod`

and`MaxNumLineSearchIterations`

options that the`lbfgsupdate`

function uses.`"failed"`

— The algorithm fails to find a learning rate that satisfies the`LineSearchMethod`

and`MaxNumLineSearchIterations`

options that the`lbfgsupdate`

function uses.

Otherwise, `LineSearchStatus` is `""`.

`LineSearchMethod` — Method solver uses to find suitable learning rate

`""` (default) | `"weak-wolfe"` | `"strong-wolfe"` | `"backtracking"`

This property is read-only.

Method the solver uses to find a suitable learning rate, specified as `"weak-wolfe"`, `"strong-wolfe"`, `"backtracking"`, or `""`.

If the state object is the output of the `lbfgsupdate` function, then `LineSearchMethod` is the line search method that the `lbfgsupdate` function uses. Otherwise, `LineSearchMethod` is `""`.

`MaxNumLineSearchIterations` — Maximum number of line search iterations

`0` (default) | nonnegative integer

This property is read-only.

Maximum number of line search iterations, specified as a nonnegative integer.

If the state object is the output of the `lbfgsupdate` function, then `MaxNumLineSearchIterations` is the maximum number of line search iterations that the `lbfgsupdate` function uses. Otherwise, `MaxNumLineSearchIterations` is `0`.

**Data Types:** `double`
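
The line search properties record the configuration and outcome of the most recent update. As a minimal sketch, assuming `net` and `lossFcn` from the training example below, you can pass line search options to `lbfgsupdate` and then inspect the updated state:

```matlab
% Assumes net and lossFcn are set up as in the training example below.
[net,solverState] = lbfgsupdate(net,lossFcn,lbfgsState, ...
    LineSearchMethod="strong-wolfe", ...
    MaxNumLineSearchIterations=50);

solverState.LineSearchStatus             % "completed" or "failed"
solverState.LineSearchMethod             % "strong-wolfe"
solverState.MaxNumLineSearchIterations   % 50
```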

## Examples

### Create L-BFGS Solver State Object

Create an L-BFGS solver state object.

```matlab
solverState = lbfgsState
```

```
solverState = 
  LBFGSState with properties:

          InverseHessianFactor: 1
                   StepHistory: {}
    GradientsDifferenceHistory: {}
                HistoryIndices: [1x0 double]

   Iteration Information

                          Loss: []
                     Gradients: []
 AdditionalLossFunctionOutputs: {1x0 cell}
                 GradientsNorm: []
                      StepNorm: []
              LineSearchStatus: ""
```

### Update Learnable Parameters in Neural Network

Load the iris flower data set.

```matlab
[XTrain, TTrain] = iris_dataset;
```

Convert the predictors to a `dlarray` object with format `"CB"` (channel, batch).

```matlab
XTrain = dlarray(XTrain,"CB");
```

Define the network architecture.

```matlab
numInputFeatures = size(XTrain,1);
numClasses = size(TTrain,1);
numHiddenUnits = 32;

layers = [
    featureInputLayer(numInputFeatures)
    fullyConnectedLayer(numHiddenUnits)
    reluLayer
    fullyConnectedLayer(numHiddenUnits)
    reluLayer
    fullyConnectedLayer(numClasses)
    softmaxLayer];

net = dlnetwork(layers);
```

Define the `modelLoss` function, listed in the Model Loss Function section of the example. This function takes as input a neural network, input data, and targets. The function returns the loss and the gradients of the loss with respect to the network learnable parameters.

The `lbfgsupdate` function requires a loss function with the syntax `[loss,gradients] = f(net)`. Create a variable that parameterizes the evaluated `modelLoss` function to take a single input argument.

```matlab
lossFcn = @(net) dlfeval(@modelLoss,net,XTrain,TTrain);
```

Initialize an L-BFGS solver state object with a maximum history size of 3 and an initial inverse Hessian approximation factor of 1.1.

```matlab
solverState = lbfgsState( ...
    HistorySize=3, ...
    InitialInverseHessianFactor=1.1);
```

Train the network for 10 epochs.

```matlab
numEpochs = 10;
for i = 1:numEpochs
    [net, solverState] = lbfgsupdate(net,lossFcn,solverState);
end
```

**Model Loss Function**

The `modelLoss` function takes as input a neural network `net`, input data `X`, and targets `T`. The function returns the loss and the gradients of the loss with respect to the network learnable parameters.

```matlab
function [loss, gradients] = modelLoss(net, X, T)
    Y = forward(net,X);
    loss = crossentropy(Y,T);
    gradients = dlgradient(loss,net.Learnables);
end
```

## Algorithms

### Limited-Memory BFGS

The L-BFGS algorithm [1] is a quasi-Newton method that approximates the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm. The L-BFGS algorithm is best suited for small networks and data sets that you can process in a single batch.

The algorithm updates learnable parameters *W* at iteration
*k+1* using the update step given by

$${W}_{k+1}={W}_{k}-{\eta}_{k}{B}_{k}^{-1}\nabla J({W}_{k}),$$

where $${W}_{k}$$ denotes the weights at iteration *k*, $${\eta}_{k}$$ is the learning rate at iteration *k*, $${B}_{k}$$ is an approximation of the Hessian matrix at iteration *k*, and $$\nabla J({W}_{k})$$ denotes the gradients of the loss with respect to the learnable parameters at iteration *k*.

The L-BFGS algorithm computes the matrix-vector product $${B}_{k}^{-1}\nabla J({W}_{k})$$ directly. The algorithm does not require computing the inverse of $${B}_{k}$$.

To save memory, the L-BFGS algorithm does not store and invert the dense Hessian matrix *B*. Instead, the algorithm uses the approximation $${B}_{k-m}^{-1}\approx {\lambda}_{k}I$$, where *m* is the history size, the inverse Hessian factor $${\lambda}_{k}$$ is a scalar, and *I* is the identity matrix. The algorithm stores only the scalar inverse Hessian factor and updates it at each step.

To compute the matrix-vector product $${B}_{k}^{-1}\nabla J({W}_{k})$$ directly, the L-BFGS algorithm uses this recursive algorithm:

1. Set $$r={B}_{k-m}^{-1}\nabla J({W}_{k})$$, where *m* is the history size.
2. For $$i=m,\dots ,1$$:
   1. Let $$\beta =\frac{1}{{s}_{k-i}^{\top}{y}_{k-i}}{y}_{k-i}^{\top}r$$, where $${s}_{k-i}$$ and $${y}_{k-i}$$ are the step and gradient differences for iteration $$k-i$$, respectively.
   2. Set $$r=r+{s}_{k-i}\left({a}_{k-i}-\beta \right)$$, where $$a$$ is derived from $$s$$, $$y$$, and the gradients of the loss with respect to the learnable parameters. For more information, see [1].
3. Return $${B}_{k}^{-1}\nabla J({W}_{k})=r$$.
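
The following MATLAB sketch illustrates the two-loop recursion under simplifying assumptions: `s` and `y` are cell arrays of the *m* most recent step and gradient differences as column vectors, ordered oldest to newest, and `lambda` is the current inverse Hessian factor. The function and variable names are illustrative, not part of the `lbfgsState` API.

```matlab
function r = twoLoopRecursion(g,s,y,lambda)
% Sketch of the L-BFGS two-loop recursion for computing B^-1 * g.
% g      - current gradient (column vector)
% s, y   - cell arrays of step and gradient differences, oldest to newest
% lambda - scalar inverse Hessian factor

m = numel(s);
a = zeros(m,1);
q = g;

% First loop: newest to oldest.
for i = m:-1:1
    rho = 1/(s{i}'*y{i});
    a(i) = rho*(s{i}'*q);
    q = q - a(i)*y{i};
end

% Apply the initial inverse Hessian approximation lambda*I.
r = lambda*q;

% Second loop: oldest to newest.
for i = 1:m
    rho = 1/(s{i}'*y{i});
    beta = rho*(y{i}'*r);
    r = r + s{i}*(a(i) - beta);
end
end
```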

## References

[1] Liu, Dong C., and Jorge Nocedal. "On the Limited Memory BFGS Method for Large Scale Optimization." *Mathematical Programming* 45, no. 1 (August 1989): 503–528. https://doi.org/10.1007/BF01589116.

## Version History

**Introduced in R2023a**
