# lbfgsState

## Description

An `lbfgsState` object stores information about steps in the L-BFGS algorithm.

The L-BFGS algorithm [1] is a quasi-Newton method that approximates the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm. The L-BFGS algorithm is best suited for small networks and data sets that you can process in a single batch.

Use `lbfgsState` objects in conjunction with the `lbfgsupdate` function to train a neural network using the L-BFGS algorithm.

## Creation

### Description

`solverState = lbfgsState` creates an L-BFGS state object with a history size of 10 and an initial inverse Hessian factor of 1.

`solverState = lbfgsState(Name=Value)` sets the `HistorySize` and `InitialInverseHessianFactor` properties using one or more name-value arguments.
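
For example, this minimal sketch shows both creation syntaxes:

```matlab
% Create a state object with the default history size (10) and
% initial inverse Hessian factor (1).
solverState = lbfgsState;

% Set properties at creation using name-value arguments.
solverState = lbfgsState(HistorySize=5,InitialInverseHessianFactor=0.5);
```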

## Properties

### L-BFGS State

`HistorySize` — Number of state updates to store

`10` (default) | positive integer

Number of state updates to store, specified as a positive integer. Values between 3 and 20 suit most tasks.

The L-BFGS algorithm uses a history of gradient calculations to approximate the Hessian matrix recursively. For more information, see Limited-Memory BFGS.

After creating the `lbfgsState` object, this property is read-only.

**Data Types:** `single` | `double` | `int8` | `int16` | `int32` | `int64` | `uint8` | `uint16` | `uint32` | `uint64`

`InitialInverseHessianFactor` — Initial value that characterizes approximate inverse Hessian matrix

`1` (default) | positive scalar


Initial value that characterizes the approximate inverse Hessian matrix, specified as a positive scalar.

To save memory, the L-BFGS algorithm does not store and invert the dense Hessian matrix *B*. Instead, the algorithm uses the approximation $${B}_{k-m}^{-1}\approx {\lambda}_{k}I$$, where *m* is the history size, the inverse Hessian factor $${\lambda}_{k}$$ is a scalar, and *I* is the identity matrix. The algorithm stores only the scalar inverse Hessian factor and updates it at each step.

The `InitialInverseHessianFactor` property is the value of $${\lambda}_{0}$$.

For more information, see Limited-Memory BFGS.

After creating the `lbfgsState` object, this property is read-only.

**Data Types:** `single` | `double` | `int8` | `int16` | `int32` | `int64` | `uint8` | `uint16` | `uint32` | `uint64`
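
As a minimal sketch of how this property seeds the solver, creating a state object with a custom initial factor sets `InverseHessianFactor` to the same value until the first update:

```matlab
% InitialInverseHessianFactor is the value of lambda_0. Before the first
% lbfgsupdate call, InverseHessianFactor holds this same value.
solverState = lbfgsState(InitialInverseHessianFactor=0.5);
solverState.InverseHessianFactor   % 0.5 until the solver updates it
```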

`InverseHessianFactor` — Value that characterizes approximate inverse Hessian matrix

`1` (default) | positive scalar

Value that characterizes the approximate inverse Hessian matrix, specified as a positive scalar.

To save memory, the L-BFGS algorithm does not store and invert the dense Hessian matrix *B*. Instead, the algorithm uses the approximation $${B}_{k-m}^{-1}\approx {\lambda}_{k}I$$, where *m* is the history size, the inverse Hessian factor $${\lambda}_{k}$$ is a scalar, and *I* is the identity matrix. The algorithm stores only the scalar inverse Hessian factor and updates it at each step.

For more information, see Limited-Memory BFGS.

**Data Types:** `single` | `double` | `int8` | `int16` | `int32` | `int64` | `uint8` | `uint16` | `uint32` | `uint64`

`StepHistory` — Step history

`{}` (default) | cell array

Step history, specified as a cell array.

The L-BFGS algorithm uses a history of gradient calculations to approximate the Hessian matrix recursively. For more information, see Limited-Memory BFGS.

**Data Types:** `cell`

`GradientsDifferenceHistory` — Gradients difference history

`{}` (default) | cell array

Gradients difference history, specified as a cell array.

The L-BFGS algorithm uses a history of gradient calculations to approximate the Hessian matrix recursively. For more information, see Limited-Memory BFGS.

**Data Types:** `cell`

`HistoryIndices` — History indices

1-by-0 vector (default) | row vector

History indices, specified as a row vector.

`HistoryIndices` is a 1-by-`HistorySize` vector, where `StepHistory(i)` and `GradientsDifferenceHistory(i)` correspond to iteration `HistoryIndices(i)`.

For more information, see Limited-Memory BFGS.

**Data Types:** `double`
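
This minimal sketch, assuming a network `net` and loss function `lossFcn` defined as in the Update Learnable Parameters in Neural Network example below, shows how the history properties stay aligned and bounded by `HistorySize`:

```matlab
% Assumes net and lossFcn are set up as in the training example below.
solverState = lbfgsState(HistorySize=3);
for k = 1:5
    [net,solverState] = lbfgsupdate(net,lossFcn,solverState);
end

solverState.HistoryIndices        % iterations of the stored updates
numel(solverState.StepHistory)    % at most HistorySize (3 here)
```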

### Iteration Information

`Loss` — Loss

`[]` (default) | `dlarray` scalar | numeric scalar

This property is read-only.

Loss, specified as a `dlarray` scalar, a numeric scalar, or `[]`.

If the state object is the output of the `lbfgsupdate` function, then `Loss` is the first output of the loss function that you pass to the `lbfgsupdate` function. Otherwise, `Loss` is `[]`.
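
For example, this minimal sketch, assuming `net` and `lossFcn` from the training example below, reads the loss that the most recent update records:

```matlab
% Assumes net and lossFcn are set up as in the training example below.
% After the update, the state carries the loss that the last evaluation
% of lossFcn returns.
[net,solverState] = lbfgsupdate(net,lossFcn,lbfgsState);
solverState.Loss
```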

`Gradients` — Gradients

`[]` (default) | `dlarray` object | numeric array | cell array | structure | table

This property is read-only.

Gradients, specified as a `dlarray` object, a numeric array, a cell array, a structure, a table, or `[]`.

If the state object is the output of the `lbfgsupdate` function, then `Gradients` is the second output of the loss function that you pass to the `lbfgsupdate` function. Otherwise, `Gradients` is `[]`.

`AdditionalLossFunctionOutputs` — Additional loss function outputs

1-by-0 cell array (default) | cell array

This property is read-only.

Additional loss function outputs, specified as a cell array.

If the state object is the output of the `lbfgsupdate` function, then `AdditionalLossFunctionOutputs` is a cell array containing additional outputs of the loss function that you pass to the `lbfgsupdate` function. Otherwise, `AdditionalLossFunctionOutputs` is a 1-by-0 cell array.

**Data Types:** `cell`

`StepNorm` — Norm of step

`[]` (default) | `dlarray` scalar | numeric scalar

This property is read-only.

Norm of the step, specified as a `dlarray` scalar, a numeric scalar, or `[]`.

If the state object is the output of the `lbfgsupdate` function, then `StepNorm` is the norm of the step that the `lbfgsupdate` function calculates. Otherwise, `StepNorm` is `[]`.

`GradientsNorm` — Norm of gradients

`[]` (default) | `dlarray` scalar | numeric scalar

This property is read-only.

Norm of the gradients, specified as a `dlarray` scalar, a numeric scalar, or `[]`.

If the state object is the output of the `lbfgsupdate` function, then `GradientsNorm` is the norm of the second output of the loss function that you pass to the `lbfgsupdate` function. Otherwise, `GradientsNorm` is `[]`.

`LineSearchStatus` — Status of line search algorithm

`""` (default) | `"completed"` | `"failed"`

This property is read-only.

Status of the line search algorithm, specified as `""`

,
`"completed"`

, or `"failed"`

.

If the state object is the output of the `lbfgsupdate` function, then `LineSearchStatus` is one of these values:

`"completed"`

— The algorithm finds a learning rate that satisfies the`LineSearchMethod`

and`MaxNumLineSearchIterations`

options that the`lbfgsupdate`

function uses.`"failed"`

— The algorithm fails to find a learning rate that satisfies the`LineSearchMethod`

and`MaxNumLineSearchIterations`

options that the`lbfgsupdate`

function uses.

Otherwise, `LineSearchStatus` is `""`.

`LineSearchMethod` — Method solver uses to find suitable learning rate

`""` (default) | `"weak-wolfe"` | `"strong-wolfe"` | `"backtracking"`

This property is read-only.

Method the solver uses to find a suitable learning rate, specified as `"weak-wolfe"`, `"strong-wolfe"`, `"backtracking"`, or `""`.

If the state object is the output of the `lbfgsupdate` function, then `LineSearchMethod` is the line search method that the `lbfgsupdate` function uses. Otherwise, `LineSearchMethod` is `""`.

`MaxNumLineSearchIterations` — Maximum number of line search iterations

`0` (default) | nonnegative integer

This property is read-only.

Maximum number of line search iterations, specified as a nonnegative integer.

If the state object is the output of the `lbfgsupdate` function, then `MaxNumLineSearchIterations` is the maximum number of line search iterations that the `lbfgsupdate` function uses. Otherwise, `MaxNumLineSearchIterations` is `0`.

**Data Types:** `double`
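
The line search properties record the configuration and outcome of the most recent update. As a minimal sketch, assuming `net` and `lossFcn` from the training example below, you can pass line search options to `lbfgsupdate` and then inspect the updated state:

```matlab
% Assumes net and lossFcn are set up as in the training example below.
[net,solverState] = lbfgsupdate(net,lossFcn,lbfgsState, ...
    LineSearchMethod="strong-wolfe", ...
    MaxNumLineSearchIterations=50);

solverState.LineSearchStatus             % "completed" or "failed"
solverState.LineSearchMethod             % "strong-wolfe"
solverState.MaxNumLineSearchIterations   % 50
```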

## Examples

### Create L-BFGS Solver State Object

Create an L-BFGS solver state object.

```matlab
solverState = lbfgsState
```

```
solverState = 
  LBFGSState with properties:

          InverseHessianFactor: 1
                   StepHistory: {}
    GradientsDifferenceHistory: {}
                HistoryIndices: [1x0 double]

   Iteration Information

                          Loss: []
                     Gradients: []
 AdditionalLossFunctionOutputs: {1x0 cell}
                 GradientsNorm: []
                      StepNorm: []
              LineSearchStatus: ""
```

### Update Learnable Parameters in Neural Network

Load the iris flower data set.

```matlab
[XTrain, TTrain] = iris_dataset;
```

Convert the predictors to a `dlarray` object with format `"CB"` (channel, batch).

```matlab
XTrain = dlarray(XTrain,"CB");
```

Define the network architecture.

```matlab
numInputFeatures = size(XTrain,1);
numClasses = size(TTrain,1);
numHiddenUnits = 32;

layers = [
    featureInputLayer(numInputFeatures)
    fullyConnectedLayer(numHiddenUnits)
    reluLayer
    fullyConnectedLayer(numHiddenUnits)
    reluLayer
    fullyConnectedLayer(numClasses)
    softmaxLayer];

net = dlnetwork(layers);
```

Define the `modelLoss` function, listed in the Model Loss Function section of the example. This function takes as input a neural network, input data, and targets. The function returns the loss and the gradients of the loss with respect to the network learnable parameters.

The `lbfgsupdate` function requires a loss function with the syntax `[loss,gradients] = f(net)`. Create a variable that parameterizes the evaluated `modelLoss` function to take a single input argument.

```matlab
lossFcn = @(net) dlfeval(@modelLoss,net,XTrain,TTrain);
```

Initialize an L-BFGS solver state object with a maximum history size of 3 and an initial inverse Hessian approximation factor of 1.1.

```matlab
solverState = lbfgsState( ...
    HistorySize=3, ...
    InitialInverseHessianFactor=1.1);
```

Train the network for 10 epochs.

```matlab
numEpochs = 10;
for i = 1:numEpochs
    [net, solverState] = lbfgsupdate(net,lossFcn,solverState);
end
```

**Model Loss Function**

The `modelLoss` function takes as input a neural network `net`, input data `X`, and targets `T`. The function returns the loss and the gradients of the loss with respect to the network learnable parameters.

```matlab
function [loss, gradients] = modelLoss(net, X, T)
    Y = forward(net,X);
    loss = crossentropy(Y,T);
    gradients = dlgradient(loss,net.Learnables);
end
```

## Algorithms

### Limited-Memory BFGS

The L-BFGS algorithm [1] is a quasi-Newton method that approximates the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm. The L-BFGS algorithm is best suited for small networks and data sets that you can process in a single batch.

The algorithm updates learnable parameters *W* at iteration
*k+1* using the update step given by

$${W}_{k+1}={W}_{k}-{\eta}_{k}{B}_{k}^{-1}\nabla J({W}_{k}),$$

where $${W}_{k}$$ denotes the weights at iteration *k*, $${\eta}_{k}$$ is the learning rate at iteration *k*, $${B}_{k}$$ is an approximation of the Hessian matrix at iteration *k*, and $$\nabla J({W}_{k})$$ denotes the gradients of the loss with respect to the learnable parameters at iteration *k*.

The L-BFGS algorithm computes the matrix-vector product $${B}_{k}^{-1}\nabla J({W}_{k})$$ directly. The algorithm does not require computing the inverse of $${B}_{k}$$.

To save memory, the L-BFGS algorithm does not store and invert the dense Hessian matrix *B*. Instead, the algorithm uses the approximation $${B}_{k-m}^{-1}\approx {\lambda}_{k}I$$, where *m* is the history size, the inverse Hessian factor $${\lambda}_{k}$$ is a scalar, and *I* is the identity matrix. The algorithm stores only the scalar inverse Hessian factor and updates it at each step.

To compute the matrix-vector product $${B}_{k}^{-1}\nabla J({W}_{k})$$ directly, the L-BFGS algorithm uses this recursive algorithm:

1. Set $$r={B}_{k-m}^{-1}\nabla J({W}_{k})$$, where *m* is the history size.
2. For $$i=m,\dots ,1$$:
   1. Let $$\beta =\frac{1}{{s}_{k-i}^{\top}{y}_{k-i}}{y}_{k-i}^{\top}r$$, where $${s}_{k-i}$$ and $${y}_{k-i}$$ are the step and gradient differences for iteration $$k-i$$, respectively.
   2. Set $$r=r+{s}_{k-i}\left({a}_{k-i}-\beta \right)$$, where $$a$$ is derived from $$s$$, $$y$$, and the gradients of the loss with respect to the learnable parameters. For more information, see [1].
3. Return $${B}_{k}^{-1}\nabla J({W}_{k})=r$$.
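
The following MATLAB sketch illustrates the two-loop recursion under simplifying assumptions: `s` and `y` are cell arrays of the *m* most recent step and gradient differences as column vectors, ordered oldest to newest, and `lambda` is the current inverse Hessian factor. The function and variable names are illustrative, not part of the `lbfgsState` API.

```matlab
function r = twoLoopRecursion(g,s,y,lambda)
% Sketch of the L-BFGS two-loop recursion for computing B^-1 * g.
% g      - current gradient (column vector)
% s, y   - cell arrays of step and gradient differences, oldest to newest
% lambda - scalar inverse Hessian factor

m = numel(s);
a = zeros(m,1);
q = g;

% First loop: newest to oldest.
for i = m:-1:1
    rho = 1/(s{i}'*y{i});
    a(i) = rho*(s{i}'*q);
    q = q - a(i)*y{i};
end

% Apply the initial inverse Hessian approximation lambda*I.
r = lambda*q;

% Second loop: oldest to newest.
for i = 1:m
    rho = 1/(s{i}'*y{i});
    beta = rho*(y{i}'*r);
    r = r + s{i}*(a(i) - beta);
end
end
```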

## References

[1] Liu, Dong C., and Jorge Nocedal. "On the Limited Memory BFGS Method for Large Scale Optimization." *Mathematical Programming* 45, no. 1 (August 1989): 503–528. https://doi.org/10.1007/BF01589116.

## Version History

**Introduced in R2023a**
