
lbfgsState

State of limited-memory BFGS (L-BFGS) solver

Since R2023a

    Description

    An lbfgsState object stores information about steps in the L-BFGS algorithm.

    The L-BFGS algorithm [1] is a quasi-Newton method that approximates the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm. The L-BFGS algorithm is best suited for small networks and data sets that you can process in a single batch.

    Use lbfgsState objects in conjunction with the lbfgsupdate function to train a neural network using the L-BFGS algorithm.

    Creation

    Description


    solverState = lbfgsState creates an L-BFGS state object with a history size of 10 and an initial inverse Hessian factor of 1.


    solverState = lbfgsState(Name=Value) sets the HistorySize and InitialInverseHessianFactor properties using one or more name-value arguments.
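
    For example, this call creates a state object that stores the five most recent updates and starts from an inverse Hessian factor of 2 (the values shown are illustrative):

    solverState = lbfgsState(HistorySize=5,InitialInverseHessianFactor=2);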

    Properties


    L-BFGS State

    HistorySize

    Number of state updates to store, specified as a positive integer. Values between 3 and 20 suit most tasks.

    The L-BFGS algorithm uses a history of gradient calculations to approximate the Hessian matrix recursively. For more information, see Limited-Memory BFGS.

    After creating the lbfgsState object, this property is read-only.

    Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64

    InitialInverseHessianFactor

    Initial value that characterizes the approximate inverse Hessian matrix, specified as a positive scalar.

    To save memory, the L-BFGS algorithm does not store and invert the dense Hessian matrix B. Instead, the algorithm uses the approximation B_{k-m}^{-1} ≈ λ_k I, where m is the history size, the inverse Hessian factor λ_k is a scalar, and I is the identity matrix, and stores only the scalar inverse Hessian factor. The algorithm updates the inverse Hessian factor at each step.

    The InitialInverseHessianFactor property is the value of λ_0.

    For more information, see Limited-Memory BFGS.

    After creating the lbfgsState object, this property is read-only.

    Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
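
    In LaTeX notation, the approximation that the solver stores and its starting value are:

    B_{k-m}^{-1} \approx \lambda_k I, \qquad \lambda_0 = \mathrm{InitialInverseHessianFactor}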

    InverseHessianFactor

    Value that characterizes the approximate inverse Hessian matrix, specified as a positive scalar.

    To save memory, the L-BFGS algorithm does not store and invert the dense Hessian matrix B. Instead, the algorithm uses the approximation B_{k-m}^{-1} ≈ λ_k I, where m is the history size, the inverse Hessian factor λ_k is a scalar, and I is the identity matrix, and stores only the scalar inverse Hessian factor. The algorithm updates the inverse Hessian factor at each step.

    For more information, see Limited-Memory BFGS.

    Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64

    StepHistory

    Step history, specified as a cell array.

    The L-BFGS algorithm uses a history of gradient calculations to approximate the Hessian matrix recursively. For more information, see Limited-Memory BFGS.

    Data Types: cell

    GradientsDifferenceHistory

    Gradients difference history, specified as a cell array.

    The L-BFGS algorithm uses a history of gradient calculations to approximate the Hessian matrix recursively. For more information, see Limited-Memory BFGS.

    Data Types: cell

    HistoryIndices

    History indices, specified as a row vector.

    HistoryIndices is a 1-by-HistorySize vector, where StepHistory(i) and GradientsDifferenceHistory(i) correspond to iteration HistoryIndices(i).

    For more information, see Limited-Memory BFGS.

    Data Types: double
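
    As a sketch, after several lbfgsupdate calls you can inspect which iterations the stored history covers (the variable names here are illustrative):

    idx = solverState.HistoryIndices;                  % iterations represented in the history
    s = solverState.StepHistory{end};                  % step taken at iteration idx(end)
    y = solverState.GradientsDifferenceHistory{end};   % gradients difference at iteration idx(end)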

    Iteration Information

    Loss

    This property is read-only.

    Loss, specified as a dlarray scalar, a numeric scalar, or [].

    If the state object is the output of the lbfgsupdate function, then Loss is the first output of the loss function that you pass to the lbfgsupdate function. Otherwise, Loss is [].

    Gradients

    This property is read-only.

    Gradients, specified as a dlarray object, a numeric array, a cell array, a structure, a table, or [].

    If the state object is the output of the lbfgsupdate function, then Gradients is the second output of the loss function that you pass to the lbfgsupdate function. Otherwise, Gradients is [].

    AdditionalLossFunctionOutputs

    This property is read-only.

    Additional loss function outputs, specified as a cell array.

    If the state object is the output of the lbfgsupdate function, then AdditionalLossFunctionOutputs is a cell array containing additional outputs of the loss function that you pass to the lbfgsupdate function. Otherwise, AdditionalLossFunctionOutputs is a 1-by-0 cell array.

    Data Types: cell

    StepNorm

    This property is read-only.

    Norm of the step, specified as a dlarray scalar, a numeric scalar, or [].

    If the state object is the output of the lbfgsupdate function, then StepNorm is the norm of the step that the lbfgsupdate function calculates. Otherwise, StepNorm is [].

    GradientsNorm

    This property is read-only.

    Norm of the gradients, specified as a dlarray scalar, a numeric scalar, or [].

    If the state object is the output of the lbfgsupdate function, then GradientsNorm is the norm of the second output of the loss function that you pass to the lbfgsupdate function. Otherwise, GradientsNorm is [].
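
    For example, you can use GradientsNorm as a convergence test inside a training loop. This sketch assumes a loop like the one in the Examples section; maxIterations and the tolerance 1e-5 are illustrative choices, and GradientsNorm is assumed to be a dlarray scalar:

    for i = 1:maxIterations
        [net,solverState] = lbfgsupdate(net,lossFcn,solverState);
        if extractdata(solverState.GradientsNorm) < 1e-5   % assumes a dlarray scalar
            break   % gradients are small; treat the solver as converged
        end
    end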

    LineSearchStatus

    This property is read-only.

    Status of the line search algorithm, specified as "", "completed", or "failed".

    If the state object is the output of the lbfgsupdate function, then LineSearchStatus is one of these values:

    • "completed" — The algorithm finds a learning rate that satisfies the LineSearchMethod and MaxNumLineSearchIterations options that the lbfgsupdate function uses.

    • "failed" — The algorithm fails to find a learning rate that satisfies the LineSearchMethod and MaxNumLineSearchIterations options that the lbfgsupdate function uses.

    Otherwise, LineSearchStatus is "".
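
    For example (a sketch based on the training loop in the Examples section, with maxIterations as an illustrative name), you can stop updating the network when the line search fails:

    for i = 1:maxIterations
        [net,solverState] = lbfgsupdate(net,lossFcn,solverState);
        if solverState.LineSearchStatus == "failed"
            break   % no suitable learning rate found; stop training
        end
    end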

    LineSearchMethod

    This property is read-only.

    Method the solver uses to find a suitable learning rate, specified as "weak-wolfe", "strong-wolfe", "backtracking", or "".

    If the state object is the output of the lbfgsupdate function, then LineSearchMethod is the line search method that the lbfgsupdate function uses. Otherwise, LineSearchMethod is "".

    MaxNumLineSearchIterations

    This property is read-only.

    Maximum number of line search iterations, specified as a nonnegative integer.

    If the state object is the output of the lbfgsupdate function, then MaxNumLineSearchIterations is the maximum number of line search iterations that the lbfgsupdate function uses. Otherwise, MaxNumLineSearchIterations is 0.

    Data Types: double
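
    These properties record line search options of the lbfgsupdate call that produced the state object. A sketch of setting them, assuming lbfgsupdate accepts LineSearchMethod and MaxNumLineSearchIterations as name-value arguments (the values shown are illustrative):

    [net,solverState] = lbfgsupdate(net,lossFcn,solverState, ...
        LineSearchMethod="strong-wolfe", ...
        MaxNumLineSearchIterations=50);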

    Examples


    Create an L-BFGS solver state object.

    solverState = lbfgsState
    solverState = 
      LBFGSState with properties:
    
                 InverseHessianFactor: 1
                          StepHistory: {}
           GradientsDifferenceHistory: {}
                       HistoryIndices: [1x0 double]
    
       Iteration Information
                                 Loss: []
                            Gradients: []
        AdditionalLossFunctionOutputs: {1x0 cell}
                        GradientsNorm: []
                             StepNorm: []
                     LineSearchStatus: ""
    
    
    

    Load the iris flower data set.

    [XTrain, TTrain] = iris_dataset;

    Convert the predictors to a dlarray object with format "CB" (channel, batch).

    XTrain = dlarray(XTrain,"CB");

    Define the network architecture.

    numInputFeatures = size(XTrain,1);
    numClasses = size(TTrain,1);
    numHiddenUnits = 32;
    
    layers = [
        featureInputLayer(numInputFeatures)
        fullyConnectedLayer(numHiddenUnits)
        reluLayer
        fullyConnectedLayer(numHiddenUnits)
        reluLayer
        fullyConnectedLayer(numClasses)
        softmaxLayer];
    
    net = dlnetwork(layers);

    Define the modelLoss function, listed in the Model Loss Function section of the example. This function takes as input a neural network, input data, and targets. The function returns the loss and the gradients of the loss with respect to the network learnable parameters.

    The lbfgsupdate function requires a loss function with the syntax [loss,gradients] = f(net). Create a function handle that evaluates modelLoss using dlfeval and takes the network as its only input argument.

    lossFcn = @(net) dlfeval(@modelLoss,net,XTrain,TTrain);
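
    As an optional check, you can evaluate the parameterized loss function once before training to confirm that it runs and returns a loss and gradients:

    [loss,gradients] = lossFcn(net);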

    Initialize an L-BFGS solver state object with a maximum history size of 3 and an initial inverse Hessian approximation factor of 1.1.

    solverState = lbfgsState( ...
        HistorySize=3, ...
        InitialInverseHessianFactor=1.1);

    Train the network for 10 epochs. Because the solver processes the full data set as a single batch, each call to lbfgsupdate is one epoch.

    numEpochs = 10;
    for i = 1:numEpochs
        [net, solverState] = lbfgsupdate(net,lossFcn,solverState);
    end
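
    To monitor progress, a variation of this loop (a sketch) prints the loss that the state object stores after each update, assuming Loss is a dlarray scalar:

    for i = 1:numEpochs
        [net,solverState] = lbfgsupdate(net,lossFcn,solverState);
        fprintf("Epoch %d: loss = %.4f\n",i,extractdata(solverState.Loss));
    end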

    Model Loss Function

    The modelLoss function takes as input a neural network net, input data X, and targets T. The function returns the loss and the gradients of the loss with respect to the network learnable parameters.

    function [loss, gradients] = modelLoss(net, X, T)
    
    % Forward pass through the network.
    Y = forward(net,X);
    
    % Cross-entropy loss between the predictions and the targets.
    loss = crossentropy(Y,T);
    
    % Gradients of the loss with respect to the learnable parameters.
    gradients = dlgradient(loss,net.Learnables);
    
    end

    Algorithms


    References

    [1] Liu, Dong C., and Jorge Nocedal. "On the Limited Memory BFGS Method for Large Scale Optimization." Mathematical Programming 45, no. 1 (August 1989): 503–528. https://doi.org/10.1007/BF01589116.

    Version History

    Introduced in R2023a