Regularizer options object to train DDPG, TD3, and SAC agents
Use an rlBehaviorCloningRegularizerOptions object to specify behavioral cloning regularizer options to train a DDPG, TD3, or SAC agent. The only option you can specify is the regularizer weight, which balances the actor loss against the behavioral cloning penalty. The regularizer is mostly useful for training agents offline, specifically to deal with possible differences between the probability distribution of the dataset and the one generated by the environment. To enable the behavioral cloning regularizer when training an agent, set the BatchDataRegularizerOptions property of the agent options object to an rlBehaviorCloningRegularizerOptions object that has your preferred regularizer weight.
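For example, the following minimal sketch enables the regularizer for a SAC agent that uses default networks. The observation and action specifications and the weight value here are placeholders for illustration, not part of this page.

% Placeholder observation and action specifications (assumed for illustration)
obsInfo = rlNumericSpec([4 1]);
actInfo = rlNumericSpec([1 1],LowerLimit=-2,UpperLimit=2);

% Enable the behavioral cloning regularizer through the agent options
agentOpts = rlSACAgentOptions;
agentOpts.BatchDataRegularizerOptions = ...
    rlBehaviorCloningRegularizerOptions(BehaviorCloningRegularizerWeight=2.5);

% Create a default SAC agent that uses the regularizer during training
agent = rlSACAgent(obsInfo,actInfo,agentOpts);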
bcOpts = rlBehaviorCloningRegularizerOptions returns a default behavioral cloning regularizer options set.

bcOpts = rlBehaviorCloningRegularizerOptions(Name=Value) creates the behavioral cloning regularizer option set bcOpts and sets its properties using one or more name-value arguments.
BehaviorCloningRegularizerWeight — Behavioral cloning regularizer weight
2.5 (default) | positive scalar
Behavioral cloning regularizer weight, specified as a positive scalar. This weight controls the trade-off between the actor loss and the behavioral cloning penalty.
Specifically, the behavioral cloning regularizer $k^2\left(\pi(s_i)-a_i\right)^2$ is added to the actor loss $L_{actor}$, where $a_i$ is an action from the minibatch (which stores $N$ experiences) and $\pi(s_i)$ is an action from the current actor given the observation $s_i$ (also taken from the minibatch). The actor is therefore updated to minimize the loss function $L'_{actor}$:

$$L'_{actor} = \lambda L_{actor} + \frac{1}{N}\sum_{i=1}^{N} k^2\left(\pi(s_i)-a_i\right)^2$$

Here the normalization term $\lambda$ depends on the behavioral cloning weight $W_{bc}$, which regulates the importance of the standard $L_{actor}$ factor:

$$\lambda = \frac{W_{bc}}{\frac{1}{N}\sum_{i=1}^{N}\left|Q(s_i,a_i)\right|}$$

The scaling factor $k$ scales the regularization term to the appropriate action range:

$$k = \frac{2}{A_{mx}-A_{mn}}$$

Here $A_{mx}$ and $A_{mn}$ are the upper and lower limits of the action range. These limits are taken from the action specifications (or are otherwise estimated if unavailable).
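As a quick numeric illustration of these formulas, the following sketch evaluates λ, k, and the regularized loss on a toy minibatch. Every value in it (Q-values, actions, action limits) is made up for illustration and does not come from any agent.

% Toy minibatch values (assumed for illustration only)
Wbc  = 2.5;                       % behavioral cloning regularizer weight
Amx  = 2; Amn = -2;               % action range limits
Qsa  = [10.2; 9.8; 11.1; 10.5];   % Q(s_i,a_i) over N = 4 experiences
ai   = [0.5; -1.2; 0.8; 0.1];     % actions a_i from the minibatch
piSi = [0.4; -1.0; 1.1; 0.0];     % actor actions pi(s_i)

k      = 2/(Amx - Amn);           % scaling factor
lambda = Wbc/mean(abs(Qsa));      % normalization term
Lactor = -mean(Qsa);              % toy stand-in for the actor loss
LactorRegularized = lambda*Lactor + mean(k^2*(piSi - ai).^2)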
To set $W_{bc}$, assign a value to the BehaviorCloningRegularizerWeight property of the rlBehaviorCloningRegularizerOptions object. For more information, see [1].
Create Behavioral Cloning Regularizer Options Object
Create an rlBehaviorCloningRegularizerOptions object, specifying the behavioral cloning regularizer weight.

opt = rlBehaviorCloningRegularizerOptions( ...
    BehaviorCloningRegularizerWeight=5)

opt = 
  rlBehaviorCloningRegularizerOptions with properties:

    BehaviorCloningRegularizerWeight: 5
You can modify options using dot notation. For example, set the regularizer weight to 3.
opt.BehaviorCloningRegularizerWeight = 3;
To specify this behavioral cloning option set for an agent, first create the agent options object. For this example, create a default
rlTD3AgentOptions object for a TD3 agent.
agentOpts = rlTD3AgentOptions;
Then, assign the rlBehaviorCloningRegularizerOptions object to the BatchDataRegularizerOptions property of the agent options object.
agentOpts.BatchDataRegularizerOptions = opt;
When you create the agent, use agentOpts as the last input argument for the agent constructor function (for example, rlTD3Agent), as in the sketch below.
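For instance, this minimal sketch creates a default TD3 agent from agentOpts. The observation and action specifications here are placeholders for illustration.

% Placeholder specifications (assumed for illustration)
obsInfo = rlNumericSpec([3 1]);
actInfo = rlNumericSpec([1 1],LowerLimit=-1,UpperLimit=1);

% agentOpts is the last input argument of the constructor
agent = rlTD3Agent(obsInfo,actInfo,agentOpts);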
[1] Fujimoto, Scott, and Shixiang Shane Gu. "A Minimalist Approach to Offline Reinforcement Learning." Advances in Neural Information Processing Systems 34 (2021): 20132–20145.
Introduced in R2023a