Regularizer options object to train DDPG, TD3, and SAC agents
Use an rlBehaviorCloningRegularizerOptions object to specify behavioral cloning regularizer options to train a DDPG, TD3, or SAC agent. The only option you can specify is the regularizer weight, which balances the actor loss against the behavioral cloning penalty. The regularizer is mostly useful for training agents offline, specifically to deal with possible differences between the probability distribution of the dataset and the one generated by the environment. To enable the behavioral cloning regularizer when training an agent, set the BatchDataRegularizerOptions property of the agent options object to an rlBehaviorCloningRegularizerOptions object that has your preferred regularizer weight.
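For example, the following minimal sketch enables the regularizer for a SAC agent that uses default networks. The observation and action specifications and the weight value here are placeholders for illustration, not part of this page.

% Placeholder observation and action specifications (assumed for illustration)
obsInfo = rlNumericSpec([4 1]);
actInfo = rlNumericSpec([1 1],LowerLimit=-2,UpperLimit=2);

% Enable the behavioral cloning regularizer through the agent options
agentOpts = rlSACAgentOptions;
agentOpts.BatchDataRegularizerOptions = ...
    rlBehaviorCloningRegularizerOptions(BehaviorCloningRegularizerWeight=2.5);

% Create a default SAC agent that uses the regularizer during training
agent = rlSACAgent(obsInfo,actInfo,agentOpts);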
bcOpts = rlBehaviorCloningRegularizerOptions returns a default behavioral cloning regularizer options set.

bcOpts = rlBehaviorCloningRegularizerOptions(Name=Value) creates the behavioral cloning regularizer option set bcOpts and sets its properties using one or more name-value arguments.
BehaviorCloningRegularizerWeight — Behavioral cloning regularizer weight
2.5 (default) | positive scalar
Behavioral cloning regularizer weight, specified as a positive scalar. This weight controls the trade-off between the actor loss and the behavioral cloning penalty.
Specifically, the behavioral cloning regularizer $k^2\left(\pi(s_i)-a_i\right)^2$ is added to the actor loss $L_{actor}$, where $a_i$ is an action from the minibatch (which stores $N$ experiences) and $\pi(s_i)$ is an action from the current actor given the observation $s_i$ (also taken from the minibatch). The actor is therefore updated to minimize the loss function $L'_{actor}$:

$$L'_{actor} = \lambda L_{actor} + \frac{1}{N}\sum_{i=1}^{N} k^2\left(\pi(s_i)-a_i\right)^2$$

Here the normalization term $\lambda$ depends on the behavioral cloning weight $W_{bc}$, which regulates the importance of the standard $L_{actor}$ factor:

$$\lambda = \frac{W_{bc}}{\frac{1}{N}\sum_{i=1}^{N}\left|Q(s_i,a_i)\right|}$$

The scaling factor $k$ scales the regularization term to the appropriate action range:

$$k = \frac{2}{A_{mx}-A_{mn}}$$

Here $A_{mx}$ and $A_{mn}$ are the upper and lower limits of the action range. These limits are taken from the action specifications (or are otherwise estimated if unavailable).
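As a quick numeric illustration of these formulas, the following sketch evaluates λ, k, and the regularized loss on a toy minibatch. Every value in it (Q-values, actions, action limits) is made up for illustration and does not come from any agent.

% Toy minibatch values (assumed for illustration only)
Wbc  = 2.5;                       % behavioral cloning regularizer weight
Amx  = 2; Amn = -2;               % action range limits
Qsa  = [10.2; 9.8; 11.1; 10.5];   % Q(s_i,a_i) over N = 4 experiences
ai   = [0.5; -1.2; 0.8; 0.1];     % actions a_i from the minibatch
piSi = [0.4; -1.0; 1.1; 0.0];     % actor actions pi(s_i)

k      = 2/(Amx - Amn);           % scaling factor
lambda = Wbc/mean(abs(Qsa));      % normalization term
Lactor = -mean(Qsa);              % toy stand-in for the actor loss
LactorRegularized = lambda*Lactor + mean(k^2*(piSi - ai).^2)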
To set $W_{bc}$, assign a value to the BehaviorCloningRegularizerWeight property of the rlBehaviorCloningRegularizerOptions object. For more information, see [1].
Create Behavioral Cloning Regularizer Options Object
Create an rlBehaviorCloningRegularizerOptions object, specifying the behavioral cloning regularizer weight.

opt = rlBehaviorCloningRegularizerOptions( ...
    BehaviorCloningRegularizerWeight=5)

opt = 
  rlBehaviorCloningRegularizerOptions with properties:

    BehaviorCloningRegularizerWeight: 5
You can modify options using dot notation. For example, set the regularizer weight to 3.
opt.BehaviorCloningRegularizerWeight = 3;
To specify this behavioral cloning option set for an agent, first create the agent options object. For this example, create a default
rlTD3AgentOptions object for a TD3 agent.
agentOpts = rlTD3AgentOptions;
Then, assign the rlBehaviorCloningRegularizerOptions object to the BatchDataRegularizerOptions property of the agent options object.
agentOpts.BatchDataRegularizerOptions = opt;
When you create the agent, use agentOpts as the last input argument for the agent constructor function (for example, rlTD3Agent), as in the sketch below.
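For instance, this minimal sketch creates a default TD3 agent from agentOpts. The observation and action specifications here are placeholders for illustration.

% Placeholder specifications (assumed for illustration)
obsInfo = rlNumericSpec([3 1]);
actInfo = rlNumericSpec([1 1],LowerLimit=-1,UpperLimit=1);

% agentOpts is the last input argument of the constructor
agent = rlTD3Agent(obsInfo,actInfo,agentOpts);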
[1] Fujimoto, Scott, and Shixiang Shane Gu. "A Minimalist Approach to Offline Reinforcement Learning." Advances in Neural Information Processing Systems 34 (2021): 20132–20145.
Introduced in R2023a