rlDQNAgentOptions

Create options for DQN agent

Description

opt = rlDQNAgentOptions creates an rlDQNAgentOptions object for use as an argument when creating a DQN agent using all default settings. You can modify the object properties using dot notation.


opt = rlDQNAgentOptions(Name,Value) creates a DQN options object using the specified name-value pairs to override default property values.

Examples


Create an rlDQNAgentOptions object that specifies the agent mini-batch size.

opt = rlDQNAgentOptions('MiniBatchSize',48)
opt = 

  rlDQNAgentOptions with properties:

                           UseDoubleDQN: 1
               EpsilonGreedyExploration: [1×1 rl.option.EpsilonGreedyExploration]
                     TargetSmoothFactor: 1.0000e-03
                  TargetUpdateFrequency: 4
                     TargetUpdateMethod: "smoothing"
    ResetExperienceBufferBeforeTraining: 1
          SaveExperienceBufferWithAgent: 0
                          MiniBatchSize: 48
                    NumStepsToLookAhead: 1
                 ExperienceBufferLength: 10000
                             SampleTime: 1
                         DiscountFactor: 0.9900

You can modify options using dot notation. For example, set the agent sample time to 0.5.

opt.SampleTime = 0.5;

Input Arguments


Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: "UseDoubleDQN",false

Flag for using double DQN for value function target updates, specified as the comma-separated pair consisting of 'UseDoubleDQN' and a logical true or false. For most applications, set UseDoubleDQN to true. For more information, see Deep Q-Network Agents.
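
For example, to disable double DQN so that the agent uses standard DQN target updates:

opt = rlDQNAgentOptions('UseDoubleDQN',false);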

Options for epsilon-greedy exploration, specified as the comma-separated pair consisting of 'EpsilonGreedyExploration' and an EpsilonGreedyExploration object with the following numeric value properties.

Property         Description
Epsilon          Probability threshold to either randomly select an action or select the action that maximizes the state-action value function. A larger value of Epsilon means that the agent randomly explores the action space at a higher rate.
EpsilonMin       Minimum value of Epsilon
EpsilonDecay     Decay rate

At the end of each training time step, if Epsilon is greater than EpsilonMin, then it is updated using the following formula.

Epsilon = Epsilon*(1-EpsilonDecay)

To specify exploration options, use dot notation after creating the rlDQNAgentOptions object. For example, set the epsilon value to 0.9.

opt = rlDQNAgentOptions;
opt.EpsilonGreedyExploration.Epsilon = 0.9;

If your agent converges on local optima too quickly, promote agent exploration by increasing Epsilon.
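
As an illustration only (not toolbox code), the following sketch applies the decay formula above with assumed values for Epsilon, EpsilonMin, and EpsilonDecay to show how Epsilon shrinks over training steps.

% Illustrative sketch of the epsilon decay schedule (assumed values)
epsilon = 1;            % initial Epsilon (assumed)
epsilonMin = 0.01;      % assumed EpsilonMin
epsilonDecay = 0.005;   % assumed EpsilonDecay
for step = 1:1000
    if epsilon > epsilonMin
        epsilon = epsilon*(1 - epsilonDecay);   % same update as the formula above
    end
end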

Smoothing factor for target critic updates, specified as the comma-separated pair consisting of 'TargetSmoothFactor' and a double. The smoothing factor determines how the target critic properties are updated when TargetUpdateMethod is "smoothing".

Number of episodes between target critic updates, specified as the comma-separated pair consisting of 'TargetUpdateFrequency' and a numeric integer value. This option applies only when TargetUpdateMethod is "periodic".

Strategy for updating target critic properties using values from the trained critic, specified as the comma-separated pair consisting of 'TargetUpdateMethod' and one of the following:

  • "smoothing" — Update the target critic properties, thetaTarget, at every training episode according to the following formula, where theta is are the current trained network properties:

    thetaTarget = TargetSmoothFactor*theta + (1 - TargetSmoothFactor)*thetaTarget
  • "periodic" — Update the target critic properties every TargetUpdateFrequency training episodes.

Flag for clearing the experience buffer before training, specified as the comma-separated pair consisting of 'ResetExperienceBufferBeforeTraining' and a logical true or false.

Flag for saving the experience buffer data when saving the agent, specified as the comma-separated pair consisting of 'SaveExperienceBufferWithAgent' and a logical true or false. This option applies both when saving candidate agents during training and when saving agents using the save function.

For some agents, such as those with a large experience buffer and image-based observations, the memory required for saving the experience buffer is large. In such cases, to avoid saving the experience buffer data, set SaveExperienceBufferWithAgent to false.

If you plan to further train your saved agent, you can start training with the previous experience buffer as a starting point. In this case, set SaveExperienceBufferWithAgent to true.
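
For example, to carry the experience buffer over between training sessions, you can save the buffer with the agent and keep it when training resumes. The settings below are illustrative.

opt = rlDQNAgentOptions( ...
    'SaveExperienceBufferWithAgent',true, ...
    'ResetExperienceBufferBeforeTraining',false);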

Size of random experience mini-batch, specified as the comma-separated pair consisting of 'MiniBatchSize' and a positive numeric value. During each training episode, the agent randomly samples experiences from the experience buffer when computing gradients for updating the critic properties.

Number of steps to look ahead during training, specified as the comma-separated pair consisting of 'NumStepsToLookAhead' and a numeric positive integer value.

Experience buffer size, specified as the comma-separated pair consisting of 'ExperienceBufferLength' and a numeric positive integer value. During training, the agent updates the critic using a mini-batch of experiences randomly sampled from the buffer.

In general, agents need to learn from both good and bad experiences. Specify an experience buffer size that is able to store enough experience for learning.
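
For example, the following illustrative settings enlarge the experience buffer and mini-batch relative to the defaults; suitable values depend on your application.

opt = rlDQNAgentOptions( ...
    'MiniBatchSize',64, ...
    'NumStepsToLookAhead',1, ...
    'ExperienceBufferLength',1e6);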

Sample time of agent, specified as the comma-separated pair consisting of 'SampleTime' and a numeric value.

Discount factor applied to future rewards during training, specified as the comma-separated pair consisting of 'DiscountFactor' and a positive numeric value less than or equal to 1.
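
For example, for an environment that steps every 0.1 seconds, you might set the sample time and discount factor as follows (illustrative values).

opt = rlDQNAgentOptions('SampleTime',0.1,'DiscountFactor',0.99);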

Output Arguments


DQN agent options, returned as an rlDQNAgentOptions object. The object properties are described in Name-Value Pair Arguments.
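
Once configured, pass the options object to the DQN agent constructor. The sketch below assumes you have already created a critic representation named critic.

% Assumes a critic value representation named 'critic' already exists
agent = rlDQNAgent(critic,opt);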

See Also

Functions

rlDQNAgent

Introduced in R2019a