A reinforcement learning agent receives observations and a reward from the environment. Using its policy, the agent selects an action based on the observations and reward, and returns that action to the environment. During training, the agent continuously updates its policy parameters based on the action, observations, and reward. Doing so allows the agent to learn the optimal policy for the given environment and reward signal.
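The interaction loop described above can be illustrated with a minimal, language-agnostic sketch in Python. It does not use the Reinforcement Learning Toolbox API. The `ChainEnv` corridor environment, the `QAgent` class, and all hyperparameter values are assumptions introduced for illustration, and tabular Q-learning stands in for whichever algorithm an agent actually uses.

```python
import random


class ChainEnv:
    """Hypothetical 5-state corridor: start in state 0, reward 1 on reaching the last state."""

    def __init__(self, n_states=5):
        self.n_states = n_states
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Action 1 moves right, action 0 moves left; the walls clamp the position.
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.n_states - 1)
        done = self.state == self.n_states - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


class QAgent:
    """Tabular Q-learning agent with an epsilon-greedy policy (illustrative only)."""

    def __init__(self, n_states, n_actions, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def greedy_action(self, obs):
        row = self.q[obs]
        best = max(row)
        # Break ties between equally valued actions at random.
        return self.rng.choice([a for a, v in enumerate(row) if v == best])

    def select_action(self, obs):
        # Explore with probability epsilon, otherwise act greedily.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.q[obs]))
        return self.greedy_action(obs)

    def update(self, obs, action, reward, next_obs, done):
        # Move the estimate toward the one-step bootstrapped target.
        target = reward if done else reward + self.gamma * max(self.q[next_obs])
        self.q[obs][action] += self.alpha * (target - self.q[obs][action])


def train(episodes=200, max_steps=100):
    env = ChainEnv()
    agent = QAgent(env.n_states, n_actions=2)
    for _ in range(episodes):
        obs = env.reset()
        for _ in range(max_steps):
            action = agent.select_action(obs)                   # policy picks an action
            next_obs, reward, done = env.step(action)           # environment responds
            agent.update(obs, action, reward, next_obs, done)   # parameters are updated
            obs = next_obs
            if done:
                break
    return agent


agent = train()
# Greedy action in each non-terminal state after training.
policy = [agent.greedy_action(s) for s in range(4)]
```

Here the "policy parameters" are simply the entries of the Q-table. Agents that use deep networks follow the same observe, act, update cycle with network weights in place of table entries.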
Reinforcement Learning Toolbox™ software provides reinforcement learning agents that use several common algorithms, such as SARSA, DQN, DDPG, and PPO. You can also implement other agent algorithms by creating your own custom agents.
For more information, see Reinforcement Learning Agents. For details on defining policy representations, see Create Policies and Value Functions.
|Reinforcement Learning Designer|Design, train, and simulate reinforcement learning agents|
|RL Agent|Reinforcement learning agent|
|Q-learning reinforcement learning agent|
|SARSA reinforcement learning agent|
|Deep Q-network (DQN) reinforcement learning agent|
|Actor-critic (AC) reinforcement learning agent|
|Policy gradient (PG) reinforcement learning agent|
|Deep deterministic policy gradient (DDPG) reinforcement learning agent|
|Twin-delayed deep deterministic (TD3) policy gradient reinforcement learning agent|
|Soft actor-critic (SAC) reinforcement learning agent|
|Proximal policy optimization (PPO) reinforcement learning agent|
|Trust region policy optimization (TRPO) reinforcement learning agent|
|Options for Q-learning agent|
|Options for SARSA agent|
|Options for DQN agent|
|Options for PG agent|
|Options for DDPG agent|
|Options for TD3 agent|
|Options for AC agent|
|Options for PPO agent|
|Options for TRPO agent|
|Options for SAC agent|
|Options for initializing reinforcement learning agents|
|Regularizer options object to train DQN and SAC agents|
|Regularizer options object to train DDPG, TD3, and SAC agents|
Model-Based Policy Optimization
|Model-based policy optimization (MBPO) reinforcement learning agent|
|Options for MBPO agent|
Get and Set Actors and Critics
|Replay memory experience buffer|
|Replay memory experience buffer with prioritized sampling|
|Hindsight replay memory experience buffer|
|Hindsight replay memory experience buffer with prioritized sampling|
|Append experiences to replay memory buffer|
|Sample experiences from replay memory buffer|
|Resize replay memory experience buffer|
|Return all experiences in replay memory buffer|
|Validate experiences for replay memory|
|Generate hindsight experiences from hindsight experience replay buffer|
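The append, sample, resize, and return-all operations listed above can be pictured with a small, generic replay buffer. The sketch below is written in Python and is not the Toolbox implementation. The `ReplayBuffer` class, the `Experience` tuple and its field names, and the choice to raise an error when too few experiences are stored are all assumptions made for illustration. Prioritized sampling and hindsight relabeling are not covered.

```python
import random
from collections import deque, namedtuple

# Hypothetical experience record; the field names are illustrative.
Experience = namedtuple(
    "Experience", ["observation", "action", "reward", "next_observation", "is_done"]
)


class ReplayBuffer:
    """Fixed-capacity buffer with uniform sampling (illustrative only)."""

    def __init__(self, max_length, seed=None):
        # A bounded deque drops the oldest item once capacity is reached.
        self._data = deque(maxlen=max_length)
        self._rng = random.Random(seed)

    def __len__(self):
        return len(self._data)

    def append(self, experience):
        self._data.append(experience)

    def sample(self, batch_size):
        # Uniform sampling without replacement.
        if batch_size > len(self._data):
            raise ValueError("not enough experiences stored")
        return self._rng.sample(list(self._data), batch_size)

    def resize(self, max_length):
        # Rebuilding the deque keeps the newest experiences when shrinking.
        self._data = deque(self._data, maxlen=max_length)

    def all_experiences(self):
        return list(self._data)


buffer = ReplayBuffer(max_length=3, seed=0)
for t in range(5):
    buffer.append(Experience(t, 0, 0.0, t + 1, False))

batch = buffer.sample(2)                      # two distinct stored experiences
stored = [e.observation for e in buffer.all_experiences()]
```

With a capacity of 3 and five appended experiences, only the three most recent ones remain in the buffer.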
Observation and Action Specifications
|Obtain action data specifications from reinforcement learning environment, agent, or experience buffer|
|Obtain observation data specifications from reinforcement learning environment, agent, or experience buffer|
Reset Agent or Experience Buffer
- Reinforcement Learning Agents
You can create an agent using one of several standard reinforcement learning algorithms or define your own custom agent.
- Create Agents Using Reinforcement Learning Designer
Interactively create or import agents for training using the Reinforcement Learning Designer app.
- Q-Learning Agents
Create Q-learning agents for reinforcement learning.
- SARSA Agents
Create SARSA agents for reinforcement learning.
- Deep Q-Network (DQN) Agents
Create DQN agents for reinforcement learning.
- Policy Gradient (PG) Agents
Create policy gradient agents for reinforcement learning.
- Deep Deterministic Policy Gradient (DDPG) Agents
Create DDPG agents for reinforcement learning.
- Twin-Delayed Deep Deterministic (TD3) Policy Gradient Agents
Create TD3 agents for reinforcement learning.
- Actor-Critic (AC) Agents
Create actor-critic agents for reinforcement learning.
- Proximal Policy Optimization (PPO) Agents
Create PPO agents for reinforcement learning.
- Trust Region Policy Optimization (TRPO) Agents
Create TRPO agents for reinforcement learning.
- Soft Actor-Critic (SAC) Agents
Create SAC agents for reinforcement learning.
- Model-Based Policy Optimization (MBPO) Agents
A model-based policy optimization (MBPO) reinforcement learning agent learns a model of its environment and uses that model to generate additional experiences for training.
- Create Custom Reinforcement Learning Agents
Create agents that use custom reinforcement learning algorithms.
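The idea behind the MBPO entry in the list above, learning a model of the environment and using it to generate extra training experiences, can be sketched in a few lines of Python. This is a simplified, Dyna-style tabular stand-in, not the MBPO algorithm itself and not Toolbox code. The `TabularModel` class and the toy transitions are assumptions, and the model simply memorizes the last observed outcome of each state-action pair rather than fitting a learned dynamics model.

```python
import random


class TabularModel:
    """Memorizes the last observed outcome of each (observation, action) pair."""

    def __init__(self):
        self._transitions = {}

    def fit(self, obs, action, reward, next_obs, done):
        # "Learning" here is just recording what the real environment returned.
        self._transitions[(obs, action)] = (reward, next_obs, done)

    def generate(self, n, rng):
        # Replay previously seen (observation, action) pairs through the model.
        known = sorted(self._transitions)
        if not known:
            return []
        synthetic = []
        for _ in range(n):
            obs, action = rng.choice(known)
            reward, next_obs, done = self._transitions[(obs, action)]
            synthetic.append((obs, action, reward, next_obs, done))
        return synthetic


# Toy real experiences: (observation, action, reward, next_observation, done).
real = [
    (0, 1, 0.0, 1, False),
    (1, 1, 0.0, 2, False),
    (2, 1, 1.0, 3, True),
]

model = TabularModel()
for experience in real:
    model.fit(*experience)

rng = random.Random(0)
synthetic = model.generate(5, rng)

# The base agent would train on real and model-generated experiences together.
training_batch = real + synthetic
```

The benefit is sample efficiency: the agent gets more training data per real environment step, at the cost of any errors in the learned model.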