Reinforcement Learning Toolbox
Design and train policies using reinforcement learning
Reinforcement Learning Toolbox™ provides functions and blocks for training policies using reinforcement learning algorithms including DQN, A2C, and DDPG. You can use these policies to implement controllers and decision-making algorithms for complex systems such as robots and autonomous systems. You can implement the policies using deep neural networks, polynomials, or look-up tables.
The toolbox lets you train policies by enabling them to interact with environments represented by MATLAB® or Simulink® models. You can evaluate algorithms, experiment with hyperparameter settings, and monitor training progress. To improve training performance, you can run simulations in parallel on the cloud, computer clusters, and GPUs (with Parallel Computing Toolbox™ and MATLAB Parallel Server™).
Using the ONNX™ model format, you can import existing policies from deep learning frameworks such as TensorFlow™ Keras and PyTorch (with Deep Learning Toolbox™). You can generate optimized C, C++, and CUDA code to deploy trained policies on microcontrollers and GPUs.
The toolbox includes reference examples for using reinforcement learning to design controllers for robotics and automated driving applications.
Reinforcement Learning Algorithms
Implement agents using deep Q-network (DQN), advantage actor-critic (A2C), deep deterministic policy gradient (DDPG), and other built-in algorithms. Use templates to implement custom agents for training policies.
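As a rough illustration of what a DQN-style agent computes during training, the sketch below forms one-step temporal-difference targets for a mini-batch. It is written in plain Python and does not use the toolbox API; the function name `td_targets` and the sample numbers are hypothetical.

```python
def td_targets(rewards, next_q_values, dones, gamma=0.99):
    """One-step TD targets: r + gamma * max_a' Q_target(s', a').

    For terminal transitions (done is True) the bootstrap term is dropped.
    """
    targets = []
    for r, q_next, done in zip(rewards, next_q_values, dones):
        bootstrap = 0.0 if done else gamma * max(q_next)
        targets.append(r + bootstrap)
    return targets


# Hypothetical mini-batch: two transitions, three discrete actions each.
rewards = [1.0, 0.0]
next_q_values = [[0.5, 2.0, 1.0], [3.0, 1.0, 0.0]]  # stand-in target-critic outputs
dones = [False, True]

targets = td_targets(rewards, next_q_values, dones, gamma=0.9)
```

The critic is then regressed toward these targets; the second transition is terminal, so its target is just the immediate reward.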
Policy and Value Function Representation Using Deep Neural Networks
Use deep neural network policies for complex systems with large state-action spaces. Define policies using networks and architectures from Deep Learning Toolbox. Import ONNX models for interoperability with other deep learning frameworks.
Simulink Blocks for Agents
Implement and train reinforcement learning agents in Simulink.
Simulink and Simscape Environments
Use Simulink and Simscape™ models to represent an environment. Specify the observation, action, and reward signals within the model.
MATLAB Environments
Use MATLAB functions and classes to represent an environment. Specify observation, action, and reward variables within the MATLAB file.
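The general shape of a class-based environment can be sketched as follows. This is a language-neutral illustration in Python, not the toolbox's environment interface; the class `LineWorld`, its reward values, and the method names `reset` and `step` are assumptions chosen for the example.

```python
class LineWorld:
    """Hypothetical 1-D environment: walk along cells 0..size-1 to reach the last cell."""

    def __init__(self, size=5):
        self.size = size
        self.actions = (-1, 1)   # action set: step left or step right
        self.position = 0        # observation: current cell index

    def reset(self):
        """Start a new episode and return the initial observation."""
        self.position = 0
        return self.position

    def step(self, action):
        """Apply an action and return (observation, reward, done)."""
        if action not in self.actions:
            raise ValueError(f"invalid action: {action}")
        # Keep the agent inside the grid.
        self.position = min(max(self.position + action, 0), self.size - 1)
        done = self.position == self.size - 1
        reward = 1.0 if done else -0.1   # small step cost, bonus at the goal
        return self.position, reward, done


env = LineWorld(size=4)
obs = env.reset()
total_reward = 0.0
done = False
while not done:
    obs, reward, done = env.step(1)   # fixed "always right" policy, for illustration only
    total_reward += reward
```

The observation, action, and reward are all defined in one place, which is the same separation the paragraph above describes.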
Distributed Computing and Multicore Acceleration
Speed up training by running parallel simulations on multicore computers, cloud resources, or compute clusters using Parallel Computing Toolbox and MATLAB Parallel Server.
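The pattern behind parallel simulation is to run independent episodes on separate workers and then gather the results. The sketch below shows that pattern with Python's standard thread pool; it is not the toolbox's parallel training mechanism, and the task `run_episode` is hypothetical. For CPU-bound work, a process pool or cluster scheduler would be the closer analogue.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def run_episode(seed, steps=100):
    """Simulate one independent episode of a toy random walk; return its total reward."""
    rng = random.Random(seed)        # per-episode generator keeps results reproducible
    position, total = 0, 0.0
    for _ in range(steps):
        position += rng.choice((-1, 1))
        total += -abs(position)      # penalize distance from the origin
    return total


seeds = list(range(8))

# Dispatch the episodes to a pool of workers; map preserves the seed order.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel_returns = list(pool.map(run_episode, seeds))

# The same episodes run one after another, for comparison.
serial_returns = [run_episode(s) for s in seeds]
```

Because each episode owns its random generator, the parallel and serial runs return the same values, which makes the parallel version easy to check.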
Create twin-delayed deep deterministic policy gradient (TD3) agents, which often learn faster and perform better than DDPG agents.
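One reason TD3 tends to be more stable than DDPG is the way it builds its critic target: it smooths the target action with clipped noise and takes the minimum of two target critics. The sketch below shows only that target computation in plain Python; it omits the delayed actor updates, does not use the toolbox API, and the function names, stand-in networks, and noise settings are assumptions.

```python
import random


def clip(x, lo, hi):
    """Limit x to the interval [lo, hi]."""
    return max(lo, min(hi, x))


def td3_target(reward, next_obs, done, target_actor, target_q1, target_q2,
               rng, gamma=0.99, noise_std=0.2, noise_clip=0.5,
               act_low=-1.0, act_high=1.0):
    """Critic target with target-policy smoothing and clipped double-Q."""
    if done:
        return reward
    # Target-policy smoothing: perturb the target action with clipped Gaussian noise.
    noise = clip(rng.gauss(0.0, noise_std), -noise_clip, noise_clip)
    next_action = clip(target_actor(next_obs) + noise, act_low, act_high)
    # Clipped double-Q: use the smaller of the two target-critic estimates.
    q_min = min(target_q1(next_obs, next_action), target_q2(next_obs, next_action))
    return reward + gamma * q_min


# Hypothetical stand-ins for the target networks (scalar observation and action).
def target_actor(s):
    return 0.5 * s


def target_q1(s, a):
    return s + a


def target_q2(s, a):
    return s + a + 1.0   # deliberately overestimates relative to target_q1


rng = random.Random(0)
y = td3_target(1.0, 1.0, False, target_actor, target_q1, target_q2, rng, gamma=0.9)
```

Taking the minimum means the overestimating critic never drives the target, which is the bias DDPG's single critic is prone to.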
New Agents for Continuous Action Spaces
Use PPO, TD3, AC, and PG agents with continuous action spaces.
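For the stochastic agents in this list (PPO, AC, and PG), a common way to handle a continuous action space is a Gaussian policy: the policy outputs a mean and standard deviation, an action is sampled from that distribution, and the log-probability of the action feeds the policy-gradient update. TD3 instead uses a deterministic actor. The sketch below is a minimal scalar version in plain Python, with hypothetical function names, and is not the toolbox's implementation.

```python
import math
import random


def sample_action(mean, std, rng):
    """Draw a continuous action from a Gaussian policy."""
    return rng.gauss(mean, std)


def log_prob(action, mean, std):
    """Log-density of the action under the Gaussian policy."""
    z = (action - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


rng = random.Random(0)
mean, std = 1.5, 0.5   # in practice these would be outputs of the policy network

samples = [sample_action(mean, std, rng) for _ in range(10000)]
sample_mean = sum(samples) / len(samples)
lp = log_prob(samples[0], mean, std)
```

Actions near the mean receive higher log-probability than actions far from it, so increasing the log-probability of well-rewarded actions shifts the mean toward them.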
Create neural network policies using long short-term memory (LSTM) networks for DQN and PPO agents.