PPO | RL | A single policy controlled many agents during training
Hi,
I am currently working on a PPO agent using the Reinforcement Learning and Parallel Computing Toolboxes. I read about a shared policy that controlled 20 agents (quoted below):
"During training, a single policy controlled 20 agents that interact with the environment. Though the 20 agents shared a single policy and the same measured dataset, the actions of each agent varied during a training session because of entropy regularization, simulation samples, and converging speed."
How can I set up this configuration using Reinforcement Learning Toolbox?
Thank you in advance.
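As a conceptual sketch only (plain Python, not the Reinforcement Learning Toolbox API), the mechanism in the quoted passage can be illustrated as follows: one policy object is shared by all 20 agents, and each agent samples its action from that same stochastic action distribution using its own random stream, so the agents' actions differ even when they see identical observations and use identical weights. The linear-softmax policy and the names `SharedPolicy`, `rollout_step`, `softmax`, and `entropy` are hypothetical. The entropy of the shared distribution is the quantity an entropy-regularization term rewards, which keeps the policy from collapsing to one deterministic action.

```python
import math
import random


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    # Shannon entropy (nats) of a discrete distribution; an entropy bonus
    # in the PPO loss rewards keeping this value high.
    return -sum(p * math.log(p) for p in probs if p > 0)


class SharedPolicy:
    """Hypothetical linear-softmax policy; ONE instance is used by every agent."""

    def __init__(self, n_actions, obs_dim, seed=0):
        rng = random.Random(seed)
        self.weights = [[rng.gauss(0.0, 0.1) for _ in range(obs_dim)]
                        for _ in range(n_actions)]

    def action_probs(self, obs):
        logits = [sum(w * x for w, x in zip(row, obs)) for row in self.weights]
        return softmax(logits)

    def sample(self, obs, rng):
        # Stochastic action: drawn from the policy's distribution, not argmax.
        probs = self.action_probs(obs)
        return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def rollout_step(policy, observations, rngs):
    # Every agent queries the same policy object; only its random stream
    # (and, in a real simulation, its observation) differs.
    return [policy.sample(obs, rng) for obs, rng in zip(observations, rngs)]


if __name__ == "__main__":
    policy = SharedPolicy(n_actions=4, obs_dim=3, seed=1)
    obs = [1.0, 0.5, -0.2]
    # One independent random stream per agent, created once and reused.
    rngs = [random.Random(s) for s in range(20)]
    actions = rollout_step(policy, [obs] * 20, rngs)
    print("actions of 20 agents:", actions)
    print("policy entropy:", entropy(policy.action_probs(obs)))
```

Running it should print 20 actions that are not all identical, even though every agent used the same weights and the same observation; updating `policy.weights` once would change the behavior of all 20 agents at the same time.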