Deep Deterministic Policy Gradient (DDPG) agent in Reinforcement Learning: actor output oscillates a few times, then gets stuck at the minimum.
Hi
I am not experienced with Simulink or RL. I have tried to simulate a very simple scenario to test DDPG before implementing my more complex system. The agent is placed randomly around (0,0), and the goal is to move to (500,500) or nearby.
But it doesn't work for me. The action output (2x1) should be continuous in the range [-2, 2]. For the first few episodes the output oscillates between the maximum and minimum, and then it stays at the minimum for the rest of the episodes.


I changed the deep network settings as well as the RL options, but the problem remains. I also changed the output range to (-inf, inf) with saturation, and I simulated for a few thousand episodes, still with the same result.
The code is attached.
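For reference, a common way to constrain a DDPG actor to a bounded continuous range like [-2, 2] is a tanh layer followed by a scaling layer, with matching limits in the action spec. This is only a sketch under assumed dimensions (the 4-element observation and the layer names are placeholders, not taken from the attached model):

```matlab
% Action spec: 2x1 continuous action bounded to [-2, 2]
actInfo = rlNumericSpec([2 1], 'LowerLimit', -2, 'UpperLimit', 2);

% Actor network sketch (assumed 4-element observation)
actorNet = [
    featureInputLayer(4, 'Name', 'obs')  % placeholder observation size
    fullyConnectedLayer(64)
    reluLayer
    fullyConnectedLayer(2)
    tanhLayer                            % squashes output to [-1, 1]
    scalingLayer('Scale', 2)];           % rescales to [-2, 2]
```

With this structure the actor itself cannot emit values outside [-2, 2], so an external saturation block should not be needed.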
Accepted Answer
Emmanouil Tzorakoleftherakis
on 30 Mar 2020 (edited 30 Mar 2020)
Hi Samir,
After reviewing your model: if you check the actions the agent outputs, they blow up to infinity. That should not be possible, given that the last layer in your actor is a tanh layer. The problem is actually in the plant dynamics: in some instances the observations fed to the agent are NaN, which leads to this behavior.
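One quick way to confirm this diagnosis is to log the observation signal in the Simulink model and check it for NaN after a run. A sketch, assuming a model named 'myModel' whose observation signal is logged under the name 'obs' (both names are placeholders for whatever the attached model uses):

```matlab
% Run the model once and inspect the logged observation signal
simOut = sim('myModel');
obsData = simOut.logsout.getElement('obs').Values.Data;

% Flag any NaN samples in the observations fed to the agent
if any(isnan(obsData(:)))
    warning('Observations contain NaN; check the plant dynamics.');
end
```

If NaNs appear, the fix belongs in the plant dynamics (e.g. guarding against division by zero or invalid states), not in the agent or actor network.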
Hope that helps