How do I confirm whether the training for DDPG reinforcement learning agent is completed?
Hello MathWorks Community,
I have been recently working on the training of a DDPG RL agent for electricity network control.
After several rounds of tweaking and training, I got a training curve like this: 

I have read several questions and answers about the episode Q0 for an agent with an actor and a critic network. If I understand correctly, when the agent is well trained, the episode Q0 should converge to the average reward curve. However, this is not always the case, and the critic network may take longer to train.
As shown in my case, it seems that both the average reward and the episode Q0 have converged, but to different values. Does this mean something is wrong with the critic network or the reward function that prevents the critic from estimating the final episode reward correctly?
By the way, this agent was trained for 10,000 episodes, and each episode has 168 steps.
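One detail worth checking before suspecting the critic: episode Q0 is the critic's estimate of the *discounted* return from the first observation, while the episode/average reward curve plots the undiscounted sum of step rewards. With 168 steps per episode, even a modest discount factor makes the two converge to different values. A minimal sketch (the per-step reward of 1 and gamma = 0.99 are illustrative assumptions, not values from the question):

```matlab
% Compare the undiscounted episode reward (what the reward curve shows)
% with the discounted return (what a well-trained critic's Q0 targets).
% All numbers below are assumed for illustration.
gamma = 0.99;                   % agent's DiscountFactor (assumed)
stepsPerEpisode = 168;          % as stated in the question
r = ones(1, stepsPerEpisode);   % pretend every step earns reward 1

undiscounted = sum(r);                                   % = 168
discounted   = sum(gamma.^(0:stepsPerEpisode-1) .* r);   % about 81.5

fprintf('Undiscounted return: %.1f\n', undiscounted);
fprintf('Discounted return:   %.1f\n', discounted);
```

So a persistent gap between Q0 and average reward can simply reflect discounting rather than a broken critic, as long as both curves have stabilized.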
Answers (1)
展 苏
on 3 May 2023
It seems that in DDPG the average reward doesn't need to match Q0; you can judge training progress by the average reward alone. But I don't know exactly why.
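If you want to compare the two curves yourself after training, you can re-plot them from the statistics returned by `train`. A sketch, assuming `trainingStats` is the output of `train(agent, env, trainOpts)` and that the training options logged the `EpisodeQ0` and `AverageReward` fields shown in the Episode Manager:

```matlab
% Sketch: re-plot logged training statistics to compare the curves.
% Assumes trainingStats = train(agent, env, trainOpts) was saved and
% contains EpisodeIndex, AverageReward, and EpisodeQ0 fields.
figure; hold on
plot(trainingStats.EpisodeIndex, trainingStats.AverageReward, ...
     'DisplayName', 'Average reward')
plot(trainingStats.EpisodeIndex, trainingStats.EpisodeQ0, ...
     'DisplayName', 'Episode Q0')
xlabel('Episode'); ylabel('Value'); legend show
```

If both curves are flat but offset from each other, the offset alone is not evidence that the critic is mis-trained.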