How do I confirm whether training of a DDPG reinforcement learning agent is complete?

Hello MathWorks Community,
I have recently been working on training a DDPG RL agent for electricity network control.
After several rounds of tweaking and training, I got a training curve like this:
I read several questions & answers about the episode Q0 for an agent with an actor and a critic. If I understand correctly, when the agent is well trained, the episode Q0 should converge to the average reward curve. However, this is not always the case, and the critic network may take longer to train.
As shown in my case, it seems that both the average reward and the episode Q0 have converged, but to different values. Does this mean there is something wrong with the critic network or the reward function that prevents the critic from estimating the final episode reward correctly?
Btw, this agent was trained for 10,000 episodes, and each episode has 168 steps.
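One thing worth checking (my assumption, not confirmed by this thread): episode Q0 is the critic's estimate of the *discounted* return from the initial observation, while the "Average Reward" curve typically plots the *undiscounted* sum of rewards per episode. Over a long episode (168 steps here) with a discount factor below 1, those two quantities settle at different values even if the critic is perfect. A minimal sketch of the gap, using a hypothetical flat reward of 1.0 per step and a discount factor of 0.99:

```python
def discounted_return(rewards, gamma):
    """Sum of gamma^t * r_t -- the quantity Q0 tries to estimate."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))

# Hypothetical constant reward of 1.0 per step over a 168-step episode.
rewards = [1.0] * 168

undiscounted = sum(rewards)                           # what the reward plot shows: 168.0
discounted = discounted_return(rewards, gamma=0.99)   # what Q0 targets: ~81.5

print(undiscounted, round(discounted, 1))
```

So a stable gap between the two curves can simply reflect discounting, not a broken critic; comparing Q0 against the realized *discounted* return for a few episodes is a more direct sanity check.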

Answers (1)

It seems that in DDPG the average reward doesn't need to match Q0. You can judge by the average reward alone. But I don't know exactly why.


Release: R2021b
Asked: 29 Oct 2021
Answered: 3 May 2023
