Dealing with delayed observations/reward in RL
Hi everyone,
I'm currently facing an issue: my agent can't learn to control the water tank system from the example below once I add a unit delay to the observation signal.
So I simply added a unit delay, as the following picture shows:

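In case it helps to reproduce, here is a rough (untested) sketch of adding the delay programmatically; the model name matches the MATLAB example, but the block name and the rewiring are my assumptions:
mdl = "rlwatertank"; % model from the MATLAB water tank example
open_system(mdl)
add_block("simulink/Discrete/Unit Delay", mdl + "/Obs Delay") % one-step delay
% ...then rewire the observation line through the new block with
% delete_line/add_line (exact block/port names depend on the model)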
Then the agent can no longer learn what action to take.
I guess this is normal behaviour, since nothing in the network architecture allows it to learn time dependencies in the signals. This is why I tried adding long short-term memory (LSTM) layers, but I didn't succeed.
So, in general terms, is adding LSTM layers a good solution to this kind of problem? How can we give the agent a chance to learn time dependencies in the signals?
I'm using a DDPG agent. To add the LSTM layers, I set the UseRNN option to true and left the default architecture for the actor and critic networks:
initOpts = rlAgentInitializationOptions(UseRNN=true); % request default recurrent (LSTM) actor/critic networks
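The surrounding agent creation looks roughly like this (a sketch: obsInfo and actInfo come from the environment as in the example, and the SampleTime/SequenceLength values are illustrative, not recommendations):
agentOpts = rlDDPGAgentOptions(SampleTime=1.0, SequenceLength=20); % SequenceLength must be > 1 when using an RNN
agent = rlDDPGAgent(obsInfo, actInfo, initOpts, agentOpts);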
I'm using R2023b, and I suspect the MATLAB example doesn't work in R2024a.
This would be particularly useful, for example, for penalising the agent for big actions (flow), by adding a penalty proportional to the action taken, or for penalising big action variations, by adding a penalty proportional to the change in action between steps (see the sketch below).
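Something like this is what I have in mind (a sketch for a single time step; all names and coefficients are illustrative, not from the example):
k1 = 0.05; k2 = 0.1;            % weights for action size / action variation
action = 2.3; prevAction = 1.8; % example values for one time step
baseReward = 10;                % reward from the original example
penalty = k1*abs(action) + k2*abs(action - prevAction);
reward = baseReward - penalty   % penalised reward for this step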
I added the result of my training below:

Strangely enough, the flow never quite settles: it keeps oscillating a little.
For this test, the reward has been slightly modified as follows:
reward = rewardFromTheMatlabExample + 2 / 20 * abs(error) + 2; % add a small continuous component to improve convergence
trainOpts.StopTrainingCriteria="none"; % remove the stopping criteria
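For context, the rest of my training setup roughly follows the example (a sketch; the episode counts are illustrative):
trainOpts = rlTrainingOptions(MaxEpisodes=2000, MaxStepsPerEpisode=200);
trainOpts.StopTrainingCriteria = "none"; % as above: never stop early
trainingStats = train(agent, env, trainOpts);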
Any help would be greatly appreciated!
Regards