RL DDPG Actions have high oscillation
Hello, I am using the DDPG agent from the Reinforcement Learning Toolbox in MATLAB to train a 3-DOF robotic arm to move. The actions are joint torques, and although the arm reaches the target, the actions are highly oscillatory and noisy.
Can anyone help explain where this comes from? I.e., the algorithm itself, the noise options, etc.
I am using the walking robot example as a basis for the noise options:
%% DDPG Agent Options
agentOptions = rlDDPGAgentOptions;
agentOptions.SampleTime = 0.025;                   % agent step (s)
agentOptions.DiscountFactor = 0.99;
agentOptions.MiniBatchSize = 128;
agentOptions.ExperienceBufferLength = 5e5;
agentOptions.TargetSmoothFactor = 1e-3;
% Ornstein-Uhlenbeck exploration noise settings
agentOptions.NoiseOptions.MeanAttractionConstant = 0.5;
agentOptions.NoiseOptions.Variance = 0.3;
agentOptions.NoiseOptions.VarianceDecayRate = 1e-5;
I think it might have something to do with MeanAttractionConstant, Variance, or VarianceDecayRate. (By the way, the joint limits are between -3 and 3.)
The actions I get look like this:
[plots of the joint torque actions omitted]
Answers (1)
Emmanouil Tzorakoleftherakis
on 9 Nov 2023
Hi,
The noise options you mention are only used during training and are essential for exploration. If the plots you are showing above are from training, you may consider reducing the noise variance a bit.
If the plots are from the trained agent, you can consider penalizing large action changes in your reward signal. That would help reduce the oscillatory content.
Hope this helps
8 Comments
Ahmad Al Ali
on 13 Nov 2023
Emmanouil Tzorakoleftherakis
on 13 Nov 2023
Regarding the reward function, using the absolute value of the difference between current action and previous action (which you can get with a delay block) should work. You need to make sure that this penalty term scales properly with the rest of the reward terms you are using.
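As a minimal MATLAB sketch of such a penalty term (the weights and variable names below are assumptions, not from the thread; in Simulink the previous action would come from the delay block mentioned above):

```matlab
% Sketch of a reward with an action-rate penalty.
% Weights and names are assumptions -- tune them for your system.
wTrack = 1.0;    % weight on the tracking error
wRate  = 0.1;    % weight on action changes; scale so it does not dominate
trackErr = norm(qTarget - q);              % joint-position error
actRate  = sum(abs(action - prevAction));  % |a_k - a_(k-1)| summed over joints
reward   = -wTrack*trackErr - wRate*actRate;
```

The key design point is that wRate must be scaled so the penalty neither dominates the tracking term nor becomes negligible.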
Regarding the rest of your comments, there are many things not clear:
1) You mention getting a loop error. Are you talking about an algebraic loop? How are you modeling your 3-DOF robot in Simulink?
2) You mentioned your friend had the same issues in python. Which issues are you referring to? Oscillating actions? Algebraic loops?
3) What do you mean by "separate the 3 agent outputs"? This is a coupled system as is, so obviously there is dependence between inputs/observations. Adding the penalty term on oscillations should help.
Ahmad Al Ali
on 13 Nov 2023
Emmanouil Tzorakoleftherakis
on 13 Nov 2023
Got it. Regarding the solver, what error are you getting? Basically, if you use a fixed-step solver, the agent sample time should be an integer multiple of that step size.
Also, you may be able to resolve the algebraic loop by adding a delay block in the reward calculation. You would need one time delay for the current agent output, and a 2-step delay for the prior agent output.
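As a quick sanity check for the solver point above, a sketch (the fixed-step size here is an assumption for illustration):

```matlab
% Sketch: with a fixed-step solver, the agent sample time should be an
% integer multiple of the solver step size.
solverStep = 0.005;               % assumed fixed-step size of the model
sampleTime = 0.025;               % agent sample time from the question
ratio = sampleTime/solverStep;    % 5 in this case
assert(abs(ratio - round(ratio)) < 1e-9, ...
    'Agent sample time must be an integer multiple of the solver step');
```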
Sourabh
on 14 Dec 2023
I had a doubt regarding states in RL. Can I use a concatenation of several states as a single observation in the RL Agent block?
I have a signal that I am sampling at 1 s and summing before feeding it in, but this obviously means I am losing a lot of information about my states. So is it possible in MATLAB RL to concatenate several states into a single observation?
Ahmad Al Ali
on 14 Dec 2023
Ahmad Al Ali
on 14 Dec 2023
Sourabh
on 15 Dec 2023
Actually, I have a signal and I want to sample it at 4 s intervals to build an array and then feed that array to my observation. Can I do that using a Rate Transition block?