RL DDPG Actions have high oscillation

Hello, I am using DDPG from the Reinforcement Learning Toolbox in MATLAB to train a 3-DOF robotic arm to move. The actions are joint torques, and although the arm reaches the target, the actions are highly oscillatory and noisy.
Can anyone help explain where this comes from? I.e., the algorithm itself, the noise options, ...
I am using the walking robot example as the basis for the noise options:
%% DDPG Agent Options
agentOptions = rlDDPGAgentOptions;
agentOptions.SampleTime = 0.025;                         % agent step (s)
agentOptions.DiscountFactor = 0.99;
agentOptions.MiniBatchSize = 128;
agentOptions.ExperienceBufferLength = 5e5;
agentOptions.TargetSmoothFactor = 1e-3;
agentOptions.NoiseOptions.MeanAttractionConstant = 0.5;  % pull of the OU noise toward its mean
agentOptions.NoiseOptions.Variance = 0.3;                % initial exploration noise variance
agentOptions.NoiseOptions.VarianceDecayRate = 1e-5;      % per-sample variance decay
I think it might have something to do with MeanAttractionConstant, Variance, or VarianceDecayRate. (By the way, the joint torque limits are between -3 and 3.)
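For a feel for these settings, here is a rough sketch of how OU-style exploration noise with a decaying variance evolves, assuming the common Ornstein-Uhlenbeck discretization (the toolbox's internal update may differ in detail). With VarianceDecayRate = 1e-5, the variance halves only after about ln(2)/1e-5 ≈ 69,000 samples, so for a long time the noise standard deviation sqrt(0.3) ≈ 0.55 stays a sizeable fraction of the [-3, 3] torque range:

```matlab
% Sketch: OU-style exploration noise with per-sample variance decay.
% Assumes x(k+1) = x(k) + theta*(mu - x(k))*Ts + sqrt(variance*Ts)*randn;
% the toolbox's exact discretization may differ.
Ts       = 0.025;   % agent sample time (s)
theta    = 0.5;     % MeanAttractionConstant
variance = 0.3;     % initial Variance
decay    = 1e-5;    % VarianceDecayRate
mu       = 0;       % noise mean

nSteps = 4000;
x = zeros(nSteps, 1);
for k = 1:nSteps-1
    x(k+1) = x(k) + theta*(mu - x(k))*Ts + sqrt(variance*Ts)*randn;
    variance = variance*(1 - decay);    % per-sample variance decay
end
fprintf('noise std after %d steps: %.3f\n', nSteps, sqrt(variance));
```

At 400 samples per episode, that half-life corresponds to roughly 170 episodes, so the exploration noise barely decays over a typical training run.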
The actions I get look like this:

Answers (1)

Hi,
The noise options you mention are only used during training and are essential for exploration. If the plots you are showing above are from training, you may consider reducing the noise variance a bit.
If the plots are from the trained agent, you can consider penalizing large action changes in your reward signal. That would help reduce the oscillatory content.
Hope this helps

8 Comments

Thank you for your help.
Unfortunately, the plots shown are after training. I think your suggestion is good: adding a penalty for torque changes. Can you help with the mathematical formula?
I tried taking the action outputs, then summing their squares and multiplying by a negative fraction; however, the computation becomes very large (I get an error message that a loop exists).
Another thing I want to ask: my friend uses RL in Python/PyBullet; he had the same issue but fixed it with a parameter called "substeps". Apparently the 3 agent outputs at the same timestep affect each other, but if they are separated the problem goes away (we can see in the plots that when one oscillates, the other two do too).
Is there any training setting like that in Simulink?
Regarding the reward function, using the absolute value of the difference between current action and previous action (which you can get with a delay block) should work. You need to make sure that this penalty term scales properly with the rest of the reward terms you are using.
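As a concrete sketch of such a penalty term (the function and variable names are illustrative, and the weight w has to be tuned against your other reward terms):

```matlab
function r = stepReward(rTrack, a, aPrev, w)
% rTrack : your existing tracking reward for this step
% a      : current action vector (3 joint torques)
% aPrev  : previous action vector (from a delay block in Simulink)
% w      : penalty weight; tune it so this term does not dominate the reward
r = rTrack - w * sum(abs(a - aPrev));   % penalize large action changes
end
```

For example, with w = 0.05, a change of 2 N·m in a single joint subtracts 0.1 from that step's reward.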
Regarding the rest of your comments, a few things are not clear:
1) You mention getting a loop error. Are you talking about an algebraic loop? How are you modeling your 3-DOF robot in Simulink?
2) You mentioned your friend had the same issues in Python. Which issues are you referring to: oscillating actions, or algebraic loops?
3) What do you mean by "separate the 3 agent outputs"? This is a coupled system as is, so obviously there is dependence between inputs/observations. Adding the penalty term on oscillations should help.
Thank you, I will try the absolute value of the difference between the current and previous action.
1) Yes, an algebraic loop; it still works but takes longer to compute. The robot is modeled as rigid bodies with joints in between; the joint blocks are actuated by torque inputs.
2) The issue he had was with oscillations, not the loops, sorry.
3) I think each one-timestep output from the agent is held for more than one timestep in Simulink. The agent's output timestep is 0.025 s:
agentOptions.SampleTime = 0.025;
(However, when I run Simulink with a fixed-step solver it gives me an error, so I change it back to variable-step and it works. But this causes the Simulink timesteps to not be exactly 400: 10 s simulation / 0.025 s per timestep = 400 timesteps. I think this difference might be what causes the oscillations.)
Either way, I will try a torque penalty and see what happens. Thanks again for answering all my questions, I really appreciate it.
Got it. Regarding the solver, what error are you getting? Basically, if you use fixed-step solvers, the agent sample time should be a multiple of that step.
Also, you may be able to resolve the algebraic loop by adding a delay block in the reward calculation. You would need one time delay for the current agent output, and a 2-step delay for the prior agent output.
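To illustrate the sample-time relationship, here is a sketch of configuring a fixed-step solver whose step evenly divides the agent sample time (the model name and step size below are placeholders):

```matlab
% Sketch: pick a fixed solver step that is an integer divisor of the
% agent sample time (0.025 s), so the agent fires exactly on solver steps.
mdl = 'myRobotModel';                    % placeholder model name
set_param(mdl, 'SolverType', 'Fixed-step');
set_param(mdl, 'FixedStep', '0.005');    % 0.025 / 0.005 = 5 solver steps per agent step
```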
I had a doubt regarding states in RL.
Can I use a concatenation of several states as a single observation to the RL Agent block?
I have a signal that I am sampling at 1 s and summing before feeding it in, but obviously this means I am losing a lot of information about my states. So is it possible in MATLAB RL to use a concatenation of several states as a single observation?
@Emmanouil Tzorakoleftherakis Thank you again for your help. The delay block works fine, and a torque constraint helps a bit.
I am now trying to get information from each episode (whether the episode ended in success or failure). I am thinking of using the reset function, which is called before every episode.
As I mentioned, I'm training a 3-DOF robot. Basically, if the EE reaches within 0.1 m of the target, I want the episode to count as a success. Is this how to do it:
function in = kinovaResetFcn(in)
% d: distance between the EE and the target at the end of the episode
counter = 0;   % note: as written, counter is re-initialized on every reset
if d < 0.1
    counter = counter + 1;
end
in = setVariable(in, 'counter', counter);
end
d is the distance between the EE and the target (I use a Simulink sensor block between the EE frame and the target frame to output x, y, z, then calculate d).
However, I think this code will place the counter in Simulink, not in the workspace, and I want the counter to be in the workspace.
@Sourabh I use a Rate Transition block in Simulink before feeding the observations to the agent:
Actually, I have a signal and I want to sample that signal at intervals of 4 s to make an array and then feed that array to my observation. Can I do it using a Rate Transition block?
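One way to sketch the idea outside Simulink: keep a rolling buffer of the last N samples and feed the whole buffer to the agent as a single observation vector (so the observation spec becomes, e.g., rlNumericSpec([4 1])). In Simulink, a Tapped Delay block plays the same role. The function name below is illustrative:

```matlab
function buf = pushSample(buf, x)
% Shift a rolling observation buffer and append the newest sample,
% so the agent sees the last N samples stacked as one observation vector.
buf = [buf(2:end); x];
end
```

Usage: initialize with buf = zeros(4,1) and call buf = pushSample(buf, s) each time a new sample s arrives; after samples 1 and 2 the buffer holds [0; 0; 1; 2].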


Release: R2021b
Asked: 8 Nov 2023
Commented: 15 Dec 2023
