RL Agent External action not properly used in SAC

I am using the external action input of a Simulink RL Agent block at the beginning of training to guide the agent.
With PPO, this was enough for the agent to also learn from those forced external actions.
With SAC, the agent seemed to learn only to output 0 with this setup. I eventually found that adding the last_action input fixed it; PPO appears to handle this internally.
This workaround is sufficient for me, so there is no need for an immediate fix. I just wanted to report the unexpected behavior. The documentation says that the external action is used for learning, so I believe the way it works with PPO is the intended behavior. A rough sketch of my setup is below.
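
For context, here is a minimal sketch of the MATLAB side of the setup. The model name, block path, and signal dimensions are placeholders; the essential parts are that the observation channel is widened so the previously applied action (appended in the Simulink model, e.g. via a Unit Delay on the applied action signal) reaches the agent, and that the external action port is enabled on the RL Agent block.

% Minimal sketch of the guided-SAC setup. Dimensions, model name, and
% block path are hypothetical placeholders.
numObs = 4;   % plant observations (assumed)
numAct = 1;   % continuous action dimension (assumed)

% Observation spec is widened by the action dimension so the previously
% applied action, appended in the Simulink model, reaches the agent.
obsInfo = rlNumericSpec([numObs + numAct, 1]);
actInfo = rlNumericSpec([numAct, 1], "LowerLimit", -1, "UpperLimit", 1);

agent = rlSACAgent(obsInfo, actInfo);   % default SAC agent from the specs

% "myModel/RL Agent" must have its external action inputs enabled in the
% block dialog so the guiding signal can override the agent's output.
env = rlSimulinkEnv("myModel", "myModel/RL Agent", obsInfo, actInfo);

trainOpts = rlTrainingOptions("MaxEpisodes", 500, "MaxStepsPerEpisode", 200);
% trainResults = train(agent, env, trainOpts);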

 Accepted Answer

Your observation about the different behaviors of Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC) is interesting and could be valuable for others.
In reinforcement learning, how the agent records and learns from the actions that were actually applied can significantly affect training, and the details differ between algorithms. Here are a few points to consider:
  1. Algorithm-Specific Behaviors: PPO and SAC are fundamentally different in how they approach policy optimization. PPO tries to keep the new policy close to the old policy, while SAC aims for maximum entropy in addition to reward maximization. This difference might influence how they handle external actions and learning.
  2. Learning from External Actions: SAC is off-policy and learns from transitions stored in a replay buffer, so the action recorded at each step must be the one actually applied to the plant. That adding the last_action input improved learning suggests SAC needs this information supplied explicitly, while PPO appears to handle it internally (see the conceptual sketch after this list).
  3. Documentation and Expected Behavior: If the MATLAB documentation indicates that external actions are used for learning, but the behavior differs between algorithms, it could be worth bringing this to the attention of MathWorks through their support or community forums. This feedback could lead to improved documentation or even enhancements in future software updates.
  4. Practical Solutions: Your workaround of adding the last_action input is a practical solution. In complex systems, such approaches are often necessary to achieve desired outcomes, even if they deviate from the expected or documented behavior.
  5. Community Knowledge Sharing: Sharing these kinds of insights, as you've done, is beneficial for the community. It helps others who might face similar challenges and contributes to a collective understanding of these advanced tools.
  6. Further Experimentation and Reporting: Continued experimentation with these algorithms, and reporting any unusual behaviors or discrepancies with expected outcomes, is valuable. Such feedback is often crucial for the continuous improvement of software tools and algorithms.
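
To illustrate point 2, the sketch below is conceptual only, not the Reinforcement Learning Toolbox's internal implementation. It shows why, for an off-policy learner like SAC, the transition stored at each step should contain the action that was actually applied to the plant (the external one during guided training), not the action the policy proposed.

% Conceptual sketch only -- not Reinforcement Learning Toolbox internals.
% An off-policy agent such as SAC learns from replayed transitions, so the
% stored action must be the one actually applied to the plant.
obs = [0.1; -0.3];          % current observation (example values)
agentAction = 0.7;          % what the policy proposed
externalAction = -0.2;      % what the guidance signal forced
useExternal = true;         % guidance phase is active
nextObs = [0.05; -0.25];  reward = 1.0;  isDone = false;

% Record the applied action, whichever source produced it
if useExternal
    appliedAction = externalAction;
else
    appliedAction = agentAction;
end

transition = struct("Observation", obs, "Action", appliedAction, ...
    "Reward", reward, "NextObservation", nextObs, "IsDone", isDone);

If the agent's own output were stored instead, the critic would be trained on actions that never actually drove the plant, which would be consistent with the degenerate zero-output behavior you observed.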

1 Comment

Good point! I will report it as a bug and see what the MATLAB team thinks.



Asked on 11 Jan 2024
Commented on 11 Jan 2024
