Reinforcement learning to tune a PI controller

20 views (last 30 days)
yiwei
yiwei on 15 Dec 2025 at 8:36
Commented: Sam Chak on 15 Dec 2025 at 16:10
I’ve been studying the official MathWorks example “Tune PI Controller Using Reinforcement Learning” (link: https://ww2.mathworks.cn/help/reinforcement-learning/ug/tune-pi-controller-using-td3.html?s_tid=srchtitle_site_search_3_TD3) and have a couple of questions that came up while working through it.
1. When reinforcement learning is used to tune a PI controller, is a fixed set of parameters (Kp, Ki) ultimately used for control? (That is, during the simulation Kp and Ki do not change in real time, unlike in a fuzzy-PID or BP-PID scheme.)
2. Will its control performance be comparable to that of online-tuning algorithms?

Accepted Answer

Sam Chak
Sam Chak about 6 hours ago
If you scroll down to the "Validate Trained Agent" section, you will observe that the RL agent returns a set of fixed values for the proportional and integral gains.
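To see what that means in practice, here is a rough sketch (the gain values and the first-order tank model below are made up for illustration, not taken from the example): once training ends, the policy reduces to two constants, and the controller is an ordinary fixed-gain PI law.

% Hypothetical trained gains -- placeholders, not the example's actual result
Kp = 8.2;
Ki = 3.5;

dt  = 0.1;                         % sample time (s)
t   = 0:dt:20;                     % simulation horizon
ref = 10*ones(size(t));            % water-level setpoint

% Illustrative first-order tank approximation: dh/dt = -a*h + b*u
a = 0.1;  b = 0.05;
h = 0;  intE = 0;  y = zeros(size(t));
for k = 1:numel(t)
    e    = ref(k) - h;             % tracking error
    intE = intE + e*dt;            % accumulated (integral of) error
    u    = Kp*e + Ki*intE;         % PI law with the same two gains at every step
    h    = h + dt*(-a*h + b*u);    % Euler step of the tank dynamics
    y(k) = h;
end
plot(t, y, t, ref, '--'), xlabel('Time (s)'), ylabel('Water level')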
Comparison to Fuzzy PID Controllers:
In the design of a Fuzzy PID controller, the control gains can change in real time, depending on the architecture of the controller. For example, human designers can encode fuzzy rules that adjust the PID gains on the fly from the current error and its rate of change.
In the fixed-valued Fuzzy PID control architecture, the control law takes the familiar form
u(t) = Kp·e(t) + Ki·∫ e(τ) dτ + Kd·de(t)/dt,
where Kp, Ki, and Kd are fixed values.
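For contrast, here is a sketch of the adaptive idea. The rule below is only a crude stand-in for a real fuzzy inference system, but it shows the key difference: the gains are recomputed from the current error at every step, so they vary in real time.

% Hypothetical nominal gains and the same illustrative tank model as above
Kp0 = 8.2;  Ki0 = 3.5;
dt = 0.1;  t = 0:dt:20;  ref = 10*ones(size(t));
a = 0.1;  b = 0.05;
h = 0;  intE = 0;
for k = 1:numel(t)
    e = ref(k) - h;
    % Stand-in for fuzzy rules: large |e| -> more proportional action,
    % less integral action (to limit windup); small |e| -> nominal gains
    Kp = Kp0*(1 + 0.5*tanh(abs(e)));
    Ki = Ki0/(1 + abs(e));
    intE = intE + e*dt;
    u    = Kp*e + Ki*intE;         % gains change with the operating condition
    h    = h + dt*(-a*h + b*u);
end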
Comparison to online-tuning algorithms:
Most online-tuning algorithms adjust the parameters of a controller (such as its gains), which in turn determine the control action under dynamic operating conditions. The gains often change continuously or at preset intervals during operation. The algorithm observes the current error from the setpoint in real time and decides whether to update a parameter (for example, increasing or decreasing a gain) to improve future performance. The updated controller then uses these new values to calculate the control action. However, some optimization algorithms adjust the control signals directly, such as thrust and angle in interplanetary transfer missions, when a control law is unavailable or overly complex.
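As a toy illustration of that kind of loop (a crude coordinate search made up for this answer, not any specific published tuner), the controller keeps running while a supervisory routine watches the windowed squared error and nudges Kp in whichever direction improved it:

dt = 0.1;  t = 0:dt:200;  ref = 10*ones(size(t));
a = 0.1;  b = 0.05;                % same illustrative tank model
Kp = 2;  Ki = 0.5;                 % deliberately poor starting gains
step = 0.5;  dirn = +1;            % tuner state: step size and search direction
win = 100;                         % evaluation window of 10 s
h = 0;  intE = 0;  cost = 0;  prevCost = inf;
for k = 1:numel(t)
    e    = ref(k) - h;
    intE = intE + e*dt;
    u    = Kp*e + Ki*intE;         % control keeps running while tuning proceeds
    h    = h + dt*(-a*h + b*u);
    cost = cost + e^2*dt;          % accumulate windowed squared error
    if mod(k, win) == 0            % end of window: online gain update
        if cost > prevCost         % the last change hurt performance
            dirn = -dirn;          % reverse the search direction
            step = 0.7*step;       % and shrink the step
        end
        Kp = max(Kp + dirn*step, 0.1);
        prevCost = cost;  cost = 0;
    end
end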
In the example where the PI controller for the water tank is tuned by an RL agent, an offline optimization approach is employed because the system operates under static conditions (the size of the water tank does not change over time, and the water level setpoint is typically fixed). The offline algorithm conducts a test (such as a step response) to determine the "best" set of gains in a simulated environment. Once identified, these gains are fixed and used for standard operation until a human operator or a new trigger event initiates another tuning session.
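The offline procedure can be caricatured as follows (a brute-force grid search standing in for the RL agent, with the same made-up tank model): run a simulated step test for every candidate (Kp, Ki), score each run, and then fix the best pair for normal operation.

dt = 0.1;  t = 0:dt:20;  ref = 10;  % fixed setpoint, static plant
a = 0.1;  b = 0.05;
bestCost = inf;
for Kp = 1:15
    for Ki = 0.5:0.5:8
        h = 0;  intE = 0;  cost = 0;
        for k = 1:numel(t)          % simulated step-response test
            e    = ref - h;
            intE = intE + e*dt;
            u    = Kp*e + Ki*intE;
            h    = h + dt*(-a*h + b*u);
            cost = cost + e^2*dt;   % integral of squared error
        end
        if cost < bestCost          % keep the best candidate so far
            bestCost = cost;  bestKp = Kp;  bestKi = Ki;
        end
    end
end
fprintf('Gains fixed for operation: Kp = %g, Ki = %g\n', bestKp, bestKi)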
  2 Comments
yiwei
yiwei 4 minutes ago
Thank you very much for your answer. This undoubtedly resolved my confusion. I would also like to ask whether reinforcement learning can be used for online tuning. If so, are there learning resources in this area? Thanks again.
Sam Chak
Sam Chak 14 minutes ago
The example "Quadruped Robot Locomotion Using DDPG Agent" uses RL for online optimization. Instead of determining control gains, which conventional strategies commonly use to calculate the control action, the RL agent directly generates eight control torque signals for the revolute joints of the robot's four legs.
