If you scroll down to the "Validate Trained Agent" section, you will observe that the RL agent returns a set of fixed values for the proportional and integral gains.
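For context, here is a minimal sketch (in Python rather than the MATLAB of the referenced example) of what "fixed gains" means in practice: the trained agent hands back a single pair of numbers, Kp and Ki, and the controller uses them unchanged at every time step. The tank model, its parameters, and the gain values below are illustrative assumptions, not the values from the actual example.

```python
import math

# Illustrative first-order tank model (assumed, not the MATLAB example's parameters):
# dh/dt = (b*u - a*sqrt(h)) / A, where u is the pump/valve command and h the level.
A, a, b = 2.0, 0.5, 1.0

def tank_step(h, u, dt):
    """Advance the water level h by one time step under control input u."""
    dhdt = (b * u - a * math.sqrt(max(h, 0.0))) / A
    return max(h + dt * dhdt, 0.0)

# Fixed gains, as an agent in this setup would return them (values are made up).
Kp, Ki = 4.0, 1.5

def simulate(setpoint=10.0, h0=1.0, dt=0.1, t_end=50.0):
    h, integral = h0, 0.0
    for _ in range(int(t_end / dt)):
        error = setpoint - h
        integral += error * dt
        u = Kp * error + Ki * integral      # PI law with constant gains
        u = min(max(u, 0.0), 20.0)          # actuator saturation
        h = tank_step(h, u, dt)
    return h

print(f"Water level after 50 s: {simulate():.2f}")
```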
Comparison to Fuzzy PID Controllers:
In the design of a Fuzzy PID controller, the control gains can change in real time, depending on the architecture of the controller. For example, human designers can encode their tuning knowledge as fuzzy rules that adjust the PID parameters during operation. A common architecture keeps fixed base gains and adds corrections computed by the fuzzy inference system:

$$K_p = K_p^0 + \Delta K_p, \qquad K_i = K_i^0 + \Delta K_i, \qquad K_d = K_d^0 + \Delta K_d,$$

where $K_p^0$, $K_i^0$, and $K_d^0$ are fixed values, and $\Delta K_p$, $\Delta K_i$, and $\Delta K_d$ are produced in real time by the fuzzy rules from the tracking error and its rate of change.
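The sketch below illustrates this gain-scheduling idea: fixed base gains plus corrections that a small rule table derives from the current error and its rate of change. The rule table is a toy stand-in for a real fuzzy inference system (triangular membership functions and a handful of Mamdani-style rules); all numerical ranges and rule outputs are assumptions chosen only to make the example run.

```python
def tri(x, a, b, c):
    """Triangular membership function with peak at b and support [a, c]."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def fuzzy_gain_corrections(error, d_error):
    """Toy Mamdani-style inference: fuzzify e and de, fire rules,
    defuzzify (weighted average) into gain increments (dKp, dKi)."""
    # Memberships for error: negative / zero / positive (assumed ranges).
    e_neg, e_zero, e_pos = tri(error, -10, -5, 0), tri(error, -5, 0, 5), tri(error, 0, 5, 10)
    de_neg, de_pos = tri(d_error, -2, -1, 0), tri(d_error, 0, 1, 2)

    # Illustrative rule base: far from the setpoint -> raise Kp; near it -> soften Kp, nudge Ki.
    rules = [
        (min(e_pos, de_neg), (+0.5, +0.1)),   # below setpoint, level rising
        (min(e_neg, de_pos), (+0.5, +0.1)),   # above setpoint, level falling
        (e_zero,             (-0.2, +0.05)),  # near setpoint
    ]
    total = sum(w for w, _ in rules)
    if total == 0.0:
        return 0.0, 0.0
    d_kp = sum(w * out[0] for w, out in rules) / total
    d_ki = sum(w * out[1] for w, out in rules) / total
    return d_kp, d_ki

# Fixed base gains (the "fixed values" above); the applied gains vary in real time.
KP0, KI0 = 4.0, 1.5
d_kp, d_ki = fuzzy_gain_corrections(error=3.0, d_error=-0.5)
print(f"Kp = {KP0 + d_kp:.2f}, Ki = {KI0 + d_ki:.2f}")
```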
Comparison to Online-Tuning Algorithms:
Online-tuning algorithms typically adjust the parameters of a controller (such as its gains), which in turn determine the control action under dynamic operating conditions. The gains often change continuously or at preset intervals during operation. The algorithm observes the current error from the setpoint in real time and decides whether to update a parameter (for example, by increasing or decreasing a gain) to improve future performance. The controller then uses the updated values to compute the control action. Some optimization algorithms, however, adjust the control signals directly (such as thrust magnitude and angle in interplanetary transfer missions) when an explicit control law is unavailable or overly complex.
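As a sketch of that online idea (again in Python, reusing the same assumed tank model with made-up numbers): the controller runs with its current gains, a supervisory loop watches the recent tracking error, and at preset intervals it nudges a gain up or down, so the gains evolve while the plant is operating. The adjustment rule here is a deliberately crude hill-climbing heuristic, not any specific published algorithm.

```python
import math

def tank_step(h, u, dt, A=2.0, a=0.5, b=1.0):
    """Illustrative first-order tank: dh/dt = (b*u - a*sqrt(h)) / A."""
    return max(h + dt * (b * u - a * math.sqrt(max(h, 0.0))) / A, 0.0)

def run_with_online_tuning(setpoint=10.0, dt=0.1, t_end=200.0, window=100):
    kp, ki = 1.0, 0.2                      # deliberately poor starting gains
    h, integral = 1.0, 0.0
    abs_err_window, prev_cost, kp_step = 0.0, None, 0.5
    for k in range(int(t_end / dt)):
        error = setpoint - h
        integral += error * dt
        u = min(max(kp * error + ki * integral, 0.0), 20.0)
        h = tank_step(h, u, dt)
        abs_err_window += abs(error) * dt

        # At preset intervals, compare recent performance with the previous
        # interval and nudge Kp in the direction that reduced the error.
        if (k + 1) % window == 0:
            if prev_cost is not None and abs_err_window > prev_cost:
                kp_step = -kp_step         # last change made things worse: reverse it
            kp = max(kp + kp_step, 0.1)
            prev_cost, abs_err_window = abs_err_window, 0.0
    return kp, ki, h

kp, ki, h = run_with_online_tuning()
print(f"Gains after online adjustment: Kp={kp:.2f}, Ki={ki:.2f}, level={h:.2f}")
```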
In the example where the PI controller for the water tank is tuned by an RL agent, an offline optimization approach is employed because the system operates under static conditions (the size of the water tank does not change over time, and the water level setpoint is typically fixed). The offline algorithm conducts a test (such as a step response) to determine the "best" set of gains in a simulated environment. Once identified, these gains are fixed and used for standard operation until a human operator or a new trigger event initiates another tuning session.
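The offline counterpart can be sketched the same way: run a step-response simulation for each candidate gain pair, score it, and keep the best pair as the fixed gains used from then on. The grid search below is only a stand-in for the RL training loop of the actual example, and the model, gain ranges, and cost function (integral of absolute error) are assumptions.

```python
import math
from itertools import product

def step_response_cost(kp, ki, setpoint=10.0, dt=0.1, t_end=60.0):
    """Simulate a step response on the illustrative tank model and return the
    integral of absolute error (IAE); lower is better."""
    h, integral, iae = 1.0, 0.0, 0.0
    for _ in range(int(t_end / dt)):
        error = setpoint - h
        integral += error * dt
        u = min(max(kp * error + ki * integral, 0.0), 20.0)
        h = max(h + dt * (1.0 * u - 0.5 * math.sqrt(max(h, 0.0))) / 2.0, 0.0)
        iae += abs(error) * dt
    return iae

# Offline search over candidate gains; in the referenced example an RL agent
# plays this role, but the outcome is the same kind of object: one fixed pair
# of gains that is then used unchanged during normal operation.
candidates = product([0.5, 1.0, 2.0, 4.0, 8.0], [0.1, 0.5, 1.0, 2.0])
best_kp, best_ki = min(candidates, key=lambda g: step_response_cost(*g))
print(f"Fixed gains chosen offline: Kp={best_kp}, Ki={best_ki}")
```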