Reinforcement Learning for Control of Non-Linear Valves
Updated 2 Apr 2021
* This code accompanies the paper titled "Reinforcement Learning for Control of Valves" https://doi.org/10.1016/j.mlwa.2021.100030
* The paper explores RL for optimum control of non-linear systems
* Platform: MATLAB's Reinforcement Learning Toolbox (release R2019a) and Simulink
* Run `main.m` for a quick end-to-end test that the code is working. It runs four code files sequentially: it trains an agent for just 100 episodes, stores it in the `\results` folder, validates it against the PID controller, performs stability analysis (on an existing transfer-function data file stored in the `\data` folder), and saves the resulting plots in the `\results` folder.
#### Training the RL controller:
* `code_DDPG_Training.m`: Training code that uses DDPG to train an agent in a staged manner. Uses `sm_DDPG_Training_Circuit.slx`. For Graded Learning, this file is run iteratively: each run loads the previously stored model and enhances its "learning".
* `sm_DDPG_Training_Circuit.slx`: Simulink model used to train the agent to control a non-linear valve model
#### Experiment with trained controller:
* `sm_Experimental_Setup.slx`: Simulink model to compare the DDPG agent controller with a PID controller, and to experiment with various noise signals and noise sources
* `code_Experimental_Setup.m`: Loads a pre-trained model (RL controller) and runs it to observe its performance. Uses `sm_Experimental_Setup.slx`
#### Stability analysis of the RL controller:
* `code_SA_TF_Estimator.m`: Estimate a transfer function for the RL controller to perform stability analysis
* `sm_StabilityStudy.slx`: Simulink model used to estimate the transfer function
The paper https://arxiv.org/abs/2012.14668 explores RL for optimum control of non-linear systems.
We use the DDPG (Deep Deterministic Policy Gradient) algorithm to control a non-linear valve modelled after di Capaci and Scali (2018). While the code and paper use a valve as the 'plant', the method and code are easily adaptable to any industrial plant.
Challenges associated with Reinforcement Learning (RL) are outlined in the paper. The paper explores "Graded Learning" to assist in efficiently training an RL agent. We decompose the training task into simpler objectives and train the agent in stages. The Graded Learning parameters will be based on your process and plant.
Note that Graded Learning is the simplest (and application/practice oriented) form of Curriculum Learning (Narvekar et al., 2020).
The paper and code use the following elements as the controlled system:
1. An "industrial" process modelled using a first-order plus time-delay (FOPTD) transfer function of the form
   G(s) = k * exp(-L*s) / (1 + T*s)
   where k = 3.8163, T = 156.46, and L = 2.5 is the time-delay parameter
2. A non-linear valve modelled based on di Capaci and Scali (2018), characterized by two parameters:
Static friction or stiction: fS = 8.40
Dynamic friction: fD = 3.5243
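As a minimal sketch, the FOPTD process above can be constructed directly in MATLAB using the Control System Toolbox (the Simulink models in this repository implement the same dynamics graphically; this snippet is only illustrative):

```matlab
% FOPTD process: G(s) = k * exp(-L*s) / (1 + T*s)
% Parameter values as given in the paper
k = 3.8163;   % process gain
T = 156.46;   % time constant
L = 2.5;      % time delay

s = tf('s');
G = k * exp(-L*s) / (1 + T*s);   % exp(-L*s) introduces the transport delay

% Inspect the step response of the delayed first-order process
step(G);
```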
## Running the code
### 1. Training the agent:
To train the agent, open the Simulink model `sm_DDPG_Training_Circuit.slx`, ensure the variables in the code file `code_DDPG_Training.m` are correctly set, and execute the code.
Review/set the following global and "Graded Learning" variables:
1. `MODELS_PATH`: Base path for storing the models. Leave it as 'models' and the code will create the folder if it does not exist.
2. `VERSION`: Version suffix for your model, say "V1", or "Grade-1" etc. Ensure you change this so that a new model is created during each stage of the training process.
3. `VALVE_SIMULATION_MODEL`: Set to the Simulink model name 'sm_DDPG_Training_Circuit'. If you rename the model, update the name here.
4. `USE_PRE_TRAINED_MODEL = false`: Set to `false` to train the first model, or to train only a SINGLE model. To continue training a pre-trained model, i.e. to apply Graded Learning, set `USE_PRE_TRAINED_MODEL = true`.
5. `PRE_TRAINED_MODEL_FILE = 'Grade_I.mat'`: Set to the file name of the previous-stage model. The example shown continues training the Grade_I agent to create a Grade_II model.
6. `MAX_EPISODES = 1000`: The maximum number of episodes a training round lasts. Reduce this initially if you want to test the setup; however, training a stable agent requires thousands of episodes.
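Taken together, a typical configuration block at the top of `code_DDPG_Training.m` might look like the following (the values shown are illustrative, for a second-stage Graded Learning run):

```matlab
% --- Global and Graded Learning settings (illustrative values) ---
MODELS_PATH = 'models/';                        % created if it does not exist
VERSION = 'Grade_II';                           % change for each training stage
VALVE_SIMULATION_MODEL = 'sm_DDPG_Training_Circuit';
USE_PRE_TRAINED_MODEL = true;                   % false for the very first stage
PRE_TRAINED_MODEL_FILE = 'Grade_I.mat';         % previous-stage agent to load
MAX_EPISODES = 1000;                            % episodes per training round
```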
Next set the Graded Learning parameters:
Graded Learning: We trained the agent in SIX stages (Grade-I to Grade-VI), successively increasing the difficulty of the task. The parameters will depend on your process and plant. For this code, we used the following:
1. `TIME_DELAY` = Time-delay parameter (L) of the FOPTD process. Set as 0.1, 0.5, 1.5, 2.0 and 2.5
2. `fS` = Non-linear valve stiction. We used the following stages: 1/10th of 8.40, followed by 1/5th, 1/2, 2/3, and finally the full value of 8.40
3. `fD` = Non-linear valve dynamic friction. We used the same fractions as for fS, finally ending with the actual value of 3.5243
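For reference, the staged schedule described above can be written out as a small table of (`TIME_DELAY`, `fS`, `fD`) values. This is a sketch based on the fractions listed in the text, not code taken from the repository, and the mapping of rows to grade names is an assumption; adjust it to your own process and plant:

```matlab
% Graded Learning schedule (illustrative; one row per training stage)
% Columns: TIME_DELAY, fS (stiction), fD (dynamic friction)
grades = [ ...
    0.1,  8.40/10,   3.5243/10;    % early grade: 1/10th friction, small delay
    0.5,  8.40/5,    3.5243/5;     % 1/5th friction
    1.5,  8.40/2,    3.5243/2;     % 1/2 friction
    2.0,  8.40*2/3,  3.5243*2/3;   % 2/3 friction
    2.5,  8.40,      3.5243];      % final grade: full difficulty

% Each row is used for one run of code_DDPG_Training.m, loading the
% agent saved by the previous row's run.
```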
## References
* Rajesh Siraskar (2023). Reinforcement Learning for Control of Non-Linear Valves (https://github.com/Rajesh-Siraskar/Reinforcement-Learning-for-Control-of-Valves/releases/tag/v3.0), GitHub.
* Siraskar, R. (2021). Reinforcement Learning for Control of Valves. Machine Learning with Applications. https://www.sciencedirect.com/science/article/pii/S2666827021000116
Platform compatibility: Windows, macOS, Linux
* Release notes for v3.0: https://github.com/Rajesh-Siraskar/Reinforcement-Learning-for-Control-of-Valves/releases/tag/v3.0
* Release notes for v2.0: https://github.com/Rajesh-Siraskar/Reinforcement-Learning-for-Control-of-Valves/releases/tag/v2.0