arxiv: 2508.16474 · v1 · pith:7K7WASYVnew · submitted 2025-08-22 · 📡 eess.SY · cs.LG · cs.SY · math.OC

Reinforcement Learning-based Control via Y-wise Affine Neural Networks (YANNs)

Pith reviewed 2026-05-21 22:45 UTC · model grok-4.3

classification 📡 eess.SY, cs.LG, cs.SY, math.OC
keywords reinforcement learning, YANN, model predictive control, piecewise affine, nonlinear control, safety constraints, explicit MPC

The pith

YANNs initialize RL actor-critic from exact linear MPC solutions and extend them to nonlinear control.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a reinforcement learning algorithm that uses Y-wise Affine Neural Networks to exactly encode the multi-parametric solutions of linear optimal control problems as the starting policy and value function. Additional layers are added so that online training on the true nonlinear plant can adapt these representations, while continuous policy improvement gives heuristic confidence that the linear solution remains a lower bound on performance. A sympathetic reader would care because the method begins with the interpretability and constraint-handling properties of explicit linear MPC yet gains the ability to solve general nonlinear problems, and the authors report clear gains over deep deterministic policy gradient (DDPG), especially under safety limits. The approach is illustrated on a clipped pendulum and a safety-critical chemical reactor.

Core claim

YANNs can exactly represent any piecewise-affine function defined on polytopic subdomains; therefore the explicit multi-parametric solution of a linear optimal control problem (OCP) and its associated state-action value function can be encoded directly into the actor and critic. Extra layers injected into the YANN architecture allow the networks to represent nonlinear maps that are then trained by direct interaction with the true plant, so the policy and value functions begin as the exact linear OCP solution and can evolve toward the solution of the nonlinear OCP.
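
To make "piecewise-affine function defined on polytopic subdomains" concrete, the sketch below evaluates a toy explicit control law by finding the region that contains the state and applying that region's affine law. The regions, gains, and function names are illustrative assumptions and are not taken from the paper; this is a plain region lookup, not the YANN network construction itself.

```python
from typing import List, Sequence, Tuple

# Each region is (A, b, K, c): the polytope {x : A x <= b} with law u = K x + c.
Region = Tuple[List[List[float]], List[float], List[List[float]], List[float]]


def matvec(M: Sequence[Sequence[float]], x: Sequence[float]) -> List[float]:
    """Plain-Python matrix-vector product."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]


def in_polytope(A, b, x, tol: float = 1e-9) -> bool:
    """True if A x <= b holds row-wise, within a small tolerance."""
    return all(lhs <= rhs + tol for lhs, rhs in zip(matvec(A, x), b))


def pwa_policy(regions: List[Region], x: Sequence[float]) -> List[float]:
    """Return u = K_i x + c_i for the first region i that contains x."""
    for A, b, K, c in regions:
        if in_polytope(A, b, x):
            return [kx + ci for kx, ci in zip(matvec(K, x), c)]
    raise ValueError("state lies outside every region")


# Hypothetical 1-D example: saturated feedback u = clip(-2 x, -1, 1)
# on -2 <= x <= 2, written as three polytopic regions.
TOY_REGIONS: List[Region] = [
    ([[1.0], [-1.0]], [-0.5, 2.0], [[0.0]], [1.0]),    # -2 <= x <= -0.5: u = 1
    ([[1.0], [-1.0]], [0.5, 0.5], [[-2.0]], [0.0]),    # -0.5 <= x <= 0.5: u = -2x
    ([[1.0], [-1.0]], [2.0, -0.5], [[0.0]], [-1.0]),   # 0.5 <= x <= 2: u = -1
]
```

Calling `pwa_policy(TOY_REGIONS, [0.25])` selects the middle region and applies its affine law. An explicit MPC solution has the same shape, only with more regions and higher-dimensional states.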

What carries the argument

Y-wise Affine Neural Networks (YANNs), which exactly represent known piecewise affine functions of arbitrary input and output dimensions on any number of polytopic subdomains.

If this is right

  • The YANN actor begins as the exact optimal policy of the linear OCP and the critic as the exact value function for that OCP.
  • Continuous policy improvement gives heuristic confidence, rather than a formal guarantee, that the final RL policy is at least as good as the linear solution.
  • Safety constraints are respected more reliably than with standard deep RL because the initial policy already satisfies them on the linear model.
  • The same YANN architecture can be applied to any system for which an explicit multi-parametric linear MPC solution can be pre-computed.
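
The lower-bound bullet can be pictured as a simple acceptance rule: a candidate policy replaces the incumbent only if its estimated return is no worse, with the linear solution as the first incumbent. This is a generic safeguard sketched under assumptions; the paper's own continuous policy improvement procedure may differ, and the return values below are hypothetical placeholders.

```python
from typing import List, Sequence


def accept_if_improved(
    incumbent_return: float,
    candidate_returns: Sequence[float],
    margin: float = 0.0,
) -> List[float]:
    """Keep a candidate only if it beats the incumbent by at least `margin`.

    Returns the incumbent's return after each step. The sequence is
    non-decreasing by construction, so it never drops below the start value.
    """
    history = []
    best = incumbent_return
    for estimate in candidate_returns:
        if estimate >= best + margin:
            best = estimate  # accept the candidate policy
        history.append(best)  # otherwise the incumbent is retained
    return history


# Hypothetical numbers: the linear solution's return, then four candidates.
linear_return = -12.0
estimates = [-15.0, -11.0, -13.5, -9.0]
kept = accept_if_improved(linear_return, estimates)
```

The bound holds only with respect to the return estimates. If those estimates are noisy, the rule gives confidence rather than certainty.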

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The method may reduce the number of unsafe episodes during early training in safety-critical RL tasks.
  • Similar initialization strategies could be tested on other explicit control representations such as hybrid MPC or switched linear systems.
  • Stability or robustness certificates might be carried over from the linear solution into the early stages of nonlinear training.

Load-bearing premise

The explicit multi-parametric solutions obtained from an approximated linear model supply an initial policy and value function that are close enough to the true nonlinear system for online training to succeed.
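
How quickly a linear model drifts from the plant can be checked numerically. The sketch below uses a frictionless pendulum linearized about the hanging position (sin θ ≈ θ) and compares the two models' angular accelerations near and far from the linearization point. The model, parameters, and function names are assumptions for illustration and are not the paper's clipped-pendulum setup.

```python
import math

G = 9.81   # gravitational acceleration in m/s^2 (assumed value)
L = 1.0    # pendulum length in m (assumed value)


def accel_nonlinear(theta: float, u: float = 0.0) -> float:
    """Angular acceleration of a frictionless pendulum about the hanging position."""
    return -(G / L) * math.sin(theta) + u


def accel_linear(theta: float, u: float = 0.0) -> float:
    """Same model linearized at theta = 0 using sin(theta) ~ theta."""
    return -(G / L) * theta + u


def model_gap(theta: float) -> float:
    """Absolute mismatch between the linear and nonlinear accelerations."""
    return abs(accel_nonlinear(theta) - accel_linear(theta))


small_gap = model_gap(0.1)   # close to the linearization point
large_gap = model_gap(1.5)   # far from the linearization point
```

The mismatch is negligible at small angles and grows substantially at large ones. That growth is the regime in which the premise above is most exposed.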

What would settle it

Running YANN-RL and DDPG on the chemical-reactive system and finding that YANN-RL either violates the safety constraints or achieves lower cumulative reward than DDPG would falsify the performance claim.
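
The test above reduces to two numbers per algorithm: mean cumulative reward and the count of safety violations. A minimal sketch of that comparison follows, where `candidate` stands in for YANN-RL and `baseline` for DDPG. The episode logs are made-up placeholders, not results from the paper, and the helper names are hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, List, Sequence, Tuple

# One episode = (per-step rewards, per-step values of the constrained variable).
Episode = Tuple[Sequence[float], Sequence[float]]


def summarize(episodes: List[Episode], limit: float) -> Dict[str, float]:
    """Mean and spread of episode returns, plus steps that exceed `limit`."""
    returns = [sum(rewards) for rewards, _ in episodes]
    violations = sum(1 for _, values in episodes for v in values if v > limit)
    return {
        "mean_return": mean(returns),
        "std_return": stdev(returns) if len(returns) > 1 else 0.0,
        "violations": float(violations),
    }


def claim_falsified(candidate: Dict[str, float], baseline: Dict[str, float]) -> bool:
    """True if the candidate violates the constraint or earns a lower mean return."""
    return (
        candidate["violations"] > 0
        or candidate["mean_return"] < baseline["mean_return"]
    )


# Placeholder logs for illustration only.
logs_candidate = [([-1.0, -0.5, -0.25], [0.5, 0.75, 0.5]),
                  ([-1.0, -0.25, -0.25], [0.25, 0.5, 0.75])]
logs_baseline = [([-2.0, -1.0, -0.5], [0.5, 1.25, 0.75]),
                 ([-1.5, -1.0, -0.5], [1.5, 0.5, 0.5])]

summary_candidate = summarize(logs_candidate, limit=1.0)
summary_baseline = summarize(logs_baseline, limit=1.0)
```

Reporting `std_return` across several seeds alongside the mean would also supply the error bars needed to judge whether a gap is reliable.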

Figures

Figures reproduced from arXiv: 2508.16474 by Austin Braniff, Yuhe Tian.

Figure 1: A schematic of the proposed RL algorithm based on YANNs.
Figure 2: The terminology of RL-based control used in this work.
Figure 3: Y-wise Affine Neural Network architecture.
Figure 4: YANN-actor network architecture. This structure can be interpreted in the following way. First, three vectors are computed simultaneously: (i) a vector of binaries relating to the solutions of indicator functions for the subdomains of the piecewise control law (blue box in ...
Figure 5: To achieve an exact representation of this function via an NN, the ...

Original abstract

This work presents a novel reinforcement learning (RL) algorithm based on Y-wise Affine Neural Networks (YANNs). YANNs provide an interpretable neural network which can exactly represent known piecewise affine functions of arbitrary input and output dimensions defined on any amount of polytopic subdomains. One representative application of YANNs is to reformulate explicit solutions of multi-parametric linear model predictive control. Built on this, we propose the use of YANNs to initialize RL actor and critic networks, which enables the resulting YANN-RL control algorithm to start with the confidence of linear optimal control. The YANN-actor is initialized by representing the multi-parametric control solutions obtained via offline computation using an approximated linear system model. The YANN-critic represents the explicit form of the state-action value function for the linear system and the reward function as the objective in an optimal control problem (OCP). Additional network layers are injected to extend YANNs for nonlinear expressions, which can be trained online by directly interacting with the true complex nonlinear system. In this way, both the policy and state-value functions exactly represent a linear OCP initially and are able to eventually learn the solution of a general nonlinear OCP. Continuous policy improvement is also implemented to provide heuristic confidence that the linear OCP solution serves as an effective lower bound to the performance of RL policy. The YANN-RL algorithm is demonstrated on a clipped pendulum and a safety-critical chemical-reactive system. Our results show that YANN-RL significantly outperforms the modern RL algorithm using deep deterministic policy gradient, especially when considering safety constraints.
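
One way to picture "additional network layers are injected" without disturbing the initial representation is a residual branch whose output weights start at zero, so the extended network reproduces the piecewise-affine function exactly until training changes those weights. This is an illustrative assumption and not necessarily the construction used in the paper; the class and function names are hypothetical.

```python
import math
import random


def base_policy(x: float) -> float:
    """Stand-in for the exact piecewise-affine policy: u = clip(-2 x, -1, 1)."""
    return max(-1.0, min(1.0, -2.0 * x))


class ResidualExtension:
    """Base policy plus a small tanh branch whose output layer starts at zero."""

    def __init__(self, hidden: int = 4, seed: int = 0):
        rng = random.Random(seed)
        self.w_in = [rng.uniform(-1.0, 1.0) for _ in range(hidden)]
        self.b_in = [rng.uniform(-1.0, 1.0) for _ in range(hidden)]
        # Zero output weights: the nonlinear branch contributes nothing at first.
        self.w_out = [0.0] * hidden

    def __call__(self, x: float) -> float:
        h = [math.tanh(w * x + b) for w, b in zip(self.w_in, self.b_in)]
        return base_policy(x) + sum(wo * hi for wo, hi in zip(self.w_out, h))


policy = ResidualExtension()
```

Before any training step, `policy(x)` equals `base_policy(x)` for every input. Once an entry of `w_out` becomes nonzero, the output can depart from the piecewise-affine map.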

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces Y-wise Affine Neural Networks (YANNs) capable of exactly representing piecewise affine functions on arbitrary polytopic subdomains. It initializes RL actor and critic networks from explicit multi-parametric solutions of an approximated linear MPC problem, injects additional layers to extend to nonlinear dynamics, and trains online on the true nonlinear plant while using continuous policy improvement for heuristic lower-bound confidence. Demonstrations on a clipped pendulum and safety-critical chemical reactor claim significant outperformance over DDPG, particularly under safety constraints.

Significance. If the empirical results hold, the work offers a principled way to initialize RL policies from linear optimal control solutions, potentially improving safety and interpretability in nonlinear control tasks. The exact PWA representation property of YANNs is a clear technical strength for bridging explicit MPC and data-driven methods.

major comments (2)
  1. [Abstract and Numerical Examples] Abstract and demonstration sections: the central claim that YANN-RL 'significantly outperforms' DDPG (especially under safety constraints) rests on an unverified demonstration; no quantitative metrics, reward curves, success rates, error bars, or tables comparing performance are referenced, making it impossible to assess the magnitude or statistical reliability of the reported gains.
  2. [Method (initialization and training)] Method sections on initialization and online training: the headline performance advantage depends on the multi-parametric linear solutions serving as an effective initial actor/critic for the true nonlinear system, yet no ablation study, sensitivity analysis to linear approximation quality, or isolation of the initialization contribution versus the YANN architecture and training procedure is provided. This is load-bearing for the claim that the linear OCP solution acts as a reliable lower bound.
minor comments (1)
  1. The manuscript would benefit from explicit pseudocode or a diagram clarifying the transition from the exact linear YANN representation to the trained nonlinear extension.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and positive assessment of the YANN representation property and its potential to bridge explicit MPC with RL. We address each major comment below and will incorporate revisions to strengthen the empirical support and methodological analysis.

Point-by-point responses
  1. Referee: [Abstract and Numerical Examples] Abstract and demonstration sections: the central claim that YANN-RL 'significantly outperforms' DDPG (especially under safety constraints) rests on an unverified demonstration; no quantitative metrics, reward curves, success rates, error bars, or tables comparing performance are referenced, making it impossible to assess the magnitude or statistical reliability of the reported gains.

    Authors: We agree that the abstract would benefit from explicit quantitative references to support the performance claims. In the revised manuscript, we will update the abstract to include key metrics such as average cumulative rewards, safety constraint violation rates, and success rates drawn from the numerical examples on the clipped pendulum and chemical reactor. We will also add a summary comparison table in the demonstration section that includes error bars, standard deviations across multiple runs, and references to the reward curves and success rate plots already present in the figures. This will make the magnitude and reliability of the gains directly verifiable. revision: yes

  2. Referee: [Method (initialization and training)] Method sections on initialization and online training: the headline performance advantage depends on the multi-parametric linear solutions serving as an effective initial actor/critic for the true nonlinear system, yet no ablation study, sensitivity analysis to linear approximation quality, or isolation of the initialization contribution versus the YANN architecture and training procedure is provided. This is load-bearing for the claim that the linear OCP solution acts as a reliable lower bound.

    Authors: We acknowledge that isolating the initialization contribution is important for substantiating the lower-bound claim. In the revised manuscript, we will add an ablation study comparing YANN-RL performance with the proposed linear multi-parametric initialization against random initialization and standard neural network warm-starts. We will also include a sensitivity analysis that varies the accuracy of the linear system approximation used to compute the explicit MPC solution and reports the resulting impact on final policy performance and safety metrics. These additions will directly address the load-bearing role of the initialization. revision: yes

Circularity Check

0 steps flagged

No circularity: linear mp-MPC initialization and online nonlinear training are independent of final performance metric

full rationale

The derivation begins with offline computation of explicit multi-parametric solutions for an approximated linear system, represented exactly by YANNs as piecewise-affine functions. These initialize the actor and critic, after which nonlinear layers are added and trained online via interaction with the true nonlinear plant. The final performance comparison to DDPG is obtained through simulation on the clipped pendulum and chemical reactor, not by algebraic reduction to the linear initialization or any fitted parameter. No self-definitional loops, fitted-input predictions, or load-bearing self-citations appear in the chain; the linear solution functions as an external warm-start rather than a constructed outcome of the RL procedure itself.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The approach depends on the computability of explicit multi-parametric solutions for linear systems and on the assumption that YANNs can be extended with additional layers while preserving initial exact representation.

axioms (1)
  • domain assumption: YANNs can exactly represent known piecewise affine functions of arbitrary input and output dimensions defined on any amount of polytopic subdomains
    Stated directly in the abstract as the defining property enabling initialization from linear MPC solutions
invented entities (1)
  • YANNs (no independent evidence)
    purpose: Interpretable neural network architecture for exact representation of piecewise affine functions
    New architecture introduced to enable exact encoding of linear optimal control solutions

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Reinforcement Learning-based Control via Y-wise Affine Neural Networks: Comparative Case Studies for Chemical Processes

    eess.SY 2026-05 unverdicted novelty 3.0

    YANN-RL is tested on three PC-Gym chemical process case studies, showing reduced training time and near-NMPC performance compared to PPO, SAC, DDPG, and TD3.
