A Review On Safe Reinforcement Learning Using Lyapunov and Barrier Functions
Pith reviewed 2026-05-18 22:34 UTC · model grok-4.3
The pith
Safe RL has shifted from model-based to model-free formulations since 2017, with combined Lyapunov-barrier methods now most active, while high-dimensional deployment remains the chief barrier.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The field has shifted decisively from model-based to model-free formulations since 2017, with combined CLF-CBF approaches becoming the most active sub-area post-2022. Per-class open problems are now well-defined: certificate validity under function approximation and distribution shift for Lyapunov methods, feasibility and deadlock under hard CBF-QP shielding for barrier methods, and joint CLF-CBF feasibility under model uncertainty for combined methods. Deployment to high-dimensional and partially observable settings remains the dominant scalability barrier across all three classes.
What carries the argument
A three-way classification of safe RL methods into Lyapunov-based approaches for stability certificates, barrier-based approaches for constraint enforcement, and their combinations, used to chart the move from model-based to model-free implementations and to isolate per-class open problems.
If this is right
- Lyapunov-based methods will need new ways to maintain certificate validity when function approximators are used or when the data distribution changes.
- Barrier-based methods must resolve issues of infeasibility and deadlock that arise from hard quadratic-program shielding.
- Combined CLF-CBF approaches require techniques that preserve joint feasibility when the system model is uncertain.
- All classes will require advances before reliable deployment becomes feasible in high-dimensional or partially observable environments.
Where Pith is reading between the lines
- The identified open problems suggest that hybrid learning-control architectures could become a natural next step for closing the gap between theoretical guarantees and practical performance.
- The scalability barrier implies that dimensionality-reduction or hierarchical safety layers may be needed before these methods reach real-world high-dimensional tasks.
- The documented shift toward model-free combined methods could encourage benchmark suites that directly compare the three classes on shared safety metrics.
Load-bearing premise
The claimed trends and open problems depend on the surveyed papers forming a representative sample of the literature and on the manual categorization correctly reflecting dominant directions without major omissions.
What would settle it
A broad search that locates a large body of recent papers showing continued dominance of model-based methods or that uncovers major open problems absent from the review's per-class list would contradict the reported shifts and classification.
Figures
read the original abstract
Reinforcement learning (RL) has proven to be particularly effective in solving complex decision-making problems for a wide range of applications. Safe reinforcement learning refers to a class of constrained problems where the constraint violations lead to partial or complete system failure. The goal of this review is to provide an overview of safe RL techniques using Lyapunov and barrier functions to guarantee this notion of safety (stability of the system in terms of a computed policy and constraint satisfaction during training and deployment). Three concrete takeaways emerge from our analysis: (i) the field has shifted decisively from model-based to model-free formulations since 2017, with combined CLF-CBF approaches becoming the most active sub-area post-2022; (ii) per-class open problems are now well-defined, certificate validity under function approximation and distribution shift for Lyapunov methods, feasibility and deadlock under hard CBF-QP shielding for barrier methods, and joint CLF--CBF feasibility under model uncertainty for combined methods; and (iii) deployment to high-dimensional and partially observable settings remains the dominant scalability barrier across all three classes. The different approaches employed are discussed in detail along with their shortcomings and benefits to provide critique and possible future research directions. The review demonstrates promising scope for providing safety guarantees for complex dynamical systems with operational constraints using model-based and model-free RL.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript is a review of safe reinforcement learning techniques that employ Lyapunov (CLF) and barrier (CBF) functions to provide guarantees on stability and constraint satisfaction during training and deployment. It analyzes the literature to extract three takeaways: a shift from model-based to model-free formulations since 2017 with combined CLF-CBF methods becoming most active post-2022; well-defined per-class open problems (certificate validity under approximation for Lyapunov, feasibility/deadlock for barrier, joint feasibility under uncertainty for combined); and scalability barriers in high-dimensional and partially observable settings. The review discusses approaches, benefits, shortcomings, and future directions.
Significance. If the underlying literature sample is representative, the review would offer a useful synthesis by distilling trends, articulating concrete open problems per method class, and critiquing shortcomings of existing CLF, CBF, and hybrid approaches. Explicitly naming per-class challenges and scalability issues could help focus research on deployment-relevant questions in constrained dynamical systems.
major comments (2)
- [Abstract and §1] Abstract and §1: The central claims of a 'decisive shift' from model-based to model-free since 2017 and combined CLF-CBF as the 'most active sub-area post-2022' are load-bearing for takeaway (i) yet rest on an unquantified manual categorization. No table, figure, or count of papers by year and category is referenced to support the trend statements or to allow readers to check for counter-examples in model-based or high-dimensional work.
- [§2] §2 (or equivalent methods section): The review provides no description of the literature search protocol, including databases, keywords, time bounds, or inclusion/exclusion criteria. This directly affects the reliability of takeaways (i) and (ii), because the open-problem taxonomy and activity rankings depend on the surveyed corpus being unbiased and complete.
minor comments (1)
- [Abstract] Abstract: Acronyms CLF and CBF are introduced without expansion on first use, which reduces accessibility for readers outside the immediate subfield.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive report. The two major comments identify important gaps in how the trends and corpus are documented. We address each point below and commit to revisions that will make the claims more transparent and the review more reproducible.
read point-by-point responses
-
Referee: [Abstract and §1] Abstract and §1: The central claims of a 'decisive shift' from model-based to model-free since 2017 and combined CLF-CBF as the 'most active sub-area post-2022' are load-bearing for takeaway (i) yet rest on an unquantified manual categorization. No table, figure, or count of papers by year and category is referenced to support the trend statements or to allow readers to check for counter-examples in model-based or high-dimensional work.
Authors: We accept that the trend statements would be stronger if supported by explicit counts. Our categorization was performed by systematically reading and classifying papers according to the three classes (model-based, model-free, combined) and publication year, but this process was not visualized. In the revised manuscript we will add a new figure (or table) in Section 1 that reports the number of papers per year and per category. The figure will also indicate the subset of high-dimensional or partially observable examples, allowing readers to directly evaluate the claimed shift and the post-2022 activity in combined methods. revision: yes
-
Referee: [§2] §2 (or equivalent methods section): The review provides no description of the literature search protocol, including databases, keywords, time bounds, or inclusion/exclusion criteria. This directly affects the reliability of takeaways (i) and (ii), because the open-problem taxonomy and activity rankings depend on the surveyed corpus being unbiased and complete.
Authors: We agree that an explicit search protocol is necessary for a literature review. The current manuscript does not contain one. We will insert a new subsection (tentatively placed at the beginning of Section 2) that states the databases queried (arXiv, Google Scholar, IEEE Xplore, ScienceDirect), the keyword combinations and Boolean strings employed, the date range covered, and the inclusion/exclusion rules applied. This addition will document how the corpus was assembled and thereby support the representativeness of the identified trends and open problems. revision: yes
Circularity Check
No circularity: literature review without derivations or self-referential constructions
full rationale
This is a survey paper that classifies and summarizes existing literature on safe RL with Lyapunov and barrier functions. It contains no new equations, fitted parameters, predictions, or derivations that could reduce to the paper's own inputs by construction. The three takeaways are descriptive statements about trends in the surveyed corpus; while the representativeness of the manual sample is a validity concern for the claims, it does not create any of the enumerated circularity patterns (self-definitional, fitted-input-as-prediction, load-bearing self-citation, etc.). No load-bearing steps rely on the authors' prior work as an unverified uniqueness theorem or ansatz. The paper is self-contained as a review and receives the default non-circularity finding.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Deep reinforcement learning: A brief survey,
K. Arulkumaran, M. P. Deisenroth, M. Brundage, and A. A. Bharath, “Deep reinforcement learning: A brief survey,”IEEE Signal Processing Magazine, vol. 34, no. 6, pp. 26–38, 2017
work page 2017
-
[2]
How to train your robot with deep reinforcement learning: lessons we have learned,
J. Ibarz, J. Tan, C. Finn, M. Kalakrishnan, P. Pastor, and S. Levine, “How to train your robot with deep reinforcement learning: lessons we have learned,” The International Journal of Robotics Research, vol. 40, no. 4-5, pp. 698–721, 2021
work page 2021
-
[3]
Distributed reinforcement learning for robot teams: A review,
Y . Wang, M. Damani, P. Wang, Y . Cao, and G. Sartoretti, “Distributed reinforcement learning for robot teams: A review,” Current Robotics Reports, vol. 3, no. 4, pp. 239–257, 2022
work page 2022
-
[4]
i-sim2real: Reinforcement learning of robotic policies in tight human- robot interaction loops,
S. W. Abeyruwan, L. Graesser, D. B. D’Ambrosio, A. Singh, A. Shankar, A. Bewley, D. Jain, K. M. Choromanski, and P. R. Sanketi, “i-sim2real: Reinforcement learning of robotic policies in tight human- robot interaction loops,” in Conference on Robot Learning . PMLR, 2023, pp. 212–224. 16 JOURNAL OF IEEE TRANSACTIONS ON ARTIFICIAL INTELLIGENCE, VOL. 00, NO....
work page 2023
-
[5]
Y . Liu, H. Xu, D. Liu, and L. Wang, “A digital twin-based sim-to- real transfer for deep reinforcement learning-enabled industrial robot grasping,” Robotics and Computer-Integrated Manufacturing , vol. 78, p. 102365, 2022
work page 2022
-
[6]
Deep reinforcement learning for humanoid robot behaviors,
A. F. Muzio, M. R. Maximo, and T. Yoneyama, “Deep reinforcement learning for humanoid robot behaviors,” Journal of Intelligent & Robotic Systems, vol. 105, no. 1, p. 12, 2022
work page 2022
-
[7]
Tun- ing computer vision models with task rewards,
A. S. Pinto, A. Kolesnikov, Y . Shi, L. Beyer, and X. Zhai, “Tun- ing computer vision models with task rewards,” arXiv preprint arXiv:2302.08242, 2023
-
[8]
Deep reinforcement learning in computer vision: a comprehensive survey,
N. Le, V . S. Rathour, K. Yamazaki, K. Luu, and M. Savvides, “Deep reinforcement learning in computer vision: a comprehensive survey,” Artificial Intelligence Review, pp. 1–87, 2022
work page 2022
-
[9]
Evaluating vision transformer methods for deep reinforcement learning from pixels,
T. Tao, D. Reda, and M. van de Panne, “Evaluating vision transformer methods for deep reinforcement learning from pixels,” arXiv preprint arXiv:2204.04905, 2022
-
[10]
Cyber-security and reinforce- ment learning—a brief survey,
A. M. K. Adawadkar and N. Kulkarni, “Cyber-security and reinforce- ment learning—a brief survey,” Engineering Applications of Artificial Intelligence, vol. 114, p. 105116, 2022
work page 2022
-
[11]
Knowledge guided two-player reinforcement learning for cyber at- tacks and defenses,
A. Piplai, M. Anoruo, K. Fasaye, A. Joshi, T. Finin, and A. Ridley, “Knowledge guided two-player reinforcement learning for cyber at- tacks and defenses,” in 2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA). IEEE, 2022, pp. 1342– 1349
work page 2022
-
[12]
Cascaded reinforcement learning agents for large action spaces in autonomous penetration testing,
K. Tran, M. Standen, J. Kim, D. Bowman, T. Richer, A. Akella, and C.-T. Lin, “Cascaded reinforcement learning agents for large action spaces in autonomous penetration testing,” Applied Sciences , vol. 12, no. 21, p. 11265, 2022
work page 2022
-
[13]
Reinforcement learning for feedback-enabled cyber resilience,
Y . Huang, L. Huang, and Q. Zhu, “Reinforcement learning for feedback-enabled cyber resilience,” Annual reviews in control, vol. 53, pp. 273–295, 2022
work page 2022
-
[14]
A. H. Ganesh and B. Xu, “A review of reinforcement learning based energy management systems for electrified powertrains: Progress, challenge, and potential solution,” Renewable and Sustainable Energy Reviews, vol. 154, p. 111833, 2022
work page 2022
-
[15]
Energy man- agement for hybrid electric vehicles based on imitation reinforcement learning,
Y . Liu, Y . Wu, X. Wang, L. Li, Y . Zhang, and Z. Chen, “Energy man- agement for hybrid electric vehicles based on imitation reinforcement learning,” Energy, vol. 263, p. 125890, 2023
work page 2023
-
[16]
D. Yang, L. Wang, K. Yu, and J. Liang, “A reinforcement learning- based energy management strategy for fuel cell hybrid vehicle consid- ering real-time velocity prediction,” Energy Conversion and Manage- ment, vol. 274, p. 116453, 2022
work page 2022
-
[17]
Economic energy dis- patch of microgrid using deeplstm-based deep reinforcement learning,
D. S. Kushwaha, Z. Biron, and R. Abdollahi, “Economic energy dis- patch of microgrid using deeplstm-based deep reinforcement learning,” in 2022 IEEE Power & Energy Society General Meeting (PESGM) . IEEE, 2022, pp. 1–5
work page 2022
-
[18]
Supervised and reinforce- ment learning from observations in reconnaissance blind chess,
T. Bertram, J. F ¨urnkranz, and M. M ¨uller, “Supervised and reinforce- ment learning from observations in reconnaissance blind chess,” in 2022 IEEE Conference on Games (CoG) . IEEE, 2022, pp. 608–611
work page 2022
-
[19]
A data-efficient method of deep reinforcement learning for chinese chess,
C. Xu, H. Ding, X. Zhang, C. Wang, and H. Yang, “A data-efficient method of deep reinforcement learning for chinese chess,” in 2022 IEEE 22nd International Conference on Software Quality, Reliability, and Security Companion (QRS-C) . IEEE, 2022, pp. 1–8
work page 2022
-
[20]
P. Hammersborg and I. Str ¨umke, “Reinforcement learning in an adapt- able chess environment for detecting human-understandable concepts,” arXiv preprint arXiv:2211.05500 , 2022
-
[21]
Reinforcement learning agents providing advice in complex video games,
M. E. Taylor, N. Carboni, A. Fachantidis, I. Vlahavas, and L. Torrey, “Reinforcement learning agents providing advice in complex video games,” Connection Science, vol. 26, no. 1, pp. 45–63, 2014
work page 2014
-
[22]
Playing Atari with Deep Reinforcement Learning
V . Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller, “Playing atari with deep reinforcement learning,” arXiv preprint arXiv:1312.5602 , 2013
work page internal anchor Pith review Pith/arXiv arXiv 2013
-
[23]
Model-based reinforcement learning for atari,
L. Kaiser, M. Babaeizadeh, P. Milos, B. Osinski, R. H. Camp- bell, K. Czechowski, D. Erhan, C. Finn, P. Kozakowski, S. Levine et al. , “Model-based reinforcement learning for atari,” arXiv preprint arXiv:1903.00374, 2019
-
[24]
Deep reinforcement learning that matters,
P. Henderson, R. Islam, P. Bachman, J. Pineau, D. Precup, and D. Meger, “Deep reinforcement learning that matters,” in Proceedings of the AAAI conference on artificial intelligence , vol. 32, 2018
work page 2018
-
[25]
Reinforcement learning with guarantees: a review,
P. Osinenko, D. Dobriborsci, and W. Aumer, “Reinforcement learning with guarantees: a review,” IFAC-PapersOnLine, vol. 55, no. 15, pp. 123–128, 2022
work page 2022
-
[26]
R. Cheng, G. Orosz, R. M. Murray, and J. W. Burdick, “End-to-end safe reinforcement learning through barrier functions for safety-critical continuous control tasks,” in Proceedings of the AAAI conference on artificial intelligence, vol. 33, 2019, pp. 3387–3395
work page 2019
-
[27]
Safe reinforcement learning via confidence-based filters,
S. Curi, A. Lederer, S. Hirche, and A. Krause, “Safe reinforcement learning via confidence-based filters,” in 2022 IEEE 61st Conference on Decision and Control (CDC) . IEEE, 2022, pp. 3409–3415
work page 2022
-
[28]
Safe multi-agent motion planning via filtered reinforcement learning,
A. P. Vinod, S. Safaoui, A. Chakrabarty, R. Quirynen, N. Yoshikawa, and S. Di Cairano, “Safe multi-agent motion planning via filtered reinforcement learning,” in 2022 International Conference on Robotics and Automation (ICRA) . IEEE, 2022, pp. 7270–7276
work page 2022
-
[29]
DQN-TAMER: Human-in-the-Loop Reinforcement Learning with Intractable Feedback
R. Arakawa, S. Kobayashi, Y . Unno, Y . Tsuboi, and S.-i. Maeda, “Dqn- tamer: Human-in-the-loop reinforcement learning with intractable feed- back,” arXiv preprint arXiv:1810.11748 , 2018
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[30]
J. Wu, Z. Huang, Z. Hu, and C. Lv, “Toward human-in-the-loop ai: Enhancing deep reinforcement learning via real-time human guidance for autonomous driving,” Engineering, vol. 21, pp. 75–91, 2023
work page 2023
-
[31]
Human-in-the-loop rein- forcement learning in continuous-action space,
B. Luo, Z. Wu, F. Zhou, and B.-C. Wang, “Human-in-the-loop rein- forcement learning in continuous-action space,” IEEE Transactions on Neural Networks and Learning Systems , 2023
work page 2023
-
[32]
Safe multi-agent reinforcement learning via shielding,
I. ElSayed-Aly, S. Bharadwaj, C. Amato, R. Ehlers, U. Topcu, and L. Feng, “Safe multi-agent reinforcement learning via shielding,” arXiv preprint arXiv:2101.11196, 2021
-
[33]
Safe reinforcement learning via shielding,
M. Alshiekh, R. Bloem, R. Ehlers, B. K ¨onighofer, S. Niekum, and U. Topcu, “Safe reinforcement learning via shielding,” in Proceedings of the AAAI conference on artificial intelligence , vol. 32, 2018
work page 2018
-
[34]
A. Norouzi, H. Heidarifar, H. Borhan, M. Shahbakhti, and C. R. Koch, “Integrating machine learning and model predictive control for automotive applications: A review and future directions,” Engineering Applications of Artificial Intelligence , vol. 120, p. 105878, 2023
work page 2023
-
[35]
Safe reinforcement learning using robust mpc,
M. Zanon and S. Gros, “Safe reinforcement learning using robust mpc,” IEEE Transactions on Automatic Control , vol. 66, no. 8, pp. 3638– 3652, 2020
work page 2020
-
[36]
Bridging the gap between qp-based and mpc- based rl,
S. Sawant and S. Gros, “Bridging the gap between qp-based and mpc- based rl,” arXiv preprint arXiv:2205.08856 , 2022
-
[37]
E. D. Sontag, “Control-lyapunov functions,” in Open problems in mathematical systems and control theory . Springer, 1999, pp. 211– 216
work page 1999
-
[38]
Control barrier functions: Theory and applications,
A. D. Ames, S. Coogan, M. Egerstedt, G. Notomista, K. Sreenath, and P. Tabuada, “Control barrier functions: Theory and applications,” in 2019 18th European control conference (ECC) . IEEE, 2019, pp. 3420–3431
work page 2019
-
[39]
A comprehensive survey on safe rein- forcement learning,
J. Garcıa and F. Fern ´andez, “A comprehensive survey on safe rein- forcement learning,” Journal of Machine Learning Research , vol. 16, no. 1, pp. 1437–1480, 2015
work page 2015
-
[40]
Policy learning with constraints in model-free reinforcement learning: A survey,
Y . Liu, A. Halev, and X. Liu, “Policy learning with constraints in model-free reinforcement learning: A survey,” inThe 30th International Joint Conference on Artificial Intelligence (IJCAI) , 2021
work page 2021
-
[41]
Safe learning in robotics: From learning-based control to safe reinforcement learning,
L. Brunke, M. Greeff, A. W. Hall, Z. Yuan, S. Zhou, J. Panerati, and A. P. Schoellig, “Safe learning in robotics: From learning-based control to safe reinforcement learning,” Annual Review of Control, Robotics, and Autonomous Systems , vol. 5, pp. 411–444, 2022
work page 2022
-
[42]
A review of safe reinforcement learning: Methods, theory and applications,
S. Gu, L. Yang, Y . Du, G. Chen, F. Walter, J. Wang, Y . Yang, and A. Knoll, “A review of safe reinforcement learning: Methods, theory and applications,” arXiv preprint arXiv:2205.10330 , 2022
-
[43]
Safe learning for control using control lyapunov functions and control barrier functions: A review,
A. Anand, K. Seel, V . Gjærum, A. H ˚akansson, H. Robinson, and A. Saad, “Safe learning for control using control lyapunov functions and control barrier functions: A review,” Procedia Computer Science , vol. 192, pp. 3987–3997, 2021
work page 2021
-
[44]
C. Dawson, S. Gao, and C. Fan, “Safe control with learned certificates: A survey of neural lyapunov, barrier, and contraction methods,” arXiv preprint arXiv:2202.11762, 2022
-
[45]
Meyn, Control systems and reinforcement learning
S. Meyn, Control systems and reinforcement learning . Cambridge University Press, 2022
work page 2022
-
[46]
Altman, Constrained Markov decision processes
E. Altman, Constrained Markov decision processes. Routledge, 2021
work page 2021
-
[47]
Khalil, Nonlinear Systems , ser
H. Khalil, Nonlinear Systems , ser. Pearson Education. Prentice Hall, 2002. [Online]. Available: https://books.google.com/books?id=t d1QgAACAAJ
work page 2002
-
[48]
A ‘universal’construction of artstein’s theorem on nonlinear stabilization,
E. D. Sontag, “A ‘universal’construction of artstein’s theorem on nonlinear stabilization,” Systems & control letters , vol. 13, no. 2, pp. 117–123, 1989
work page 1989
-
[49]
Modified barrier functions (theory and methods),
R. Polyak, “Modified barrier functions (theory and methods),” Mathe- matical programming, vol. 54, pp. 177–222, 1992
work page 1992
-
[50]
Control barrier function based quadratic programs for safety critical systems,
A. D. Ames, X. Xu, J. W. Grizzle, and P. Tabuada, “Control barrier function based quadratic programs for safety critical systems,” IEEE Transactions on Automatic Control , vol. 62, no. 8, pp. 3861–3876, 2016
work page 2016
-
[51]
Lyapunov design for safe reinforcement learning,
T. J. Perkins and A. G. Barto, “Lyapunov design for safe reinforcement learning,” Journal of Machine Learning Research , vol. 3, no. Dec, pp. 803–832, 2002. KUSHW AHAet al.: A REVIEW ON SAFE REINFORCEMENT LEARNING USING LAYPUNOV AND BARRIER FUNCTIONS 17
work page 2002
-
[52]
Theory and development of higher-order cmac neural networks,
S. H. Lane, D. A. Handelman, and J. J. Gelfand, “Theory and development of higher-order cmac neural networks,” IEEE Control Systems Magazine, vol. 12, no. 2, pp. 23–30, 1992
work page 1992
-
[53]
K. G. Vamvoudakis, M. F. Miranda, and J. P. Hespanha, “Asymp- totically stable adaptive–optimal control algorithm with saturating actuators and relaxed persistence of excitation,” IEEE transactions on neural networks and learning systems , vol. 27, no. 11, pp. 2386–2398, 2015
work page 2015
-
[54]
Model-based rein- forcement learning for approximate optimal regulation,
R. Kamalapurkar, P. Walters, and W. E. Dixon, “Model-based rein- forcement learning for approximate optimal regulation,” Automatica, vol. 64, pp. 94–104, 2016
work page 2016
-
[55]
Decomposing control lya- punov functions for efficient reinforcement learning,
A. Lopez and D. Fridovich-Keil, “Decomposing control lya- punov functions for efficient reinforcement learning,” arXiv preprint arXiv:2403.12210, 2024
-
[56]
R. Kumar, S. Srivastava, and J. Gupta, “Diagonal recurrent neural network based adaptive control of nonlinear dynamical systems using lyapunov stability criterion,” ISA transactions , vol. 67, pp. 407–427, 2017
work page 2017
-
[57]
Safe model-based reinforcement learning with stability guarantees,
F. Berkenkamp, M. Turchetta, A. Schoellig, and A. Krause, “Safe model-based reinforcement learning with stability guarantees,” Ad- vances in neural information processing systems , vol. 30, 2017
work page 2017
-
[58]
Actor-critic physics-informed neural lya- punov control,
J. Wang and M. Fazlyab, “Actor-critic physics-informed neural lya- punov control,” arXiv preprint arXiv:2403.08448 , 2024
-
[59]
Methods of am lyapunov and their application,
V . I. Zubov, “Methods of am lyapunov and their application,”(No Title), 1964
work page 1964
-
[60]
Lyapunov-based reinforce- ment learning using koopman operators for automated vehicle parking,
D. S. Kushwaha, M. Hu, and Z. A. Biron, “Lyapunov-based reinforce- ment learning using koopman operators for automated vehicle parking,” IFAC-PapersOnLine, vol. 58, no. 28, pp. 84–89, 2024
work page 2024
-
[61]
Neural lyapunov function approximation with self-supervised reinforcement learning,
L. McCutcheon, B. Gharesifard, and S. Fallah, “Neural lyapunov function approximation with self-supervised reinforcement learning,” arXiv preprint arXiv:2503.15629 , 2025
-
[62]
Lyapunov-based safe reinforcement learning for microgrid energy management,
G. Hao, Y . Li, Y . Li, L. Jiang, and Z. Zeng, “Lyapunov-based safe reinforcement learning for microgrid energy management,” IEEE transactions on neural networks and learning systems , 2024
work page 2024
-
[63]
B. Hejase and U. Ozguner, “Lyapunov stability regulation of deep reinforcement learning control with application to automated driving,” in 2023 American Control Conference (ACC) , 2023, pp. 4437–4442
work page 2023
-
[64]
Finite time lyapunov exponent analysis of model predictive control and reinforcement learning,
K. Krishna, S. L. Brunton, and Z. Song, “Finite time lyapunov exponent analysis of model predictive control and reinforcement learning,” IEEE Access, 2023
work page 2023
-
[65]
A. S. Kumar, L. Zhao, and X. Fernando, “Task offloading and resource allocation in vehicular networks: A lyapunov-based deep reinforce- ment learning approach,” IEEE Transactions on Vehicular Technology, vol. 72, no. 10, pp. 13 360–13 373, 2023
work page 2023
-
[66]
Lyapunov-based distributed reinforcement learning control with stability guarantee,
J. Yao, M. Han, and X. Yin, “Lyapunov-based distributed reinforcement learning control with stability guarantee,” Computers & Chemical Engineering, vol. 195, p. 108979, 2025
work page 2025
-
[67]
Stabilizing neural control using self-learned almost lyapunov critics,
Y .-C. Chang and S. Gao, “Stabilizing neural control using self-learned almost lyapunov critics,” in 2021 IEEE International Conference on Robotics and Automation (ICRA) . IEEE, 2021, pp. 1803–1809
work page 2021
-
[68]
Lyapunov design for robust and efficient robotic reinforcement learn- ing,
T. Westenbroek, F. Castaneda, A. Agrawal, S. Sastry, and K. Sreenath, “Lyapunov design for robust and efficient robotic reinforcement learn- ing,” arXiv preprint arXiv:2208.06721 , 2022
-
[69]
S. Huh and I. Yang, “Safe reinforcement learning for probabilistic reachability and safety specifications: A lyapunov-based approach,” arXiv preprint arXiv:2002.10126 , 2020
-
[70]
Lyapunov- based uncertainty-aware safe reinforcement learning,
A. B. Jeddi, N. L. Dehghani, and A. Shafieezadeh, “Lyapunov- based uncertainty-aware safe reinforcement learning,” arXiv preprint arXiv:2107.13944, 2021
-
[71]
Learning min-norm stabilizing control laws for systems with unknown dynamics,
T. Westenbroek, F. Casta ˜neda, A. Agrawal, S. S. Sastry, and K. Sreenath, “Learning min-norm stabilizing control laws for systems with unknown dynamics,” in 2020 59th IEEE Conference on Decision and Control (CDC) . IEEE, 2020, pp. 737–744
work page 2020
-
[72]
Certifying stability of reinforce- ment learning policies using generalized lyapunov functions,
K. Long, J. Cort ´es, and N. Atanasov, “Certifying stability of reinforce- ment learning policies using generalized lyapunov functions,” arXiv preprint arXiv:2505.10947, 2025
-
[73]
Z. Cao, R. Wang, X. Zhou, and Y . Wen, “Toward model-assisted safe reinforcement learning for data center cooling control: A lyapunov- based approach,” in Proceedings of the 14th ACM International Con- ference on Future Energy Systems , 2023, pp. 333–346
work page 2023
-
[74]
Stable inverse reinforcement learning: Policies from control lyapunov landscapes,
S. Tesfazgi, L. Sprandl, A. Lederer, and S. Hirche, “Stable inverse reinforcement learning: Policies from control lyapunov landscapes,” arXiv preprint arXiv:2405.08756 , 2024
-
[75]
A lyapunov-based approach to safe reinforcement learning,
Y . Chow, O. Nachum, E. Duenez-Guzman, and M. Ghavamzadeh, “A lyapunov-based approach to safe reinforcement learning,” Advances in neural information processing systems , vol. 31, 2018
work page 2018
-
[76]
Lyapunov-based Safe Policy Optimization for Continuous Control
Y . Chow, O. Nachum, A. Faust, E. Duenez-Guzman, and M. Ghavamzadeh, “Lyapunov-based safe policy optimization for con- tinuous control,” arXiv preprint arXiv:1901.10031 , 2019
work page internal anchor Pith review Pith/arXiv arXiv 1901
-
[77]
Principled reward shaping for reinforcement learning via lyapunov stability theory,
Y . Dong, X. Tang, and Y . Yuan, “Principled reward shaping for reinforcement learning via lyapunov stability theory,” Neurocomputing, vol. 393, pp. 83–90, 2020
work page 2020
-
[78]
Lyapunov-inspired deep reinforcement learning for robot navigation in obstacle environments,
H. I. Ugurlu, A. Redder, and E. Kayacan, “Lyapunov-inspired deep reinforcement learning for robot navigation in obstacle environments,” in 2025 IEEE Symposium on Computational Intelligence on Engineer- ing/Cyber Physical Systems (CIES) . IEEE, 2025, pp. 1–8
work page 2025
-
[79]
Estimating lyapunov region of attraction for robust model-based reinforcement learning usv,
L. Xia, Y . Cui, Z. Yi, H. Li, and X. Wu, “Estimating lyapunov region of attraction for robust model-based reinforcement learning usv,” IEEE Transactions on Automation Science and Engineering , 2024
work page 2024
-
[80]
Policy invariance under reward transformations: Theory and application to reward shaping,
A. Y . Ng, D. Harada, and S. Russell, “Policy invariance under reward transformations: Theory and application to reward shaping,” in Icml, vol. 99. Citeseer, 1999, pp. 278–287
work page 1999
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.