A Review On Safe Reinforcement Learning Using Lyapunov and Barrier Functions

arxiv: 2508.09128 · v4 · submitted 2025-08-12 · 📡 eess.SY · cs.SY


Dhruv Singh Kushwaha, Zoleikha Abdollahi Biron

Pith reviewed 2026-05-18 22:34 UTC · model grok-4.3


keywords safe reinforcement learning · Lyapunov functions · control barrier functions · model-free RL · stability and constraint satisfaction · CLF-CBF methods · safety guarantees · open problems in safe RL


The pith

Safe RL has shifted from model-based to model-free formulations since 2017, with combined Lyapunov-barrier methods now most active, while high-dimensional deployment remains the chief barrier.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The review surveys techniques that enforce safety in reinforcement learning by using Lyapunov functions to ensure stability of learned policies and barrier functions to prevent constraint violations during both training and deployment. It tracks the literature to identify a clear transition away from model-based methods toward model-free ones, with combined control Lyapunov function and control barrier function approaches emerging as the busiest area after 2022. The analysis defines distinct open problems for each class of methods and points to scalability in high-dimensional or partially observable settings as the shared obstacle. Readers interested in reliable decision-making systems would find this mapping useful for directing efforts toward practical safety guarantees in complex dynamical environments.

Core claim

The field has shifted decisively from model-based to model-free formulations since 2017, with combined CLF-CBF approaches becoming the most active sub-area post-2022. Per-class open problems are now well-defined: certificate validity under function approximation and distribution shift for Lyapunov methods, feasibility and deadlock under hard CBF-QP shielding for barrier methods, and joint CLF-CBF feasibility under model uncertainty for combined methods. Deployment to high-dimensional and partially observable settings remains the dominant scalability barrier across all three classes.

What carries the argument

A three-way classification of safe RL methods into Lyapunov-based approaches for stability certificates, barrier-based approaches for constraint enforcement, and their combinations, used to chart the move from model-based to model-free implementations and to isolate per-class open problems.
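
For readers less familiar with the two certificate types behind this classification, the standard textbook conditions can be written out as follows. This is a sketch under stated assumptions rather than the review's own notation: the control-affine dynamics, the symbols V, h, γ, α, the slack variable δ and the penalty weight p are all introduced here for illustration.

```latex
% Assumed setting: control-affine dynamics (an assumption of this sketch)
\dot{x} = f(x) + g(x)\,u, \qquad x \in \mathcal{X},\; u \in \mathcal{U}

% Control Lyapunov function V (stability certificate), \gamma a class-K function
\inf_{u \in \mathcal{U}} \big[ L_f V(x) + L_g V(x)\,u \big] \le -\gamma\big(V(x)\big)

% Control barrier function h with safe set C = \{x : h(x) \ge 0\},
% \alpha an extended class-K function
\sup_{u \in \mathcal{U}} \big[ L_f h(x) + L_g h(x)\,u \big] \ge -\alpha\big(h(x)\big)

% Combined CLF-CBF quadratic program: the slack \delta relaxes the CLF
% constraint so that the hard CBF constraint takes priority
\min_{u,\,\delta}\; \|u\|^2 + p\,\delta^2
\quad \text{s.t.} \quad
L_f V(x) + L_g V(x)\,u \le -\gamma\big(V(x)\big) + \delta, \qquad
L_f h(x) + L_g h(x)\,u \ge -\alpha\big(h(x)\big)
```

In this common formulation the barrier constraint is kept hard while the Lyapunov constraint is softened, which is one way the joint-feasibility question named in the core claim shows up in practice.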

If this is right

Lyapunov-based methods will need new ways to maintain certificate validity when function approximators are used or when the data distribution changes.
Barrier-based methods must resolve issues of infeasibility and deadlock that arise from hard quadratic-program shielding.
Combined CLF-CBF approaches require techniques that preserve joint feasibility when the system model is uncertain.
All classes will require advances before reliable deployment becomes feasible in high-dimensional or partially observable environments.
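
The infeasibility issue for hard CBF-QP shielding can be illustrated with a minimal sketch. It assumes a scalar control input, box input limits and a linear class-K function α(h) = alpha·h. With a scalar input the quadratic program reduces to clipping onto an interval, so no QP solver is used. The function name `cbf_qp_filter` and all parameter names are hypothetical and are not taken from the paper.

```python
from typing import Optional


def cbf_qp_filter(
    u_rl: float,
    h: float,
    lf_h: float,
    lg_h: float,
    alpha: float = 1.0,
    u_min: float = -1.0,
    u_max: float = 1.0,
) -> Optional[float]:
    """Scalar CBF-QP shield (illustrative sketch, hypothetical names).

    Solves  min (u - u_rl)^2
            s.t. lf_h + lg_h * u + alpha * h >= 0,  u_min <= u <= u_max.
    Returns the filtered action, or None when the constraints are infeasible.
    """
    lo, hi = u_min, u_max
    b = lf_h + alpha * h  # part of the CBF constraint that does not depend on u
    if lg_h > 0:
        lo = max(lo, -b / lg_h)  # constraint reads u >= -b / lg_h
    elif lg_h < 0:
        hi = min(hi, -b / lg_h)  # dividing by a negative flips the inequality
    elif b < 0:
        return None  # control has no authority over h and the constraint fails
    if lo > hi:
        return None  # CBF constraint conflicts with the input limits
    return min(max(u_rl, lo), hi)  # projection of u_rl onto [lo, hi]


# Single integrator x' = u with safe set x <= 1, so h = 1 - x, lf_h = 0, lg_h = -1.
safe_action = cbf_qp_filter(u_rl=0.9, h=0.5, lf_h=0.0, lg_h=-1.0)
# Strong drift toward the boundary: the required input lies outside [u_min, u_max].
blocked = cbf_qp_filter(u_rl=0.0, h=0.5, lf_h=-2.0, lg_h=-1.0)
```

In the first call the proposed action 0.9 is clipped to 0.5. In the second call the barrier condition requires u <= -1.5 while the actuator only allows u >= -1, so the shield has no admissible action to return. A caller must then fall back on some backup behaviour, which is the kind of situation the feasibility and deadlock open problem refers to.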

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The identified open problems suggest that hybrid learning-control architectures could become a natural next step for closing the gap between theoretical guarantees and practical performance.
The scalability barrier implies that dimensionality-reduction or hierarchical safety layers may be needed before these methods reach real-world high-dimensional tasks.
The documented shift toward model-free combined methods could encourage benchmark suites that directly compare the three classes on shared safety metrics.

Load-bearing premise

The claimed trends and open problems depend on the surveyed papers forming a representative sample of the literature and on the manual categorization correctly reflecting dominant directions without major omissions.

What would settle it

A broad search that locates a large body of recent papers showing continued dominance of model-based methods or that uncovers major open problems absent from the review's per-class list would contradict the reported shifts and classification.

Figures

Figures reproduced from arXiv: 2508.09128 by Dhruv Singh Kushwaha, Zoleikha Abdollahi Biron.

**Figure 1.** Safe set (Xs) satisfying constraints, unsafe set (Xu) and state space (X). Xf denotes the largest feasible safe set. Thus, in such a scenario, system trajectories (Tr) must remain in the safe set Xs and, furthermore, avoid exploration in Xu.

**Figure 2.** Lyapunov and barrier functions utility in control theory

**Figure 4.** Classification of methods using Lyapunov functions in

**Figure 5.** Classification of methods using Barrier functions in

**Figure 6.** Classification of methods using Lyapunov and Barrier

**Figure 7.** Environments: Gridworlds and Safe Gym

**Figure 8.** Safe Policy Optimization environment architecture

**Figure 9.** Safe-Control-Gym: adding constraints and injecting

Original abstract

Reinforcement learning (RL) has proven to be particularly effective in solving complex decision-making problems for a wide range of applications. Safe reinforcement learning refers to a class of constrained problems where the constraint violations lead to partial or complete system failure. The goal of this review is to provide an overview of safe RL techniques using Lyapunov and barrier functions to guarantee this notion of safety (stability of the system in terms of a computed policy and constraint satisfaction during training and deployment). Three concrete takeaways emerge from our analysis: (i) the field has shifted decisively from model-based to model-free formulations since 2017, with combined CLF-CBF approaches becoming the most active sub-area post-2022; (ii) per-class open problems are now well-defined: certificate validity under function approximation and distribution shift for Lyapunov methods, feasibility and deadlock under hard CBF-QP shielding for barrier methods, and joint CLF-CBF feasibility under model uncertainty for combined methods; and (iii) deployment to high-dimensional and partially observable settings remains the dominant scalability barrier across all three classes. The different approaches employed are discussed in detail along with their shortcomings and benefits to provide critique and possible future research directions. The review demonstrates promising scope for providing safety guarantees for complex dynamical systems with operational constraints using model-based and model-free RL.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript is a review of safe reinforcement learning techniques that employ Lyapunov (CLF) and barrier (CBF) functions to provide guarantees on stability and constraint satisfaction during training and deployment. It analyzes the literature to extract three takeaways: a shift from model-based to model-free formulations since 2017 with combined CLF-CBF methods becoming most active post-2022; well-defined per-class open problems (certificate validity under approximation for Lyapunov, feasibility/deadlock for barrier, joint feasibility under uncertainty for combined); and scalability barriers in high-dimensional and partially observable settings. The review discusses approaches, benefits, shortcomings, and future directions.

Significance. If the underlying literature sample is representative, the review would offer a useful synthesis by distilling trends, articulating concrete open problems per method class, and critiquing shortcomings of existing CLF, CBF, and hybrid approaches. Explicitly naming per-class challenges and scalability issues could help focus research on deployment-relevant questions in constrained dynamical systems.

major comments (2)

[Abstract and §1] The central claims of a 'decisive shift' from model-based to model-free since 2017 and combined CLF-CBF as the 'most active sub-area post-2022' are load-bearing for takeaway (i) yet rest on an unquantified manual categorization. No table, figure, or count of papers by year and category is referenced to support the trend statements or to allow readers to check for counter-examples in model-based or high-dimensional work.
[§2, or equivalent methods section] The review provides no description of the literature search protocol, including databases, keywords, time bounds, or inclusion/exclusion criteria. This directly affects the reliability of takeaways (i) and (ii), because the open-problem taxonomy and activity rankings depend on the surveyed corpus being unbiased and complete.

minor comments (1)

[Abstract] Acronyms CLF and CBF are introduced without expansion on first use, which reduces accessibility for readers outside the immediate subfield.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. The two major comments identify important gaps in how the trends and corpus are documented. We address each point below and commit to revisions that will make the claims more transparent and the review more reproducible.


Referee: [Abstract and §1] The central claims of a 'decisive shift' from model-based to model-free since 2017 and combined CLF-CBF as the 'most active sub-area post-2022' are load-bearing for takeaway (i) yet rest on an unquantified manual categorization. No table, figure, or count of papers by year and category is referenced to support the trend statements or to allow readers to check for counter-examples in model-based or high-dimensional work.

Authors: We accept that the trend statements would be stronger if supported by explicit counts. Our categorization was performed by systematically reading and classifying papers by method class (Lyapunov, barrier, combined), by formulation (model-based or model-free), and by publication year, but this process was not visualized. In the revised manuscript we will add a new figure (or table) in Section 1 that reports the number of papers per year and per category. The figure will also indicate the subset of high-dimensional or partially observable examples, allowing readers to directly evaluate the claimed shift and the post-2022 activity in combined methods. revision: yes

Referee: [§2, or equivalent methods section] The review provides no description of the literature search protocol, including databases, keywords, time bounds, or inclusion/exclusion criteria. This directly affects the reliability of takeaways (i) and (ii), because the open-problem taxonomy and activity rankings depend on the surveyed corpus being unbiased and complete.

Authors: We agree that an explicit search protocol is necessary for a literature review. The current manuscript does not contain one. We will insert a new subsection (tentatively placed at the beginning of Section 2) that states the databases queried (arXiv, Google Scholar, IEEE Xplore, ScienceDirect), the keyword combinations and Boolean strings employed, the date range covered, and the inclusion/exclusion rules applied. This addition will document how the corpus was assembled and thereby support the representativeness of the identified trends and open problems. revision: yes

Circularity Check

0 steps flagged

No circularity: literature review without derivations or self-referential constructions


This is a survey paper that classifies and summarizes existing literature on safe RL with Lyapunov and barrier functions. It contains no new equations, fitted parameters, predictions, or derivations that could reduce to the paper's own inputs by construction. The three takeaways are descriptive statements about trends in the surveyed corpus; while the representativeness of the manual sample is a validity concern for the claims, it does not create any of the enumerated circularity patterns (self-definitional, fitted-input-as-prediction, load-bearing self-citation, etc.). No load-bearing steps rely on the authors' prior work as an unverified uniqueness theorem or ansatz. The paper is self-contained as a review and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

As a literature review the paper introduces no new free parameters, axioms, or invented entities; all content is drawn from previously published work.



Reference graph

Works this paper leans on

146 extracted references · 146 canonical work pages · 8 internal anchors

[1]

Deep reinforcement learning: A brief survey,

K. Arulkumaran, M. P. Deisenroth, M. Brundage, and A. A. Bharath, “Deep reinforcement learning: A brief survey,”IEEE Signal Processing Magazine, vol. 34, no. 6, pp. 26–38, 2017

work page 2017
[2]

How to train your robot with deep reinforcement learning: lessons we have learned,

J. Ibarz, J. Tan, C. Finn, M. Kalakrishnan, P. Pastor, and S. Levine, “How to train your robot with deep reinforcement learning: lessons we have learned,” The International Journal of Robotics Research, vol. 40, no. 4-5, pp. 698–721, 2021

work page 2021
[3]

Distributed reinforcement learning for robot teams: A review,

Y . Wang, M. Damani, P. Wang, Y . Cao, and G. Sartoretti, “Distributed reinforcement learning for robot teams: A review,” Current Robotics Reports, vol. 3, no. 4, pp. 239–257, 2022

work page 2022
[4]

i-sim2real: Reinforcement learning of robotic policies in tight human- robot interaction loops,

S. W. Abeyruwan, L. Graesser, D. B. D’Ambrosio, A. Singh, A. Shankar, A. Bewley, D. Jain, K. M. Choromanski, and P. R. Sanketi, “i-sim2real: Reinforcement learning of robotic policies in tight human- robot interaction loops,” in Conference on Robot Learning . PMLR, 2023, pp. 212–224. 16 JOURNAL OF IEEE TRANSACTIONS ON ARTIFICIAL INTELLIGENCE, VOL. 00, NO....

work page 2023
[5]

A digital twin-based sim-to- real transfer for deep reinforcement learning-enabled industrial robot grasping,

Y . Liu, H. Xu, D. Liu, and L. Wang, “A digital twin-based sim-to- real transfer for deep reinforcement learning-enabled industrial robot grasping,” Robotics and Computer-Integrated Manufacturing , vol. 78, p. 102365, 2022

work page 2022
[6]

Deep reinforcement learning for humanoid robot behaviors,

A. F. Muzio, M. R. Maximo, and T. Yoneyama, “Deep reinforcement learning for humanoid robot behaviors,” Journal of Intelligent & Robotic Systems, vol. 105, no. 1, p. 12, 2022

work page 2022
[7]

Tun- ing computer vision models with task rewards,

A. S. Pinto, A. Kolesnikov, Y . Shi, L. Beyer, and X. Zhai, “Tun- ing computer vision models with task rewards,” arXiv preprint arXiv:2302.08242, 2023

work page arXiv 2023
[8]

Deep reinforcement learning in computer vision: a comprehensive survey,

N. Le, V . S. Rathour, K. Yamazaki, K. Luu, and M. Savvides, “Deep reinforcement learning in computer vision: a comprehensive survey,” Artificial Intelligence Review, pp. 1–87, 2022

work page 2022
[9]

Evaluating vision transformer methods for deep reinforcement learning from pixels,

T. Tao, D. Reda, and M. van de Panne, “Evaluating vision transformer methods for deep reinforcement learning from pixels,” arXiv preprint arXiv:2204.04905, 2022

work page arXiv 2022
[10]

Cyber-security and reinforce- ment learning—a brief survey,

A. M. K. Adawadkar and N. Kulkarni, “Cyber-security and reinforce- ment learning—a brief survey,” Engineering Applications of Artificial Intelligence, vol. 114, p. 105116, 2022

work page 2022
[11]

Knowledge guided two-player reinforcement learning for cyber at- tacks and defenses,

A. Piplai, M. Anoruo, K. Fasaye, A. Joshi, T. Finin, and A. Ridley, “Knowledge guided two-player reinforcement learning for cyber at- tacks and defenses,” in 2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA). IEEE, 2022, pp. 1342– 1349

work page 2022
[12]

Cascaded reinforcement learning agents for large action spaces in autonomous penetration testing,

K. Tran, M. Standen, J. Kim, D. Bowman, T. Richer, A. Akella, and C.-T. Lin, “Cascaded reinforcement learning agents for large action spaces in autonomous penetration testing,” Applied Sciences , vol. 12, no. 21, p. 11265, 2022

work page 2022
[13]

Reinforcement learning for feedback-enabled cyber resilience,

Y . Huang, L. Huang, and Q. Zhu, “Reinforcement learning for feedback-enabled cyber resilience,” Annual reviews in control, vol. 53, pp. 273–295, 2022

work page 2022
[14]

A review of reinforcement learning based energy management systems for electrified powertrains: Progress, challenge, and potential solution,

A. H. Ganesh and B. Xu, “A review of reinforcement learning based energy management systems for electrified powertrains: Progress, challenge, and potential solution,” Renewable and Sustainable Energy Reviews, vol. 154, p. 111833, 2022

work page 2022
[15]

Energy man- agement for hybrid electric vehicles based on imitation reinforcement learning,

Y . Liu, Y . Wu, X. Wang, L. Li, Y . Zhang, and Z. Chen, “Energy man- agement for hybrid electric vehicles based on imitation reinforcement learning,” Energy, vol. 263, p. 125890, 2023

work page 2023
[16]

A reinforcement learning- based energy management strategy for fuel cell hybrid vehicle consid- ering real-time velocity prediction,

D. Yang, L. Wang, K. Yu, and J. Liang, “A reinforcement learning- based energy management strategy for fuel cell hybrid vehicle consid- ering real-time velocity prediction,” Energy Conversion and Manage- ment, vol. 274, p. 116453, 2022

work page 2022
[17]

Economic energy dis- patch of microgrid using deeplstm-based deep reinforcement learning,

D. S. Kushwaha, Z. Biron, and R. Abdollahi, “Economic energy dis- patch of microgrid using deeplstm-based deep reinforcement learning,” in 2022 IEEE Power & Energy Society General Meeting (PESGM) . IEEE, 2022, pp. 1–5

work page 2022
[18]

Supervised and reinforce- ment learning from observations in reconnaissance blind chess,

T. Bertram, J. F ¨urnkranz, and M. M ¨uller, “Supervised and reinforce- ment learning from observations in reconnaissance blind chess,” in 2022 IEEE Conference on Games (CoG) . IEEE, 2022, pp. 608–611

work page 2022
[19]

A data-efficient method of deep reinforcement learning for chinese chess,

C. Xu, H. Ding, X. Zhang, C. Wang, and H. Yang, “A data-efficient method of deep reinforcement learning for chinese chess,” in 2022 IEEE 22nd International Conference on Software Quality, Reliability, and Security Companion (QRS-C) . IEEE, 2022, pp. 1–8

work page 2022
[20]

Reinforcement learning in an adapt- able chess environment for detecting human-understandable concepts,

P. Hammersborg and I. Str ¨umke, “Reinforcement learning in an adapt- able chess environment for detecting human-understandable concepts,” arXiv preprint arXiv:2211.05500 , 2022

work page arXiv 2022
[21]

Reinforcement learning agents providing advice in complex video games,

M. E. Taylor, N. Carboni, A. Fachantidis, I. Vlahavas, and L. Torrey, “Reinforcement learning agents providing advice in complex video games,” Connection Science, vol. 26, no. 1, pp. 45–63, 2014

work page 2014
[22]

Playing Atari with Deep Reinforcement Learning

V . Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller, “Playing atari with deep reinforcement learning,” arXiv preprint arXiv:1312.5602 , 2013

work page internal anchor Pith review Pith/arXiv arXiv 2013
[23]

Model-based reinforcement learning for atari,

L. Kaiser, M. Babaeizadeh, P. Milos, B. Osinski, R. H. Camp- bell, K. Czechowski, D. Erhan, C. Finn, P. Kozakowski, S. Levine et al. , “Model-based reinforcement learning for atari,” arXiv preprint arXiv:1903.00374, 2019

work page arXiv 1903
[24]

Deep reinforcement learning that matters,

P. Henderson, R. Islam, P. Bachman, J. Pineau, D. Precup, and D. Meger, “Deep reinforcement learning that matters,” in Proceedings of the AAAI conference on artificial intelligence , vol. 32, 2018

work page 2018
[25]

Reinforcement learning with guarantees: a review,

P. Osinenko, D. Dobriborsci, and W. Aumer, “Reinforcement learning with guarantees: a review,” IFAC-PapersOnLine, vol. 55, no. 15, pp. 123–128, 2022

work page 2022
[26]

End-to-end safe reinforcement learning through barrier functions for safety-critical continuous control tasks,

R. Cheng, G. Orosz, R. M. Murray, and J. W. Burdick, “End-to-end safe reinforcement learning through barrier functions for safety-critical continuous control tasks,” in Proceedings of the AAAI conference on artificial intelligence, vol. 33, 2019, pp. 3387–3395

work page 2019
[27]

Safe reinforcement learning via confidence-based filters,

S. Curi, A. Lederer, S. Hirche, and A. Krause, “Safe reinforcement learning via confidence-based filters,” in 2022 IEEE 61st Conference on Decision and Control (CDC) . IEEE, 2022, pp. 3409–3415

work page 2022
[28]

Safe multi-agent motion planning via filtered reinforcement learning,

A. P. Vinod, S. Safaoui, A. Chakrabarty, R. Quirynen, N. Yoshikawa, and S. Di Cairano, “Safe multi-agent motion planning via filtered reinforcement learning,” in 2022 International Conference on Robotics and Automation (ICRA) . IEEE, 2022, pp. 7270–7276

work page 2022
[29]

DQN-TAMER: Human-in-the-Loop Reinforcement Learning with Intractable Feedback

R. Arakawa, S. Kobayashi, Y . Unno, Y . Tsuboi, and S.-i. Maeda, “Dqn- tamer: Human-in-the-loop reinforcement learning with intractable feed- back,” arXiv preprint arXiv:1810.11748 , 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[30]

Toward human-in-the-loop ai: Enhancing deep reinforcement learning via real-time human guidance for autonomous driving,

J. Wu, Z. Huang, Z. Hu, and C. Lv, “Toward human-in-the-loop ai: Enhancing deep reinforcement learning via real-time human guidance for autonomous driving,” Engineering, vol. 21, pp. 75–91, 2023

work page 2023
[31]

Human-in-the-loop rein- forcement learning in continuous-action space,

B. Luo, Z. Wu, F. Zhou, and B.-C. Wang, “Human-in-the-loop rein- forcement learning in continuous-action space,” IEEE Transactions on Neural Networks and Learning Systems , 2023

work page 2023
[32]

Safe multi-agent reinforcement learning via shielding,

I. ElSayed-Aly, S. Bharadwaj, C. Amato, R. Ehlers, U. Topcu, and L. Feng, “Safe multi-agent reinforcement learning via shielding,” arXiv preprint arXiv:2101.11196, 2021

work page arXiv 2021
[33]

Safe reinforcement learning via shielding,

M. Alshiekh, R. Bloem, R. Ehlers, B. K ¨onighofer, S. Niekum, and U. Topcu, “Safe reinforcement learning via shielding,” in Proceedings of the AAAI conference on artificial intelligence , vol. 32, 2018

work page 2018
[34]

Integrating machine learning and model predictive control for automotive applications: A review and future directions,

A. Norouzi, H. Heidarifar, H. Borhan, M. Shahbakhti, and C. R. Koch, “Integrating machine learning and model predictive control for automotive applications: A review and future directions,” Engineering Applications of Artificial Intelligence , vol. 120, p. 105878, 2023

work page 2023
[35]

Safe reinforcement learning using robust mpc,

M. Zanon and S. Gros, “Safe reinforcement learning using robust mpc,” IEEE Transactions on Automatic Control , vol. 66, no. 8, pp. 3638– 3652, 2020

work page 2020
[36]

Bridging the gap between qp-based and mpc- based rl,

S. Sawant and S. Gros, “Bridging the gap between qp-based and mpc- based rl,” arXiv preprint arXiv:2205.08856 , 2022

work page arXiv 2022
[37]

Control-lyapunov functions,

E. D. Sontag, “Control-lyapunov functions,” in Open problems in mathematical systems and control theory . Springer, 1999, pp. 211– 216

work page 1999
[38]

Control barrier functions: Theory and applications,

A. D. Ames, S. Coogan, M. Egerstedt, G. Notomista, K. Sreenath, and P. Tabuada, “Control barrier functions: Theory and applications,” in 2019 18th European control conference (ECC) . IEEE, 2019, pp. 3420–3431

work page 2019
[39]

A comprehensive survey on safe rein- forcement learning,

J. Garcıa and F. Fern ´andez, “A comprehensive survey on safe rein- forcement learning,” Journal of Machine Learning Research , vol. 16, no. 1, pp. 1437–1480, 2015

work page 2015
[40]

Policy learning with constraints in model-free reinforcement learning: A survey,

Y . Liu, A. Halev, and X. Liu, “Policy learning with constraints in model-free reinforcement learning: A survey,” inThe 30th International Joint Conference on Artificial Intelligence (IJCAI) , 2021

work page 2021
[41]

Safe learning in robotics: From learning-based control to safe reinforcement learning,

L. Brunke, M. Greeff, A. W. Hall, Z. Yuan, S. Zhou, J. Panerati, and A. P. Schoellig, “Safe learning in robotics: From learning-based control to safe reinforcement learning,” Annual Review of Control, Robotics, and Autonomous Systems , vol. 5, pp. 411–444, 2022

work page 2022
[42]

A review of safe reinforcement learning: Methods, theory and applications,

S. Gu, L. Yang, Y . Du, G. Chen, F. Walter, J. Wang, Y . Yang, and A. Knoll, “A review of safe reinforcement learning: Methods, theory and applications,” arXiv preprint arXiv:2205.10330 , 2022

work page arXiv 2022
[43]

Safe learning for control using control lyapunov functions and control barrier functions: A review,

A. Anand, K. Seel, V . Gjærum, A. H ˚akansson, H. Robinson, and A. Saad, “Safe learning for control using control lyapunov functions and control barrier functions: A review,” Procedia Computer Science , vol. 192, pp. 3987–3997, 2021

work page 2021
[44]

Safe control with learned certificates: A survey of neural lyapunov, barrier, and contraction methods,

C. Dawson, S. Gao, and C. Fan, “Safe control with learned certificates: A survey of neural lyapunov, barrier, and contraction methods,” arXiv preprint arXiv:2202.11762, 2022

work page arXiv 2022
[45]

Meyn, Control systems and reinforcement learning

S. Meyn, Control systems and reinforcement learning . Cambridge University Press, 2022

work page 2022
[46]

Altman, Constrained Markov decision processes

E. Altman, Constrained Markov decision processes. Routledge, 2021

work page 2021
[47]

Khalil, Nonlinear Systems , ser

H. Khalil, Nonlinear Systems , ser. Pearson Education. Prentice Hall, 2002. [Online]. Available: https://books.google.com/books?id=t d1QgAACAAJ

work page 2002
[48]

A ‘universal’construction of artstein’s theorem on nonlinear stabilization,

E. D. Sontag, “A ‘universal’construction of artstein’s theorem on nonlinear stabilization,” Systems & control letters , vol. 13, no. 2, pp. 117–123, 1989

work page 1989
[49]

Modified barrier functions (theory and methods),

R. Polyak, “Modified barrier functions (theory and methods),” Mathe- matical programming, vol. 54, pp. 177–222, 1992

work page 1992
[50]

Control barrier function based quadratic programs for safety critical systems,

A. D. Ames, X. Xu, J. W. Grizzle, and P. Tabuada, “Control barrier function based quadratic programs for safety critical systems,” IEEE Transactions on Automatic Control , vol. 62, no. 8, pp. 3861–3876, 2016

work page 2016
[51]

Lyapunov design for safe reinforcement learning,

T. J. Perkins and A. G. Barto, “Lyapunov design for safe reinforcement learning,” Journal of Machine Learning Research , vol. 3, no. Dec, pp. 803–832, 2002. KUSHW AHAet al.: A REVIEW ON SAFE REINFORCEMENT LEARNING USING LAYPUNOV AND BARRIER FUNCTIONS 17

work page 2002
[52]

Theory and development of higher-order cmac neural networks,

S. H. Lane, D. A. Handelman, and J. J. Gelfand, “Theory and development of higher-order cmac neural networks,” IEEE Control Systems Magazine, vol. 12, no. 2, pp. 23–30, 1992

work page 1992
[53]

Asymp- totically stable adaptive–optimal control algorithm with saturating actuators and relaxed persistence of excitation,

K. G. Vamvoudakis, M. F. Miranda, and J. P. Hespanha, “Asymp- totically stable adaptive–optimal control algorithm with saturating actuators and relaxed persistence of excitation,” IEEE transactions on neural networks and learning systems , vol. 27, no. 11, pp. 2386–2398, 2015

work page 2015
[54]

Model-based rein- forcement learning for approximate optimal regulation,

R. Kamalapurkar, P. Walters, and W. E. Dixon, “Model-based rein- forcement learning for approximate optimal regulation,” Automatica, vol. 64, pp. 94–104, 2016

work page 2016
[55]

Decomposing control lya- punov functions for efficient reinforcement learning,

A. Lopez and D. Fridovich-Keil, “Decomposing control lya- punov functions for efficient reinforcement learning,” arXiv preprint arXiv:2403.12210, 2024

work page arXiv 2024
[56]

Diagonal recurrent neural network based adaptive control of nonlinear dynamical systems using lyapunov stability criterion,

R. Kumar, S. Srivastava, and J. Gupta, “Diagonal recurrent neural network based adaptive control of nonlinear dynamical systems using lyapunov stability criterion,” ISA transactions , vol. 67, pp. 407–427, 2017

work page 2017
[57]

Safe model-based reinforcement learning with stability guarantees,

F. Berkenkamp, M. Turchetta, A. Schoellig, and A. Krause, “Safe model-based reinforcement learning with stability guarantees,” Ad- vances in neural information processing systems , vol. 30, 2017

work page 2017
[58]

Actor-critic physics-informed neural lya- punov control,

J. Wang and M. Fazlyab, “Actor-critic physics-informed neural lya- punov control,” arXiv preprint arXiv:2403.08448 , 2024

work page arXiv 2024
[59]

Methods of am lyapunov and their application,

V . I. Zubov, “Methods of am lyapunov and their application,”(No Title), 1964

work page 1964
[60]

Lyapunov-based reinforce- ment learning using koopman operators for automated vehicle parking,

D. S. Kushwaha, M. Hu, and Z. A. Biron, “Lyapunov-based reinforce- ment learning using koopman operators for automated vehicle parking,” IFAC-PapersOnLine, vol. 58, no. 28, pp. 84–89, 2024

work page 2024
[61]

Neural lyapunov function approximation with self-supervised reinforcement learning,

L. McCutcheon, B. Gharesifard, and S. Fallah, “Neural lyapunov function approximation with self-supervised reinforcement learning,” arXiv preprint arXiv:2503.15629 , 2025

work page arXiv 2025
[62]

Lyapunov-based safe reinforcement learning for microgrid energy management,

G. Hao, Y . Li, Y . Li, L. Jiang, and Z. Zeng, “Lyapunov-based safe reinforcement learning for microgrid energy management,” IEEE transactions on neural networks and learning systems , 2024

[63]

B. Hejase and U. Ozguner, “Lyapunov stability regulation of deep reinforcement learning control with application to automated driving,” in 2023 American Control Conference (ACC), 2023, pp. 4437–4442

[64]

K. Krishna, S. L. Brunton, and Z. Song, “Finite time Lyapunov exponent analysis of model predictive control and reinforcement learning,” IEEE Access, 2023

[65]

A. S. Kumar, L. Zhao, and X. Fernando, “Task offloading and resource allocation in vehicular networks: A Lyapunov-based deep reinforcement learning approach,” IEEE Transactions on Vehicular Technology, vol. 72, no. 10, pp. 13360–13373, 2023

[66]

J. Yao, M. Han, and X. Yin, “Lyapunov-based distributed reinforcement learning control with stability guarantee,” Computers & Chemical Engineering, vol. 195, p. 108979, 2025

[67]

Y.-C. Chang and S. Gao, “Stabilizing neural control using self-learned almost Lyapunov critics,” in 2021 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2021, pp. 1803–1809

[68]

T. Westenbroek, F. Castaneda, A. Agrawal, S. Sastry, and K. Sreenath, “Lyapunov design for robust and efficient robotic reinforcement learning,” arXiv preprint arXiv:2208.06721, 2022

[69]

S. Huh and I. Yang, “Safe reinforcement learning for probabilistic reachability and safety specifications: A Lyapunov-based approach,” arXiv preprint arXiv:2002.10126, 2020

[70]

A. B. Jeddi, N. L. Dehghani, and A. Shafieezadeh, “Lyapunov-based uncertainty-aware safe reinforcement learning,” arXiv preprint arXiv:2107.13944, 2021

[71]

T. Westenbroek, F. Castañeda, A. Agrawal, S. S. Sastry, and K. Sreenath, “Learning min-norm stabilizing control laws for systems with unknown dynamics,” in 2020 59th IEEE Conference on Decision and Control (CDC). IEEE, 2020, pp. 737–744

[72]

K. Long, J. Cortés, and N. Atanasov, “Certifying stability of reinforcement learning policies using generalized Lyapunov functions,” arXiv preprint arXiv:2505.10947, 2025

[73]

Z. Cao, R. Wang, X. Zhou, and Y. Wen, “Toward model-assisted safe reinforcement learning for data center cooling control: A Lyapunov-based approach,” in Proceedings of the 14th ACM International Conference on Future Energy Systems, 2023, pp. 333–346

[74]

S. Tesfazgi, L. Sprandl, A. Lederer, and S. Hirche, “Stable inverse reinforcement learning: Policies from control Lyapunov landscapes,” arXiv preprint arXiv:2405.08756, 2024

[75]

Y. Chow, O. Nachum, E. Duenez-Guzman, and M. Ghavamzadeh, “A Lyapunov-based approach to safe reinforcement learning,” Advances in Neural Information Processing Systems, vol. 31, 2018

[76]

Y. Chow, O. Nachum, A. Faust, E. Duenez-Guzman, and M. Ghavamzadeh, “Lyapunov-based safe policy optimization for continuous control,” arXiv preprint arXiv:1901.10031, 2019

[77]

Y. Dong, X. Tang, and Y. Yuan, “Principled reward shaping for reinforcement learning via Lyapunov stability theory,” Neurocomputing, vol. 393, pp. 83–90, 2020

[78]

H. I. Ugurlu, A. Redder, and E. Kayacan, “Lyapunov-inspired deep reinforcement learning for robot navigation in obstacle environments,” in 2025 IEEE Symposium on Computational Intelligence on Engineering/Cyber Physical Systems (CIES). IEEE, 2025, pp. 1–8

[79]

L. Xia, Y. Cui, Z. Yi, H. Li, and X. Wu, “Estimating Lyapunov region of attraction for robust model-based reinforcement learning USV,” IEEE Transactions on Automation Science and Engineering, 2024

[80]

A. Y. Ng, D. Harada, and S. Russell, “Policy invariance under reward transformations: Theory and application to reward shaping,” in ICML, vol. 99. Citeseer, 1999, pp. 278–287


Showing first 80 references.
