Synthesis and Deployment of Maximal Robust Control Barrier Functions through Adversarial Reinforcement Learning
Pith reviewed 2026-05-10 14:23 UTC · model grok-4.3
The pith
The safety value function from the Isaacs equation serves as a robust discrete-time control barrier function that certifies the maximal safe set under bounded uncertainty.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The safety value function solving the dynamic programming Isaacs equation is a valid robust discrete-time CBF that enforces safety on the maximal robust safe set. By adopting the Q-function from reinforcement learning, the barrier is lifted into state-action space to produce a robust Q-CBF constraint; adversarial RL then synthesizes and deploys these Q-CBFs on general nonlinear systems whose dynamics are treated as black boxes.
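For orientation, one common way to write these objects is sketched below. The notation is an assumption made here for illustration and may differ from the paper's: $f$ denotes the discrete-time dynamics, $g$ a safety margin that is nonnegative exactly on the constraint set, and $\mathcal{U}$, $\mathcal{D}$ the control and disturbance sets.

```latex
% Assumed notation: x_{k+1} = f(x_k, u_k, d_k); g(x) >= 0 iff x satisfies the constraints.
V(x) = \min\Bigl\{ g(x),\ \max_{u \in \mathcal{U}} \min_{d \in \mathcal{D}} V\bigl(f(x,u,d)\bigr) \Bigr\},
\qquad
Q(x,u) = \min\Bigl\{ g(x),\ \min_{d \in \mathcal{D}} V\bigl(f(x,u,d)\bigr) \Bigr\},
\qquad
V(x) = \max_{u \in \mathcal{U}} Q(x,u).
```

Under this convention the maximal robust safe set is $\{x : V(x) \ge 0\}$, and any action with $Q(x,u) \ge 0$ keeps the next state inside it for every admissible disturbance. The paper's actual Q-CBF constraint may include additional terms that are not reproduced here.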
What carries the argument
The robust Q-CBF obtained by lifting the safety value function into state-action space so that safety filtering no longer requires explicit dynamics.
If this is right
- Robust CBFs become available for general nonlinear systems without control-affine structure or explicit uncertainty models.
- The synthesized Q-CBFs certify substantially larger safe sets than existing barrier-based methods on the inverted pendulum benchmark.
- Safety enforcement remains reliable on high-dimensional systems such as 36-dimensional quadruped simulators even when disturbances are chosen adversarially.
- The learned Q-function can be used directly for online safety filtering without recovering the underlying dynamics.
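A minimal sketch of what filtering directly on a learned Q-function could look like, assuming a scalar action, a finite set of candidate actions, and a hypothetical stand-in critic `toy_q`. None of these names come from the paper, and the selection rule (closest admissible action to the nominal one) is one simple choice among many.

```python
def q_filter(x, u_nominal, candidates, q_fn, margin=0.0):
    """Return the admissible candidate closest to u_nominal.

    An action is admissible when q_fn(x, u) >= margin. If no candidate is
    admissible, fall back to the candidate with the largest Q-value.
    """
    admissible = [u for u in candidates if q_fn(x, u) >= margin]
    if admissible:
        return min(admissible, key=lambda u: abs(u - u_nominal))
    return max(candidates, key=lambda u: q_fn(x, u))


def toy_q(x, u):
    """Hypothetical stand-in for a learned critic (not from the paper)."""
    return 1.0 - abs(x + 0.5 * u)


candidates = [-1.0, -0.5, 0.0, 0.5, 1.0]
print(q_filter(0.2, 1.0, candidates, toy_q))  # nominal action is admissible
print(q_filter(0.9, 1.0, candidates, toy_q))  # nominal action is overridden
```

The filter only ever evaluates the critic, so no dynamics model appears in the loop; a positive `margin` can be used to hedge against critic error.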
Where Pith is reading between the lines
- The same lifting and adversarial training idea could be applied to other certificates such as Lyapunov functions for stability.
- Hardware experiments on physical robots would test whether the black-box approximation holds under real sensor noise and actuator limits.
- If uncertainty bounds are themselves uncertain, the method could be extended to learn a joint bound and value function.
- Discretization choices in the Isaacs equation may affect conservatism; comparing different time steps on the same plant would quantify the effect.
Load-bearing premise
Adversarial reinforcement learning can accurately approximate the Q-function of the safety value function when dynamics are unknown and disturbances are bounded.
What would settle it
On a system whose true maximal robust safe set is known by other means, deploy the learned Q-CBF controller and check whether it prevents all safety violations under worst-case disturbances while still allowing the full certified set.
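The test described above can be illustrated on a toy problem where the answer is known by hand. The sketch below assumes a hypothetical integer system x' = 2x + u + d with u in {-2,...,2}, d in {-1,0,1}, and constraint |x| <= 10; this system is chosen for illustration and does not appear in the paper. Because the worst-case disturbance pushes any |x| >= 2 strictly outward, the maximal robust safe set is {-1, 0, 1}. Exact value iteration stands in for the learned critic.

```python
U = range(-2, 3)          # control set {-2, ..., 2}
D = range(-1, 2)          # disturbance set {-1, 0, 1}
X_MAX = 10                # constraint |x| <= X_MAX
STATES = range(-X_MAX, X_MAX + 1)


def g(x):
    """Safety margin: nonnegative exactly when the constraint holds."""
    return X_MAX - abs(x)


def step(x, u, d):
    return 2 * x + u + d


def lookup(V, y):
    # Off-grid states already violate the constraint; g(y) < 0 is enough
    # to get the sign of V right, which is all the safe set depends on.
    return V.get(y, g(y))


def solve_value():
    """Iterate V <- min(g, max_u min_d V(next)) to a fixed point."""
    V = {x: g(x) for x in STATES}
    while True:
        V_new = {
            x: min(g(x), max(min(lookup(V, step(x, u, d)) for d in D) for u in U))
            for x in STATES
        }
        if V_new == V:
            return V
        V = V_new


def q_value(V, x, u):
    return min(g(x), min(lookup(V, step(x, u, d)) for d in D))


def rollout(V, x0, u_nominal=2, steps=50):
    """Filtered rollout against a disturbance that greedily minimizes V."""
    x = x0
    for _ in range(steps):
        admissible = [u for u in U if q_value(V, x, u) >= 0]
        if not admissible:
            return False
        u = min(admissible, key=lambda a: abs(a - u_nominal))
        d = min(D, key=lambda w: lookup(V, step(x, u, w)))
        x = step(x, u, d)
        if g(x) < 0:
            return False
    return True


V = solve_value()
safe_set = [x for x in STATES if V[x] >= 0]
print(safe_set)                                  # [-1, 0, 1]
print(all(rollout(V, x0) for x0 in safe_set))    # True
```

With a learned critic in place of `solve_value`, the same two checks (recovered set versus ground truth, and violation-free rollouts from every certified state) would give a direct pass or fail signal.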
Original abstract
Robust control barrier functions (CBFs) provide a principled mechanism for smooth safety enforcement under worst-case disturbances. However, existing approaches typically rely on explicit, closed-form structure in the dynamics (e.g., control-affine) and uncertainty models. This has led to limited scalability and generality, with most robust CBFs certifying only conservative subsets of the maximal robust safe set. In this paper, we introduce a new robust CBF framework for general nonlinear systems under bounded uncertainty. We first show that the safety value function solving the dynamic programming Isaacs equation is a valid robust discrete-time CBF that enforces safety on the maximal robust safe set. We then adopt the key reinforcement learning (RL) notion of quality function (or Q-function), which removes the need for explicit dynamics by lifting the barrier certificate into state-action space and yields a novel robust Q-CBF constraint for safety filtering. Combined with adversarial RL, this enables the synthesis and deployment of robust Q-CBFs on general nonlinear systems with black-box dynamics and unknown uncertainty structure. We validate the framework on a canonical inverted pendulum benchmark and a 36-D quadruped simulator, achieving substantially less conservative safe sets than barrier-based baselines on the pendulum and reliable safety enforcement even under adversarial uncertainty realizations on the quadruped.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that the safety value function solving the discrete-time Isaacs equation is a valid robust discrete-time CBF enforcing safety on the maximal robust safe set for general nonlinear systems under bounded uncertainty. It lifts this to a Q-function in state-action space to obtain a novel robust Q-CBF constraint, enabling model-free synthesis and deployment via adversarial RL on black-box dynamics with unknown uncertainty structure. Validation is provided on an inverted pendulum benchmark (showing less conservative safe sets than baselines) and a 36-D quadruped simulator (showing reliable safety under adversarial disturbances).
Significance. If the approximation and invariance guarantees hold, the framework would meaningfully advance scalable robust safety filtering beyond control-affine or explicitly modeled systems, allowing maximal safe sets to be synthesized for high-dimensional black-box robots where traditional robust CBF methods are intractable.
major comments (3)
- [Main theoretical results on value function and Q-CBF] The central step is load-bearing for the model-free claim: the safety value function V solving the Isaacs equation must be a valid robust CBF, and its lift to a Q-CBF, whose robust decrease condition is max_u min_w Q(x,u,w) >= 0, must preserve invariance. The manuscript must explicitly derive the Q-CBF constraint and show that any RL approximation error is bounded in a manner that does not violate the worst-case decrease (see the derivation following the Isaacs equation and the robust Q-CBF definition).
- [Adversarial RL procedure for Q-CBF synthesis] Adversarial RL is asserted to approximate the Q-function corresponding to the Isaacs solution without explicit dynamics, yet no convergence or error-bound analysis is given for general nonlinear systems; this directly affects whether the learned Q satisfies the robust CBF condition up to a margin that certifies safety (see the adversarial RL training section and the quadruped experiment).
- [36-D quadruped simulator experiments] In the quadruped validation, the claim of reliable safety enforcement under adversarial uncertainty realizations assumes the training adversary fully realizes the bounded but unstructured disturbances used in the theory; without reporting the realized disturbance bounds or a post-training verification that the learned Q meets the decrease condition on held-out worst-case samples, the invariance guarantee remains unverified.
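As a worked illustration of the kind of error bound the first comment asks for, suppose (as an assumption that the source does not establish) that the learned critic $\hat{Q}$ is uniformly within $\varepsilon$ of the true $Q$. Tightening the filter threshold by $\varepsilon$ then restores the exact condition:

```latex
% Assumption (not established in the source): a uniform critic error bound \varepsilon.
\sup_{x,u}\,\bigl|\hat{Q}(x,u) - Q(x,u)\bigr| \le \varepsilon
\quad\Longrightarrow\quad
\Bigl(\hat{Q}(x,u) \ge \varepsilon \;\Rightarrow\; Q(x,u) \ge \hat{Q}(x,u) - \varepsilon \ge 0\Bigr).
```

The difficulty raised in the comment is that no such $\varepsilon$ is known a priori for neural critics trained on general nonlinear systems.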
minor comments (2)
- [Abstract and results] The abstract states 'substantially less conservative safe sets' on the pendulum; include quantitative metrics (e.g., volume ratios or boundary distances) and explicit baseline comparisons in the main results section.
- [Preliminaries and notation] Clarify early the precise difference between the proposed robust Q-CBF constraint and standard safety-filter Q-functions to avoid notation ambiguity.
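A rough sketch of the kind of quantitative comparison requested in the first minor comment, assuming both value functions are sampled on the same uniform grid so that cell counts are proportional to volume. The grids and function names below are hypothetical and carry no data from the paper.

```python
def safe_cells(value_grid, threshold=0.0):
    """Set of grid indices whose value certifies safety (value >= threshold)."""
    return {
        (i, j)
        for i, row in enumerate(value_grid)
        for j, v in enumerate(row)
        if v >= threshold
    }


def compare_safe_sets(candidate_grid, baseline_grid):
    """Volume ratio and containment of two certified sets on a shared grid.

    Assumes the baseline certifies at least one cell.
    """
    cand = safe_cells(candidate_grid)
    base = safe_cells(baseline_grid)
    return {
        "volume_ratio": len(cand) / len(base),
        "baseline_contained": base <= cand,
        "extra_cells": len(cand - base),
    }


# Hypothetical 5x5 value grids, for illustration only.
coords = range(-2, 3)
candidate = [[1.5 - max(abs(i), abs(j)) for j in coords] for i in coords]
baseline = [[1.5 - (abs(i) + abs(j)) for j in coords] for i in coords]

report = compare_safe_sets(candidate, baseline)
print(report)
```

Reporting containment alongside the ratio distinguishes a certified set that strictly enlarges the baseline from one that merely has a larger volume elsewhere in the state space.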
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed review. The comments identify key areas where the theoretical derivations and experimental validation can be strengthened for clarity and rigor. We address each major comment below, indicating where revisions will be made to the manuscript.
- Referee: [Main theoretical results on value function and Q-CBF] The central step is load-bearing for the model-free claim: the safety value function V solving the Isaacs equation must be a valid robust CBF, and its lift to a Q-CBF, whose robust decrease condition is max_u min_w Q(x,u,w) >= 0, must preserve invariance. The manuscript must explicitly derive the Q-CBF constraint and show that any RL approximation error is bounded in a manner that does not violate the worst-case decrease (see the derivation following the Isaacs equation and the robust Q-CBF definition).
Authors: We agree that greater explicitness is warranted. In the revised manuscript, we will expand the derivation immediately following the Isaacs equation into a complete step-by-step proof that V is a valid robust discrete-time CBF on the maximal safe set. We will follow it with the lifting argument showing that the Q-function satisfies the robust decrease condition max_u min_w Q(x,u,w) >= 0 and thereby preserves forward invariance. On approximation error, the adversarial RL objective directly penalizes violations of the worst-case Bellman inequality, so the learned Q is trained to satisfy the condition up to the optimization residual. We will add a remark acknowledging that a general a priori error bound for arbitrary nonlinear systems is not derived here and remains an open question, while noting that the empirical safety margins observed in both benchmarks remain positive. revision: partial
- Referee: [Adversarial RL procedure for Q-CBF synthesis] Adversarial RL is asserted to approximate the Q-function corresponding to the Isaacs solution without explicit dynamics, yet no convergence or error-bound analysis is given for general nonlinear systems; this directly affects whether the learned Q satisfies the robust CBF condition up to a margin that certifies safety (see the adversarial RL training section and the quadruped experiment).
Authors: We acknowledge that a formal convergence guarantee for adversarial RL with neural-network approximators on general nonlinear dynamics is not provided and is indeed difficult to obtain without restrictive assumptions (e.g., linear-quadratic structure or finite state-action spaces). The manuscript instead relies on the fact that the min-max robust Q-CBF constraint is embedded directly in the training loss, so that any converged Q approximately solves the Isaacs equation by construction. We will augment the training-section description with the precise loss formulation, network architectures, and hyper-parameters used, and we will add a short discussion of this theoretical limitation together with the empirical evidence that the learned Q-CBF maintains positive safety margins under adversarial disturbances. revision: no
- Referee: [36-D quadruped simulator experiments] In the quadruped validation, the claim of reliable safety enforcement under adversarial uncertainty realizations assumes the training adversary fully realizes the bounded but unstructured disturbances used in the theory; without reporting the realized disturbance bounds or a post-training verification that the learned Q meets the decrease condition on held-out worst-case samples, the invariance guarantee remains unverified.
Authors: We will revise the quadruped experimental section to explicitly state the disturbance bounds employed during both training and testing, together with the realized disturbance magnitudes observed in the reported trials. In addition, we will include a post-training verification procedure that evaluates the robust decrease condition max_u min_w Q(x,u,w) >= 0 on a held-out set of worst-case disturbance samples drawn from the same bounded set. The results of this verification will be reported to confirm that the learned Q-CBF satisfies the invariance condition within a quantifiable margin. revision: yes
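The verification step promised in this response could look roughly like the sketch below. It is an illustration under stated assumptions, not the authors' procedure: `step_fn`, `v_hat`, and the two critics are hypothetical stand-ins on a toy integer system x' = x + u + d with |x| <= 10, and the disturbance is folded into the check by taking the worst sampled value instead of passing it to the critic as an argument.

```python
def verify_one_step(states, controls, disturbances, step_fn, v_hat, q_hat, margin=0):
    """Empirical one-step check of a learned certificate.

    For every held-out (x, u) that the critic certifies (q_hat >= margin),
    the worst sampled disturbance must keep the next state inside the
    learned safe set (v_hat >= 0). Returns (checked, violations, worst).
    """
    checked, violations, worst = 0, 0, float("inf")
    for x in states:
        for u in controls:
            if q_hat(x, u) < margin:
                continue
            checked += 1
            next_val = min(v_hat(step_fn(x, u, d)) for d in disturbances)
            worst = min(worst, next_val)
            violations += next_val < 0
    return checked, violations, worst


# Hypothetical toy problem (integers avoid floating-point edge cases).
def step_fn(x, u, d):
    return x + u + d

def v_hat(x):
    return 10 - abs(x)

def q_robust(x, u):       # accounts for the disturbance bound |d| <= 1
    return min(10 - abs(x), 10 - abs(x + u) - 1)

def q_optimistic(x, u):   # ignores the disturbance, so it over-certifies
    return min(10 - abs(x), 10 - abs(x + u))

states, controls, disturbances = range(-10, 11), range(-3, 4), (-1, 0, 1)
print(verify_one_step(states, controls, disturbances, step_fn, v_hat, q_robust))
print(verify_one_step(states, controls, disturbances, step_fn, v_hat, q_optimistic))
```

A nonzero violation count, as produced by the optimistic critic here, flags exactly the failure mode the referee is concerned about. A zero count on samples is evidence of invariance but not a proof of it.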
Circularity Check
No significant circularity; derivation is self-contained from dynamic programming.
The core claim that the safety value function solving the discrete-time Isaacs equation is a valid robust CBF follows directly from standard dynamic programming arguments for zero-sum games and invariance, without reducing to fitted parameters or self-referential definitions. The subsequent lifting to a Q-CBF and use of adversarial RL for model-free synthesis on black-box dynamics is presented as an approximation technique for deployment, not as a proof that relies on the approximation itself being exact by construction. No load-bearing self-citations, ansatzes smuggled via prior work, or renaming of known results appear in the provided derivation chain. The paper validates on external benchmarks (inverted pendulum, quadruped simulator), keeping the central mathematical step independent of the learning procedure.
Axiom & Free-Parameter Ledger
free parameters (1)
- RL hyperparameters and adversarial training parameters
axioms (1)
- domain assumption: Existence and uniqueness of the safety value function solving the Isaacs equation for the given system and uncertainty bounds.
invented entities (1)
- robust Q-CBF: no independent evidence