Solving Reach- and Stabilize-Avoid Problems Using Discounted Reachability
Pith reviewed 2026-05-22 16:20 UTC · model grok-4.3
The pith
A new Lipschitz continuous value function exactly identifies the states from which a controller can reach a target without violating constraints despite worst-case disturbances.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We address the reach-avoid problem by designing a new Lipschitz continuous reach-avoid value function whose zero sublevel set exactly characterizes the reach-avoid set. We establish that the associated Bellman backup operator is contractive and that the reach-avoid value function is the unique viscosity solution of a Hamilton-Jacobi variational inequality. For the stabilize-avoid problem we develop a two-step framework that integrates our reach-avoid strategies with a robust control Lyapunov-value function to ensure both target reachability and long-term stability.
What carries the argument
The Lipschitz continuous reach-avoid value function, whose zero sublevel set encodes the desired safe reachable states and which solves the Hamilton-Jacobi variational inequality uniquely as a viscosity solution.
Load-bearing premise
A Lipschitz continuous reach-avoid value function exists whose zero sublevel set exactly coincides with the true reach-avoid set under the discounted formulation for arbitrary nonlinear continuous-time systems.
What would settle it
A concrete nonlinear system together with target and unsafe sets for which the numerically computed zero sublevel set either contains a state from which no control avoids the unsafe region while reaching the target, or excludes a state where such a control exists.
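One way to picture such a test is a brute-force comparison on a toy problem. The sketch below is not the paper's algorithm. It assumes a hypothetical discrete-time, disturbance-free analogue of the discounted value function, V(x) = max{c(x), min{ℓ(x), γ min_u V(f(x,u))}}, on a 21-state line, with ℓ < 0 on the target, c < 0 on safe states, and the strict sublevel set {V < 0} as the candidate reach-avoid set. All names (ell, c, GAMMA, step, backup) are illustrative.

```python
import numpy as np

# Hypothetical toy problem (not from the paper): states 0..N-1 on a line,
# controls move one cell left, stay, or move one cell right.
N = 21
GAMMA = 0.9            # assumed discrete-time discount factor in (0, 1)
CONTROLS = (-1, 0, 1)

x = np.arange(N)
ell = np.abs(x - 17) - 1.5   # ell < 0 exactly on the target {16, 17, 18}
c = 1.5 - np.abs(x - 8)      # c > 0 exactly on the unsafe set {7, 8, 9}


def step(i, u):
    """Deterministic transition, clipped to the grid."""
    return min(max(i + u, 0), N - 1)


def backup(V):
    """Assumed discounted reach-avoid backup (discrete-time analogue)."""
    best_next = np.array([min(V[step(i, u)] for u in CONTROLS) for i in range(N)])
    return np.maximum(c, np.minimum(ell, GAMMA * best_next))


def solve(tol=1e-12, max_iter=10_000):
    """Fixed-point iteration on the backup until the sup-norm change is tiny."""
    V = np.maximum(ell, c)
    for _ in range(max_iter):
        V_new = backup(V)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    raise RuntimeError("value iteration did not converge")


def brute_force_ra_set():
    """Ground truth by backward search: safe states that can reach the target
    while passing only through safe states."""
    safe = c < 0
    ra = safe & (ell < 0)
    changed = True
    while changed:
        changed = False
        for i in range(N):
            if safe[i] and not ra[i] and any(ra[step(i, u)] for u in CONTROLS):
                ra[i] = True
                changed = True
    return ra


V = solve()
computed = V < 0                      # strict sublevel set (an assumption here)
truth = brute_force_ra_set()
false_positives = np.flatnonzero(computed & ~truth)   # claimed but not reach-avoid
false_negatives = np.flatnonzero(~computed & truth)   # reach-avoid but excluded
print("false positives:", false_positives, "false negatives:", false_negatives)
```

A non-empty `false_positives` or `false_negatives` array on a real system would be the kind of counterexample described above. In this toy, the values on the safe states that cannot reach the target decay toward zero from above, so the comparison depends on using the strict inequality.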
Original abstract
In this article, we consider the infinite-horizon reach-avoid (RA) and stabilize-avoid (SA) zero-sum game problems for general nonlinear continuous-time systems, where the goal is to find the set of states that can be controlled to reach or stabilize to a target set, without violating constraints even under the worst-case disturbance. Based on the Hamilton-Jacobi reachability method, we address the RA problem by designing a new Lipschitz continuous RA value function, whose zero sublevel set exactly characterizes the RA set. We establish that the associated Bellman backup operator is contractive and that the RA value function is the unique viscosity solution of a Hamilton-Jacobi variational inequality. Finally, we develop a two-step framework for the SA problem by integrating our RA strategies with a recently proposed Robust Control Lyapunov-Value Function, thereby ensuring both target reachability and long-term stability. We numerically verify our RA and SA frameworks on a 3D Dubins car system to demonstrate the efficacy of the proposed approach.
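For readers unfamiliar with the benchmark, a 3D Dubins car has state (x, y, θ) and a steering control. The sketch below is a minimal forward-Euler model under stated assumptions: the constant speed, the control and disturbance bounds, and the choice to let the disturbance act additively on the position rates are all hypothetical and are not taken from the paper.

```python
import math
from dataclasses import dataclass


def clamp(value, bound):
    """Clip a scalar to the interval [-bound, bound]."""
    return max(-bound, min(bound, value))


@dataclass(frozen=True)
class DubinsCar:
    speed: float = 1.0      # hypothetical constant forward speed
    max_turn: float = 1.0   # hypothetical bound on the turn rate |u|
    max_dist: float = 0.1   # hypothetical bound on each disturbance component

    def f(self, state, u, d):
        """Continuous-time dynamics: returns (x_dot, y_dot, theta_dot)."""
        _, _, theta = state
        u = clamp(u, self.max_turn)
        dx, dy = clamp(d[0], self.max_dist), clamp(d[1], self.max_dist)
        return (
            self.speed * math.cos(theta) + dx,  # assumed additive disturbance
            self.speed * math.sin(theta) + dy,
            u,
        )

    def step(self, state, u, d, dt):
        """One forward-Euler step; heading is wrapped to [-pi, pi)."""
        x, y, theta = state
        x_dot, y_dot, theta_dot = self.f(state, u, d)
        theta_next = (theta + dt * theta_dot + math.pi) % (2.0 * math.pi) - math.pi
        return (x + dt * x_dot, y + dt * y_dot, theta_next)


car = DubinsCar()
next_state = car.step((0.0, 0.0, 0.0), u=0.0, d=(0.0, 0.0), dt=0.1)
print(next_state)
```

A grid-based or learned solver would call a model like `f` inside the Hamiltonian, with the control minimizing and the disturbance maximizing.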
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper addresses infinite-horizon reach-avoid (RA) and stabilize-avoid (SA) zero-sum games for general nonlinear continuous-time systems with disturbances. It introduces a new Lipschitz continuous RA value function whose zero sublevel set is asserted to exactly characterize the RA set, shows that the associated Bellman backup operator is contractive, and proves that this value function is the unique viscosity solution of a Hamilton-Jacobi variational inequality. For the SA problem, the RA strategies are combined with a robust control Lyapunov-value function to ensure both reachability and asymptotic stability. The claims are supported by numerical experiments on a 3D Dubins car.
Significance. If the exact zero-sublevel characterization holds under the discounted formulation, the work supplies a contractive operator whose fixed point yields the precise RA set via viscosity solution theory, offering a route to both theoretical guarantees and potentially more stable numerical schemes for infinite-horizon problems. The two-step integration with the robust CLVF for SA problems is a natural and useful extension. The numerical verification on the Dubins car provides concrete evidence of practical utility.
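The contraction property mentioned here can be illustrated numerically. The sketch below assumes a discrete-time analogue of the backup, B[V](x) = max{c(x), min{ℓ(x), γ min_u max_d V(f(x,u,d))}}, on a random finite transition table. The operator form, the min-over-control and max-over-disturbance ordering, and every name in the code are assumptions for illustration, not the paper's definitions.

```python
import numpy as np

rng = np.random.default_rng(0)

N, NU, ND = 50, 3, 2     # hypothetical numbers of states, controls, disturbances
GAMMA = 0.8              # assumed discrete-time discount factor in (0, 1)

ell = rng.normal(size=N)                        # stand-in target function
c = rng.normal(size=N)                          # stand-in constraint function
next_state = rng.integers(0, N, size=(N, NU, ND))  # random transition table


def backup(V):
    """Assumed discounted reach-avoid backup on a finite state space."""
    worst = V[next_state].max(axis=2)   # disturbance maximizes
    best = worst.min(axis=1)            # control minimizes
    return np.maximum(c, np.minimum(ell, GAMMA * best))


def sup_norm(a, b):
    return float(np.max(np.abs(a - b)))


# Empirical contraction ratio over random pairs of candidate value functions.
ratios = []
for _ in range(200):
    V = rng.normal(scale=5.0, size=N)
    W = rng.normal(scale=5.0, size=N)
    ratios.append(sup_norm(backup(V), backup(W)) / sup_norm(V, W))
worst_ratio = max(ratios)


def fixed_point(V, iters=300):
    """Repeated backups; converges geometrically if the map is a contraction."""
    for _ in range(iters):
        V = backup(V)
    return V


# Two very different initial guesses should reach the same fixed point.
gap = sup_norm(fixed_point(np.zeros(N)), fixed_point(100.0 * np.ones(N)))
print("worst ratio:", worst_ratio, "fixed-point gap:", gap)
```

The ratio never exceeds GAMMA because max and min with a fixed function are non-expansive in the sup norm and the only scaling is by γ. This shows uniqueness of the fixed point in the toy setting only; it says nothing by itself about whether the level set is exact, which is the point raised in the major comments.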
major comments (2)
- [§3.2, Definition 3 and Theorem 3.1] The central claim that the zero sublevel set of the proposed Lipschitz RA value function exactly coincides with the reach-avoid set for arbitrary Lipschitz dynamics and bounded disturbances is not yet rigorously established. The discounted infinite-horizon cost typically produces a strictly positive value outside the set that only approaches the indicator function as the discount factor tends to zero; the manuscript must supply an explicit argument showing why the chosen running-cost design recovers the exact boundary for any fixed positive discount factor.
- [§4] Viscosity-solution uniqueness argument: While contractivity of the Bellman operator is asserted, the proof that this operator maps the space of Lipschitz functions into itself, and that the fixed point satisfies the HJ variational inequality with the exact level-set property, needs to be checked against possible boundary discrepancies introduced by discounting. A concrete counter-example or a limiting argument should be added if the exactness does not follow directly from contractivity alone.
minor comments (2)
- Notation for the target set and constraint set is introduced without a dedicated table or diagram; adding one would improve readability when comparing the RA and SA formulations.
- The numerical section would benefit from an explicit statement of the chosen discount factor and a brief sensitivity study showing that the computed level sets remain stable under moderate changes in this parameter.
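A sensitivity study of the kind requested in the last comment could look like the following sketch. It sweeps an assumed discrete-time discount factor over a hypothetical 21-state toy problem, using the backup V(x) = max{c(x), min{ℓ(x), γ min_u V(f(x,u))}}, and checks whether the strict sublevel set {V < 0} changes. The toy system, the operator, and all names are illustrative assumptions and are unrelated to the paper's Dubins car experiment.

```python
import numpy as np

# Hypothetical toy problem: a line of N cells, controls move -1, 0, or +1.
N = 21
x = np.arange(N)
ell = np.abs(x - 17) - 1.5   # ell < 0 on the target {16, 17, 18}
c = 1.5 - np.abs(x - 8)      # c > 0 on the unsafe set {7, 8, 9}
NEXT = np.stack([np.clip(x + u, 0, N - 1) for u in (-1, 0, 1)], axis=1)


def solve(gamma, tol=1e-12, max_iter=100_000):
    """Fixed-point iteration for the assumed discounted reach-avoid backup."""
    V = np.maximum(ell, c)
    for _ in range(max_iter):
        V_new = np.maximum(c, np.minimum(ell, gamma * V[NEXT].min(axis=1)))
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    raise RuntimeError("value iteration did not converge")


gammas = (0.5, 0.7, 0.9, 0.99)
values = {g: solve(g) for g in gammas}

# Strict sublevel set for each discount factor.
level_sets = {g: frozenset(np.flatnonzero(values[g] < 0).tolist()) for g in gammas}
stable = len(set(level_sets.values())) == 1

# Negative value closest to zero: a rough measure of numerical margin.
margins = {g: float(values[g][values[g] < 0].max()) for g in gammas}

for g in gammas:
    print(f"gamma={g}: set={sorted(level_sets[g])}, margin={margins[g]:.4f}")
print("level set unchanged across gammas:", stable)
```

In this toy the set is unchanged across the sweep, but the margin shrinks as discounting gets heavier, which suggests why a report of the discount factor alongside grid resolution and solver tolerance would be informative.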
Simulated Author's Rebuttal
Thank you for the opportunity to respond to the referee's report. We are grateful for the positive assessment of the significance of our work and for the constructive major comments. Below we address each comment in turn, indicating how we will revise the manuscript to strengthen the presentation and proofs.
Point-by-point responses
- Referee: [§3.2, Definition 3 and Theorem 3.1] The central claim that the zero sublevel set of the proposed Lipschitz RA value function exactly coincides with the reach-avoid set for arbitrary Lipschitz dynamics and bounded disturbances is not yet rigorously established. The discounted infinite-horizon cost typically produces a strictly positive value outside the set that only approaches the indicator function as the discount factor tends to zero; the manuscript must supply an explicit argument showing why the chosen running-cost design recovers the exact boundary for any fixed positive discount factor.
Authors: We acknowledge that the current manuscript would benefit from a more explicit derivation of the exact level-set property. The running cost in our formulation is constructed as a positive definite function that vanishes exactly on the target set and is bounded below by a positive constant on the complement of the avoid set. Combined with the discounting, this ensures that the value function is strictly positive outside the reach-avoid set for any fixed discount factor, because any trajectory starting outside must incur a positive integrated cost before reaching the target. In the revised version, we will insert a dedicated lemma following Definition 3 that proves this property using the Lipschitz continuity of the dynamics and the boundedness of the disturbance set. This will make the argument explicit, as requested.
Revision: yes
- Referee: [§4] Viscosity-solution uniqueness argument: While contractivity of the Bellman operator is asserted, the proof that this operator maps the space of Lipschitz functions into itself, and that the fixed point satisfies the HJ variational inequality with the exact level-set property, needs to be checked against possible boundary discrepancies introduced by discounting. A concrete counter-example or a limiting argument should be added if the exactness does not follow directly from contractivity alone.
Authors: We agree that the connection between contractivity, the viscosity solution property, and the exact level set requires additional clarification to rule out boundary effects from discounting. In the revision, we will augment the proof in §4 with a limiting argument in which the discount factor is held fixed and the analysis examines the behavior near the boundary of the reach-avoid set. Specifically, we will show that the fixed point of the contractive operator coincides with the unique viscosity solution of the HJVI, and that this solution's zero sublevel set is invariant under the dynamics in the required sense. We will also note that a counter-example would contradict the contraction property in the complete metric space of Lipschitz functions equipped with the sup norm.
Revision: yes
Circularity Check
No significant circularity; core RA value function and operator properties derived independently
Full rationale
The paper constructs a new Lipschitz continuous RA value function for the discounted infinite-horizon formulation and proves contractivity of the associated Bellman backup operator plus uniqueness as the viscosity solution to the Hamilton-Jacobi variational inequality. These steps are presented as direct consequences of the chosen running-cost design and discounting for general nonlinear systems. The SA extension integrates the RA results with a cited Robust Control Lyapunov-Value Function but does not make the central RA claims depend on self-referential definitions, fitted inputs renamed as predictions, or load-bearing self-citation chains. The derivation chain remains self-contained against the stated assumptions without reducing the zero-sublevel characterization to an input by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: The problems are posed for general nonlinear continuous-time systems with disturbances in a zero-sum game setting.
invented entities (1)
- Lipschitz continuous RA value function: no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel (tag: unclear)
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: We define a new RA value function V^γ(x) := sup_λ inf_u inf_t max{ e^{-γ t} ℓ(ξ(t)), max_s e^{-γ s} c(ξ(s)) } … whose zero sublevel set exactly characterizes the RA set.
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean, absolute_floor_iff_bare_distinguishability (tag: unclear)
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: We establish that the associated Bellman backup operator is contractive and that the RA value function is the unique viscosity solution of a Hamilton–Jacobi variational inequality.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
- Value Functions for Temporal Logic: Optimal Policies and Safety Filters
Non-Markovian policies from decomposed temporal logic value functions are proven optimal for nested Until, Globally, and Globally-Until specifications and extend Q-function safety filters to complex tasks.