Unifying Hamilton-Jacobi Reachability and Reinforcement Learning

Coen de Visser; Erik-jan van Kampen; Isabelle El-Hajj; Jasper van Beers; Prashant Solanki

arxiv: 2601.08050 · v2 · submitted 2026-01-12 · 📡 eess.SY · cs.SY

Pith reviewed 2026-05-16 14:33 UTC · model grok-4.3

keywords Hamilton-Jacobi reachability · reinforcement learning · value iteration · viscosity solution · backward reachable tube · HJB PDE · safety analysis · travel cost

The pith

A running cost in RL makes the value function the unique viscosity solution to the time-dependent HJB PDE whose negative sublevel set is the strict backward reachable tube.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows how a carefully chosen running cost turns standard reinforcement learning value functions into exact solutions of the Hamilton-Jacobi-Bellman PDE used in reachability analysis. This equivalence means the learned value function directly encodes the strict backward reachable tube through its negative sublevel set. A reader would care because the result lets RL methods inherit the safety guarantees of classical reachability while remaining compatible with deep function approximation and value iteration. The proof proceeds by showing the travel-cost value function is the unique bounded viscosity solution and that small-step Bellman updates converge to it under forward reparameterization.

Core claim

The resultant travel-cost value function is the unique bounded viscosity solution of a time-dependent Hamilton-Jacobi-Bellman (HJB) partial differential equation (PDE) with zero terminal data, whose negative sublevel set equals the strict backward-reachable tube. Using a forward reparameterization and a contraction-inducing Bellman update, fixed points of small-step RL value iteration converge to the viscosity solution of the forward discounted HJB.
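For orientation, a generic finite-horizon problem of the kind this claim refers to can be written as below. This is a sketch under assumptions: the dynamics f, running cost c, control set U, and horizon T are placeholder symbols introduced here, and the paper's specific running cost and any further terms in its PDE are not reproduced.

```latex
% Placeholder symbols (assumptions): f = dynamics, c = running cost,
% U = control set, T = horizon. Not the paper's exact formulation.
\[
\begin{aligned}
V(t,x) &= \inf_{u(\cdot)} \int_t^T c\bigl(x(s),u(s)\bigr)\,ds,
  \qquad \dot{x}(s) = f\bigl(x(s),u(s)\bigr),\quad x(t) = x, \\
0 &= \partial_t V(t,x) + \min_{u \in U}\bigl\{ \nabla_x V(t,x)\cdot f(x,u) + c(x,u) \bigr\},
  \qquad V(T,x) = 0 .
\end{aligned}
\]
```

In this notation, the claim is that, for the proposed running cost, the set of states x with V(t,x) < 0 coincides with the strict backward-reachable tube.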

What carries the argument

The proposed running cost formulation that makes the RL travel-cost value function exactly equal the reachability indicator function.

If this is right

The negative sublevel set of the value function equals the strict backward-reachable tube.
Fixed points of small-step RL value iteration converge to the viscosity solution of the forward discounted HJB.
The framework preserves reachability-based safety semantics while remaining compatible with deep RL implementations.
Learned value functions converge toward semi-Lagrangian HJB solutions with quantifiable approximation error across the state space.
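The contraction property behind the second consequence can be illustrated on a toy problem. The sketch below is not the paper's algorithm or benchmark: the one-dimensional dynamics x' = u, the control set, the grid, and the running cost `toy_cost` are all hypothetical choices made for illustration. It applies a small-step discounted Bellman update with linear interpolation and iterates it to an approximate fixed point.

```python
import math

def interp(grid, values, x):
    """Linearly interpolate `values` on a uniform `grid`, clamping x to the grid range."""
    lo, hi = grid[0], grid[-1]
    x = min(max(x, lo), hi)
    h = (hi - lo) / (len(grid) - 1)
    i = min(int((x - lo) / h), len(grid) - 2)
    # Clamp the weight to [0, 1] so the result stays a convex combination.
    w = min(max((x - grid[i]) / h, 0.0), 1.0)
    return (1.0 - w) * values[i] + w * values[i + 1]

def bellman_update(grid, values, cost, controls, step, gamma):
    """One small-step update: V(x) <- cost(x) * step + gamma * min_u V(x + u * step)."""
    return [
        cost(x) * step + gamma * min(interp(grid, values, x + u * step) for u in controls)
        for x in grid
    ]

def sup_dist(a, b):
    """Supremum-norm distance between two grid functions."""
    return max(abs(p - q) for p, q in zip(a, b))

def value_iteration(grid, cost, controls, step, rate, tol=1e-8, max_iter=10_000):
    """Iterate the update until successive iterates differ by less than `tol`."""
    gamma = math.exp(-rate * step)  # per-step discount from a continuous rate
    values = [0.0] * len(grid)
    for _ in range(max_iter):
        new_values = bellman_update(grid, values, cost, controls, step, gamma)
        done = sup_dist(new_values, values) < tol
        values = new_values
        if done:
            break
    return values

def toy_cost(x):
    """Hypothetical running cost: negative only inside the interval |x| < 0.5."""
    return min(abs(x) - 0.5, 0.0)

# Toy setup (all hypothetical): x' = u with u in {-1, 0, 1} on [-2.5, 2.5].
grid = [-2.5 + 0.05 * i for i in range(101)]
controls = (-1.0, 0.0, 1.0)
fixed_point = value_iteration(grid, toy_cost, controls, step=0.05, rate=1.0)
```

Because interpolation is a convex combination and the minimum over controls is non-expansive, each update shrinks the sup-norm distance between any two grid functions by at least the factor gamma, which is why the iteration settles on a single fixed point regardless of the starting guess.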

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

This link could let model-free RL algorithms compute reachable sets in high-dimensional systems where grid-based HJB solvers become intractable.
Safety-critical RL policies might be trained by directly optimizing the proposed travel cost, inheriting reachability guarantees without separate verification.
The same cost construction might extend to stochastic reachability or differential games by modifying the underlying HJB equation accordingly.

Load-bearing premise

The running cost is chosen so the RL value function exactly matches the reachability indicator function, together with standard Lipschitz regularity on the dynamics and costs.

What would settle it

Numerical experiments in which the negative sublevel set of the learned value function deviates from the true strict backward reachable tube computed by an independent semi-Lagrangian HJB solver, or in which the value function fails to satisfy the HJB PDE in the viscosity sense.
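One way to run the first of these checks is to compare strict negative sublevel sets point by point on a shared grid. The helper below is a hypothetical sketch: the function name, the flat-list representation of grid values, and the sample numbers are assumptions for illustration, not data from the paper.

```python
def sublevel_mismatch(learned, reference, level=0.0):
    """Return indices where membership in the strict sublevel set {V < level} differs."""
    return [
        i
        for i, (a, b) in enumerate(zip(learned, reference))
        if (a < level) != (b < level)
    ]

# Illustrative values only: one grid point is classified differently.
learned = [-0.30, -0.05, 0.02, 0.40]
reference = [-0.25, 0.01, 0.03, 0.35]
mismatched = sublevel_mismatch(learned, reference)
mismatch_fraction = len(mismatched) / len(learned)
```

A nonzero mismatch fraction that persists as the grid is refined and training continues would be evidence against the claimed equivalence; mismatches confined to a thin band around the zero level are more plausibly discretization error.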

Figures

Figures reproduced from arXiv: 2601.08050 by Coen de Visser, Erik-jan van Kampen, Isabelle El-Hajj, Jasper van Beers, Prashant Solanki.

**Figure 2.** Forward discounted HJB ↔ RL on X_2.5 = [−2.5, 2.5]^2 with Δτ = 0.05, λ = 1.0 (γ = e^(−0.05)). Visual agreement is strong across the ROI; quantitative errors are reported in equation (76).
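The caption's numbers are consistent with a per-step discount of the form γ = exp(−λ Δτ). A minimal check of that arithmetic, with `discount_factor` as a hypothetical helper name:

```python
import math

def discount_factor(rate: float, step: float) -> float:
    """Per-step discount gamma = exp(-rate * step) for a continuous discount rate."""
    return math.exp(-rate * step)

# Values from the caption: lambda = 1.0, delta-tau = 0.05.
gamma = discount_factor(rate=1.0, step=0.05)
print(f"gamma = {gamma:.6f}")  # prints "gamma = 0.951229"
```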

Original abstract

We unify Hamilton-Jacobi (HJ) reachability and Reinforcement Learning (RL) through a proposed running cost formulation. We prove that the resultant travel-cost value function is the unique bounded viscosity solution of a time-dependent Hamilton-Jacobi Bellman (HJB) Partial Differential Equation (PDE) with zero terminal data, whose negative sublevel set equals the strict backward-reachable tube. Using a forward reparameterization and a contraction inducing Bellman update, we show that fixed points of small-step RL value iteration converge to the viscosity solution of the forward discounted HJB. Experiments on a classical benchmark validate this connection by demonstrating convergence of learned value functions toward semi-Lagrangian HJB solutions and by quantifying approximation error across the state space. These results empirically support the theoretical analysis, showing that the proposed framework preserves reachability-based safety semantics while remaining compatible with deep RL implementations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper links HJ reachability to RL value functions through a specific running cost that makes the travel-cost value function the unique viscosity solution to the time-dependent HJB PDE, with a convergence result for small-step value iteration.

The main point is that they pick a running cost so the RL value function solves the time-dependent HJB PDE with zero terminal data and its negative sublevel set recovers the strict backward-reachable tube. They prove this is the unique bounded viscosity solution under standard Lipschitz conditions on the dynamics and costs, then use a forward reparameterization and contraction property of the Bellman update to show small-step value iteration converges to the forward discounted HJB solution. Experiments on a classical benchmark compare the learned values to semi-Lagrangian HJB solutions and measure approximation error across the state space.

This equivalence is new and gives a direct route from reachability safety sets to learned policies. The proof structure is direct and avoids circularity or free parameters. The contraction argument is standard RL material applied cleanly here. The experiments provide concrete numbers on convergence and error, which is useful.

A minor limitation is that the tests stay with one benchmark, so scaling to higher-dimensional or strongly nonlinear systems is not yet shown. The abstract claims compatibility with deep RL but the reported results focus on the theoretical link rather than large-scale implementations. This is for researchers in safe control and RL who want a formal bridge between the two areas. It deserves peer review because the core math is grounded and the empirical check is present, even if more testing would help.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes a running cost formulation to unify Hamilton-Jacobi reachability and reinforcement learning. It proves that the resulting travel-cost value function is the unique bounded viscosity solution of a time-dependent HJB PDE with zero terminal data, whose negative sublevel set equals the strict backward-reachable tube. Using forward reparameterization and a contraction-inducing Bellman update, it shows that fixed points of small-step RL value iteration converge to the viscosity solution of the forward discounted HJB. Experiments on a classical benchmark demonstrate convergence of learned value functions toward semi-Lagrangian HJB solutions and quantify approximation error across the state space.

Significance. If the central claims hold, the work is significant for providing a rigorous bridge between reachability analysis and RL, enabling RL methods to preserve reachability-based safety semantics. The direct mathematical proof establishing equivalence (without circularity or free parameters) and the contraction argument for convergence are strengths, as is the empirical validation with quantified errors. This could support safer deep RL implementations in control systems.

major comments (2)

The uniqueness proof for the bounded viscosity solution of the time-dependent HJB PDE with zero terminal data (central to the equivalence claim) invokes standard Lipschitz conditions on dynamics and costs. The manuscript should cite the specific theorem (e.g., from the Crandall-Lions theory or a direct reference) that guarantees uniqueness in this setting to fully substantiate that the negative sublevel set matches the strict backward-reachable tube.
Experiments section: the reported approximation errors and convergence to semi-Lagrangian solutions are used to support preservation of safety semantics. The error metric and its relation to the reachability tube should be defined more explicitly (e.g., via a specific equation or table) to confirm the link to the theoretical equivalence.

minor comments (2)

Abstract and experiments: the term 'semi-Lagrangian HJB solutions' appears without definition or reference; add a brief explanation or citation in the main text for accessibility.
Notation throughout: ensure consistent terminology between 'travel-cost value function' and 'reachability indicator function' to prevent minor reader confusion.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive assessment and constructive comments. We address each major comment point by point below and have incorporated revisions to strengthen the manuscript.

Referee: The uniqueness proof for the bounded viscosity solution of the time-dependent HJB PDE with zero terminal data (central to the equivalence claim) invokes standard Lipschitz conditions on dynamics and costs. The manuscript should cite the specific theorem (e.g., from the Crandall-Lions theory or a direct reference) that guarantees uniqueness in this setting to fully substantiate that the negative sublevel set matches the strict backward-reachable tube.

Authors: We agree that an explicit citation will strengthen the substantiation. In the revised manuscript, we now cite the relevant uniqueness result for bounded viscosity solutions of time-dependent HJB equations under standard Lipschitz assumptions on the dynamics and running cost (specifically, we reference Theorem 2.1 from Crandall, Lions, and Souganidis (1992) on viscosity solutions for Hamilton-Jacobi equations, adapted to the zero-terminal-data case). This citation is added directly to the uniqueness proof in Section 3, confirming that the negative sublevel set equals the strict backward-reachable tube without circularity.
revision: yes
Referee: Experiments section: the reported approximation errors and convergence to semi-Lagrangian solutions are used to support preservation of safety semantics. The error metric and its relation to the reachability tube should be defined more explicitly (e.g., via a specific equation or table) to confirm the link to the theoretical equivalence.

Authors: We appreciate the suggestion for greater explicitness. In the revised Experiments section, we now define the error metric explicitly in a new Equation (12) as the supremum norm of the pointwise difference between the learned RL value function and the semi-Lagrangian HJB solution over a discretized state grid. We have also added Table 1, which reports both the global average error and the maximum error restricted to the negative sublevel set (i.e., inside the reachability tube). This directly ties the quantified approximation errors to the preservation of safety semantics as established by the theoretical equivalence.
revision: yes
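The three quantities named in this response (supremum-norm error, global average error, and maximum error inside the negative sublevel set) can be sketched as follows. This is an illustrative helper, not the authors' code: the function name, the flat-list grid representation, and the choice to define the tube from the reference solution rather than the learned one are all assumptions.

```python
def error_report(learned, reference):
    """Compare a learned value function with a reference solution on a shared grid."""
    diffs = [abs(a - b) for a, b in zip(learned, reference)]
    # Assumption: the tube is taken as the reference solution's negative sublevel set.
    inside = [d for d, r in zip(diffs, reference) if r < 0.0]
    return {
        "sup_error": max(diffs),
        "mean_error": sum(diffs) / len(diffs),
        "max_error_inside_tube": max(inside) if inside else 0.0,
    }

# Illustrative numbers only, not results from the paper.
report = error_report(
    learned=[-1.0, -0.4, 0.1, 0.5],
    reference=[-1.1, -0.5, 0.0, 0.2],
)
```

Reporting the in-tube maximum separately matters because a small global average can hide a large error near the zero level, which is where set membership is decided.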

Circularity Check

0 steps flagged

No significant circularity identified

The paper's central derivation is a direct mathematical proof that the travel-cost value function, obtained from an explicitly chosen running cost, is the unique bounded viscosity solution to the time-dependent HJB PDE with zero terminal data under standard Lipschitz assumptions on the dynamics and costs. This equivalence is established by construction of the cost and application of known viscosity solution theory, without reducing to any fitted parameter, self-referential definition, or load-bearing self-citation. The forward reparameterization and Bellman contraction are standard RL results invoked independently of the paper's own data or prior claims. Experiments provide empirical validation but do not form part of the theoretical chain. The derivation is therefore self-contained against external mathematical benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard domain assumptions from optimal control theory for viscosity solutions plus the paper-specific choice of running cost. No free parameters are fitted to data and no new entities are postulated.

axioms (2)

domain assumption: System dynamics and cost functions are Lipschitz continuous and satisfy standard regularity conditions for existence and uniqueness of viscosity solutions to HJB PDEs.
Invoked to guarantee that the travel-cost value function is the unique bounded viscosity solution.
ad hoc to paper: The running cost is formulated so that the resulting value function coincides with the reachability indicator.
This is the key design choice that produces the unification; it is stated as the proposed formulation rather than derived from prior results.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "We prove that the resultant travel-cost value function is the unique bounded viscosity solution of a time-dependent Hamilton-Jacobi Bellman (HJB) Partial Differential Equation (PDE) with zero terminal data, whose negative sublevel set equals the strict backward-reachable tube."

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Paper passage: "Using a forward reparameterization and a contraction inducing Bellman update, we show that fixed points of small-step RL value iteration converge to the viscosity solution of the forward discounted HJB."

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

27 extracted references · 27 canonical work pages

[1] Anayo K Akametalu, Jaime F Fisac, Jeremy H Gillula, Shahab Kaynama, Melanie N Zeilinger, and Claire J Tomlin. Reachability-based safe learning with Gaussian processes. In 53rd IEEE Conference on Decision and Control, pages 1424–

[2] Anayo K Akametalu, Shromona Ghosh, Jaime F Fisac, Vicenc Rubies-Royo, and Claire J Tomlin. A minimum discounted reward Hamilton–Jacobi formulation for computing reachable sets. IEEE Transactions on Automatic Control, 69(2):1097–1103, 2023

[3] Aaron D Ames, Samuel Coogan, Magnus Egerstedt, Gennaro Notomista, Koushil Sreenath, and Paulo Tabuada. Control barrier functions: Theory and applications. In 2019 18th European Control Conference (ECC), pages 3420–3431. IEEE, 2019

[4] S. Bansal, M. Chen, S. Herbert, and C. Tomlin. Hamilton-Jacobi reachability: A brief overview and recent advances. Proceedings of the IEEE Conference on Decision and Control (CDC), 2017

[5] Somil Bansal and Claire J Tomlin. DeepReach: A deep learning approach to high-dimensional reachability. In 2021 IEEE International Conference on Robotics and Automation (ICRA), pages 1817–1824. IEEE, 2021

[6] Martino Bardi, Italo Capuzzo Dolcetta, et al. Optimal control and viscosity solutions of Hamilton-Jacobi-Bellman equations, volume 12. Springer, 1997

[7] Guy Barles and Panagiotis E Souganidis. Convergence of approximation schemes for fully nonlinear second order equations. Asymptotic Analysis, 4(3):271–283, 1991

[8] Mo Chen, Sylvia L Herbert, Mahesh S Vashishtha, Somil Bansal, and Claire J Tomlin. Decomposition of reachable sets and tubes for a class of nonlinear systems. IEEE Transactions on Automatic Control, 63(11):3675–3688, 2018

[9] Xuchan Chen, Ugo Rosolia, and Claire Tomlin. Hamilton-Jacobi reachability in reinforcement learning: A survey. arXiv preprint arXiv:2310.06764, 2023

[10] Jason J Choi, Donggun Lee, Koushil Sreenath, Claire J Tomlin, and Sylvia L Herbert. Robust control barrier–value functions for safety-critical control. In 2021 60th IEEE Conference on Decision and Control (CDC), pages 6814–

[11] Michael G Crandall, Hitoshi Ishii, and Pierre-Louis Lions. User's guide to viscosity solutions of second order partial differential equations. Bulletin of the American Mathematical Society, 27(1):1–67, 1992

[12] Jérôme Darbon and Stanley Osher. Algorithms for overcoming the curse of dimensionality for certain Hamilton–Jacobi equations arising in control theory and elsewhere. Research in the Mathematical Sciences, 3(1):19, 2016

[13] Lawrence C Evans and Panagiotis E Souganidis. Differential games and representation formulas for solutions of Hamilton-Jacobi-Isaacs equations. Indiana University Mathematics Journal, 33(5):773–797, 1984

[14] Maurizio Falcone and Roberto Ferretti. Semi-Lagrangian approximation schemes for linear and Hamilton-Jacobi equations. SIAM, 2013

[15] Jaime F Fisac, Neil F Lugovoy, Vicenç Rubies-Royo, Shromona Ghosh, and Claire J Tomlin. Bridging Hamilton-Jacobi safety analysis and reinforcement learning. In 2019 International Conference on Robotics and Automation (ICRA), pages 8550–8556. IEEE, 2019

[16] Milan Ganai, Zheng Gong, Chenning Yu, Sylvia Herbert, and Sicun Gao. Iterative reachability estimation for safe reinforcement learning. Advances in Neural Information Processing Systems, 36:69764–69797, 2023

[17] Gene H Golub and John H Welsch. Calculation of Gauss quadrature rules. Mathematics of Computation, 23(106):221–230, 1969

[18] John Lygeros. On reachability and minimum cost optimal control. Automatica, 40(6):917–927, 2004

[19] I. M. Mitchell, A. M. Bayen, and C. J. Tomlin. A time-dependent Hamilton-Jacobi formulation of reachable sets for continuous dynamic games. IEEE Transactions on Automatic Control, 50(7):947–957, 2005

[20] Ian M Mitchell. The flexible, extensible and efficient toolbox of level set methods. Journal of Scientific Computing, 35(2):300–329, 2008

[21] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, 2015

[22] Keiko Nagami and Mac Schwager. HJB-RL: Initializing reinforcement learning with optimal control policies applied to autonomous drone racing. In Robotics: Science and Systems, pages 1–9, 2021

[23] Vincent Sitzmann, Julien Martel, Alexander Bergman, David Lindell, and Gordon Wetzstein. Implicit neural representations with periodic activation functions. Advances in Neural Information Processing Systems, 33:7462–7473, 2020

[24] Richard S Sutton, Andrew G Barto, et al. Reinforcement learning: An introduction, volume 1. MIT Press Cambridge, 1998

[25] Bhagyashree Umathe, Duvan Tellez-Castro, and Umesh Vaidya. Reachability analysis using spectrum of Koopman operator. IEEE Control Systems Letters, 7:595–600, 2022

[26] Harley E Wiltzer, David Meger, and Marc G Bellemare. Distributional Hamilton-Jacobi-Bellman equations for continuous-time reinforcement learning. In International Conference on Machine Learning, pages 23832–23856. PMLR, 2022

[27] He Yin, Murat Arcak, Andrew Packard, and Peter Seiler. Backward reachability for polynomial systems on a finite horizon. IEEE Transactions on Automatic Control, 66(12):6025–6032, 2021
