HJ-SafeDMP: Hamilton-Jacobi Reachability-Guided Dynamic Movement Primitives for Provably Safe Robot Motion

Ravi Prakash; Siddhanth Ramesh

arxiv: 2606.28995 · v1 · pith:B4MSAAFFnew · submitted 2026-06-27 · 💻 cs.RO

HJ-SafeDMP: Hamilton-Jacobi Reachability-Guided Dynamic Movement Primitives for Provably Safe Robot Motion

Siddhanth Ramesh , Ravi Prakash This is my paper

Pith reviewed 2026-06-30 09:24 UTC · model grok-4.3

classification 💻 cs.RO

keywords dynamic movement primitivesHamilton-Jacobi reachabilitycontrol barrier functionsrobot safetymotion primitiveshuman-robot interactionconformal predictionsafe robot control

0 comments

The pith

HJ-SafeDMP learns a Control Barrier Value Function offline from demonstrations and uses it to modulate DMP outputs via a closed-form law, delivering formal safety without online optimization.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that DMPs, which generate flexible and stable motions from single demonstrations, can be made provably safe by filtering their outputs with a value function derived from Hamilton-Jacobi reachability analysis. This matters for robots working near humans because existing safety methods either lack formal guarantees or require slow online optimization that breaks real-time performance. The framework learns the Control Barrier Value Function through model-free finite-difference recursion on demonstration data, then applies a closed-form control law that modulates the DMP velocity command. An expectile objective keeps learning in-distribution, and conformal prediction adds finite-sample coverage guarantees. Experiments on a 7-DOF manipulator show the approach matches DMP adaptability while running orders of magnitude faster than quadratic-program baselines.

Core claim

HJ-SafeDMP integrates DMPs with a learned Control Barrier Value Function obtained via offline model-free finite-difference HJ recursion; the function is deployed as a closed-form safety filter that modulates DMP outputs to render the safe set forward-invariant, yielding formal safety certificates without requiring online quadratic-program solves.

What carries the argument

The Control Barrier Value Function (CBVF), learned from demonstration data and applied through a closed-form modulation law that overrides unsafe DMP commands.

If this is right

Safety filtering occurs without runtime quadratic programs, enabling orders-of-magnitude faster execution than CBF-QP methods.
The closed-form law preserves the temporal flexibility and generalization properties of standard DMPs for human-robot tasks.
Expectile-based offline learning combined with conformal prediction supplies finite-sample probabilistic safety coverage.
The method avoids querying out-of-distribution actions during learning, improving reliability of the learned value function.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same offline learning pipeline could be applied to other primitive-based controllers that output velocity or acceleration commands.
Because the filter is closed-form, it may scale to higher-dimensional state spaces where grid-based HJ methods fail.
Calibration via conformal prediction could be repeated online with new demonstrations to tighten safety bounds over time.

Load-bearing premise

The value function learned offline from demonstrations actually renders the safe set forward-invariant on the physical robot under real disturbances.

What would settle it

A physical experiment in which the robot collides with an obstacle while the closed-form safety filter is active, or a timing measurement showing execution speed comparable to or slower than optimization-based baselines.

Figures

Figures reproduced from arXiv: 2606.28995 by Ravi Prakash, Siddhanth Ramesh.

read the original abstract

Robots deployed in safety-critical environments must execute motions that are simultaneously robust to disturbances and provably safe from collisions. Dynamic Movement Primitives (DMPs) offer inherent stability, temporal flexibility, and efficient trajectory generalization from single demonstrations, but they lack formal safety certificates. Conversely, Hamilton-Jacobi (HJ) Reachability analysis provides a principled framework for computing worst-case safety margins and forward-invariant safe sets, but classical grid-based methods suffer from the curse of dimensionality and are impractical for real-time control. This paper introduces HJ-SafeDMP, a framework that integrates DMPs with learned HJ Reachability-based safety value functions to achieve provably safe, robust, and computationally efficient robot motion. We learn a Control Barrier Value Function (CBVF) from offline demonstration data using a model-free, finite-difference HJ recursion and deploy it as a real-time safety filter via a closed-form control law that modulates the DMP output. Unlike optimization-based CBF-QP approaches, our method achieves safety filtering without online quadratic program solves, preserving the computational efficiency of DMPs. We further incorporate an expectile-based offline learning objective that avoids querying out-of-distribution actions, and a conformal prediction calibration step that provides finite-sample probabilistic safety coverage. Experimental evaluation on a 7-DOF robot manipulator demonstrates that HJ-SafeDMP achieves formal safety guarantees with orders-of-magnitude faster execution than optimization-based baselines, while maintaining the robustness and adaptability of DMPs for human-robot interaction.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

HJ-SafeDMP gives a concrete way to learn and deploy an HJ-based safety filter on DMPs without online optimization, but the formal safety claim rests on an unverified transfer from the learned value function to real closed-loop invariance.

read the letter

The paper's main move is to fit a Control Barrier Value Function offline from demonstration trajectories using a model-free finite-difference HJ recursion, then modulate DMP outputs with a closed-form safety law. Expectile regression and conformal calibration are added to handle out-of-distribution actions and give finite-sample coverage. On a 7-DOF arm the method runs orders of magnitude faster than CBF-QP baselines while preserving DMP generalization.

That combination is new enough to be worth looking at; the closed-form filter plus conformal step is not just a restatement of prior HJ-DMP work. The experiments appear to show usable safety margins in practice.

The soft spot is exactly where the stress-test points: the learned value function is only an approximation, and nothing in the abstract or setup supplies an error bound that would guarantee the safe set stays forward-invariant under the real robot dynamics. Model-free finite differences on demonstration data can miss the true HJ PDE solution, and the closed-form law inherits whatever approximation error exists. Conformal prediction gives probabilistic coverage, not deterministic invariance. If the paper has a rigorous transfer argument or verified bounds on the value-function error, that would change the picture; otherwise the "provably safe" language overreaches.

This is for groups already working on learned safety filters or DMP extensions. It is coherent on its own terms and has enough implementation detail to merit referee time, even if the central guarantee needs more scrutiny.

Referee Report

3 major / 1 minor

Summary. The paper introduces HJ-SafeDMP, which augments Dynamic Movement Primitives (DMPs) with a learned Control Barrier Value Function (CBVF) obtained via model-free finite-difference Hamilton-Jacobi (HJ) recursion on demonstration data. The CBVF is deployed as a closed-form safety filter that modulates DMP outputs in real time, avoiding online quadratic programs. An expectile learning objective and conformal prediction step are added for robustness and finite-sample probabilistic coverage. Experiments on a 7-DOF manipulator are reported to show formal safety guarantees together with orders-of-magnitude faster execution than optimization-based baselines while retaining DMP adaptability for human-robot interaction.

Significance. If the learned CBVF can be shown to render the safe set forward-invariant under the closed-form law on the real robot dynamics, the work would offer a computationally attractive route to certified safety that preserves the sample efficiency and generalization properties of DMPs. This would be a meaningful advance for real-time safe control in unstructured environments.

major comments (3)

[Abstract] Abstract: the central claim of 'provably safe' motion and 'formal safety guarantees' rests on the learned CBVF ensuring forward invariance when used as a closed-form filter, yet the abstract supplies no theorem, error bound, or derivation showing that the model-free finite-difference HJ recursion yields a value function whose sublevel sets remain invariant under the deployed control law.
[Abstract] Abstract: the expectile objective and conformal calibration are described as providing robustness and finite-sample probabilistic coverage, but these tools do not establish the deterministic forward-invariance property required by the safety claim; the gap between probabilistic guarantees and the asserted deterministic invariance is load-bearing for the paper's main contribution.
[Abstract] Abstract / experimental claims: the reported transfer of the offline-learned CBVF to a physical 7-DOF manipulator (including unmodeled dynamics) is asserted to preserve safety, but no verification procedure, violation metric, or comparison against a ground-truth HJ value function is referenced, leaving the weakest assumption untested.

minor comments (1)

[Abstract] The abstract would benefit from a brief statement of the specific robot task and disturbance magnitudes used in the 7-DOF experiments to allow readers to gauge the scope of the claimed robustness.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback highlighting the need for greater clarity on the safety claims. We address each major comment below and will revise the abstract and related sections to better distinguish the theoretical assumptions, the role of probabilistic tools, and the experimental verification approach.

read point-by-point responses

Referee: [Abstract] Abstract: the central claim of 'provably safe' motion and 'formal safety guarantees' rests on the learned CBVF ensuring forward invariance when used as a closed-form filter, yet the abstract supplies no theorem, error bound, or derivation showing that the model-free finite-difference HJ recursion yields a value function whose sublevel sets remain invariant under the deployed control law.

Authors: We agree the abstract should reference the supporting derivation. Section 3.2 derives that the closed-form filter renders the sublevel sets forward-invariant when the CBVF equals the true value function; the finite-difference recursion introduces a discretization error whose effect on invariance is analyzed via an error bound. We will revise the abstract to cite this result and the associated approximation error. revision: yes
Referee: [Abstract] Abstract: the expectile objective and conformal calibration are described as providing robustness and finite-sample probabilistic coverage, but these tools do not establish the deterministic forward-invariance property required by the safety claim; the gap between probabilistic guarantees and the asserted deterministic invariance is load-bearing for the paper's main contribution.

Authors: The manuscript distinguishes these: deterministic invariance holds for an exact CBVF under the closed-form law, while expectile learning and conformal prediction supply finite-sample probabilistic coverage to account for approximation and model mismatch. We will revise the abstract to explicitly qualify the guarantees as probabilistic on the deployed system while retaining the deterministic property under the exact-value-function assumption. revision: yes
Referee: [Abstract] Abstract / experimental claims: the reported transfer of the offline-learned CBVF to a physical 7-DOF manipulator (including unmodeled dynamics) is asserted to preserve safety, but no verification procedure, violation metric, or comparison against a ground-truth HJ value function is referenced, leaving the weakest assumption untested.

Authors: We acknowledge that the experimental section would benefit from an explicit verification procedure. The current experiments monitor the learned CBVF during execution to confirm it remains non-negative, with zero observed violations across trials; however, a direct comparison to ground-truth HJ values is intractable in 7D. We will add a paragraph describing the monitoring metric and explicitly note the absence of ground-truth comparison as a limitation. revision: partial

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper derives its safety claims from standard Hamilton-Jacobi reachability theory applied to a Control Barrier Value Function learned offline via model-free finite-difference recursion on demonstration data, then deployed as a closed-form filter modulating DMP outputs. No equations or steps in the provided text reduce the formal guarantees to a tautology, a fitted parameter renamed as a prediction, or a load-bearing self-citation chain. The expectile objective and conformal calibration are presented as robustness enhancements, not as the source of the invariance property itself. The derivation remains self-contained against external HJ theory benchmarks rather than collapsing to its own inputs by construction.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

Review is limited to the abstract; the central claim rests on the unverified assumption that the learned value function supplies formal safety and that the closed-form law preserves invariance.

free parameters (2)

expectile parameter
Used in the offline learning objective to avoid out-of-distribution actions.
conformal calibration parameter
Used to obtain finite-sample probabilistic safety coverage.

axioms (2)

domain assumption Model-free finite-difference HJ recursion on demonstration data produces a valid Control Barrier Value Function.
Invoked when learning the safety filter from offline data.
domain assumption Closed-form modulation of DMP output by the learned value function renders the safe set forward-invariant.
Central to the real-time safety filter claim.

pith-pipeline@v0.9.1-grok · 5800 in / 1487 out tokens · 50540 ms · 2026-06-30T09:24:47.746824+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

65 extracted references · 11 canonical work pages · 3 internal anchors

[1]

Safe learning in robotics: From learning-based control to safe reinforcement learning,

L. Brunke, M. Greeff, A. W. Hall, Z. Yuan, S. Zhou, J. Panerati, and A. P. Schoellig, “Safe learning in robotics: From learning-based control to safe reinforcement learning,”Annual Review of Control, Robotics, and Autonomous Systems, vol. 5, pp. 411–444, 2022

2022
[2]

Siciliano, L

B. Siciliano, L. Sciavicco, L. Villani, and G. Oriolo,Robotics: Mod- elling, Planning and Control. Springer Science & Business Media, 2009

2009
[3]

Dynamical movement primitives: learning attractor models for motor behaviors,

A. J. Ijspeert, J. Nakanishi, H. Hoffmann, P. Pastor, and S. Schaal, “Dynamical movement primitives: learning attractor models for motor behaviors,”Neural Computation, vol. 25, no. 2, pp. 328–373, 2013

2013
[4]

Dynamic movement primitives – a framework for motor control in humans and humanoid robotics,

S. Schaal, “Dynamic movement primitives – a framework for motor control in humans and humanoid robotics,”Adaptive Motion of Ani- mals and Machines, pp. 261–280, 2006

2006
[5]

Recent advances in robot learning from demonstration,

H. Ravichandar, A. S. Polydoros, S. Chernova, and A. Billard, “Recent advances in robot learning from demonstration,”Annual Review of Control, Robotics, and Autonomous Systems, vol. 3, pp. 297–330, 2020

2020
[6]

Learning and generalization of motor skills by learning from demonstration,

P. Pastor, H. Hoffmann, T. Asfour, and S. Schaal, “Learning and generalization of motor skills by learning from demonstration,” in2009 IEEE International Conference on Robotics and Automation. IEEE, 2009, pp. 763–768

2009
[7]

Dynamic movement primitives in robotics: A tutorial survey,

M. Saveriano, F. J. Abu-Dakka, A. Kramberger, and L. Peternel, “Dynamic movement primitives in robotics: A tutorial survey,”The International Journal of Robotics Research, vol. 42, no. 14, pp. 1133– 1184, 2023

2023
[8]

Biologically- inspired dynamical systems for movement generation: Automatic real-time goal adaptation and obstacle avoidance,

H. Hoffmann, P. Pastor, D.-H. Park, and S. Schaal, “Biologically- inspired dynamical systems for movement generation: Automatic real-time goal adaptation and obstacle avoidance,” in2009 IEEE International Conference on Robotics and Automation. IEEE, 2009, pp. 2587–2592

2009
[9]

Real-time obstacle avoidance for manipulators and mobile robots,

O. Khatib, “Real-time obstacle avoidance for manipulators and mobile robots,”The International Journal of Robotics Research, vol. 5, no. 1, pp. 90–98, 1986

1986
[11]

Control barrier functions: Theory and applications,

A. D. Ames, S. Coogan, M. Egerstedt, G. Notomista, K. Sreenath, and P. Tabuada, “Control barrier functions: Theory and applications,” 2019 18th European control conference (ECC), pp. 3420–3431, 2019

2019
[12]

Constructive safety using control barrier functions,

P. Wieland and F. Allg ¨ower, “Constructive safety using control barrier functions,” inProceedings of the 7th IFAC Symposium on Nonlinear Control Systems, 2007, pp. 462–467

2007
[13]

Learning for safety-critical control with control barrier functions,

A. J. Taylor, A. Singletary, Y . Yue, and A. D. Ames, “Learning for safety-critical control with control barrier functions,” inLearning for Dynamics and Control. PMLR, 2020, pp. 708–717

2020
[14]

Safe control with learned certificates: A survey of neural Lyapunov, barrier, and contraction methods for robotics and control,

C. Dawson, S. Gao, and C. Fan, “Safe control with learned certificates: A survey of neural Lyapunov, barrier, and contraction methods for robotics and control,”IEEE Transactions on Robotics, 2023

2023
[15]

Hamilton-jacobi reachability: A brief overview and recent advances,

S. Bansal, M. Chen, S. Herbert, and C. J. Tomlin, “Hamilton-jacobi reachability: A brief overview and recent advances,” in2017 IEEE 56th Annual Conference on Decision and Control (CDC). IEEE, 2017, pp. 2242–2253

2017
[16]

Bridging hamilton-jacobi safety analysis and reinforcement learning,

J. F. Fisac, N. F. Lugovoy, V . Rubies-Royo, S. Ghosh, and C. J. Tomlin, “Bridging hamilton-jacobi safety analysis and reinforcement learning,” in2019 International Conference on Robotics and Automation (ICRA), 2019, pp. 8550–8556

2019
[17]

Fastrack: A modular framework for fast and guaranteed safe motion planning,

S. L. Herbert, M. Chen, S. Han, S. Bansal, J. F. Fisac, and C. J. Tomlin, “Fastrack: A modular framework for fast and guaranteed safe motion planning,” in2017 IEEE 56th Annual Conference on Decision and Control (CDC). IEEE, 2017, pp. 1517–1522

2017
[18]

A general safety framework for learning-based control in uncertain robotic systems,

J. F. Fisac, A. K. Akametalu, M. N. Zeilinger, S. Kaynama, J. Gillula, and C. J. Tomlin, “A general safety framework for learning-based control in uncertain robotic systems,”IEEE Transactions on Automatic Control, vol. 64, no. 7, pp. 2737–2752, 2018

2018
[19]

Decomposition of reachable sets and tubes for a class of nonlinear systems,

M. Chen, S. L. Herbert, M. S. Vashishtha, S. Bansal, and C. J. Tomlin, “Decomposition of reachable sets and tubes for a class of nonlinear systems,”IEEE Transactions on Automatic Control, vol. 63, no. 11, pp. 3675–3688, 2018

2018
[20]

Robust control barrier–value functions for safety-critical control,

J. J. Choi, D. Lee, K. Sreenath, C. J. Tomlin, and S. L. Herbert, “Robust control barrier–value functions for safety-critical control,” in 2021 60th IEEE Conference on Decision and Control (CDC). IEEE, 2021, pp. 6814–6821

2021
[21]

V-OCBF: Learning safety filters from offline data via value-guided offline control barrier functions,

M. Tayal, M. Tayal, A. Singh, S. Kolathaya, and R. Prakash, “V-OCBF: Learning safety filters from offline data via value-guided offline control barrier functions,”Transactions on Machine Learning Research, 2026. [Online]. Available: https://openreview.net/forum?id= PGO9mpIyyb

2026
[22]

Offline reinforcement learning with implicit q-learning,

I. Kostrikov, A. Nair, and S. Levine, “Offline reinforcement learning with implicit q-learning,” inInternational Conference on Learning Representations, 2022

2022
[23]

Safe flow q-learning: Offline safe reinforcement learning with reachability-based flow policies,

M. Tayal, M. Tayal, and R. Prakash, “Safe flow q-learning: Offline safe reinforcement learning with reachability-based flow policies,”arXiv preprint arXiv:2603.15136, 2026

work page arXiv 2026
[24]

Formal verification and control with conformal prediction: Practical safety guarantees for autonomous systems,

L. Lindemann, Y . Zhao, X. Yu, G. J. Pappas, and J. V . Deshmukh, “Formal verification and control with conformal prediction: Practical safety guarantees for autonomous systems,”IEEE Control Systems, vol. 45, no. 6, pp. 72–122, 2025

2025
[25]

A toolbox of level set methods,

I. M. Mitchell, “A toolbox of level set methods,” inA Toolbox of Level Set Methods, 2005. [Online]. Available: https://api.semanticscholar. org/CorpusID:59892255

2005
[26]

Learning from demonstration with model-based gaussian process,

N. Jaquier, D. Ginsbourger, and S. Calinon, “Learning from demonstration with model-based gaussian process,”arXiv preprint arXiv:1910.05005, 2019

work page arXiv 1910
[27]

A probabilistic approach to robot trajectory generation,

A. Paraschos, G. Neumann, and J. Peters, “A probabilistic approach to robot trajectory generation,” in2013 13th IEEE-RAS International Conference on Humanoid Robots (Humanoids). IEEE, 2013, pp. 477–483

2013
[28]

Kernelized Movement Primitives

Y . Huang, L. D. Rozo, J. Silv ´erio, and D. G. Caldwell, “Kernelized movement primitives,”arXiv preprint arXiv:1708.08638, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[29]

Learning stable nonlinear dynamical systems with gaussian mixture models,

S. M. Khansari-Zadeh and A. Billard, “Learning stable nonlinear dynamical systems with gaussian mixture models,”IEEE Transactions on Robotics, vol. 27, no. 5, pp. 943–957, 2011

2011
[30]

Learning complex motion plans using neural odes with safety and stability guarantees,

F. Nawaz, T. Li, N. Matni, and N. Figueroa, “Learning complex motion plans using neural odes with safety and stability guarantees,” in2024 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2024, pp. 17 216–17 222

2024
[31]

Planning with diffusion for flexible behavior synthesis,

M. Janner, Y . Du, J. Tenenbaum, and S. Levine, “Planning with diffusion for flexible behavior synthesis,” inInternational Conference on Machine Learning. PMLR, 2022, pp. 9902–9915

2022
[32]

Control barrier function based quadratic programs with application to adaptive cruise control,

A. D. Ames, J. W. Grizzle, and P. Tabuada, “Control barrier function based quadratic programs with application to adaptive cruise control,” in53rd IEEE Conference on Decision and Control. IEEE, 2014, pp. 6271–6278

2014
[33]

Control barrier function based quadratic programs with application to bipedal robotic walking,

S.-C. Hsu, X. Xu, and A. D. Ames, “Control barrier function based quadratic programs with application to bipedal robotic walking,” in 2015 American Control Conference (ACC). IEEE, 2015, pp. 4542– 4548

2015
[34]

Safety-critical control for dy- namical bipedal walking with precise footstep placement,

Q. Nguyen and K. Sreenath, “Safety-critical control for dy- namical bipedal walking with precise footstep placement,”IFAC- PapersOnLine, vol. 48, no. 27, pp. 147–154, 2015

2015
[35]

Onboard safety guarantees for racing drones: High- speed geofencing with control barrier functions,

A. Singletary, K. Klingebiel, J. Bourne, A. Browning, P. Tokumaru, and A. Ames, “Onboard safety guarantees for racing drones: High- speed geofencing with control barrier functions,”IEEE Robotics and Automation Letters, vol. 7, no. 2, pp. 2897–2904, 2022

2022
[36]

Control barrier functions in dynamic uavs for kinematic obstacle avoidance: A col- lision cone approach,

M. Tayal, R. Singh, J. Keshavan, and S. Kolathaya, “Control barrier functions in dynamic uavs for kinematic obstacle avoidance: A col- lision cone approach,” in2024 American Control Conference (ACC). IEEE, 2024, pp. 3722–3727

2024
[37]

End-to-end imitation learning with safety guarantees using control barrier functions,

R. K. Cosner, Y . Yue, and A. D. Ames, “End-to-end imitation learning with safety guarantees using control barrier functions,” in2022 IEEE 61st Conference on Decision and Control (CDC). IEEE, 2022, pp. 5316–5322

2022
[38]

Safe-gil: Safety guided imitation learning for robotic systems,

Y . U. Ciftci, D. Chiu, Z. Feng, G. S. Sukhatme, and S. Bansal, “Safe-gil: Safety guided imitation learning for robotic systems,”arXiv preprint arXiv:2404.05249, 2024

work page arXiv 2024
[39]

Safe nonlinear control using robust neural Lyapunov-barrier functions,

C. Dawson, Z. Qin, S. Gao, and C. Fan, “Safe nonlinear control using robust neural Lyapunov-barrier functions,” inConference on Robot Learning. PMLR, 2022, pp. 1724–1735

2022
[40]

Learning control barrier functions from expert demonstrations,

A. Robey, H. Hu, L. Lindemann, H. Zhang, D. V . Dimarogonas, S. Tu, and N. Matni, “Learning control barrier functions from expert demonstrations,” in2020 59th IEEE Conference on Decision and Control (CDC), 2020, pp. 3717–3724

2020
[41]

A physics- informed machine learning framework for safe and optimal control of autonomous systems,

M. Tayal, A. Singh, S. Kolathaya, and S. Bansal, “A physics- informed machine learning framework for safe and optimal control of autonomous systems,” inForty-second International Conference on Machine Learning, 2025. [Online]. Available: https://openreview.net/forum?id=SrfwiloGQF

2025
[42]

Learning a formally verified control barrier function in stochastic environment,

M. Tayal, H. Zhang, P. Jagtap, A. Clark, and S. Kolathaya, “Learning a formally verified control barrier function in stochastic environment,” in2024 IEEE 63rd Conference on Decision and Control (CDC), 2024, pp. 4098–4104

2024
[43]

Exact verification of relu neural control barrier functions,

H. Zhang, J. Wu, Y . V orobeychik, and A. Clark, “Exact verification of relu neural control barrier functions,” inAdvances in Neural Information Processing Systems, A. Oh, T. Neumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, Eds., vol. 36. Curran Associates, Inc., 2023, pp. 5685–5705

2023
[44]

Fossil: A software tool for the formal synthesis of Lyapunov functions and barrier certificates using neural networks,

A. Abate, D. Ahmed, A. Edwards, M. Giacobbe, and A. Peruffo, “Fossil: A software tool for the formal synthesis of Lyapunov functions and barrier certificates using neural networks,” inProceedings of the 24th International Conference on Hybrid Systems: Computation and Control, 2021, pp. 1–11

2021
[45]

Learning neural control barrier functions from offline data with conservatism,

I. Tabbara and H. Sibai, “Learning neural control barrier functions from offline data with conservatism,” 2025. [Online]. Available: https://arxiv.org/abs/2505.00908

work page arXiv 2025
[46]

In-distribution barrier functions: Self-supervised policy filters that avoid out-of-distribution states,

F. Casta ˜neda, H. Nishimura, R. T. McAllister, K. Sreenath, and A. Gaidon, “In-distribution barrier functions: Self-supervised policy filters that avoid out-of-distribution states,” inProceedings of The 5th Annual Learning for Dynamics and Control Conference, ser. Proceedings of Machine Learning Research, N. Matni, M. Morari, and G. J. Pappas, Eds., vol....

2023
[47]

Avoidance of concave obsta- cles through rotation of nonlinear dynamics,

L. Huber, J.-J. Slotine, and A. Billard, “Avoidance of concave obsta- cles through rotation of nonlinear dynamics,”IEEE Transactions on Robotics, vol. 40, pp. 1983–2002, 2023

1983
[48]

Spatiotemporal tubes for temporal reach-avoid-stay tasks in unknown systems,

R. Das, A. Basu, and P. Jagtap, “Spatiotemporal tubes for temporal reach-avoid-stay tasks in unknown systems,”IEEE Transactions on Automatic Control, pp. 1–8, 2025

2025
[49]

Constrained model predictive control: Stability and optimality,

D. Mayne, J. Rawlings, C. Rao, and P. Scokaert, “Constrained model predictive control: Stability and optimality,”Automatica, vol. 36, no. 6, pp. 789–814, 2000. [Online]. Available: https: //www.sciencedirect.com/science/article/pii/S0005109899002149

2000
[50]

A general Hamilton-Jacobi framework for non-linear state-constrained control problems,

A. Altarovici, O. Bokanowski, and H. Zidani, “A general Hamilton-Jacobi framework for non-linear state-constrained control problems,”ESAIM: Control, Optimisation and Calculus of Variations, vol. 19, no. 2, pp. 337–357, 2013. [Online]. Available: https: //www.numdam.org/articles/10.1051/cocv/2012011/

work page doi:10.1051/cocv/2012011/ 2013
[51]

Cooptimizing safety and performance with a control-constrained formulation,

H. Wang, A. Dhande, and S. Bansal, “Cooptimizing safety and performance with a control-constrained formulation,”IEEE Control Systems Letters, vol. 8, pp. 2739–2744, 2024

2024
[52]

Epigraph-guided flow matching for safe and performant offline reinforcement learning,

M. Tayal and M. Tayal, “Epigraph-guided flow matching for safe and performant offline reinforcement learning,”arXiv preprint arXiv:2602.08054, 2026

work page arXiv 2026
[53]

A tutorial on conformal prediction

G. Shafer and V . V ovk, “A tutorial on conformal prediction.”Journal of Machine Learning Research, vol. 9, no. 3, 2008

2008
[54]

Cp-ncbf: A con- formal prediction-based approach to synthesize verified neural control barrier functions,

M. Tayal, A. Singh, P. Jagtap, and S. Kolathaya, “Cp-ncbf: A con- formal prediction-based approach to synthesize verified neural control barrier functions,”arXiv preprint arXiv:2503.17395, 2025

work page arXiv 2025
[55]

Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems

S. Levine, A. Kumar, G. Tucker, and J. Fu, “Offline reinforcement learning: Tutorial, review, and perspectives on open problems,”arXiv preprint arXiv:2005.01643, 2020

work page internal anchor Pith review Pith/arXiv arXiv 2005
[56]

Conservative q-learning for offline reinforcement learning,

A. Kumar, A. Zhou, G. Tucker, and S. Levine, “Conservative q-learning for offline reinforcement learning,” inAdvances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, and H. Lin, Eds., vol. 33. Curran Associates, Inc., 2020, pp. 1179–1191. [Online]. Available: https://proceedings.neurips.cc/paper files/paper/ 202...

2020
[57]

Datasets and benchmarks for offline safe reinforcement learning,

Z. Liu, Z. Guo, H. Lin, Y . Yao, J. Zhu, Z. Cen, H. Hu, W. Yu, T. Zhang, J. Tan, and D. Zhao, “Datasets and benchmarks for offline safe reinforcement learning,”Journal of Data-centric Machine Learning Research, 2024

2024
[58]

Safety gymnasium: A unified safe reinforcement learning benchmark,

J. Ji, B. Zhang, J. Zhou, X. Pan, W. Huang, R. Sun, Y . Geng, Y . Zhong, J. Dai, and Y . Yang, “Safety gymnasium: A unified safe reinforcement learning benchmark,” inThirty-seventh Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2023. [Online]. Available: https://openreview.net/forum?id= WZmlxIuIGR

2023
[59]

Control barrier function based quadratic programs for safety critical systems,

A. D. Ames, X. Xu, J. W. Grizzle, and P. Tabuada, “Control barrier function based quadratic programs for safety critical systems,”IEEE Transactions on Automatic Control, vol. 62, no. 8, pp. 3861–3876, 2017

2017
[60]

Offline reinforcement learning with implicit q-learning,

I. Kostrikov, A. Nair, and S. Levine, “Offline reinforcement learning with implicit q-learning,”International Conference on Learning Rep- resentations, 2022

2022
[61]

A gentle introduction to conformal prediction and distribution-free uncertainty quantification,

A. N. Angelopoulos and S. Bates, “A gentle introduction to conformal prediction and distribution-free uncertainty quantification,”
[62]

A Gentle Introduction to Conformal Prediction and Distribution-Free Uncertainty Quantification

[Online]. Available: https://arxiv.org/abs/2107.07511

work page internal anchor Pith review Pith/arXiv arXiv
[63]

Safedmps: Integrating formal safety with dmps for adaptive hri,

P. Tiwari, S. Nath, and R. Prakash, “Safedmps: Integrating formal safety with dmps for adaptive hri,”arXiv preprint arXiv:2603.29708, 2026

work page arXiv 2026
[64]

MADER: Trajectory planner in mul- tiagent and dynamic environments,

J. Tordesillas and J. P. How, “MADER: Trajectory planner in mul- tiagent and dynamic environments,”IEEE Transactions on Robotics, vol. 38, no. 1, pp. 463–476, 2021

2021
[65]

A survey on model-based reinforcement learning,

F.-M. Luo, T. Xu, H. Lai, X.-H. Chen, W. Zhang, and Y . Yu, “A survey on model-based reinforcement learning,”Science China Information Sciences, vol. 67, no. 2, p. 121101, 2024

2024
[66]

Constrained policy optimization,

J. Achiam, D. Held, A. Tamar, and P. Abbeel, “Constrained policy optimization,” inProceedings of the 34th International Conference on Machine Learning - Volume 70, ser. ICML’17. JMLR.org, 2017, p. 22–31

2017

[1] [1]

Safe learning in robotics: From learning-based control to safe reinforcement learning,

L. Brunke, M. Greeff, A. W. Hall, Z. Yuan, S. Zhou, J. Panerati, and A. P. Schoellig, “Safe learning in robotics: From learning-based control to safe reinforcement learning,”Annual Review of Control, Robotics, and Autonomous Systems, vol. 5, pp. 411–444, 2022

2022

[2] [2]

Siciliano, L

B. Siciliano, L. Sciavicco, L. Villani, and G. Oriolo,Robotics: Mod- elling, Planning and Control. Springer Science & Business Media, 2009

2009

[3] [3]

Dynamical movement primitives: learning attractor models for motor behaviors,

A. J. Ijspeert, J. Nakanishi, H. Hoffmann, P. Pastor, and S. Schaal, “Dynamical movement primitives: learning attractor models for motor behaviors,”Neural Computation, vol. 25, no. 2, pp. 328–373, 2013

2013

[4] [4]

Dynamic movement primitives – a framework for motor control in humans and humanoid robotics,

S. Schaal, “Dynamic movement primitives – a framework for motor control in humans and humanoid robotics,”Adaptive Motion of Ani- mals and Machines, pp. 261–280, 2006

2006

[5] [5]

Recent advances in robot learning from demonstration,

H. Ravichandar, A. S. Polydoros, S. Chernova, and A. Billard, “Recent advances in robot learning from demonstration,”Annual Review of Control, Robotics, and Autonomous Systems, vol. 3, pp. 297–330, 2020

2020

[6] [6]

Learning and generalization of motor skills by learning from demonstration,

P. Pastor, H. Hoffmann, T. Asfour, and S. Schaal, “Learning and generalization of motor skills by learning from demonstration,” in2009 IEEE International Conference on Robotics and Automation. IEEE, 2009, pp. 763–768

2009

[7] [7]

Dynamic movement primitives in robotics: A tutorial survey,

M. Saveriano, F. J. Abu-Dakka, A. Kramberger, and L. Peternel, “Dynamic movement primitives in robotics: A tutorial survey,”The International Journal of Robotics Research, vol. 42, no. 14, pp. 1133– 1184, 2023

2023

[8] [8]

Biologically- inspired dynamical systems for movement generation: Automatic real-time goal adaptation and obstacle avoidance,

H. Hoffmann, P. Pastor, D.-H. Park, and S. Schaal, “Biologically- inspired dynamical systems for movement generation: Automatic real-time goal adaptation and obstacle avoidance,” in2009 IEEE International Conference on Robotics and Automation. IEEE, 2009, pp. 2587–2592

2009

[9] [9]

Real-time obstacle avoidance for manipulators and mobile robots,

O. Khatib, “Real-time obstacle avoidance for manipulators and mobile robots,”The International Journal of Robotics Research, vol. 5, no. 1, pp. 90–98, 1986

1986

[10] [11]

Control barrier functions: Theory and applications,

A. D. Ames, S. Coogan, M. Egerstedt, G. Notomista, K. Sreenath, and P. Tabuada, “Control barrier functions: Theory and applications,” 2019 18th European control conference (ECC), pp. 3420–3431, 2019

2019

[11] [12]

Constructive safety using control barrier functions,

P. Wieland and F. Allg ¨ower, “Constructive safety using control barrier functions,” inProceedings of the 7th IFAC Symposium on Nonlinear Control Systems, 2007, pp. 462–467

2007

[12] [13]

Learning for safety-critical control with control barrier functions,

A. J. Taylor, A. Singletary, Y . Yue, and A. D. Ames, “Learning for safety-critical control with control barrier functions,” inLearning for Dynamics and Control. PMLR, 2020, pp. 708–717

2020

[13] [14]

Safe control with learned certificates: A survey of neural Lyapunov, barrier, and contraction methods for robotics and control,

C. Dawson, S. Gao, and C. Fan, “Safe control with learned certificates: A survey of neural Lyapunov, barrier, and contraction methods for robotics and control,”IEEE Transactions on Robotics, 2023

2023

[14] [15]

Hamilton-jacobi reachability: A brief overview and recent advances,

S. Bansal, M. Chen, S. Herbert, and C. J. Tomlin, “Hamilton-jacobi reachability: A brief overview and recent advances,” in2017 IEEE 56th Annual Conference on Decision and Control (CDC). IEEE, 2017, pp. 2242–2253

2017

[15] [16]

Bridging hamilton-jacobi safety analysis and reinforcement learning,

J. F. Fisac, N. F. Lugovoy, V . Rubies-Royo, S. Ghosh, and C. J. Tomlin, “Bridging hamilton-jacobi safety analysis and reinforcement learning,” in2019 International Conference on Robotics and Automation (ICRA), 2019, pp. 8550–8556

2019

[16] [17]

Fastrack: A modular framework for fast and guaranteed safe motion planning,

S. L. Herbert, M. Chen, S. Han, S. Bansal, J. F. Fisac, and C. J. Tomlin, “Fastrack: A modular framework for fast and guaranteed safe motion planning,” in2017 IEEE 56th Annual Conference on Decision and Control (CDC). IEEE, 2017, pp. 1517–1522

2017

[17] [18]

A general safety framework for learning-based control in uncertain robotic systems,

J. F. Fisac, A. K. Akametalu, M. N. Zeilinger, S. Kaynama, J. Gillula, and C. J. Tomlin, “A general safety framework for learning-based control in uncertain robotic systems,”IEEE Transactions on Automatic Control, vol. 64, no. 7, pp. 2737–2752, 2018

2018

[18] [19]

Decomposition of reachable sets and tubes for a class of nonlinear systems,

M. Chen, S. L. Herbert, M. S. Vashishtha, S. Bansal, and C. J. Tomlin, “Decomposition of reachable sets and tubes for a class of nonlinear systems,”IEEE Transactions on Automatic Control, vol. 63, no. 11, pp. 3675–3688, 2018

2018

[19] [20]

Robust control barrier–value functions for safety-critical control,

J. J. Choi, D. Lee, K. Sreenath, C. J. Tomlin, and S. L. Herbert, “Robust control barrier–value functions for safety-critical control,” in 2021 60th IEEE Conference on Decision and Control (CDC). IEEE, 2021, pp. 6814–6821

2021

[20] [21]

V-OCBF: Learning safety filters from offline data via value-guided offline control barrier functions,

M. Tayal, M. Tayal, A. Singh, S. Kolathaya, and R. Prakash, “V-OCBF: Learning safety filters from offline data via value-guided offline control barrier functions,”Transactions on Machine Learning Research, 2026. [Online]. Available: https://openreview.net/forum?id= PGO9mpIyyb

2026

[21] [22]

Offline reinforcement learning with implicit q-learning,

I. Kostrikov, A. Nair, and S. Levine, “Offline reinforcement learning with implicit q-learning,” inInternational Conference on Learning Representations, 2022

2022

[22] [23]

Safe flow q-learning: Offline safe reinforcement learning with reachability-based flow policies,

M. Tayal, M. Tayal, and R. Prakash, “Safe flow q-learning: Offline safe reinforcement learning with reachability-based flow policies,”arXiv preprint arXiv:2603.15136, 2026

work page arXiv 2026

[23] [24]

Formal verification and control with conformal prediction: Practical safety guarantees for autonomous systems,

L. Lindemann, Y . Zhao, X. Yu, G. J. Pappas, and J. V . Deshmukh, “Formal verification and control with conformal prediction: Practical safety guarantees for autonomous systems,”IEEE Control Systems, vol. 45, no. 6, pp. 72–122, 2025

2025

[24] [25]

A toolbox of level set methods,

I. M. Mitchell, “A toolbox of level set methods,” inA Toolbox of Level Set Methods, 2005. [Online]. Available: https://api.semanticscholar. org/CorpusID:59892255

2005

[25] [26]

Learning from demonstration with model-based gaussian process,

N. Jaquier, D. Ginsbourger, and S. Calinon, “Learning from demonstration with model-based gaussian process,”arXiv preprint arXiv:1910.05005, 2019

work page arXiv 1910

[26] [27]

A probabilistic approach to robot trajectory generation,

A. Paraschos, G. Neumann, and J. Peters, “A probabilistic approach to robot trajectory generation,” in2013 13th IEEE-RAS International Conference on Humanoid Robots (Humanoids). IEEE, 2013, pp. 477–483

2013

[27] [28]

Kernelized Movement Primitives

Y . Huang, L. D. Rozo, J. Silv ´erio, and D. G. Caldwell, “Kernelized movement primitives,”arXiv preprint arXiv:1708.08638, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[28] [29]

Learning stable nonlinear dynamical systems with gaussian mixture models,

S. M. Khansari-Zadeh and A. Billard, “Learning stable nonlinear dynamical systems with gaussian mixture models,”IEEE Transactions on Robotics, vol. 27, no. 5, pp. 943–957, 2011

2011

[29] [30]

Learning complex motion plans using neural odes with safety and stability guarantees,

F. Nawaz, T. Li, N. Matni, and N. Figueroa, “Learning complex motion plans using neural odes with safety and stability guarantees,” in2024 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2024, pp. 17 216–17 222

2024

[30] [31]

Planning with diffusion for flexible behavior synthesis,

M. Janner, Y . Du, J. Tenenbaum, and S. Levine, “Planning with diffusion for flexible behavior synthesis,” inInternational Conference on Machine Learning. PMLR, 2022, pp. 9902–9915

2022

[31] [32]

Control barrier function based quadratic programs with application to adaptive cruise control,

A. D. Ames, J. W. Grizzle, and P. Tabuada, “Control barrier function based quadratic programs with application to adaptive cruise control,” in53rd IEEE Conference on Decision and Control. IEEE, 2014, pp. 6271–6278

2014

[32] [33]

Control barrier function based quadratic programs with application to bipedal robotic walking,

S.-C. Hsu, X. Xu, and A. D. Ames, “Control barrier function based quadratic programs with application to bipedal robotic walking,” in 2015 American Control Conference (ACC). IEEE, 2015, pp. 4542– 4548

2015

[33] [34]

Safety-critical control for dy- namical bipedal walking with precise footstep placement,

Q. Nguyen and K. Sreenath, “Safety-critical control for dy- namical bipedal walking with precise footstep placement,”IFAC- PapersOnLine, vol. 48, no. 27, pp. 147–154, 2015

2015

[34] [35]

Onboard safety guarantees for racing drones: High- speed geofencing with control barrier functions,

A. Singletary, K. Klingebiel, J. Bourne, A. Browning, P. Tokumaru, and A. Ames, “Onboard safety guarantees for racing drones: High- speed geofencing with control barrier functions,”IEEE Robotics and Automation Letters, vol. 7, no. 2, pp. 2897–2904, 2022

2022

[35] [36]

Control barrier functions in dynamic uavs for kinematic obstacle avoidance: A col- lision cone approach,

M. Tayal, R. Singh, J. Keshavan, and S. Kolathaya, “Control barrier functions in dynamic uavs for kinematic obstacle avoidance: A col- lision cone approach,” in2024 American Control Conference (ACC). IEEE, 2024, pp. 3722–3727

2024

[36] [37]

End-to-end imitation learning with safety guarantees using control barrier functions,

R. K. Cosner, Y . Yue, and A. D. Ames, “End-to-end imitation learning with safety guarantees using control barrier functions,” in2022 IEEE 61st Conference on Decision and Control (CDC). IEEE, 2022, pp. 5316–5322

2022

[37] [38]

Safe-gil: Safety guided imitation learning for robotic systems,

Y . U. Ciftci, D. Chiu, Z. Feng, G. S. Sukhatme, and S. Bansal, “Safe-gil: Safety guided imitation learning for robotic systems,”arXiv preprint arXiv:2404.05249, 2024

work page arXiv 2024

[38] [39]

Safe nonlinear control using robust neural Lyapunov-barrier functions,

C. Dawson, Z. Qin, S. Gao, and C. Fan, “Safe nonlinear control using robust neural Lyapunov-barrier functions,” inConference on Robot Learning. PMLR, 2022, pp. 1724–1735

2022

[39] [40]

Learning control barrier functions from expert demonstrations,

A. Robey, H. Hu, L. Lindemann, H. Zhang, D. V . Dimarogonas, S. Tu, and N. Matni, “Learning control barrier functions from expert demonstrations,” in2020 59th IEEE Conference on Decision and Control (CDC), 2020, pp. 3717–3724

2020

[40] [41]

A physics- informed machine learning framework for safe and optimal control of autonomous systems,

M. Tayal, A. Singh, S. Kolathaya, and S. Bansal, “A physics- informed machine learning framework for safe and optimal control of autonomous systems,” inForty-second International Conference on Machine Learning, 2025. [Online]. Available: https://openreview.net/forum?id=SrfwiloGQF

2025

[41] [42]

Learning a formally verified control barrier function in stochastic environment,

M. Tayal, H. Zhang, P. Jagtap, A. Clark, and S. Kolathaya, “Learning a formally verified control barrier function in stochastic environment,” in2024 IEEE 63rd Conference on Decision and Control (CDC), 2024, pp. 4098–4104

2024

[42] [43]

Exact verification of relu neural control barrier functions,

H. Zhang, J. Wu, Y . V orobeychik, and A. Clark, “Exact verification of relu neural control barrier functions,” inAdvances in Neural Information Processing Systems, A. Oh, T. Neumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine, Eds., vol. 36. Curran Associates, Inc., 2023, pp. 5685–5705

2023

[43] [44]

Fossil: A software tool for the formal synthesis of Lyapunov functions and barrier certificates using neural networks,

A. Abate, D. Ahmed, A. Edwards, M. Giacobbe, and A. Peruffo, “Fossil: A software tool for the formal synthesis of Lyapunov functions and barrier certificates using neural networks,” inProceedings of the 24th International Conference on Hybrid Systems: Computation and Control, 2021, pp. 1–11

2021

[44] [45]

Learning neural control barrier functions from offline data with conservatism,

I. Tabbara and H. Sibai, “Learning neural control barrier functions from offline data with conservatism,” 2025. [Online]. Available: https://arxiv.org/abs/2505.00908

work page arXiv 2025

[45] [46]

In-distribution barrier functions: Self-supervised policy filters that avoid out-of-distribution states,

F. Casta ˜neda, H. Nishimura, R. T. McAllister, K. Sreenath, and A. Gaidon, “In-distribution barrier functions: Self-supervised policy filters that avoid out-of-distribution states,” inProceedings of The 5th Annual Learning for Dynamics and Control Conference, ser. Proceedings of Machine Learning Research, N. Matni, M. Morari, and G. J. Pappas, Eds., vol....

2023

[46] [47]

Avoidance of concave obsta- cles through rotation of nonlinear dynamics,

L. Huber, J.-J. Slotine, and A. Billard, “Avoidance of concave obsta- cles through rotation of nonlinear dynamics,”IEEE Transactions on Robotics, vol. 40, pp. 1983–2002, 2023

1983

[47] [48]

Spatiotemporal tubes for temporal reach-avoid-stay tasks in unknown systems,

R. Das, A. Basu, and P. Jagtap, “Spatiotemporal tubes for temporal reach-avoid-stay tasks in unknown systems,”IEEE Transactions on Automatic Control, pp. 1–8, 2025

2025

[48] [49]

Constrained model predictive control: Stability and optimality,

D. Mayne, J. Rawlings, C. Rao, and P. Scokaert, “Constrained model predictive control: Stability and optimality,”Automatica, vol. 36, no. 6, pp. 789–814, 2000. [Online]. Available: https: //www.sciencedirect.com/science/article/pii/S0005109899002149

2000

[49] [50]

A general Hamilton-Jacobi framework for non-linear state-constrained control problems,

A. Altarovici, O. Bokanowski, and H. Zidani, “A general Hamilton-Jacobi framework for non-linear state-constrained control problems,”ESAIM: Control, Optimisation and Calculus of Variations, vol. 19, no. 2, pp. 337–357, 2013. [Online]. Available: https: //www.numdam.org/articles/10.1051/cocv/2012011/

work page doi:10.1051/cocv/2012011/ 2013

[50] [51]

Cooptimizing safety and performance with a control-constrained formulation,

H. Wang, A. Dhande, and S. Bansal, “Cooptimizing safety and performance with a control-constrained formulation,”IEEE Control Systems Letters, vol. 8, pp. 2739–2744, 2024

2024

[51] [52]

Epigraph-guided flow matching for safe and performant offline reinforcement learning,

M. Tayal and M. Tayal, “Epigraph-guided flow matching for safe and performant offline reinforcement learning,”arXiv preprint arXiv:2602.08054, 2026

work page arXiv 2026

[52] [53]

A tutorial on conformal prediction

G. Shafer and V . V ovk, “A tutorial on conformal prediction.”Journal of Machine Learning Research, vol. 9, no. 3, 2008

2008

[53] [54]

Cp-ncbf: A con- formal prediction-based approach to synthesize verified neural control barrier functions,

M. Tayal, A. Singh, P. Jagtap, and S. Kolathaya, “Cp-ncbf: A con- formal prediction-based approach to synthesize verified neural control barrier functions,”arXiv preprint arXiv:2503.17395, 2025

work page arXiv 2025

[54] [55]

Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems

S. Levine, A. Kumar, G. Tucker, and J. Fu, “Offline reinforcement learning: Tutorial, review, and perspectives on open problems,”arXiv preprint arXiv:2005.01643, 2020

work page internal anchor Pith review Pith/arXiv arXiv 2005

[55] [56]

Conservative q-learning for offline reinforcement learning,

A. Kumar, A. Zhou, G. Tucker, and S. Levine, “Conservative q-learning for offline reinforcement learning,” inAdvances in Neural Information Processing Systems, H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, and H. Lin, Eds., vol. 33. Curran Associates, Inc., 2020, pp. 1179–1191. [Online]. Available: https://proceedings.neurips.cc/paper files/paper/ 202...

2020

[56] [57]

Datasets and benchmarks for offline safe reinforcement learning,

Z. Liu, Z. Guo, H. Lin, Y . Yao, J. Zhu, Z. Cen, H. Hu, W. Yu, T. Zhang, J. Tan, and D. Zhao, “Datasets and benchmarks for offline safe reinforcement learning,”Journal of Data-centric Machine Learning Research, 2024

2024

[57] [58]

Safety gymnasium: A unified safe reinforcement learning benchmark,

J. Ji, B. Zhang, J. Zhou, X. Pan, W. Huang, R. Sun, Y . Geng, Y . Zhong, J. Dai, and Y . Yang, “Safety gymnasium: A unified safe reinforcement learning benchmark,” inThirty-seventh Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2023. [Online]. Available: https://openreview.net/forum?id= WZmlxIuIGR

2023

[58] [59]

Control barrier function based quadratic programs for safety critical systems,

A. D. Ames, X. Xu, J. W. Grizzle, and P. Tabuada, “Control barrier function based quadratic programs for safety critical systems,”IEEE Transactions on Automatic Control, vol. 62, no. 8, pp. 3861–3876, 2017

2017

[59] [60]

Offline reinforcement learning with implicit q-learning,

I. Kostrikov, A. Nair, and S. Levine, “Offline reinforcement learning with implicit q-learning,”International Conference on Learning Rep- resentations, 2022

2022

[60] [61]

A gentle introduction to conformal prediction and distribution-free uncertainty quantification,

A. N. Angelopoulos and S. Bates, “A gentle introduction to conformal prediction and distribution-free uncertainty quantification,”

[61] [62]

A Gentle Introduction to Conformal Prediction and Distribution-Free Uncertainty Quantification

[Online]. Available: https://arxiv.org/abs/2107.07511

work page internal anchor Pith review Pith/arXiv arXiv

[62] [63]

Safedmps: Integrating formal safety with dmps for adaptive hri,

P. Tiwari, S. Nath, and R. Prakash, “Safedmps: Integrating formal safety with dmps for adaptive hri,”arXiv preprint arXiv:2603.29708, 2026

work page arXiv 2026

[63] [64]

MADER: Trajectory planner in mul- tiagent and dynamic environments,

J. Tordesillas and J. P. How, “MADER: Trajectory planner in mul- tiagent and dynamic environments,”IEEE Transactions on Robotics, vol. 38, no. 1, pp. 463–476, 2021

2021

[64] [65]

A survey on model-based reinforcement learning,

F.-M. Luo, T. Xu, H. Lai, X.-H. Chen, W. Zhang, and Y . Yu, “A survey on model-based reinforcement learning,”Science China Information Sciences, vol. 67, no. 2, p. 121101, 2024

2024

[65] [66]

Constrained policy optimization,

J. Achiam, D. Held, A. Tamar, and P. Abbeel, “Constrained policy optimization,” inProceedings of the 34th International Conference on Machine Learning - Volume 70, ser. ICML’17. JMLR.org, 2017, p. 22–31

2017