Leveraging Analytic Gradients in Provably Safe Reinforcement Learning

Hannah Markgraf; Jonathan Külz; Matthias Althoff; Tim Walter

arXiv: 2506.01665 · v4 · submitted 2025-06-02 · cs.LG · cs.AI · cs.RO


Pith reviewed 2026-05-19 11:01 UTC · model grok-4.3


keywords: provably safe reinforcement learning · analytic gradients · differentiable safeguards · control tasks · safety guarantees · gradient-based RL · differentiable simulation


The pith

Analytic gradient-based reinforcement learning can now use adapted differentiable safeguards to guarantee safety during training.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops the first effective safeguard for analytic gradient-based reinforcement learning to provide safety guarantees in safety-critical applications such as autonomous robots. It analyzes existing differentiable safeguards and adapts them using modified mappings and gradient formulations before integrating the result into a state-of-the-art learning algorithm and a differentiable simulation. Numerical experiments on three control tasks show that the safeguarded training proceeds without compromising performance. This closes a gap that previously existed between sampling-based and analytic gradient-based safe reinforcement learning.

Core claim

By adapting existing differentiable safeguards through modified mappings and gradient formulations, it becomes possible to integrate them into analytic gradient-based reinforcement learning algorithms and differentiable simulators, yielding the first effective safeguard for this paradigm that preserves safety properties while maintaining learning performance on control tasks.
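To make the integration concrete, the following is a minimal sketch of an analytic gradient through a safeguarded rollout. It is not the paper's algorithm: the scalar dynamics, the linear policy with parameter `gain`, the interval safe action set, and the function name `rollout_cost_and_grad` are all hypothetical choices made for illustration. The derivative of the cost with respect to the policy parameter is accumulated by the chain rule through policy, safeguard, and simulator step.

```python
def clip(x, lo, hi):
    """Projection of a scalar onto the interval [lo, hi]."""
    return max(lo, min(hi, x))


def rollout_cost_and_grad(gain, s0, steps=5, dt=0.1, a_max=1.0):
    """Return the rollout cost and its analytic derivative w.r.t. `gain`.

    Hypothetical setup: dynamics s' = s + dt * a, policy a = -gain * s,
    safeguard = clipping to [-a_max, a_max], cost = sum of squared states.
    """
    s, ds_dk = s0, 0.0            # state and its sensitivity d s / d gain
    cost, dcost_dk = 0.0, 0.0
    for _ in range(steps):
        a_raw = -gain * s                     # raw policy output
        da_raw_dk = -s - gain * ds_dk         # chain rule through the policy
        a_safe = clip(a_raw, -a_max, a_max)   # safeguarded action
        # Derivative of the clip: 1 strictly inside the interval, 0 otherwise
        # (the value at the kink itself is a convention, chosen as 0 here).
        dclip = 1.0 if -a_max < a_raw < a_max else 0.0
        da_safe_dk = dclip * da_raw_dk
        s = s + dt * a_safe                   # differentiable simulator step
        ds_dk = ds_dk + dt * da_safe_dk       # sensitivity of the new state
        cost += s * s
        dcost_dk += 2.0 * s * ds_dk
    return cost, dcost_dk


# Safeguard inactive: the gradient carries information about the policy.
cost_free, grad_free = rollout_cost_and_grad(gain=1.0, s0=0.5)
# Safeguard active at every step: the plain chain rule returns a zero gradient.
cost_sat, grad_sat = rollout_cost_and_grad(gain=1.0, s0=5.0)
```

In this toy setting the safeguarded action is always inside the interval, while the gradient signal disappears whenever the safeguard is active. Whether and how the paper's gradient formulations address this effect is not shown here.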

What carries the argument

Modified mappings and gradient formulations that adapt differentiable safeguards for analytic gradient-based training and differentiable simulation.
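As an illustration of what a differentiable safeguard and its gradient look like, the sketch below implements boundary projection (mapping an unsafe action to the closest safe action) for a ball-shaped safe action set, together with its Jacobian. The ball shape and the function names `project_to_ball` and `projection_jacobian` are assumptions for this example; the paper's safe sets and its modified mappings are not reproduced here.

```python
import math


def project_to_ball(action, center, radius):
    """Map an action to the closest point of a ball-shaped safe set."""
    diff = [a - c for a, c in zip(action, center)]
    dist = math.sqrt(sum(d * d for d in diff))
    if dist <= radius:
        return list(action)  # already safe, returned unchanged
    scale = radius / dist
    return [c + scale * d for c, d in zip(center, diff)]


def projection_jacobian(action, center, radius):
    """Jacobian of project_to_ball (identity is returned on the boundary)."""
    n = len(action)
    diff = [a - c for a, c in zip(action, center)]
    dist = math.sqrt(sum(d * d for d in diff))
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    if dist <= radius:
        return identity
    unit = [d / dist for d in diff]
    scale = radius / dist
    # (radius / dist) * (I - u u^T): the radial direction is removed.
    return [[scale * (identity[i][j] - unit[i] * unit[j]) for j in range(n)]
            for i in range(n)]


# Hypothetical example: unit ball around the origin, unsafe action (2, 0).
unsafe_action = [2.0, 0.0]
safe_action = project_to_ball(unsafe_action, [0.0, 0.0], 1.0)
jac = projection_jacobian(unsafe_action, [0.0, 0.0], 1.0)
```

For this example the Jacobian has a zero row and column in the radial direction, so a plain backward pass through the projection passes no gradient along the direction in which the action was corrected. This is one plausible reason why the gradient formulation of a safeguard deserves attention in analytic gradient-based training.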

If this is right

Provably safe training becomes feasible for analytic gradient methods that learn from fewer environment interactions than sampling-based approaches.
The sim-to-real gap narrows because safety constraints are enforced already during the differentiable training phase.
State-of-the-art gradient-based algorithms can incorporate safety without requiring a separate post-training verification step.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same adaptation pattern could be tested on other gradient-based control methods that rely on differentiable dynamics.
Physical robot experiments would directly test whether the reported safety carries over from the differentiable simulator to hardware.
The approach opens a route to combine analytic gradients with model-based safety filters in hybrid learning setups.

Load-bearing premise

The modified mappings and gradient formulations preserve the original safety properties of the differentiable safeguards when inserted into analytic gradient-based training and a differentiable simulator.

What would settle it

An experiment in which safeguarded analytic gradient-based training on one of the control tasks produces unsafe actions that violate the original safety constraints or shows markedly worse performance than the unguarded baseline would falsify the claim.

Figures

Figures reproduced from arXiv: 2506.01665 by Hannah Markgraf, Jonathan Külz, Matthias Althoff, Tim Walter.

**Figure 4.** The zonotopic approach directly approximates the safe action set by inflating the generator lengths of a zonotope. The under-approximated zonotope $Z_{A_s}$ is the solution to

$$
\begin{aligned}
\max_{c_{A_s},\, l_s} \quad & \sqrt[n]{\prod_{i=1}^{n} l_{s,i}} && \text{(23a)} \\
\text{subject to} \quad & Z_{A_s} = \langle c_{A_s},\, G_{A_s} \operatorname{diag}(l_s) \rangle && \text{(23b)} \\
& Z_{A_s} \subseteq A && \text{(23c)} \\
& S_{i+1}(Z_{A_s}, s_i) \subseteq S_s && \text{(23d)}
\end{aligned}
$$

with $n$ uniformly sampled generator directions $G_{A_s}$. Generally, the number of generators should be in the order of …
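The objective and the first containment constraint of the problem in the Figure 4 caption can be evaluated with a few lines of code. The sketch below computes the geometric-mean objective (23a) and checks (23c) for the special case of a box-shaped action set, using the fact that a zonotope lies in an axis-aligned box exactly when its interval hull does. The box shape, the fixed generator directions, and all function names are assumptions for illustration; constraint (23d) requires reachability analysis and is not covered.

```python
import math


def geometric_mean(lengths):
    """Objective (23a): n-th root of the product of the generator scalings."""
    return math.prod(lengths) ** (1.0 / len(lengths))


def interval_hull(center, generators, lengths):
    """Tightest axis-aligned box around the zonotope <c, G diag(l)>.

    `generators` is a list of direction vectors (the columns of G).
    """
    radius = [
        sum(abs(g[j]) * l for g, l in zip(generators, lengths))
        for j in range(len(center))
    ]
    lower = [c - r for c, r in zip(center, radius)]
    upper = [c + r for c, r in zip(center, radius)]
    return lower, upper


def zonotope_in_box(center, generators, lengths, box_lower, box_upper, tol=1e-12):
    """Check (23c) for a box-shaped action set via the interval hull."""
    lower, upper = interval_hull(center, generators, lengths)
    return all(
        lo >= bl - tol and up <= bu + tol
        for lo, up, bl, bu in zip(lower, upper, box_lower, box_upper)
    )


# Hypothetical 2-D example: action set [-1, 1]^2 and three fixed directions.
center = [0.0, 0.0]
generators = [[1.0, 0.0], [0.0, 1.0], [math.sqrt(0.5), math.sqrt(0.5)]]
lengths = [0.5, 0.5, 0.5]

contained = zonotope_in_box(center, generators, lengths, [-1.0, -1.0], [1.0, 1.0])
objective = geometric_mean(lengths)
# Inflating the first two generators too far breaks the containment.
too_long = zonotope_in_box(center, generators, [1.0, 1.0, 0.5], [-1.0, -1.0], [1.0, 1.0])
```

An optimizer for the full problem would search over the center and the scalings to maximize `geometric_mean` while keeping checks of this kind satisfied.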


Original abstract

The deployment of autonomous robots in safety-critical applications requires safety guarantees. Provably safe reinforcement learning is an active field of research that aims to provide such guarantees using safeguards. These safeguards should be integrated during training to reduce the sim-to-real gap. While there are several approaches for safeguarding sampling-based reinforcement learning, analytic gradient-based reinforcement learning often achieves superior performance from fewer environment interactions. However, there is no safeguarding approach for this learning paradigm yet. Our work addresses this gap by developing the first effective safeguard for analytic gradient-based reinforcement learning. We analyse existing, differentiable safeguards, adapt them through modified mappings and gradient formulations, and integrate them into a state-of-the-art learning algorithm and a differentiable simulation. Using numerical experiments on three control tasks, we evaluate how different safeguards affect learning. The results demonstrate safeguarded training without compromising performance. Additional visuals are provided at timwalter.github.io/safe-agb-rl.github.io.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper fills a practical gap by adapting differentiable safeguards for analytic-gradient RL and shows solid empirical results on three tasks, but the formal safety claims hinge on unverified assumptions about the modified gradients.


The core contribution here is extending provably safe RL to analytic-gradient methods, which are more sample-efficient than sampling-based ones. The authors take existing differentiable safeguards, modify the mappings and gradient expressions, and slot them into a state-of-the-art learner plus differentiable simulator. That is new, and the experiments back it up: on three control tasks the safeguarded runs match the performance of the unsafeguarded baseline and show no obvious violations in the reported numbers. Credit for actually running the comparison and making the integration concrete rather than leaving it at the abstract level.

Referee Report

2 major / 2 minor

Summary. The paper develops the first safeguard for analytic gradient-based reinforcement learning by analyzing existing differentiable safeguards, adapting them via modified mappings and gradient formulations, and integrating the result into a state-of-the-art learning algorithm inside a differentiable simulator. Numerical experiments on three control tasks show that the safeguarded training incurs no performance loss while avoiding obvious safety violations.

Significance. If the adapted safeguards retain their original safety certificates, the work would close a notable gap: analytic-gradient RL is more sample-efficient than sampling-based methods yet previously lacked provable-safety integration during training. The empirical demonstration across multiple tasks and the use of a differentiable simulator are practical strengths that could reduce the sim-to-real gap in safety-critical robotics.

major comments (2)

[§4] (Adaptation of differentiable safeguards): The central claim of 'provably safe' training rests on the assertion that the modified mappings and gradient formulations inherit the original safety properties. No explicit invariance argument, re-derivation, or proof sketch is supplied showing that key assumptions (monotonicity, Lipschitz bounds, or barrier-function forms) remain satisfied after the changes. The numerical results on three tasks report no violations but do not substitute for a formal guarantee.
[§5] (Integration into analytic-gradient algorithm): When the adapted safeguard is inserted into the analytic-gradient loop and differentiable simulator, it is unclear whether the safety certificate still holds at every policy update. The manuscript provides no theorem or lemma establishing that the combined system remains safe throughout training.

minor comments (2)

[Abstract] The abstract and introduction would benefit from a concise statement of the precise safety property (e.g., forward invariance of a safe set) that the adapted safeguard is claimed to enforce.
[Experiments] Figure captions and axis labels in the experimental section could be expanded to indicate which safeguard variant corresponds to each curve.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and insightful comments. We address each major comment below and outline the revisions we will make to strengthen the formal foundations of the work.


Referee: [§4] (Adaptation of differentiable safeguards): The central claim of 'provably safe' training rests on the assertion that the modified mappings and gradient formulations inherit the original safety properties. No explicit invariance argument, re-derivation, or proof sketch is supplied showing that key assumptions (monotonicity, Lipschitz bounds, or barrier-function forms) remain satisfied after the changes. The numerical results on three tasks report no violations but do not substitute for a formal guarantee.

Authors: We agree that the manuscript would benefit from an explicit invariance argument. In the revised version we will add a proof sketch in Section 4 (or a dedicated appendix) showing that the modified mappings preserve monotonicity and the original Lipschitz bounds. The argument will start from the barrier-function form of the source safeguards and demonstrate that the adapted gradient formulations do not violate the required contraction or invariance properties.

Revision: yes

Referee: [§5] (Integration into analytic-gradient algorithm): When the adapted safeguard is inserted into the analytic-gradient loop and differentiable simulator, it is unclear whether the safety certificate still holds at every policy update. The manuscript provides no theorem or lemma establishing that the combined system remains safe throughout training.

Authors: We acknowledge the need for a formal statement covering the full training loop. We will insert a new lemma (likely in Section 5) that proves the safety certificate is preserved at each policy update. The lemma will explicitly account for the differentiability of the simulator, the analytic gradient path, and the fact that the safeguard is applied before each gradient step, thereby ensuring the state remains inside the safe set throughout training.

Revision: yes

Circularity Check

0 steps flagged

No circularity: adaptations build on external prior safeguards without reducing to self-definition or fitted predictions


The paper's central contribution is analyzing existing differentiable safeguards, adapting them via modified mappings and gradient formulations, then integrating them into an analytic-gradient RL algorithm and differentiable simulator. The abstract and provided text give no equations or steps that define safety in terms of the new method itself, rename fitted parameters as predictions, or rely on a self-citation chain for the load-bearing safety claim. The original safety properties are treated as coming from prior external work, with the adaptations presented as engineering changes whose preservation is left to empirical validation on three tasks rather than a closed self-referential loop. The argument therefore depends on external prior results and empirical validation, not on its own conclusions.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no free parameters, axioms, or invented entities are specified in the provided text.



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: We analyse existing, differentiable safeguards, adapt them through modified mappings and gradient formulations, and integrate them into a state-of-the-art learning algorithm and a differentiable simulation.

IndisputableMonolith/Foundation/BranchSelection.lean · branch_selection
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: Boundary projection maps unsafe actions to the boundary of the safe action set by determining the closest safe action.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Safe Reinforcement Learning using Action Projection: Safeguard the Policy or the Environment?
cs.LG 2025-09 conditional novelty 7.0

Action aliasing from safety projections harms policy-gradient estimates more severely when the projection is inside the policy than when it is outside, but a penalty term restores competitiveness.


work page 2021
[70]

Generalized gradients and applications,

F. H. Clarke, “Generalized gradients and applications,” TransactionsoftheAmericanMathematicalSociety ,vol.205, pp. 247–247, 1975

work page 1975
[71]

Optuna: A next-generation hyperparameter optimization framework,

T. Akiba, S. Sano, T. Yanase, T. Ohta, and M. Koyama, “Optuna: A next-generation hyperparameter optimization framework,” inProc. of the ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (SIGKDD), Jul. 25, 2019, pp. 2623–2631

work page 2019
[72]

Gymnasium: A Standard Interface for Reinforcement Learning Environments

M. Towers et al., Gymnasium: A standard interface for reinforcement learning environments, Nov. 8, 2024. arXiv: 2407.17032[cs]

work page internal anchor Pith review Pith/arXiv arXiv 2024
[73]

PyTorch: An imperative style, high- performance deep learning library,

A. Paszke et al., “PyTorch: An imperative style, high- performance deep learning library,” in Proc. of the Int. Conf. on Neural Information Processing Systems (NeurIPS), vol. 32, 2019

work page 2019
[74]

Y. Chen, D. Tse, P. Nobel, P. Goulart, and S. Boyd, CuClarabel: GPU acceleration for a conic optimization solver, Dec. 30, 2024. arXiv: 2412.19027[math]

work page arXiv 2024
[75]

Embedded code generation with CVXPY,

M. Schaller, G. Banjac, S. Diamond, A. Agrawal, B. Stellato, and S. Boyd, “Embedded code generation with CVXPY,” IEEE Control Systems Letters, vol. 6, pp. 2653–2658, 2022

work page 2022
[76]

A. S. C. Bianchi, Analogues of the usual pseudodifferential calculus on the Heisenberg group. State University of New York at Stony Brook, 2005. VOLUME 00 2021 15 F. A. Author ET AL .: PREPARATION OF PAPERS FOR IEEE OPEN JOURNAL OF CONTROL SYSTEMS

work page 2005
[77]

OptNet: Differentiable optimiza- tion as a layer in neural networks,

B. Amos and J. Z. Kolter, “OptNet: Differentiable optimiza- tion as a layer in neural networks,” inProc. of the Int. Conf. on Learning Representations (ICLR), Aug. 6, 2017, pp. 136–145. T. Walter (Member, IEEE) received the B.Eng. degree in Electrical Engineering and Information Technology from the University of Applied Sciences Munich, Munich, Ger- many,...

work page 2017
[78]

(33) The reward function encodes the goal of balancing the pendulum upright. We define the safety constraints as the part of the state space from which the controller can maintain balance, effectively limiting the velocity and angle close to the upright position. We induce a safe action set from a robust control invariant (RCI) state set, which we obtain ...

work page 2021

[1] C. Mavrogiannis et al., “Core challenges of social robot navigation: A survey,” ACM Transactions on Human-Robot Interaction, vol. 12, no. 3, pp. 1–39, Sep. 30, 2023.

[2] M. Vasic and A. Billard, “Safety issues in human-robot interactions,” in Proc. of the IEEE Int. Conf. on Robotics and Automation (ICRA), May 2013, pp. 197–204.

[3] B. Kiumarsi, K. G. Vamvoudakis, H. Modares, and F. L. Lewis, “Optimal and autonomous control using reinforcement learning: A survey,” IEEE Transactions on Neural Networks and Learning Systems, vol. 29, no. 6, pp. 2042–2062, 2018.

[4] Z. Li, X. B. Peng, P. Abbeel, S. Levine, G. Berseth, and K. Sreenath, “Reinforcement learning for versatile, dynamic, and robust bipedal locomotion control,” The International Journal of Robotics Research, Oct. 23, 2024.

[5] Y. Song, S. b. Kim, and D. Scaramuzza, “Learning quadruped locomotion using differentiable simulation,” presented at the Proc. of the Conf. on Robot Learning (CoRL), Sep. 5, 2024.

[6] J. Heeg, Y. Song, and D. Scaramuzza, Learning quadrotor control from visual features using differentiable simulation, Mar. 6, 2025. arXiv: 2410.15979 [cs].

[7] E. Salvato, G. Fenu, E. Medvet, and F. A. Pellegrino, “Crossing the reality gap: A survey on sim-to-real transferability of robot controllers in reinforcement learning,” IEEE Access, vol. 9, pp. 153171–153187, 2021.

[8] W. Zhao, J. P. Queralta, and T. Westerlund, “Sim-to-real transfer in deep reinforcement learning for robotics: A survey,” in Proc. of the IEEE Symp. Series on Computational Intelligence (SSCI), Dec. 2020, pp. 737–744.

TABLE 5. Comparison of the safe cen...

[9] F. P. Bejarano, L. Brunke, and A. P. Schoellig, “Safety filtering while training: Improving the performance and sample efficiency of reinforcement learning agents,” IEEE Robotics and Automation Letters, vol. 10, no. 1, pp. 788–795, Jan. 2025.

[10] A. Pan, K. Bhatia, and J. Steinhardt, “The effects of reward misspecification: Mapping and mitigating misaligned models,” presented at the Proc. of the Int. Conf. on Learning Representations (ICLR), Oct. 6, 2021.

[11] I. Popov et al., Data-efficient deep reinforcement learning for dexterous manipulation, Apr. 10, 2017. arXiv: 1704.03073 [cs].

[12] R. Stolz, H. Krasowski, J. Thumm, M. Eichelbeck, P. Gassert, and M. Althoff, “Excluding the irrelevant: Focusing reinforcement learning through continuous action masking,” in Proc. of the Int. Conf. on Neural Information Processing Systems (NeurIPS), 2024.

[13] J. Thumm and M. Althoff, “Provably safe deep reinforcement learning for robotic manipulation in human environments,” in Proc. of the IEEE Int. Conf. on Robotics and Automation (ICRA), 2022, pp. 6344–6350.

[14] J. García and F. Fernández, “A comprehensive survey on safe reinforcement learning,” Journal of Machine Learning Research, vol. 16, no. 1, pp. 1437–1480, Jan. 2015.

[15] H. Krasowski, J. Thumm, M. Müller, L. Schäfer, X. Wang, and M. Althoff, “Provably safe reinforcement learning: Conceptual analysis, survey, and benchmarking,” Transactions on Machine Learning Research, 2023.

[16] M. Selim, A. Alanwar, S. Kousik, G. Gao, M. Pavone, and K. H. Johansson, “Safe reinforcement learning using black-box reachability analysis,” IEEE Robotics and Automation Letters, vol. 7, no. 4, pp. 10665–10672, 2022.

[17] N. Kochdumper, H. Krasowski, X. Wang, S. Bak, and M. Althoff, “Provably safe reinforcement learning via action projection using reachability analysis and polynomial zonotopes,” IEEE Open Journal of Control Systems, vol. 2, pp. 79–92, 2023.

[18] B. Chen, P. L. Donti, K. Baker, J. Z. Kolter, and M. Bergés, “Enforcing policy feasibility constraints through differentiable projection for energy optimization,” in Proc. of the ACM Int. Conf. on Future Energy Systems (e-Energy), Jun. 22, 2021, pp. 199–210.

[19] D. Tabas and B. Zhang, “Computationally efficient safe reinforcement learning for power systems,” in Proc. of the American Control Conf. (ACC), 2022, pp. 3303–3310.

[20] S. Gros, M. Zanon, and A. Bemporad, “Safe reinforcement learning via projection on a safe set: How to achieve optimality?” IFAC-PapersOnLine, vol. 53, no. 2, pp. 8076–8081, Jan. 1, 2020.

[21] E. Todorov, T. Erez, and Y. Tassa, “MuJoCo: A physics engine for model-based control,” in Proc. of the IEEE/RSJ Int. Conf. on Intelligent Robots and Systems (IROS), 2012, pp. 5026–5033.

[22] C. D. Freeman, E. Frey, A. Raichuk, S. Girgin, I. Mordatch, and O. Bachem, “Brax - a differentiable physics engine for large scale rigid body simulation,” in Proc. of the Int. Conf. on Neural Information Processing Systems (NeurIPS), 2021.

[23] Y. Hu et al., “ChainQueen: A real-time differentiable physical simulator for soft robotics,” in Proc. of the IEEE Int. Conf. on Robotics and Automation (ICRA), May 2019, pp. 6265–6271.

[24] N. Thuerey, P. Holl, M. Mueller, P. Schnell, F. Trost, and K. Um, Physics-based Deep Learning. WWW, 2021.

[25] E. Xing, V. Luk, and J. Oh, “Stabilizing reinforcement learning in differentiable multiphysics simulation,” presented at the Proc. of the Int. Conf. on Learning Representations (ICLR), 2025.

[26] S. Mohamed, M. Rosca, M. Figurnov, and A. Mnih, “Monte Carlo gradient estimation in machine learning,” Journal of Machine Learning Research, vol. 21, no. 132, pp. 1–62, 2020.

[27] S. Ghadimi and G. Lan, “Stochastic first- and zeroth-order methods for nonconvex stochastic programming,” SIAM Journal on Optimization, vol. 23, no. 4, pp. 2341–2368, 2013.

[28] M. A. Z. Mora, M. Peychev, S. Ha, M. Vechev, and S. Coros, “PODS: Policy optimization via differentiable simulation,” in Proc. of the Int. Conf. on Machine Learning (ICML), M. Meila and T. Zhang, Eds., vol. 139, Jul. 18, 2021, pp. 7805–7817.

[29] J. Xu et al., “Accelerated policy learning with parallel differentiable simulation,” in Proc. of the Int. Conf. on Learning Representations (ICLR), 2022.

[30] H. J. Suh, M. Simchowitz, K. Zhang, and R. Tedrake, “Do differentiable simulators give better policy gradients?” in Proc. of the Int. Conf. on Machine Learning (ICML), K. Chaudhuri, S. Jegelka, L. Song, C. Szepesvari, G. Niu, and S. Sabato, Eds., vol. 162, Jul. 17, 2022, pp. 20668–20696.

[31] M. C. Mozer, “A focused backpropagation algorithm for temporal pattern recognition,” Complex Systems, vol. 3, pp. 349–381, 1989.

[32] J. Degrave, M. Hermans, J. Dambre, and F. Wyffels, “A differentiable physics engine for deep learning in robotics,” Frontiers in Neurorobotics, vol. 13, Mar. 7, 2019.

[33] R. S. Sutton and A. G. Barto, Reinforcement learning: An introduction. MIT Press, Cambridge, 1998, vol. 1.

[34] I. Georgiev, K. Srinivasan, J. Xu, E. Heiden, and A. Garg, “Adaptive horizon actor-critic for policy learning in contact-rich differentiable simulation,” in Proc. of the Int. Conf. on Machine Learning (ICML), 2024.

[35] L. Brunke et al., “Safe learning in robotics: From learning-based control to safe reinforcement learning,” Annual Review of Control, Robotics, and Autonomous Systems, vol. 5, no. 1, pp. 411–444, 2022.

[36] R. Cheng, G. Orosz, R. M. Murray, and J. W. Burdick, “End-to-end safe reinforcement learning through barrier functions for safety-critical continuous control tasks,” in Proc. of the AAAI Conf. on Artificial Intelligence (AAAI), vol. 33, 2019, pp. 3387–3395.

[37] Y. Yang, K. G. Vamvoudakis, and H. Modares, “Safe reinforcement learning for dynamical games,” International Journal of Robust and Nonlinear Control, vol. 30, no. 9, pp. 3706–3726, 2020.

[38] Z. Marvi and B. Kiumarsi, “Reinforcement learning with safety and stability guarantees during exploration for linear systems,” IEEE Open Journal of Control Systems, vol. 1, pp. 322–334, 2022.

[39] W. Xiao, R. Allen, and D. Rus, “Safe neural control for non-affine control systems with differentiable control barrier functions,” in Proc. of the IEEE Conf. on Decision and Control (CDC), 2023, pp. 3366–3371.

[40] N.-M. T. Kokolakis and K. G. Vamvoudakis, “Safety-aware pursuit-evasion games in unknown environments using Gaussian processes and finite-time convergent reinforcement learning,” IEEE Transactions on Neural Networks and Learning Systems, vol. 35, no. 3, pp. 3130–3143, 2022.

[41] M. Selim, A. Alanwar, M. W. El-Kharashi, H. M. Abbas, and K. H. Johansson, “Safe reinforcement learning using data-driven predictive control,” in Proc. of the Int. Conf. on Communications, Signal Processing, and their Applications (ICCSPA), 2022, pp. 1–6.

[42] M. Eichelbeck, H. Markgraf, and M. Althoff, “Contingency-constrained economic dispatch with safe reinforcement learning,” in Proc. of the IEEE Int. Conf. on Machine Learning and Applications (ICMLA), 2022, pp. 597–602.

[43] K. P. Wabersich et al., “Data-driven safety filters: Hamilton-Jacobi reachability, control barrier functions, and predictive methods for uncertain systems,” IEEE Control Systems Magazine, vol. 43, no. 5, pp. 137–177, 2023.

[44] L. Lützow and M. Althoff, “Scalable reachset-conformant identification of linear systems,” IEEE Control Systems Letters, vol. 8, pp. 520–525, 2024.

[45] L. Lützow and M. Althoff, “Reachset-conformant system identification,” arXiv preprint arXiv:2407.11692, 2024.

[46] L. Schäfer, F. Gruber, and M. Althoff, “Scalable computation of robust control invariant sets of nonlinear systems,” IEEE Transactions on Automatic Control, vol. 69, no. 2, pp. 755–770, 2024.

[47] Z. Huang, S. Bai, and J. Z. Kolter, “(implicit)^2: Implicit layers for implicit representations,” in Proc. of the Int. Conf. on Neural Information Processing Systems (NeurIPS), 2021, pp. 9639–9650.

[48] A. Agrawal, B. Amos, S. Barratt, S. Boyd, S. Diamond, and J. Z. Kolter, “Differentiable convex optimization layers,” in Proc. of the Int. Conf. on Neural Information Processing Systems (NeurIPS), H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, Eds., vol. 32, 2019.

[49] S. G. Krantz and H. R. Parks, The Implicit Function Theorem: History, Theory, and Applications. Springer, 2013.

[50] A. Agrawal, S. Barratt, S. Boyd, and B. Stellato, “Learning convex optimization control policies,” in Proc. of the Ann. Learning for Dynamics and Control Conf. (L4DC), A. M. Bayen et al., Eds., vol. 120, Jun. 10, 2020, pp. 361–373.

[51] A. Agrawal, S. Barratt, and S. Boyd, “Learning convex optimization models,” IEEE/CAA Journal of Automatica Sinica, vol. 8, no. 8, pp. 1355–1364, Aug. 2021.

[52] A. Agrawal, S. Barratt, S. Boyd, E. Busseti, and W. M. Moursi, “Differentiating through a cone program,” Journal of Applied and Numerical Optimization, vol. 2019, no. 2, 2019.

[53] S. Boyd, S.-J. Kim, L. Vandenberghe, and A. Hassibi, “A tutorial on geometric programming,” Optimization and Engineering, vol. 8, no. 1, Mar. 2007.

[54] Y. Nesterov and A. Nemirovsky, “Conic formulation of a convex programming problem and duality,” Optimization Methods and Software, vol. 1, no. 2, pp. 95–115, Jan. 1992.

[55] S. Diamond and S. Boyd, “CVXPY: A Python-embedded modeling language for convex optimization,” Journal of Machine Learning Research, vol. 17, no. 83, pp. 1–5, 2016.

[56] A. Agrawal, R. Verschueren, S. Diamond, and S. Boyd, “A rewriting system for convex optimization problems,” Journal of Control and Decision, vol. 5, no. 1, pp. 42–60, 2018.

[57] R. J. Williams, “Simple statistical gradient-following algorithms for connectionist reinforcement learning,” Machine Learning, vol. 8, no. 3, pp. 229–256, 1992.

[58] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, Proximal policy optimization algorithms, Aug. 28, 2017. arXiv: 1707.06347 [cs].

[60] T. Haarnoja, A. Zhou, P. Abbeel, and S. Levine, “Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor,” in Proc. of the Int. Conf. on Machine Learning (ICML), Jul. 3, 2018, pp. 1861–1870.

[61] R. S. Sutton, D. A. McAllester, S. P. Singh, and Y. Mansour, “Policy gradient methods for reinforcement learning with function approximation,” in Proc. of the Int. Conf. on Neural Information Processing Systems (NeurIPS), vol. 12, 1999.

[62] M. Althoff and G. Frehse, “Combining zonotopes and support functions for efficient reachability analysis of linear systems,” in Proc. of the IEEE Conf. on Decision and Control (CDC), Dec. 2016, pp. 7439–7446.

[63] A. Kulmburg and M. Althoff, “On the co-NP-completeness of the zonotope containment problem,” European Journal of Control, vol. 62, pp. 84–91, 2021.

[64] S. Sadraddini and R. Tedrake, “Linear encodings for polytope containment problems,” in Proc. of the IEEE Conf. on Decision and Control (CDC), 2019, pp. 4367–4372.

[65] M. Grant, S. Boyd, and Y. Ye, “Disciplined convex programming,” in Global Optimization: From Theory to Implementation, L. Liberti and N. Maculan, Eds., 2006, pp. 155–210.

[66] S. B. Liu, B. Schürmann, and M. Althoff, “Guarantees for real robotic systems: Unifying formal controller synthesis and reachset-conformant identification,” IEEE Transactions on Robotics, vol. 39, no. 5, pp. 3776–3790, Oct. 2023.

[67] F. Gruber and M. Althoff, “Scalable robust safety filter with unknown disturbance set,” IEEE Transactions on Automatic Control, vol. 68, no. 12, pp. 7756–7770, Dec. 2023.

[68] M. Althoff, G. Frehse, and A. Girard, “Set propagation techniques for reachability analysis,” Annual Review of Control, Robotics, and Autonomous Systems, vol. 4, pp. 369–395, May 3, 2021.

[69] N. Kochdumper, F. Gruber, B. Schürmann, V. Gaßmann, M. Klischat, and M. Althoff, “AROC: A toolbox for automated reachset optimal controller synthesis,” in Proc. of the Int. Conf. on Hybrid Systems: Computation and Control (HSCC), 2021, pp. 1–6.

[70] F. H. Clarke, “Generalized gradients and applications,” Transactions of the American Mathematical Society, vol. 205, pp. 247–247, 1975.

[71] T. Akiba, S. Sano, T. Yanase, T. Ohta, and M. Koyama, “Optuna: A next-generation hyperparameter optimization framework,” in Proc. of the ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (SIGKDD), Jul. 25, 2019, pp. 2623–2631.

[72] M. Towers et al., Gymnasium: A standard interface for reinforcement learning environments, Nov. 8, 2024. arXiv: 2407.17032 [cs].

[73] A. Paszke et al., “PyTorch: An imperative style, high-performance deep learning library,” in Proc. of the Int. Conf. on Neural Information Processing Systems (NeurIPS), vol. 32, 2019.

[74] Y. Chen, D. Tse, P. Nobel, P. Goulart, and S. Boyd, CuClarabel: GPU acceleration for a conic optimization solver, Dec. 30, 2024. arXiv: 2412.19027 [math].

[75] M. Schaller, G. Banjac, S. Diamond, A. Agrawal, B. Stellato, and S. Boyd, “Embedded code generation with CVXPY,” IEEE Control Systems Letters, vol. 6, pp. 2653–2658, 2022.

[76] A. S. C. Bianchi, Analogues of the usual pseudodifferential calculus on the Heisenberg group. State University of New York at Stony Brook, 2005.

[77] B. Amos and J. Z. Kolter, “OptNet: Differentiable optimization as a layer in neural networks,” in Proc. of the Int. Conf. on Machine Learning (ICML), Aug. 6, 2017, pp. 136–145.

T. Walter (Member, IEEE) received the B.Eng. degree in Electrical Engineering and Information Technology from the University of Applied Sciences Munich, Munich, Germany,...

(33) The reward function encodes the goal of balancing the pendulum upright. We define the safety constraints as the part of the state space from which the controller can maintain balance, effectively limiting the velocity and angle close to the upright position. We induce a safe action set from a robust control invariant (RCI) state set, which we obtain ...