arxiv: 2506.01167 · v2 · submitted 2025-06-01 · 💻 cs.LG · cs.RO

Accelerated Learning with Linear Temporal Logic using Differentiable Simulation

Pith reviewed 2026-05-19 10:45 UTC · model grok-4.3

classification 💻 cs.LG cs.RO
keywords reinforcement learning · linear temporal logic · differentiable simulation · Büchi automaton · formal specifications · safety constraints · gradient-based optimization · continuous control

The pith

Soft labeling of automaton states makes linear temporal logic rewards differentiable for gradient-based reinforcement learning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a framework that integrates linear temporal logic specifications directly into differentiable simulators for reinforcement learning. By relaxing the discrete transitions of the automaton with soft state labeling, it creates continuous rewards and state features that reduce the sparsity problem common in LTL-based rewards. This approach maintains the soundness of the original formal objective while providing theoretical connections between Büchi acceptance conditions and both discrete and differentiable return values, along with a bound on their difference. Experiments show it accelerates training and improves performance on nonlinear continuous control tasks compared to discrete methods.
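The sparsity problem can be illustrated with a small sketch loosely modeled on the parking scenario of Figure 1. All numbers, the sigmoid relaxation, and the function names below are assumptions made for illustration; they are not taken from the paper. A hard success indicator has zero gradient with respect to deceleration almost everywhere, while a relaxed one still indicates which way to adjust.

```python
import math

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def stopping_distance(speed, decel):
    """Distance travelled under constant deceleration until rest."""
    return speed * speed / (2.0 * decel)

def hard_success(decel, speed=10.0, lo=20.0, hi=30.0):
    """1.0 if the car stops inside the hypothetical parking interval [lo, hi]."""
    d = stopping_distance(speed, decel)
    return 1.0 if lo <= d <= hi else 0.0

def soft_success(decel, speed=10.0, lo=20.0, hi=30.0, temperature=5.0):
    """Assumed relaxation: product of two sigmoid-softened inequalities."""
    d = stopping_distance(speed, decel)
    return sigmoid((d - lo) / temperature) * sigmoid((hi - d) / temperature)

def finite_diff(f, x, eps=1e-4):
    """Central finite-difference estimate of df/dx."""
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)

# Braking too gently: the car stops at 40 m, past the parking interval.
decel = 1.25
hard_grad = finite_diff(hard_success, decel)  # flat: no learning signal
soft_grad = finite_diff(soft_success, decel)  # positive: brake harder
print(hard_grad, soft_grad)
```

In a differentiable simulator the gradient would come from automatic differentiation rather than finite differences; the finite-difference helper only keeps the sketch dependency-free.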

Core claim

Our method relaxes discrete automaton transitions via soft labeling of states, yielding differentiable rewards and state representations that mitigate the sparsity issue intrinsic to LTL while preserving objective soundness. We provide theoretical guarantees connecting Büchi acceptance to both discrete and differentiable LTL returns and derive a tunable bound on their discrepancy in deterministic and stochastic settings.

What carries the argument

Soft labeling of states in the automaton, which replaces hard discrete transitions with continuous probabilities to enable gradient propagation through the LTL-based reward computation.
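A minimal sketch of this idea follows, assuming a sigmoid relaxation of threshold predicates and a hypothetical three-state reach-avoid automaton. The page does not give the paper's relaxation formula, so the predicates, thresholds, the treatment of the two labels as independent, and every name here are illustrative assumptions.

```python
import math

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def soft_label(value, threshold, temperature):
    """Degree in (0, 1) to which `value > threshold` holds (assumed form)."""
    return sigmoid((value - threshold) / temperature)

def soft_automaton_step(belief, p_goal, p_unsafe):
    """One relaxed transition of a hypothetical reach-avoid automaton.

    States: 0 = searching, 1 = goal reached (absorbing),
            2 = violated (absorbing).
    The two labels are treated as independent, which is an assumption.
    """
    b0, b1, b2 = belief
    to_violated = p_unsafe
    to_goal = (1.0 - p_unsafe) * p_goal
    stay = (1.0 - p_unsafe) * (1.0 - p_goal)
    return (b0 * stay, b1 + b0 * to_goal, b2 + b0 * to_violated)

def run(positions, temperature):
    """Propagate a distribution over automaton states along a 1-D rollout."""
    belief = (1.0, 0.0, 0.0)
    for x in positions:
        p_goal = soft_label(x, 5.0, temperature)     # "x > 5" marks the goal
        p_unsafe = soft_label(-x, 1.0, temperature)  # "x < -1" marks danger
        belief = soft_automaton_step(belief, p_goal, p_unsafe)
    return belief

positions = [0.0, 2.0, 4.0, 6.0]
belief = run(positions, temperature=0.5)
print(belief)
```

Every operation in `run` is smooth in the positions, so a gradient can flow from the mass on the goal state back to the trajectory. With a very small temperature the belief collapses onto the state the hard automaton would occupy.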

If this is right

  • Substantially accelerates training in complex, nonlinear, contact-rich continuous-control tasks.
  • Achieves up to twice the returns of discrete baselines.
  • Compatible with reward machines for co-safe LTL and LTL_f without modification.
  • Bridges formal methods and deep RL for safe, specification-driven learning in continuous domains.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This approach may extend to other formal specification languages beyond LTL by similar relaxation techniques.
  • Future work could explore scaling to higher-dimensional state spaces or integrating with model-based RL methods.
  • The tunable bound on discrepancy suggests opportunities for adaptive relaxation parameters during training.

Load-bearing premise

The soft labeling of automaton states preserves the soundness of the original LTL objective without introducing violations of the specification.

What would settle it

Observing a policy trained with the differentiable LTL reward that violates the original LTL specification in a deterministic environment would falsify the claim that soundness is preserved.
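Such a check could be sketched as a hard-label monitor run over a recorded rollout. The reach-avoid specification, the thresholds, and the function names below are hypothetical, and a finite rollout can only settle specifications that have a finite witness or a finite bad prefix; general LTL over infinite traces needs more than this.

```python
def hard_verdict(positions, goal_min=5.0, unsafe_max=-1.0):
    """Run a discrete reach-avoid automaton with hard labels on a 1-D rollout.

    Returns "violated" if the unsafe region is entered before the goal,
    "satisfied" if the goal is reached first, "undetermined" otherwise.
    """
    for x in positions:
        if x < unsafe_max:
            return "violated"
        if x > goal_min:
            return "satisfied"
    return "undetermined"

def is_counterexample(positions, soft_return, claimed_threshold=0.9):
    """Flag a rollout whose relaxed return is high although the hard
    automaton rejects it (the threshold is an illustrative assumption)."""
    return (soft_return >= claimed_threshold
            and hard_verdict(positions) == "violated")

print(hard_verdict([0.0, 2.0, 6.0]))
print(is_counterexample([0.0, -2.0, 6.0], soft_return=0.95))
```

A single rollout flagged by `is_counterexample` in a deterministic environment would be the kind of observation described above.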

Figures

Figures reproduced from arXiv: 2506.01167 by Alper Kamil Bozkurt, Calin Belta, Ming C. Lin.

Figure 1: LTL Returns and Derivatives. Left: The parking scenario where the car must brake to stop in the parking area without entering the grass field (φ_p). Middle: LTL satisfaction probability and return estimates from discrete and differentiable LTL formulations as functions of deceleration. Right: LTL return gradients with respect to deceleration and their standard deviation. The key challenge in learning from L…
Figure 2: Task Specification with LTL. This figure illustrates a Cheetah policy learned by SHAC using differentiable rewards derived via our approach from the LTL formula φ_legged (10), which specifies accelerating forward, stopping, and maintaining a safe tip-to-ground distance. Specifying the desired behaviors of robots using the high-level language LTL provides is an intuitive alternative to manually designing rew…
Figure 3: Comparison Across Environments: Differentiable vs. Discrete LTL Rewards. The wider plots show the learning curves of all baseline algorithms, while the narrower plots on the right display the maximum returns achieved after 100 M steps. All results are averaged over 5 random seeds, and the curves are smoothed using max and uniform filters for visual clarity. The reported returns, bounded between 0 and 1, se…
Figure 4: Ablation Study for LTL. The maximum returns obtained after 100 M steps for simplified LTL formulas (12), averaged over 5 seeds. Returns (0 to 1) indicate LTL satisfaction probabilities. Under these simpler specifications, both ̸∂RLs and ∂RLs successfully learn near-optimal policies. However, as shown in …
Figure 5: Convergence speed comparison of stochastic gradient descent algorithms using …
Figure 6: The ω-automaton derived from φ_cartpole from (9). [Automaton diagram; extracted edge label: "torso_height>0.0" & !"torso_velocity_x>1.5"]
Figure 7: The ω-automaton derived from φ_legged from (10) for the Ant environment.

Original abstract

Ensuring that reinforcement learning (RL) controllers satisfy safety and reliability constraints in real-world settings remains challenging: state-avoidance and constrained Markov decision processes often fail to capture trajectory-level requirements or induce overly conservative behavior. Formal specification languages such as linear temporal logic (LTL) offer correct-by-construction objectives, yet their rewards are typically sparse, and heuristic shaping can undermine correctness. We introduce, to our knowledge, the first end-to-end framework that integrates LTL with differentiable simulators, enabling efficient gradient-based learning directly from formal specifications. Our method relaxes discrete automaton transitions via soft labeling of states, yielding differentiable rewards and state representations that mitigate the sparsity issue intrinsic to LTL while preserving objective soundness. We provide theoretical guarantees connecting Büchi acceptance to both discrete and differentiable LTL returns and derive a tunable bound on their discrepancy in deterministic and stochastic settings. Empirically, across complex, nonlinear, contact-rich continuous-control tasks, our approach substantially accelerates training and achieves up to twice the returns of discrete baselines. We further demonstrate compatibility with reward machines, thereby covering co-safe LTL and LTL_f without modification. By rendering automaton-based rewards differentiable, our work bridges formal methods and deep RL, enabling safe, specification-driven learning in continuous domains.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. This paper introduces a framework for integrating Linear Temporal Logic (LTL) with differentiable simulators in reinforcement learning. By relaxing discrete automaton transitions through soft labeling of states, it generates differentiable rewards and state representations to address the sparsity of LTL-based rewards while aiming to preserve soundness. Theoretical results connect Büchi acceptance to both discrete and differentiable returns and provide a tunable discrepancy bound for deterministic and stochastic settings. Experiments on nonlinear, contact-rich continuous-control tasks demonstrate faster training and up to twice the returns compared to discrete baselines, with extensions to reward machines for co-safe LTL and LTL_f.

Significance. If the theoretical guarantees and discrepancy bound hold under the considered dynamics, this approach could enable more efficient and correct specification-driven RL in continuous domains, bridging formal methods and deep learning. The empirical acceleration in complex tasks highlights potential practical impact. The provision of theoretical connections and compatibility with reward machines are notable strengths.

major comments (2)
  1. [Abstract / Theoretical Analysis] Abstract and Theoretical Analysis section: The derivation of the tunable bound on the discrepancy between discrete and differentiable LTL returns is not fully detailed. This is load-bearing for the central claim because soundness of the relaxed objective under stochastic nonlinear dynamics (as in the contact-rich tasks) requires explicit conditions on the soft labeling (e.g., Lipschitz continuity or bounded transition variance); without them the bound may fail to control discrepancies when discontinuities alter acceptance paths, as highlighted by the stress-test concern.
  2. [§3 / §5] §3 (Method) and §5 (Theoretical Guarantees): The claim that soft labeling preserves soundness of the original LTL objective needs a concrete statement of the conditions under which the relaxation maintains equivalence to Büchi acceptance; the current presentation leaves open whether post-hoc choices in the labeling temperature affect the bound's validity in deterministic vs. stochastic cases.
minor comments (2)
  1. [Abstract] Abstract: The phrase 'soft labeling of states' would benefit from an immediate parenthetical definition or pointer to the precise relaxation formula to improve accessibility.
  2. The manuscript could add a short paragraph contrasting the approach with prior work on differentiable automata or reward shaping to better situate the novelty.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment in detail below and have revised the theoretical sections to strengthen the presentation of the discrepancy bound and the conditions for soundness.

Point-by-point responses
  1. Referee: [Abstract / Theoretical Analysis] Abstract and Theoretical Analysis section: The derivation of the tunable bound on the discrepancy between discrete and differentiable LTL returns is not fully detailed. This is load-bearing for the central claim because soundness of the relaxed objective under stochastic nonlinear dynamics (as in the contact-rich tasks) requires explicit conditions on the soft labeling (e.g., Lipschitz continuity or bounded transition variance); without them the bound may fail to control discrepancies when discontinuities alter acceptance paths, as highlighted by the stress-test concern.

    Authors: We agree that the derivation of the tunable bound would benefit from greater explicitness. In the revised manuscript, we have expanded the Theoretical Guarantees section with a complete step-by-step derivation. We now state the required assumptions on the soft labeling, including Lipschitz continuity of the labeling function and a bound on transition variance in the stochastic setting. These conditions ensure the bound remains valid under nonlinear dynamics and controls discrepancies arising from discontinuities in acceptance paths. We have also added analysis addressing stress-test scenarios to demonstrate that the bound continues to hold.
    Revision: yes

  2. Referee: [§3 / §5] §3 (Method) and §5 (Theoretical Guarantees): The claim that soft labeling preserves soundness of the original LTL objective needs a concrete statement of the conditions under which the relaxation maintains equivalence to Büchi acceptance; the current presentation leaves open whether post-hoc choices in the labeling temperature affect the bound's validity in deterministic vs. stochastic cases.

    Authors: We concur that a more precise statement of the conditions is warranted. We have added a dedicated theorem in §5 that explicitly characterizes the conditions under which soft labeling maintains equivalence to Büchi acceptance. The theorem delineates the admissible range for the labeling temperature such that the differentiable objective remains sound with respect to the discrete Büchi acceptance condition. This statement applies uniformly to both deterministic and stochastic cases, with the discrepancy bound adjusted to reflect the setting. We clarify that temperature selection must respect these conditions rather than being chosen post-hoc, and we provide practical guidance for satisfying them.
    Revision: yes

Circularity Check

0 steps flagged

Theoretical guarantees on Büchi-to-differentiable return connection are independently derived

full rationale

The paper's central derivation provides theoretical guarantees linking Büchi acceptance to both discrete and differentiable LTL returns, along with a tunable discrepancy bound in deterministic and stochastic settings. This is presented as a first-principles result from the soft-labeling relaxation of automaton transitions, without reduction to fitted parameters, self-referential definitions, or load-bearing self-citations. The abstract and context show no evidence of the bound or soundness preservation being equivalent to inputs by construction; the derivation chain remains self-contained against external formal methods benchmarks.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The central claim rests on the existence of a tunable bound connecting discrete Büchi acceptance to the soft differentiable return; this bound is introduced without external verification in the abstract.

free parameters (1)
  • soft labeling temperature or relaxation parameter
    Controls the degree of softness in state labeling; its value must be chosen to balance differentiability and fidelity to the original automaton.
axioms (1)
  • domain assumption: Büchi acceptance conditions remain approximately preserved under soft state labeling in both deterministic and stochastic dynamics
    Invoked to derive the discrepancy bound stated in the abstract.
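The role of the relaxation parameter listed in this ledger can be sketched numerically. The snippet assumes a sigmoid relaxation with a temperature, which is an illustrative choice and not the paper's stated formula, and it shows only the per-label gap, not the paper's bound on returns. For a predicate evaluated at a fixed margin from its threshold, the gap between soft and hard label shrinks as the temperature decreases and stays below exp(-margin / temperature).

```python
import math

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def label_gap(margin, temperature):
    """|soft - hard| for a predicate that holds with the given positive margin.

    The hard label is 1 and the assumed soft label is
    sigmoid(margin / temperature), so the gap equals
    sigmoid(-margin / temperature).
    """
    return sigmoid(-margin / temperature)

margin = 0.5  # hypothetical distance from the predicate threshold
temperatures = [1.0, 0.5, 0.1, 0.01]
gaps = [label_gap(margin, t) for t in temperatures]

for t, g in zip(temperatures, gaps):
    print(f"temperature={t:<5} gap={g:.3e} bound={math.exp(-margin / t):.3e}")
```

The trade-off is that a small temperature keeps the relaxed labels close to the automaton's hard labels but also flattens the gradient away from the threshold, so the parameter has to balance fidelity against learning signal.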

pith-pipeline@v0.9.0 · 5758 in / 1320 out tokens · 33425 ms · 2026-05-19T10:45:54.768761+00:00 · methodology

Reference graph

Works this paper leans on

86 extracted references · 86 canonical work pages · 3 internal anchors

  1. [1]

    Reinforcement learning in robotics: A survey

    Jens Kober, J Andrew Bagnell, and Jan Peters. Reinforcement learning in robotics: A survey. The International Journal of Robotics Research, 32(11):1238–1274, 2013

  2. [2]

    End-to-end training of deep visuomotor policies

    Sergey Levine, Chelsea Finn, Trevor Darrell, and Pieter Abbeel. End-to-end training of deep visuomotor policies. The Journal of Machine Learning Research, 17(1):1334–1373, 2016

  3. [3]

    Learning dexterous in-hand manipulation

    OpenAI: Marcin Andrychowicz, Bowen Baker, Maciek Chociej, Rafal Jozefowicz, Bob Mc- Grew, Jakub Pachocki, Arthur Petron, Matthias Plappert, Glenn Powell, Alex Ray, et al. Learning dexterous in-hand manipulation. The International Journal of Robotics Research, 39(1):3–20, 2020

  4. [4]

    Learning agile and dynamic motor skills for legged robots

    Jemin Hwangbo, Joonho Lee, Alexey Dosovitskiy, Dario Bellicoso, Vassilios Tsounis, Vladlen Koltun, and Marco Hutter. Learning agile and dynamic motor skills for legged robots. Science Robotics, 4(26):eaau5872, 2019

  5. [5]

    Learning quadrupedal locomotion over challenging terrain

    Joonho Lee, Jemin Hwangbo, Lorenz Wellhausen, Vladlen Koltun, and Marco Hutter. Learning quadrupedal locomotion over challenging terrain. Science robotics, 5(47):eabc5986, 2020

  6. [6]

    Socially aware motion planning with deep reinforcement learning

    Yu Fan Chen, Michael Everett, Miao Liu, and Jonathan P How. Socially aware motion planning with deep reinforcement learning. In 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 1343–1350. IEEE, 2017

  7. [7]

    Reinforcement learning in healthcare: A survey

    Chao Yu, Jiming Liu, Shamim Nemati, and Guosheng Yin. Reinforcement learning in healthcare: A survey. ACM Computing Surveys (CSUR), 55(1):1–36, 2021

  8. [8]

    A comprehensive survey on safe reinforcement learning

    Javier Garcıa and Fernando Fernández. A comprehensive survey on safe reinforcement learning. Journal of Machine Learning Research, 16(1):1437–1480, 2015

  9. [9]

    A lyapunov-based approach to safe reinforcement learning

    Yinlam Chow, Ofir Nachum, Edgar Duenez-Guzman, and Mohammad Ghavamzadeh. A lyapunov-based approach to safe reinforcement learning. Advances in neural information processing systems, 31, 2018

  10. [10]

    Responsive safety in reinforcement learning by pid lagrangian methods

    Adam Stooke, Joshua Achiam, and Pieter Abbeel. Responsive safety in reinforcement learning by pid lagrangian methods. InInternational Conference on Machine Learning, pages 9133–9143. PMLR, 2020

  11. [11]

    Provably efficient safe exploration via primal-dual policy optimization

    Dongsheng Ding, Xiaohan Wei, Zhuoran Yang, Zhaoran Wang, and Mihailo Jovanovic. Provably efficient safe exploration via primal-dual policy optimization. In International conference on artificial intelligence and statistics, pages 3304–3312. PMLR, 2021

  12. [12]

    Robot reinforcement learning on the constraint manifold

    Puze Liu, Davide Tateo, Haitham Bou Ammar, and Jan Peters. Robot reinforcement learning on the constraint manifold. In Conference on Robot Learning, pages 1357–1366. PMLR, 2022

  13. [13]

    Safe model- based reinforcement learning with stability guarantees

    Felix Berkenkamp, Matteo Turchetta, Angela P Schoellig, and Andreas Krause. Safe model- based reinforcement learning with stability guarantees. NIPS, 2017

  14. [14]

    Fisac, Anayo K

    Jaime F. Fisac, Anayo K. Akametalu, Melanie N. Zeilinger, Shahab Kaynama, Jeremy Gillula, and Claire J. Tomlin. A general safety framework for learning-based control in uncertain robotic systems. TAC, 64(7):2737–2752, 2019

  15. [15]

    Murray, and Joel W

    Richard Cheng, Gabor Orosz, Richard M. Murray, and Joel W. Burdick. End-to-end safe reinforcement learning through barrier functions for safety-critical continuous control tasks. AAAI, 2019

  16. [16]

    Safe reinforcement learning with model uncertainty estimates

    Björn Lütjens, Michael Everett, and Jonathan P How. Safe reinforcement learning with model uncertainty estimates. ICRA, 2019

  17. [17]

    Fisac, Neil F

    Jaime F. Fisac, Neil F. Lugovoy, Vicenç Rubies-Royo, Shromona Ghosh, and Claire J. Tomlin. Bridging hamilton-jacobi safety analysis and reinforcement learning. ICRA, 00:8550–8556, 2019

  18. [18]

    Gonzalez, Julian Ibarz, Chelsea Finn, and Ken Goldberg

    Brijen Thananjeyan, Ashwin Balakrishna, Suraj Nair, Michael Luo, Krishnan Srinivasan, Minho Hwang, Joseph E. Gonzalez, Julian Ibarz, Chelsea Finn, and Ken Goldberg. Recovery RL: Safe reinforcement learning with learned recovery zones. RA-L, 6(3):4915–4922, 2020

  19. [19]

    Robust model predictive shielding for safe reinforcement learning with stochastic dynamics

    Shuo Li and Osbert Bastani. Robust model predictive shielding for safe reinforcement learning with stochastic dynamics. ICRA, 00:7166–7172, 2020

  20. [20]

    Safe reinforcement learning using robust MPC

    Mario Zanon and Sebastien Gros. Safe reinforcement learning using robust MPC. TAC, 66(8):3638–3652, 2020. 10

  21. [21]

    Mohit Srinivasan, Amogh Dabholkar, Samuel Coogan, and Patricio A. Vela. Synthesis of control barrier functions using a supervised machine learning approach. IROS, 00:7139–7145, 2020

  22. [22]

    Tomlin, and Koushil Sreenath

    Jason Choi, Fernando Castaneda, Claire J. Tomlin, and Koushil Sreenath. Reinforcement learning for safety-critical control under model uncertainty, using control lyapunov functions and control barrier functions. RSS, 2020

  23. [23]

    Distributed multi-robot collision avoidance via deep reinforcement learning for navigation in complex scenarios

    Tingxiang Fan, Pinxin Long, Wenxi Liu, and Jia Pan. Distributed multi-robot collision avoidance via deep reinforcement learning for navigation in complex scenarios. Journal of Robotics Research, 39(7):856–892, 2020

  24. [24]

    Learning safe multi-agent control with decentralized neural barrier certificates

    Zengyi Qin, Kaiqing Zhang, Yuxiao Chen, Jingkai Chen, and Chuchu Fan. Learning safe multi-agent control with decentralized neural barrier certificates. ICLR, 2021

  25. [25]

    Model-free safe control for zero-violation reinforce- ment learning

    Weiye Zhao, Tairan He, and Changliu Liu. Model-free safe control for zero-violation reinforce- ment learning. CoRL, 2021

  26. [26]

    Safe control with learned certificates: A survey of neural lyapunov, barrier, and contraction methods for robotics and control

    Charles Dawson, Sicun Gao, and Chuchu Fan. Safe control with learned certificates: A survey of neural lyapunov, barrier, and contraction methods for robotics and control. T-RO, 39(3):1749–1767, 2023

  27. [27]

    Santiago Paternain, Miguel Calvo-Fullana, Luiz F. O. Chamon, and Alejandro Ribeiro. Safe policies for reinforcement learning via primal-dual methods. TAC, 68(3):1321–1336, 2023

  28. [28]

    Omega-regular objectives in model-free reinforcement learning

    Ernst Moritz Hahn, Mateo Perez, Sven Schewe, Fabio Somenzi, Ashutosh Trivedi, and Dominik Wojtczak. Omega-regular objectives in model-free reinforcement learning. In Proceedings of the 25th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS), pages 395–412, 2019

  29. [29]

    A. K. Bozkurt, Y . Wang, M. M. Zavlanos, and M. Pajic. Control synthesis from linear temporal logic specifications using model-free reinforcement learning. In International Conference on Robotics and Automation (ICRA), pages 10349–10355, 2020

  30. [30]

    A. K. Bozkurt, Y . Wang, M. M. Zavlanos, and M. Pajic. Model-free reinforcement learning for stochastic games with linear temporal logic objectives. In International Conference on Robotics and Automation (ICRA), pages 10649–10655. IEEE, 2021

  31. [31]

    A. K. Bozkurt, Y . Wang, and M. Pajic. Secure planning against stealthy attacks via model-free reinforcement learning. In International Conference on Robotics and Automation (ICRA), pages 10656–10662. IEEE, 2021

  32. [32]

    A. K. Bozkurt, Y . Wang, and M. Pajic. Model-free learning of safe yet effective controllers. In Conference on Decision and Control (CDC), pages 6560–6565. IEEE, 2021

  33. [33]

    A. K. Bozkurt, Y . Wang, M. M. Zavlanos, and M. Pajic. Learning optimal controllers for temporal logic specifications in stochastic games. Transactions on Automatic Control (TAC), 2024

  34. [34]

    A formal methods approach to inter- pretable reinforcement learning for robotic planning

    Xiao Li, Zachary Serlin, Guang Yang, and Calin Belta. A formal methods approach to inter- pretable reinforcement learning for robotic planning. Science Robotics, 4(37), 2019

  35. [35]

    Modular deep reinforcement learning for continuous motion planning with temporal logic

    Mingyu Cai, Mohammadhosein Hasanbeig, Shaoping Xiao, Alessandro Abate, and Zhen Kan. Modular deep reinforcement learning for continuous motion planning with temporal logic. RA-L, 6(4):7973–7980, 2021

  36. [36]

    Reinforcement learning based temporal logic control with maximum probabilistic satisfaction

    Mingyu Cai, Shaoping Xiao, Baoluo Li, Zhiliang Li, and Zhen Kan. Reinforcement learning based temporal logic control with maximum probabilistic satisfaction. ICRA, 00:806–812, 2021

  37. [37]

    Reward machines: Exploiting reward function structure in reinforcement learning

    Rodrigo Toro Icarte, Toryn Q Klassen, Richard Valenzano, and Sheila A McIlraith. Reward machines: Exploiting reward function structure in reinforcement learning. JAIR, 2022

  38. [38]

    Accelerated reinforcement learning for temporal logic control objectives

    Yiannis Kantaros. Accelerated reinforcement learning for temporal logic control objectives. IROS, 00:5077–5082, 2022

  39. [39]

    Policy optimization with linear temporal logic constraints

    Cameron V oloshin, Hoang M Le, Swarat Chaudhuri, and Yisong Yue. Policy optimization with linear temporal logic constraints. NeurIPS, 2022

  40. [40]

    On the (in)tractability of reinforcement learning for LTL objectives

    Cambridge Yang, Michael Littman, and Michael Carbin. On the (in)tractability of reinforcement learning for LTL objectives. IJCAI, 2022. 11

  41. [41]

    Safe reinforcement learning under temporal logic with reward design and quantum action selection

    Mingyu Cai, Shaoping Xiao, Junchao Li, and Zhen Kan. Safe reinforcement learning under temporal logic with reward design and quantum action selection. Scientific Reports, 13(1):1925, 2023

  42. [42]

    Certified reinforcement learning with logic guidance

    Hosein Hasanbeig, Daniel Kroening, and Alessandro Abate. Certified reinforcement learning with logic guidance. Artificial Intelligence, 322:103949, 2023

  43. [43]

    Overcoming exploration: Deep reinforcement learning for continuous control in cluttered environments from temporal logic specifications

    Mingyu Cai, Erfan Aasi, Calin Belta, and Cristian-Ioan Vasile. Overcoming exploration: Deep reinforcement learning for continuous control in cluttered environments from temporal logic specifications. RA-L, 8(4):2158–2165, 2023

  44. [44]

    Security-aware reinforcement learning under linear temporal logic specifications

    Bohan Cui, Keyi Zhu, Shaoyuan Li, and Xiang Yin. Security-aware reinforcement learning under linear temporal logic specifications. ICRA, 00:12367–12373, 2023

  45. [45]

    Eventual discounting temporal logic counterfactual experience replay

    Cameron V oloshin, Abhinav Verma, and Yisong Yue. Eventual discounting temporal logic counterfactual experience replay. ICML, 2023

  46. [46]

    Sample efficient model-free reinforcement learning from LTL specifications with optimality guarantees

    Daqian Shao and Marta Kwiatkowska. Sample efficient model-free reinforcement learning from LTL specifications with optimality guarantees. arXiv, 2023

  47. [47]

    Reinforcement learning under temporal logic constraints as a sequence modeling problem

    Daiying Tian, Hao Fang, Qingkai Yang, Haoyong Yu, Wenyu Liang, and Yan Wu. Reinforcement learning under temporal logic constraints as a sequence modeling problem. Robotics and Autonomous Systems, 161:104351, 2023

  48. [48]

    Verginis, Cevahir Koprulu, Sandeep Chinchali, and Ufuk Topcu

    Christos K. Verginis, Cevahir Koprulu, Sandeep Chinchali, and Ufuk Topcu. Joint learning of reward machines and policies in environments with partially known semantics. Artificial Intelligence, 333:104146, 2024

  49. [49]

    Reinforcement learning with LTL and $\ omega$-regular objectives via optimality-preserving translation to average rewards

    Xuan-Bach Le, Dominik Wagner, Leon Witzman, Alexander Rabinovich, and Luke Ong. Reinforcement learning with LTL and $\ omega$-regular objectives via optimality-preserving translation to average rewards. NeurIPS, 2024

  50. [50]

    A PAC learning algorithm for LTL and omega-regular objectives in MDPs

    Mateo Perez, Fabio Somenzi, and Ashutosh Trivedi. A PAC learning algorithm for LTL and omega-regular objectives in MDPs. AAAI, 38(19):21510–21517, 2024

  51. [51]

    Concrete Problems in AI Safety

    Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, and Dan Mané. Concrete problems in ai safety. arXiv preprint arXiv:1606.06565, 2016

  52. [52]

    Defining and characterizing reward gaming

    Joar Skalse, Nikolaus Howe, Dmitrii Krasheninnikov, and David Krueger. Defining and characterizing reward gaming. Advances in Neural Information Processing Systems, 35:9460– 9471, 2022

  53. [53]

    Enforcing hard constraints with soft barriers: Safe reinforcement learning in unknown stochastic environments

    Yixuan Wang, Simon Sinong Zhan, Ruochen Jiao, Zhilu Wang, Wanxin Jin, Zhuoran Yang, Zhaoran Wang, Chao Huang, and Qi Zhu. Enforcing hard constraints with soft barriers: Safe reinforcement learning in unknown stochastic environments. In International Conference on Machine Learning, pages 36593–36604. PMLR, 2023

  54. [54]

    Reachability constrained reinforcement learning

    Dongjie Yu, Haitong Ma, Shengbo Li, and Jianyu Chen. Reachability constrained reinforcement learning. In International conference on machine learning, pages 25636–25655. PMLR, 2022

  55. [55]

    Safety and liveness guarantees through reach-avoid reinforcement learning

    Kai-Chieh Hsu, Vicenç Rubies-Royo, Claire J Tomlin, and Jaime F Fisac. Safety and liveness guarantees through reach-avoid reinforcement learning. RSS, 2021

  56. [56]

    Aksaray, A

    D. Aksaray, A. Jones, Z. Kong, M. Schwager, and C. Belta. Q-learning for robust satisfaction of signal temporal logic specifications. In 2016 IEEE 55th Conference on Decision and Control (CDC), pages 6565–6570, Dec 2016

  57. [57]

    Analytical derivatives of rigid body dynamics algorithms

    Justin Carpentier and Nicolas Mansard. Analytical derivatives of rigid body dynamics algorithms. RSS, 2018

  58. [58]

    ADD: Analytically differentiable dynamics for multi-body systems with frictional contact

    Moritz Geilinger, David Hahn, Jonas Zehnder, Moritz Bacher, Bernhard Thomaszewski, and Stelian Coros. ADD: Analytically differentiable dynamics for multi-body systems with frictional contact. TOG, 2020

  59. [59]

    Efficient differentiable simulation of articulated bodies

    Yi-Ling Qiao, Junbang Liang, Vladlen Koltun, and Ming C Lin. Efficient differentiable simulation of articulated bodies. ICML, 2021

  60. [60]

    An end-to-end differentiable framework for contact-aware robot design

    Jie Xu, Tao Chen, Lara Zlokapa, Michael Foshey, Wojciech Matusik, Shinjiro Sueda, and Pulkit Agrawal. An end-to-end differentiable framework for contact-aware robot design. RSS, 2021

  61. [61]

    Fast and feature-complete differentiable physics for articulated rigid bodies with contact

    Keenon Werling, Dalton Omens, Jeongseok Lee, Ioannis Exarchos, and C Karen Liu. Fast and feature-complete differentiable physics for articulated rigid bodies with contact. RSS, 2021. 12

  62. [62]

    DiSECt: A differentiable simulation engine for autonomous robotic cutting

    Eric Heiden, Miles Macklin, Yashraj Narang, Dieter Fox, Animesh Garg, and Fabio Ramos. DiSECt: A differentiable simulation engine for autonomous robotic cutting. RSS, 2021

  63. [63]

    Brax - a differentiable physics engine for large scale rigid body simulation

    C. Daniel Freeman, Erik Frey, Anton Raichuk, Sertan Girgin, Igor Mordatch, and Olivier Bachem. Brax - a differentiable physics engine for large scale rigid body simulation. NeurIPS, 2021

  64. [64]

    PODS: Policy optimization via differentiable simulation

    Miguel Zamora, Momchil Peychev, Sehoon Ha, Martin Vechev, and Stelian Coros. PODS: Policy optimization via differentiable simulation. ICML, 2021

  65. [65]

    DiffPD: Differentiable projective dynamics

    Tao Du, Kui Wu, Pingchuan Ma, Sebastien Wah, Andrew Spielberg, Daniela Rus, and Wojciech Matusik. DiffPD: Differentiable projective dynamics. TOG, 41(2):1–21, 2021

  66. [66]

    PlasticineLab: A soft-body manipulation benchmark with differentiable physics

    Zhiao Huang, Yuanming Hu, Tao Du, Siyuan Zhou, Hao Su, Joshua B Tenenbaum, and Chuang Gan. PlasticineLab: A soft-body manipulation benchmark with differentiable physics. ICLR, 2021

  67. [67]

    DiffTaichi: Differentiable programming for physical simulation

    Yuanming Hu, Luke Anderson, Tzu-Mao Li, Qi Sun, Nathan Carr, Jonathan Ragan-Kelley, and Frédo Durand. DiffTaichi: Differentiable programming for physical simulation. ICLR, 2020

  68. [68]

    Differentiable cloth simulation for inverse problems

    Junbang Liang, Ming C. Lin, and Vladlen Koltun. Differentiable cloth simulation for inverse problems. NeurIPS, pages 1–22, 2019

  69. [69]

    ChainQueen: A real-time differentiable physical simulator for soft robotics

    Yuanming Hu, Jiancheng Liu, Andrew Spielberg, Joshua B. Tenenbaum, William T. Freeman, Jiajun Wu, Daniela Rus, and Wojciech Matusik. ChainQueen: A real-time differentiable physical simulator for soft robotics. ICRA, pages 6265–6271, 2019

  70. [70]

    Gradients are not all you need

    Luke Metz, C Daniel Freeman, Samuel S Schoenholz, and Tal Kachman. Gradients are not all you need. arXiv preprint arXiv:2111.05803, 2021

  71. [71]

    Pipps: Flexible model-based policy search robust to the curse of chaos

    Paavo Parmas, Carl Edward Rasmussen, Jan Peters, and Kenji Doya. Pipps: Flexible model-based policy search robust to the curse of chaos. In International Conference on Machine Learning, pages 4065–4074. PMLR, 2018

  72. [72]

    Do differentiable simulators give better policy gradients?

    Hyung Ju Suh, Max Simchowitz, Kaiqing Zhang, and Russ Tedrake. Do differentiable simulators give better policy gradients? In International Conference on Machine Learning, pages 20668–20696. PMLR, 2022

  73. [73]

    Accelerated policy learning with parallel differentiable simulation

    Jie Xu, Viktor Makoviychuk, Yashraj Narang, Fabio Ramos, Wojciech Matusik, Animesh Garg, and Miles Macklin. Accelerated policy learning with parallel differentiable simulation. ICLR, 2022

  74. [74]

    Adaptive horizon actor-critic for policy learning in contact-rich differentiable simulation

    Ignat Georgiev, Krishnan Srinivasan, Jie Xu, Eric Heiden, and Animesh Garg. Adaptive horizon actor-critic for policy learning in contact-rich differentiable simulation. ICML, 2024

  75. [75]

    Gradient informed proximal policy optimization

    Sanghyun Son, Laura Yu Zheng, Ryan Sullivan, Yi-Ling Qiao, and Ming C. Lin. Gradient informed proximal policy optimization. NeurIPS, 2023

  76. [76]

    Backpropagation through signal temporal logic specifications: Infusing logical structure into gradient-based methods

    Karen Leung, Nikos Aréchiga, and Marco Pavone. Backpropagation through signal temporal logic specifications: Infusing logical structure into gradient-based methods. The International Journal of Robotics Research, 42(6):356–370, 2023

  77. [77]

    Signal temporal logic neural predictive control

    Yue Meng and Chuchu Fan. Signal temporal logic neural predictive control. RAL, 8(11):7719–7726, 2023

  78. [78]

    Principles of Model Checking

    Christel Baier and Joost-Pieter Katoen. Principles of Model Checking. MIT Press, Cambridge, MA, USA, 2008

  79. [79]

    Limit-deterministic Büchi automata for linear temporal logic

    Salomon Sickert, Javier Esparza, Stefan Jaax, and Jan Křetínský. Limit-deterministic Büchi automata for linear temporal logic. In Swarat Chaudhuri and Azadeh Farzan, editors, Computer Aided Verification, pages 312–332, Cham, 2016. Springer International Publishing

  80. [80]

    A survey of stochastic ω-regular games

    Krishnendu Chatterjee and Thomas A. Henzinger. A survey of stochastic ω-regular games. Journal of Computer and System Sciences, 78(2):394–413, 2012

Showing first 80 references.