Accelerated Learning with Linear Temporal Logic using Differentiable Simulation

Alper Kamil Bozkurt; Calin Belta; Ming C. Lin

arxiv: 2506.01167 · v2 · submitted 2025-06-01 · 💻 cs.LG · cs.RO

Pith reviewed 2026-05-19 10:45 UTC · model grok-4.3

keywords: reinforcement learning · linear temporal logic · differentiable simulation · Büchi automaton · formal specifications · safety constraints · gradient-based optimization · continuous control

The pith

Soft labeling of automaton states makes linear temporal logic rewards differentiable for gradient-based reinforcement learning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a framework that integrates linear temporal logic specifications directly into differentiable simulators for reinforcement learning. By relaxing the discrete transitions of the automaton with soft state labeling, it creates continuous rewards and state features that reduce the sparsity problem common in LTL-based rewards. This approach maintains the soundness of the original formal objective while providing theoretical connections between Büchi acceptance conditions and both discrete and differentiable return values, along with a bound on their difference. Experiments show it accelerates training and improves performance on nonlinear continuous control tasks compared to discrete methods.

Core claim

Our method relaxes discrete automaton transitions via soft labeling of states, yielding differentiable rewards and state representations that mitigate the sparsity issue intrinsic to LTL while preserving objective soundness. We provide theoretical guarantees connecting Büchi acceptance to both discrete and differentiable LTL returns and derive a tunable bound on their discrepancy in deterministic and stochastic settings.

What carries the argument

Soft labeling of states in the automaton, which replaces hard discrete transitions with continuous probabilities to enable gradient propagation through the LTL-based reward computation.
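
The relaxation described here can be illustrated with a minimal sketch. It is not the paper's formulation: the sigmoid labeling, the `temperature` parameter, the two-state automaton for "eventually reach the goal", and every function name are assumptions chosen for illustration. Each predicate is treated as a signed margin (positive when it holds), a sigmoid turns that margin into a probability, and the automaton state becomes a distribution whose mass is split across both outgoing transitions in proportion to that probability.

```python
import math

# Hypothetical deterministic automaton for "eventually goal".
# Key: (automaton state, label value) -> next automaton state.
DELTA = {(0, False): 0, (0, True): 1, (1, False): 1, (1, True): 1}
ACCEPTING = {1}


def soft_label(margin, temperature=0.1):
    """Relaxed truth value of the predicate `margin > 0`, in [0, 1]."""
    z = margin / temperature
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # numerically stable branch for negative z
    return e / (1.0 + e)


def soft_step(belief, p_true):
    """Advance a distribution over automaton states by one soft transition."""
    nxt = [0.0] * len(belief)
    for q, mass in enumerate(belief):
        nxt[DELTA[(q, True)]] += mass * p_true
        nxt[DELTA[(q, False)]] += mass * (1.0 - p_true)
    return nxt


def accepting_mass(belief):
    """Probability mass on accepting states, usable as a smooth reward signal."""
    return sum(belief[q] for q in ACCEPTING)


def run(margins, temperature):
    """Push a whole sequence of predicate margins through the soft automaton."""
    belief = [1.0, 0.0]  # all mass starts in automaton state 0
    for m in margins:
        belief = soft_step(belief, soft_label(m, temperature))
    return belief


margins = [-1.0, -0.2, 0.05, 0.5]  # assumed signed margins of the goal predicate
print(accepting_mass(run(margins, temperature=0.1)))
print(accepting_mass(run(margins, temperature=1e-6)))
```

With a very small temperature the soft run collapses to the ordinary hard automaton run, while a larger temperature lets near-misses such as the margin of -0.2 contribute some mass, which is what gives a gradient something to follow.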

If this is right

Substantially accelerates training in complex, nonlinear, contact-rich continuous-control tasks.
Achieves up to twice the returns of discrete baselines.
Compatible with reward machines for co-safe LTL and LTL_f without modification.
Bridges formal methods and deep RL for safe, specification-driven learning in continuous domains.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

This approach may extend to other formal specification languages beyond LTL by similar relaxation techniques.
Future work could explore scaling to higher-dimensional state spaces or integrating with model-based RL methods.
The tunable bound on discrepancy suggests opportunities for adaptive relaxation parameters during training.

Load-bearing premise

The soft labeling of automaton states preserves the soundness of the original LTL objective without introducing violations of the specification.

What would settle it

Observing a policy trained with the differentiable LTL reward that violates the original LTL specification in a deterministic environment would falsify the claim that soundness is preserved.
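
A falsification check of this kind could be sketched as follows, loosely modeled on the parking scenario of Figure 1. Everything concrete here is hypothetical: the one-dimensional Euler dynamics, the positions `PARK_START` and `GRASS_START`, and the function names. The sketch only covers what a finite rollout can witness, namely entering the forbidden region or failing to end in the parking area; it does not decide general LTL satisfaction over infinite traces.

```python
GRASS_START = 20.0  # hypothetical position where the grass begins
PARK_START = 15.0   # hypothetical position where the parking area begins


def rollout(decel, v0=10.0, dt=0.1, steps=200):
    """Positions of a car braking at a constant rate (explicit Euler, toy model)."""
    x, v, xs = 0.0, v0, []
    for _ in range(steps):
        xs.append(x)
        x += v * dt
        v = max(0.0, v - decel * dt)  # the car never reverses
    return xs


def first_violation(xs):
    """Index of the first step on the grass, or None if the trace stays safe."""
    for t, x in enumerate(xs):
        if x >= GRASS_START:
            return t
    return None


def stops_in_parking(xs):
    """True if no step touched the grass and the final position is in the parking area."""
    return first_violation(xs) is None and PARK_START <= xs[-1] < GRASS_START


for decel in (2.0, 3.0, 5.0):
    xs = rollout(decel)
    print(decel, first_violation(xs), stops_in_parking(xs))
```

Running such a check on rollouts of a policy trained with the relaxed reward, in a deterministic environment, is one way to look for the counterexample described above.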

Figures

Figures reproduced from arXiv: 2506.01167 by Alper Kamil Bozkurt, Calin Belta, Ming C. Lin.

**Figure 1.** LTL Returns and Derivatives. Left: The parking scenario where the car must brake to stop in the parking area without entering the grass field (φp). Middle: LTL satisfaction probability and return estimates from discrete and differentiable LTL formulations as functions of deceleration. Right: LTL return gradients with respect to deceleration and their standard deviation. The key challenge in learning from L…

**Figure 2.** Task Specification with LTL. This figure illustrates a Cheetah policy learned by SHAC using differentiable rewards derived via our approach from the LTL formula φlegged (10), which specifies accelerating forward, stopping, and maintaining a safe tip-to-ground distance. Specifying the desired behaviors of robots using the high-level language LTL provides is an intuitive alternative to manually designing rew…

**Figure 3.** Comparison Across Environments: Differentiable vs. Discrete LTL Rewards. The wider plots show the learning curves of all baseline algorithms, while the narrower plots on the right display the maximum returns achieved after 100 M steps. All results are averaged over 5 random seeds, and the curves are smoothed using max and uniform filters for visual clarity. The reported returns, bounded between 0 and 1, se…

**Figure 4.** Ablation Study for LTL. The maximum returns obtained after 100 M steps for simplified LTL formulas (12), averaged over 5 seeds. Returns (0 to 1) indicate LTL satisfaction probabilities. Under these simpler specifications, both ̸∂RLs and ∂RLs successfully learn near-optimal policies. However, as shown in …

**Figure 5.** Convergence speed comparison of stochastic gradient descent algorithms using …

**Figure 6.** The ω-automaton derived from φcartpole from (9).

**Figure 7.** The ω-automaton derived from φlegged from (10) for the Ant environment.

Original abstract

Ensuring that reinforcement learning (RL) controllers satisfy safety and reliability constraints in real-world settings remains challenging: state-avoidance and constrained Markov decision processes often fail to capture trajectory-level requirements or induce overly conservative behavior. Formal specification languages such as linear temporal logic (LTL) offer correct-by-construction objectives, yet their rewards are typically sparse, and heuristic shaping can undermine correctness. We introduce, to our knowledge, the first end-to-end framework that integrates LTL with differentiable simulators, enabling efficient gradient-based learning directly from formal specifications. Our method relaxes discrete automaton transitions via soft labeling of states, yielding differentiable rewards and state representations that mitigate the sparsity issue intrinsic to LTL while preserving objective soundness. We provide theoretical guarantees connecting Büchi acceptance to both discrete and differentiable LTL returns and derive a tunable bound on their discrepancy in deterministic and stochastic settings. Empirically, across complex, nonlinear, contact-rich continuous-control tasks, our approach substantially accelerates training and achieves up to twice the returns of discrete baselines. We further demonstrate compatibility with reward machines, thereby covering co-safe LTL and LTL_f without modification. By rendering automaton-based rewards differentiable, our work bridges formal methods and deep RL, enabling safe, specification-driven learning in continuous domains.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper's soft labeling of LTL automata enables differentiable RL from formal specs with supporting theory and experiments, but the bound's behavior in noisy stochastic dynamics is worth verifying.

The punchline for this paper is that it relaxes LTL automata with soft state labels to enable end-to-end gradient-based learning from formal specs in differentiable simulators for continuous control. They achieve this by creating differentiable rewards and state representations that reduce the sparsity typical of LTL while keeping the objective sound. The theoretical part connects Büchi acceptance to the returns in both discrete and differentiable cases and gives a tunable bound on the discrepancy for deterministic and stochastic dynamics. Empirically, the method accelerates training and achieves up to twice the returns of discrete baselines on complex nonlinear contact-rich tasks. It also integrates with reward machines to handle co-safe LTL and LTL_f without extra work.

This approach does well at bridging formal methods and deep RL. The experiments target relevant domains like robotics, and the results suggest practical benefits over heuristic reward shaping. The abstract frames it as the first such end-to-end integration, which aligns with the cited prior work on discrete LTL and reward machines.

A soft spot is the load-bearing assumption about the discrepancy bound holding under stochastic nonlinear dynamics. The stress-test concern is valid in principle because simulator discontinuities or high variance could alter acceptance paths and loosen the bound. However, since the paper states the bound for stochastic settings and reports gains on those tasks, the empirical evidence likely shows it works in practice. Still, more details on the derivation and parameter choices would strengthen the soundness claim.

This paper is for researchers combining formal specifications with reinforcement learning in continuous domains. A reader focused on safety-critical control would find value in the method for avoiding conservative behaviors from other constrained RL approaches. It deserves a serious referee given the grounded theory and concrete results on challenging tasks. I recommend sending it to peer review.

Referee Report

2 major / 2 minor

Summary. This paper introduces a framework for integrating Linear Temporal Logic (LTL) with differentiable simulators in reinforcement learning. By relaxing discrete automaton transitions through soft labeling of states, it generates differentiable rewards and state representations to address the sparsity of LTL-based rewards while aiming to preserve soundness. Theoretical results connect Büchi acceptance to both discrete and differentiable returns and provide a tunable discrepancy bound for deterministic and stochastic settings. Experiments on nonlinear, contact-rich continuous-control tasks demonstrate faster training and up to twice the returns compared to discrete baselines, with extensions to reward machines for co-safe LTL and LTL_f.

Significance. If the theoretical guarantees and discrepancy bound hold under the considered dynamics, this approach could enable more efficient and correct specification-driven RL in continuous domains, bridging formal methods and deep learning. The empirical acceleration in complex tasks highlights potential practical impact. The provision of theoretical connections and compatibility with reward machines are notable strengths.

major comments (2)

[Abstract / Theoretical Analysis] Abstract and Theoretical Analysis section: The derivation of the tunable bound on the discrepancy between discrete and differentiable LTL returns is not fully detailed. This is load-bearing for the central claim because soundness of the relaxed objective under stochastic nonlinear dynamics (as in the contact-rich tasks) requires explicit conditions on the soft labeling (e.g., Lipschitz continuity or bounded transition variance); without them the bound may fail to control discrepancies when discontinuities alter acceptance paths, as highlighted by the stress-test concern.
[§3 / §5] §3 (Method) and §5 (Theoretical Guarantees): The claim that soft labeling preserves soundness of the original LTL objective needs a concrete statement of the conditions under which the relaxation maintains equivalence to Büchi acceptance; the current presentation leaves open whether post-hoc choices in the labeling temperature affect the bound's validity in deterministic vs. stochastic cases.

minor comments (2)

[Abstract] Abstract: The phrase 'soft labeling of states' would benefit from an immediate parenthetical definition or pointer to the precise relaxation formula to improve accessibility.
The manuscript could add a short paragraph contrasting the approach with prior work on differentiable automata or reward shaping to better situate the novelty.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment in detail below and have revised the theoretical sections to strengthen the presentation of the discrepancy bound and the conditions for soundness.

Referee: [Abstract / Theoretical Analysis] Abstract and Theoretical Analysis section: The derivation of the tunable bound on the discrepancy between discrete and differentiable LTL returns is not fully detailed. This is load-bearing for the central claim because soundness of the relaxed objective under stochastic nonlinear dynamics (as in the contact-rich tasks) requires explicit conditions on the soft labeling (e.g., Lipschitz continuity or bounded transition variance); without them the bound may fail to control discrepancies when discontinuities alter acceptance paths, as highlighted by the stress-test concern.

Authors: We agree that the derivation of the tunable bound would benefit from greater explicitness. In the revised manuscript, we have expanded the Theoretical Guarantees section with a complete step-by-step derivation. We now state the required assumptions on the soft labeling, including Lipschitz continuity of the labeling function and a bound on transition variance in the stochastic setting. These conditions ensure the bound remains valid under nonlinear dynamics and controls discrepancies arising from discontinuities in acceptance paths. We have also added analysis addressing stress-test scenarios to demonstrate that the bound continues to hold.

revision: yes

Referee: [§3 / §5] §3 (Method) and §5 (Theoretical Guarantees): The claim that soft labeling preserves soundness of the original LTL objective needs a concrete statement of the conditions under which the relaxation maintains equivalence to Büchi acceptance; the current presentation leaves open whether post-hoc choices in the labeling temperature affect the bound's validity in deterministic vs. stochastic cases.

Authors: We concur that a more precise statement of the conditions is warranted. We have added a dedicated theorem in §5 that explicitly characterizes the conditions under which soft labeling maintains equivalence to Büchi acceptance. The theorem delineates the admissible range for the labeling temperature such that the differentiable objective remains sound with respect to the discrete Büchi acceptance condition. This statement applies uniformly to both deterministic and stochastic cases, with the discrepancy bound adjusted to reflect the setting. We clarify that temperature selection must respect these conditions rather than being chosen post-hoc, and we provide practical guidance for satisfying them.

revision: yes

Circularity Check

0 steps flagged

Theoretical guarantees on Büchi-to-differentiable return connection are independently derived

The paper's central derivation provides theoretical guarantees linking Büchi acceptance to both discrete and differentiable LTL returns, along with a tunable discrepancy bound in deterministic and stochastic settings. This is presented as a first-principles result from the soft-labeling relaxation of automaton transitions, without reduction to fitted parameters, self-referential definitions, or load-bearing self-citations. The abstract and context show no evidence of the bound or soundness preservation being equivalent to inputs by construction; the derivation chain remains self-contained against external formal methods benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the existence of a tunable bound connecting discrete Büchi acceptance to the soft differentiable return; this bound is introduced without external verification in the abstract.

free parameters (1)

soft labeling temperature or relaxation parameter
Controls the degree of softness in state labeling; its value must be chosen to balance differentiability and fidelity to the original automaton.
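
The trade-off named in this entry can be made concrete with a small numerical sketch. It assumes a sigmoid relaxation with a `temperature` parameter, which is an illustrative choice and not necessarily the relaxation used in the paper; the function names are likewise hypothetical. The sketch prints, for a predicate that holds with margin 0.3, how far the soft label is from the hard label and how large the gradient with respect to the margin is.

```python
import math


def soft_label(margin, temperature):
    """Sigmoid relaxation of the predicate `margin > 0` (assumed form)."""
    z = margin / temperature
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # numerically stable branch for negative z
    return e / (1.0 + e)


def soft_label_grad(margin, temperature):
    """Derivative of soft_label with respect to margin."""
    p = soft_label(margin, temperature)
    return p * (1.0 - p) / temperature


def label_gap(margin, temperature):
    """Absolute difference between the soft label and the hard 0/1 label."""
    hard = 1.0 if margin > 0 else 0.0
    return abs(soft_label(margin, temperature) - hard)


margin = 0.3  # the predicate holds, with some slack
for temperature in (1.0, 0.3, 0.1, 0.03):
    gap = label_gap(margin, temperature)
    grad = soft_label_grad(margin, temperature)
    print(f"temperature={temperature:<5} gap={gap:.4f} grad={grad:.4f}")
```

Under this assumed relaxation, lowering the temperature shrinks the gap to the hard label, but away from the decision boundary the gradient eventually vanishes as well, so the parameter cannot simply be driven to zero.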

axioms (1)

domain assumption: Büchi acceptance conditions remain approximately preserved under soft state labeling in both deterministic and stochastic dynamics
Invoked to derive the discrepancy bound stated in the abstract.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

Theorem 1... lim_{γ→1} E[G(σ)] = Pr(σ ⊨ □◇B) with state-dependent R and Γ on accepting states B
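
The limit in this passage can be checked numerically on deterministic paths, where the satisfaction probability is either 0 or 1. The sketch below assumes one construction known from earlier LTL reinforcement learning work, since this page does not spell out the paper's definitions: a reward of 1 - γ_B and a discount of γ_B on accepting states, no reward and a discount of γ elsewhere, with γ_B tied to γ so that (1 - γ) / (1 - γ_B) tends to 0. The schedule `gamma_b_for` and all names are hypothetical.

```python
from itertools import cycle, islice


def ltl_return(visits, gamma, gamma_b):
    """Return of a finite path prefix; each item of `visits` is True when the state is in B."""
    total, discount = 0.0, 1.0
    for in_b in visits:
        if in_b:
            total += discount * (1.0 - gamma_b)  # reward only on accepting states
            discount *= gamma_b                  # state-dependent discount on B
        else:
            discount *= gamma
    return total


def gamma_b_for(gamma):
    """Hypothetical schedule with (1 - gamma) / (1 - gamma_b) -> 0 as gamma -> 1."""
    return 1.0 - (1.0 - gamma) ** 0.5


for gamma in (0.99, 0.9999, 0.999999):
    gb = gamma_b_for(gamma)
    # Visits B every fifth step for the whole horizon (stands in for "infinitely often").
    recurring = islice(cycle([True, False, False, False, False]), 100_000)
    # Visits B three times, then never again.
    transient = [True] * 3 + [False] * 1000
    print(gamma, ltl_return(recurring, gamma, gb), ltl_return(transient, gamma, gb))
```

Under these assumptions the return of the recurring path climbs toward 1 and the return of the transient path falls toward 0 as γ approaches 1, which is the behavior the quoted limit describes for paths that do and do not satisfy □◇B.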

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

86 extracted references · 86 canonical work pages · 3 internal anchors

[1]

Reinforcement learning in robotics: A survey

Jens Kober, J Andrew Bagnell, and Jan Peters. Reinforcement learning in robotics: A survey. The International Journal of Robotics Research, 32(11):1238–1274, 2013

work page 2013
[2]

End-to-end training of deep visuomotor policies

Sergey Levine, Chelsea Finn, Trevor Darrell, and Pieter Abbeel. End-to-end training of deep visuomotor policies. The Journal of Machine Learning Research, 17(1):1334–1373, 2016

work page 2016
[3]

Learning dexterous in-hand manipulation

OpenAI: Marcin Andrychowicz, Bowen Baker, Maciek Chociej, Rafal Jozefowicz, Bob Mc- Grew, Jakub Pachocki, Arthur Petron, Matthias Plappert, Glenn Powell, Alex Ray, et al. Learning dexterous in-hand manipulation. The International Journal of Robotics Research, 39(1):3–20, 2020

work page 2020
[4]

Learning agile and dynamic motor skills for legged robots

Jemin Hwangbo, Joonho Lee, Alexey Dosovitskiy, Dario Bellicoso, Vassilios Tsounis, Vladlen Koltun, and Marco Hutter. Learning agile and dynamic motor skills for legged robots. Science Robotics, 4(26):eaau5872, 2019

work page 2019
[5]

Learning quadrupedal locomotion over challenging terrain

Joonho Lee, Jemin Hwangbo, Lorenz Wellhausen, Vladlen Koltun, and Marco Hutter. Learning quadrupedal locomotion over challenging terrain. Science robotics, 5(47):eabc5986, 2020

work page 2020
[6]

Socially aware motion planning with deep reinforcement learning

Yu Fan Chen, Michael Everett, Miao Liu, and Jonathan P How. Socially aware motion planning with deep reinforcement learning. In 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 1343–1350. IEEE, 2017

work page 2017
[7]

Reinforcement learning in healthcare: A survey

Chao Yu, Jiming Liu, Shamim Nemati, and Guosheng Yin. Reinforcement learning in healthcare: A survey. ACM Computing Surveys (CSUR), 55(1):1–36, 2021

work page 2021
[8]

A comprehensive survey on safe reinforcement learning

Javier Garcıa and Fernando Fernández. A comprehensive survey on safe reinforcement learning. Journal of Machine Learning Research, 16(1):1437–1480, 2015

work page 2015
[9]

A lyapunov-based approach to safe reinforcement learning

Yinlam Chow, Ofir Nachum, Edgar Duenez-Guzman, and Mohammad Ghavamzadeh. A lyapunov-based approach to safe reinforcement learning. Advances in neural information processing systems, 31, 2018

work page 2018
[10]

Responsive safety in reinforcement learning by pid lagrangian methods

Adam Stooke, Joshua Achiam, and Pieter Abbeel. Responsive safety in reinforcement learning by pid lagrangian methods. InInternational Conference on Machine Learning, pages 9133–9143. PMLR, 2020

work page 2020
[11]

Provably efficient safe exploration via primal-dual policy optimization

Dongsheng Ding, Xiaohan Wei, Zhuoran Yang, Zhaoran Wang, and Mihailo Jovanovic. Provably efficient safe exploration via primal-dual policy optimization. In International conference on artificial intelligence and statistics, pages 3304–3312. PMLR, 2021

work page 2021
[12]

Robot reinforcement learning on the constraint manifold

Puze Liu, Davide Tateo, Haitham Bou Ammar, and Jan Peters. Robot reinforcement learning on the constraint manifold. In Conference on Robot Learning, pages 1357–1366. PMLR, 2022

work page 2022
[13]

Safe model- based reinforcement learning with stability guarantees

Felix Berkenkamp, Matteo Turchetta, Angela P Schoellig, and Andreas Krause. Safe model- based reinforcement learning with stability guarantees. NIPS, 2017

work page 2017
[14]

Fisac, Anayo K

Jaime F. Fisac, Anayo K. Akametalu, Melanie N. Zeilinger, Shahab Kaynama, Jeremy Gillula, and Claire J. Tomlin. A general safety framework for learning-based control in uncertain robotic systems. TAC, 64(7):2737–2752, 2019

work page 2019
[15]

Murray, and Joel W

Richard Cheng, Gabor Orosz, Richard M. Murray, and Joel W. Burdick. End-to-end safe reinforcement learning through barrier functions for safety-critical continuous control tasks. AAAI, 2019

work page 2019
[16]

Safe reinforcement learning with model uncertainty estimates

Björn Lütjens, Michael Everett, and Jonathan P How. Safe reinforcement learning with model uncertainty estimates. ICRA, 2019

work page 2019
[17]

Fisac, Neil F

Jaime F. Fisac, Neil F. Lugovoy, Vicenç Rubies-Royo, Shromona Ghosh, and Claire J. Tomlin. Bridging hamilton-jacobi safety analysis and reinforcement learning. ICRA, 00:8550–8556, 2019

work page 2019
[18]

Gonzalez, Julian Ibarz, Chelsea Finn, and Ken Goldberg

Brijen Thananjeyan, Ashwin Balakrishna, Suraj Nair, Michael Luo, Krishnan Srinivasan, Minho Hwang, Joseph E. Gonzalez, Julian Ibarz, Chelsea Finn, and Ken Goldberg. Recovery RL: Safe reinforcement learning with learned recovery zones. RA-L, 6(3):4915–4922, 2020

work page 2020
[19]

Robust model predictive shielding for safe reinforcement learning with stochastic dynamics

Shuo Li and Osbert Bastani. Robust model predictive shielding for safe reinforcement learning with stochastic dynamics. ICRA, 00:7166–7172, 2020

work page 2020
[20]

Safe reinforcement learning using robust MPC

Mario Zanon and Sebastien Gros. Safe reinforcement learning using robust MPC. TAC, 66(8):3638–3652, 2020. 10

work page 2020
[21]

Mohit Srinivasan, Amogh Dabholkar, Samuel Coogan, and Patricio A. Vela. Synthesis of control barrier functions using a supervised machine learning approach. IROS, 00:7139–7145, 2020

work page 2020
[22]

Tomlin, and Koushil Sreenath

Jason Choi, Fernando Castaneda, Claire J. Tomlin, and Koushil Sreenath. Reinforcement learning for safety-critical control under model uncertainty, using control lyapunov functions and control barrier functions. RSS, 2020

work page 2020
[23]

Distributed multi-robot collision avoidance via deep reinforcement learning for navigation in complex scenarios

Tingxiang Fan, Pinxin Long, Wenxi Liu, and Jia Pan. Distributed multi-robot collision avoidance via deep reinforcement learning for navigation in complex scenarios. Journal of Robotics Research, 39(7):856–892, 2020

work page 2020
[24]

Learning safe multi-agent control with decentralized neural barrier certificates

Zengyi Qin, Kaiqing Zhang, Yuxiao Chen, Jingkai Chen, and Chuchu Fan. Learning safe multi-agent control with decentralized neural barrier certificates. ICLR, 2021

work page 2021
[25]

Model-free safe control for zero-violation reinforce- ment learning

Weiye Zhao, Tairan He, and Changliu Liu. Model-free safe control for zero-violation reinforce- ment learning. CoRL, 2021

work page 2021
[26]

Safe control with learned certificates: A survey of neural lyapunov, barrier, and contraction methods for robotics and control

Charles Dawson, Sicun Gao, and Chuchu Fan. Safe control with learned certificates: A survey of neural lyapunov, barrier, and contraction methods for robotics and control. T-RO, 39(3):1749–1767, 2023

work page 2023
[27]

Santiago Paternain, Miguel Calvo-Fullana, Luiz F. O. Chamon, and Alejandro Ribeiro. Safe policies for reinforcement learning via primal-dual methods. TAC, 68(3):1321–1336, 2023

work page 2023
[28]

Omega-regular objectives in model-free reinforcement learning

Ernst Moritz Hahn, Mateo Perez, Sven Schewe, Fabio Somenzi, Ashutosh Trivedi, and Dominik Wojtczak. Omega-regular objectives in model-free reinforcement learning. In Proceedings of the 25th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS), pages 395–412, 2019

work page 2019
[29]

A. K. Bozkurt, Y . Wang, M. M. Zavlanos, and M. Pajic. Control synthesis from linear temporal logic specifications using model-free reinforcement learning. In International Conference on Robotics and Automation (ICRA), pages 10349–10355, 2020

work page 2020
[30]

A. K. Bozkurt, Y . Wang, M. M. Zavlanos, and M. Pajic. Model-free reinforcement learning for stochastic games with linear temporal logic objectives. In International Conference on Robotics and Automation (ICRA), pages 10649–10655. IEEE, 2021

work page 2021
[31]

A. K. Bozkurt, Y . Wang, and M. Pajic. Secure planning against stealthy attacks via model-free reinforcement learning. In International Conference on Robotics and Automation (ICRA), pages 10656–10662. IEEE, 2021

work page 2021
[32]

A. K. Bozkurt, Y . Wang, and M. Pajic. Model-free learning of safe yet effective controllers. In Conference on Decision and Control (CDC), pages 6560–6565. IEEE, 2021

work page 2021
[33]

A. K. Bozkurt, Y . Wang, M. M. Zavlanos, and M. Pajic. Learning optimal controllers for temporal logic specifications in stochastic games. Transactions on Automatic Control (TAC), 2024

work page 2024
[34]

A formal methods approach to inter- pretable reinforcement learning for robotic planning

Xiao Li, Zachary Serlin, Guang Yang, and Calin Belta. A formal methods approach to inter- pretable reinforcement learning for robotic planning. Science Robotics, 4(37), 2019

work page 2019
[35]

Modular deep reinforcement learning for continuous motion planning with temporal logic

Mingyu Cai, Mohammadhosein Hasanbeig, Shaoping Xiao, Alessandro Abate, and Zhen Kan. Modular deep reinforcement learning for continuous motion planning with temporal logic. RA-L, 6(4):7973–7980, 2021

work page 2021
[36]

Reinforcement learning based temporal logic control with maximum probabilistic satisfaction

Mingyu Cai, Shaoping Xiao, Baoluo Li, Zhiliang Li, and Zhen Kan. Reinforcement learning based temporal logic control with maximum probabilistic satisfaction. ICRA, 00:806–812, 2021

work page 2021
[37]

Reward machines: Exploiting reward function structure in reinforcement learning

Rodrigo Toro Icarte, Toryn Q Klassen, Richard Valenzano, and Sheila A McIlraith. Reward machines: Exploiting reward function structure in reinforcement learning. JAIR, 2022

work page 2022
[38]

Accelerated reinforcement learning for temporal logic control objectives

Yiannis Kantaros. Accelerated reinforcement learning for temporal logic control objectives. IROS, 00:5077–5082, 2022

work page 2022
[39]

Policy optimization with linear temporal logic constraints

Cameron V oloshin, Hoang M Le, Swarat Chaudhuri, and Yisong Yue. Policy optimization with linear temporal logic constraints. NeurIPS, 2022

work page 2022
[40]

On the (in)tractability of reinforcement learning for LTL objectives

Cambridge Yang, Michael Littman, and Michael Carbin. On the (in)tractability of reinforcement learning for LTL objectives. IJCAI, 2022. 11

work page 2022
[41]

Safe reinforcement learning under temporal logic with reward design and quantum action selection

Mingyu Cai, Shaoping Xiao, Junchao Li, and Zhen Kan. Safe reinforcement learning under temporal logic with reward design and quantum action selection. Scientific Reports, 13(1):1925, 2023

work page 1925
[42]

Certified reinforcement learning with logic guidance

Hosein Hasanbeig, Daniel Kroening, and Alessandro Abate. Certified reinforcement learning with logic guidance. Artificial Intelligence, 322:103949, 2023

work page 2023
[43]

Overcoming exploration: Deep reinforcement learning for continuous control in cluttered environments from temporal logic specifications

Mingyu Cai, Erfan Aasi, Calin Belta, and Cristian-Ioan Vasile. Overcoming exploration: Deep reinforcement learning for continuous control in cluttered environments from temporal logic specifications. RA-L, 8(4):2158–2165, 2023

work page 2023
[44]

Security-aware reinforcement learning under linear temporal logic specifications

Bohan Cui, Keyi Zhu, Shaoyuan Li, and Xiang Yin. Security-aware reinforcement learning under linear temporal logic specifications. ICRA, 00:12367–12373, 2023

work page 2023
[45]

Eventual discounting temporal logic counterfactual experience replay

Cameron V oloshin, Abhinav Verma, and Yisong Yue. Eventual discounting temporal logic counterfactual experience replay. ICML, 2023

work page 2023
[46]

Sample efficient model-free reinforcement learning from LTL specifications with optimality guarantees

Daqian Shao and Marta Kwiatkowska. Sample efficient model-free reinforcement learning from LTL specifications with optimality guarantees. arXiv, 2023

work page 2023
[47]

Reinforcement learning under temporal logic constraints as a sequence modeling problem

Daiying Tian, Hao Fang, Qingkai Yang, Haoyong Yu, Wenyu Liang, and Yan Wu. Reinforcement learning under temporal logic constraints as a sequence modeling problem. Robotics and Autonomous Systems, 161:104351, 2023

work page 2023
[48]

Verginis, Cevahir Koprulu, Sandeep Chinchali, and Ufuk Topcu

Christos K. Verginis, Cevahir Koprulu, Sandeep Chinchali, and Ufuk Topcu. Joint learning of reward machines and policies in environments with partially known semantics. Artificial Intelligence, 333:104146, 2024

work page 2024
[49]

Reinforcement learning with LTL and $\ omega$-regular objectives via optimality-preserving translation to average rewards

Xuan-Bach Le, Dominik Wagner, Leon Witzman, Alexander Rabinovich, and Luke Ong. Reinforcement learning with LTL and $\ omega$-regular objectives via optimality-preserving translation to average rewards. NeurIPS, 2024

work page 2024
[50]

A PAC learning algorithm for LTL and omega-regular objectives in MDPs

Mateo Perez, Fabio Somenzi, and Ashutosh Trivedi. A PAC learning algorithm for LTL and omega-regular objectives in MDPs. AAAI, 38(19):21510–21517, 2024

work page 2024
[51]

Concrete Problems in AI Safety

Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, and Dan Mané. Concrete problems in ai safety. arXiv preprint arXiv:1606.06565, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016
[52]

Defining and characterizing reward gaming

Joar Skalse, Nikolaus Howe, Dmitrii Krasheninnikov, and David Krueger. Defining and characterizing reward gaming. Advances in Neural Information Processing Systems, 35:9460– 9471, 2022

[53]

Enforcing hard constraints with soft barriers: Safe reinforcement learning in unknown stochastic environments

Yixuan Wang, Simon Sinong Zhan, Ruochen Jiao, Zhilu Wang, Wanxin Jin, Zhuoran Yang, Zhaoran Wang, Chao Huang, and Qi Zhu. Enforcing hard constraints with soft barriers: Safe reinforcement learning in unknown stochastic environments. In International Conference on Machine Learning, pages 36593–36604. PMLR, 2023

[54]

Reachability constrained reinforcement learning

Dongjie Yu, Haitong Ma, Shengbo Li, and Jianyu Chen. Reachability constrained reinforcement learning. In International Conference on Machine Learning, pages 25636–25655. PMLR, 2022

[55]

Safety and liveness guarantees through reach-avoid reinforcement learning

Kai-Chieh Hsu, Vicenç Rubies-Royo, Claire J Tomlin, and Jaime F Fisac. Safety and liveness guarantees through reach-avoid reinforcement learning. RSS, 2021

[56]

Q-learning for robust satisfaction of signal temporal logic specifications

D. Aksaray, A. Jones, Z. Kong, M. Schwager, and C. Belta. Q-learning for robust satisfaction of signal temporal logic specifications. In 2016 IEEE 55th Conference on Decision and Control (CDC), pages 6565–6570, Dec 2016

[57]

Analytical derivatives of rigid body dynamics algorithms

Justin Carpentier and Nicolas Mansard. Analytical derivatives of rigid body dynamics algorithms. RSS, 2018

[58]

ADD: Analytically differentiable dynamics for multi-body systems with frictional contact

Moritz Geilinger, David Hahn, Jonas Zehnder, Moritz Bacher, Bernhard Thomaszewski, and Stelian Coros. ADD: Analytically differentiable dynamics for multi-body systems with frictional contact. TOG, 2020

[59]

Efficient differentiable simulation of articulated bodies

Yi-Ling Qiao, Junbang Liang, Vladlen Koltun, and Ming C Lin. Efficient differentiable simulation of articulated bodies. ICML, 2021

[60]

An end-to-end differentiable framework for contact-aware robot design

Jie Xu, Tao Chen, Lara Zlokapa, Michael Foshey, Wojciech Matusik, Shinjiro Sueda, and Pulkit Agrawal. An end-to-end differentiable framework for contact-aware robot design. RSS, 2021

[61]

Fast and feature-complete differentiable physics for articulated rigid bodies with contact

Keenon Werling, Dalton Omens, Jeongseok Lee, Ioannis Exarchos, and C Karen Liu. Fast and feature-complete differentiable physics for articulated rigid bodies with contact. RSS, 2021

[62]

DiSECt: A differentiable simulation engine for autonomous robotic cutting

Eric Heiden, Miles Macklin, Yashraj Narang, Dieter Fox, Animesh Garg, and Fabio Ramos. DiSECt: A differentiable simulation engine for autonomous robotic cutting. RSS, 2021

[63]

Brax - a differentiable physics engine for large scale rigid body simulation

C. Daniel Freeman, Erik Frey, Anton Raichuk, Sertan Girgin, Igor Mordatch, and Olivier Bachem. Brax - a differentiable physics engine for large scale rigid body simulation. NeurIPS, 2021

[64]

PODS: Policy optimization via differentiable simulation

Miguel Zamora, Momchil Peychev, Sehoon Ha, Martin Vechev, and Stelian Coros. PODS: Policy optimization via differentiable simulation. ICML, 2021

[65]

DiffPD: Differentiable projective dynamics

Tao Du, Kui Wu, Pingchuan Ma, Sebastien Wah, Andrew Spielberg, Daniela Rus, and Wojciech Matusik. DiffPD: Differentiable projective dynamics. TOG, 41(2):1–21, 2021

[66]

PlasticineLab: A soft-body manipulation benchmark with differentiable physics

Zhiao Huang, Yuanming Hu, Tao Du, Siyuan Zhou, Hao Su, Joshua B Tenenbaum, and Chuang Gan. PlasticineLab: A soft-body manipulation benchmark with differentiable physics. ICLR, 2021

[67]

DiffTaichi: Differentiable programming for physical simulation

Yuanming Hu, Luke Anderson, Tzu-Mao Li, Qi Sun, Nathan Carr, Jonathan Ragan-Kelley, and Frédo Durand. DiffTaichi: Differentiable programming for physical simulation. ICLR, 2020

[68]

Differentiable cloth simulation for inverse problems

Junbang Liang, Ming C. Lin, and Vladlen Koltun. Differentiable cloth simulation for inverse problems. NeurIPS, pages 1–22, 2019

[69]

ChainQueen: A real-time differentiable physical simulator for soft robotics

Yuanming Hu, Jiancheng Liu, Andrew Spielberg, Joshua B. Tenenbaum, William T. Freeman, Jiajun Wu, Daniela Rus, and Wojciech Matusik. ChainQueen: A real-time differentiable physical simulator for soft robotics. ICRA, 00:6265–6271, 2019

[70]

Gradients are not all you need

Luke Metz, C Daniel Freeman, Samuel S Schoenholz, and Tal Kachman. Gradients are not all you need. arXiv preprint arXiv:2111.05803, 2021

[71]

PIPPS: Flexible model-based policy search robust to the curse of chaos

Paavo Parmas, Carl Edward Rasmussen, Jan Peters, and Kenji Doya. PIPPS: Flexible model-based policy search robust to the curse of chaos. In International Conference on Machine Learning, pages 4065–4074. PMLR, 2018

[72]

Do differentiable simulators give better policy gradients?

Hyung Ju Suh, Max Simchowitz, Kaiqing Zhang, and Russ Tedrake. Do differentiable simulators give better policy gradients? In International Conference on Machine Learning, pages 20668–20696. PMLR, 2022

[73]

Accelerated policy learning with parallel differentiable simulation

Jie Xu, Viktor Makoviychuk, Yashraj Narang, Fabio Ramos, Wojciech Matusik, Animesh Garg, and Miles Macklin. Accelerated policy learning with parallel differentiable simulation. ICLR, 2022

[74]

Adaptive horizon actor-critic for policy learning in contact-rich differentiable simulation

Ignat Georgiev, Krishnan Srinivasan, Jie Xu, Eric Heiden, and Animesh Garg. Adaptive horizon actor-critic for policy learning in contact-rich differentiable simulation. ICML, 2024

[75]

Sanghyun Son, Laura Yu Zheng, Ryan Sullivan, Yi-Ling Qiao, and Ming C. Lin. Gradient informed proximal policy optimization. NeurIPS, 2023

[76]

Backpropagation through signal temporal logic specifications: Infusing logical structure into gradient-based methods

Karen Leung, Nikos Aréchiga, and Marco Pavone. Backpropagation through signal temporal logic specifications: Infusing logical structure into gradient-based methods. The International Journal of Robotics Research, 42(6):356–370, 2023

[77]

Signal temporal logic neural predictive control

Yue Meng and Chuchu Fan. Signal temporal logic neural predictive control. RA-L, 8(11):7719–7726, 2023

[78]

Principles of Model Checking

Christel Baier and Joost-Pieter Katoen. Principles of Model Checking. MIT Press, Cambridge, MA, USA, 2008

[79]

Limit-deterministic Büchi automata for linear temporal logic

Salomon Sickert, Javier Esparza, Stefan Jaax, and Jan Křetínský. Limit-deterministic Büchi automata for linear temporal logic. In Swarat Chaudhuri and Azadeh Farzan, editors, Computer Aided Verification, pages 312–332, Cham, 2016. Springer International Publishing

[80]

A survey of stochastic ω-regular games

Krishnendu Chatterjee and Thomas A. Henzinger. A survey of stochastic ω-regular games. Journal of Computer and System Sciences, 78(2):394–413, 2012

