Accelerated Learning with Linear Temporal Logic using Differentiable Simulation
Pith reviewed 2026-05-19 10:45 UTC · model grok-4.3
The pith
Soft labeling of automaton states makes linear temporal logic rewards differentiable for gradient-based reinforcement learning.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Our method relaxes discrete automaton transitions via soft labeling of states, yielding differentiable rewards and state representations that mitigate the sparsity issue intrinsic to LTL while preserving objective soundness. We provide theoretical guarantees connecting Büchi acceptance to both discrete and differentiable LTL returns and derive a tunable bound on their discrepancy in deterministic and stochastic settings.
What carries the argument
Soft labeling of states in the automaton, which replaces hard discrete transitions with continuous probabilities to enable gradient propagation through the LTL-based reward computation.
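To make the mechanism concrete, here is a minimal sketch of soft labeling, not the paper's implementation. It assumes a one-dimensional state, a single proposition `x >= threshold`, a sigmoid relaxation with a temperature `tau`, and a two-state automaton whose accepting state is absorbing. All function names and parameters are illustrative assumptions.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_label(x, threshold=1.0, tau=0.1):
    """Relaxed truth value in [0, 1] of the proposition `x >= threshold`."""
    return sigmoid((x - threshold) / tau)

def hard_label(x, threshold=1.0):
    """Discrete truth value of the same proposition."""
    return 1.0 if x >= threshold else 0.0

def step(belief, p):
    """Advance a distribution over automaton states (q0, q1).

    q0 moves to the accepting state q1 with weight p and stays with 1 - p;
    q1 is absorbing. With p in {0, 1} this is the ordinary discrete step.
    """
    b0, b1 = belief
    return (b0 * (1.0 - p), b1 + b0 * p)

def acceptance_mass(xs, label):
    """Mass on the accepting state after reading the trajectory xs."""
    belief = (1.0, 0.0)
    for x in xs:
        belief = step(belief, label(x))
    return belief[1]

near_miss = [0.0, 0.5, 0.9]
print(acceptance_mass(near_miss, hard_label))  # no signal from the discrete labels
print(acceptance_mass(near_miss, soft_label))  # nonzero, grows as states approach the threshold
```

With hard labels a trajectory that never crosses the threshold carries no learning signal, whereas the relaxed version varies smoothly with the states, which is what lets a gradient flow back through a differentiable simulator. Shrinking `tau` moves the soft value back toward the discrete one.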
If this is right
- Substantially accelerates training in complex, nonlinear, contact-rich continuous-control tasks.
- Achieves up to twice the returns of discrete baselines.
- Compatible with reward machines for co-safe LTL and LTL_f without modification.
- Bridges formal methods and deep RL for safe, specification-driven learning in continuous domains.
Where Pith is reading between the lines
- This approach may extend to other formal specification languages beyond LTL by similar relaxation techniques.
- Future work could explore scaling to higher-dimensional state spaces or integrating with model-based RL methods.
- The tunable bound on discrepancy suggests opportunities for adaptive relaxation parameters during training.
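The adaptive-relaxation idea in the last bullet can be pictured with a small sketch. It is purely hypothetical: it assumes a sigmoid soft label with temperature `tau` and a geometric decay schedule, neither of which is taken from the paper, and the names `label_gap` and `annealed_tau` are made up for illustration.

```python
import math

def label_gap(margin, tau):
    """|soft - hard| for a sigmoid label at distance margin >= 0 from the boundary.

    On either side of the boundary the gap equals sigmoid(-margin / tau).
    """
    z = margin / tau
    return math.exp(-z) / (1.0 + math.exp(-z))

def annealed_tau(step, tau0=0.5, tau_min=0.01, decay=0.999):
    """Hypothetical schedule: geometric decay of the temperature with a floor."""
    return max(tau_min, tau0 * decay ** step)

for step in (0, 1000, 10000):
    tau = annealed_tau(step)
    print(step, tau, label_gap(0.1, tau))
```

Early in training a large temperature gives smooth, informative labels; later a small temperature shrinks the per-state gap between soft and discrete labels. Whether such a schedule respects the paper's discrepancy bound is not something this sketch establishes.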
Load-bearing premise
The soft labeling of automaton states preserves the soundness of the original LTL objective without introducing violations of the specification.
What would settle it
Observing a policy trained with the differentiable LTL reward that violates the original LTL specification in a deterministic environment would falsify the claim that soundness is preserved.
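Such a falsification test could be run roughly as follows. This is a sketch under stated assumptions, not a procedure from the paper: the rollout is assumed to settle into a repeating cycle of labels (a prefix followed by a loop), and the specification is assumed to be given as a deterministic Büchi automaton in a plain dictionary. General LTL needs nondeterministic or limit-deterministic automata, which this sketch does not handle.

```python
def states_visited_forever(delta, q0, prefix, cycle):
    """Automaton states visited infinitely often on the word prefix . cycle^omega."""
    q = q0
    for a in prefix:
        q = delta[(q, a)]
    seen = {}      # automaton state at the start of a pass -> index of that pass
    passes = []    # states visited during each pass over the cycle
    while q not in seen:
        seen[q] = len(passes)
        visited = []
        for a in cycle:
            q = delta[(q, a)]
            visited.append(q)
        passes.append(visited)
    # From pass seen[q] onward the run repeats forever.
    return {s for p in passes[seen[q]:] for s in p}

def satisfies_buchi(delta, q0, accepting, prefix, cycle):
    """True when some accepting state is visited infinitely often."""
    return bool(states_visited_forever(delta, q0, prefix, cycle) & accepting)

# Specification "visit the goal infinitely often": label 'g' = goal, 'n' = not goal.
delta = {(q, a): (1 if a == "g" else 0) for q in (0, 1) for a in ("g", "n")}
accepting = {1}

print(satisfies_buchi(delta, 0, accepting, ["n", "g"], ["n"]))       # goal reached once, then abandoned
print(satisfies_buchi(delta, 0, accepting, ["n"], ["n", "n", "g"]))  # goal revisited on every loop
```

A trained policy whose rollout behaves like the first word, while scoring a high differentiable return, would be the kind of counterexample described above.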
Original abstract
Ensuring that reinforcement learning (RL) controllers satisfy safety and reliability constraints in real-world settings remains challenging: state-avoidance and constrained Markov decision processes often fail to capture trajectory-level requirements or induce overly conservative behavior. Formal specification languages such as linear temporal logic (LTL) offer correct-by-construction objectives, yet their rewards are typically sparse, and heuristic shaping can undermine correctness. We introduce, to our knowledge, the first end-to-end framework that integrates LTL with differentiable simulators, enabling efficient gradient-based learning directly from formal specifications. Our method relaxes discrete automaton transitions via soft labeling of states, yielding differentiable rewards and state representations that mitigate the sparsity issue intrinsic to LTL while preserving objective soundness. We provide theoretical guarantees connecting Büchi acceptance to both discrete and differentiable LTL returns and derive a tunable bound on their discrepancy in deterministic and stochastic settings. Empirically, across complex, nonlinear, contact-rich continuous-control tasks, our approach substantially accelerates training and achieves up to twice the returns of discrete baselines. We further demonstrate compatibility with reward machines, thereby covering co-safe LTL and LTL_f without modification. By rendering automaton-based rewards differentiable, our work bridges formal methods and deep RL, enabling safe, specification-driven learning in continuous domains.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. This paper introduces a framework for integrating Linear Temporal Logic (LTL) with differentiable simulators in reinforcement learning. By relaxing discrete automaton transitions through soft labeling of states, it generates differentiable rewards and state representations to address the sparsity of LTL-based rewards while aiming to preserve soundness. Theoretical results connect Büchi acceptance to both discrete and differentiable returns and provide a tunable discrepancy bound for deterministic and stochastic settings. Experiments on nonlinear, contact-rich continuous-control tasks demonstrate faster training and up to twice the returns compared to discrete baselines, with extensions to reward machines for co-safe LTL and LTL_f.
Significance. If the theoretical guarantees and discrepancy bound hold under the considered dynamics, this approach could enable more efficient and correct specification-driven RL in continuous domains, bridging formal methods and deep learning. The empirical acceleration in complex tasks highlights potential practical impact. The provision of theoretical connections and compatibility with reward machines are notable strengths.
major comments (2)
- [Abstract / Theoretical Analysis] Abstract and Theoretical Analysis section: The derivation of the tunable bound on the discrepancy between discrete and differentiable LTL returns is not fully detailed. This is load-bearing for the central claim because soundness of the relaxed objective under stochastic nonlinear dynamics (as in the contact-rich tasks) requires explicit conditions on the soft labeling (e.g., Lipschitz continuity or bounded transition variance); without them the bound may fail to control discrepancies when discontinuities alter acceptance paths, as highlighted by the stress-test concern.
- [§3 / §5] §3 (Method) and §5 (Theoretical Guarantees): The claim that soft labeling preserves soundness of the original LTL objective needs a concrete statement of the conditions under which the relaxation maintains equivalence to Büchi acceptance; the current presentation leaves open whether post-hoc choices in the labeling temperature affect the bound's validity in deterministic vs. stochastic cases.
minor comments (2)
- [Abstract] Abstract: The phrase 'soft labeling of states' would benefit from an immediate parenthetical definition or pointer to the precise relaxation formula to improve accessibility.
- The manuscript could add a short paragraph contrasting the approach with prior work on differentiable automata or reward shaping to better situate the novelty.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment in detail below and have revised the theoretical sections to strengthen the presentation of the discrepancy bound and the conditions for soundness.
-
Referee: [Abstract / Theoretical Analysis] Abstract and Theoretical Analysis section: The derivation of the tunable bound on the discrepancy between discrete and differentiable LTL returns is not fully detailed. This is load-bearing for the central claim because soundness of the relaxed objective under stochastic nonlinear dynamics (as in the contact-rich tasks) requires explicit conditions on the soft labeling (e.g., Lipschitz continuity or bounded transition variance); without them the bound may fail to control discrepancies when discontinuities alter acceptance paths, as highlighted by the stress-test concern.
Authors: We agree that the derivation of the tunable bound would benefit from greater explicitness. In the revised manuscript, we have expanded the Theoretical Guarantees section with a complete step-by-step derivation. We now state the required assumptions on the soft labeling, including Lipschitz continuity of the labeling function and a bound on transition variance in the stochastic setting. These conditions ensure the bound remains valid under nonlinear dynamics and controls discrepancies arising from discontinuities in acceptance paths. We have also added analysis addressing stress-test scenarios to demonstrate that the bound continues to hold.
revision: yes
-
Referee: [§3 / §5] §3 (Method) and §5 (Theoretical Guarantees): The claim that soft labeling preserves soundness of the original LTL objective needs a concrete statement of the conditions under which the relaxation maintains equivalence to Büchi acceptance; the current presentation leaves open whether post-hoc choices in the labeling temperature affect the bound's validity in deterministic vs. stochastic cases.
Authors: We concur that a more precise statement of the conditions is warranted. We have added a dedicated theorem in §5 that explicitly characterizes the conditions under which soft labeling maintains equivalence to Büchi acceptance. The theorem delineates the admissible range for the labeling temperature such that the differentiable objective remains sound with respect to the discrete Büchi acceptance condition. This statement applies uniformly to both deterministic and stochastic cases, with the discrepancy bound adjusted to reflect the setting. We clarify that temperature selection must respect these conditions rather than being chosen post-hoc, and we provide practical guidance for satisfying them.
revision: yes
Circularity Check
Theoretical guarantees on Büchi-to-differentiable return connection are independently derived
The paper's central derivation provides theoretical guarantees linking Büchi acceptance to both discrete and differentiable LTL returns, along with a tunable discrepancy bound in deterministic and stochastic settings. This is presented as a first-principles result from the soft-labeling relaxation of automaton transitions, without reduction to fitted parameters, self-referential definitions, or load-bearing self-citations. The abstract and context show no evidence of the bound or soundness preservation being equivalent to inputs by construction; the derivation chain remains self-contained against external formal methods benchmarks.
Axiom & Free-Parameter Ledger
free parameters (1)
- soft labeling temperature or relaxation parameter
axioms (1)
- domain assumption: Büchi acceptance conditions remain approximately preserved under soft state labeling in both deterministic and stochastic dynamics
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, reality_from_one_distinction (tag: unclear)
Relation between the paper passage and the cited Recognition theorem: unclear.
Theorem 1... lim_{γ→1} E[G(σ)] = Pr(σ ⊨ □◇B), with state-dependent R and Γ on accepting states B
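The limit in the quoted Theorem 1 passage can be illustrated numerically. The sketch below assumes a construction common in earlier work on LTL rewards: reward 1 - gamma_B and discount gamma_B on accepting states, reward 0 and discount gamma elsewhere, with gamma much closer to 1 than gamma_B. The paper's exact R and Γ may differ, and `ltl_return` is a made-up name.

```python
def ltl_return(in_B, gamma, gamma_B):
    """Return with state-dependent reward and discount.

    in_B is a sequence of booleans, True when the current state is accepting.
    The infinite sum is truncated at len(in_B).
    """
    G, discount = 0.0, 1.0
    for accepting in in_B:
        if accepting:
            G += discount * (1.0 - gamma_B)  # R = 1 - gamma_B on B
            discount *= gamma_B              # Γ = gamma_B on B
        else:
            discount *= gamma                # R = 0, Γ = gamma off B
    return G

horizon = 20000
recurrent = [t % 5 == 0 for t in range(horizon)]  # accepting state every 5 steps, forever
transient = [t < 3 for t in range(horizon)]       # accepting state 3 times, then never

print(ltl_return(recurrent, gamma=0.99999, gamma_B=0.99))  # close to 1
print(ltl_return(transient, gamma=0.99999, gamma_B=0.99))  # close to 0
```

A path that keeps returning to the accepting set earns a return near 1, and a path that visits it only finitely often earns a return near 0, so the expected return approximates the probability of satisfying □◇B as the discount factors approach 1.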
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.