A comprehensive survey on safe reinforcement learning
5 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.LG 5
verdicts: UNVERDICTED 5
roles: background 1
polarities: background 1
representative citing papers
- Accelerated Learning with Linear Temporal Logic using Differentiable Simulation
Differentiable relaxation of LTL automata via soft labeling enables gradient-based RL from formal specifications, with theoretical bounds on discrete-differentiable discrepancy and up to 2x returns on nonlinear tasks.
- Model-Based Proactive Cost Generation for Learning Safe Policies Offline with Limited Violation Data
PROCO generates synthetic unsafe samples via model-based rollouts and LLM-grounded costs to enable safer policy learning from offline datasets containing few or no violations.
- Generalizing from a few environments in safety-critical reinforcement learning
RL agents fail dangerously on unseen environments; ensembles reduce catastrophes in gridworld but not CoinRun, with uncertainty enabling intervention prediction.
- AdamFLIP: Adaptive Momentum Feedback Linearization Optimization for Hard Constrained PINN Training
AdamFLIP treats PDE constraint residuals in PINNs as a controlled dynamical system, computes Lagrange multipliers via feedback linearization to drive residuals to zero, and applies Adam-style adaptation to the resulting gradient for scalable hard-constrained training.
- Uncertainty-aware Model-based Policy Optimization
Introduces a framework that learns an uncertainty-aware dynamics model and optimizes the policy via automatic differentiation through the model, reporting competitive asymptotic performance with significantly lower sample complexity than baselines on continuous control benchmarks.