Responsive Safety in Reinforcement Learning by PID Lagrangian Methods

Adam Stooke, Joshua Achiam, Pieter Abbeel · 2020

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Accelerated Learning with Linear Temporal Logic using Differentiable Simulation

cs.LG · 2025-06-01 · unverdicted · novelty 7.0

Differentiable relaxation of LTL automata via soft labeling enables gradient-based RL from formal specifications, with theoretical bounds on discrete-differentiable discrepancy and up to 2x returns on nonlinear tasks.

Action-Conditioned Risk Gating for Safety-Critical Control under Partial Observability

cs.LG · 2026-05-14 · unverdicted · novelty 6.0

Action-conditioned near-term risk prediction gates optimistic and conservative value estimates in RL to approximate risk-sensitive POMDP control, yielding better safety-performance tradeoffs with lower runtime than belief planning baselines.

SafeVLA: Towards Safety Alignment of Vision-Language-Action Model via Constrained Learning

cs.RO · 2025-03-05 · unverdicted · novelty 5.0

SafeVLA applies constrained reinforcement learning via CMDP min-max optimization to VLAs, cutting safety violation costs by 83.58% while preserving task success on long-horizon mobile manipulation tasks.

citing papers explorer

Accelerated Learning with Linear Temporal Logic using Differentiable Simulation · cs.LG · 2025-06-01 · unverdicted · none · ref 10
Action-Conditioned Risk Gating for Safety-Critical Control under Partial Observability · cs.LG · 2026-05-14 · unverdicted · none · ref 28
SafeVLA: Towards Safety Alignment of Vision-Language-Action Model via Constrained Learning · cs.RO · 2025-03-05 · unverdicted · none · ref 59
