Robust control of Markov decision processes with uncertain transition matrices. Operations Research, 53(5):780–798
3 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.LG (3)
roles: background (1)
polarities: background (1)
citing papers explorer
- DR-SAC: Distributionally Robust Soft Actor-Critic for Reinforcement Learning under Uncertainty
  DR-SAC is the first actor-critic distributionally robust RL algorithm for offline continuous control that derives a convergent robust soft policy iteration and reports up to 9.8x higher rewards than SAC under perturbations.
- Why Does Agentic Safety Fail to Generalize Across Tasks?
  Agentic safety fails to generalize across tasks because the task-to-safe-controller mapping has a higher Lipschitz constant than the task-to-controller mapping alone, as proven in linear-quadratic control and demonstrated in quadcopter and LLM experiments.
- When Dynamics Shift, Robust Task Inference Wins: Offline Imitation Learning with Behavior Foundation Models Revisited
  Robust minimax task inference in BFMs achieves dynamics-shift robustness from nominal offline data alone and outperforms standard baselines.