Towards Minimax Optimality of Model-based Robust Reinforcement Learning
3 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.LG 3
verdicts
UNVERDICTED 3
representative citing papers
Presents the first algorithm to identify an ε-optimal policy in robust constrained MDPs via epigraph form and bisection search with Õ(ε^{-4}) robust policy evaluations.
Wolfpack attack framework disrupts MARL cooperation by targeting initial and assisting agents; WALL trains robust policies against it with reported experimental gains.
citing papers explorer
- DR-SAC: Distributionally Robust Soft Actor-Critic for Reinforcement Learning under Uncertainty
  DR-SAC is the first actor-critic distributionally robust RL algorithm for offline continuous control that derives a convergent robust soft policy iteration and reports up to 9.8x higher rewards than SAC under perturbations.
- Near-Optimal Policy Identification in Robust Constrained Markov Decision Processes via Epigraph Form
  Presents the first algorithm to identify an ε-optimal policy in robust constrained MDPs via epigraph form and bisection search with Õ(ε^{-4}) robust policy evaluations.
- Wolfpack Adversarial Attack for Robust Multi-Agent Reinforcement Learning
  Wolfpack attack framework disrupts MARL cooperation by targeting initial and assisting agents; WALL trains robust policies against it with reported experimental gains.