Robust Reinforcement Learning for Continuous Control with Model Misspecification

Abbas Abdolmaleki; Daniel J. Mankowitz; Jackie Kay; Jost Tobias Springenberg; Martin Riedmiller; Nir Levine; Rae Jeong; Timothy Mann; Todd Hester; Yuanyuan Shi

arxiv: 1906.07516 · v2 · pith:KPBIPVNPnew · submitted 2019-06-18 · 💻 cs.LG · cs.AI · stat.ML

classification 💻 cs.LG · cs.AI · stat.ML

keywords continuous · control · learning · robust · robustness · addition · algorithm · bellman

We provide a framework for incorporating robustness to perturbations in the transition dynamics, which we refer to as model misspecification, into continuous control Reinforcement Learning (RL) algorithms. We specifically focus on incorporating robustness into a state-of-the-art continuous control RL algorithm called Maximum a-posteriori Policy Optimization (MPO). We achieve this by learning a policy that optimizes a worst-case expected return objective, and we derive a corresponding robust entropy-regularized Bellman contraction operator. In addition, we introduce a less conservative, soft-robust, entropy-regularized objective with a corresponding Bellman operator. We show that both robust and soft-robust policies outperform their non-robust counterparts in nine MuJoCo domains with environment perturbations. In addition, we show improved robust performance on a high-dimensional, simulated, dexterous robotic hand. Finally, we present multiple investigative experiments that provide deeper insight into the robustness framework. These include an adaptation to another continuous control RL algorithm as well as learning the uncertainty set from offline data. Performance videos can be found online at https://sites.google.com/view/robust-rl.
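
The difference between the worst-case (robust) and soft-robust objectives described in the abstract can be illustrated with a small tabular sketch. This is not the paper's method: it omits entropy regularization and MPO entirely, and it assumes a finite uncertainty set of `K` candidate transition models, a uniform weighting over those models for the soft-robust case, and a randomly generated toy MDP. All names (`backup`, `value_iteration`, `P_set`) are hypothetical and chosen for illustration only.

```python
import numpy as np


def backup(V, P_set, R, gamma, mode):
    """One Bellman backup over a finite uncertainty set (illustrative only).

    V:     (S,) current value estimate.
    P_set: (K, S, A, S) array of K candidate transition models (assumed).
    R:     (S, A) reward table.
    """
    # Expected next-state value under each candidate model: shape (K, S, A).
    next_v = np.einsum("ksat,t->ksa", P_set, V)
    if mode == "robust":
        # Worst case over the uncertainty set, per state-action pair.
        agg = next_v.min(axis=0)
    elif mode == "soft_robust":
        # Average over the set; uniform weights are an assumption here.
        agg = next_v.mean(axis=0)
    else:
        raise ValueError(f"unknown mode: {mode}")
    q = R + gamma * agg
    # Greedy (non-regularized) improvement over actions.
    return q.max(axis=1)


def value_iteration(P_set, R, gamma=0.9, mode="robust", iters=500):
    """Iterate the chosen backup from a zero-initialized value function."""
    V = np.zeros(R.shape[0])
    for _ in range(iters):
        V = backup(V, P_set, R, gamma, mode)
    return V


# Toy problem: 3 candidate models, 4 states, 2 actions (all made up).
rng = np.random.default_rng(0)
K, S, A = 3, 4, 2
P_set = rng.dirichlet(np.ones(S), size=(K, S, A))  # each row sums to 1
R = rng.uniform(size=(S, A))

v_robust = value_iteration(P_set, R, mode="robust")
v_soft = value_iteration(P_set, R, mode="soft_robust")
```

In this sketch the robust values can never exceed the soft-robust values, because the minimum over models is at most their average and both backups are monotone in `V`. That ordering is one way to see why the abstract describes the soft-robust objective as less conservative.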

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Robust Adversarial Policy Optimization Under Dynamics Uncertainty
cs.LG 2026-04 unverdicted novelty 7.0

RAPO uses a dual robust RL formulation with trajectory-level adversarial networks and model-level Boltzmann reweighting over dynamics ensembles to improve policy resilience and out-of-distribution generalization while...

DR-SAC: Distributionally Robust Soft Actor-Critic for Reinforcement Learning under Uncertainty
cs.LG 2025-06 unverdicted novelty 7.0

DR-SAC is the first actor-critic distributionally robust RL algorithm for offline continuous control that derives a convergent robust soft policy iteration and reports up to 9.8x higher rewards than SAC under perturbations.

Wolfpack Adversarial Attack for Robust Multi-Agent Reinforcement Learning
cs.LG 2025-02 unverdicted novelty 6.0

Wolfpack attack framework disrupts MARL cooperation by targeting initial and assisting agents; WALL trains robust policies against it with reported experimental gains.