
arXiv: 1906.07516 · v2 · pith:KPBIPVNPnew · submitted 2019-06-18 · cs.LG · cs.AI · stat.ML

Robust Reinforcement Learning for Continuous Control with Model Misspecification

keywords: continuous, control, learning, robust, robustness, addition, algorithm, bellman

We provide a framework for incorporating robustness to perturbations in the transition dynamics, which we refer to as model misspecification, into continuous control Reinforcement Learning (RL) algorithms. We specifically focus on incorporating robustness into a state-of-the-art continuous control RL algorithm called Maximum a-posteriori Policy Optimization (MPO). We achieve this by learning a policy that optimizes a worst-case expected return objective, and we derive a corresponding robust entropy-regularized Bellman contraction operator. In addition, we introduce a less conservative, soft-robust, entropy-regularized objective with a corresponding Bellman operator. We show that both robust and soft-robust policies outperform their non-robust counterparts in nine MuJoCo domains with environment perturbations. We also show improved robust performance on a high-dimensional, simulated, dexterous robotic hand. Finally, we present multiple investigative experiments that provide deeper insight into the robustness framework, including an adaptation to another continuous control RL algorithm and learning the uncertainty set from offline data. Performance videos can be found online at https://sites.google.com/view/robust-rl.
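
The difference between the worst-case and soft-robust objectives described in the abstract can be illustrated with a minimal tabular sketch. This is not the paper's MPO-based method. It assumes a small finite state and action space, a finite uncertainty set of K candidate transition models, a fixed policy, a per-state-action worst case, and uniform weights over models in the soft-robust case. It also omits the entropy regularization that the paper's operators include. All names (`robust_backup`, `evaluate`, `P_set`, `weights`) are hypothetical and chosen for illustration.

```python
import numpy as np

def robust_backup(V, R, P_set, pi, gamma=0.9, mode="robust", weights=None):
    """One Bellman backup for a fixed policy over a finite uncertainty set.

    V: (S,) values; R: (S, A) rewards; P_set: (K, S, A, S) transition models;
    pi: (S, A) policy probabilities. Entropy regularization is omitted (assumption).
    """
    # Expected next-state value under each candidate model: shape (K, S, A).
    next_v = P_set @ V
    if mode == "robust":
        # Worst case: least favourable model for each state-action pair.
        agg = next_v.min(axis=0)
    elif mode == "soft_robust":
        # Weighted average over models (uniform weights unless given).
        K = P_set.shape[0]
        w = np.full(K, 1.0 / K) if weights is None else np.asarray(weights)
        agg = np.tensordot(w, next_v, axes=1)
    else:
        raise ValueError(f"unknown mode: {mode}")
    q = R + gamma * agg              # (S, A) action values
    return (pi * q).sum(axis=1)      # (S,) values under the policy

def evaluate(R, P_set, pi, gamma=0.9, mode="robust", iters=500):
    """Iterate the backup from V = 0; it is a gamma-contraction for gamma < 1."""
    V = np.zeros(R.shape[0])
    for _ in range(iters):
        V = robust_backup(V, R, P_set, pi, gamma, mode)
    return V

# Toy problem with randomly drawn models (illustrative data only).
rng = np.random.default_rng(0)
S, A, K = 3, 2, 4
P_set = rng.dirichlet(np.ones(S), size=(K, S, A))   # shape (K, S, A, S)
R = rng.uniform(size=(S, A))
pi = np.full((S, A), 1.0 / A)                        # uniform policy

v_robust = evaluate(R, P_set, pi, mode="robust")
v_soft = evaluate(R, P_set, pi, mode="soft_robust")
print("robust:     ", v_robust)
print("soft-robust:", v_soft)
```

In this sketch the robust values never exceed the soft-robust values, because a minimum over models is at most their average. That ordering is the sense in which the soft-robust objective is less conservative.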


Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Robust Adversarial Policy Optimization Under Dynamics Uncertainty

    cs.LG 2026-04 unverdicted novelty 7.0

    RAPO uses a dual robust RL formulation with trajectory-level adversarial networks and model-level Boltzmann reweighting over dynamics ensembles to improve policy resilience and out-of-distribution generalization while...

  2. DR-SAC: Distributionally Robust Soft Actor-Critic for Reinforcement Learning under Uncertainty

    cs.LG 2025-06 unverdicted novelty 7.0

    DR-SAC is the first actor-critic distributionally robust RL algorithm for offline continuous control that derives a convergent robust soft policy iteration and reports up to 9.8x higher rewards than SAC under perturbations.

  3. Wolfpack Adversarial Attack for Robust Multi-Agent Reinforcement Learning

    cs.LG 2025-02 unverdicted novelty 6.0

    Wolfpack attack framework disrupts MARL cooperation by targeting initial and assisting agents; WALL trains robust policies against it with reported experimental gains.