Robust Constrained Reinforcement Learning for Continuous Control with Model Misspecification

Cosmin Paduraru; Dan A. Calian; Daniel J. Mankowitz; Martin Riedmiller; Nicolas Heess; Rae Jeong; Sumanth Dathathri; Timothy Mann

arxiv: 2010.10644 · v4 · pith:HY2JN4BEnew · submitted 2020-10-20 · 💻 cs.LG · cs.AI · stat.ML

Daniel J. Mankowitz, Dan A. Calian, Rae Jeong, Cosmin Paduraru, Nicolas Heess, Sumanth Dathathri, Martin Riedmiller, Timothy Mann

classification 💻 cs.LG cs.AI stat.ML

keywords misspecification, constrained, constraints, control, domain, effects, learning, model

Many real-world physical control systems are required to satisfy constraints upon deployment. Furthermore, real-world systems are often subject to effects such as non-stationarity, wear-and-tear, and uncalibrated sensors. These effects perturb the system dynamics and can cause a policy trained successfully in one domain to perform poorly when deployed to a perturbed version of the same domain. This can affect both a policy's ability to maximize future rewards and the extent to which it satisfies constraints. We refer to this as constrained model misspecification. We present an algorithm that mitigates this form of misspecification, and showcase its performance in multiple simulated MuJoCo tasks from the Real World Reinforcement Learning (RWRL) suite.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Optimistic Policy Learning under Pessimistic Adversaries with Regret and Violation Guarantees
cs.LG 2026-04 unverdicted novelty 8.0

RHC-UCRL is the first algorithm for safety-constrained RL under explicit adversarial dynamics, providing sub-linear regret and constraint violation guarantees by maintaining optimism over both agent and adversary policies.

Near-Optimal Policy Identification in Robust Constrained Markov Decision Processes via Epigraph Form
cs.LG 2024-08 unverdicted novelty 7.0

Presents the first algorithm to identify an ε-optimal policy in robust constrained MDPs via epigraph form and bisection search with Õ(ε^{-4}) robust policy evaluations.

Stationary Robust Mean-Field Games under Model Mismatches
cs.LG 2026-06 unverdicted novelty 6.0

Develops infinite-horizon stationary robust mean-field games incorporating distributional uncertainty, proves equilibrium existence via fixed-point on contractive Bellman operator, gives convergent algorithm, and deri...