An empirical investigation of the challenges of real-world reinforcement learning
Reinforcement learning (RL) has proven its worth in a series of artificial domains, and is beginning to show some successes in real-world scenarios. However, many of the research advances in RL are hard to leverage in real-world systems due to a series of assumptions that are rarely satisfied in practice. In this work, we identify and formalize a series of independent challenges that embody the difficulties that must be addressed for RL to be commonly deployed in real-world systems. For each challenge, we define it formally in the context of a Markov Decision Process, analyze the effects of the challenge on state-of-the-art learning algorithms, and present some existing attempts at tackling it. We believe that an approach that addresses our set of proposed challenges would be readily deployable in a large number of real-world problems. Our proposed challenges are implemented in a suite of continuous control environments called the realworldrl-suite, which we propose as an open-source benchmark.
Forward citations
Cited by 4 Pith papers
- AtomComposer: Discovering Chemical Space from First Principles with Reinforcement Learning
  AtomComposer uses online RL with multi-composition training to discover up to 10x more valid 3D isomers on unseen chemical formulas than single-composition baselines.
- D4RL: Datasets for Deep Data-Driven Reinforcement Learning
  D4RL supplies new offline RL benchmarks and datasets from expert and mixed sources to expose weaknesses in existing algorithms and standardize evaluation.
- Distributionally Robust Control via Stein Variational Inference for Contact-Rich Manipulation
  Introduces a Stein variational inference-based deterministic formulation for distributionally robust control in contact-rich robotic manipulation, reporting up to 3x improved robustness under parametric uncertainty.
- Learning to Adapt: Representation-Based Reinforcement Learning for Multi-Task Skill Transfer
  RepMT-SAC uses spectral MDP decomposition to build a task-agnostic value-function core plus minimal task adjustment, yielding up to 30% better performance than baselines on quadcopter trajectory tasks with zero-shot i...