RLlib: Abstractions for Distributed Reinforcement Learning

Eric Liang; Ion Stoica; Joseph E. Gonzalez; Ken Goldberg; Michael I. Jordan; Philipp Moritz; Richard Liaw; Robert Nishihara; Roy Fox

arxiv: 1712.09381 · v4 · pith:DZW7J67Unew · submitted 2017-12-26 · cs.AI · cs.DC · cs.LG

keywords: rllib, algorithms, computation, distributed, learning, primitives, reinforcement, abstractions

Reinforcement learning (RL) algorithms involve the deep nesting of highly irregular computation patterns, each of which typically exhibits opportunities for distributed computation. We argue for distributing RL components in a composable way by adapting algorithms for top-down hierarchical control, thereby encapsulating parallelism and resource requirements within short-running compute tasks. We demonstrate the benefits of this principle through RLlib: a library that provides scalable software primitives for RL. These primitives enable a broad range of algorithms to be implemented with high performance, scalability, and substantial code reuse. RLlib is available at https://rllib.io/.
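The idea of top-down hierarchical control described in the abstract can be illustrated with a minimal sketch. This is not RLlib's API: the function names `rollout`, `evaluate`, and `train` are hypothetical, the "policy" is a single scalar weight, and a local thread pool stands in for a distributed task scheduler. It only shows the shape of the pattern, in which one driver delegates to short-running tasks and each level hides its own parallelism from its caller.

```python
from concurrent.futures import ThreadPoolExecutor
import random


def rollout(policy_weight, seed, steps=5):
    """Leaf task (hypothetical): simulate one short episode, return total reward."""
    rng = random.Random(seed)  # fixed seed keeps the toy example deterministic
    return sum(policy_weight * rng.random() for _ in range(steps))


def evaluate(policy_weight, num_rollouts, pool):
    """Mid-level task: fans out rollouts; the caller only sees the mean reward."""
    futures = [pool.submit(rollout, policy_weight, seed)
               for seed in range(num_rollouts)]
    return sum(f.result() for f in futures) / num_rollouts


def train(candidates, num_rollouts=4):
    """Top-level driver: a single controller that delegates work downward."""
    # Assumption: a thread pool stands in for a cluster-wide task scheduler.
    with ThreadPoolExecutor(max_workers=4) as pool:
        scores = {w: evaluate(w, num_rollouts, pool) for w in candidates}
    return max(scores, key=scores.get), scores


best, scores = train([0.5, 1.0, 2.0])
```

Because the driver only calls `evaluate`, the fan-out inside it could be changed (more rollouts, a different executor) without touching `train`, which is the kind of composability the abstract argues for.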

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Scalable Option Learning in High-Throughput Environments
cs.LG 2025-08 unverdicted novelty 6.0

SOL is a new hierarchical RL algorithm that reaches 35x higher throughput and outperforms flat agents when trained on 30 billion frames in NetHack while showing positive scaling.

RAMP: Hybrid DRL for Online Learning of Numeric Action Models
cs.AI 2026-04 unverdicted novelty 5.0

RAMP learns numeric action models online via a DRL-planning feedback loop and outperforms PPO on IPC numeric domains in solvability and plan quality.

Gym-V: A Unified Vision Environment System for Agentic Vision Research
cs.CV 2026-03 unverdicted novelty 5.0

Gym-V supplies 179 visual environments showing that observation scaffolding like captions and rules matters more for training success than the choice of RL algorithm.

Reinforcement learning for adaptive interior point methods in convex quadratic programming
math.OC 2025-09 unverdicted novelty 5.0

Reinforcement learning learns a policy that adapts control parameters of a regularized interior-point method, accelerating high-accuracy solutions for convex quadratic programs and generalizing across problem classes ...