OpenAI Baselines

Prafulla Dhariwal, Christopher Hesse, Oleg Klimov, Alex Nichol, Matthias Plappert, Alec Radford, John Schulman, Szymon Sidor, Yuhuai Wu, Peter Zhokhov · 2017

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Scalable Option Learning in High-Throughput Environments

cs.LG · 2025-08-30 · unverdicted · novelty 6.0

SOL is a new hierarchical RL algorithm that reaches 35x higher throughput and outperforms flat agents when trained on 30 billion frames in NetHack while showing positive scaling.

OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework

cs.AI · 2024-05-20 · unverdicted · novelty 6.0

OpenRLHF is a new open-source RLHF framework reporting 1.22x to 1.68x speedups and fewer lines of code than prior systems.

Learning Reward Functions by Integrating Human Demonstrations and Preferences

cs.RO · 2019-06-21 · unverdicted · novelty 6.0

DemPref uses demonstrations to form a coarse reward prior and ground active preference queries, achieving higher efficiency than pure preference learning and higher user preference than IRL in experiments.

citing papers explorer


Scalable Option Learning in High-Throughput Environments
cs.LG · 2025-08-30 · unverdicted · none · ref 12

OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework
cs.AI · 2024-05-20 · unverdicted · none · ref 22

Learning Reward Functions by Integrating Human Demonstrations and Preferences
cs.RO · 2019-06-21 · unverdicted · none · ref 13
