hub Canonical reference

Dota 2 with Large Scale Deep Reinforcement Learning

Canonical reference. 92% of citing Pith papers cite this work as background.

39 Pith papers citing it
Background 92% of classified citations
abstract

On April 13th, 2019, OpenAI Five became the first AI system to defeat the world champions at an esports game. The game of Dota 2 presents novel challenges for AI systems such as long time horizons, imperfect information, and complex, continuous state-action spaces, all challenges which will become increasingly central to more capable AI systems. OpenAI Five leveraged existing reinforcement learning techniques, scaled to learn from batches of approximately 2 million frames every 2 seconds. We developed a distributed training system and tools for continual training which allowed us to train OpenAI Five for 10 months. By defeating the Dota 2 world champion (Team OG), OpenAI Five demonstrates that self-play reinforcement learning can achieve superhuman performance on a difficult task.

citation-role summary

background: 12 · other: 1

representative citing papers

ARC-RL: A Reinforcement Learning Playground Inspired by ARC Raiders

cs.RO · 2026-05-19 · unverdicted · novelty 7.0

ARC-RL provides four new MuJoCo continuous-control environments with hexapod and quadruped morphologies inspired by ARC Raiders, a unified multi-component reward without motion capture, CPG expert demonstrators, and empirical comparisons of online and offline-to-online RL algorithms.

ASH: Agents that Self-Hone via Embodied Learning

cs.AI · 2026-05-14 · unverdicted · novelty 7.0

ASH reaches 11.2/12 milestones in Pokemon Emerald and 9.9/12 in Zelda by self-improving via an IDM trained on its own trajectories to label internet video, while baselines plateau at roughly 6/12.

Voyager: An Open-Ended Embodied Agent with Large Language Models

cs.AI · 2023-05-25 · unverdicted · novelty 7.0

Voyager achieves superior lifelong learning in Minecraft by combining an automatic exploration curriculum, a library of executable skills, and iterative LLM prompting with environment feedback, yielding 3.3x more unique items and 15.3x faster milestone unlocks than prior methods while generalizing skills

Heterogeneous Self-Play for Realistic Highway Traffic Simulation

cs.AI · 2026-03-31 · accept · novelty 6.0

PHASE uses heterogeneous self-play and context-conditioned policies to achieve realistic, zero-shot highway traffic simulation that outperforms traditional rule-based and self-play models on real-world datasets.

RAPTOR: A Foundation Policy for Quadrotor Control

cs.RO · 2025-09-15 · unverdicted · novelty 6.0

A 2084-parameter recurrent policy trained by distilling 1000 RL teacher policies enables zero-shot control across 10 real quadrotors differing in mass, motors, frames, propellers, and flight controllers.

VS-Bench: Evaluating VLMs for Strategic Abilities in Multi-Agent Environments

cs.AI · 2025-06-03 · unverdicted · novelty 6.0

VS-Bench is a new benchmark of ten visual multi-agent environments that measures VLMs on element recognition, next-action prediction, and normalized episode return, showing strong perception but large gaps in reasoning and decision-making with the best model at 46.6% prediction accuracy and 31.4% of
