StarCraft II: A New Challenge for Reinforcement Learning

arXiv: 1708.04782 · v1 · pith:4NZFMGQ7new · submitted 2017-08-16 · cs.LG · cs.AI

Oriol Vinyals, Timo Ewalds, Sergey Bartunov, Petko Georgiev, Alexander Sasha Vezhnevets, Michelle Yeo, Alireza Makhzani, Heinrich Küttler, John Agapiou, Julian Schrittwieser, John Quan, Stephen Gaffney, Stig Petersen, Karen Simonyan, Tom Schaul, Hado van Hasselt, David Silver, Timothy Lillicrap, Kevin Calderone, Paul Keet, Anthony Brunasso, David Lawrence, Anders Ekermo, Jacob Repp, Rodney Tsing

classification: cs.LG, cs.AI

keywords: game, learning, starcraft, reinforcement, agents, domain, environment, main

Abstract

This paper introduces SC2LE (StarCraft II Learning Environment), a reinforcement learning environment based on the StarCraft II game. This domain poses a new grand challenge for reinforcement learning, representing a more difficult class of problems than considered in most prior work. It is a multi-agent problem with multiple players interacting; there is imperfect information due to a partially observed map; it has a large action space involving the selection and control of hundreds of units; it has a large state space that must be observed solely from raw input feature planes; and it has delayed credit assignment requiring long-term strategies over thousands of steps. We describe the observation, action, and reward specification for the StarCraft II domain and provide an open source Python-based interface for communicating with the game engine. In addition to the main game maps, we provide a suite of mini-games focusing on different elements of StarCraft II gameplay. For the main game maps, we also provide an accompanying dataset of game replay data from human expert players. We give initial baseline results for neural networks trained from this data to predict game outcomes and player actions. Finally, we present initial baseline results for canonical deep reinforcement learning agents applied to the StarCraft II domain. On the mini-games, these agents learn to achieve a level of play that is comparable to a novice player. However, when trained on the main game, these agents are unable to make significant progress. Thus, SC2LE offers a new and challenging environment for exploring deep reinforcement learning algorithms and architectures.
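
A small calculation helps make the "delayed credit assignment" point concrete. The sketch below is not taken from the paper. It assumes a hypothetical episode of 2,000 agent steps, which is in the "thousands of steps" range the abstract mentions. The only reward is a sparse +1 on the final step, for example a win signal. The discount factors are illustrative choices. The helper `discounted_returns` is a hypothetical name introduced here. It computes the standard discounted return G_t = sum over k >= t of gamma^(k - t) * r_k using the backward recursion G_t = r_t + gamma * G_{t+1}.

```python
def discounted_returns(rewards, gamma):
    """Hypothetical helper: G_t = sum_{k>=t} gamma**(k - t) * rewards[k] for each t."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards through the episode: G_t = r_t + gamma * G_{t+1}.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Illustrative assumptions (not values from the paper): a 2,000-step episode
# with zero reward everywhere except a terminal +1, e.g. for a win.
episode_length = 2000
rewards = [0.0] * (episode_length - 1) + [1.0]

for gamma in (0.99, 0.999):
    g = discounted_returns(rewards, gamma)
    # Compare how much of the terminal reward is visible early vs. late.
    print(f"gamma={gamma}: G_0={g[0]:.3g}, G_1000={g[1000]:.3g}, G_last={g[-1]:.3g}")
```

Under these assumptions, γ = 0.99 shrinks the terminal reward by a factor of roughly 10^-9 by the time it reaches the first step. With γ = 0.999, about e^-2 ≈ 0.14 of it survives. This is one way to see why long horizons make it hard to credit early decisions in a game.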

Forward citations

Cited by 12 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Approximation-Free Differentiable Oblique Decision Trees
cs.LG 2026-05 unverdicted novelty 7.0

DTSemNet gives an exact, invertible neural-network encoding of hard oblique decision trees that supports direct gradient training for both classification and regression without probabilistic softening or quantized estimators.

Switch-JustDance: Benchmarking Whole Body Motion Tracking Controllers Using a Commercial Console Game
cs.RO 2025-11 unverdicted novelty 7.0

Switch-JustDance turns Nintendo Switch Just Dance into a reproducible benchmark for evaluating humanoid whole-body controllers on real hardware using the game's scoring system.

PyTorch: An Imperative Style, High-Performance Deep Learning Library
cs.LG 2019-12 accept novelty 7.0

PyTorch shows, through its runtime architecture and benchmark results, that imperative, Pythonic usability is compatible with high performance and accelerator support.

On the Measure of Intelligence
cs.AI 2019-11 unverdicted novelty 7.0

Intelligence is skill-acquisition efficiency, and the ARC benchmark measures human-like general fluid intelligence by testing abstraction and reasoning with minimal, innate-like priors.

QHyer: Q-conditioned Hybrid Attention-mamba Transformer for Offline Goal-conditioned RL
cs.LG 2026-05 unverdicted novelty 6.0

QHyer achieves state-of-the-art results in offline goal-conditioned RL by replacing return-to-go with a state-conditioned Q-estimator and introducing a gated hybrid attention-mamba backbone for content-adaptive histor...

QHyer: Q-conditioned Hybrid Attention-mamba Transformer for Offline Goal-conditioned RL
cs.LG 2026-05 unverdicted novelty 6.0

QHyer replaces return-to-go with a state-conditioned Q-estimator and adds a gated hybrid attention-mamba backbone to achieve state-of-the-art performance in offline goal-conditioned RL on both Markovian and non-Markov...

A Survey of Continual Reinforcement Learning
cs.LG 2025-06 accept novelty 6.0

The paper surveys the continual reinforcement learning (CRL) literature and proposes a taxonomy that groups methods into four categories based on how knowledge is stored and transferred. It also reviews metrics and benchmarks and outlines challenges and future research directions.

Arena: a toolkit for Multi-Agent Reinforcement Learning
cs.LG 2019-07 accept novelty 6.0

Arena introduces a modular Interface design that extends OpenAI Gym wrappers to support complex multi-agent RL scenarios including self-play and cooperative-competitive interactions.

RAMP: Hybrid DRL for Online Learning of Numeric Action Models
cs.AI 2026-04 unverdicted novelty 5.0

RAMP learns numeric action models online via a DRL-planning feedback loop and outperforms PPO on IPC numeric domains in solvability and plan quality.

IPR-1: Interactive Physical Reasoner
cs.AI 2025-11 unverdicted novelty 5.0

IPR uses world-model rollouts to reinforce a VLM policy via PhysCode on a 1000+ game benchmark, achieving robust physical reasoning that improves with experience and transfers zero-shot to unseen games while surpassing GPT-5.

Reflection of Episodes: Learning to Play Game from Expert and Self Experiences
cs.AI 2025-02 unverdicted novelty 5.0

The ROE framework lets an LLM defeat the Very Hard bot in TextStarCraft II. It combines keyframe selection, decisions informed by expert and self-experience, and post-game reflection that generates new self-experience.

Why Build an Assistant in Minecraft?
cs.AI 2019-07 unverdicted novelty 4.0

A rationale is presented for developing an assistant in Minecraft to advance natural language understanding and dialogue learning.