OpenSpiel: A Framework for Reinforcement Learning in Games

arXiv: 1908.09453 · v6 · pith:IM2TDZG3 · submitted 2019-08-26 · cs.LG · cs.AI · cs.GT · cs.MA

Marc Lanctot, Edward Lockhart, Jean-Baptiste Lespiau, Vinicius Zambaldi, Satyaki Upadhyay, Julien Pérolat, Sriram Srinivasan, Finbarr Timbers, Karl Tuyls, Shayegan Omidshafiei, Daniel Hennes, Dustin Morrill, Paul Muller, Timo Ewalds, Ryan Faulkner, János Kramár, Bart De Vylder, Brennan Saeta, James Bradbury, David Ding, Sebastian Borgeaud, Matthew Lai, Julian Schrittwieser, Thomas Anthony, Edward Hughes, Ivo Danihelka, Jonah Ryan-Davis

Abstract

OpenSpiel is a collection of environments and algorithms for research in general reinforcement learning and search/planning in games. OpenSpiel supports n-player (single- and multi-agent) zero-sum, cooperative and general-sum, one-shot and sequential, strictly turn-taking and simultaneous-move, perfect and imperfect information games, as well as traditional multiagent environments such as (partially and fully observable) grid worlds and social dilemmas. OpenSpiel also includes tools to analyze learning dynamics and other common evaluation metrics. This document serves both as an overview of the code base and an introduction to the terminology, core concepts, and algorithms across the fields of reinforcement learning, computational game theory, and search.
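To make the abstract's vocabulary (chance nodes, legal actions, sequential moves, terminal returns) concrete: OpenSpiel's Python documentation drives a game through a loop over a state object, obtained with `pyspiel.load_game(...)` and `game.new_initial_state()` and advanced with `is_chance_node()`, `chance_outcomes()`, `legal_actions()`, `apply_action()`, and `returns()`. The sketch below reproduces that loop without depending on `pyspiel`. `CoinGuessState` is a hypothetical toy game written only for illustration, not one of OpenSpiel's games, and the `CHANCE`/`TERMINAL` sentinel values are assumptions; only the method names are borrowed from the library's state interface.

```python
import random

# Hypothetical sentinel player IDs (OpenSpiel defines its own constants).
CHANCE = -1
TERMINAL = -2


class CoinGuessState:
    """Hypothetical two-player zero-sum game with chance and hidden information:
    chance flips a coin that player 0 cannot see, then player 0 guesses it."""

    def __init__(self):
        self.coin = None   # 0 = heads, 1 = tails; hidden from player 0
        self.guess = None  # player 0's guess, once made

    def current_player(self):
        if self.coin is None:
            return CHANCE
        if self.guess is None:
            return 0
        return TERMINAL

    def is_terminal(self):
        return self.current_player() == TERMINAL

    def is_chance_node(self):
        return self.current_player() == CHANCE

    def chance_outcomes(self):
        # (action, probability) pairs for the fair coin flip.
        return [(0, 0.5), (1, 0.5)]

    def legal_actions(self):
        return [] if self.is_terminal() else [0, 1]

    def apply_action(self, action):
        if self.is_chance_node():
            self.coin = action
        else:
            self.guess = action

    def returns(self):
        # Player 0 gets +1 for a correct guess and -1 otherwise;
        # player 1 receives the negation, so the game is zero-sum.
        if not self.is_terminal():
            return [0.0, 0.0]
        r = 1.0 if self.guess == self.coin else -1.0
        return [r, -r]


def random_playout(state, rng):
    """Play to the end: sample chance outcomes by their probabilities and
    choose uniformly among legal actions at decision nodes."""
    while not state.is_terminal():
        if state.is_chance_node():
            actions, probs = zip(*state.chance_outcomes())
            action = rng.choices(actions, weights=probs)[0]
        else:
            action = rng.choice(state.legal_actions())
        state.apply_action(action)
    return state.returns()


if __name__ == "__main__":
    rng = random.Random(0)
    results = [random_playout(CoinGuessState(), rng) for _ in range(1000)]
    print(all(r[0] + r[1] == 0.0 for r in results))  # zero-sum on every episode
```

Because `random_playout` only touches the state through those method names, the same loop should apply unchanged to a state from `pyspiel.load_game("kuhn_poker").new_initial_state()`, assuming the installed OpenSpiel version exposes the methods listed above.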

Forward citations

Cited by 19 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Generalized Intention Modeling in Multi-Agent Reinforcement Learning
cs.LG 2026-05 unverdicted novelty 7.0

Introduces a generalized intention modeling framework in multi-agent RL using a mixture of intent representations and a mutual information-based intent measure that improves or matches state-of-the-art performance.
Effective, Efficient, and General Information Abstraction for Imperfect-Information Extensive-Form Games
cs.GT 2026-05 unverdicted novelty 7.0

WEVA uses short CFR warm-up runs to build expected-value feature vectors for k-means clustering, yielding abstractions that reduce exploitability by up to 80% compared with equity- or rank-based methods across three games.
Coopetition-Gym v1: A Formally Grounded Platform for Mixed-Motive Multi-Agent Reinforcement Learning under Strategic Coopetition
cs.MA 2026-05 unverdicted novelty 7.0

Coopetition-Gym v1 provides twenty calibrated environments for mixed-motive MARL with parameterized private/integrated/cooperative rewards, game-theoretic oracles, and validation against four historical coopetitive ca...
A Dual-Positive Monotone Parameterization for Multi-Segment Bids and a Validity Assessment Framework for Reinforcement Learning Agent-based Simulation of Electricity Markets
cs.AI 2026-04 unverdicted novelty 7.0

Introduces a differentiable dual-positive monotone parameterization for multi-segment bids and a framework to measure how close RL electricity market simulations are to Nash equilibrium.
Outbidding and Outbluffing Elite Humans: Mastering Liar's Poker via Self-Play and Reinforcement Learning
cs.AI 2025-11 unverdicted novelty 7.0

Solly is the first AI to achieve elite human-level play in reduced-format Liar's Poker via self-play actor-critic reinforcement learning, outperforming both world-class humans and large language models on win rate and...
EMAgnet: Parameter-Space EMA Regularization for Policy Gradient Self-Play in Large Games
cs.LG 2026-06 unverdicted novelty 6.0

EMAgnet replaces uniform-magnet regularization in PPO self-play with an EMA of last-iterate policy parameters and reports lower exploitability on most tested zero-sum benchmarks, especially those with dominated strategies.
Real-Time Parallel Counterfactual Regret Minimization
cs.GT 2026-05 conditional novelty 6.0

Parallel CFR achieves 3.3-3.4x speedup and 47-54 ms per iteration for real-time depth-limited CFR on Heads-Up No-Limit Texas Hold'em with over one billion histories.
GAE Falls Short in Imperfect-Information Self-Play Reinforcement Learning
cs.LG 2026-05 unverdicted novelty 6.0

GAE suffers from amplified variance in imperfect-info self-play RL; VRPO with Q-boosting and multi-step Expected SARSA(λ) reduces it and improves performance on mid-to-large games.
Verifiable Process Rewards for Agentic Reasoning
cs.AI 2026-05 unverdicted novelty 6.0

Verifiable Process Rewards (VPR) converts symbolic oracles into dense turn-level supervision for reinforcement learning in agentic reasoning, outperforming outcome-only rewards and transferring to general benchmarks.
Verifiable Process Rewards for Agentic Reasoning
cs.AI 2026-05 unverdicted novelty 6.0

VPR converts symbolic, constraint, or posterior oracles into dense turn-level rewards for RL, improving credit assignment in agentic reasoning and transferring to general benchmarks.
A Structural Threshold in Decision Capacity Governs Collapse in Self-Play Reinforcement Learning
cs.LG 2026-05 unverdicted novelty 6.0

A sharp threshold at zero reach-weighted contingent action capacity governs whether self-play RL collapses to a deterministic exploitation attractor under asymmetric perturbations.
TABX: A High-Throughput Sandbox Battle Simulator for Multi-Agent Reinforcement Learning
cs.MA 2026-02 unverdicted novelty 6.0

Presents TABX, a modular JAX-accelerated sandbox simulator enabling customizable multi-agent tasks and high-throughput evaluation for cooperative MARL.
NashPG: A Policy Gradient Method with Iteratively Refined Regularization for Finding Nash Equilibria
cs.LG 2025-10 unverdicted novelty 6.0

NashPG is a policy-gradient method with iteratively refined regularization that guarantees monotonic convergence to Nash equilibria in two-player zero-sum extensive-form games and scales to large benchmarks.
VS-Bench: Evaluating VLMs for Strategic Abilities in Multi-Agent Environments
cs.AI 2025-06 unverdicted novelty 6.0

VS-Bench is a new benchmark of ten visual multi-agent environments that measures VLMs on element recognition, next-action prediction, and normalized episode return, showing strong perception but large gaps in reasonin...
Robots Need More than VLA and World Models
cs.RO 2026-06 unverdicted novelty 5.0

The paper identifies four missing interfaces (data autolabelling, embodiment retargeting, physics-grounded world models, and video-based reward inference) as the central bottleneck beyond VLA scaling for robot intelligence.
A High-Throughput Compute-Efficient POMDP Hide-And-Seek-Engine (HASE) for Multi-Agent Operations
cs.MA 2026-04 unverdicted novelty 5.0

A C++ Dec-POMDP simulator using data-oriented design and zero-copy PyTorch integration achieves up to 33 million steps per second on a 16-core CPU, enabling multi-agent policy training in minutes with PPO, DQN, and SAC.
StratFormer: Adaptive Opponent Modeling and Exploitation in Imperfect-Information Games
cs.AI 2026-04 unverdicted novelty 5.0

StratFormer uses a two-phase curriculum with dual-turn tokens and bucket-rate features to model and exploit opponents in Leduc Hold'em, gaining +0.106 BB/hand on average over GTO while keeping near-equilibrium safety.
Towards Learning Representations of Policies in Two-Player Zero-Sum Imperfect-Information Games
cs.LG 2026-07 unverdicted novelty 4.0

Basic dataset creation, embedding learning, and evaluation tasks on Kuhn and Leduc Poker demonstrate that useful behavioral representations appear in the learned embeddings.
Distilling Game Code World Model Generation into Lightweight Large Language Models
cs.AI 2026-05 unverdicted novelty 4.0

SFT followed by RLVR on Qwen2.5-3B-Instruct raises syntactic and execution correctness when generating Game Code World Models across 30 games.