OpenSpiel: A framework for reinforcement learning in games

Lanctot, M. · 2019 · arXiv 1908.09453

13 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Effective, Efficient, and General Information Abstraction for Imperfect-Information Extensive-Form Games

cs.GT · 2026-05-11 · unverdicted · novelty 7.0

WEVA uses short CFR warm-up runs to build expected-value feature vectors for k-means clustering, yielding abstractions that reduce exploitability by up to 80% compared with equity- or rank-based methods across three games.

Coopetition-Gym v1: A Formally Grounded Platform for Mixed-Motive Multi-Agent Reinforcement Learning under Strategic Coopetition

cs.MA · 2026-05-03 · unverdicted · novelty 7.0

Coopetition-Gym v1 provides twenty calibrated environments for mixed-motive MARL with parameterized private/integrated/cooperative rewards, game-theoretic oracles, and validation against four historical coopetitive cases at 81-98% accuracy.

A Dual-Positive Monotone Parameterization for Multi-Segment Bids and a Validity Assessment Framework for Reinforcement Learning Agent-based Simulation of Electricity Markets

cs.AI · 2026-04-11 · unverdicted · novelty 7.0

Introduces a differentiable dual-positive monotone parameterization for multi-segment bids and a framework to measure how close RL electricity market simulations are to Nash equilibrium.

Outbidding and Outbluffing Elite Humans: Mastering Liar's Poker via Self-Play and Reinforcement Learning

cs.AI · 2025-11-05 · unverdicted · novelty 7.0

Solly is the first AI to achieve elite human-level play in reduced-format Liar's Poker via self-play actor-critic reinforcement learning, outperforming both world-class humans and large language models on win rate and equity while developing non-exploitable strategies.

Real-Time Parallel Counterfactual Regret Minimization

cs.GT · 2026-05-19 · conditional · novelty 6.0

Parallel CFR achieves 3.3-3.4x speedup and 47-54 ms per iteration for real-time depth-limited CFR on Heads-Up No-Limit Texas Hold'em with over one billion histories.

GAE Falls Short in Imperfect-Information Self-Play Reinforcement Learning

cs.LG · 2026-05-19 · unverdicted · novelty 6.0

GAE suffers from amplified variance in imperfect-info self-play RL; VRPO with Q-boosting and multi-step Expected SARSA(λ) reduces it and improves performance on mid-to-large games.

A Structural Threshold in Decision Capacity Governs Collapse in Self-Play Reinforcement Learning

cs.LG · 2026-05-04 · unverdicted · novelty 6.0

A sharp threshold at zero reach-weighted contingent action capacity governs whether self-play RL collapses to a deterministic exploitation attractor under asymmetric perturbations.

NashPG: A Policy Gradient Method with Iteratively Refined Regularization for Finding Nash Equilibria

cs.LG · 2025-10-21 · unverdicted · novelty 6.0

NashPG is a policy-gradient method with iteratively refined regularization that guarantees monotonic convergence to Nash equilibria in two-player zero-sum extensive-form games and scales to large benchmarks.

VS-Bench: Evaluating VLMs for Strategic Abilities in Multi-Agent Environments

cs.AI · 2025-06-03 · unverdicted · novelty 6.0

VS-Bench is a new benchmark of ten visual multi-agent environments that measures VLMs on element recognition, next-action prediction, and normalized episode return, showing strong perception but large gaps in reasoning and decision-making with the best model at 46.6% prediction accuracy and 31.4% of

A High-Throughput Compute-Efficient POMDP Hide-And-Seek-Engine (HASE) for Multi-Agent Operations

cs.MA · 2026-04-29 · unverdicted · novelty 5.0

A C++ Dec-POMDP simulator using data-oriented design and zero-copy PyTorch integration achieves up to 33 million steps per second on a 16-core CPU, enabling multi-agent policy training in minutes with PPO, DQN, and SAC.

StratFormer: Adaptive Opponent Modeling and Exploitation in Imperfect-Information Games

cs.AI · 2026-04-28 · unverdicted · novelty 5.0

StratFormer uses a two-phase curriculum with dual-turn tokens and bucket-rate features to model and exploit opponents in Leduc Hold'em, gaining +0.106 BB/hand on average over GTO while keeping near-equilibrium safety.

Verifiable Process Rewards for Agentic Reasoning

cs.AI · 2026-05-11

TABX: A High-Throughput Sandbox Battle Simulator for Multi-Agent Reinforcement Learning

cs.MA · 2026-02-02

citing papers explorer

Showing 13 of 13 citing papers.

Effective, Efficient, and General Information Abstraction for Imperfect-Information Extensive-Form Games
cs.GT · 2026-05-11 · unverdicted · none · ref 52

Coopetition-Gym v1: A Formally Grounded Platform for Mixed-Motive Multi-Agent Reinforcement Learning under Strategic Coopetition
cs.MA · 2026-05-03 · unverdicted · none · ref 18

A Dual-Positive Monotone Parameterization for Multi-Segment Bids and a Validity Assessment Framework for Reinforcement Learning Agent-based Simulation of Electricity Markets
cs.AI · 2026-04-11 · unverdicted · none · ref 29

Outbidding and Outbluffing Elite Humans: Mastering Liar's Poker via Self-Play and Reinforcement Learning
cs.AI · 2025-11-05 · unverdicted · none · ref 6

Real-Time Parallel Counterfactual Regret Minimization
cs.GT · 2026-05-19 · conditional · none · ref 51

GAE Falls Short in Imperfect-Information Self-Play Reinforcement Learning
cs.LG · 2026-05-19 · unverdicted · none · ref 28

A Structural Threshold in Decision Capacity Governs Collapse in Self-Play Reinforcement Learning
cs.LG · 2026-05-04 · unverdicted · none · ref 5

NashPG: A Policy Gradient Method with Iteratively Refined Regularization for Finding Nash Equilibria
cs.LG · 2025-10-21 · unverdicted · none · ref 19

VS-Bench: Evaluating VLMs for Strategic Abilities in Multi-Agent Environments
cs.AI · 2025-06-03 · unverdicted · none · ref 33

A High-Throughput Compute-Efficient POMDP Hide-And-Seek-Engine (HASE) for Multi-Agent Operations
cs.MA · 2026-04-29 · unverdicted · none · ref 17

StratFormer: Adaptive Opponent Modeling and Exploitation in Imperfect-Information Games
cs.AI · 2026-04-28 · unverdicted · none · ref 14

Verifiable Process Rewards for Agentic Reasoning
cs.AI · 2026-05-11 · unreviewed · ref 13

TABX: A High-Throughput Sandbox Battle Simulator for Multi-Agent Reinforcement Learning
cs.MA · 2026-02-02 · unreviewed · ref 10
