OpenSpiel: A Framework for Reinforcement Learning in Games
13 Pith papers cite this work.
citing papers explorer
- Effective, Efficient, and General Information Abstraction for Imperfect-Information Extensive-Form Games
  WEVA uses short CFR warm-up runs to build expected-value feature vectors for k-means clustering, yielding abstractions that reduce exploitability by up to 80% compared with equity- or rank-based methods across three games.
- Coopetition-Gym v1: A Formally Grounded Platform for Mixed-Motive Multi-Agent Reinforcement Learning under Strategic Coopetition
  Coopetition-Gym v1 provides twenty calibrated environments for mixed-motive MARL with parameterized private/integrated/cooperative rewards, game-theoretic oracles, and validation against four historical coopetitive cases at 81-98% accuracy.
- A Dual-Positive Monotone Parameterization for Multi-Segment Bids and a Validity Assessment Framework for Reinforcement Learning Agent-based Simulation of Electricity Markets
  Introduces a differentiable dual-positive monotone parameterization for multi-segment bids and a framework to measure how close RL electricity market simulations are to Nash equilibrium.
- Outbidding and Outbluffing Elite Humans: Mastering Liar's Poker via Self-Play and Reinforcement Learning
  Solly is the first AI to achieve elite human-level play in reduced-format Liar's Poker via self-play actor-critic reinforcement learning, outperforming both world-class humans and large language models on win rate and equity while developing non-exploitable strategies.
- Real-Time Parallel Counterfactual Regret Minimization
  Parallel CFR achieves 3.3-3.4x speedup and 47-54 ms per iteration for real-time depth-limited CFR on Heads-Up No-Limit Texas Hold'em with over one billion histories.
- GAE Falls Short in Imperfect-Information Self-Play Reinforcement Learning
  GAE suffers from amplified variance in imperfect-info self-play RL; VRPO with Q-boosting and multi-step Expected SARSA(λ) reduces it and improves performance on mid-to-large games.
- A Structural Threshold in Decision Capacity Governs Collapse in Self-Play Reinforcement Learning
  A sharp threshold at zero reach-weighted contingent action capacity governs whether self-play RL collapses to a deterministic exploitation attractor under asymmetric perturbations.
- NashPG: A Policy Gradient Method with Iteratively Refined Regularization for Finding Nash Equilibria
  NashPG is a policy-gradient method with iteratively refined regularization that guarantees monotonic convergence to Nash equilibria in two-player zero-sum extensive-form games and scales to large benchmarks.
- VS-Bench: Evaluating VLMs for Strategic Abilities in Multi-Agent Environments
  VS-Bench is a new benchmark of ten visual multi-agent environments that measures VLMs on element recognition, next-action prediction, and normalized episode return, showing strong perception but large gaps in reasoning and decision-making with the best model at 46.6% prediction accuracy and 31.4% of
- A High-Throughput Compute-Efficient POMDP Hide-And-Seek-Engine (HASE) for Multi-Agent Operations
  A C++ Dec-POMDP simulator using data-oriented design and zero-copy PyTorch integration achieves up to 33 million steps per second on a 16-core CPU, enabling multi-agent policy training in minutes with PPO, DQN, and SAC.
- StratFormer: Adaptive Opponent Modeling and Exploitation in Imperfect-Information Games
  StratFormer uses a two-phase curriculum with dual-turn tokens and bucket-rate features to model and exploit opponents in Leduc Hold'em, gaining +0.106 BB/hand on average over GTO while keeping near-equilibrium safety.
- Verifiable Process Rewards for Agentic Reasoning
- TABX: A High-Throughput Sandbox Battle Simulator for Multi-Agent Reinforcement Learning