Superhuman AI for Stratego using self-play reinforcement learning and test-time search. arXiv preprint arXiv:2511.07312, 2025

Samuel Sokota, Eugene Vinitsky, Hengyuan Hu, J Zico Kolter, Gabriele Farina · 2025 · arXiv 2511.07312

2 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

GAE Falls Short in Imperfect-Information Self-Play Reinforcement Learning

cs.LG · 2026-05-19 · unverdicted · novelty 6.0

GAE suffers from amplified variance in imperfect-info self-play RL; VRPO with Q-boosting and multi-step Expected SARSA(λ) reduces it and improves performance on mid-to-large games.

Data-Augmented Game Starts for Accelerating Self-Play Exploration in Imperfect Information Games

cs.LG · 2026-05-14 · unverdicted · novelty 5.0

DAGS initializes policy-gradient self-play from human-derived intermediate states to reduce exploitability in challenging imperfect-information games, with a multi-task flag fix for resulting bias and new benchmark environments.

citing papers explorer

Showing 2 of 2 citing papers.

GAE Falls Short in Imperfect-Information Self-Play Reinforcement Learning
cs.LG · 2026-05-19 · unverdicted · none · ref 6
Data-Augmented Game Starts for Accelerating Self-Play Exploration in Imperfect Information Games
cs.LG · 2026-05-14 · unverdicted · none · ref 40
