Deep reinforcement learning from self-play in imperfect-information games. arXiv preprint arXiv:1603.01121

7 Pith papers cite this work. Polarity classification is still indexing.

abstract

Many real-world applications can be described as large-scale games of imperfect information. To deal with these challenging domains, prior work has focused on computing Nash equilibria in a handcrafted abstraction of the domain. In this paper we introduce the first scalable end-to-end approach to learning approximate Nash equilibria without prior domain knowledge. Our method combines fictitious self-play with deep reinforcement learning. When applied to Leduc poker, Neural Fictitious Self-Play (NFSP) approached a Nash equilibrium, whereas common reinforcement learning methods diverged. In Limit Texas Holdem, a poker game of real-world scale, NFSP learnt a strategy that approached the performance of state-of-the-art, superhuman algorithms based on significant domain expertise.
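For readers unfamiliar with the fictitious-play idea the abstract builds on, the following is a minimal sketch of classical tabular fictitious play on a small zero-sum matrix game (rock-paper-scissors). It is not NFSP and does not attempt the deep reinforcement learning components the abstract refers to; the function name `fictitious_play`, the payoff matrix, and the iteration count are illustrative assumptions, not taken from the paper. Each player repeatedly best-responds to the opponent's empirical action frequencies, and in zero-sum games those frequencies are known to approach a Nash equilibrium.

```python
def fictitious_play(payoff, iterations=20000):
    """Classical fictitious play for a two-player zero-sum matrix game.

    payoff[i][j] is the row player's payoff when row plays i and column
    plays j; the column player receives the negative of it.
    Returns the empirical (average) strategies of both players.
    Illustrative sketch only: this is the tabular method, not NFSP.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_counts = [0] * n_rows
    col_counts = [0] * n_cols
    # Arbitrary initial actions so the empirical distributions are defined.
    row_counts[0] += 1
    col_counts[0] += 1

    for _ in range(iterations):
        # Value of each row action against the column player's action counts.
        row_values = [
            sum(payoff[i][j] * col_counts[j] for j in range(n_cols))
            for i in range(n_rows)
        ]
        # Value of each column action against the row player's action counts.
        col_values = [
            sum(-payoff[i][j] * row_counts[i] for i in range(n_rows))
            for j in range(n_cols)
        ]
        # Both players best-respond simultaneously (ties go to the lowest index).
        best_row = max(range(n_rows), key=row_values.__getitem__)
        best_col = max(range(n_cols), key=col_values.__getitem__)
        row_counts[best_row] += 1
        col_counts[best_col] += 1

    total = iterations + 1  # one initial action plus one per iteration
    return ([c / total for c in row_counts],
            [c / total for c in col_counts])


if __name__ == "__main__":
    # Rock-paper-scissors from the row player's point of view.
    rps = [[0, -1, 1],
           [1, 0, -1],
           [-1, 1, 0]]
    row_avg, col_avg = fictitious_play(rps)
    print("row average strategy:", row_avg)
    print("column average strategy:", col_avg)
```

The average strategies, not the individual best responses, are the quantity that approaches equilibrium; the per-step best responses keep cycling among pure actions.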

citation-role summary

background 1

citation-polarity summary

polarities

background 1

representative citing papers

PopuLoRA: Co-Evolving LLM Populations for Reasoning Self-Play

cs.AI · 2026-05-16 · unverdicted · novelty 6.0

PopuLoRA shows that co-evolving populations of LoRA adapters through cross-evaluated self-play can outperform compute-matched single-agent baselines on multiple code and math reasoning benchmarks.
