NeuPL: Neural Population Learning

Siqi Liu; Luke Marris; Daniel Hennes; Josh Merel; Nicolas Heess; Thore Graepel

arXiv: 2202.07415 · v1 · submitted 2022-02-15 · cs.AI · cs.LG · stat.ML

Abstract

Learning in strategy games (e.g. StarCraft, poker) requires the discovery of diverse policies. This is often achieved by iteratively training new policies against existing ones, growing a policy population that is robust to exploitation. This iterative approach suffers from two issues in real-world games: a) under a finite budget, the approximate best-response operator at each iteration must be truncated, so the population fills up with under-trained good-responses; b) relearning basic skills at every iteration is wasteful and becomes intractable as opponents grow increasingly strong. In this work, we propose Neural Population Learning (NeuPL) as a solution to both issues. NeuPL offers convergence guarantees to a population of best-responses under mild assumptions. By representing a population of policies within a single conditional model, NeuPL enables transfer learning across policies. Empirically, we demonstrate the generality, improved performance, and efficiency of NeuPL across several test domains. Most interestingly, we show that novel strategies become more accessible, not less, as the neural population expands.
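The iterative population-growing scheme described above can be illustrated on rock-paper-scissors. This is a minimal sketch, not NeuPL: it assumes a fictitious-play-style meta-strategy (each new policy best-responds to the uniform mixture over the existing population) and uses an exact best response, which is only possible because the game is a 3x3 matrix. In the real games the abstract targets, `best_response` would be an RL training run that must be truncated, which is issue (a). All names (`PAYOFF`, `best_response`, `exploitability`, `grow_population`) are hypothetical.

```python
import numpy as np

# Rock-paper-scissors payoffs for the row player (symmetric, zero-sum).
# PAYOFF[i, j] is the payoff of action i against action j;
# action order is rock, paper, scissors.
PAYOFF = np.array([
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
])


def best_response(opponent_counts: np.ndarray) -> np.ndarray:
    """Exact pure (one-hot) best response to the population's aggregate play.

    Dividing the counts by the population size gives the uniform mixture but
    does not change the argmax; keeping integers makes tie-breaking
    deterministic (np.argmax returns the lowest index among ties).
    """
    expected = PAYOFF @ opponent_counts
    response = np.zeros(len(expected), dtype=int)
    response[int(np.argmax(expected))] = 1
    return response


def exploitability(population: list) -> float:
    """Best-response payoff against the uniform mixture over the population.

    The game value is 0, so this is 0 exactly when the mixture is a Nash
    equilibrium and positive otherwise.
    """
    mixture = np.mean(population, axis=0)
    return float(np.max(PAYOFF @ mixture))


def grow_population(initial: np.ndarray, iterations: int) -> list:
    """Each iteration adds a best response to the uniform mixture over the current population."""
    population = [initial]
    for _ in range(iterations):
        counts = np.sum(population, axis=0)
        population.append(best_response(counts))
    return population


rock = np.array([1, 0, 0])
population = grow_population(rock, iterations=100)
print(exploitability(population[:1]))  # 1.0: pure rock loses to paper
print(exploitability(population))      # much smaller for the grown population
```

Each population member here is a separate policy stored in a list, and every iteration starts from scratch. Per the abstract, NeuPL instead represents the whole population within a single conditional model so that skills can transfer across policies; this sketch does not attempt to show that part.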

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Towards Learning Representations of Policies in Two-Player Zero-Sum Imperfect-Information Games
cs.LG · 2026-07 · unverdicted · novelty 4.0

Basic dataset creation, embedding learning, and evaluation tasks on Kuhn and Leduc Poker demonstrate that useful behavioral representations appear in the learned embeddings.