pith. sign in

arxiv: 2601.19562 · v2 · pith:GYHEVPNFnew · submitted 2026-01-27 · 💻 cs.NE

Tournament Informed Adversarial Quality Diversity

Pith reviewed 2026-05-21 14:36 UTC · model grok-4.3

classification 💻 cs.NE
keywords quality diversityadversarial optimizationMAP-Elitestournament selectioncoevolutionevolutionary algorithmsmulti-agent games
0
0 comments X

The pith

Tournament-informed task selection improves quality and diversity in adversarial quality diversity algorithms.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to improve how quality diversity algorithms handle adversarial problems, where the performance of one side depends directly on the other. It builds on the Generational Adversarial MAP-Elites method, which alternates evolution between two opposing populations using previous solutions as tasks. The authors introduce an inter-variants tournament that ranks entire sets of solutions using six quality and diversity measures to enable fair comparisons. They also propose two new ways to pick the next tasks based on those tournament outcomes instead of a purely behavioral rule. Evaluations on Pong, a cat-and-mouse game, and a pursuers-and-evaders game show the tournament-informed selection produces stronger and more varied solutions on both sides.

Core claim

By replacing behavioral task selection with methods guided by the outcomes of inter-variants tournaments, the algorithm achieves higher adversarial quality and greater behavioral diversity across generations in coevolutionary settings. The tournament uses six measures to compare solution sets from different variants without the side-dependency bias that affects standard QD metrics.

What carries the argument

Tournament-informed task selection methods that rank opposing elites using six quality and diversity measures from an inter-variants tournament and then choose the next tasks accordingly.

If this is right

  • Solution sets on each side of an adversarial problem reach higher performance against a wider range of opponents.
  • Direct comparison of different algorithmic variants becomes possible without relying on metrics that favor one side.
  • The same selection approach can be inserted into other multi-task quality diversity frameworks that face opponent dependence.
  • Over generations the populations maintain both high fitness and spread-out behaviors even as the opposing side improves.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach may transfer to non-game adversarial domains such as robust control or security strategy generation where one agent must counter another.
  • If the tournament ranking proves stable, it could replace manual metric tuning in other coevolutionary algorithms.
  • Testing whether the same gains appear when the number of measures inside the tournament is reduced would clarify which components drive the improvement.

Load-bearing premise

The six quality and diversity measures inside the inter-variants tournament give a fair, unbiased ranking of solution sets that removes the comparison problems caused by side dependencies.

What would settle it

Running the original behavioral task selection and the new tournament-informed versions on the same three game domains and finding no consistent gain in the six tournament measures would show the claimed improvement does not hold.

Figures

Figures reproduced from arXiv: 2601.19562 by Alain Jaquier, Alexandre Manai, Florian Turati, Franck Legendre, Meriem Elhosni, Noah Syrkis, Sebastian Risi, Timoth\'ee Anne.

Figure 1
Figure 1. Figure 1: GAME is a coevolutionary QD algorithm that illu [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: To illuminate an adversarial problem, GAME selects elites from the previous generation and sets them as opposing [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Pretty (top) and one-frame (bottom) visualization [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗
read the original abstract

Quality diversity (QD) is a branch of evolutionary computation that seeks high-quality and behaviorally diverse solutions to a problem. While adversarial problems are common, classical QD cannot be easily applied to them, as both the fitness and the behavior depend on the opposing solutions. Recently, Generational Adversarial MAP-Elites (GAME) has been proposed to coevolve both sides of an adversarial problem by alternating the execution of a multi-task QD algorithm against previous elites, called tasks. The original algorithm selects new tasks based on a behavioral criterion, which may lead to undesired dynamics due to inter-side dependencies. In addition, comparing sets of solutions cannot be done directly using classical QD measures due to side dependencies. In this paper, we (1) use an inter-variants tournament to compare the sets of solutions, ensuring a fair comparison, with 6 measures of quality and diversity, and (2) propose two tournament-informed task selection methods to promote higher quality and diversity at each generation. We evaluate the variants across three adversarial problems: Pong, a Cat-and-mouse game, and a Pursuers-and-evaders game. We show that the tournament-informed task selection method leads to higher adversarial quality and diversity. We hope that this work will help further advance adversarial quality diversity. Code, videos, and supplementary material are available at https://github.com/Timothee-ANNE/GAME_tournament_informed.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper extends Generational Adversarial MAP-Elites (GAME) by introducing two tournament-informed task selection methods that use an inter-variants tournament over six quality and diversity measures. This addresses side-dependency issues in task selection for adversarial QD and enables fair comparison of solution sets. The authors evaluate the variants on Pong, a Cat-and-mouse game, and a Pursuers-and-evaders game, claiming that the tournament-informed selection produces higher adversarial quality and diversity than prior GAME variants. Code, videos, and supplementary material are provided.

Significance. If the empirical gains hold under independent validation, the work would meaningfully advance adversarial quality diversity by providing a practical mechanism for task selection that mitigates inter-side dependencies. The open release of code and the use of multiple game benchmarks are positive for reproducibility and generalizability within evolutionary computation.

major comments (2)
  1. [Methods and Evaluation sections] The inter-variants tournament is employed both to select tasks during evolution (to promote quality and diversity) and to perform the final comparison of solution sets across variants. This dual usage creates a risk that reported improvements are partly an artifact of alignment with the chosen six measures rather than an independent demonstration of stronger archives. The manuscript should clarify whether final rankings rely on the same tournament protocol or on an external validation set of adversaries (see the description of the comparison challenge and the evaluation protocol).
  2. [Experimental results] The abstract states positive outcomes on three distinct adversarial problems, yet the support for the central claim would be strengthened by explicit reporting of statistical tests, number of independent runs, effect sizes, and ablation studies that isolate the contribution of individual tournament measures. Without these, it remains unclear whether the gains generalize beyond the internal comparison protocol.
minor comments (2)
  1. [Abstract] The abstract could include one or two quantitative highlights (e.g., average improvement in a key measure) to give readers an immediate sense of effect magnitude.
  2. [Methods] Notation for the six quality/diversity measures should be introduced consistently in the main text and cross-referenced to the supplementary material for reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the detailed review and constructive feedback on our manuscript. We appreciate the opportunity to clarify and strengthen our work on tournament-informed adversarial quality diversity. Below, we provide point-by-point responses to the major comments.

read point-by-point responses
  1. Referee: [Methods and Evaluation sections] The inter-variants tournament is employed both to select tasks during evolution (to promote quality and diversity) and to perform the final comparison of solution sets across variants. This dual usage creates a risk that reported improvements are partly an artifact of alignment with the chosen six measures rather than an independent demonstration of stronger archives. The manuscript should clarify whether final rankings rely on the same tournament protocol or on an external validation set of adversaries (see the description of the comparison challenge and the evaluation protocol).

    Authors: We thank the referee for highlighting this important methodological consideration. The inter-variants tournament is used both for task selection during evolution and for final comparison of solution sets because it provides a fair, side-independent evaluation mechanism, which directly addresses the inter-side dependency issues discussed in the paper. In the revised manuscript, we will add explicit clarification in the Methods and Evaluation sections stating that final rankings rely on the same tournament protocol, as it is the appropriate and consistent approach for comparing variants without introducing external biases. We will also expand the discussion to explain why this does not constitute an artifact, emphasizing that the six measures are standard in the QD literature and applied uniformly. To further address the concern, we will include additional analysis in the supplementary material comparing performance against a fixed external set of adversaries where feasible. revision: partial

  2. Referee: [Experimental results] The abstract states positive outcomes on three distinct adversarial problems, yet the support for the central claim would be strengthened by explicit reporting of statistical tests, number of independent runs, effect sizes, and ablation studies that isolate the contribution of individual tournament measures. Without these, it remains unclear whether the gains generalize beyond the internal comparison protocol.

    Authors: We agree that greater statistical transparency would strengthen the results section. The experiments were performed with 10 independent runs per variant per problem, as noted in the supplementary material, but we acknowledge that explicit reporting of tests, effect sizes, and ablations is not detailed in the main text. In the revision, we will add this information directly to the Experimental results section, including statistical tests (e.g., Wilcoxon rank-sum with corrections), effect sizes, and an ablation isolating the contribution of each tournament measure. This will better demonstrate that the observed gains in quality and diversity generalize across the three problems. revision: yes

Circularity Check

0 steps flagged

Empirical evaluation of algorithmic variants on external benchmarks shows no definitional circularity

full rationale

The paper presents an empirical study of new task-selection variants within the GAME framework, evaluated on standard external game benchmarks (Pong, Cat-and-mouse, Pursuers-and-evaders). The central claim rests on direct performance comparisons of solution sets rather than any derivation or equation that reduces a prediction to a fitted parameter or self-referential definition. Although the inter-variant tournament supplies both the selection criterion and the final quality/diversity measures, this is a consistent methodological protocol introduced to address acknowledged side-dependency problems; it does not create a load-bearing self-definition or fitted-input loop because the reported gains are measured against baseline variants on independent game environments. No self-citation chain is invoked to justify uniqueness, and the work remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The paper relies on standard domain assumptions of evolutionary computation and quality diversity without introducing new fitted parameters or postulated entities; the algorithmic changes are the primary addition.

axioms (1)
  • domain assumption Coevolutionary search can produce high-quality diverse solutions when task selection accounts for inter-side dependencies in adversarial problems.
    This underpins the motivation for moving beyond behavioral task selection in GAME.

pith-pipeline@v0.9.0 · 5797 in / 1175 out tokens · 52136 ms · 2026-05-21T14:36:46.718415+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

40 extracted references · 40 canonical work pages · 3 internal anchors

  1. [1]

    Timothée Anne and Jean-Baptiste Mouret. 2023. Multi-task multi-behavior map- elites. InProceedings of the companion conference on genetic and evolutionary computation. 111–114

  2. [2]

    Timothée Anne, Noah Syrkis, Meriem Elhosni, Florian Turati, Franck Legendre, Alain Jaquier, and Sebastian Risi. 2025. Generational Adversarial MAP-Elites for Multi-Agent Game Illumination. InArtificial Life Conference Proceedings 37, Vol. 2025. MIT Press One Rogers Street, Cambridge, MA 02142-1209, USA journals-info . . . , 31

  3. [3]

    Bowen Baker, Ingmar Kanitscheider, Todor Markov, Yi Wu, Glenn Powell, Bob McGrew, and Igor Mordatch. 2019. Emergent tool use from multi-agent autocur- ricula. InInternational conference on learning representations

  4. [4]

    Mark A Bedau, John S McCaskill, Norman H Packard, Steen Rasmussen, Chris Adami, David G Green, Takashi Ikegami, Kunihiko Kaneko, and Thomas S Ray

  5. [5]

    Open problems in artificial life.Artificial life6, 4 (2000), 363–376

  6. [6]

    Christopher Berner, Greg Brockman, Brooke Chan, Vicki Cheung, Przemysław Dębiak, Christy Dennison, David Farhi, Quirin Fischer, Shariq Hashme, Chris Hesse, et al. 2019. Dota 2 with large scale deep reinforcement learning.arXiv preprint arXiv:1912.06680(2019)

  7. [7]

    James Bradbury, Roy Frostig, Peter Hawkins, Matthew James Johnson, Chris Leary, Dougal Maclaurin, George Necula, Adam Paszke, Jake VanderPlas, Skye Wanderman-Milne, and Qiao Zhang. 2018. JAX: composable transformations of Python+NumPy programs

  8. [8]

    Loïc Brevault and Mathieu Balesdent. 2024. Bayesian Quality-Diversity ap- proaches for constrained optimization problems with mixed continuous, discrete and categorical variables.Engineering Applications of Artificial Intelligence133 (2024), 108118

  9. [9]

    Estelle Chigot and Dennis G Wilson. 2022. Coevolution of neural networks for agents and environments. InProceedings of the Genetic and Evolutionary Computation Conference Companion. 2306–2309

  10. [10]

    Antoine Cully, Jeff Clune, Danesh Tarapore, and Jean-Baptiste Mouret. 2015. Robots that can adapt like animals.Nature521, 7553 (2015), 503–507

  11. [11]

    Quy-Anh Dang, Chris Ngo, and Truong-Son Hy. 2025. RainbowPlus: Enhancing Adversarial Prompt Generation via Evolutionary Quality-Diversity Search.arXiv preprint arXiv:2504.15047(2025)

  12. [12]

    Richard Dawkins and John Richard Krebs. 1979. Arms races between and within species.Proceedings of the Royal Society of London. Series B. Biological Sciences 205, 1161 (1979), 489–511

  13. [13]

    Kalyanmoy Deb and Himanshu Jain. 2013. An evolutionary many-objective opti- mization algorithm using reference-point-based nondominated sorting approach, part I: solving problems with box constraints.IEEE transactions on evolutionary computation18, 4 (2013), 577–601

  14. [14]

    Aaron Dharna, Cong Lu, and Jeff Clune. 2024. Quality-Diversity Self-Play: Open- Ended Strategy Innovation via Foundation Models. InNeurIPS 2024 Workshop on Open-World Agents

  15. [15]

    Alan Dorin and Susan Stepney. 2024. What Is Artificial Life Today, and Where Should It Go? 15 pages

  16. [16]

    Maxence Faldor, Jenny Zhang, Antoine Cully, and Jeff Clune. 2024. OMNI- EPIC: Open-endedness via Models of human Notions of Interestingness with Environments Programmed in Code. InThe Thirteenth International Conference on Learning Representations

  17. [17]

    Sevan G Ficici and Jordan B Pollack. 1998. Challenges in coevolutionary learning: Arms-race dynamics, open-endedness, and mediocre stable states. InProceedings of the sixth international conference on Artificial life. MIT Press Cambridge, MA, 238–247

  18. [18]

    Matthew C Fontaine, Scott Lee, Lisa B Soros, Fernando de Mesentier Silva, Julian Togelius, and Amy K Hoover. 2019. Mapping hearthstone deck spaces through map-elites with sliding boundaries. InProceedings of The Genetic and Evolutionary Computation Conference. 161–169

  19. [19]

    Matthew C Fontaine, Julian Togelius, Stefanos Nikolaidis, and Amy K Hoover

  20. [20]

    InProceedings of the 2020 genetic and evolutionary computation conference

    Covariance matrix adaptation for the rapid illumination of behavior space. InProceedings of the 2020 genetic and evolutionary computation conference. 94–102

  21. [21]

    Deep Ganguli, Liane Lovitt, Jackson Kernion, Amanda Askell, Yuntao Bai, Saurav Kadavath, Ben Mann, Ethan Perez, Nicholas Schiefer, Kamal Ndousse, et al. 2022. Red teaming language models to reduce harms: Methods, scaling behaviors, and lessons learned.arXiv preprint arXiv:2209.07858(2022)

  22. [22]

    Daniele Gravina, Ahmed Khalifa, Antonios Liapis, Julian Togelius, and Georgios N Yannakakis. 2019. Procedural content generation through quality diversity. In 2019 IEEE Conference on Games (CoG). IEEE, 1–8

  23. [23]

    1999.Differential games: a mathematical theory with applications to warfare and pursuit, control and optimization

    Rufus Isaacs. 1999.Differential games: a mathematical theory with applications to warfare and pursuit, control and optimization. Courier Corporation

  24. [24]

    Yibin Jiang, Daniel Salley, Abhishek Sharma, Graham Keenan, Margaret Mullin, and Leroy Cronin. 2022. An artificial intelligence enabled chemical synthesis robot for exploration and optimization of nanomaterials.Science advances8, 40 (2022), eabo2626

  25. [25]

    Akarsh Kumar, Ryan Bahlous-Boldi, Prafull Sharma, Phillip Isola, Sebastian Risi, Yujin Tang, and David Ha. 2026. Digital Red Queen: Adversarial Program Evolution in Core War with LLMs.arXiv preprint arXiv:2601.03335(2026)

  26. [26]

    Joel Lehman and Kenneth O Stanley. 2011. Evolving a diversity of virtual creatures through novelty search and local competition. InProceedings of the 13th annual conference on Genetic and evolutionary computation. 211–218

  27. [27]

    Jean-Baptiste Mouret and Jeff Clune. 2015. Illuminating search spaces by mapping elites.arXiv preprint arXiv:1504.04909(2015)

  28. [28]

    Jean-Baptiste Mouret and Glenn Maguire. 2020. Quality diversity for multi-task optimization. InProceedings of the 2020 Genetic and Evolutionary Computation Conference. 121–129

  29. [29]

    Thai Huy Nguyen and Ngoc Hoang Luong. 2025. Diversifying Adversarial Attacks on Text-to-image Generation. InProceedings of the Genetic and Evolutionary Computation Conference Companion. 315–318

  30. [30]

    Justin K Pugh, Lisa B Soros, and Kenneth O Stanley. 2016. Quality diversity: A new frontier for evolutionary computation.Frontiers in Robotics and AI3 (2016), 40

  31. [31]

    Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. 2021. Learning transferable visual models from natural language supervision. InInternational conference on machine learning. PmLR, 8748–8763

  32. [32]

    Mikayel Samvelyan, Davide Paglieri, Minqi Jiang, Jack Parker-Holder, and Tim Rocktäschel. 2024. Multi-agent diagnostics for robustness via illuminated diver- sity.arXiv preprint arXiv:2401.13460(2024)

  33. [33]

    Raparthy, Andrei Lupu, Eric Hambro, Aram H

    Mikayel Samvelyan, S. Raparthy, Andrei Lupu, Eric Hambro, Aram H. Markosyan, Manish Bhatt, Yuning Mao, Minqi Jiang, Jack Parker-Holder, Jakob Foerster, Tim Rocktaschel, and Roberta Raileanu. 2024. Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts.ArXivabs/2402.16822 (2024). https: //api.semanticscholar.org/CorpusId:268031888

  34. [34]

    David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, et al. 2017. Mastering the game of go without human knowledge.nature550, 7676 (2017), 354–359

  35. [35]

    Kirby Steckel and Jacob Schrum. 2021. Illuminating the space of beatable lode runner levels produced by various generative adversarial networks. InProceedings of the genetic and evolutionary computation conference companion. 111–112

  36. [36]

    Oriol Vinyals, Igor Babuschkin, Wojciech M Czarnecki, Michaël Mathieu, An- drew Dudzik, Junyoung Chung, David H Choi, Richard Powell, Timo Ewalds, Petko Georgiev, et al. 2019. Grandmaster level in StarCraft II using multi-agent reinforcement learning.nature575, 7782 (2019), 350–354

  37. [37]

    Rui Wang, Joel Lehman, Jeff Clune, and Kenneth O Stanley. 2019. Poet: open- ended coevolution of environments and their optimized solutions. InProceedings of the Genetic and Evolutionary Computation Conference. 142–151. Preprint, submitted to Gecco ’26, in January 2026 Anne et al

  38. [38]

    Rui Wang, Joel Lehman, Aditya Rawal, Jiale Zhi, Yulun Li, Jeffrey Clune, and Kenneth Stanley. 2020. Enhanced poet: Open-ended reinforcement learning through unbounded invention of learning challenges and their solutions. In International conference on machine learning. PMLR, 9940–9951

  39. [39]

    Ren-Jian Wang, Ke Xue, Zeyu Qin, Ziniu Li, Sheng Tang, Hao-Tian Li, Shengcai Liu, and Chao Qian. 2025. Quality-Diversity Red-Teaming: Automated Generation of High-Quality and Diverse Attackers for Large Language Models.arXiv preprint arXiv:2506.07121(2025)

  40. [40]

    Ruize Zhang, Zelai Xu, Chengdong Ma, Chao Yu, Wei-Wei Tu, Wenhao Tang, Shiyu Huang, Deheng Ye, Wenbo Ding, Yaodong Yang, et al. 2024. A survey on self-play methods in reinforcement learning.arXiv preprint arXiv:2408.01072 (2024)