Best Agent Identification for General Game Playing

Alex Newcombe; Dennis Soemers; Eric Piette; Matthew Stephenson

arxiv: 2507.00451 · v2 · submitted 2025-07-01 · 💻 cs.LG · cs.AI· cs.DS· cs.IT· math.IT· stat.ML

Best Agent Identification for General Game Playing

Matthew Stephenson , Alex Newcombe , Eric Piette , Dennis Soemers This is my paper

Pith reviewed 2026-05-19 07:08 UTC · model grok-4.3

classification 💻 cs.LG cs.AIcs.DScs.ITmath.ITstat.ML

keywords best arm identificationmulti-armed banditsgeneral game playingsimple regretagent evaluationGVGAILudiimulti-task selection

0 comments

The pith

An optimistic confidence-interval ranking across all games identifies the best agent per task while reducing overall simple regret.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper treats the problem of picking strong agents for many different games as a collection of best-arm identification tasks, one bandit per game. It introduces a global ranking rule that uses per-arm confidence intervals to prioritize selections likely to affect the total simple regret across the whole set of games. This produces lower average simple regret and lower error probability than earlier multi-armed-bandit algorithms when tested on the GVGAI and Ludii frameworks with limited trials. The approach directly improves how reliably we can evaluate and choose algorithms in any domain where each run is expensive.

Core claim

The central claim is that ranking every algorithm across every game by the width and location of its confidence interval, then always sampling the arm whose interval most influences the aggregate simple regret, yields a procedure that finds near-optimal agents faster and more reliably than previous best-arm methods.

What carries the argument

Optimistic selection process that ranks each arm across all bandits according to its potential to influence overall simple regret, using a chosen confidence interval.

If this is right

Fewer total agent evaluations are needed to reach a given level of identification accuracy across a suite of games.
Agent-selection procedures in GVGAI and Ludii become more accurate under a fixed budget of trials.
The same ranking logic applies to any multi-task domain where each algorithm evaluation is costly.
Simple regret and probability of error both drop measurably compared with earlier best-arm identification algorithms.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If games share latent structure, explicitly modeling those correlations could further tighten the confidence intervals and reduce regret still more.
The method could be combined with transfer-learning techniques that reuse data from one game to shrink intervals on related games.
In domains with continuous performance measures rather than win rates, the same optimistic ranking could be applied by replacing the confidence interval with an appropriate uncertainty estimate.

Load-bearing premise

Performance observations on different games are sufficiently independent that one global ranking rule based on per-arm confidence intervals can reliably minimize total simple regret without modeling cross-game correlations.

What would settle it

A controlled experiment on a set of games whose agent performances are known to be strongly correlated across tasks, showing that the method no longer reduces aggregate regret relative to baselines once those correlations are present.

Figures

Figures reproduced from arXiv: 2507.00451 by Alex Newcombe, Dennis Soemers, Eric Piette, Matthew Stephenson.

**Figure 2.** Figure 2: Average simple regret for each Best Arm Identification algorithm when applied to [PITH_FULL_IMAGE:figures/full_fig_p015_2.png] view at source ↗

**Figure 3.** Figure 3: Average simple regret for each Best Arm Identification algorithm when applied [PITH_FULL_IMAGE:figures/full_fig_p016_3.png] view at source ↗

**Figure 4.** Figure 4: Average simple regret for each GapE hyperparameter value when applied to the [PITH_FULL_IMAGE:figures/full_fig_p018_4.png] view at source ↗

**Figure 5.** Figure 5: Average simple regret for each GapE-V hyperparameter value when applied to the [PITH_FULL_IMAGE:figures/full_fig_p018_5.png] view at source ↗

**Figure 6.** Figure 6: Average simple regret for each UCB-E hyperparameter value when applied to the [PITH_FULL_IMAGE:figures/full_fig_p019_6.png] view at source ↗

**Figure 7.** Figure 7: Average simple regret for each Optimistic-WS hyperparameter value when applied [PITH_FULL_IMAGE:figures/full_fig_p019_7.png] view at source ↗

**Figure 8.** Figure 8: Average simple regret for each GapE hyperparameter when applied to the GVGAI [PITH_FULL_IMAGE:figures/full_fig_p020_8.png] view at source ↗

**Figure 9.** Figure 9: Average simple regret for each GapE-V hyperparameter when applied to the [PITH_FULL_IMAGE:figures/full_fig_p020_9.png] view at source ↗

**Figure 10.** Figure 10: Average simple regret for each UCB-E hyperparameter when applied to the [PITH_FULL_IMAGE:figures/full_fig_p021_10.png] view at source ↗

**Figure 11.** Figure 11: Average simple regret for each Optimistic-WS hyperparameter when applied to [PITH_FULL_IMAGE:figures/full_fig_p021_11.png] view at source ↗

**Figure 12.** Figure 12: Average simple regret for each GapE hyperparameter when applied to the Ludii [PITH_FULL_IMAGE:figures/full_fig_p022_12.png] view at source ↗

**Figure 13.** Figure 13: Average simple regret for each GapE-V hyperparameter when applied to the [PITH_FULL_IMAGE:figures/full_fig_p022_13.png] view at source ↗

**Figure 14.** Figure 14: Average simple regret for each UCB-E hyperparameter when applied to the Ludii [PITH_FULL_IMAGE:figures/full_fig_p023_14.png] view at source ↗

**Figure 15.** Figure 15: Average simple regret for each Optimistic-WS hyperparameter when applied to [PITH_FULL_IMAGE:figures/full_fig_p023_15.png] view at source ↗

read the original abstract

We present an efficient and generalised procedure to accurately identify the best (or near best) performing algorithm for each sub-task in a multi-problem domain. Our approach treats this as a set of best arm identification problems for multi-armed bandits, where each bandit corresponds to a specific task and each arm corresponds to a specific algorithm or agent. We propose an optimistic selection process based on a chosen confidence interval, that ranks each arm across all bandits in terms of their potential to influence our overall simple regret. We evaluate the performance of our approach on two of the most popular general game playing domains, the General Video Game AI (GVGAI) framework and the Ludii general game playing system, with the goal of selecting a high-performing agent for each game using a limited number of available trials. Compared to previous best arm identification algorithms for multi-armed bandits, our results demonstrate a substantial performance improvement in terms of average simple regret and average probability of error. This novel approach can be used to significantly improve the quality and accuracy of agent evaluation procedures for general game frameworks, as well as other multi-task domains with high algorithm runtimes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper's optimistic cross-bandit ranking improves empirical regret in GGP agent selection but may falter if game performances correlate.

read the letter

The punchline for this paper is a new optimistic selection rule that ranks agents across multiple games by their potential impact on total simple regret, using per-arm confidence intervals, and it reports better performance than standard best-arm identification on GVGAI and Ludii. What stands out is how they frame the multi-game agent selection as a collection of bandit problems and then optimize the next trial choice by looking at the global influence rather than per-game. This directly targets the cost of evaluations in general game playing, where you want to find good agents without wasting runs on poor ones across the board. The empirical results show reductions in average simple regret and probability of error, which is useful if it holds up. They do a solid job presenting a generalized procedure that could apply beyond GGP to other multi-task settings with expensive tests. The soft spot is the reliance on independence between games for the aggregate regret calculation. The ranking uses individual confidence intervals without modeling any cross-game correlations, but in GVGAI and Ludii many games share mechanics or similar agent strengths, so positive correlations are likely. This could mean the optimistic estimates overstate or understate the value of certain trials, and without covariance handling or checks for robustness, the gains might be overstated in correlated settings. Also, the abstract level description leaves me wanting more on how the confidence intervals are chosen and the exact trial counts or significance tests used. This work is for folks doing agent benchmarking in general game frameworks or anyone needing efficient algorithm selection across related tasks. A reader focused on practical bandit applications would find the comparisons helpful. It deserves a serious referee because the idea is fresh enough and the experiments are on real domains, though revisions would likely be needed to address the dependence issue and add more methodological detail.

Referee Report

2 major / 2 minor

Summary. The manuscript frames best-agent identification across multiple games as a collection of best-arm identification problems, one bandit per game. It introduces an optimistic ranking procedure that uses per-arm confidence intervals to order trials by their estimated contribution to aggregate simple regret, then evaluates the procedure on GVGAI and Ludii, reporting lower average simple regret and lower error probability than prior BAI algorithms.

Significance. If the reported gains are statistically reliable, the method offers a practical way to allocate limited expensive evaluations across a portfolio of games. The global optimistic ranking is a clean algorithmic idea that could be adopted for agent benchmarking in any multi-task setting with high per-trial cost. The absence of covariance modeling or sensitivity checks, however, leaves open whether the gains survive the positive correlations that are typical when games share mechanics or heuristics.

major comments (2)

[Proposed Approach] The optimistic selection process (described after the multi-bandit formulation) constructs a global ranking from per-arm confidence intervals alone. Because many GVGAI and Ludii games share rulesets or agent heuristics, performance vectors are plausibly correlated; the manuscript supplies neither a covariance model nor a sensitivity analysis showing that the aggregate simple-regret objective remains approximately additive under such dependence. This assumption is load-bearing for the claimed reduction in average simple regret.
[Experimental Evaluation] The abstract and experimental section assert substantial improvement on GVGAI and Ludii yet report neither the number of independent trials, the statistical test used, the construction of the confidence intervals, nor any data-exclusion rules. Without these details the quantitative claims cannot be verified and the comparison to prior BAI algorithms remains inconclusive.

minor comments (2)

[Method] The definition of overall simple regret (sum or average across bandits) should be stated explicitly with an equation number so that readers can confirm how the per-arm influence term is derived.
[Related Work] A short paragraph contrasting the new ranking rule with the classic successive elimination or LUCB procedures would help readers see precisely where the optimism is introduced.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments on our manuscript. We address each major point below with clarifications and indicate where revisions will be made to strengthen the work.

read point-by-point responses

Referee: [Proposed Approach] The optimistic selection process (described after the multi-bandit formulation) constructs a global ranking from per-arm confidence intervals alone. Because many GVGAI and Ludii games share rulesets or agent heuristics, performance vectors are plausibly correlated; the manuscript supplies neither a covariance model nor a sensitivity analysis showing that the aggregate simple-regret objective remains approximately additive under such dependence. This assumption is load-bearing for the claimed reduction in average simple regret.

Authors: We acknowledge that shared rulesets and heuristics in GVGAI and Ludii can induce correlations across games. Our approach explicitly models each game as an independent bandit and derives the global optimistic ranking from per-arm confidence intervals under this standard independence assumption, which is common when the objective is per-game agent selection rather than joint modeling. The empirical gains are measured directly on the real benchmarks, which embed any existing correlations. We will add a dedicated paragraph in the discussion section noting this modeling choice, its implications for aggregate regret, and outlining correlated-bandit extensions as future work. revision: partial
Referee: [Experimental Evaluation] The abstract and experimental section assert substantial improvement on GVGAI and Ludii yet report neither the number of independent trials, the statistical test used, the construction of the confidence intervals, nor any data-exclusion rules. Without these details the quantitative claims cannot be verified and the comparison to prior BAI algorithms remains inconclusive.

Authors: We appreciate this request for reproducibility details. In the revised experimental section we now state: results are averaged over 30 independent runs per method and domain; statistical significance between methods is assessed via paired t-tests (p < 0.05); the confidence intervals employed by the optimistic selector are Hoeffding bounds (Section 3.2); and no trials or data points were excluded. These additions, together with the already-reported average simple regret and error probability, enable direct verification of the comparisons. revision: yes

Circularity Check

0 steps flagged

No circularity: new optimistic ranking rule for multi-bandit best-arm identification is an independent algorithmic proposal

full rationale

The paper defines a novel optimistic selection procedure that ranks arms across independent bandits (games) using per-arm confidence intervals to target aggregate simple regret. This construction is presented as an original algorithm and is evaluated empirically against prior best-arm identification methods on the GVGAI and Ludii frameworks. No equation or claim reduces by construction to a fitted parameter, self-referential definition, or load-bearing self-citation; the reported regret and error-probability improvements are obtained from external benchmark comparisons rather than from the inputs themselves. The independence assumption across games is a modeling choice whose validity is separate from circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; the method appears to rest on standard multi-armed bandit assumptions plus the novel ranking rule. No explicit free parameters, axioms, or invented entities are named.

pith-pipeline@v0.9.0 · 5737 in / 1243 out tokens · 33918 ms · 2026-05-19T07:08:28.378186+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We propose an optimistic selection process based on a chosen confidence interval, that ranks each arm across all bandits in terms of their potential to influence our overall simple regret.
IndisputableMonolith/Foundation/ArithmeticFromLogic.lean embed_strictMono_of_one_lt unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Compute (w−mk,w+mk)=Wilson(μ̂mk(t−1),Tmk(t−1),α) ... Draw I(t)∈argmaxm,kΔmk

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

46 extracted references · 46 canonical work pages

[1]

Alan Agresti and Brent A. Coull. Approximate is better than ``exact'' for interval estimation of binomial proportions. The American Statistician, 52 0 (2): 0 119--126, 1998. ISSN 00031305. URL http://www.jstor.org/stable/2685469

work page arXiv 1998
[2]

Atari-5: distilling the arcade learning environment down to five games

Matthew Aitchison, Penny Sweetser, and Marcus Hutter. Atari-5: distilling the arcade learning environment down to five games. In Proceedings of the 40th International Conference on Machine Learning, ICML'23, 2023

work page 2023
[3]

Ensemble decision systems for general video game playing

Damien Anderson, Philip Rodgers, John Levine, Cristina Guerrero-Romero, and Diego Perez-Liebana. Ensemble decision systems for general video game playing. In 2019 IEEE Conference on Games (CoG), pages 1--8, 2019. doi:10.1109/CIG.2019.8848089

work page doi:10.1109/cig.2019.8848089 2019
[4]

General video game playing escapes the no free lunch theorem

Daniel Ashlock, Diego Perez-Liebana, and Amanda Saunders. General video game playing escapes the no free lunch theorem. In 2017 IEEE Conference on Computational Intelligence and Games (CIG), page 17–24. IEEE Press, 2017. doi:10.1109/CIG.2017.8080410. URL https://doi.org/10.1109/CIG.2017.8080410

work page doi:10.1109/cig.2017.8080410 2017
[5]

Best arm identification in multi-armed bandits

Jean-Yves Audibert, Sébastien Bubeck, and Remi Munos. Best arm identification in multi-armed bandits. In The 23rd Conference on Learning Theory, pages 41--53, 11 2010

work page 2010
[6]

Using confidence bounds for exploitation-exploration trade-offs

Peter Auer. Using confidence bounds for exploitation-exploration trade-offs. Journal of Machine Learning Research, 3: 0 397–422, 2003. ISSN 1532-4435

work page 2003
[7]

Finite-time Analysis of the Multiarmed Bandit Problem,

Peter Auer, Nicolò Cesa-Bianchi, and Paul Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47: 0 235--256, 2002. doi:10.1023/A:1013689704352

work page doi:10.1023/a:1013689704352 2002
[8]

General game learning using knowledge transfer

Bikramjit Banerjee and Peter Stone. General game learning using knowledge transfer. In Proceedings of the 20th International Joint Conference on Artifical Intelligence, IJCAI'07, page 672–677, San Francisco, CA, USA, 2007. Morgan Kaufmann Publishers Inc

work page 2007
[9]

Matching games and algorithms for general video game playing

Philip Bontrager, Ahmed Khalifa, Andre Mendes, and Julian Togelius. Matching games and algorithms for general video game playing. Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, 12 0 (1): 0 122--128, Jun. 2021. doi:10.1609/aiide.v12i1.12884. URL https://ojs.aaai.org/index.php/AIIDE/article/view/12884

work page doi:10.1609/aiide.v12i1.12884 2021
[10]

Multiple identifications in multi-armed bandits

Sébastien Bubeck, Tengyao Wang, and Nitin Viswanathan. Multiple identifications in multi-armed bandits. In Sanjoy Dasgupta and David McAllester, editors, Proceedings of the 30th International Conference on Machine Learning, volume 28 of Proceedings of Machine Learning Research, pages 258--265, Atlanta, Georgia, USA, 17--19 Jun 2013. PMLR. URL https://proc...

work page 2013
[11]

Roijers, and Sander C.J

Maarten de Waard, Diederik M. Roijers, and Sander C.J. Bakkes. Monte carlo tree search with options for general video game playing. In 2016 IEEE Conference on Computational Intelligence and Games (CIG), pages 1--8, 2016. doi:10.1109/CIG.2016.7860383

work page doi:10.1109/cig.2016.7860383 2016
[12]

Active learning for personalizing treatment

Kun Deng, Joelle Pineau, and Susan Murphy. Active learning for personalizing treatment. In 2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), pages 32--39, 2011. doi:10.1109/ADPRL.2011.5967348

work page doi:10.1109/adprl.2011.5967348 2011
[13]

Deep thought (hpc), 2021

Flinders University . Deep thought (hpc), 2021. URL https://deepthoughtdocs.flinders.edu.au/en/latest/

work page 2021
[14]

Andersen, Sebastian Risi, and Julian Togelius

Frederik Frydenberg, Kasper R. Andersen, Sebastian Risi, and Julian Togelius. Investigating mcts modifications in general video game playing. In 2015 IEEE Conference on Computational Intelligence and Games (CIG), pages 107--113, 2015. doi:10.1109/CIG.2015.7317937

work page doi:10.1109/cig.2015.7317937 2015
[15]

Multi-bandit best arm identification

Victor Gabillon, Mohammad Ghavamzadeh, Alessandro Lazaric, and S\' e bastien Bubeck. Multi-bandit best arm identification. In J. Shawe-Taylor, R. Zemel, P. Bartlett, F. Pereira, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems, volume 24. Curran Associates, Inc., 2011. URL https://proceedings.neurips.cc/paper_files/paper/201...

work page 2011
[16]

Gaina, Simon M

Raluca D. Gaina, Simon M. Lucas, and Diego Perez-Liebana. Rolling horizon evolution enhancements in general video game playing. In 2017 IEEE Conference on Computational Intelligence and Games (CIG), pages 88--95, 2017. doi:10.1109/CIG.2017.8080420

work page doi:10.1109/cig.2017.8080420 2017
[17]

General game playing: Overview of the aaai competition

Michael Genesereth, Nathaniel Love, and Barney Pell. General game playing: Overview of the aaai competition. AI Magazine, 26 0 (2): 0 62, Jun. 2005. doi:10.1609/aimag.v26i2.1813. URL https://ojs.aaai.org/aimagazine/index.php/aimagazine/article/view/1813

work page doi:10.1609/aimag.v26i2.1813 2005
[18]

Deep reinforcement learning for general game playing

Adrian Goldwaser and Michael Thielscher. Deep reinforcement learning for general game playing. Proceedings of the AAAI Conference on Artificial Intelligence, 34 0 (02): 0 1701--1708, Apr. 2020. doi:10.1609/aaai.v34i02.5533. URL https://ojs.aaai.org/index.php/AAAI/article/view/5533

work page doi:10.1609/aaai.v34i02.5533 2020
[19]

Best-arm identification in correlated multi-armed bandits

Samarth Gupta, Gauri Joshi, and Osman Yağan. Best-arm identification in correlated multi-armed bandits. IEEE Journal on Selected Areas in Information Theory, 2: 0 549--563, 2021. URL https://api.semanticscholar.org/CorpusID:235476607

work page 2021
[20]

Hoffman, Bobak Shahriari, and Nando de Freitas

Matthew D. Hoffman, Bobak Shahriari, and Nando de Freitas. On correlation and budget constraints in model-based bandit optimization with application to automatic machine learning. In International Conference on Artificial Intelligence and Statistics, 2014. URL https://api.semanticscholar.org/CorpusID:2482948

work page 2014
[21]

Best-arm identification algorithms for multi-armed bandits in the fixed confidence setting

Kevin Jamieson and Robert Nowak. Best-arm identification algorithms for multi-armed bandits in the fixed confidence setting. In 2014 48th Annual Conference on Information Sciences and Systems (CISS), pages 1--6, 2014. doi:10.1109/CISS.2014.6814096

work page doi:10.1109/ciss.2014.6814096 2014
[22]

Almost optimal exploration in multi-armed bandits

Zohar Karnin, Tomer Koren, and Oren Somekh. Almost optimal exploration in multi-armed bandits. In Sanjoy Dasgupta and David McAllester, editors, Proceedings of the 30th International Conference on Machine Learning, volume 28(3) of Proceedings of Machine Learning Research, pages 1238--1246, Atlanta, Georgia, USA, 17--19 Jun 2013. PMLR. URL https://proceedi...

work page 2013
[23]

Abbas Kazerouni and Lawrence M. Wein. Best arm identification in generalized linear bandits. Operations Research Letters, 49 0 (3): 0 365--371, 2021. ISSN 0167-6377. doi:https://doi.org/10.1016/j.orl.2021.03.011. URL https://www.sciencedirect.com/science/article/pii/S0167637721000523

work page doi:10.1016/j.orl.2021.03.011 2021
[24]

Bandit based monte-carlo planning

Levente Kocsis and Csaba Szepesv \'a ri. Bandit based monte-carlo planning. In Johannes F \"u rnkranz, Tobias Scheffer, and Myra Spiliopoulou, editors, Machine Learning: ECML 2006, pages 282--293, Berlin, Heidelberg, 2006. Springer Berlin Heidelberg. ISBN 978-3-540-46056-5

work page 2006
[25]

Hyper-heuristic general video game playing

Andre Mendes, Julian Togelius, and Andy Nealen. Hyper-heuristic general video game playing. In 2016 IEEE Conference on Computational Intelligence and Games (CIG), pages 1--8, 2016. doi:10.1109/CIG.2016.7860398

work page doi:10.1109/cig.2016.7860398 2016
[26]

Karthik Periyapattana Narayana Prasad, Srinivas Reddy Kota, and Vincent Y. F. Tan. Best restless markov arm identification. In 2022 IEEE Information Theory Workshop (ITW), pages 648--653, 2022. doi:10.1109/ITW54588.2022.9965908

work page doi:10.1109/itw54588.2022.9965908 2022
[27]

General video game ai: Competition, challenges and opportunities

Diego Perez-Liebana, Spyridon Samothrakis, Julian Togelius, Tom Schaul, and Simon Lucas. General video game ai: Competition, challenges and opportunities. Proceedings of the AAAI Conference on Artificial Intelligence, 30 0 (1), Mar. 2016. doi:10.1609/aaai.v30i1.9869. URL https://ojs.aaai.org/index.php/AAAI/article/view/9869

work page doi:10.1609/aaai.v30i1.9869 2016
[28]

Lucas, Raluca D

Diego Perez-Liebana, Simon M. Lucas, Raluca D. Gaina, Julian Togelius, Ahmed Khalifa, and Jialin Liu. General Video Game Artificial Intelligence. Morgan & Claypool Publishers, 2019. https://gaigresearch.github.io/gvgaibook/

work page 2019
[29]

Soemers, Matthew Stephenson, Chiara F

\'E ric Piette, Dennis J.N.J. Soemers, Matthew Stephenson, Chiara F. Sironi, Mark H.M. Winands, and Cameron Browne. Ludii - the ludemic general game system. In Giuseppe De Giacomo , Alejandro Catala, Bistra Dilkina, Michela Milano, Sen \'e n Barro, Alberto Bugarin, and J \'e r \^o me Lang, editors, ECAI 2020, Frontiers in Artificial Intelligence and Appli...

work page doi:10.3233/faia200120 2020
[30]

\'E ric Piette, Dennis J. N. J. Soemers, Matthew Stephenson, and Cameron Browne. The 2022 ludii ai competition. ICGA Journal, 45 0 (1): 0 16--27, November 2023. ISSN 1389-6911. doi:10.3233/ICG-230230

work page doi:10.3233/icg-230230 2022
[31]

Soemers, and Cameron Browne

Éric Piette, Matthew Stephenson, Dennis J.N.J. Soemers, and Cameron Browne. General board game concepts. In 2021 IEEE Conference on Games (CoG), pages 932--939, 2021. doi:10.1109/CoG52621.2021.9618990

work page doi:10.1109/cog52621.2021.9618990 2021
[32]

Diego Pérez-Liébana, Spyridon Samothrakis, Julian Togelius, Tom Schaul, and Simon M. Lucas. Analyzing the robustness of general video game playing agents. In 2016 IEEE Conference on Computational Intelligence and Games (CIG), pages 1--8, 2016. doi:10.1109/CIG.2016.7860430

work page doi:10.1109/cig.2016.7860430 2016
[33]

Sagers, M

D. Sagers, M. H. M. Winands, and D. J. N. J. Soemers. Anytime sequential halving in monte-carlo tree search. In M. Hartisch, C.-H. Hsueh, and J. Schaeffer, editors, Computers and Games 2024, volume 15550 of Lecture Notes in Computer Science, pages 91--102. Springer, Cham, 2025

work page 2024
[34]

Overlapping multi-bandit best arm identification

Jonathan Scarlett, Ilija Bogunovic, and Volkan Cevher. Overlapping multi-bandit best arm identification. In 2019 IEEE International Symposium on Information Theory (ISIT), pages 2544--2548, 2019. doi:10.1109/ISIT.2019.8849327

work page doi:10.1109/isit.2019.8849327 2019
[35]

From theories to queries: Active learning in practice

Burr Settles. From theories to queries: Active learning in practice. In Isabelle Guyon, Gavin Cawley, Gideon Dror, Vincent Lemaire, and Alexander Statnikov, editors, Active Learning and Experimental Design workshop In conjunction with AISTATS 2010, volume 16 of Proceedings of Machine Learning Research, pages 1--18, Sardinia, Italy, 16 May 2011. PMLR. URL ...

work page 2010
[36]

Dennis J. N. J. Soemers, Chiara F. Sironi, Torsten Schuster, and Mark H. M. Winands. Enhancements for real-time M onte- C arlo tree search in general video game playing. In 2016 IEEE Conference on Computational Intelligence and Games (CIG), pages 436--443, 2016. doi:10.1109/CIG.2016.7860448

work page doi:10.1109/cig.2016.7860448 2016
[37]

Dennis J. N. J. Soemers, Vegard Mella, \'E ric Piette, Matthew Stephenson, Cameron Browne, and Olivier Teytaud. Towards a general transfer approach for policy-value networks. Transactions on Machine Learning Research, December 2023. ISSN 2835-8856

work page 2023
[38]

Creating a hyper-agent for solving angry birds levels

Matthew Stephenson and Jochen Renz. Creating a hyper-agent for solving angry birds levels. In AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, AIIDE'17, pages 234--240, October 2017. URL https://aaai.org/ocs/index.php/AIIDE/AIIDE17/paper/view/15828

work page 2017
[39]

Soemers, and Cameron Browne

Matthew Stephenson, Éric Piette, Dennis J.N.J. Soemers, and Cameron Browne. An overview of the L udii general game system. In 2019 IEEE Conference on Games (CoG), pages 864--865, 2019. doi:10.1109/CIG.2019.8847949

work page doi:10.1109/cig.2019.8847949 2019
[40]

A continuous information gain measure to find the most discriminatory problems for ai benchmarking

Matthew Stephenson, Damien Anderson, Ahmed Khalifa, John Levine, Jochen Renz, Julian Togelius, and Christoph Salge. A continuous information gain measure to find the most discriminatory problems for ai benchmarking. In 2020 IEEE Congress on Evolutionary Computation (CEC), page 1–8. IEEE Press, 2020. doi:10.1109/CEC48606.2020.9185834. URL https://doi.org/1...

work page doi:10.1109/cec48606.2020.9185834 2020
[41]

Matthew Stephenson, Dennis J. N. J. Soemers, Éric Piette, and Cameron Browne. General game heuristic prediction based on ludeme descriptions. In 2021 IEEE Conference on Games (CoG), pages 878--881, 2021. doi:10.1109/CoG52621.2021.9619052

work page doi:10.1109/cog52621.2021.9619052 2021
[42]

Matthew Stephenson, Dennis J. N. J. Soemers, \'E ric Piette, and Cameron Browne. Measuring board game distance. In Cameron Browne, Akihiro Kishimoto, and Jonathan Schaeffer, editors, Computers and Games, pages 121--130, Cham, 2023. Springer Nature Switzerland. ISBN 978-3-031-34017-8

work page 2023
[43]

Deep reinforcement learning for general video game ai

Ruben Rodriguez Torrado, Philip Bontrager, Julian Togelius, Jialin Liu, and Diego P \'e rez-Li \'e bana. Deep reinforcement learning for general video game ai. 2018 IEEE Conference on Computational Intelligence and Games (CIG), pages 1--8, 2018. URL https://api.semanticscholar.org/CorpusID:46976902

work page 2018
[44]

Binomial confidence intervals and contingency tests: Mathematical fundamentals and the evaluation of alternative methods

Sean Wallis. Binomial confidence intervals and contingency tests: Mathematical fundamentals and the evaluation of alternative methods. Journal of Quantitative Linguistics, 20 0 (3): 0 178--208, 2013. doi:10.1080/09296174.2013.799918. URL https://doi.org/10.1080/09296174.2013.799918

work page doi:10.1080/09296174.2013.799918 2013
[45]

Bandit-based planning and learning in continuous-action markov decision processes

Ari Weinstein and Michael Littman. Bandit-based planning and learning in continuous-action markov decision processes. Proceedings of the International Conference on Automated Planning and Scheduling, 22 0 (1): 0 306--314, May 2012. doi:10.1609/icaps.v22i1.13507. URL https://ojs.aaai.org/index.php/ICAPS/article/view/13507

work page doi:10.1609/icaps.v22i1.13507 2012
[46]

Edwin B. Wilson. Probable inference, the law of succession, and statistical inference. Journal of the American Statistical Association, 22 0 (158): 0 209--212, 1927. ISSN 01621459. URL http://www.jstor.org/stable/2276774

work page arXiv 1927

[1] [1]

Alan Agresti and Brent A. Coull. Approximate is better than ``exact'' for interval estimation of binomial proportions. The American Statistician, 52 0 (2): 0 119--126, 1998. ISSN 00031305. URL http://www.jstor.org/stable/2685469

work page arXiv 1998

[2] [2]

Atari-5: distilling the arcade learning environment down to five games

Matthew Aitchison, Penny Sweetser, and Marcus Hutter. Atari-5: distilling the arcade learning environment down to five games. In Proceedings of the 40th International Conference on Machine Learning, ICML'23, 2023

work page 2023

[3] [3]

Ensemble decision systems for general video game playing

Damien Anderson, Philip Rodgers, John Levine, Cristina Guerrero-Romero, and Diego Perez-Liebana. Ensemble decision systems for general video game playing. In 2019 IEEE Conference on Games (CoG), pages 1--8, 2019. doi:10.1109/CIG.2019.8848089

work page doi:10.1109/cig.2019.8848089 2019

[4] [4]

General video game playing escapes the no free lunch theorem

Daniel Ashlock, Diego Perez-Liebana, and Amanda Saunders. General video game playing escapes the no free lunch theorem. In 2017 IEEE Conference on Computational Intelligence and Games (CIG), page 17–24. IEEE Press, 2017. doi:10.1109/CIG.2017.8080410. URL https://doi.org/10.1109/CIG.2017.8080410

work page doi:10.1109/cig.2017.8080410 2017

[5] [5]

Best arm identification in multi-armed bandits

Jean-Yves Audibert, Sébastien Bubeck, and Remi Munos. Best arm identification in multi-armed bandits. In The 23rd Conference on Learning Theory, pages 41--53, 11 2010

work page 2010

[6] [6]

Using confidence bounds for exploitation-exploration trade-offs

Peter Auer. Using confidence bounds for exploitation-exploration trade-offs. Journal of Machine Learning Research, 3: 0 397–422, 2003. ISSN 1532-4435

work page 2003

[7] [7]

Finite-time Analysis of the Multiarmed Bandit Problem,

Peter Auer, Nicolò Cesa-Bianchi, and Paul Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47: 0 235--256, 2002. doi:10.1023/A:1013689704352

work page doi:10.1023/a:1013689704352 2002

[8] [8]

General game learning using knowledge transfer

Bikramjit Banerjee and Peter Stone. General game learning using knowledge transfer. In Proceedings of the 20th International Joint Conference on Artifical Intelligence, IJCAI'07, page 672–677, San Francisco, CA, USA, 2007. Morgan Kaufmann Publishers Inc

work page 2007

[9] [9]

Matching games and algorithms for general video game playing

Philip Bontrager, Ahmed Khalifa, Andre Mendes, and Julian Togelius. Matching games and algorithms for general video game playing. Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, 12 0 (1): 0 122--128, Jun. 2021. doi:10.1609/aiide.v12i1.12884. URL https://ojs.aaai.org/index.php/AIIDE/article/view/12884

work page doi:10.1609/aiide.v12i1.12884 2021

[10] [10]

Multiple identifications in multi-armed bandits

Sébastien Bubeck, Tengyao Wang, and Nitin Viswanathan. Multiple identifications in multi-armed bandits. In Sanjoy Dasgupta and David McAllester, editors, Proceedings of the 30th International Conference on Machine Learning, volume 28 of Proceedings of Machine Learning Research, pages 258--265, Atlanta, Georgia, USA, 17--19 Jun 2013. PMLR. URL https://proc...

work page 2013

[11] [11]

Roijers, and Sander C.J

Maarten de Waard, Diederik M. Roijers, and Sander C.J. Bakkes. Monte carlo tree search with options for general video game playing. In 2016 IEEE Conference on Computational Intelligence and Games (CIG), pages 1--8, 2016. doi:10.1109/CIG.2016.7860383

work page doi:10.1109/cig.2016.7860383 2016

[12] [12]

Active learning for personalizing treatment

Kun Deng, Joelle Pineau, and Susan Murphy. Active learning for personalizing treatment. In 2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), pages 32--39, 2011. doi:10.1109/ADPRL.2011.5967348

work page doi:10.1109/adprl.2011.5967348 2011

[13] [13]

Deep thought (hpc), 2021

Flinders University . Deep thought (hpc), 2021. URL https://deepthoughtdocs.flinders.edu.au/en/latest/

work page 2021

[14] [14]

Andersen, Sebastian Risi, and Julian Togelius

Frederik Frydenberg, Kasper R. Andersen, Sebastian Risi, and Julian Togelius. Investigating mcts modifications in general video game playing. In 2015 IEEE Conference on Computational Intelligence and Games (CIG), pages 107--113, 2015. doi:10.1109/CIG.2015.7317937

work page doi:10.1109/cig.2015.7317937 2015

[15] [15]

Multi-bandit best arm identification

Victor Gabillon, Mohammad Ghavamzadeh, Alessandro Lazaric, and S\' e bastien Bubeck. Multi-bandit best arm identification. In J. Shawe-Taylor, R. Zemel, P. Bartlett, F. Pereira, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems, volume 24. Curran Associates, Inc., 2011. URL https://proceedings.neurips.cc/paper_files/paper/201...

work page 2011

[16] [16]

Gaina, Simon M

Raluca D. Gaina, Simon M. Lucas, and Diego Perez-Liebana. Rolling horizon evolution enhancements in general video game playing. In 2017 IEEE Conference on Computational Intelligence and Games (CIG), pages 88--95, 2017. doi:10.1109/CIG.2017.8080420

work page doi:10.1109/cig.2017.8080420 2017

[17] [17]

General game playing: Overview of the aaai competition

Michael Genesereth, Nathaniel Love, and Barney Pell. General game playing: Overview of the aaai competition. AI Magazine, 26 0 (2): 0 62, Jun. 2005. doi:10.1609/aimag.v26i2.1813. URL https://ojs.aaai.org/aimagazine/index.php/aimagazine/article/view/1813

work page doi:10.1609/aimag.v26i2.1813 2005

[18] [18]

Deep reinforcement learning for general game playing

Adrian Goldwaser and Michael Thielscher. Deep reinforcement learning for general game playing. Proceedings of the AAAI Conference on Artificial Intelligence, 34 0 (02): 0 1701--1708, Apr. 2020. doi:10.1609/aaai.v34i02.5533. URL https://ojs.aaai.org/index.php/AAAI/article/view/5533

work page doi:10.1609/aaai.v34i02.5533 2020

[19] [19]

Best-arm identification in correlated multi-armed bandits

Samarth Gupta, Gauri Joshi, and Osman Yağan. Best-arm identification in correlated multi-armed bandits. IEEE Journal on Selected Areas in Information Theory, 2: 0 549--563, 2021. URL https://api.semanticscholar.org/CorpusID:235476607

work page 2021

[20] [20]

Hoffman, Bobak Shahriari, and Nando de Freitas

Matthew D. Hoffman, Bobak Shahriari, and Nando de Freitas. On correlation and budget constraints in model-based bandit optimization with application to automatic machine learning. In International Conference on Artificial Intelligence and Statistics, 2014. URL https://api.semanticscholar.org/CorpusID:2482948

work page 2014

[21] [21]

Best-arm identification algorithms for multi-armed bandits in the fixed confidence setting

Kevin Jamieson and Robert Nowak. Best-arm identification algorithms for multi-armed bandits in the fixed confidence setting. In 2014 48th Annual Conference on Information Sciences and Systems (CISS), pages 1--6, 2014. doi:10.1109/CISS.2014.6814096

work page doi:10.1109/ciss.2014.6814096 2014

[22] [22]

Almost optimal exploration in multi-armed bandits

Zohar Karnin, Tomer Koren, and Oren Somekh. Almost optimal exploration in multi-armed bandits. In Sanjoy Dasgupta and David McAllester, editors, Proceedings of the 30th International Conference on Machine Learning, volume 28(3) of Proceedings of Machine Learning Research, pages 1238--1246, Atlanta, Georgia, USA, 17--19 Jun 2013. PMLR. URL https://proceedi...

work page 2013

[23] [23]

Abbas Kazerouni and Lawrence M. Wein. Best arm identification in generalized linear bandits. Operations Research Letters, 49 0 (3): 0 365--371, 2021. ISSN 0167-6377. doi:https://doi.org/10.1016/j.orl.2021.03.011. URL https://www.sciencedirect.com/science/article/pii/S0167637721000523

work page doi:10.1016/j.orl.2021.03.011 2021

[24] [24]

Bandit based monte-carlo planning

Levente Kocsis and Csaba Szepesv \'a ri. Bandit based monte-carlo planning. In Johannes F \"u rnkranz, Tobias Scheffer, and Myra Spiliopoulou, editors, Machine Learning: ECML 2006, pages 282--293, Berlin, Heidelberg, 2006. Springer Berlin Heidelberg. ISBN 978-3-540-46056-5

work page 2006

[25] [25]

Hyper-heuristic general video game playing

Andre Mendes, Julian Togelius, and Andy Nealen. Hyper-heuristic general video game playing. In 2016 IEEE Conference on Computational Intelligence and Games (CIG), pages 1--8, 2016. doi:10.1109/CIG.2016.7860398

work page doi:10.1109/cig.2016.7860398 2016

[26] [26]

Karthik Periyapattana Narayana Prasad, Srinivas Reddy Kota, and Vincent Y. F. Tan. Best restless markov arm identification. In 2022 IEEE Information Theory Workshop (ITW), pages 648--653, 2022. doi:10.1109/ITW54588.2022.9965908

work page doi:10.1109/itw54588.2022.9965908 2022

[27] [27]

General video game ai: Competition, challenges and opportunities

Diego Perez-Liebana, Spyridon Samothrakis, Julian Togelius, Tom Schaul, and Simon Lucas. General video game ai: Competition, challenges and opportunities. Proceedings of the AAAI Conference on Artificial Intelligence, 30 0 (1), Mar. 2016. doi:10.1609/aaai.v30i1.9869. URL https://ojs.aaai.org/index.php/AAAI/article/view/9869

work page doi:10.1609/aaai.v30i1.9869 2016

[28] [28]

Lucas, Raluca D

Diego Perez-Liebana, Simon M. Lucas, Raluca D. Gaina, Julian Togelius, Ahmed Khalifa, and Jialin Liu. General Video Game Artificial Intelligence. Morgan & Claypool Publishers, 2019. https://gaigresearch.github.io/gvgaibook/

work page 2019

[29] [29]

Soemers, Matthew Stephenson, Chiara F

\'E ric Piette, Dennis J.N.J. Soemers, Matthew Stephenson, Chiara F. Sironi, Mark H.M. Winands, and Cameron Browne. Ludii - the ludemic general game system. In Giuseppe De Giacomo , Alejandro Catala, Bistra Dilkina, Michela Milano, Sen \'e n Barro, Alberto Bugarin, and J \'e r \^o me Lang, editors, ECAI 2020, Frontiers in Artificial Intelligence and Appli...

work page doi:10.3233/faia200120 2020

[30] [30]

\'E ric Piette, Dennis J. N. J. Soemers, Matthew Stephenson, and Cameron Browne. The 2022 ludii ai competition. ICGA Journal, 45 0 (1): 0 16--27, November 2023. ISSN 1389-6911. doi:10.3233/ICG-230230

work page doi:10.3233/icg-230230 2022

[31] [31]

Soemers, and Cameron Browne

Éric Piette, Matthew Stephenson, Dennis J.N.J. Soemers, and Cameron Browne. General board game concepts. In 2021 IEEE Conference on Games (CoG), pages 932--939, 2021. doi:10.1109/CoG52621.2021.9618990

work page doi:10.1109/cog52621.2021.9618990 2021

[32] [32]

Diego Pérez-Liébana, Spyridon Samothrakis, Julian Togelius, Tom Schaul, and Simon M. Lucas. Analyzing the robustness of general video game playing agents. In 2016 IEEE Conference on Computational Intelligence and Games (CIG), pages 1--8, 2016. doi:10.1109/CIG.2016.7860430

work page doi:10.1109/cig.2016.7860430 2016

[33] [33]

Sagers, M

D. Sagers, M. H. M. Winands, and D. J. N. J. Soemers. Anytime sequential halving in monte-carlo tree search. In M. Hartisch, C.-H. Hsueh, and J. Schaeffer, editors, Computers and Games 2024, volume 15550 of Lecture Notes in Computer Science, pages 91--102. Springer, Cham, 2025

work page 2024

[34] [34]

Overlapping multi-bandit best arm identification

Jonathan Scarlett, Ilija Bogunovic, and Volkan Cevher. Overlapping multi-bandit best arm identification. In 2019 IEEE International Symposium on Information Theory (ISIT), pages 2544--2548, 2019. doi:10.1109/ISIT.2019.8849327

work page doi:10.1109/isit.2019.8849327 2019

[35] [35]

From theories to queries: Active learning in practice

Burr Settles. From theories to queries: Active learning in practice. In Isabelle Guyon, Gavin Cawley, Gideon Dror, Vincent Lemaire, and Alexander Statnikov, editors, Active Learning and Experimental Design workshop In conjunction with AISTATS 2010, volume 16 of Proceedings of Machine Learning Research, pages 1--18, Sardinia, Italy, 16 May 2011. PMLR. URL ...

work page 2010

[36] [36]

Dennis J. N. J. Soemers, Chiara F. Sironi, Torsten Schuster, and Mark H. M. Winands. Enhancements for real-time M onte- C arlo tree search in general video game playing. In 2016 IEEE Conference on Computational Intelligence and Games (CIG), pages 436--443, 2016. doi:10.1109/CIG.2016.7860448

work page doi:10.1109/cig.2016.7860448 2016

[37] [37]

Dennis J. N. J. Soemers, Vegard Mella, \'E ric Piette, Matthew Stephenson, Cameron Browne, and Olivier Teytaud. Towards a general transfer approach for policy-value networks. Transactions on Machine Learning Research, December 2023. ISSN 2835-8856

work page 2023

[38] [38]

Creating a hyper-agent for solving angry birds levels

Matthew Stephenson and Jochen Renz. Creating a hyper-agent for solving angry birds levels. In AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, AIIDE'17, pages 234--240, October 2017. URL https://aaai.org/ocs/index.php/AIIDE/AIIDE17/paper/view/15828

work page 2017

[39] [39]

Soemers, and Cameron Browne

Matthew Stephenson, Éric Piette, Dennis J.N.J. Soemers, and Cameron Browne. An overview of the L udii general game system. In 2019 IEEE Conference on Games (CoG), pages 864--865, 2019. doi:10.1109/CIG.2019.8847949

work page doi:10.1109/cig.2019.8847949 2019

[40] [40]

A continuous information gain measure to find the most discriminatory problems for ai benchmarking

Matthew Stephenson, Damien Anderson, Ahmed Khalifa, John Levine, Jochen Renz, Julian Togelius, and Christoph Salge. A continuous information gain measure to find the most discriminatory problems for ai benchmarking. In 2020 IEEE Congress on Evolutionary Computation (CEC), page 1–8. IEEE Press, 2020. doi:10.1109/CEC48606.2020.9185834. URL https://doi.org/1...

work page doi:10.1109/cec48606.2020.9185834 2020

[41] [41]

Matthew Stephenson, Dennis J. N. J. Soemers, Éric Piette, and Cameron Browne. General game heuristic prediction based on ludeme descriptions. In 2021 IEEE Conference on Games (CoG), pages 878--881, 2021. doi:10.1109/CoG52621.2021.9619052

work page doi:10.1109/cog52621.2021.9619052 2021

[42] [42]

Matthew Stephenson, Dennis J. N. J. Soemers, \'E ric Piette, and Cameron Browne. Measuring board game distance. In Cameron Browne, Akihiro Kishimoto, and Jonathan Schaeffer, editors, Computers and Games, pages 121--130, Cham, 2023. Springer Nature Switzerland. ISBN 978-3-031-34017-8

work page 2023

[43] [43]

Deep reinforcement learning for general video game ai

Ruben Rodriguez Torrado, Philip Bontrager, Julian Togelius, Jialin Liu, and Diego P \'e rez-Li \'e bana. Deep reinforcement learning for general video game ai. 2018 IEEE Conference on Computational Intelligence and Games (CIG), pages 1--8, 2018. URL https://api.semanticscholar.org/CorpusID:46976902

work page 2018

[44] [44]

Binomial confidence intervals and contingency tests: Mathematical fundamentals and the evaluation of alternative methods

Sean Wallis. Binomial confidence intervals and contingency tests: Mathematical fundamentals and the evaluation of alternative methods. Journal of Quantitative Linguistics, 20 0 (3): 0 178--208, 2013. doi:10.1080/09296174.2013.799918. URL https://doi.org/10.1080/09296174.2013.799918

work page doi:10.1080/09296174.2013.799918 2013

[45] [45]

Bandit-based planning and learning in continuous-action markov decision processes

Ari Weinstein and Michael Littman. Bandit-based planning and learning in continuous-action markov decision processes. Proceedings of the International Conference on Automated Planning and Scheduling, 22 0 (1): 0 306--314, May 2012. doi:10.1609/icaps.v22i1.13507. URL https://ojs.aaai.org/index.php/ICAPS/article/view/13507

work page doi:10.1609/icaps.v22i1.13507 2012

[46] [46]

Edwin B. Wilson. Probable inference, the law of succession, and statistical inference. Journal of the American Statistical Association, 22 0 (158): 0 209--212, 1927. ISSN 01621459. URL http://www.jstor.org/stable/2276774

work page arXiv 1927