pith. sign in

arxiv: 2507.00451 · v2 · submitted 2025-07-01 · 💻 cs.LG · cs.AI· cs.DS· cs.IT· math.IT· stat.ML

Best Agent Identification for General Game Playing

Pith reviewed 2026-05-19 07:08 UTC · model grok-4.3

classification 💻 cs.LG cs.AIcs.DScs.ITmath.ITstat.ML
keywords best arm identificationmulti-armed banditsgeneral game playingsimple regretagent evaluationGVGAILudiimulti-task selection
0
0 comments X

The pith

An optimistic confidence-interval ranking across all games identifies the best agent per task while reducing overall simple regret.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper treats the problem of picking strong agents for many different games as a collection of best-arm identification tasks, one bandit per game. It introduces a global ranking rule that uses per-arm confidence intervals to prioritize selections likely to affect the total simple regret across the whole set of games. This produces lower average simple regret and lower error probability than earlier multi-armed-bandit algorithms when tested on the GVGAI and Ludii frameworks with limited trials. The approach directly improves how reliably we can evaluate and choose algorithms in any domain where each run is expensive.

Core claim

The central claim is that ranking every algorithm across every game by the width and location of its confidence interval, then always sampling the arm whose interval most influences the aggregate simple regret, yields a procedure that finds near-optimal agents faster and more reliably than previous best-arm methods.

What carries the argument

Optimistic selection process that ranks each arm across all bandits according to its potential to influence overall simple regret, using a chosen confidence interval.

If this is right

  • Fewer total agent evaluations are needed to reach a given level of identification accuracy across a suite of games.
  • Agent-selection procedures in GVGAI and Ludii become more accurate under a fixed budget of trials.
  • The same ranking logic applies to any multi-task domain where each algorithm evaluation is costly.
  • Simple regret and probability of error both drop measurably compared with earlier best-arm identification algorithms.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If games share latent structure, explicitly modeling those correlations could further tighten the confidence intervals and reduce regret still more.
  • The method could be combined with transfer-learning techniques that reuse data from one game to shrink intervals on related games.
  • In domains with continuous performance measures rather than win rates, the same optimistic ranking could be applied by replacing the confidence interval with an appropriate uncertainty estimate.

Load-bearing premise

Performance observations on different games are sufficiently independent that one global ranking rule based on per-arm confidence intervals can reliably minimize total simple regret without modeling cross-game correlations.

What would settle it

A controlled experiment on a set of games whose agent performances are known to be strongly correlated across tasks, showing that the method no longer reduces aggregate regret relative to baselines once those correlations are present.

Figures

Figures reproduced from arXiv: 2507.00451 by Alex Newcombe, Dennis Soemers, Eric Piette, Matthew Stephenson.

Figure 1
Figure 1. Figure 1: Average simple regret for each Best Arm Identification algorithm when applied [PITH_FULL_IMAGE:figures/full_fig_p014_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Average simple regret for each Best Arm Identification algorithm when applied to [PITH_FULL_IMAGE:figures/full_fig_p015_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Average simple regret for each Best Arm Identification algorithm when applied [PITH_FULL_IMAGE:figures/full_fig_p016_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Average simple regret for each GapE hyperparameter value when applied to the [PITH_FULL_IMAGE:figures/full_fig_p018_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Average simple regret for each GapE-V hyperparameter value when applied to the [PITH_FULL_IMAGE:figures/full_fig_p018_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Average simple regret for each UCB-E hyperparameter value when applied to the [PITH_FULL_IMAGE:figures/full_fig_p019_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Average simple regret for each Optimistic-WS hyperparameter value when applied [PITH_FULL_IMAGE:figures/full_fig_p019_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Average simple regret for each GapE hyperparameter when applied to the GVGAI [PITH_FULL_IMAGE:figures/full_fig_p020_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Average simple regret for each GapE-V hyperparameter when applied to the [PITH_FULL_IMAGE:figures/full_fig_p020_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: Average simple regret for each UCB-E hyperparameter when applied to the [PITH_FULL_IMAGE:figures/full_fig_p021_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: Average simple regret for each Optimistic-WS hyperparameter when applied to [PITH_FULL_IMAGE:figures/full_fig_p021_11.png] view at source ↗
Figure 12
Figure 12. Figure 12: Average simple regret for each GapE hyperparameter when applied to the Ludii [PITH_FULL_IMAGE:figures/full_fig_p022_12.png] view at source ↗
Figure 13
Figure 13. Figure 13: Average simple regret for each GapE-V hyperparameter when applied to the [PITH_FULL_IMAGE:figures/full_fig_p022_13.png] view at source ↗
Figure 14
Figure 14. Figure 14: Average simple regret for each UCB-E hyperparameter when applied to the Ludii [PITH_FULL_IMAGE:figures/full_fig_p023_14.png] view at source ↗
Figure 15
Figure 15. Figure 15: Average simple regret for each Optimistic-WS hyperparameter when applied to [PITH_FULL_IMAGE:figures/full_fig_p023_15.png] view at source ↗
read the original abstract

We present an efficient and generalised procedure to accurately identify the best (or near best) performing algorithm for each sub-task in a multi-problem domain. Our approach treats this as a set of best arm identification problems for multi-armed bandits, where each bandit corresponds to a specific task and each arm corresponds to a specific algorithm or agent. We propose an optimistic selection process based on a chosen confidence interval, that ranks each arm across all bandits in terms of their potential to influence our overall simple regret. We evaluate the performance of our approach on two of the most popular general game playing domains, the General Video Game AI (GVGAI) framework and the Ludii general game playing system, with the goal of selecting a high-performing agent for each game using a limited number of available trials. Compared to previous best arm identification algorithms for multi-armed bandits, our results demonstrate a substantial performance improvement in terms of average simple regret and average probability of error. This novel approach can be used to significantly improve the quality and accuracy of agent evaluation procedures for general game frameworks, as well as other multi-task domains with high algorithm runtimes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript frames best-agent identification across multiple games as a collection of best-arm identification problems, one bandit per game. It introduces an optimistic ranking procedure that uses per-arm confidence intervals to order trials by their estimated contribution to aggregate simple regret, then evaluates the procedure on GVGAI and Ludii, reporting lower average simple regret and lower error probability than prior BAI algorithms.

Significance. If the reported gains are statistically reliable, the method offers a practical way to allocate limited expensive evaluations across a portfolio of games. The global optimistic ranking is a clean algorithmic idea that could be adopted for agent benchmarking in any multi-task setting with high per-trial cost. The absence of covariance modeling or sensitivity checks, however, leaves open whether the gains survive the positive correlations that are typical when games share mechanics or heuristics.

major comments (2)
  1. [Proposed Approach] The optimistic selection process (described after the multi-bandit formulation) constructs a global ranking from per-arm confidence intervals alone. Because many GVGAI and Ludii games share rulesets or agent heuristics, performance vectors are plausibly correlated; the manuscript supplies neither a covariance model nor a sensitivity analysis showing that the aggregate simple-regret objective remains approximately additive under such dependence. This assumption is load-bearing for the claimed reduction in average simple regret.
  2. [Experimental Evaluation] The abstract and experimental section assert substantial improvement on GVGAI and Ludii yet report neither the number of independent trials, the statistical test used, the construction of the confidence intervals, nor any data-exclusion rules. Without these details the quantitative claims cannot be verified and the comparison to prior BAI algorithms remains inconclusive.
minor comments (2)
  1. [Method] The definition of overall simple regret (sum or average across bandits) should be stated explicitly with an equation number so that readers can confirm how the per-arm influence term is derived.
  2. [Related Work] A short paragraph contrasting the new ranking rule with the classic successive elimination or LUCB procedures would help readers see precisely where the optimism is introduced.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments on our manuscript. We address each major point below with clarifications and indicate where revisions will be made to strengthen the work.

read point-by-point responses
  1. Referee: [Proposed Approach] The optimistic selection process (described after the multi-bandit formulation) constructs a global ranking from per-arm confidence intervals alone. Because many GVGAI and Ludii games share rulesets or agent heuristics, performance vectors are plausibly correlated; the manuscript supplies neither a covariance model nor a sensitivity analysis showing that the aggregate simple-regret objective remains approximately additive under such dependence. This assumption is load-bearing for the claimed reduction in average simple regret.

    Authors: We acknowledge that shared rulesets and heuristics in GVGAI and Ludii can induce correlations across games. Our approach explicitly models each game as an independent bandit and derives the global optimistic ranking from per-arm confidence intervals under this standard independence assumption, which is common when the objective is per-game agent selection rather than joint modeling. The empirical gains are measured directly on the real benchmarks, which embed any existing correlations. We will add a dedicated paragraph in the discussion section noting this modeling choice, its implications for aggregate regret, and outlining correlated-bandit extensions as future work. revision: partial

  2. Referee: [Experimental Evaluation] The abstract and experimental section assert substantial improvement on GVGAI and Ludii yet report neither the number of independent trials, the statistical test used, the construction of the confidence intervals, nor any data-exclusion rules. Without these details the quantitative claims cannot be verified and the comparison to prior BAI algorithms remains inconclusive.

    Authors: We appreciate this request for reproducibility details. In the revised experimental section we now state: results are averaged over 30 independent runs per method and domain; statistical significance between methods is assessed via paired t-tests (p < 0.05); the confidence intervals employed by the optimistic selector are Hoeffding bounds (Section 3.2); and no trials or data points were excluded. These additions, together with the already-reported average simple regret and error probability, enable direct verification of the comparisons. revision: yes

Circularity Check

0 steps flagged

No circularity: new optimistic ranking rule for multi-bandit best-arm identification is an independent algorithmic proposal

full rationale

The paper defines a novel optimistic selection procedure that ranks arms across independent bandits (games) using per-arm confidence intervals to target aggregate simple regret. This construction is presented as an original algorithm and is evaluated empirically against prior best-arm identification methods on the GVGAI and Ludii frameworks. No equation or claim reduces by construction to a fitted parameter, self-referential definition, or load-bearing self-citation; the reported regret and error-probability improvements are obtained from external benchmark comparisons rather than from the inputs themselves. The independence assumption across games is a modeling choice whose validity is separate from circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; the method appears to rest on standard multi-armed bandit assumptions plus the novel ranking rule. No explicit free parameters, axioms, or invented entities are named.

pith-pipeline@v0.9.0 · 5737 in / 1243 out tokens · 33918 ms · 2026-05-19T07:08:28.378186+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

46 extracted references · 46 canonical work pages

  1. [1]

    Alan Agresti and Brent A. Coull. Approximate is better than ``exact'' for interval estimation of binomial proportions. The American Statistician, 52 0 (2): 0 119--126, 1998. ISSN 00031305. URL http://www.jstor.org/stable/2685469

  2. [2]

    Atari-5: distilling the arcade learning environment down to five games

    Matthew Aitchison, Penny Sweetser, and Marcus Hutter. Atari-5: distilling the arcade learning environment down to five games. In Proceedings of the 40th International Conference on Machine Learning, ICML'23, 2023

  3. [3]

    Ensemble decision systems for general video game playing

    Damien Anderson, Philip Rodgers, John Levine, Cristina Guerrero-Romero, and Diego Perez-Liebana. Ensemble decision systems for general video game playing. In 2019 IEEE Conference on Games (CoG), pages 1--8, 2019. doi:10.1109/CIG.2019.8848089

  4. [4]

    General video game playing escapes the no free lunch theorem

    Daniel Ashlock, Diego Perez-Liebana, and Amanda Saunders. General video game playing escapes the no free lunch theorem. In 2017 IEEE Conference on Computational Intelligence and Games (CIG), page 17–24. IEEE Press, 2017. doi:10.1109/CIG.2017.8080410. URL https://doi.org/10.1109/CIG.2017.8080410

  5. [5]

    Best arm identification in multi-armed bandits

    Jean-Yves Audibert, Sébastien Bubeck, and Remi Munos. Best arm identification in multi-armed bandits. In The 23rd Conference on Learning Theory, pages 41--53, 11 2010

  6. [6]

    Using confidence bounds for exploitation-exploration trade-offs

    Peter Auer. Using confidence bounds for exploitation-exploration trade-offs. Journal of Machine Learning Research, 3: 0 397–422, 2003. ISSN 1532-4435

  7. [7]

    Finite-time Analysis of the Multiarmed Bandit Problem,

    Peter Auer, Nicolò Cesa-Bianchi, and Paul Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47: 0 235--256, 2002. doi:10.1023/A:1013689704352

  8. [8]

    General game learning using knowledge transfer

    Bikramjit Banerjee and Peter Stone. General game learning using knowledge transfer. In Proceedings of the 20th International Joint Conference on Artifical Intelligence, IJCAI'07, page 672–677, San Francisco, CA, USA, 2007. Morgan Kaufmann Publishers Inc

  9. [9]

    Matching games and algorithms for general video game playing

    Philip Bontrager, Ahmed Khalifa, Andre Mendes, and Julian Togelius. Matching games and algorithms for general video game playing. Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, 12 0 (1): 0 122--128, Jun. 2021. doi:10.1609/aiide.v12i1.12884. URL https://ojs.aaai.org/index.php/AIIDE/article/view/12884

  10. [10]

    Multiple identifications in multi-armed bandits

    Sébastien Bubeck, Tengyao Wang, and Nitin Viswanathan. Multiple identifications in multi-armed bandits. In Sanjoy Dasgupta and David McAllester, editors, Proceedings of the 30th International Conference on Machine Learning, volume 28 of Proceedings of Machine Learning Research, pages 258--265, Atlanta, Georgia, USA, 17--19 Jun 2013. PMLR. URL https://proc...

  11. [11]

    Roijers, and Sander C.J

    Maarten de Waard, Diederik M. Roijers, and Sander C.J. Bakkes. Monte carlo tree search with options for general video game playing. In 2016 IEEE Conference on Computational Intelligence and Games (CIG), pages 1--8, 2016. doi:10.1109/CIG.2016.7860383

  12. [12]

    Active learning for personalizing treatment

    Kun Deng, Joelle Pineau, and Susan Murphy. Active learning for personalizing treatment. In 2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), pages 32--39, 2011. doi:10.1109/ADPRL.2011.5967348

  13. [13]

    Deep thought (hpc), 2021

    Flinders University . Deep thought (hpc), 2021. URL https://deepthoughtdocs.flinders.edu.au/en/latest/

  14. [14]

    Andersen, Sebastian Risi, and Julian Togelius

    Frederik Frydenberg, Kasper R. Andersen, Sebastian Risi, and Julian Togelius. Investigating mcts modifications in general video game playing. In 2015 IEEE Conference on Computational Intelligence and Games (CIG), pages 107--113, 2015. doi:10.1109/CIG.2015.7317937

  15. [15]

    Multi-bandit best arm identification

    Victor Gabillon, Mohammad Ghavamzadeh, Alessandro Lazaric, and S\' e bastien Bubeck. Multi-bandit best arm identification. In J. Shawe-Taylor, R. Zemel, P. Bartlett, F. Pereira, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems, volume 24. Curran Associates, Inc., 2011. URL https://proceedings.neurips.cc/paper_files/paper/201...

  16. [16]

    Gaina, Simon M

    Raluca D. Gaina, Simon M. Lucas, and Diego Perez-Liebana. Rolling horizon evolution enhancements in general video game playing. In 2017 IEEE Conference on Computational Intelligence and Games (CIG), pages 88--95, 2017. doi:10.1109/CIG.2017.8080420

  17. [17]

    General game playing: Overview of the aaai competition

    Michael Genesereth, Nathaniel Love, and Barney Pell. General game playing: Overview of the aaai competition. AI Magazine, 26 0 (2): 0 62, Jun. 2005. doi:10.1609/aimag.v26i2.1813. URL https://ojs.aaai.org/aimagazine/index.php/aimagazine/article/view/1813

  18. [18]

    Deep reinforcement learning for general game playing

    Adrian Goldwaser and Michael Thielscher. Deep reinforcement learning for general game playing. Proceedings of the AAAI Conference on Artificial Intelligence, 34 0 (02): 0 1701--1708, Apr. 2020. doi:10.1609/aaai.v34i02.5533. URL https://ojs.aaai.org/index.php/AAAI/article/view/5533

  19. [19]

    Best-arm identification in correlated multi-armed bandits

    Samarth Gupta, Gauri Joshi, and Osman Yağan. Best-arm identification in correlated multi-armed bandits. IEEE Journal on Selected Areas in Information Theory, 2: 0 549--563, 2021. URL https://api.semanticscholar.org/CorpusID:235476607

  20. [20]

    Hoffman, Bobak Shahriari, and Nando de Freitas

    Matthew D. Hoffman, Bobak Shahriari, and Nando de Freitas. On correlation and budget constraints in model-based bandit optimization with application to automatic machine learning. In International Conference on Artificial Intelligence and Statistics, 2014. URL https://api.semanticscholar.org/CorpusID:2482948

  21. [21]

    Best-arm identification algorithms for multi-armed bandits in the fixed confidence setting

    Kevin Jamieson and Robert Nowak. Best-arm identification algorithms for multi-armed bandits in the fixed confidence setting. In 2014 48th Annual Conference on Information Sciences and Systems (CISS), pages 1--6, 2014. doi:10.1109/CISS.2014.6814096

  22. [22]

    Almost optimal exploration in multi-armed bandits

    Zohar Karnin, Tomer Koren, and Oren Somekh. Almost optimal exploration in multi-armed bandits. In Sanjoy Dasgupta and David McAllester, editors, Proceedings of the 30th International Conference on Machine Learning, volume 28(3) of Proceedings of Machine Learning Research, pages 1238--1246, Atlanta, Georgia, USA, 17--19 Jun 2013. PMLR. URL https://proceedi...

  23. [23]

    Abbas Kazerouni and Lawrence M. Wein. Best arm identification in generalized linear bandits. Operations Research Letters, 49 0 (3): 0 365--371, 2021. ISSN 0167-6377. doi:https://doi.org/10.1016/j.orl.2021.03.011. URL https://www.sciencedirect.com/science/article/pii/S0167637721000523

  24. [24]

    Bandit based monte-carlo planning

    Levente Kocsis and Csaba Szepesv \'a ri. Bandit based monte-carlo planning. In Johannes F \"u rnkranz, Tobias Scheffer, and Myra Spiliopoulou, editors, Machine Learning: ECML 2006, pages 282--293, Berlin, Heidelberg, 2006. Springer Berlin Heidelberg. ISBN 978-3-540-46056-5

  25. [25]

    Hyper-heuristic general video game playing

    Andre Mendes, Julian Togelius, and Andy Nealen. Hyper-heuristic general video game playing. In 2016 IEEE Conference on Computational Intelligence and Games (CIG), pages 1--8, 2016. doi:10.1109/CIG.2016.7860398

  26. [26]

    Karthik Periyapattana Narayana Prasad, Srinivas Reddy Kota, and Vincent Y. F. Tan. Best restless markov arm identification. In 2022 IEEE Information Theory Workshop (ITW), pages 648--653, 2022. doi:10.1109/ITW54588.2022.9965908

  27. [27]

    General video game ai: Competition, challenges and opportunities

    Diego Perez-Liebana, Spyridon Samothrakis, Julian Togelius, Tom Schaul, and Simon Lucas. General video game ai: Competition, challenges and opportunities. Proceedings of the AAAI Conference on Artificial Intelligence, 30 0 (1), Mar. 2016. doi:10.1609/aaai.v30i1.9869. URL https://ojs.aaai.org/index.php/AAAI/article/view/9869

  28. [28]

    Lucas, Raluca D

    Diego Perez-Liebana, Simon M. Lucas, Raluca D. Gaina, Julian Togelius, Ahmed Khalifa, and Jialin Liu. General Video Game Artificial Intelligence. Morgan & Claypool Publishers, 2019. https://gaigresearch.github.io/gvgaibook/

  29. [29]

    Soemers, Matthew Stephenson, Chiara F

    \'E ric Piette, Dennis J.N.J. Soemers, Matthew Stephenson, Chiara F. Sironi, Mark H.M. Winands, and Cameron Browne. Ludii - the ludemic general game system. In Giuseppe De Giacomo , Alejandro Catala, Bistra Dilkina, Michela Milano, Sen \'e n Barro, Alberto Bugarin, and J \'e r \^o me Lang, editors, ECAI 2020, Frontiers in Artificial Intelligence and Appli...

  30. [30]

    \'E ric Piette, Dennis J. N. J. Soemers, Matthew Stephenson, and Cameron Browne. The 2022 ludii ai competition. ICGA Journal, 45 0 (1): 0 16--27, November 2023. ISSN 1389-6911. doi:10.3233/ICG-230230

  31. [31]

    Soemers, and Cameron Browne

    Éric Piette, Matthew Stephenson, Dennis J.N.J. Soemers, and Cameron Browne. General board game concepts. In 2021 IEEE Conference on Games (CoG), pages 932--939, 2021. doi:10.1109/CoG52621.2021.9618990

  32. [32]

    Diego Pérez-Liébana, Spyridon Samothrakis, Julian Togelius, Tom Schaul, and Simon M. Lucas. Analyzing the robustness of general video game playing agents. In 2016 IEEE Conference on Computational Intelligence and Games (CIG), pages 1--8, 2016. doi:10.1109/CIG.2016.7860430

  33. [33]

    Sagers, M

    D. Sagers, M. H. M. Winands, and D. J. N. J. Soemers. Anytime sequential halving in monte-carlo tree search. In M. Hartisch, C.-H. Hsueh, and J. Schaeffer, editors, Computers and Games 2024, volume 15550 of Lecture Notes in Computer Science, pages 91--102. Springer, Cham, 2025

  34. [34]

    Overlapping multi-bandit best arm identification

    Jonathan Scarlett, Ilija Bogunovic, and Volkan Cevher. Overlapping multi-bandit best arm identification. In 2019 IEEE International Symposium on Information Theory (ISIT), pages 2544--2548, 2019. doi:10.1109/ISIT.2019.8849327

  35. [35]

    From theories to queries: Active learning in practice

    Burr Settles. From theories to queries: Active learning in practice. In Isabelle Guyon, Gavin Cawley, Gideon Dror, Vincent Lemaire, and Alexander Statnikov, editors, Active Learning and Experimental Design workshop In conjunction with AISTATS 2010, volume 16 of Proceedings of Machine Learning Research, pages 1--18, Sardinia, Italy, 16 May 2011. PMLR. URL ...

  36. [36]

    Dennis J. N. J. Soemers, Chiara F. Sironi, Torsten Schuster, and Mark H. M. Winands. Enhancements for real-time M onte- C arlo tree search in general video game playing. In 2016 IEEE Conference on Computational Intelligence and Games (CIG), pages 436--443, 2016. doi:10.1109/CIG.2016.7860448

  37. [37]

    Dennis J. N. J. Soemers, Vegard Mella, \'E ric Piette, Matthew Stephenson, Cameron Browne, and Olivier Teytaud. Towards a general transfer approach for policy-value networks. Transactions on Machine Learning Research, December 2023. ISSN 2835-8856

  38. [38]

    Creating a hyper-agent for solving angry birds levels

    Matthew Stephenson and Jochen Renz. Creating a hyper-agent for solving angry birds levels. In AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, AIIDE'17, pages 234--240, October 2017. URL https://aaai.org/ocs/index.php/AIIDE/AIIDE17/paper/view/15828

  39. [39]

    Soemers, and Cameron Browne

    Matthew Stephenson, Éric Piette, Dennis J.N.J. Soemers, and Cameron Browne. An overview of the L udii general game system. In 2019 IEEE Conference on Games (CoG), pages 864--865, 2019. doi:10.1109/CIG.2019.8847949

  40. [40]

    A continuous information gain measure to find the most discriminatory problems for ai benchmarking

    Matthew Stephenson, Damien Anderson, Ahmed Khalifa, John Levine, Jochen Renz, Julian Togelius, and Christoph Salge. A continuous information gain measure to find the most discriminatory problems for ai benchmarking. In 2020 IEEE Congress on Evolutionary Computation (CEC), page 1–8. IEEE Press, 2020. doi:10.1109/CEC48606.2020.9185834. URL https://doi.org/1...

  41. [41]

    Matthew Stephenson, Dennis J. N. J. Soemers, Éric Piette, and Cameron Browne. General game heuristic prediction based on ludeme descriptions. In 2021 IEEE Conference on Games (CoG), pages 878--881, 2021. doi:10.1109/CoG52621.2021.9619052

  42. [42]

    Matthew Stephenson, Dennis J. N. J. Soemers, \'E ric Piette, and Cameron Browne. Measuring board game distance. In Cameron Browne, Akihiro Kishimoto, and Jonathan Schaeffer, editors, Computers and Games, pages 121--130, Cham, 2023. Springer Nature Switzerland. ISBN 978-3-031-34017-8

  43. [43]

    Deep reinforcement learning for general video game ai

    Ruben Rodriguez Torrado, Philip Bontrager, Julian Togelius, Jialin Liu, and Diego P \'e rez-Li \'e bana. Deep reinforcement learning for general video game ai. 2018 IEEE Conference on Computational Intelligence and Games (CIG), pages 1--8, 2018. URL https://api.semanticscholar.org/CorpusID:46976902

  44. [44]

    Binomial confidence intervals and contingency tests: Mathematical fundamentals and the evaluation of alternative methods

    Sean Wallis. Binomial confidence intervals and contingency tests: Mathematical fundamentals and the evaluation of alternative methods. Journal of Quantitative Linguistics, 20 0 (3): 0 178--208, 2013. doi:10.1080/09296174.2013.799918. URL https://doi.org/10.1080/09296174.2013.799918

  45. [45]

    Bandit-based planning and learning in continuous-action markov decision processes

    Ari Weinstein and Michael Littman. Bandit-based planning and learning in continuous-action markov decision processes. Proceedings of the International Conference on Automated Planning and Scheduling, 22 0 (1): 0 306--314, May 2012. doi:10.1609/icaps.v22i1.13507. URL https://ojs.aaai.org/index.php/ICAPS/article/view/13507

  46. [46]

    Edwin B. Wilson. Probable inference, the law of succession, and statistical inference. Journal of the American Statistical Association, 22 0 (158): 0 209--212, 1927. ISSN 01621459. URL http://www.jstor.org/stable/2276774