arXiv: 1907.06508 · v1 · pith:FYDARDLKnew · submitted 2019-07-11 · 💻 cs.AI · cs.LG · stat.ML

General Board Game Playing for Education and Research in Generic AI Game Learning

Pith reviewed 2026-05-24 23:19 UTC · model grok-4.3

classification 💻 cs.AI · cs.LG · stat.ML
keywords general board game playing · TD(λ)-n-tuple agent · game learning framework · generic AI agents · MCTS comparison · board game competitions · AI education

The pith

A framework called GBG standardizes board game interfaces so a generic TD(λ)-n-tuple agent can play arbitrary games and outperform MCTS.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces GBG as a framework that supplies common interfaces for board games, their states, and AI agents. This standardization removes repetitive coding and enables direct competitions among agents across many different games. GBG makes a TD(λ)-n-tuple learning agent usable on any 1- to N-player board game without game-specific rewrites. Tests on several games show this agent beating other generic methods such as MCTS. The work targets both teaching students game learning and building research benchmarks for agent generalization.

Core claim

GBG defines the common interfaces for board games, game states and their AI agents. It allows one to run competitions of different agents on different games. It standardizes those parts of board game playing and learning that otherwise would be tedious and repetitive parts in coding. GBG is suitable for arbitrary 1-, 2-, ..., N-player board games. It makes a generic TD(λ)-n-tuple agent for the first time available to arbitrary games. On various games, TD(λ)-n-tuple is found to be superior to other generic agents like MCTS.
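To make the idea of common interfaces concrete, here is a minimal sketch of what such a layer could look like. All class and method names (`GameState`, `Agent`, `play_game`, and so on) are hypothetical and chosen for illustration; they are not taken from GBG itself. Single-heap Nim is used only as a small stand-in game.

```python
import random
from abc import ABC, abstractmethod
from typing import List, Sequence, Tuple


class GameState(ABC):
    """Hypothetical state interface; the method names are illustrative only."""

    @abstractmethod
    def player_to_move(self) -> int: ...
    @abstractmethod
    def legal_actions(self) -> List[int]: ...
    @abstractmethod
    def advance(self, action: int) -> "GameState": ...
    @abstractmethod
    def is_over(self) -> bool: ...
    @abstractmethod
    def rewards(self) -> Tuple[float, ...]: ...


class Agent(ABC):
    """Hypothetical agent interface: an agent only sees GameState methods."""

    @abstractmethod
    def choose(self, state: GameState) -> int: ...


class NimState(GameState):
    """Single-heap Nim for N players: take 1-3 items, last item taken wins."""

    def __init__(self, heap: int, to_move: int = 0, num_players: int = 2):
        self.heap, self.to_move, self.num_players = heap, to_move, num_players

    def player_to_move(self) -> int:
        return self.to_move

    def legal_actions(self) -> List[int]:
        return [k for k in (1, 2, 3) if k <= self.heap]

    def advance(self, action: int) -> "NimState":
        nxt = (self.to_move + 1) % self.num_players
        return NimState(self.heap - action, nxt, self.num_players)

    def is_over(self) -> bool:
        return self.heap == 0

    def rewards(self) -> Tuple[float, ...]:
        if not self.is_over():
            return tuple(0.0 for _ in range(self.num_players))
        winner = (self.to_move - 1) % self.num_players  # the player who moved last
        return tuple(1.0 if p == winner else -1.0 for p in range(self.num_players))


class RandomAgent(Agent):
    def __init__(self, rng: random.Random):
        self.rng = rng

    def choose(self, state: GameState) -> int:
        return self.rng.choice(state.legal_actions())


class OnePlyAgent(Agent):
    """Takes an immediately winning move if one exists, else the first legal move."""

    def choose(self, state: GameState) -> int:
        me = state.player_to_move()
        for action in state.legal_actions():
            nxt = state.advance(action)
            if nxt.is_over() and nxt.rewards()[me] > 0:
                return action
        return state.legal_actions()[0]


def play_game(state: GameState, agents: Sequence[Agent]) -> Tuple[float, ...]:
    """Game-agnostic episode loop: works for any GameState and any agents."""
    while not state.is_over():
        state = state.advance(agents[state.player_to_move()].choose(state))
    return state.rewards()
```

Neither agent nor `play_game` refers to Nim directly, so under these assumptions a second game would only need its own `GameState` subclass to be usable in the same competitions.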

What carries the argument

The GBG framework, which supplies standardized interfaces for games and agents, together with the TD(λ)-n-tuple agent applied in its generic form.
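For readers unfamiliar with the technique, the following is a minimal sketch of an n-tuple value function trained with TD(λ) and accumulating eligibility traces. It is a textbook-style illustration, not the paper's implementation: the class name `NTupleValue`, the tanh output, the hyperparameter defaults, and the toy trajectory are all assumptions, and the sketch keeps a single value per state rather than one per player.

```python
import math
from typing import Dict, List, Sequence, Tuple


class NTupleValue:
    """V(s) = tanh(sum over n-tuples of one look-up-table weight each)."""

    def __init__(self, tuples: Sequence[Sequence[int]], num_cell_values: int,
                 alpha: float = 0.1, lam: float = 0.5, gamma: float = 1.0):
        self.tuples = [tuple(t) for t in tuples]
        self.base = num_cell_values
        self.alpha, self.lam, self.gamma = alpha, lam, gamma
        # One table per n-tuple, with one weight per possible cell pattern.
        self.weights = [[0.0] * (self.base ** len(t)) for t in self.tuples]
        self.traces: Dict[Tuple[int, int], float] = {}

    def _active(self, board: Sequence[int]) -> List[Tuple[int, int]]:
        """Return (tuple index, table index) for every n-tuple on this board."""
        keys = []
        for i, cells in enumerate(self.tuples):
            idx = 0
            for c in cells:
                idx = idx * self.base + board[c]
            keys.append((i, idx))
        return keys

    def value(self, board: Sequence[int]) -> float:
        return math.tanh(sum(self.weights[i][j] for i, j in self._active(board)))

    def start_episode(self) -> None:
        self.traces.clear()

    def td_update(self, board: Sequence[int], target: float) -> float:
        """One TD(lambda) step for state `board`; target = r + gamma * V(next)."""
        v = self.value(board)
        grad = 1.0 - v * v  # derivative of tanh at the current activation
        for key in self.traces:  # decay old traces
            self.traces[key] *= self.gamma * self.lam
        for key in self._active(board):  # accumulate gradient of active weights
            self.traces[key] = self.traces.get(key, 0.0) + grad
        delta = target - v
        for (i, j), e in self.traces.items():
            self.weights[i][j] += self.alpha * delta * e
        return delta


def train_demo(episodes: int = 200) -> NTupleValue:
    """Replay one fixed three-state trajectory that ends with reward +1."""
    net = NTupleValue(tuples=[(0, 1), (2, 3), (1, 2)], num_cell_values=3)
    trajectory = [[0, 0, 0, 0], [1, 0, 0, 0], [1, 2, 0, 0]]
    for _ in range(episodes):
        net.start_episode()
        for t, board in enumerate(trajectory):
            last = t == len(trajectory) - 1
            target = 1.0 if last else net.gamma * net.value(trajectory[t + 1])
            net.td_update(board, target)
    return net
```

The only game-dependent inputs here are the board encoding (a flat sequence of small integers) and the list of cell index tuples, which is what makes this kind of learner a candidate for generic use.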

If this is right

  • Students gain quicker entry into game-learning projects because common coding tasks are already handled by the framework.
  • Researchers obtain a shared collection of games and agents for running standardized strength and generalization tests.
  • New agents can be added and compared directly on the same set of games without rewriting game-specific code each time.
  • The TD(λ)-n-tuple method becomes a reusable baseline that works across one-player, two-player, and multi-player settings.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the framework succeeds, it could support systematic testing of whether learning methods transfer across game families that differ in branching factor or state representation.
  • The same interface layer might later allow agents trained on one game family to initialize learning on another without full retraining from scratch.
  • Educational modules built on GBG could let beginners measure how changes in learning parameters affect performance across multiple games in a single session.

Load-bearing premise

That the TD(λ)-n-tuple agent can be applied to arbitrary games while keeping comparisons fair to baselines like MCTS without hidden game-specific optimizations.

What would settle it

A controlled test on a previously unused board game in which the generic TD(λ)-n-tuple implementation is matched against MCTS and fails to show superiority under identical resource limits.
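One way to make "fails to show superiority" checkable is to fix a number of head-to-head games in advance and test the observed win count against an even-strength null hypothesis. The sketch below assumes decisive games only (draws discarded or handled separately), equal resource limits, and alternating starting sides; the function names are illustrative and nothing here is taken from the paper.

```python
from math import comb, sqrt
from typing import Tuple


def binomial_p_value(wins: int, games: int, p0: float = 0.5) -> float:
    """One-sided exact p-value P(X >= wins) for X ~ Binomial(games, p0)."""
    return sum(
        comb(games, k) * p0 ** k * (1.0 - p0) ** (games - k)
        for k in range(wins, games + 1)
    )


def wilson_interval(wins: int, games: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for the win rate (z = 1.96 gives roughly 95%)."""
    p = wins / games
    denom = 1.0 + z * z / games
    centre = (p + z * z / (2.0 * games)) / denom
    half = z * sqrt(p * (1.0 - p) / games + z * z / (4.0 * games * games)) / denom
    return centre - half, centre + half


if __name__ == "__main__":
    # Hypothetical outcome: the learned agent wins 60 of 100 decisive games.
    print(binomial_p_value(60, 100))
    print(wilson_interval(60, 100))
```

A small p-value together with an interval that lies above 0.5 would count as evidence of superiority under these assumptions; an interval that straddles 0.5 would not.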

Figures

Figures reproduced from arXiv: 1907.06508 by Wolfgang Konen.

Figure 1: Expectimax-N tree for N-player games. Expectimax-N is a generalization of Max-N [19] for nondeterministic games. Shown is an example for N = 2 and depth d = 3. A node contains a tuple of game values for each player. The first level maximizes the tuple entry of the player to move (here: 1st player), the second level calculates the expectation value of all child nodes (grey circles), each having a certain pr…
Figure 2: (a) Hex gameboard example: The numbers and the color coding in the cells show the agent’s game values for the last move decision (White’s …
Figure 3: 5x5 Hex: Training curves for TD-n-tuple agents with 25 random 6- …
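The Figure 1 caption describes Expectimax-N in words: each node holds one game value per player, maximizing levels pick the child that is best for the player to move, and chance levels average their children by probability. A minimal sketch of that recursion on an explicit tree follows. The node classes and the example tree are hypothetical, and the behaviour of the third level (the second player maximizing its own entry) is assumed from Max-N because the caption is truncated.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

Values = Tuple[float, ...]


@dataclass
class Leaf:
    values: Values  # one game value per player


@dataclass
class MaxNode:
    player: int  # index of the player to move
    children: List["Node"]


@dataclass
class ChanceNode:
    outcomes: List[Tuple[float, "Node"]]  # (probability, child) pairs


Node = Union[Leaf, MaxNode, ChanceNode]


def expectimax_n(node: Node) -> Values:
    """Return the tuple of game values backed up to `node`."""
    if isinstance(node, Leaf):
        return node.values
    if isinstance(node, MaxNode):
        # The player to move picks the child whose tuple is best for itself.
        return max((expectimax_n(c) for c in node.children),
                   key=lambda vals: vals[node.player])
    # Chance node: probability-weighted average of the children's tuples.
    expected: List[float] = []
    for prob, child in node.outcomes:
        vals = expectimax_n(child)
        if not expected:
            expected = [0.0] * len(vals)
        for i, v in enumerate(vals):
            expected[i] += prob * v
    return tuple(expected)


def example_tree() -> Node:
    """Toy tree for N = 2 and depth 3: max (player 0), chance, max (player 1)."""
    reply = MaxNode(1, [Leaf((1.0, -1.0)), Leaf((0.0, 0.0))])
    chance_a = ChanceNode([(0.5, reply), (0.5, Leaf((1.0, -1.0)))])
    chance_b = ChanceNode([(0.9, Leaf((0.0, 0.0))), (0.1, Leaf((1.0, -1.0)))])
    return MaxNode(0, [chance_a, chance_b])
```

In the toy tree, player 1 prefers the draw at its own node, so the first chance node backs up (0.5, -0.5) and the root chooses it over the second chance node's (0.1, -0.1).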
Original abstract

We present a new general board game (GBG) playing and learning framework. GBG defines the common interfaces for board games, game states and their AI agents. It allows one to run competitions of different agents on different games. It standardizes those parts of board game playing and learning that otherwise would be tedious and repetitive parts in coding. GBG is suitable for arbitrary 1-, 2-, ..., N-player board games. It makes a generic TD($\lambda$)-n-tuple agent for the first time available to arbitrary games. On various games, TD($\lambda$)-n-tuple is found to be superior to other generic agents like MCTS. GBG aims at the educational perspective, where it helps students to start faster in the area of game learning. GBG aims as well at the research perspective by collecting a growing set of games and AI agents to assess their strengths and generalization capabilities in meaningful competitions. Initial successful educational and research results are reported.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces the GBG framework defining common interfaces for board games, game states, and AI agents to standardize coding and enable competitions across arbitrary 1- to N-player games. It provides a generic TD(λ)-n-tuple agent for the first time applicable to arbitrary games and reports that this agent is superior to other generic agents such as MCTS on various games. The work is positioned for educational use to help students enter game learning faster and for research to collect games and agents for assessing generalization via competitions, with initial successful results claimed.

Significance. If the empirical superiority claims can be substantiated with detailed, controlled, and reproducible experiments, the GBG framework would provide a useful standardization layer for general game playing research and a practical educational resource. The availability of a truly generic TD(λ)-n-tuple implementation is a constructive contribution to the field if comparisons remain fair and non-optimized per game.

major comments (2)
  1. [Abstract / Results] Abstract and results presentation: The central claim that TD(λ)-n-tuple is superior to MCTS and other generic agents on various games is load-bearing but unsupported by any reported details on the number of games tested, experimental setup, parameter choices, number of independent runs, statistical significance tests, or controls ensuring no game-specific optimizations were applied to the TD agent (while presumably none were applied to baselines). This absence prevents verification of the weakest assumption that comparisons are fair and generic.
  2. [Framework / Agent Implementation] Framework description: No explicit discussion or pseudocode is provided on how the TD(λ)-n-tuple agent achieves full genericity (e.g., automatic feature extraction or state representation for arbitrary board geometries and player counts) without hidden per-game engineering that would undermine the superiority claim relative to MCTS.
minor comments (1)
  1. [Abstract] The abstract states 'initial successful educational and research results are reported' but the manuscript provides no concrete examples, student outcomes, or competition results to illustrate these.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive comments. We address each major comment below and indicate the revisions planned for the manuscript.

Point-by-point responses
  1. Referee: [Abstract / Results] Abstract and results presentation: The central claim that TD(λ)-n-tuple is superior to MCTS and other generic agents on various games is load-bearing but unsupported by any reported details on the number of games tested, experimental setup, parameter choices, number of independent runs, statistical significance tests, or controls ensuring no game-specific optimizations were applied to the TD agent (while presumably none were applied to baselines). This absence prevents verification of the weakest assumption that comparisons are fair and generic.

    Authors: We agree that the presentation of experimental details can be strengthened for better reproducibility. The manuscript reports results across multiple games, but we will revise the results section (and update the abstract accordingly) to explicitly state the number of games evaluated, the full experimental protocol including parameter values for both TD(λ)-n-tuple and MCTS, the number of independent runs, and the statistical tests performed. We will also add an explicit statement confirming that the TD agent used only the generic implementation with no per-game tuning, matching the treatment of the MCTS baseline. These changes will be incorporated in the revised manuscript. revision: yes

  2. Referee: [Framework / Agent Implementation] Framework description: No explicit discussion or pseudocode is provided on how the TD(λ)-n-tuple agent achieves full genericity (e.g., automatic feature extraction or state representation for arbitrary board geometries and player counts) without hidden per-game engineering that would undermine the superiority claim relative to MCTS.

    Authors: The GBG framework section describes the standardized interfaces that support genericity, but we accept that an explicit walkthrough of the TD(λ)-n-tuple implementation would clarify how automatic feature extraction and state handling occur for arbitrary boards and player counts. In the revision we will add a new subsection containing pseudocode (or a detailed algorithmic description) that shows the generic mechanisms, including how n-tuples are constructed from the board state representation without game-specific engineering. This addition will directly address the concern about hidden per-game modifications. revision: yes
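As an illustration of how n-tuples could be built without game-specific engineering, the sketch below generates them by random walks over a board's cell adjacency, so the only per-game input is the adjacency map. This is an assumption about one plausible mechanism, not the pseudocode the rebuttal promises; the function names and the choice of 25 tuples of length 6 on a 5x5 Hex board merely echo the Figure 3 caption.

```python
import random
from typing import Dict, List, Tuple


def hex_adjacency(size: int) -> Dict[int, List[int]]:
    """Six-neighbour adjacency of a size x size Hex board, cells numbered row-major."""
    offsets = [(0, 1), (0, -1), (1, 0), (-1, 0), (1, -1), (-1, 1)]
    adj: Dict[int, List[int]] = {}
    for r in range(size):
        for c in range(size):
            adj[r * size + c] = [
                (r + dr) * size + (c + dc)
                for dr, dc in offsets
                if 0 <= r + dr < size and 0 <= c + dc < size
            ]
    return adj


def random_walk_tuple(adj: Dict[int, List[int]], length: int,
                      rng: random.Random) -> Tuple[int, ...]:
    """Walk randomly between neighbouring cells until `length` distinct cells are seen."""
    if length > len(adj):
        raise ValueError("tuple length exceeds number of cells")
    current = rng.choice(sorted(adj))
    visited = [current]
    while len(visited) < length:
        current = rng.choice(adj[current])  # assumes a connected board graph
        if current not in visited:
            visited.append(current)
    return tuple(visited)


def make_ntuples(adj: Dict[int, List[int]], count: int, length: int,
                 rng: random.Random) -> List[Tuple[int, ...]]:
    return [random_walk_tuple(adj, length, rng) for _ in range(count)]


if __name__ == "__main__":
    tuples = make_ntuples(hex_adjacency(5), count=25, length=6, rng=random.Random(1))
    print(tuples[:3])
```

Swapping `hex_adjacency` for another game's adjacency map is the only change needed to reuse the generator, which is the kind of property the referee's comment asks the authors to document.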

Circularity Check

0 steps flagged

No significant circularity detected

Full rationale

The paper presents an empirical framework (GBG) and reports comparative results showing TD(λ)-n-tuple superiority on various games. No derivation chain, first-principles predictions, or equations are claimed; the central claim rests on experimental outcomes rather than any reduction of outputs to fitted inputs, self-definitions, or self-citation chains. The architecture is described as generic with standardized interfaces, and comparisons to baselines like MCTS are presented as direct empirical tests without evidence of the result being forced by construction. This is a standard non-circular empirical contribution.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Framework paper with no mathematical derivations, free parameters, or invented entities; relies on standard assumptions about board game rules and reinforcement learning methods.

pith-pipeline@v0.9.0 · 5696 in / 1027 out tokens · 18708 ms · 2026-05-24T23:19:14.943120+00:00 · methodology

Reference graph

Works this paper leans on

33 extracted references · 33 canonical work pages · 3 internal anchors

  1. [1]

    Mastering the game of Go without human knowledge,

    D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang, A. Guez, T. Hubert, L. Baker, M. Lai, A. Bolton et al., “Mastering the game of Go without human knowledge,” Nature, vol. 550, no. 7676, pp. 354–359, 2017

  2. [2]

    Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

    D. Silver, T. Hubert, J. Schrittwieser, I. Antonoglou, M. Lai, A. Guez, M. Lanctot, L. Sifre, D. Kumaran, T. Graepel et al., “Mastering chess and shogi by self-play with a general reinforcement learning algorithm,” arXiv preprint arXiv:1712.01815, 2017

  3. [3]

    A strategic METAGAME player for general chess-like games,

    B. Pell, “A strategic METAGAME player for general chess-like games,” Computational Intelligence, vol. 12, no. 1, pp. 177–198, 1996

  4. [4]

    General game playing,

    M. Genesereth and M. Thielscher, “General game playing,” Synthesis Lectures on Artificial Intelligence and Machine Learning, vol. 8, no. 2, pp. 1–229, 2014

  5. [5]

    Generic heuristic approach to general game playing,

    J. Mańdziuk and M. Świechowski, “Generic heuristic approach to general game playing,” in International Conference on Current Trends in Theory and Practice of Computer Science. Springer, 2012, pp. 649–660

  6. [6]

    General game playing: Overview of the AAAI competition,

    M. Genesereth, N. Love, and B. Pell, “General game playing: Overview of the AAAI competition,” AI magazine, vol. 26, no. 2, p. 62, 2005. [Online]. Available: https://www.aaai.org/ojs/index.php/aimagazine/article/view/1813

  7. [7]

    General game playing: Game description language specification,

    N. Love, T. Hinrichs, D. Haley, E. Schkufza, and M. Genesereth, “General game playing: Game description language specification,” Stanford Logic Group Computer Science Department, Stanford University, Tech. Rep., 2008

  8. [8]

    A general game description language for incomplete information games,

    M. Thielscher, “A general game description language for incomplete information games,” in Twenty-Fourth AAAI Conference on Artificial Intelligence, 2010

  9. [9]

    Recent advances in general game playing,

    M. Świechowski, H. Park, J. Mańdziuk, and K.-J. Kim, “Recent advances in general game playing,” The Scientific World Journal, vol. 2015, 2015

  10. [10]

    Neural networks for state evaluation in general game playing,

    D. Michulke and M. Thielscher, “Neural networks for state evaluation in general game playing,” in Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, 2009, pp. 95–110

  11. [11]

    Neural networks for high-resolution state evaluation in general game playing,

    D. Michulke, “Neural networks for high-resolution state evaluation in general game playing,” in Proceedings of the IJCAI-11 Workshop on General Game Playing (GIGA11). Citeseer, 2011, pp. 31–37

  12. [12]

    Testing general game players against a simplified boardgames player using temporal-difference learning,

    J. Kowalski and A. Kisielewicz, “Testing general game players against a simplified boardgames player using temporal-difference learning,” in Evolutionary Computation (CEC), 2015 IEEE Congress on. IEEE, 2015, pp. 1466–1473

  13. [13]

    General video game playing,

    J. Levine, C. B. Congdon, M. Ebner, G. Kendall, S. M. Lucas, R. Miikkulainen, T. Schaul, and T. Thompson, “General video game playing,” Schloss Dagstuhl–Leibniz-Zentrum für Informatik, Tech. Rep., 2013

  14. [14]

    Adversarial hierarchical-task network planning for complex real-time games,

    S. Ontañón and M. Buro, “Adversarial hierarchical-task network planning for complex real-time games,” in Twenty-Fourth International Joint Conference on Artificial Intelligence. AAAI Press, 2015, pp. 1652–1658

  15. [15]

    Combining Strategic Learning and Tactical Search in Real-Time Strategy Games

    N. A. Barriga, M. Stanescu, and M. Buro, “Combining strategic learning and tactical search in real-time strategy games,” arXiv preprint arXiv:1709.03480, 2017

  16. [16]

    OpenAI Gym

    G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, and W. Zaremba, “OpenAI Gym,” arXiv preprint arXiv:1606.01540, 2016

  17. [17]

    The GBG class interface tutorial: General board game playing and learning,

    W. Konen, “The GBG class interface tutorial: General board game playing and learning,” Research Center CIOP (Computational Intelligence, Optimization and Data Mining), TH Köln – Cologne University of Applied Sciences, Tech. Rep., 2017. [Online]. Available: http://www.gm.fh-koeln.de/ciopwebpub/Kone17a.d/TR-GBG.pdf

  18. [18]

    The GBG class interface tutorial V2.0: General board game playing and learning,

    W. Konen, “The GBG class interface tutorial V2.0: General board game playing and learning,” Research Center CIOP (Computational Intelligence, Optimization and Data Mining), TH Köln – Cologne University of Applied Sciences, Tech. Rep., 2019. [Online]. Available: http://www.gm.fh-koeln.de/ciopwebpub/Kone19a.d/TR-GBG.pdf

  19. [19]

    Multi-player alpha-beta pruning,

    R. E. Korf, “Multi-player alpha-beta pruning,” Artificial Intelligence, vol. 48, no. 1, pp. 99–111, 1991

  20. [20]

    Learning to play Othello with n-tuple systems,

    S. M. Lucas, “Learning to play Othello with n-tuple systems,” Australian Journal of Intelligent Information Processing, vol. 4, pp. 1–20, 2008

  21. [21]

    Temporal coherence and prediction decay in TD learning,

    D. F. Beal and M. C. Smith, “Temporal coherence and prediction decay in TD learning,” in Int. Joint Conf. on Artificial Intelligence (IJCAI), T. Dean, Ed. Morgan Kaufmann, 1999, pp. 564–569

  22. [22]

    A survey of Monte Carlo tree search methods,

    C. B. Browne, E. Powley, D. Whitehouse, S. M. Lucas, P. I. Cowling, P. Rohlfshagen, S. Tavener, D. Perez, S. Samothrakis, and S. Colton, “A survey of Monte Carlo tree search methods,” IEEE Transactions on Computational Intelligence and AI in Games, vol. 4, no. 1, pp. 1–43, 2012

  23. [23]

    KI-Agenten für das Spiel 2048: Untersuchung von Lernalgorithmen für nichtdeterministische Spiele,

    J. Kutsch, “KI-Agenten für das Spiel 2048: Untersuchung von Lernalgorithmen für nichtdeterministische Spiele,” 2017, Bachelor thesis, TH Köln – University of Applied Sciences. [Online]. Available: http://www.gm.fh-koeln.de/ciopwebpub/Kutsch17.d/Kutsch17.pdf

  24. [24]

    R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press, 1998

  25. [25]

    Reinforcement learning for board games: The temporal difference algorithm,

    W. Konen, “Reinforcement learning for board games: The temporal difference algorithm,” Research Center CIOP (Computational Intelligence, Optimization and Data Mining), TH Köln – Cologne University of Applied Sciences, Tech. Rep., 2015. [Online]. Available: http://www.gm.fh-koeln.de/ciopwebpub/Kone15c.d/TR-TDgame EN.pdf

  26. [26]

    Temporal difference learning with eligibility traces for the game Connect-4,

    M. Thill, S. Bagheri, P. Koch, and W. Konen, “Temporal difference learning with eligibility traces for the game Connect-4,” in CIG’2014, International Conference on Computational Intelligence in Games, Dortmund, M. Preuss and G. Rudolph, Eds., 2014

  27. [27]

    Online adaptable learning rates for the game Connect-4,

    S. Bagheri, M. Thill, P. Koch, and W. Konen, “Online adaptable learning rates for the game Connect-4,” IEEE Transactions on Computational Intelligence and AI in Games, vol. 8, no. 1, pp. 33–42, 2015

  28. [28]

    Mastering 2048 with delayed temporal coherence learning, multistage weight promotion, redundant encoding, and carousel shaping,

    W. Jaśkowski, “Mastering 2048 with delayed temporal coherence learning, multistage weight promotion, redundant encoding, and carousel shaping,” IEEE Transactions on Games, vol. 10, no. 1, pp. 3–14, 2018

  29. [29]

    Nim, a game with a complete mathematical theory,

    C. L. Bouton, “Nim, a game with a complete mathematical theory,” Annals of Mathematics, vol. 3, no. 1/4, pp. 35–39, 1901

  30. [30]

    Cirulli, 2014

    G. Cirulli, 2014. [Online]. Available: http://gabrielecirulli.github.io/2048

  31. [31]

    Selbstlernende Agenten für das skalierbare Spiel Hex: Untersuchung verschiedener KI-Verfahren im GBG-Framework,

    K. Galitzki, “Selbstlernende Agenten für das skalierbare Spiel Hex: Untersuchung verschiedener KI-Verfahren im GBG-Framework,” 2017, Bachelor thesis, TH Köln – Cologne University of Applied Sciences. [Online]. Available: http://www.gm.fh-koeln.de/ciopwebpub/Galitzki17.d/Galitz17.pdf

  32. [32]

    A hierarchical approach to computer Hex,

    V. V. Anshelevich, “A hierarchical approach to computer Hex,” Artificial Intelligence, vol. 134, no. 1-2, pp. 101–120, 2002

  33. [33]

    Monte Carlo tree search in Hex,

    B. Arneson, R. B. Hayward, and P. Henderson, “Monte Carlo tree search in Hex,” IEEE Transactions on Computational Intelligence and AI in Games, vol. 2, no. 4, pp. 251–258, 2010.