arxiv: 2604.21896 · v1 · submitted 2026-04-23 · 💻 cs.AI

Nemobot Games: Crafting Strategic AI Gaming Agents for Interactive Learning with Large Language Models

Pith reviewed 2026-05-09 21:25 UTC · model grok-4.3

classification 💻 cs.AI
keywords large language models · AI agents · game theory · self-programming · interactive learning · reinforcement learning · strategic AI · crowdsourced data

The pith

Nemobot lets large language models create self-improving AI game agents in four categories.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper introduces Nemobot, a platform for building game agents powered by large language models that can adapt and improve their strategies. It covers dictionary-based games where state mappings are compressed, solvable games using math for optimal play with explanations, heuristic games blending algorithms and crowd data, and learning games using reinforcement with feedback. If true, this would mean AI can iteratively refine its logic using human creativity and crowdsourced insights, making strategic AI more accessible and dynamic. Readers should care because it bridges traditional game theory with modern language models toward self-programming systems.
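As a rough illustration of what "compressing" a dictionary-based game can mean, the sketch below builds an exhaustive state-to-action table for a toy subtraction game and then checks that a one-line rule reproduces it. The game (remove 1 to 3 stones, taking the last stone wins), the function names, and the table size are assumptions chosen for illustration; none of them come from the paper, and this is not a description of how Nemobot performs compression.

```python
from functools import lru_cache

MAX_TAKE = 3  # assumption: each turn removes 1..3 stones; taking the last stone wins


@lru_cache(maxsize=None)
def is_winning(stones: int) -> bool:
    """True if the player to move can force a win from this pile size."""
    return any(not is_winning(stones - t) for t in range(1, min(MAX_TAKE, stones) + 1))


def build_dictionary(limit: int) -> dict:
    """Exhaustive state -> action table: one entry per pile size."""
    table = {}
    for n in range(1, limit + 1):
        moves = range(1, min(MAX_TAKE, n) + 1)
        # Prefer a move that leaves the opponent in a losing state; otherwise take 1.
        table[n] = next((t for t in moves if not is_winning(n - t)), 1)
    return table


def compressed_policy(stones: int) -> int:
    """Compact rule standing in for the whole table."""
    remainder = stones % (MAX_TAKE + 1)
    return remainder if remainder else 1


table = build_dictionary(50)
mismatches = [n for n in table if table[n] != compressed_policy(n)]
print(len(table), "table entries;", len(mismatches), "mismatches with the rule")
```

The table grows with the number of states, while the rule stays constant in size. Whether an LLM can find such a rule reliably for less regular games is exactly the premise the paper leaves unvalidated.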

Core claim

The central claim is that Nemobot provides an interactive environment where LLM-powered agents demonstrate self-programming by integrating crowdsourced learning and human creativity to refine their own logic. For different game classes, the agents compress mappings for adaptability, compute optimal strategies with explanations, synthesize from minimax and crowd data, and refine via reinforcement learning and self-critique. This operationalizes an extended version of Shannon's game-playing machine taxonomy and represents progress toward self-programming AI.
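For the solvable-game class, "optimal strategy plus explanation" can be made concrete with a textbook case. The sketch below uses normal-play Nim, where the known winning rule is to leave a nim-sum (bitwise XOR of pile sizes) of zero. Nim is an assumed example here and `nim_move` is a hypothetical helper; the paper does not state that Nemobot produces its explanations this way.

```python
from functools import reduce
from operator import xor


def nim_move(piles):
    """Return (pile index, stones to remove, explanation) for normal-play Nim."""
    if not any(piles):
        raise ValueError("no stones left to take")
    nim_sum = reduce(xor, piles, 0)
    if nim_sum == 0:
        # No winning move exists; take one stone from the largest pile.
        idx = max(range(len(piles)), key=lambda i: piles[i])
        return idx, 1, "Nim-sum is already 0, so every move loses against perfect play."
    for idx, size in enumerate(piles):
        target = size ^ nim_sum
        if target < size:
            why = (f"Nim-sum is {nim_sum}; reducing pile {idx} from {size} "
                   f"to {target} makes the nim-sum 0.")
            return idx, size - target, why
    raise AssertionError("unreachable: a nonzero nim-sum always admits a reducing move")


idx, take, why = nim_move([3, 4, 5])
print(idx, take, why)
```

The explanation string is generated from the same quantities that determine the move, so it cannot drift from the decision. An LLM-written explanation has no such guarantee unless it is checked against the computed move.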

What carries the argument

Nemobot, the agentic engineering environment that uses LLMs for tool-augmented generation and fine-tuning to enable agents to self-program strategies in games.

If this is right

  • Agents can rapidly adapt to dictionary-based games by generalizing from compressed models.
  • Optimal strategies in solvable games come with human-readable decision explanations.
  • Strategies in heuristic games improve by merging classical algorithms with crowd insights.
  • Learning-based games see iterative refinement through human feedback and self-critique.
  • Users gain a programmable space to experiment with and deploy these strategic agents.
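The third bullet, merging classical algorithms with crowd insights, admits many designs, and the paper does not specify one. A minimal sketch of one possibility follows: depth-limited minimax on tic-tac-toe whose root move scores receive a small bonus from a crowd move-frequency table. The game choice, the open-lines heuristic, the `crowd_freq` dictionary, and the `crowd_weight` blending parameter are all assumptions made for illustration.

```python
from typing import Optional

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def other(player: str) -> str:
    return "O" if player == "X" else "X"


def winner(board: str) -> Optional[str]:
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None


def heuristic(board: str, player: str) -> float:
    """Lines still open for `player` minus lines still open for the opponent."""
    mine = sum(all(board[i] != other(player) for i in line) for line in LINES)
    theirs = sum(all(board[i] != player for i in line) for line in LINES)
    return (mine - theirs) / len(LINES)


def minimax(board: str, to_move: str, player: str, depth: int) -> float:
    """Depth-limited minimax value of `board` from `player`'s point of view."""
    won = winner(board)
    if won is not None:
        return 1.0 if won == player else -1.0
    if "." not in board:
        return 0.0
    if depth == 0:
        return heuristic(board, player)
    values = [minimax(board[:i] + to_move + board[i + 1:], other(to_move), player, depth - 1)
              for i, cell in enumerate(board) if cell == "."]
    return max(values) if to_move == player else min(values)


def choose_move(board, player, crowd_freq, depth=2, crowd_weight=0.2):
    """Pick the legal move with the best search value plus weighted crowd bonus."""
    def score(i):
        child = board[:i] + player + board[i + 1:]
        search_value = minimax(child, other(player), player, depth - 1)
        return search_value + crowd_weight * crowd_freq.get(i, 0.0)
    legal = [i for i, cell in enumerate(board) if cell == "."]
    return max(legal, key=score)


# Board is a 9-character string, row by row; "." marks an empty cell.
print(choose_move("XX.OO....", "X", crowd_freq={}))
```

With an empty `crowd_freq` the agent falls back to plain search. A large `crowd_weight` lets popular moves override the search value, which is one reason a comparison against pure-minimax and pure-LLM baselines would be needed before claiming the blend helps.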

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the approach scales, it could enable similar self-refinement in non-game domains requiring strategy like negotiations or resource allocation.
  • Combining LLMs with traditional algorithms might lower the barrier for creating sophisticated AI systems without deep coding expertise.
  • Further tests on real-world interactions could show how crowdsourcing accelerates strategy evolution beyond simulated games.

Load-bearing premise

Large language models are capable of reliably compressing complex state-action mappings, performing accurate mathematical reasoning, synthesizing effective strategies from mixed sources, and self-improving through feedback without introducing critical errors.

What would settle it

A demonstration where the Nemobot agents fail to improve or produce suboptimal strategies in a learning-based game after multiple rounds of self-critique and human feedback would disprove the self-programming capability.
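To make this test operational, one would need per-round performance numbers and a threshold for what counts as improvement. The sketch below shows one simple way to phrase the check, comparing the mean win rate of the earliest third of rounds against the latest third. The function name, the thirds split, the `min_gain` threshold, and the sample numbers are all invented for illustration; the paper reports no such data.

```python
from statistics import mean


def shows_improvement(win_rates, min_gain=0.05):
    """Compare the mean win rate of the last third of rounds with the first third."""
    if len(win_rates) < 3:
        raise ValueError("need at least three rounds")
    k = len(win_rates) // 3
    early, late = mean(win_rates[:k]), mean(win_rates[-k:])
    return late - early >= min_gain


# Made-up win rates per refinement round, for illustration only.
rising = [0.40, 0.45, 0.50, 0.55, 0.60, 0.70]
flat = [0.50, 0.50, 0.50, 0.50, 0.50, 0.50]
print(shows_improvement(rising), shows_improvement(flat))
```

A real evaluation would also need a fixed opponent pool and enough games per round for the difference to be statistically meaningful.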

Figures

Figures reproduced from arXiv: 2604.21896 by Chee Wei Tan, Shangxin Guo, Yuchen Wang.

Figure 1: Crowdsourcing and strategy optimization in game …
Figure 2: Shannon’s four types of game-playing machines and their typical examples of applications.
Figure 3: System Flow of Nemobot. Students program strategic …
Figure 4: UI of Nemobot. (a)&(b) Programs written on the coding pad are synchronously executed and rendered on the chat …
Figure 5: The chatbot interface of the Mancala game.
Figure 6: Training process illustrating the number of game trials …
Original abstract

This paper introduces a new paradigm for AI game programming, leveraging large language models (LLMs) to extend and operationalize Claude Shannon's taxonomy of game-playing machines. Central to this paradigm is Nemobot, an interactive agentic engineering environment that enables users to create, customize, and deploy LLM-powered game agents while actively engaging with AI-driven strategies. The LLM-based chatbot, integrated within Nemobot, demonstrates its capabilities across four distinct classes of games. For dictionary-based games, it compresses state-action mappings into efficient, generalized models for rapid adaptability. In rigorously solvable games, it employs mathematical reasoning to compute optimal strategies and generates human-readable explanations for its decisions. For heuristic-based games, it synthesizes strategies by combining insights from classical minimax algorithms (see, e.g., shannon1950chess) with crowd-sourced data. Finally, in learning-based games, it utilizes reinforcement learning with human feedback and self-critique to iteratively refine strategies through trial-and-error and imitation learning. Nemobot amplifies this framework by offering a programmable environment where users can experiment with tool-augmented generation and fine-tuning of strategic game agents. From strategic games to role-playing games, Nemobot demonstrates how AI agents can achieve a form of self-programming by integrating crowdsourced learning and human creativity to iteratively refine their own logic. This represents a step toward the long-term goal of self-programming AI.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces Nemobot, an interactive LLM-powered agentic environment for creating and customizing strategic game agents. It extends Shannon's game taxonomy by claiming that LLMs can (1) compress state-action mappings for dictionary games, (2) perform mathematical reasoning to derive optimal strategies and explanations in solvable games, (3) synthesize minimax algorithms with crowd-sourced data for heuristic games, and (4) apply reinforcement learning with human feedback and self-critique for iterative refinement in learning games. The central claim is that this framework enables a form of self-programming AI through integration of crowdsourced learning and human creativity.

Significance. If the claimed LLM behaviors were empirically validated with metrics, error rates, and comparisons, the work could offer a useful programmable testbed for studying interactive agent refinement and human-AI co-creation of strategies. The absence of any experimental results, datasets, success rates, or ablation studies means the significance remains speculative.

major comments (3)
  1. [Abstract] Abstract: The statement that the LLM 'demonstrates its capabilities across four distinct classes of games' is unsupported; the manuscript contains no experimental results, success rates, error analyses, ground-truth comparisons, or case studies for any of the four classes (dictionary-based compression, mathematical reasoning in solvable games, minimax+crowd synthesis, or RL+self-critique).
  2. [Learning-based games section] Description of learning-based games (final paragraph before conclusion): The claim that the system 'utilizes reinforcement learning with human feedback and self-critique to iteratively refine strategies' is presented without implementation details, reward formulation, critique mechanism, or any reported performance improvement over iterations, rendering the self-programming assertion untestable.
  3. [Heuristic-based games section] Heuristic-based games paragraph: The assertion that the LLM 'synthesizes strategies by combining insights from classical minimax algorithms with crowd-sourced data' lacks any description of the integration method, data sources, or verification that the resulting strategies outperform pure minimax or pure LLM baselines.
minor comments (2)
  1. The single citation (shannon1950chess) is given only as an example; the manuscript would benefit from additional references to recent LLM game-playing work (e.g., on tool-augmented agents or self-critique loops) to situate the contribution.
  2. Terminology such as 'state-action mappings,' 'self-programming,' and 'crowd-sourced data' is used without precise definitions or examples of how these are represented inside the Nemobot environment.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We agree that the manuscript is a conceptual introduction to the Nemobot framework and does not include empirical evaluations, implementation specifics, or performance metrics. We will revise the text to ensure all claims accurately reflect the proposed paradigm without implying completed experiments or validated results. Our responses to the major comments are provided below.

Point-by-point responses
  1. Referee: [Abstract] Abstract: The statement that the LLM 'demonstrates its capabilities across four distinct classes of games' is unsupported; the manuscript contains no experimental results, success rates, error analyses, ground-truth comparisons, or case studies for any of the four classes (dictionary-based compression, mathematical reasoning in solvable games, minimax+crowd synthesis, or RL+self-critique).

    Authors: We agree that the phrasing in the abstract implies empirical demonstration that is not supported by results in the manuscript. The paper introduces a conceptual framework extending Shannon's taxonomy via the Nemobot environment. We will revise the abstract to replace 'demonstrates its capabilities' with 'is designed to support' or 'outlines proposed applications of' the LLM across the four classes, and add an explicit statement that the work is a framework proposal without experimental validation.
    Revision: yes

  2. Referee: [Learning-based games section] Description of learning-based games (final paragraph before conclusion): The claim that the system 'utilizes reinforcement learning with human feedback and self-critique to iteratively refine strategies' is presented without implementation details, reward formulation, critique mechanism, or any reported performance improvement over iterations, rendering the self-programming assertion untestable.

    Authors: The referee correctly notes the absence of implementation details. This section describes the intended high-level mechanism for learning-based games within Nemobot, referencing established concepts like RLHF and self-critique. We will revise the paragraph to present this as a proposed integration rather than an implemented feature, add a statement that concrete reward formulations and mechanisms are left for future work, and adjust the self-programming language to 'contributes toward self-programming AI' to reflect the aspirational nature of the claim.
    Revision: yes

  3. Referee: [Heuristic-based games section] Heuristic-based games paragraph: The assertion that the LLM 'synthesizes strategies by combining insights from classical minimax algorithms with crowd-sourced data' lacks any description of the integration method, data sources, or verification that the resulting strategies outperform pure minimax or pure LLM baselines.

    Authors: We acknowledge that no specific integration method, data sources, or comparative verification is provided. The text describes a conceptual approach for heuristic games in the Nemobot platform. We will revise the paragraph to clarify that the LLM is proposed to interpret and incorporate crowd-sourced insights into minimax evaluations, remove any implication of outperformance or completed synthesis, and frame this as a direction enabled by the framework rather than a demonstrated capability.
    Revision: yes

Circularity Check

0 steps flagged

No circularity: purely descriptive framework with no derivations or self-referential reductions.

full rationale

The paper introduces Nemobot as an interactive environment and describes LLM behaviors across game classes without any equations, fitted parameters, mathematical derivations, or load-bearing self-citations. Claims about state-action compression, optimal strategy computation, minimax synthesis, and RL refinement are presented as observed capabilities of the system rather than results derived from first principles that reduce to the inputs by construction. The Shannon citation is external and historical, not self-referential. The central narrative of self-programming via crowdsourced learning is a high-level summary of the framework, not a chain that collapses to tautology or prior author work.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

The paper introduces no mathematical axioms, free parameters, or invented physical entities; it relies on the assumed capabilities of existing LLMs and standard game-AI techniques.

invented entities (1)
  • Nemobot (no independent evidence)
    purpose: Interactive agentic engineering environment for creating and deploying LLM-powered game agents
    Central system introduced in the paper with no independent evidence or external validation provided in the abstract.

Reference graph

Works this paper leans on

56 extracted references · 56 canonical work pages · 6 internal anchors

  1. [1] C. E. Shannon, “Programming a computer for playing chess,” Philosophical Magazine, vol. 41, pp. 256–275, March 1950.
  2. [2] C. E. Shannon, “Computers and automata,” Proceedings of the IRE, vol. 41, pp. 1234–1241, 1953.
  3. [3] M. Minsky, “Why programming is a good medium for expressing poorly understood and sloppily-formulated ideas,” in Design and Planning II: Computers in Design and Communication, M. Krampen and P. Seitz, Eds. New York: Hastings House Publishers, 1967, pp. 120–128.
  4. [4] S. Hu, T. Huang, F. Ilhan, S. Tekin, G. Liu, R. Kompella, and L. Liu, “A survey on large language model-based game agents,” arXiv preprint arXiv:2404.02039, 2024. [Online]. Available: https://arxiv.org/abs/2404.02039
  5. [5] Q. Huang, N. Wake, B. Sarkar, Z. Durante, R. Gong, R. Taori, Y. Noda, D. Terzopoulos, N. Kuno, A. Famoti, A. J. Llorens, J. Langford, H. Vo, F.-F. Li, K. Ikeuchi, and J. Gao, “Agent AI towards a holistic intelligence,” arXiv preprint arXiv:2403.00833, 2024, accessed: 2024-12-15. [Online]. Available: https://arxiv.org/abs/2403.00833
  6. [6] M. F. A. R. D. Team, A. Bakhtin, N. Brown, E. Dinan, G. Farina, C. Flaherty, D. Fried, A. Goff, J. Gray, H. Hu, A. P. Jacob, M. Komeili, K. Konath, M. Kwon, A. Lerer, and M. Lewis, “Human-level play in the game of diplomacy by combining language models with strategic reasoning,” Science, vol. 378, pp. 1067–1074, 2022.
  7. [7] C. Xiao and B. Z. Yang, “LLMs may not be human-level players, but they can be testers: Measuring game difficulty with LLM agents,” arXiv preprint arXiv:2410.02829, 2024. [Online]. Available: https://arxiv.org/abs/2410.02829
  8. [8] X. Feng, Y. Luo, Z. Wang, H. Tang, M. Yang, K. Shao, D. Mguni, Y. Du, and J. Wang, “ChessGPT: Bridging policy learning and language modeling,” in Advances in Neural Information Processing Systems, vol. 36, 2023.
  9. [9] DeepMind, “AlphaStar: Mastering the real-time strategy game StarCraft II,” 2019, https://deepmind.google/discover/blog/alphastar-mastering-the-real-time-strategy-game-starcraft-ii
  10. [10] J. Guo, B. Yang, P. Yoo, B. Y. Lin, Y. Iwasawa, and Y. Matsuo, “Suspicion-agent: Playing imperfect information games with theory of mind aware GPT-4,” in Conference on Logic and Machine Learning, 2024.
  11. [11] S. Toshniwal, S. Wiseman, K. Livescu, and K. Gimpel, “Chess as a testbed for language model state tracking,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, 2022, pp. 11 385–11 393.
  12. [12] K. Li, A. K. Hopkins, D. Bau, F. Viégas, H. Pfister, and M. Wattenberg, “Emergent world representations: Exploring a sequence model trained on a synthetic task,” in International Conference on Learning Representations, 2023.
  13. [13] K. Vafa, J. Y. Chen, A. Rambachan, J. Kleinberg, and S. Mullainathan, “Evaluating the world model implicit in a generative model,” in Advances in Neural Information Processing Systems, vol. 37, 2024, to appear in NeurIPS 2024.
  14. [14] X. Wang, J. Wei, D. Schuurmans, Q. V. Le, E. H. Chi, S. Narang, A. Chowdhery, and D. Zhou, “Self-consistency improves chain of thought reasoning in language models,” in International Conference on Learning Representations, 2023.
  15. [15] S. Yao, J. Zhao, D. Yu, N. Du, I. Shafran, K. Narasimhan, and Y. Cao, “ReAct: Synergizing reasoning and acting in language models,” in International Conference on Learning Representations, 2023.
  16. [16] M. Besta, N. Blach, A. Kubicek, R. Gerstenberger, M. Podstawski, L. Gianinazzi, J. Gajda, T. Lehmann, H. Niewiadomski, P. Nyczyk, and T. Hoefler, “Graph of thoughts: Solving elaborate problems with large language models,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 16, 2024, pp. 17 682–17 690.
  17. [17] Y. Wu, S. Prabhumoye, S. Y. Min, Y. Bisk, R. Salakhutdinov, A. Azaria, T. Mitchell, and Y. Li, “SPRING: Studying the paper and reasoning to play games,” in Advances in Neural Information Processing Systems, vol. 36, 2023.
  18. [18] M. Chen, J. Tworek, H. Jun, Q. Yuan, H. P. d. O. Pinto, J. Kaplan, H. Edwards, Y. Burda, N. Joseph, G. Brockman et al., “Evaluating large language models trained on code,” arXiv preprint arXiv:2107.03374, 2021.
  19. [19] N. Friedman, “Introducing GitHub Copilot: Your AI pair programmer,” https://github.com/features/copilot, 2021, accessed: 2023-07-08.
  20. [20] C. W. Tan, S. Guo, M. F. Wong, and C. N. Hang, “Copilot for Xcode: Exploring AI-assisted programming by prompting cloud-based large language models,” arXiv preprint arXiv:2307.14349, 2023.
  21. [21] Y. Wang, S. Guo, and C. W. Tan, “From code generation to software testing: AI copilot with context-based RAG,” IEEE Software, vol. 42, no. 4, pp. 34–42, 2025.
  22. [22] OpenAI, “Codex: A coding agent that helps you build and ship with AI—powered by ChatGPT,” https://openai.com/codex/, 2026, accessed: 2026-03-08.
  23. [23] Anthropic Claude, “Claude Code by Anthropic,” https://www.anthropic.com/product/claude-code, 2025, accessed: 2026-03-08.
  24. [24] OpenAI, J. Achiam, S. Adler, S. Agarwal, L. Ahmad, I. Akkaya, F. L. Aleman, D. Almeida, J. Altenschmidt, S. Altman et al., “GPT-4 technical report,” arXiv preprint arXiv:2303.08774, 2023. [Online]. Available: https://arxiv.org/abs/2303.08774
  25. [26] Gemini: A Family of Highly Capable Multimodal Models. [Online]. Available: https://arxiv.org/abs/2312.11805
  26. [27] Y. Wang, S. Guo, and C. W. Tan, “Contextual augmented multi-model programming (CAMP): A local-cloud copilot solution,” in 2025 IEEE Conference on Artificial Intelligence (CAI). IEEE, 2025, pp. 675–681.
  27. [28] C. E. Shannon, “Game playing machines,” Journal of the Franklin Institute, vol. 260, no. 6, pp. 447–453, December 1955.
  28. [29] A. Phillips, J. Lang, and D. Mould, “Goal-oriented interactions in games using LLMs,” IEEE Transactions on Games, vol. 17, no. 2, pp. 510–521, June 2025.
  29. [30] S. Bassanelli, A. Bucchiarone, and F. Gini, “GamiDOC: The importance of designing gamification in a proper way,” IEEE Transactions on Games, vol. 17, no. 1, pp. 13–31, March 2025.
  30. [31] Y. Wang, S. Guo, L. Ling, and C. W. Tan, “Nemobot: Crafting strategic gaming LLM agents for K-12 AI education,” in Proceedings of the Eleventh ACM Conference on Learning @ Scale, 2024, pp. 393–397.
  31. [32] M.-F. Wong, S. Guo, C.-N. Hang, S.-W. Ho, and C. W. Tan, “Natural language generation and understanding of big code for AI-assisted programming: A review,” Entropy, vol. 25, no. 6, p. 888, 2023.
  32. [33] M. F. Wong and C. W. Tan, “Aligning crowd-sourced human feedback for reinforcement learning on code generation by large language model,” in IEEE Transactions on Big Data, to appear, 2024.
  33. [34] A. J. Cole and A. J. T. Davie, “A game based on the Euclidean algorithm and a winning strategy for it,” Mathematical Gazette, vol. 53, no. 386, pp. 354–357, 1969.
  34. [35] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot et al., “Mastering the game of Go with deep neural networks and tree search,” Nature, vol. 529, pp. 484–489, 2016.
  35. [36] D. Silver, T. Hubert, J. Schrittwieser, I. Antonoglou, M. Lai, A. Guez, M. Lanctot, L. Sifre, D. Kumaran, T. Graepel, T. Lillicrap, K. Simonyan, and D. Hassabis, “A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play,” Science, vol. 362, no. 6419, pp. 1140–1144, 2018.
  36. [38] Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. [Online]. Available: https://arxiv.org/abs/2201.11903
  37. [39] A. G. Barto, “Reinforcement learning: Connections, surprises, challenges,” AI Magazine, vol. 40, no. 1, pp. 3–15, 2019.
  38. [40] D. Michie and R. A. Chambers, “Boxes: An experiment in adaptive control,” Machine Intelligence, vol. 2, no. 2, pp. 137–152, 1968.
  39. [41] D. Michie, “Machines and the theory of intelligence,” Nature, vol. 241, no. 23.02, p. 1973, 1973.
  40. [42] S. Jafarpour and A. R. C. J. C. Burges, “Filter, rank, and transfer the knowledge: Learning to chat,” Advances in Neural Information Processing Systems Workshop on Advances in Ranking, vol. 10, 2010.
  41. [43] D. Michie, “‘Memo’ functions and machine learning,” Nature, vol. 218, pp. 19–22, 1968.
  42. [44] D. Michie, “Discovery as collaboration: It takes two (at least) to tango,” Electronic Transactions on Artificial Intelligence, vol. 4, no. B, pp. 1–19, 2000.
  43. [45] P. F. Christiano, J. Leike, T. Brown, M. Martic, S. Legg, and D. Amodei, “Deep reinforcement learning from human preferences,” Advances in Neural Information Processing Systems, vol. 30, 2017.
  44. [46] R. Nakano, J. Hilton, S. Balaji, J. Wu, L. Ouyang, C. Kim, C. Hesse, S. Jain, V. Kosaraju, W. Saunders, X. Jiang, K. Cobbe, T. Eloundou, G. Krueger, K. Button, M. Knight, B. Chess, and J. Schulman, “WebGPT: Browser-assisted question-answering with human feedback,” CoRR, vol. abs/2112.09332, 2021. [Online]. Available: https://arxiv.org/abs/2112.09332
  45. [47] J. Li, C. W. Tan, C. Hang, and X. Qi, “A chatbot-server framework for scalable machine learning education through crowdsourced data,” in Proceedings of the Ninth ACM Conference on Learning @ Scale (L@S ’22). New York, NY, USA: ACM, 2022.
  46. [48] G. Kasparov and D. King, Kasparov Against the World: The Story of the Greatest Online Challenge. New York: KasparovChess Online, Inc., 2000.
  47. [49] OpenAI, “ChatGPT: Optimizing language models for dialogue,” Jan
  48. [50] [Online]. Available: https://openai.com/blog/chatgpt/
  49. [51] G. Wang, Y. Xie, Y. Jiang, A. Mandlekar, C. Xiao, Y. Zhu, L. Fan, and A. Anandkumar, “Voyager: An open-ended embodied agent with large language models,” arXiv preprint arXiv:2305.16291, 2023.
  50. [52] J. S. Park, J. C. O’Brien, C. J. Cai, M. R. Morris, P. Liang, and M. S. Bernstein, “Generative agents: Interactive simulacra of human behavior,” in Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, 2023, pp. 1–22.
  51. [53] Y. Wang, P.-D. Yu, and C. W. Tan, “Future-proofing programmers: Optimal knowledge tracing for AI-assisted personalized education,” IEEE Signal Processing Magazine, vol. 43, no. 1, pp. 69–82, 2026.
  52. [54] C. W. Tan, “Large language model-driven classroom flipping: Empowering student-centric peer questioning with flipped interaction,” CoRR, vol. abs/2311.14708, 2023. [Online]. Available: https://arxiv.org/abs/2311.14708
  53. [55] D. Michie, “Knowledge, learning and machine intelligence,” in Intelligent Systems. Springer, Boston, MA, 1993, pp. 63–79.
  54. [56] T. Tseng, E. McLean, K. Pelrine, T. T. Wang, and A. Gleave, “Can Go AIs be adversarially robust?” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 39, no. 26, 2025, p. 34980.
  55. [57] T. T. Wang, A. Gleave, T. Tseng, K. Pelrine, N. Belrose, J. Miller, M. D. Dennis, Y. Duan, V. Pogrebniak, S. Levine, and S. Russell, “Adversarial policies beat superhuman Go AIs,” in Proceedings of the 40th International Conference on Machine Learning, 2023, p. 202.
  56. [58] D. Donoho, “Data science at the singularity,” Harvard Data Science Review, vol. 6, no. 1, 2024.