Nemobot Games: Crafting Strategic AI Gaming Agents for Interactive Learning with Large Language Models

Chee Wei Tan; Shangxin Guo; Yuchen Wang

arxiv: 2604.21896 · v1 · submitted 2026-04-23 · 💻 cs.AI

Pith reviewed 2026-05-09 21:25 UTC · model grok-4.3

classification 💻 cs.AI

keywords: large language models, AI agents, game theory, self-programming, interactive learning, reinforcement learning, strategic AI, crowdsourced data

The pith

Nemobot lets large language models create self-improving AI game agents in four categories.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper introduces Nemobot, a platform for building game agents powered by large language models that can adapt and improve their strategies. It covers dictionary-based games where state mappings are compressed, solvable games using math for optimal play with explanations, heuristic games blending algorithms and crowd data, and learning games using reinforcement with feedback. If true, this would mean AI can iteratively refine its logic using human creativity and crowdsourced insights, making strategic AI more accessible and dynamic. Readers should care because it bridges traditional game theory with modern language models toward self-programming systems.

Core claim

The central claim is that Nemobot provides an interactive environment where LLM-powered agents demonstrate self-programming by integrating crowdsourced learning and human creativity to refine their own logic. For different game classes, the agents compress mappings for adaptability, compute optimal strategies with explanations, synthesize from minimax and crowd data, and refine via reinforcement learning and self-critique. This operationalizes an extended version of Shannon's game-playing machine taxonomy and represents progress toward self-programming AI.
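To make the dictionary-based case concrete, here is a minimal sketch of a lookup-table player whose table is compressed by canonicalizing states. The game (multi-heap Nim), the `canonical` helper, and the `DictionaryAgent` class are illustrative assumptions; none of them comes from the paper or from Nemobot itself.

```python
from typing import Dict, Optional, Tuple

State = Tuple[int, ...]   # heap sizes, e.g. (3, 1, 2)
Entry = Tuple[int, int]   # (size of the heap to take from, stones to take)


def canonical(state: State) -> State:
    """Sort heaps so that permutations of the same position share one key."""
    return tuple(sorted(state))


class DictionaryAgent:
    """Hypothetical lookup-table player; not the paper's implementation."""

    def __init__(self) -> None:
        self.table: Dict[State, Entry] = {}

    def remember(self, state: State, heap_index: int, take: int) -> None:
        # Store the move by heap size rather than heap position, so the
        # entry stays valid for every ordering of the same heaps.
        self.table[canonical(state)] = (state[heap_index], take)

    def lookup(self, state: State) -> Optional[Tuple[int, int]]:
        """Return (heap_index, take) for this ordering, or None if unseen."""
        entry = self.table.get(canonical(state))
        if entry is None:
            return None
        heap_size, take = entry
        return state.index(heap_size), take


agent = DictionaryAgent()
agent.remember((3, 1, 2), heap_index=0, take=3)
print(agent.lookup((2, 3, 1)))  # same position, different heap order -> (1, 3)
```

One stored entry answers all six orderings of the heaps, which is the simplest form of the "compressed mapping" idea: the table grows with the number of distinct positions, not with the number of ways to write them down.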

What carries the argument

Nemobot, the agentic engineering environment that uses LLMs for tool-augmented generation and fine-tuning to enable agents to self-program strategies in games.

If this is right

Agents can rapidly adapt to dictionary-based games by generalizing from compressed models.
Optimal strategies in solvable games come with human-readable decision explanations.
Strategies in heuristic games improve by merging classical algorithms with crowd insights.
Learning-based games see iterative refinement through human feedback and self-critique.
Users gain a programmable space to experiment with and deploy these strategic agents.
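As an illustration of what an optimal strategy with a human-readable explanation can look like in a solvable game, the sketch below uses normal-play Nim, whose winning rule (make the XOR of the heap sizes zero) is a standard result. Nim is chosen here only as a familiar example; the function name `nim_move` and the wording of the explanations are assumptions, not taken from the paper.

```python
from functools import reduce
from operator import xor
from typing import List, Optional, Tuple


def nim_move(heaps: List[int]) -> Tuple[Optional[Tuple[int, int]], str]:
    """Return ((heap_index, stones_to_remove), explanation) for normal-play Nim.

    The move is None when the position is already lost against best play.
    """
    nim_sum = reduce(xor, heaps, 0)
    if nim_sum == 0:
        return None, "Nim-sum is 0: every move hands the opponent a winning position."
    for i, h in enumerate(heaps):
        target = h ^ nim_sum
        if target < h:
            why = (f"Nim-sum is {nim_sum}; reducing heap {i} from {h} to "
                   f"{target} makes the nim-sum 0.")
            return (i, h - target), why
    raise AssertionError("unreachable: a nonzero nim-sum always admits a move")


move, why = nim_move([3, 4, 5])
print(move, why)
```

For heaps `[3, 4, 5]` the nim-sum is 2, so the function removes two stones from heap 0, leaving `[1, 4, 5]` with nim-sum 0. The explanation string is the part a learner would read; the arithmetic is what guarantees the move is optimal.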

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If the approach scales, it could enable similar self-refinement in non-game domains requiring strategy like negotiations or resource allocation.
Combining LLMs with traditional algorithms might lower the barrier for creating sophisticated AI systems without deep coding expertise.
Further tests on real-world interactions could show how crowdsourcing accelerates strategy evolution beyond simulated games.

Load-bearing premise

Large language models are capable of reliably compressing complex state-action mappings, performing accurate mathematical reasoning, synthesizing effective strategies from mixed sources, and self-improving through feedback without introducing critical errors.

What would settle it

A demonstration where the Nemobot agents fail to improve or produce suboptimal strategies in a learning-based game after multiple rounds of self-critique and human feedback would disprove the self-programming capability.
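One simple way to operationalize this test is to record the agent's win rate after each round of feedback and check whether the trend is positive. The sketch below fits a least-squares slope to such a series. The function names, the zero threshold, and the sample numbers are hypothetical; no win-rate data from the paper is used, and a serious evaluation would also need enough games per round to rule out noise.

```python
from typing import Sequence


def improvement_slope(win_rates: Sequence[float]) -> float:
    """Least-squares slope of win rate against round index (0, 1, 2, ...)."""
    n = len(win_rates)
    if n < 2:
        raise ValueError("need at least two rounds")
    mean_x = (n - 1) / 2
    mean_y = sum(win_rates) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(win_rates))
    var = sum((x - mean_x) ** 2 for x in range(n))
    return cov / var


def fails_to_improve(win_rates: Sequence[float], min_slope: float = 0.0) -> bool:
    """True when the fitted trend does not exceed the chosen threshold."""
    return improvement_slope(win_rates) <= min_slope


# Made-up series for illustration only.
print(fails_to_improve([0.2, 0.4, 0.6]))  # rising trend
print(fails_to_improve([0.5, 0.5, 0.5]))  # flat trend
```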

Figures

Figures reproduced from arXiv: 2604.21896 by Chee Wei Tan, Shangxin Guo, Yuchen Wang.

**Figure 2.** Shannon’s four types of game-playing machines and their typical examples of applications.

**Figure 3.** System Flow of Nemobot. Students program strategic …

**Figure 4.** UI of Nemobot. (a)&(b) Programs written on the coding pad are synchronously executed and rendered on the chat …

**Figure 5.** The chatbot interface of the Mancala game.

**Figure 6.** Training process illustrating the number of game trials …

Original abstract

This paper introduces a new paradigm for AI game programming, leveraging large language models (LLMs) to extend and operationalize Claude Shannon's taxonomy of game-playing machines. Central to this paradigm is Nemobot, an interactive agentic engineering environment that enables users to create, customize, and deploy LLM-powered game agents while actively engaging with AI-driven strategies. The LLM-based chatbot, integrated within Nemobot, demonstrates its capabilities across four distinct classes of games. For dictionary-based games, it compresses state-action mappings into efficient, generalized models for rapid adaptability. In rigorously solvable games, it employs mathematical reasoning to compute optimal strategies and generates human-readable explanations for its decisions. For heuristic-based games, it synthesizes strategies by combining insights from classical minimax algorithms (see, e.g., shannon1950chess) with crowd-sourced data. Finally, in learning-based games, it utilizes reinforcement learning with human feedback and self-critique to iteratively refine strategies through trial-and-error and imitation learning. Nemobot amplifies this framework by offering a programmable environment where users can experiment with tool-augmented generation and fine-tuning of strategic game agents. From strategic games to role-playing games, Nemobot demonstrates how AI agents can achieve a form of self-programming by integrating crowdsourced learning and human creativity to iteratively refine their own logic. This represents a step toward the long-term goal of self-programming AI.
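The abstract does not say how minimax search and crowd-sourced data are combined for heuristic-based games. As one possible reading, the sketch below runs a depth-limited minimax and adds a small bonus for moves that a crowd plays often, so the crowd mainly breaks ties between moves the search rates equally. The game (a single pile, remove 1 to 3 stones, taking the last stone wins), the `crowd_freq` dictionary, and the `weight` parameter are all assumptions made for illustration.

```python
from typing import Dict, List

MAX_TAKE = 3  # assumed rule: remove 1-3 stones; taking the last stone wins


def legal_moves(stones: int) -> List[int]:
    return [t for t in range(1, MAX_TAKE + 1) if t <= stones]


def minimax(stones: int, depth: int, maximizing: bool) -> float:
    """Depth-limited minimax value from the maximizing player's viewpoint."""
    if stones == 0:
        # The player to move faces an empty pile: the previous player won.
        return -1.0 if maximizing else 1.0
    if depth == 0:
        return 0.0  # placeholder heuristic for positions the search cannot resolve
    values = [minimax(stones - t, depth - 1, not maximizing)
              for t in legal_moves(stones)]
    return max(values) if maximizing else min(values)


def choose_move(stones: int, depth: int,
                crowd_freq: Dict[int, float], weight: float = 0.1) -> int:
    """Pick the move with the best search value plus a small crowd bonus."""
    def score(take: int) -> float:
        search = minimax(stones - take, depth - 1, maximizing=False)
        return search + weight * crowd_freq.get(take, 0.0)
    return max(legal_moves(stones), key=score)


# The crowd prefers taking 3, but search shows taking 1 wins outright.
print(choose_move(5, depth=5, crowd_freq={3: 1.0}))
# All moves lose against best play, so the crowd's favourite is chosen.
print(choose_move(4, depth=4, crowd_freq={1: 0.3, 2: 0.7}))
```

Keeping `weight` smaller than the gap between distinct search values means the crowd signal never overrides a move the search has proven better; a larger weight would trade that guarantee for more human-like play.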

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper describes an LLM game agent framework called Nemobot but supplies no evidence or results to support its claims about strategy computation and self-programming.

The key takeaway is that this paper presents Nemobot as a new interactive environment for creating LLM-based game agents by extending Shannon's taxonomy, but it makes strong claims about LLM capabilities without any experimental support or data. It does a decent job organizing game types into dictionary, solvable, heuristic, and learning categories, and suggests how LLMs could handle each: compressing states, reasoning mathematically, blending minimax with crowdsourcing, and using RL with self-critique. The interactive programmable aspect for users to experiment with agents is a practical idea for educational purposes in AI strategy.

The main weakness is the complete lack of validation. The abstract asserts these behaviors occur across the classes but provides no success rates, error analysis, comparisons to baselines, or even concrete examples. Given known issues with LLM reasoning consistency, the assumption that they can reliably derive optimal strategies or achieve self-programming through iteration is not demonstrated.

This work is aimed at practitioners or educators interested in building LLM tools for games and interactive learning. It could be useful as a starting point for ideas, but it doesn't offer new technical insights or reproducible findings. I wouldn't cite it or bring it to a reading group for serious discussion. It probably shouldn't go to peer review until it includes actual implementations, tests, and results to back up the framework.

Referee Report

3 major / 2 minor

Summary. The paper introduces Nemobot, an interactive LLM-powered agentic environment for creating and customizing strategic game agents. It extends Shannon's game taxonomy by claiming that LLMs can (1) compress state-action mappings for dictionary games, (2) perform mathematical reasoning to derive optimal strategies and explanations in solvable games, (3) synthesize minimax algorithms with crowd-sourced data for heuristic games, and (4) apply reinforcement learning with human feedback and self-critique for iterative refinement in learning games. The central claim is that this framework enables a form of self-programming AI through integration of crowdsourced learning and human creativity.

Significance. If the claimed LLM behaviors were empirically validated with metrics, error rates, and comparisons, the work could offer a useful programmable testbed for studying interactive agent refinement and human-AI co-creation of strategies. The absence of any experimental results, datasets, success rates, or ablation studies means the significance remains speculative.

major comments (3)

[Abstract] Abstract: The statement that the LLM 'demonstrates its capabilities across four distinct classes of games' is unsupported; the manuscript contains no experimental results, success rates, error analyses, ground-truth comparisons, or case studies for any of the four classes (dictionary-based compression, mathematical reasoning in solvable games, minimax+crowd synthesis, or RL+self-critique).
[Learning-based games section] Description of learning-based games (final paragraph before conclusion): The claim that the system 'utilizes reinforcement learning with human feedback and self-critique to iteratively refine strategies' is presented without implementation details, reward formulation, critique mechanism, or any reported performance improvement over iterations, rendering the self-programming assertion untestable.
[Heuristic-based games section] Heuristic-based games paragraph: The assertion that the LLM 'synthesizes strategies by combining insights from classical minimax algorithms with crowd-sourced data' lacks any description of the integration method, data sources, or verification that the resulting strategies outperform pure minimax or pure LLM baselines.

minor comments (2)

The single citation (shannon1950chess) is given only as an example; the manuscript would benefit from additional references to recent LLM game-playing work (e.g., on tool-augmented agents or self-critique loops) to situate the contribution.
Terminology such as 'state-action mappings,' 'self-programming,' and 'crowd-sourced data' is used without precise definitions or examples of how these are represented inside the Nemobot environment.

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We agree that the manuscript is a conceptual introduction to the Nemobot framework and does not include empirical evaluations, implementation specifics, or performance metrics. We will revise the text to ensure all claims accurately reflect the proposed paradigm without implying completed experiments or validated results. Our responses to the major comments are provided below.

Referee: [Abstract] Abstract: The statement that the LLM 'demonstrates its capabilities across four distinct classes of games' is unsupported; the manuscript contains no experimental results, success rates, error analyses, ground-truth comparisons, or case studies for any of the four classes (dictionary-based compression, mathematical reasoning in solvable games, minimax+crowd synthesis, or RL+self-critique).

Authors: We agree that the phrasing in the abstract implies empirical demonstration that is not supported by results in the manuscript. The paper introduces a conceptual framework extending Shannon's taxonomy via the Nemobot environment. We will revise the abstract to replace 'demonstrates its capabilities' with 'is designed to support' or 'outlines proposed applications of' the LLM across the four classes, and add an explicit statement that the work is a framework proposal without experimental validation.
revision: yes

Referee: [Learning-based games section] Description of learning-based games (final paragraph before conclusion): The claim that the system 'utilizes reinforcement learning with human feedback and self-critique to iteratively refine strategies' is presented without implementation details, reward formulation, critique mechanism, or any reported performance improvement over iterations, rendering the self-programming assertion untestable.

Authors: The referee correctly notes the absence of implementation details. This section describes the intended high-level mechanism for learning-based games within Nemobot, referencing established concepts like RLHF and self-critique. We will revise the paragraph to present this as a proposed integration rather than an implemented feature, add a statement that concrete reward formulations and mechanisms are left for future work, and adjust the self-programming language to 'contributes toward self-programming AI' to reflect the aspirational nature of the claim.
revision: yes

Referee: [Heuristic-based games section] Heuristic-based games paragraph: The assertion that the LLM 'synthesizes strategies by combining insights from classical minimax algorithms with crowd-sourced data' lacks any description of the integration method, data sources, or verification that the resulting strategies outperform pure minimax or pure LLM baselines.

Authors: We acknowledge that no specific integration method, data sources, or comparative verification is provided. The text describes a conceptual approach for heuristic games in the Nemobot platform. We will revise the paragraph to clarify that the LLM is proposed to interpret and incorporate crowd-sourced insights into minimax evaluations, remove any implication of outperformance or completed synthesis, and frame this as a direction enabled by the framework rather than a demonstrated capability.
revision: yes

Circularity Check

0 steps flagged

No circularity: purely descriptive framework with no derivations or self-referential reductions.

The paper introduces Nemobot as an interactive environment and describes LLM behaviors across game classes without any equations, fitted parameters, mathematical derivations, or load-bearing self-citations. Claims about state-action compression, optimal strategy computation, minimax synthesis, and RL refinement are presented as observed capabilities of the system rather than results derived from first principles that reduce to the inputs by construction. The Shannon citation is external and historical, not self-referential. The central narrative of self-programming via crowdsourced learning is a high-level summary of the framework, not a chain that collapses to tautology or prior author work.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

The paper introduces no mathematical axioms, free parameters, or invented physical entities; it relies on the assumed capabilities of existing LLMs and standard game-AI techniques.

invented entities (1)

Nemobot · no independent evidence
purpose: Interactive agentic engineering environment for creating and deploying LLM-powered game agents
Central system introduced in the paper with no independent evidence or external validation provided in the abstract.


Reference graph

Works this paper leans on

56 extracted references · 56 canonical work pages · 6 internal anchors

[1] C. E. Shannon, “Programming a computer for playing chess,” Philosophical Magazine, vol. 41, pp. 256–275, March 1950.
[2] ——, “Computers and automata,” Proceedings of the IRE, vol. 41, pp. 1234–1241, 1953.
[3] M. Minsky, “Why programming is a good medium for expressing poorly understood and sloppily-formulated ideas,” in Design and Planning II: Computers in Design and Communication, M. Krampen and P. Seitz, Eds. New York: Hastings House Publishers, 1967, pp. 120–128.
[4] S. Hu, T. Huang, F. Ilhan, S. Tekin, G. Liu, R. Kompella, and L. Liu, “A survey on large language model-based game agents,” arXiv preprint arXiv:2404.02039, 2024. [Online]. Available: https://arxiv.org/abs/2404.02039
[5] Q. Huang, N. Wake, B. Sarkar, Z. Durante, R. Gong, R. Taori, Y. Noda, D. Terzopoulos, N. Kuno, A. Famoti, A. J. Llorens, J. Langford, H. Vo, F.-F. Li, K. Ikeuchi, and J. Gao, “Agent AI towards a holistic intelligence,” arXiv preprint arXiv:2403.00833, 2024, accessed: 2024-12-15. [Online]. Available: https://arxiv.org/abs/2403.00833
[6] M. F. A. R. D. Team, A. Bakhtin, N. Brown, E. Dinan, G. Farina, C. Flaherty, D. Fried, A. Goff, J. Gray, H. Hu, A. P. Jacob, M. Komeili, K. Konath, M. Kwon, A. Lerer, and M. Lewis, “Human-level play in the game of diplomacy by combining language models with strategic reasoning,” Science, vol. 378, pp. 1067–1074, 2022.
[7] C. Xiao and B. Z. Yang, “LLMs may not be human-level players, but they can be testers: Measuring game difficulty with LLM agents,” arXiv preprint arXiv:2410.02829, 2024. [Online]. Available: https://arxiv.org/abs/2410.02829
[8] X. Feng, Y. Luo, Z. Wang, H. Tang, M. Yang, K. Shao, D. Mguni, Y. Du, and J. Wang, “ChessGPT: Bridging policy learning and language modeling,” in Advances in Neural Information Processing Systems, vol. 36, 2023.
[9] DeepMind, “Alphastar: Mastering the real-time strategy game starcraft ii,” 2019, https://deepmind.google/discover/blog/alphastar-mastering-the-real-time-strategy-game-starcraft-ii
[10] J. Guo, B. Yang, P. Yoo, B. Y. Lin, Y. Iwasawa, and Y. Matsuo, “Suspicion-agent: Playing imperfect information games with theory of mind aware gpt-4,” in Conference on Logic and Machine Learning, 2024.
[11] S. Toshniwal, S. Wiseman, K. Livescu, and K. Gimpel, “Chess as a testbed for language model state tracking,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, 2022, pp. 11385–11393.
[12] K. Li, A. K. Hopkins, D. Bau, F. Viégas, H. Pfister, and M. Wattenberg, “Emergent world representations: Exploring a sequence model trained on a synthetic task,” in International Conference on Learning Representations, 2023.
[13] K. Vafa, J. Y. Chen, A. Rambachan, J. Kleinberg, and S. Mullainathan, “Evaluating the world model implicit in a generative model,” in Advances in Neural Information Processing Systems, vol. 37, 2024, to appear in NeurIPS 2024.
[14] X. Wang, J. Wei, D. Schuurmans, Q. V. Le, E. H. Chi, S. Narang, A. Chowdhery, and D. Zhou, “Self-consistency improves chain of thought reasoning in language models,” in International Conference on Learning Representations, 2023.
[15] S. Yao, J. Zhao, D. Yu, N. Du, I. Shafran, K. Narasimhan, and Y. Cao, “ReAct: Synergizing reasoning and acting in language models,” in International Conference on Learning Representations, 2023.
[16] M. Besta, N. Blach, A. Kubicek, R. Gerstenberger, M. Podstawski, L. Gianinazzi, J. Gajda, T. Lehmann, H. Niewiadomski, P. Nyczyk, and T. Hoefler, “Graph of thoughts: Solving elaborate problems with large language models,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 16, 2024, pp. 17682–17690.
[17] Y. Wu, S. Prabhumoye, S. Y. Min, Y. Bisk, R. Salakhutdinov, A. Azaria, T. Mitchell, and Y. Li, “SPRING: Studying the paper and reasoning to play games,” in Advances in Neural Information Processing Systems, vol. 36, 2023.
[18] M. Chen, J. Tworek, H. Jun, Q. Yuan, H. P. d. O. Pinto, J. Kaplan, H. Edwards, Y. Burda, N. Joseph, G. Brockman et al., “Evaluating large language models trained on code,” arXiv preprint arXiv:2107.03374, 2021.
[19] N. Friedman, “Introducing GitHub Copilot: Your AI pair programmer,” https://github.com/features/copilot, 2021, accessed: 2023-07-08.
[20] C. W. Tan, S. Guo, M. F. Wong, and C. N. Hang, “Copilot for Xcode: Exploring AI-assisted programming by prompting cloud-based large language models,” arXiv preprint arXiv:2307.14349, 2023.
[21] Y. Wang, S. Guo, and C. W. Tan, “From code generation to software testing: AI copilot with context-based RAG,” IEEE Software, vol. 42, no. 4, pp. 34–42, 2025.
[22] OpenAI, “Codex: A coding agent that helps you build and ship with ai—powered by chatgpt,” https://openai.com/codex/, 2026, accessed: 2026-03-08.
[23] Anthropic Claude, “Claude Code by Anthropic,” https://www.anthropic.com/product/claude-code, 2025, accessed: 2026-03-08.
[24] OpenAI, J. Achiam, S. Adler, S. Agarwal, L. Ahmad, I. Akkaya, F. L. Aleman, D. Almeida, J. Altenschmidt, S. Altman et al., “GPT-4 technical report,” arXiv preprint arXiv:2303.08774, 2023. [Online]. Available: https://arxiv.org/abs/2303.08774
[26] Gemini: A Family of Highly Capable Multimodal Models. [Online]. Available: https://arxiv.org/abs/2312.11805
[27] Y. Wang, S. Guo, and C. W. Tan, “Contextual augmented multi-model programming (CAMP): A local-cloud copilot solution,” in 2025 IEEE Conference on Artificial Intelligence (CAI). IEEE, 2025, pp. 675–681.
[28] C. E. Shannon, “Game playing machines,” Journal of the Franklin Institute, vol. 260, no. 6, pp. 447–453, December 1955.
[29] A. Phillips, J. Lang, and D. Mould, “Goal-oriented interactions in games using LLMs,” IEEE Transactions on Games, vol. 17, no. 2, pp. 510–521, June 2025.
[30] S. Bassanelli, A. Bucchiarone, and F. Gini, “GamiDOC: The importance of designing gamification in a proper way,” IEEE Transactions on Games, vol. 17, no. 1, pp. 13–31, March 2025.
[31] Y. Wang, S. Guo, L. Ling, and C. W. Tan, “Nemobot: Crafting strategic gaming LLM agents for k-12 AI education,” in Proceedings of the Eleventh ACM Conference on Learning @ Scale, 2024, pp. 393–397.
[32] M.-F. Wong, S. Guo, C.-N. Hang, S.-W. Ho, and C. W. Tan, “Natural language generation and understanding of big code for AI-assisted programming: A review,” Entropy, vol. 25, no. 6, p. 888, 2023.
[33] M. F. Wong and C. W. Tan, “Aligning crowd-sourced human feedback for reinforcement learning on code generation by large language model,” in IEEE Transactions on Big Data, to appear, 2024.
[34] A. J. Cole and A. J. T. Davie, “A game based on the euclidean algorithm and a winning strategy for it,” Mathematical Gazette, vol. 53, no. 386, pp. 354–357, 1969.
[35] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot et al., “Mastering the game of go with deep neural networks and tree search,” Nature, vol. 529, pp. 484–489, 2016.
[36] D. Silver, T. Hubert, J. Schrittwieser, I. Antonoglou, M. Lai, A. Guez, M. Lanctot, L. Sifre, D. Kumaran, T. Graepel, T. Lillicrap, K. Simonyan, and D. Hassabis, “A general reinforcement learning algorithm that masters chess, shogi, and go through self-play,” Science, vol. 362, no. 6419, pp. 1140–1144, 2018.
[38] Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. [Online]. Available: https://arxiv.org/abs/2201.11903
[39] A. G. Barto, “Reinforcement learning: Connections, surprises, challenges,” AI Magazine, vol. 40, no. 1, pp. 3–15, 2019.
[40] D. Michie and R. A. Chambers, “Boxes: An experiment in adaptive control,” Machine intelligence, vol. 2, no. 2, pp. 137–152, 1968.
[41] D. Michie, “Machines and the theory of intelligence,” Nature, vol. 241, no. 23.02, p. 1973, 1973.
[42] S. Jafarpour and A. R. C. J. C. Burges, “Filter, rank, and transfer the knowledge: Learning to chat,” Advances in Neural Information Processing Systems Workshop on Advances in Ranking, vol. 10, 2010.
[43] D. Michie, “‘Memo’ functions and machine learning,” Nature, vol. 218, pp. 19–22, 1968.
[44] ——, “Discovery as collaboration: It takes two (at least) to tango,” Electronic Transactions on Artificial Intelligence, vol. 4, no. B, pp. 1–19, 2000.
[45] P. F. Christiano, J. Leike, T. Brown, M. Martic, S. Legg, and D. Amodei, “Deep reinforcement learning from human preferences,” Advances in Neural Information Processing Systems, vol. 30, 2017.
[46] R. Nakano, J. Hilton, S. Balaji, J. Wu, L. Ouyang, C. Kim, C. Hesse, S. Jain, V. Kosaraju, W. Saunders, X. Jiang, K. Cobbe, T. Eloundou, G. Krueger, K. Button, M. Knight, B. Chess, and J. Schulman, “WebGPT: Browser-assisted question-answering with human feedback,” CoRR, vol. abs/2112.09332, 2021. [Online]. Available: https://arxiv.org/abs/2112.09332
[47] J. Li, C. W. Tan, C. Hang, and X. Qi, “A chatbot-server framework for scalable machine learning education through crowdsourced data,” in Proceedings of the Ninth ACM Conference on Learning @ Scale (L@S ’22). New York, NY, USA: ACM, 2022.
[48] G. Kasparov and D. King, Kasparov Against the World: The Story of the Greatest Online Challenge. New York: KasparovChess Online, Inc., 2000.
[49] OpenAI, “Chatgpt: Optimizing language models for dialogue,” Jan
[50] [Online]. Available: https://openai.com/blog/chatgpt/
[51] G. Wang, Y. Xie, Y. Jiang, A. Mandlekar, C. Xiao, Y. Zhu, L. Fan, and A. Anandkumar, “Voyager: An open-ended embodied agent with large language models,” arXiv preprint arXiv:2305.16291, 2023.
[52] J. S. Park, J. C. O’Brien, C. J. Cai, M. R. Morris, P. Liang, and M. S. Bernstein, “Generative agents: Interactive simulacra of human behavior,” in Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, 2023, pp. 1–22.
[53] Y. Wang, P.-D. Yu, and C. W. Tan, “Future-proofing programmers: Optimal knowledge tracing for ai-assisted personalized education,” IEEE Signal Processing Magazine, vol. 43, no. 1, pp. 69–82, 2026.
[54] C. W. Tan, “Large language model-driven classroom flipping: Empowering student-centric peer questioning with flipped interaction,” CoRR, vol. abs/2311.14708, 2023. [Online]. Available: https://arxiv.org/abs/2311.14708
[55] D. Michie, “Knowledge, learning and machine intelligence,” in Intelligent Systems. Springer, Boston, MA, 1993, pp. 63–79.
[56] T. Tseng, E. McLean, K. Pelrine, T. T. Wang, and A. Gleave, “Can go AIs be adversarially robust?” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 39, no. 26, 2025, p. 34980.
[57] T. T. Wang, A. Gleave, T. Tseng, K. Pelrine, N. Belrose, J. Miller, M. D. Dennis, Y. Duan, V. Pogrebniak, S. Levine, and S. Russell, “Adversarial policies beat superhuman go AIs,” in Proceedings of the 40th International Conference on Machine Learning, 2023, p. 202.
[58] D. Donoho, “Data science at the singularity,” Harvard Data Science Review, vol. 6, no. 1, 2024.
