Nemobot Games: Crafting Strategic AI Gaming Agents for Interactive Learning with Large Language Models
Pith reviewed 2026-05-09 21:25 UTC · model grok-4.3
The pith
Nemobot lets large language models create self-improving AI game agents in four categories.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that Nemobot provides an interactive environment where LLM-powered agents demonstrate self-programming by integrating crowdsourced learning and human creativity to refine their own logic. The agents behave differently for each game class. In dictionary-based games they compress state-action mappings for rapid adaptability. In rigorously solvable games they compute optimal strategies and explain them. In heuristic games they synthesize strategies from minimax search and crowd-sourced data. In learning-based games they refine strategies through reinforcement learning and self-critique. This operationalizes an extended version of Shannon's taxonomy of game-playing machines and represents progress toward self-programming AI.
What carries the argument
Nemobot, the agentic engineering environment that uses LLMs for tool-augmented generation and fine-tuning to enable agents to self-program strategies in games.
If this is right
- Agents can rapidly adapt to dictionary-based games by generalizing from compressed models.
- Optimal strategies in solvable games come with human-readable decision explanations.
- Strategies in heuristic games improve by merging classical algorithms with crowd insights.
- Learning-based games see iterative refinement through human feedback and self-critique.
- Users gain a programmable space to experiment with and deploy these strategic agents.
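The first consequence can be made concrete. In a dictionary-based game an agent can play from an explicit state-to-action table. "Compression" then means replacing that table with a shorter rule that reproduces it. The sketch below is illustrative only and is not taken from the paper. The toy game and all function names are assumptions: a single pile from which a player removes 1, 2 or 3 counters, and whoever takes the last counter wins. The code builds the table by exhaustive search and checks that a one-line rule regenerates every entry.

```python
from __future__ import annotations

from functools import lru_cache

# Hypothetical toy game: one pile of n counters, a move removes 1, 2 or 3,
# and the player who takes the last counter wins.
MOVES = (1, 2, 3)


@lru_cache(maxsize=None)
def mover_wins(n: int) -> bool:
    """Exhaustive search: True if the player to move can force a win from n."""
    # At n == 0 the opponent has just taken the last counter, so any() over an
    # empty sequence correctly reports a loss for the player to move.
    return any(not mover_wins(n - m) for m in MOVES if m <= n)


def build_dictionary(max_n: int) -> dict[int, int | None]:
    """Explicit state -> action table; None marks a position that is already lost."""
    table: dict[int, int | None] = {}
    for n in range(1, max_n + 1):
        winning = [m for m in MOVES if m <= n and not mover_wins(n - m)]
        table[n] = winning[0] if winning else None
    return table


def compressed_policy(n: int) -> int | None:
    """One rule replacing the whole table: always leave a multiple of 4."""
    r = n % 4
    return r if r else None


table = build_dictionary(40)
assert all(table[n] == compressed_policy(n) for n in table)
print(f"{len(table)} table entries reproduced by a single modular rule")
```

For this game the check passes because the winning move from n is unique (take n mod 4). Any compressed model can be audited the same way, whether it is hand-written or LLM-generated: regenerate the table from the model and compare it entry by entry against the exhaustive one.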
Where Pith is reading between the lines
- If the approach scales, it could enable similar self-refinement in non-game domains requiring strategy like negotiations or resource allocation.
- Combining LLMs with traditional algorithms might lower the barrier for creating sophisticated AI systems without deep coding expertise.
- Further tests on real-world interactions could show how crowdsourcing accelerates strategy evolution beyond simulated games.
Load-bearing premise
Large language models are capable of reliably compressing complex state-action mappings, performing accurate mathematical reasoning, synthesizing effective strategies from mixed sources, and self-improving through feedback without introducing critical errors.
What would settle it
A demonstration where the Nemobot agents fail to improve or produce suboptimal strategies in a learning-based game after multiple rounds of self-critique and human feedback would disprove the self-programming capability.
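The test proposed above needs a score recorded after every refinement round. With such a record, "failure to improve" can be read off a curve rather than asserted. The harness below is a minimal sketch that rests on three explicit assumptions:

- The game is the same toy pile game as before: remove 1 to 3 counters, and the last counter wins.
- Evaluation is deterministic and runs against a perfect opponent.
- The refinement step is plain tabular Monte Carlo learning with a win/loss reward. It stands in for the RLHF and self-critique step, which the paper does not specify.

None of the names come from Nemobot.

```python
import random

# Hypothetical toy game: one pile, remove 1-3 counters, last counter wins.
MOVES = (1, 2, 3)
MAX_PILE = 20


def legal(n):
    return [m for m in MOVES if m <= n]


def evaluate(policy):
    """Deterministic score: share of winnable starting piles (n % 4 != 0)
    that `policy` converts into a win against a perfect opponent."""
    winnable = [n for n in range(1, MAX_PILE + 1) if n % 4]
    wins = 0
    for start in winnable:
        n, agent_to_move = start, True
        while n > 0:
            # The perfect opponent leaves a multiple of 4 when it can, else takes 1.
            m = policy(n) if agent_to_move else (n % 4 or 1)
            n -= m
            agent_to_move = not agent_to_move
        # Whoever moved last took the final counter; that was the agent
        # exactly when it is no longer the agent's turn.
        if not agent_to_move:
            wins += 1
    return wins / len(winnable)


def refine(rounds=10, episodes_per_round=500, epsilon=0.2, seed=0):
    """Tabular Monte Carlo learning (a stand-in for RLHF plus self-critique).
    Returns the evaluation score recorded after each round."""
    rng = random.Random(seed)
    value = {(n, m): 0.0 for n in range(1, MAX_PILE + 1) for m in legal(n)}
    visits = {key: 0 for key in value}

    def greedy(n):
        return max(legal(n), key=lambda m: value[(n, m)])

    history = []
    for _ in range(rounds):
        for _ in range(episodes_per_round):
            n, agent_to_move, trajectory = rng.randint(1, MAX_PILE), True, []
            while n > 0:
                if agent_to_move:
                    explore = rng.random() < epsilon
                    m = rng.choice(legal(n)) if explore else greedy(n)
                    trajectory.append((n, m))
                else:
                    m = rng.choice(legal(n))  # random training opponent
                n -= m
                agent_to_move = not agent_to_move
            reward = 1.0 if not agent_to_move else -1.0
            for key in trajectory:
                # Incremental mean of the returns observed after (state, action).
                visits[key] += 1
                value[key] += (reward - value[key]) / visits[key]
        history.append(evaluate(greedy))
    return history


print(refine())
```

A curve from refine() that stays flat or falls across rounds is the kind of evidence the criterion above calls for. Because the evaluator is deterministic, differences between rounds reflect the policy rather than evaluation noise. The learner here trains against a random opponent, so it is not guaranteed to reach a score of 1.0.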
Original abstract
This paper introduces a new paradigm for AI game programming, leveraging large language models (LLMs) to extend and operationalize Claude Shannon's taxonomy of game-playing machines. Central to this paradigm is Nemobot, an interactive agentic engineering environment that enables users to create, customize, and deploy LLM-powered game agents while actively engaging with AI-driven strategies. The LLM-based chatbot, integrated within Nemobot, demonstrates its capabilities across four distinct classes of games. For dictionary-based games, it compresses state-action mappings into efficient, generalized models for rapid adaptability. In rigorously solvable games, it employs mathematical reasoning to compute optimal strategies and generates human-readable explanations for its decisions. For heuristic-based games, it synthesizes strategies by combining insights from classical minimax algorithms (see, e.g., [1]) with crowd-sourced data. Finally, in learning-based games, it utilizes reinforcement learning with human feedback and self-critique to iteratively refine strategies through trial-and-error and imitation learning. Nemobot amplifies this framework by offering a programmable environment where users can experiment with tool-augmented generation and fine-tuning of strategic game agents. From strategic games to role-playing games, Nemobot demonstrates how AI agents can achieve a form of self-programming by integrating crowdsourced learning and human creativity to iteratively refine their own logic. This represents a step toward the long-term goal of self-programming AI.
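For rigorously solvable games, the abstract claims the agent computes optimal strategies and explains them. That claim can be checked mechanically whenever a closed-form solution exists. The reference list includes Cole and Davie's game based on the Euclidean algorithm [34]. The sketch below assumes the common formulation of that game: from a pair a ≥ b > 0, a move subtracts a positive multiple of b from a without going negative, and whoever produces a 0 wins. It also assumes the well-known criterion that the player to move wins exactly when a = b or a/b exceeds the golden ratio phi. The code verifies that rule against brute-force search. The template-based explain() is a hypothetical stand-in for an LLM-written explanation.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def mover_wins(a: int, b: int) -> bool:
    """Brute force for Euclid's game from (a, b) with a >= b > 0.
    A move replaces a by a - k*b for some k >= 1 with a - k*b >= 0;
    producing a 0 wins immediately."""
    for k in range(1, a // b + 1):
        r = a - k * b
        if r == 0 or not mover_wins(max(r, b), min(r, b)):
            return True
    return False


def golden_rule(a: int, b: int) -> bool:
    """Closed form: the mover wins iff a == b or a/b > phi.
    For positive a, b, a/b > phi is equivalent to a*a - a*b - b*b > 0."""
    return a == b or a * a - a * b - b * b > 0


def best_move(a: int, b: int):
    """Smallest winning multiplier k, or None if every move loses."""
    for k in range(1, a // b + 1):
        r = a - k * b
        if r == 0 or not golden_rule(max(r, b), min(r, b)):
            return k
    return None


def explain(a: int, b: int) -> str:
    """Template explanation (hypothetical stand-in for an LLM-written one)."""
    k = best_move(a, b)
    if k is None:
        return (f"({a}, {b}) is lost for the player to move: {a}/{b} is below phi, "
                "and every legal move hands the opponent a winning position.")
    r = a - k * b
    if r == 0:
        return f"From ({a}, {b}), subtract {k} x {b} to reach 0 and win."
    return (f"From ({a}, {b}), subtract {k} x {b} to leave ({max(r, b)}, {min(r, b)}); "
            "that ratio is below phi, so the opponent cannot force a win.")


# Check the closed form against exhaustive search on all small positions.
assert all(mover_wins(a, b) == golden_rule(a, b)
           for a in range(1, 80) for b in range(1, a + 1))
print(explain(5, 3))
print(explain(3, 2))
```

The integer test a*a - a*b - b*b > 0 is equivalent to a/b > phi because phi is the positive root of x^2 - x - 1. Using it avoids floating-point comparison. An LLM-produced explanation could be audited the same way, by checking its recommended move against mover_wins().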
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Nemobot, an interactive LLM-powered agentic environment for creating and customizing strategic game agents. It extends Shannon's game taxonomy by claiming that LLMs can (1) compress state-action mappings for dictionary games, (2) perform mathematical reasoning to derive optimal strategies and explanations in solvable games, (3) synthesize minimax algorithms with crowd-sourced data for heuristic games, and (4) apply reinforcement learning with human feedback and self-critique for iterative refinement in learning games. The central claim is that this framework enables a form of self-programming AI through integration of crowdsourced learning and human creativity.
Significance. If the claimed LLM behaviors were empirically validated with metrics, error rates, and comparisons, the work could offer a useful programmable testbed for studying interactive agent refinement and human-AI co-creation of strategies. The absence of any experimental results, datasets, success rates, or ablation studies means the significance remains speculative.
major comments (3)
- [Abstract] The statement that the LLM 'demonstrates its capabilities across four distinct classes of games' is unsupported; the manuscript contains no experimental results, success rates, error analyses, ground-truth comparisons, or case studies for any of the four classes (dictionary-based compression, mathematical reasoning in solvable games, minimax+crowd synthesis, or RL+self-critique).
- [Learning-based games section] Description of learning-based games (final paragraph before conclusion): The claim that the system 'utilizes reinforcement learning with human feedback and self-critique to iteratively refine strategies' is presented without implementation details, reward formulation, critique mechanism, or any reported performance improvement over iterations, rendering the self-programming assertion untestable.
- [Heuristic-based games section] Heuristic-based games paragraph: The assertion that the LLM 'synthesizes strategies by combining insights from classical minimax algorithms with crowd-sourced data' lacks any description of the integration method, data sources, or verification that the resulting strategies outperform pure minimax or pure LLM baselines.
minor comments (2)
- The single citation (shannon1950chess) is given only as an example; the manuscript would benefit from additional references to recent LLM game-playing work (e.g., on tool-augmented agents or self-critique loops) to situate the contribution.
- Terminology such as 'state-action mappings,' 'self-programming,' and 'crowd-sourced data' is used without precise definitions or examples of how these are represented inside the Nemobot environment.
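The third major comment notes that the manuscript does not say how minimax and crowd-sourced data are combined. The sketch below illustrates one option; it is not the authors' method. It runs exact minimax on tic-tac-toe. Crowd vote counts, held in a hypothetical dictionary from square index to votes, break ties only among moves with the same minimax value. Under this design crowd data can change which optimal move is played but can never replace an optimal move with a worse one. That is also the property a baseline comparison against pure minimax would need to check.

```python
from functools import lru_cache

# Board: 9-character string, squares 0-8 row by row, "." for empty.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def winner(board: str):
    for i, j, k in LINES:
        if board[i] != "." and board[i] == board[j] == board[k]:
            return board[i]
    return None


def play(board: str, square: int, player: str) -> str:
    return board[:square] + player + board[square + 1:]


def other(player: str) -> str:
    return "O" if player == "X" else "X"


@lru_cache(maxsize=None)
def minimax(board: str, player: str) -> int:
    """Exact game value from X's side: +1 X wins, 0 draw, -1 O wins."""
    w = winner(board)
    if w is not None:
        return 1 if w == "X" else -1
    if "." not in board:
        return 0
    values = [minimax(play(board, i, player), other(player))
              for i, c in enumerate(board) if c == "."]
    return max(values) if player == "X" else min(values)


def choose_move(board: str, player: str, crowd_votes: dict) -> int:
    """Minimax decides which moves are optimal; crowd votes only break ties.
    `crowd_votes` maps square -> vote count and is hypothetical data."""
    scored = {i: minimax(play(board, i, player), other(player))
              for i, c in enumerate(board) if c == "."}
    best = max(scored.values()) if player == "X" else min(scored.values())
    optimal = [i for i, v in scored.items() if v == best]
    # Among equally good moves, prefer the crowd's favorite (lowest index on ties).
    return max(optimal, key=lambda i: crowd_votes.get(i, 0))


print(choose_move("." * 9, "X", {4: 120, 0: 80}))
print(choose_move("XX.OO....", "X", {8: 1000}))
```

On the empty board every first move is a draw under perfect play, so the crowd's favorite square (4) decides. In the second position X wins immediately at square 2, and the crowd's vote for square 8 is ignored because that move is not optimal.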
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We agree that the manuscript is a conceptual introduction to the Nemobot framework and does not include empirical evaluations, implementation specifics, or performance metrics. We will revise the text to ensure all claims accurately reflect the proposed paradigm without implying completed experiments or validated results. Our responses to the major comments are provided below.
Point-by-point responses
- Referee: [Abstract] The statement that the LLM 'demonstrates its capabilities across four distinct classes of games' is unsupported; the manuscript contains no experimental results, success rates, error analyses, ground-truth comparisons, or case studies for any of the four classes (dictionary-based compression, mathematical reasoning in solvable games, minimax+crowd synthesis, or RL+self-critique).
  Authors: We agree that the phrasing in the abstract implies empirical demonstration that is not supported by results in the manuscript. The paper introduces a conceptual framework extending Shannon's taxonomy via the Nemobot environment. We will revise the abstract to replace 'demonstrates its capabilities' with 'is designed to support' or 'outlines proposed applications of' the LLM across the four classes, and add an explicit statement that the work is a framework proposal without experimental validation. Revision: yes.
- Referee: [Learning-based games section] Description of learning-based games (final paragraph before conclusion): The claim that the system 'utilizes reinforcement learning with human feedback and self-critique to iteratively refine strategies' is presented without implementation details, reward formulation, critique mechanism, or any reported performance improvement over iterations, rendering the self-programming assertion untestable.
  Authors: The referee correctly notes the absence of implementation details. This section describes the intended high-level mechanism for learning-based games within Nemobot, referencing established concepts like RLHF and self-critique. We will revise the paragraph to present this as a proposed integration rather than an implemented feature, add a statement that concrete reward formulations and mechanisms are left for future work, and adjust the self-programming language to 'contributes toward self-programming AI' to reflect the aspirational nature of the claim. Revision: yes.
- Referee: [Heuristic-based games section] Heuristic-based games paragraph: The assertion that the LLM 'synthesizes strategies by combining insights from classical minimax algorithms with crowd-sourced data' lacks any description of the integration method, data sources, or verification that the resulting strategies outperform pure minimax or pure LLM baselines.
  Authors: We acknowledge that no specific integration method, data sources, or comparative verification is provided. The text describes a conceptual approach for heuristic games in the Nemobot platform. We will revise the paragraph to clarify that the LLM is proposed to interpret and incorporate crowd-sourced insights into minimax evaluations, remove any implication of outperformance or completed synthesis, and frame this as a direction enabled by the framework rather than a demonstrated capability. Revision: yes.
Circularity Check
No circularity: purely descriptive framework with no derivations or self-referential reductions.
Full rationale
The paper introduces Nemobot as an interactive environment and describes LLM behaviors across game classes without any equations, fitted parameters, mathematical derivations, or load-bearing self-citations. Claims about state-action compression, optimal strategy computation, minimax synthesis, and RL refinement are presented as observed capabilities of the system rather than results derived from first principles that reduce to the inputs by construction. The Shannon citation is external and historical, not self-referential. The central narrative of self-programming via crowdsourced learning is a high-level summary of the framework, not a chain that collapses to tautology or prior author work.
Axiom & Free-Parameter Ledger
Invented entities (1)
- Nemobot: no independent evidence
Reference graph
Works this paper leans on
- [1] C. E. Shannon, “Programming a computer for playing chess,” Philosophical Magazine, vol. 41, pp. 256–275, March 1950.
- [2] C. E. Shannon, “Computers and automata,” Proceedings of the IRE, vol. 41, pp. 1234–1241, 1953.
- [3] M. Minsky, “Why programming is a good medium for expressing poorly understood and sloppily-formulated ideas,” in Design and Planning II: Computers in Design and Communication, M. Krampen and P. Seitz, Eds. New York: Hastings House Publishers, 1967, pp. 120–128.
- [4] S. Hu, T. Huang, F. Ilhan, S. Tekin, G. Liu, R. Kompella, and L. Liu, “A survey on large language model-based game agents,” arXiv preprint arXiv:2404.02039, 2024. [Online]. Available: https://arxiv.org/abs/2404.02039
- [5] Q. Huang, N. Wake, B. Sarkar, Z. Durante, R. Gong, R. Taori, Y. Noda, D. Terzopoulos, N. Kuno, A. Famoti, A. J. Llorens, J. Langford, H. Vo, F.-F. Li, K. Ikeuchi, and J. Gao, “Agent AI towards a holistic intelligence,” arXiv preprint arXiv:2403.00833, 2024, accessed: 2024-12-15. [Online]. Available: https://arxiv.org/abs/2403.00833
- [6] Meta Fundamental AI Research Diplomacy Team (FAIR), A. Bakhtin, N. Brown, E. Dinan, G. Farina, C. Flaherty, D. Fried, A. Goff, J. Gray, H. Hu, A. P. Jacob, M. Komeili, K. Konath, M. Kwon, A. Lerer, and M. Lewis, “Human-level play in the game of Diplomacy by combining language models with strategic reasoning,” Science, vol. 378, pp. 1067–1074, 2022.
- [7] C. Xiao and B. Z. Yang, “LLMs may not be human-level players, but they can be testers: Measuring game difficulty with LLM agents,” arXiv preprint arXiv:2410.02829, 2024. [Online]. Available: https://arxiv.org/abs/2410.02829
- [8] X. Feng, Y. Luo, Z. Wang, H. Tang, M. Yang, K. Shao, D. Mguni, Y. Du, and J. Wang, “ChessGPT: Bridging policy learning and language modeling,” in Advances in Neural Information Processing Systems, vol. 36, 2023.
- [9] DeepMind, “AlphaStar: Mastering the real-time strategy game StarCraft II,” 2019, https://deepmind.google/discover/blog/alphastar-mastering-the-real-time-strategy-game-starcraft-ii
- [10] J. Guo, B. Yang, P. Yoo, B. Y. Lin, Y. Iwasawa, and Y. Matsuo, “Suspicion-Agent: Playing imperfect information games with theory of mind aware GPT-4,” in Conference on Logic and Machine Learning, 2024.
- [11] S. Toshniwal, S. Wiseman, K. Livescu, and K. Gimpel, “Chess as a testbed for language model state tracking,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, 2022, pp. 11385–11393.
- [12] K. Li, A. K. Hopkins, D. Bau, F. Viégas, H. Pfister, and M. Wattenberg, “Emergent world representations: Exploring a sequence model trained on a synthetic task,” in International Conference on Learning Representations, 2023.
- [13] K. Vafa, J. Y. Chen, A. Rambachan, J. Kleinberg, and S. Mullainathan, “Evaluating the world model implicit in a generative model,” in Advances in Neural Information Processing Systems, vol. 37, 2024, to appear in NeurIPS 2024.
- [14] X. Wang, J. Wei, D. Schuurmans, Q. V. Le, E. H. Chi, S. Narang, A. Chowdhery, and D. Zhou, “Self-consistency improves chain of thought reasoning in language models,” in International Conference on Learning Representations, 2023.
- [15] S. Yao, J. Zhao, D. Yu, N. Du, I. Shafran, K. Narasimhan, and Y. Cao, “ReAct: Synergizing reasoning and acting in language models,” in International Conference on Learning Representations, 2023.
- [16] M. Besta, N. Blach, A. Kubicek, R. Gerstenberger, M. Podstawski, L. Gianinazzi, J. Gajda, T. Lehmann, H. Niewiadomski, P. Nyczyk, and T. Hoefler, “Graph of thoughts: Solving elaborate problems with large language models,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 16, 2024, pp. 17682–17690.
- [17] Y. Wu, S. Prabhumoye, S. Y. Min, Y. Bisk, R. Salakhutdinov, A. Azaria, T. Mitchell, and Y. Li, “SPRING: Studying the paper and reasoning to play games,” in Advances in Neural Information Processing Systems, vol. 36, 2023.
- [18] M. Chen, J. Tworek, H. Jun, Q. Yuan, H. P. d. O. Pinto, J. Kaplan, H. Edwards, Y. Burda, N. Joseph, G. Brockman et al., “Evaluating large language models trained on code,” arXiv preprint arXiv:2107.03374, 2021.
- [19] N. Friedman, “Introducing GitHub Copilot: Your AI pair programmer,” https://github.com/features/copilot, 2021, accessed: 2023-07-08.
- [20] C. W. Tan, S. Guo, M. F. Wong, and C. N. Hang, “Copilot for Xcode: Exploring AI-assisted programming by prompting cloud-based large language models,” arXiv preprint arXiv:2307.14349, 2023.
- [21] Y. Wang, S. Guo, and C. W. Tan, “From code generation to software testing: AI copilot with context-based RAG,” IEEE Software, vol. 42, no. 4, pp. 34–42, 2025.
- [22] OpenAI, “Codex: A coding agent that helps you build and ship with AI—powered by ChatGPT,” https://openai.com/codex/, 2026, accessed: 2026-03-08.
- [23] Anthropic Claude, “Claude Code by Anthropic,” https://www.anthropic.com/product/claude-code, 2025, accessed: 2026-03-08.
- [24] OpenAI, J. Achiam, S. Adler, S. Agarwal, L. Ahmad, I. Akkaya, F. L. Aleman, D. Almeida, J. Altenschmidt, S. Altman et al., “GPT-4 technical report,” arXiv preprint arXiv:2303.08774, 2023. [Online]. Available: https://arxiv.org/abs/2303.08774
- [26] “Gemini: A family of highly capable multimodal models.” [Online]. Available: https://arxiv.org/abs/2312.11805
- [27] Y. Wang, S. Guo, and C. W. Tan, “Contextual augmented multi-model programming (CAMP): A local-cloud copilot solution,” in 2025 IEEE Conference on Artificial Intelligence (CAI). IEEE, 2025, pp. 675–681.
- [28] C. E. Shannon, “Game playing machines,” Journal of the Franklin Institute, vol. 260, no. 6, pp. 447–453, December 1955.
- [29] A. Phillips, J. Lang, and D. Mould, “Goal-oriented interactions in games using LLMs,” IEEE Transactions on Games, vol. 17, no. 2, pp. 510–521, June 2025.
- [30] S. Bassanelli, A. Bucchiarone, and F. Gini, “GamiDOC: The importance of designing gamification in a proper way,” IEEE Transactions on Games, vol. 17, no. 1, pp. 13–31, March 2025.
- [31] Y. Wang, S. Guo, L. Ling, and C. W. Tan, “Nemobot: Crafting strategic gaming LLM agents for K-12 AI education,” in Proceedings of the Eleventh ACM Conference on Learning @ Scale, 2024, pp. 393–397.
- [32] M.-F. Wong, S. Guo, C.-N. Hang, S.-W. Ho, and C. W. Tan, “Natural language generation and understanding of big code for AI-assisted programming: A review,” Entropy, vol. 25, no. 6, p. 888, 2023.
- [33] M. F. Wong and C. W. Tan, “Aligning crowd-sourced human feedback for reinforcement learning on code generation by large language model,” in IEEE Transactions on Big Data, to appear, 2024.
- [34] A. J. Cole and A. J. T. Davie, “A game based on the Euclidean algorithm and a winning strategy for it,” Mathematical Gazette, vol. 53, no. 386, pp. 354–357, 1969.
- [35] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot et al., “Mastering the game of Go with deep neural networks and tree search,” Nature, vol. 529, pp. 484–489, 2016.
- [36] D. Silver, T. Hubert, J. Schrittwieser, I. Antonoglou, M. Lai, A. Guez, M. Lanctot, L. Sifre, D. Kumaran, T. Graepel, T. Lillicrap, K. Simonyan, and D. Hassabis, “A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play,” Science, vol. 362, no. 6419, pp. 1140–1144, 2018.
- [38] “Chain-of-thought prompting elicits reasoning in large language models.” [Online]. Available: https://arxiv.org/abs/2201.11903
- [39] A. G. Barto, “Reinforcement learning: Connections, surprises, challenges,” AI Magazine, vol. 40, no. 1, pp. 3–15, 2019.
- [40] D. Michie and R. A. Chambers, “BOXES: An experiment in adaptive control,” Machine Intelligence, vol. 2, no. 2, pp. 137–152, 1968.
- [41] D. Michie, “Machines and the theory of intelligence,” Nature, vol. 241, no. 23.02, p. 1973, 1973.
- [42] S. Jafarpour and A. R. C. J. C. Burges, “Filter, rank, and transfer the knowledge: Learning to chat,” Advances in Neural Information Processing Systems Workshop on Advances in Ranking, vol. 10, 2010.
- [43] D. Michie, “‘Memo’ functions and machine learning,” Nature, vol. 218, pp. 19–22, 1968.
- [44] D. Michie, “Discovery as collaboration: It takes two (at least) to tango,” Electronic Transactions on Artificial Intelligence, vol. 4, no. B, pp. 1–19, 2000.
- [45] P. F. Christiano, J. Leike, T. Brown, M. Martic, S. Legg, and D. Amodei, “Deep reinforcement learning from human preferences,” Advances in Neural Information Processing Systems, vol. 30, 2017.
- [46] R. Nakano, J. Hilton, S. Balaji, J. Wu, L. Ouyang, C. Kim, C. Hesse, S. Jain, V. Kosaraju, W. Saunders, X. Jiang, K. Cobbe, T. Eloundou, G. Krueger, K. Button, M. Knight, B. Chess, and J. Schulman, “WebGPT: Browser-assisted question-answering with human feedback,” CoRR, vol. abs/2112.09332, 2021. [Online]. Available: https://arxiv.org/abs/2112.09332
- [47] J. Li, C. W. Tan, C. Hang, and X. Qi, “A chatbot-server framework for scalable machine learning education through crowdsourced data,” in Proceedings of the Ninth ACM Conference on Learning @ Scale (L@S ’22). New York, NY, USA: ACM, 2022.
- [48] G. Kasparov and D. King, Kasparov Against the World: The Story of the Greatest Online Challenge. New York: KasparovChess Online, Inc., 2000.
- [49] OpenAI, “ChatGPT: Optimizing language models for dialogue,” Jan.
- [50] [Online]. Available: https://openai.com/blog/chatgpt/
- [51] G. Wang, Y. Xie, Y. Jiang, A. Mandlekar, C. Xiao, Y. Zhu, L. Fan, and A. Anandkumar, “Voyager: An open-ended embodied agent with large language models,” arXiv preprint arXiv:2305.16291, 2023.
- [52] J. S. Park, J. C. O’Brien, C. J. Cai, M. R. Morris, P. Liang, and M. S. Bernstein, “Generative agents: Interactive simulacra of human behavior,” in Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, 2023, pp. 1–22.
- [53] Y. Wang, P.-D. Yu, and C. W. Tan, “Future-proofing programmers: Optimal knowledge tracing for AI-assisted personalized education,” IEEE Signal Processing Magazine, vol. 43, no. 1, pp. 69–82, 2026.
- [54] C. W. Tan, “Large language model-driven classroom flipping: Empowering student-centric peer questioning with flipped interaction,” CoRR, vol. abs/2311.14708, 2023. [Online]. Available: https://arxiv.org/abs/2311.14708
- [55] D. Michie, “Knowledge, learning and machine intelligence,” in Intelligent Systems. Springer, Boston, MA, 1993, pp. 63–79.
- [56] T. Tseng, E. McLean, K. Pelrine, T. T. Wang, and A. Gleave, “Can Go AIs be adversarially robust?” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 39, no. 26, 2025, p. 34980.
- [57] T. T. Wang, A. Gleave, T. Tseng, K. Pelrine, N. Belrose, J. Miller, M. D. Dennis, Y. Duan, V. Pogrebniak, S. Levine, and S. Russell, “Adversarial policies beat superhuman Go AIs,” in Proceedings of the 40th International Conference on Machine Learning, 2023, p. 202.
- [58] D. Donoho, “Data science at the singularity,” Harvard Data Science Review, vol. 6, no. 1, 2024.