A Survey on Large Language Model-Based Game Agents

Fatih Ilhan; Gaowen Liu; Ling Liu; Ramana Rao Kompella; Selim Furkan Tekin; Sihao Hu; Tiansheng Huang; Yichang Xu; Zachary Yahn

arxiv: 2404.02039 · v5 · pith:WZZUKGBBnew · submitted 2024-04-02 · 💻 cs.AI

Sihao Hu, Tiansheng Huang, Gaowen Liu, Ramana Rao Kompella, Fatih Ilhan, Selim Furkan Tekin, Yichang Xu, Zachary Yahn, Ling Liu

keywords: game, agents, language, environments, large, level, memory, models

Game environments provide rich, controllable settings that simulate many aspects of real-world complexity. As such, game agents offer a valuable testbed for exploring capabilities relevant to Artificial General Intelligence. Recently, the emergence of Large Language Models (LLMs) provides new opportunities to endow these agents with generalizable reasoning, memory, and adaptability in complex game environments. This survey offers an up-to-date review of LLM-based game agents (LLMGAs) through a unified reference architecture. At the single-agent level, we synthesize existing studies around three core components: memory, reasoning, and perception-action interfaces, which jointly characterize how language enables agents to perceive, think, and act. At the multi-agent level, we outline how communication protocols and organizational models support coordination, role differentiation, and large-scale social behaviors. To contextualize these designs, we introduce a challenge-centered taxonomy linking six major game genres to their dominant agent requirements, from low-latency control in action games to open-ended goal formation in sandbox worlds. A curated list of related papers is available at https://github.com/git-disl/awesome-LLM-game-agent-papers

Forward citations

Cited by 13 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Scale-Dependent Collective Adaptation in Self-Amending LLM Societies: A Cross-Family Study of Emergent Governance
nlin.AO 2026-05 unverdicted novelty 7.0

LLM societies in Nomic show non-monotonic collective adaptation peaking at mid-scales, with smaller models rule-inert and larger ones restrictive.

Co-Evolving LLM Decision and Skill Bank Agents for Long-Horizon Tasks
cs.AI 2026-04 unverdicted novelty 7.0

COSPLAY co-evolves an LLM decision agent with a skill bank agent to improve long-horizon game performance, reporting over 25.1% average reward gains versus frontier LLM baselines on single-player benchmarks.

Open-Ended Video Game Glitch Detection with Agentic Reasoning and Temporal Grounding
cs.MA 2026-04 unverdicted novelty 7.0

Introduces the first benchmark for open-ended video game glitch detection with temporal localization and proposes GliDe, an agentic framework that achieves stronger performance than vanilla multimodal models.

Orak: A Foundational Benchmark for Training and Evaluating LLM Agents on Diverse Video Games
cs.AI 2025-06 unverdicted novelty 7.0

Orak is a foundational benchmark providing training data, interfaces, and evaluation tools for LLM agents across diverse video game genres.

A Generative AI Driven Interactive Narrative Serious Game for Stress Relief and Its Randomized Controlled Pilot Study
cs.HC 2026-05 unverdicted novelty 5.0

Reverie is a new AI-powered game that reduced stress levels in a pilot study of 20 students while providing excellent user experience and improved cognitive emotion regulation.

A Generative AI Driven Interactive Narrative Serious Game for Stress Relief and Its Randomized Controlled Pilot Study
cs.HC 2026-05 unverdicted novelty 5.0

Pilot study of a ChatGPT-driven narrative game found significant stress reduction (p=0.016) and positive user experience among 20 stressed students.

IPR-1: Interactive Physical Reasoner
cs.AI 2025-11 unverdicted novelty 5.0

IPR uses world-model rollouts to reinforce a VLM policy via PhysCode on a 1000+ game benchmark, achieving robust physical reasoning that improves with experience and transfers zero-shot to unseen games while surpassing GPT-5.

In Context Learning and Reasoning for Symbolic Regression with Large Language Models
cs.CL 2024-10 unverdicted novelty 5.0

GPT-4 models rediscover Langmuir isotherms and produce fits on Nikuradse pipe-flow data via iterative chain-of-thought prompting with scientific context and external code feedback.

Nemobot Games: Crafting Strategic AI Gaming Agents for Interactive Learning with Large Language Models
cs.AI 2026-04 unverdicted novelty 4.0

Nemobot is an LLM-powered platform for creating and refining strategic game agents across dictionary, solvable, heuristic, and learning-based games, moving toward self-programming AI.

When control meets large language models: From words to dynamics
eess.SY 2026-02 unverdicted novelty 3.0

The paper proposes a bidirectional continuum between LLMs and control systems, covering LLM-assisted controller design, control-based LLM steering, and state-space modeling of LLMs.

Large Language Model Agent: A Survey on Methodology, Applications and Challenges
cs.CL 2025-03 accept novelty 3.0

A survey that deconstructs LLM agent systems via a methodology-centered taxonomy linking design principles to emergent behaviors, applications, and challenges.

Personal LLM Agents: Insights and Survey about the Capability, Efficiency and Security
cs.HC 2024-01 unverdicted novelty 3.0

This survey discusses key components and challenges for Personal LLM Agents and reviews solutions for their capability, efficiency, and security.

Harmful Fine-tuning Attacks and Defenses for Large Language Models: A Survey
cs.CR 2024-09 unverdicted novelty 2.0

Survey of harmful fine-tuning attacks on LLMs, their variants, defense strategies, mechanical analysis, and evaluation methodologies.