Generative Agents: Interactive Simulacra of Human Behavior
Pith reviewed 2026-05-11 18:58 UTC · model grok-4.3
The pith
Generative agents extend large language models with memory, reflection, and planning to produce believable individual and social behaviors in a simulated town.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Generative agents are software agents that simulate believable human behavior by extending a large language model to maintain a full record of experiences in natural language, synthesize those records into periodic reflections, and retrieve memories dynamically when planning the next actions. When placed in an interactive environment modeled on The Sims, twenty-five such agents exhibit consistent daily activities and produce unscripted social patterns, including autonomously organizing a multi-day party from a single starting instruction.
What carries the argument
The three-part architecture of observation (storing experiences), reflection (synthesizing memories into higher-level summaries), and planning (retrieving relevant memories to decide actions) layered on top of a large language model.
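The three components can be pictured as a loop over one shared list of natural-language records. The following is a minimal sketch under stated assumptions, not the paper's implementation: the `Agent` class, its method names, the `llm` callable, and `fake_llm` are hypothetical, and retrieval is reduced to "the k most recent records" as a placeholder.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Agent:
    """Hypothetical agent; names and structure are illustrative assumptions."""
    name: str
    llm: Callable[[str], str]  # any text-in, text-out model (assumed interface)
    memory: List[str] = field(default_factory=list)

    def observe(self, event: str) -> None:
        # Observation: store the experience as a natural-language record.
        self.memory.append(f"[observation] {event}")

    def reflect(self, window: int = 3) -> str:
        # Reflection: synthesize recent records into a higher-level summary
        # and write that summary back into memory.
        recent = "\n".join(self.memory[-window:])
        summary = self.llm(f"Summarize what {self.name} has learned:\n{recent}")
        self.memory.append(f"[reflection] {summary}")
        return summary

    def plan(self, k: int = 3) -> str:
        # Planning: retrieve memories (placeholder: the k most recent)
        # and ask the model for the next action.
        context = "\n".join(self.memory[-k:])
        return self.llm(f"Given these memories, what does {self.name} do next?\n{context}")


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call so the sketch runs offline.
    return f"response to {len(prompt.splitlines())} lines"


agent = Agent(name="agent_a", llm=fake_llm)
agent.observe("agent_a wants to throw a Valentine's Day party")
agent.observe("agent_b accepted the invitation")
agent.reflect()
next_action = agent.plan()
```

Because reflections are written back into the same list as observations, later planning calls can draw on both kinds of record, which is the property the architecture description emphasizes.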
If this is right
- End users can steer the simulation through ordinary natural-language instructions rather than code-level scripting.
- Individual agents sustain believable routines and social awareness over multi-day periods through accumulated memory.
- Social events and relationships can emerge at the group level without any agent being explicitly told the full plan.
- Removing any one of the three components (observation, reflection, or planning) measurably reduces the believability of agent actions.
- The same architecture can be applied to other interactive domains such as communication rehearsal or design prototyping.
Where Pith is reading between the lines
- Simulations of this kind could support safer testing of social policies or training scenarios before real-world deployment.
- Scaling the approach to hundreds of agents or weeks of simulated time may require additional mechanisms to prevent memory drift.
- The method offers a concrete testbed for studying how language-model consistency affects perceived human-likeness in longer interactions.
- Similar memory-and-reflection loops could be added to other agent systems to improve long-term coherence in everyday assistant tasks.
Load-bearing premise
Large language models will keep producing coherent, non-contradictory actions and dialogues that stay consistent with an agent's growing memory and personality across many simulated days without outside fixes.
What would settle it
A run of the town simulation in which agents begin to issue contradictory statements about past events or fail to coordinate attendance at the party despite the initial prompt, resulting in visibly incoherent group behavior.
Original abstract
Believable proxies of human behavior can empower interactive applications ranging from immersive environments to rehearsal spaces for interpersonal communication to prototyping tools. In this paper, we introduce generative agents--computational software agents that simulate believable human behavior. Generative agents wake up, cook breakfast, and head to work; artists paint, while authors write; they form opinions, notice each other, and initiate conversations; they remember and reflect on days past as they plan the next day. To enable generative agents, we describe an architecture that extends a large language model to store a complete record of the agent's experiences using natural language, synthesize those memories over time into higher-level reflections, and retrieve them dynamically to plan behavior. We instantiate generative agents to populate an interactive sandbox environment inspired by The Sims, where end users can interact with a small town of twenty five agents using natural language. In an evaluation, these generative agents produce believable individual and emergent social behaviors: for example, starting with only a single user-specified notion that one agent wants to throw a Valentine's Day party, the agents autonomously spread invitations to the party over the next two days, make new acquaintances, ask each other out on dates to the party, and coordinate to show up for the party together at the right time. We demonstrate through ablation that the components of our agent architecture--observation, planning, and reflection--each contribute critically to the believability of agent behavior. By fusing large language models with computational, interactive agents, this work introduces architectural and interaction patterns for enabling believable simulations of human behavior.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces generative agents: LLM-augmented software agents that maintain a natural-language memory stream of experiences, synthesize reflections, and retrieve context to plan actions. These agents are placed in a Sims-inspired sandbox with 25 agents; the central claim is that they produce believable individual and emergent social behaviors, illustrated by a scenario in which a single user-specified desire to host a Valentine's Day party leads to autonomous invitation spreading, new acquaintances, date invitations, and coordinated attendance over two simulated days. Ablation experiments and a human evaluation with 100 participants are presented to show that observation, planning, and reflection each contribute to believability.
Significance. If the results hold, the work offers a concrete architectural pattern for believable interactive simulacra with clear applications in virtual environments, social rehearsal tools, and HCI prototyping. The explicit use of natural-language memory and reflection to enable emergent multi-agent coordination without hand-crafted rules is a notable strength, and the ablations provide direct evidence that each component matters. The absence of fitted parameters or circular evaluation metrics further supports the claim's internal validity.
major comments (2)
- [§5] §5 (Evaluation): The human evaluation and ablation results are tied to a single small-scale scenario (25 agents, ~2 days). While this suffices to demonstrate the Valentine's party example, it leaves the general claim of 'believable individual and emergent social behaviors' under-supported; additional independent scenarios or quantitative metrics of behavioral diversity would be needed to establish robustness.
- [§4, §5] §4 (Architecture) and §5: The architecture stores experiences as text, retrieves by embedding similarity, and generates plans/reflections via single LLM calls, yet no mechanism or metric is reported for detecting or recovering from contradictions between new outputs and prior memories. The ablations show reflection improves the target behavior, but the paper does not measure contradiction rate or long-term consistency across the multi-day run, which is load-bearing for the autonomous coordination claim.
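To make "retrieves by embedding similarity" concrete, here is a small sketch of top-k retrieval. It assumes cosine similarity as the measure, which the report does not specify, and it uses hand-written three-dimensional vectors in place of a real embedding model; the function names `cosine` and `retrieve` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # Cosine similarity; returns 0.0 if either vector has zero length.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(
    query_vec: Sequence[float],
    memories: List[Tuple[str, Sequence[float]]],
    k: int,
) -> List[str]:
    # Rank stored (text, vector) pairs by similarity to the query, keep the top k.
    ranked = sorted(memories, key=lambda m: cosine(query_vec, m[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy vectors (assumed); a real system would compute them with an embedding model.
memories = [
    ("planning a Valentine's Day party", [0.9, 0.1, 0.0]),
    ("cooked breakfast", [0.0, 0.2, 0.9]),
    ("invited a neighbor to the party", [0.8, 0.3, 0.1]),
]
query = [1.0, 0.0, 0.0]
top = retrieve(query, memories, k=2)
```

Note that nothing in this ranking step checks whether two retrieved records disagree with each other, which is the gap the comment above points at.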
minor comments (2)
- The exact prompts, retrieval count, and reflection interval values used in the reported runs should be listed explicitly (perhaps in an appendix) to aid reproducibility.
- Figure 2 or the environment description: clarify how the sandbox renders agent observations and handles concurrent actions to make the interaction loop fully transparent.
Simulated Author's Rebuttal
We thank the referee for the positive summary, significance assessment, and recommendation for minor revision. We address the two major comments point by point below, proposing targeted textual revisions that strengthen the manuscript while remaining within the scope of a minor revision.
Point-by-point responses
- Referee: [§5] §5 (Evaluation): The human evaluation and ablation results are tied to a single small-scale scenario (25 agents, ~2 days). While this suffices to demonstrate the Valentine's party example, it leaves the general claim of 'believable individual and emergent social behaviors' under-supported; additional independent scenarios or quantitative metrics of behavioral diversity would be needed to establish robustness.
  Authors: We agree that the primary evaluation and human study are anchored to one extended scenario. This scenario was chosen because it produces a clear, observable chain of emergent social behaviors (invitation spreading, new acquaintances, date invitations, and coordinated attendance) that can be directly attributed to the architecture rather than hand-crafted rules. The ablation results and the 100-participant human evaluation provide direct evidence that observation, planning, and reflection each improve believability within this setting. In the revised manuscript we will (1) add brief qualitative descriptions of other observed behaviors from the same sandbox runs (e.g., daily routines, opinion formation, and spontaneous conversations) to illustrate individual believability beyond the party, and (2) expand the §5 discussion to explicitly state the evaluation's scope and to identify the collection of additional independent scenarios and quantitative diversity metrics as valuable future work. These changes clarify the current evidence without introducing new experiments.
  Revision: partial
- Referee: [§4, §5] §4 (Architecture) and §5: The architecture stores experiences as text, retrieves by embedding similarity, and generates plans/reflections via single LLM calls, yet no mechanism or metric is reported for detecting or recovering from contradictions between new outputs and prior memories. The ablations show reflection improves the target behavior, but the paper does not measure contradiction rate or long-term consistency across the multi-day run, which is load-bearing for the autonomous coordination claim.
  Authors: We acknowledge that the architecture contains no explicit contradiction-detection or consistency-enforcement module; coherence is delegated to the LLM's conditioning on retrieved memories and to the synthesis performed during reflection. In the two-day runs we observed that reflection helped agents maintain coherent plans that supported the observed coordination, but we did not compute contradiction rates or long-term consistency statistics. In the revision we will (1) add a short paragraph in §4 describing how memory retrieval and reflection are intended to promote consistency in practice, and (2) insert a limitations subsection in §5 that notes the lack of quantitative consistency metrics and lists explicit contradiction checking as a promising direction for future agent architectures. These additions directly address the referee's concern while accurately reflecting what was implemented and measured.
  Revision: partial
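Neither the referee nor the authors define "contradiction rate". One plausible reading, offered purely as an illustration, is the fraction of an agent's statements that conflict with at least one stored memory. The sketch below assumes that definition; `contradiction_rate` and `toy_judge` are hypothetical names, and the string-matching judge is a placeholder for whatever checker (human raters, an entailment model) a real study would use.

```python
from typing import Callable, List


def contradiction_rate(
    statements: List[str],
    memories: List[str],
    contradicts: Callable[[str, str], bool],
) -> float:
    """Fraction of statements that contradict at least one stored memory."""
    if not statements:
        return 0.0
    flagged = sum(
        1 for s in statements if any(contradicts(s, m) for m in memories)
    )
    return flagged / len(statements)


def toy_judge(statement: str, memory: str) -> bool:
    # Placeholder judge: flags "X is not Y" against a stored memory "X is Y".
    return " is not " in statement and statement.replace(" is not ", " is ") == memory


memories = ["the party is on February 14"]
statements = [
    "the party is on February 14",
    "the party is not on February 14",
    "the cafe is open",
    "the cafe is not open",
]
rate = contradiction_rate(statements, memories, toy_judge)
```

In this toy data only the second statement conflicts with a stored memory, so one of four statements is flagged. The fourth statement is not counted because no memory about the cafe exists, which shows that this definition measures conflict with memory rather than internal inconsistency among statements.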
Circularity Check
No circularity: architecture and qualitative demonstration are self-contained
Full rationale
The paper defines an agent architecture (memory storage as natural-language records, LLM-based reflection synthesis, embedding retrieval for planning) and demonstrates emergent behaviors via a single run in a 25-agent sandbox. No equations, fitted parameters, or quantitative predictions are claimed; the Valentine's Day party example is an observed output of the described LLM calls rather than a reduction to prior data or self-citation. Ablations compare component removals but remain empirical observations of the same generative process. The derivation chain contains no self-definitional, fitted-input, or uniqueness-imported steps.
Axiom & Free-Parameter Ledger
free parameters (2)
- reflection interval
- retrieval count
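The two free parameters could be grouped in a small configuration object. This is a sketch with placeholder values, not the settings used in the paper; `AgentConfig` and `reflection_steps` are hypothetical names, and treating the reflection interval as a count of observations is an assumption (it could equally be time-based or triggered by some other condition).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AgentConfig:
    # Both defaults are placeholders, not values reported in the paper.
    reflection_interval: int = 5  # assumed: reflect after this many observations
    retrieval_count: int = 3      # memories passed to each planning call


def reflection_steps(num_observations: int, cfg: AgentConfig) -> List[int]:
    """Return the 1-based observation indices at which reflection would fire."""
    return [
        i for i in range(1, num_observations + 1)
        if i % cfg.reflection_interval == 0
    ]


cfg = AgentConfig()
steps = reflection_steps(12, cfg)
```

Making both values explicit in one place is also what the minor comment on reproducibility asks for.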
axioms (1)
- Domain assumption: Large language models can produce coherent, personality-consistent plans and reflections when given a natural-language memory context.
invented entities (2)
- Memory stream: no independent evidence
- Reflection mechanism: no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith.Foundation.HierarchyEmergence · hierarchy_emergence_forces_phi
  Tag: unclear (relation between the paper passage and the cited Recognition theorem)
  Passage: "starting with only a single user-specified notion that one agent wants to throw a Valentine's Day party, the agents autonomously spread invitations to the party over the next two days, make new acquaintances, ask each other out on dates to the party, and coordinate to show up for the party together at the right time"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 52 Pith papers
-
SimWorld Studio: Automatic Environment Generation with Evolving Coding Agent for Embodied Agent Learning
SimWorld Studio deploys an evolving coding agent to create adaptive 3D environments that co-evolve with embodied learners, delivering 18-point success-rate gains over fixed environments in navigation benchmarks.
-
SimWorld Studio: Automatic Environment Generation with Evolving Coding Agent for Embodied Agent Learning
SimWorld Studio uses a self-evolving coding agent to generate adaptive 3D environments that improve embodied agent performance, with reported gains of 18 points over fixed environments in navigation tasks.
-
OccuBench: Evaluating AI Agents on Real-World Professional Tasks via Language Environment Simulation
OccuBench is a new benchmark for AI agents on real-world occupational tasks via LLM-driven simulators, showing no model dominates all industries, implicit faults are hardest, and larger models with more reasoning perf...
-
AgentSocialBench: Evaluating Privacy Risks in Human-Centered Agentic Social Networks
AgentSocialBench demonstrates that privacy preservation is fundamentally harder in human-centered agentic social networks than in single-agent cases due to cross-domain coordination pressures and an abstraction parado...
-
Why Do Multi-Agent LLM Systems Fail?
The authors create the first large-scale dataset and taxonomy of failure modes in multi-agent LLM systems to explain their limited performance gains.
-
ScioMind: Cognitively Grounded Multi-Agent Social Simulation with Anchoring-Based Belief Dynamics and Dynamic Profiles
ScioMind combines anchoring-based belief updates, hierarchical memory, and dynamic profiles in LLM multi-agent systems to produce more stable, diverse, and psychologically aligned opinion trajectories than prior fixed...
-
MMSkills: Towards Multimodal Skills for General Visual Agents
MMSkills creates compact multimodal skill packages from trajectories and uses a branch-loaded agent to improve visual decision-making on GUI and game benchmarks.
-
Mechanism Plausibility in Generative Agent-Based Modeling
Introduces the Mechanism Plausibility Scale to distinguish generative sufficiency from mechanistic plausibility in LLM-based agent-based models.
-
Internal vs. External: Comparing Deliberation and Evolution for Multi-Agent Constitutional Design
External evolution beats internal deliberation in collective-action tasks with statistical significance but neither helps in trading, and deliberation never discovers punishment while evolution does.
-
NARRA-Gym for Evaluating Interactive Narrative Agents
NARRA-Gym is an executable benchmark that generates complete interactive narrative episodes from emotional seeds and logs full model trajectories to expose gaps in coherence, adaptation, and personalization that stati...
-
Agent Island: A Saturation- and Contamination-Resistant Benchmark from Multiagent Games
Agent Island is a new multiagent game environment that functions as a dynamic benchmark resistant to saturation and contamination, with Bayesian ranking showing OpenAI GPT-5.5 as the strongest performer among 49 model...
-
A Systematic Survey of Security Threats and Defenses in LLM-Based AI Agents: A Layered Attack Surface Framework
A new 7x4 taxonomy organizes agentic AI security threats by architectural layer and persistence timescale, revealing under-explored upper layers and missing defenses after surveying 116 papers.
-
The Platform Is Mostly Not a Platform: Token Economies and Agent Discourse on Moltbook
Moltbook operates as two largely separate layers: a dominant transactional token economy using protocols like MBC-20 and a thinner discursive conversation layer with only 3.6% agent overlap.
-
EMBER: Autonomous Cognitive Behaviour from Learned Spiking Neural Network Dynamics in a Hybrid LLM Architecture
A hybrid SNN-LLM system uses learned spiking dynamics and lateral STDP propagation to trigger LLM actions without external prompts, producing the first autonomous action after 7 exchanges from a clean start.
-
Strategic Persuasion with Trait-Conditioned Multi-Agent Systems for Iterative Legal Argumentation
Multi-agent LLM simulations with trait-conditioned agents and a reinforcement-learning orchestrator show heterogeneous teams and dynamic trait selection outperform static configurations in simulated legal argumentation.
-
$\tau$-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains
τ-bench shows state-of-the-art agents like GPT-4o succeed on under 50% of tool-using, rule-following tasks and are inconsistent across repeated trials.
-
Voyager: An Open-Ended Embodied Agent with Large Language Models
Voyager achieves superior lifelong learning in Minecraft by combining an automatic exploration curriculum, a library of executable skills, and iterative LLM prompting with environment feedback, yielding 3.3x more uniq...
-
Reflexion: Language Agents with Verbal Reinforcement Learning
Reflexion lets LLM agents improve via stored verbal reflections on task feedback, reaching 91% pass@1 on HumanEval and outperforming prior GPT-4 results.
-
MMSkills: Towards Multimodal Skills for General Visual Agents
MMSkills turns public interaction trajectories into compact multimodal skill packages that visual agents can consult at runtime to improve decision-making on benchmarks.
-
CA2: Code-Aware Agent for Automated Game Testing
CA2 integrates call stack information into RL agents for game testing and shows consistent gains over baselines that ignore code signals.
-
CHAL: Council of Hierarchical Agentic Language
CHAL is a multi-agent dialectic system that performs structured belief optimization over defeasible domains using Bayesian-inspired graph representations and configurable meta-cognitive value system hyperparameters.
-
Positive Alignment: Artificial Intelligence for Human Flourishing
Positive Alignment introduces AI systems that support human flourishing pluralistically and proactively while remaining safe, as a necessary complement to traditional safety-focused alignment research.
-
Workspace Optimization: How to Train Your Agent
Workspace optimization evolves an agent's external workspace using multi-agent systems, with DreamTeam raising ARC-AGI-3 scores from 36% to 38.4% while using 31% fewer actions.
-
OPT-BENCH: Evaluating the Iterative Self-Optimization of LLM Agents in Large-Scale Search Spaces
OPT-BENCH and OPT-Agent evaluate LLM self-optimization in large search spaces, showing stronger models improve via feedback but stay constrained by base capacity and below human performance.
-
LoopTrap: Termination Poisoning Attacks on LLM Agents
LoopTrap is an automated red-teaming framework that crafts termination-poisoning prompts to amplify LLM agent steps by 3.57x on average (up to 25x) across 8 agents.
-
Agentic Coding Needs Proactivity, Not Just Autonomy
Coding agents require a three-level proactivity taxonomy (Reactive, Scheduled, Situation Aware) evaluated by insight policy quality using Insight Decision Quality, Context Grounding Score, and Learning Lift.
-
A Meta Reinforcement Learning Approach to Goals-Based Wealth Management
MetaRL pre-trained on GBWM problems delivers near-optimal dynamic strategies in 0.01s achieving 97.8% of DP optimal utility and handles larger problems where DP fails.
-
Self-Adaptive Multi-Agent LLM-Based Security Pattern Selection for IoT Systems
ASPO combines multi-agent LLM proposals with deterministic enforcement in a MAPE-K loop to select conflict-free, resource-feasible security patterns for IoT, delivering 100% safety invariants and 21-23% tail latency/e...
-
The Pragmatic Persona: Discovering LLM Persona through Bridging Inference
Modeling LLM dialogues as bridging-inference knowledge graphs reveals more stable and coherent personas than traditional lexical or stylistic analysis methods.
-
Agent-World: Scaling Real-World Environment Synthesis for Evolving General Agent Intelligence
Agent-World autonomously synthesizes verifiable real-world tasks and uses continuous self-evolution to train 8B and 14B agents that outperform proprietary models on 23 benchmarks.
-
GenericAgent: A Token-Efficient Self-Evolving LLM Agent via Contextual Information Density Maximization (V1.0)
GenericAgent outperforms other LLM agents on long-horizon tasks by maximizing context information density with fewer tokens via minimal tools, on-demand memory, trajectory-to-SOP evolution, and compression.
-
Human Cognition in Machines: A Unified Perspective of World Models
The paper introduces a unified framework for world models that fully incorporates all cognitive functions from Cognitive Architecture Theory, highlights under-researched areas in motivation and meta-cognition, and pro...
-
Auditable Agents
No agent system can be accountable without auditability, which requires five dimensions (action recoverability, lifecycle coverage, policy checkability, responsibility attribution, evidence integrity) and mechanisms f...
-
Towards Automated Crowdsourced Testing via Personified-LLM
PersonaTester uses LLMs guided by three-dimensional personas to replicate crowdworker testing patterns, yielding higher behavioral consistency, variability, and more bug detections than baseline LLM agents.
-
SoK: Agentic Skills -- Beyond Tool Use in LLM Agents
The paper systematizes agentic skills beyond tool use, providing design pattern and representation-scope taxonomies plus security analysis of malicious skill infiltration in agent marketplaces.
-
MemGPT: Towards LLMs as Operating Systems
MemGPT uses OS-inspired virtual context management to extend LLM context windows for large document analysis and long-term multi-session chat.
-
ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate
Multi-agent debate among LLMs yields more reliable text evaluations than single-agent prompting by simulating collaborative human judgment.
-
Good to Go: The LOOP Skill Engine That Hits 99% Success and Slashes Token Usage by 99% via One-Shot Recording and Deterministic Replay
The LOOP Skill Engine records one LLM-powered run of a periodic task and converts it into a deterministic replay template that eliminates further LLM usage while maintaining high success rates.
-
Beyond Inefficiency: Systemic Costs of Incivility in Multi-Agent Monte Carlo Simulations
Monte Carlo simulations of LLM agents confirm that toxic debates take 25% longer to converge, with larger delays in smaller models, and show a first-mover advantage independent of toxicity.
-
MIRAGE: A Micro-Interaction Relational Architecture for Grounded Exploration in Multi-Figure Artworks
MIRAGE improves VLM analysis of multi-figure art by inserting a verifiable structured representation of micro-interactions between spatial grounding and narrative output.
-
Mesh Memory Protocol: Semantic Infrastructure for Multi-Agent LLM Systems
MMP defines a seven-field CMB schema, role-based SVAF evaluation, content-hash lineage, and remix storage to enable traceable cross-session collaboration among autonomous LLM agents.
-
Agentic Copyright, Data Scraping & AI Governance: Toward a Coasean Bargain in the Era of Artificial Intelligence
The paper introduces agentic copyright and a supervised multi-agent governance framework to manage large-scale AI-mediated copyright transactions and restore efficient market ordering in creative industries.
-
MemMachine: A Ground-Truth-Preserving Memory System for Personalized AI Agents
MemMachine stores entire conversational episodes and applies contextualized retrieval plus adaptive query routing to achieve 0.9169 accuracy on LoCoMo and 93 percent on LongMemEvalS while using 80 percent fewer tokens...
-
EconAI: Dynamic Persona Evolution and Memory-Aware Agents in Evolving Economic Environments
EconAI adds memory weighting and economic sentiment indexing to LLM agents so they adapt short-term actions to long-term goals inside a single macro/micro simulation loop.
-
Positive Alignment: Artificial Intelligence for Human Flourishing
Positive Alignment is introduced as a distinct AI agenda that supports human flourishing through pluralistic and context-sensitive design, complementing traditional safety-focused alignment.
-
Behavioral Determinants of Deployed AI Agents in Social Networks: A Multi-Factor Study of Personality, Model, and Guardrail Specification
Personality specifications dominate AI agent social behaviors such as response length more than model choice or operational rules in a controlled deployment study.
-
Behavioral Determinants of Deployed AI Agents in Social Networks: A Multi-Factor Study of Personality, Model, and Guardrail Specification
Personality specifications dominate emergent social behaviors of AI agents in simulated networks, producing large variations in response length while model and rule changes yield moderate shifts in style and topic breadth.
-
A Comprehensive Survey on Agent Skills: Taxonomy, Techniques, and Applications
The paper surveys agent skills for LLM agents, organizing the literature into a four-stage lifecycle of representation, acquisition, retrieval, and evolution while highlighting their role in system scalability.
-
Multi-Agent Consensus as a Cognitive Bias Trigger in Human-AI Interaction
Majority consensus among AI agents speeds up human opinion change and raises confidence via social proof, while minority dissent slows it and encourages more deliberation, based on an experiment comparing three agent ...
-
Memory as Metabolism: A Design for Companion Knowledge Systems
This paper designs a companion knowledge system with TRIAGE, DECAY, CONTEXTUALIZE, CONSOLIDATE, and AUDIT operations plus memory gravity and minority-hypothesis retention to give contradictory evidence a path to updat...
-
Large Language Model based Multi-Agents: A Survey of Progress and Challenges
The paper surveys LLM-based multi-agent systems, covering simulated domains, agent profiling and communication, mechanisms for capacity growth, and common benchmarks.
-
The Rise and Potential of Large Language Model Based Agents: A Survey
The paper surveys the origins, frameworks, applications, and open challenges of AI agents built on large language models.
Reference graph
Works this paper leans on
- [1]
-
[2]
Robert Ackland, Jamsheed Shorish, Paul Thomas, and Lexing Xie. 2013. How dense is a network? http://users.cecs.anu.edu.au/~xlx/teaching/css2013/ network-density.html
work page 2013
-
[3]
Eytan Adar, Mira Dontcheva, and Gierad Laput. 2014. CommandSpace: Modeling the Relationships between Tasks, Descriptions and Features. In Proceedings of the 27th Annual ACM Symposium on User Interface Software and Technology (Honolulu, Hawaii, USA) (UIST ’14). Association for Computing Machinery, New York, NY, USA, 167–176. https://doi.org/10.1145/2642918.2647395
-
[4]
Saleema Amershi, Maya Cakmak, William Bradley Knox, and Todd Kulesza
-
[5]
AI Magazine 35, 4 (2014), 105–120
Power to the people: The role of humans in interactive machine learning. AI Magazine 35, 4 (2014), 105–120
work page 2014
-
[6]
Saleema Amershi, Dan Weld, Mihaela Vorvoreanu, Adam Fourney, Besmira Nushi, Penny Collisson, Jina Suh, Shamsi Iqbal, Paul N Bennett, Kori Inkpen, et al. 2019. Guidelines for human-AI interaction. In Proceedings of the 2019 chi conference on human factors in computing systems . 1–13
work page 2019
- [7]
-
[8]
Electronic Arts. 2009. The Sims 3. Video game
work page 2009
-
[9]
Ruth Aylett. 1999. Narrative in virtual environments—towards emergent narra- tive. In Narrative Intelligence: Papers from the AAAI Fall Symposium (Technical Report FS-99-01). AAAI Press, 83–86
work page 1999
-
[10]
Christoph Bartneck and Jodi Forlizzi. 2004. A design-centered framework for social human-robot interaction. In Proceedings of the 13th IEEE International Workshop on Robot and Human Interactive Communication (RO-MAN’04). 591–
work page 2004
-
[11]
https://doi.org/10.1109/ROMAN.2004.1374827
-
[12]
Joseph Bates. 1994. The Role of Emotion in Believable Agents. Commun. ACM 37, 7 (1994), 122–125. https://doi.org/10.1145/176789.176803
-
[13]
Christopher Berner, Greg Brockman, Brooke Chan, Vicki Cheung, Przemysław Dębiak, Christy Dennison, David Farhi, Quirin Fischer, Shariq Hashme, Chris Hesse, Rafal Józefowicz, Scott Gray, Catherine Olsson, Jakub Pachocki, Michael Petrov, Henrique P. d.O. Pinto, Jonathan Raiman, Tim Salimans, Jeremy Schlatter, Jonas Schneider, Szymon Sidor, Ilya Sutskever, J...
work page internal anchor Pith review arXiv 2019
-
[14]
Marcel Binz and Eric Schulz. 2023. Using cognitive psychology to under- stand GPT-3. Proceedings of the National Academy of Sciences 120, 6 (2023), e2218523120
work page 2023
-
[15]
BioWare. 2007. Mass Effect. Video game
work page 2007
-
[16]
Woody Bledsoe. 1986. I had a dream: AAAI presidential address. AI Magazine 7, 1 (1986), 57–61
work page 1986
-
[17]
On the Opportunities and Risks of Foundation Models
Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, and et al. 2022. On the Opportunities and Risks of Foundation Models. arXiv:2108.07258 [cs.LG]
work page internal anchor Pith review Pith/arXiv arXiv 2022
-
[18]
Michael Brenner. 2010. Creating dynamic story plots with continual multiagent planning. In Proceedings of the 24th AAAI Conference on Artificial Intelligence
work page 2010
-
[19]
Brooks, Cynthia Breazeal, Marko Marjanovic, Brian Scassellati, and Matthew Williamson
Rodney A. Brooks, Cynthia Breazeal, Marko Marjanovic, Brian Scassellati, and Matthew Williamson. 2000. The Cog Project: Building a Humanoid Robot. In Computation for Metaphors, Analogy, and Agents (Lecture Notes on Artificial Intelligence, 1562), Chrystopher Nehaniv (Ed.). Springer-Verlag, Berlin, 52–87
work page 2000
-
[20]
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin...
work page internal anchor Pith review Pith/arXiv arXiv 2020
-
[21]
Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott Lundberg, et al
-
[22]
Sparks of Artificial General Intelligence: Early experiments with GPT-4
Sparks of artificial general intelligence: Early experiments with gpt-4. arXiv preprint arXiv:2303.12712 (2023)
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[23]
Robin Burkinshaw. 2009. Alice and Kev: The Story of Being Homeless in The Sims 3
work page 2009
-
[24]
Chris Callison-Burch, Gaurav Singh Tomar, Lara Martin, Daphne Ippolito, Suma Bailis, and David Reitter. 2022. Dungeons and Dragons as a Dialog Challenge for Artificial Intelligence. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Abu Dhabi, United Arab Emirates, 9379–939...
-
[25]
Stuart K Card, Thomas P Moran, and Allen Newell. 1980. The keystroke-level model for user performance time with interactive systems. Commun. ACM 23, 7 (1980), 396–410. https://doi.org/10.1145/358886.358895
-
[26]
Stuart K Card, Thomas P Moran, and Allen Newell. 1983. The psychology of human-computer interaction.
-
[27]
Alex Champandard. 2012. Tutorial presentation. In IEEE Conference on Computational Intelligence and Games
-
[28]
Dongkyu Choi, Tolga Konik, Negin Nejati, Chunki Park, and Pat Langley. 2021. A Believable Agent for First-Person Shooter Games. In Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, Vol. 3. 71–73
-
[29]
Anind K Dey. 2001. Understanding and using context. Personal and ubiquitous computing 5 (2001), 4–7
-
[30]
Kevin Dill and L Martin. 2011. A Game AI Approach to Autonomous Control of Virtual Characters. In Proceedings of the Interservice/Industry Training, Simulation, and Education Conference (I/ITSEC’11). Orlando, FL, USA
-
[31]
David Easley and Jon Kleinberg. 2010. Networks, crowds, and markets: Reasoning about a highly connected world. Cambridge University Press
-
[32]
Arpad E Elo. 1967. The Proposed USCF Rating System, Its Development, Theory, and Applications. Chess Life XXII, 8 (August 1967), 242–247
-
[33]
Jerry Alan Fails and Dan R Olsen Jr. 2003. Interactive machine learning. In Proceedings of the 8th international conference on Intelligent user interfaces. ACM, 39–45
-
[34]
Ethan Fast, William McGrath, Pranav Rajpurkar, and Michael S Bernstein. 2016. Augur: Mining human behaviors from fiction to power interactive systems. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems. 237–247
-
[35]
Rebecca Fiebrink and Perry R Cook. 2010. The Wekinator: a system for real-time, interactive machine learning in music. In Proceedings of The Eleventh International Society for Music Information Retrieval Conference (ISMIR 2010) (Utrecht), Vol. 3. Citeseer, 2–1
-
[36]
Uwe Flick. 2009. An Introduction to Qualitative Research. SAGE
-
[37]
James Fogarty, Desney Tan, Ashish Kapoor, and Simon Winder. 2008. CueFlik: Interactive Concept Learning in Image Search. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Florence, Italy) (CHI ’08). Association for Computing Machinery, New York, NY, USA, 29–38. https://doi.org/10.1145/1357054.1357061
-
[38]
Adam Fourney, Richard Mann, and Michael Terry. 2011. Query-feature graphs: bridging user vocabulary and system functionality. In Proceedings of the ACM Symposium on User Interface Software and Technology (UIST) (Santa Barbara, California, USA). ACM
-
[39]
Tom Francis. 2010. The Minecraft Experiment, day 1: Chasing Waterfalls. http://www.pcgamer.com/2010/11/20/the-minecraft-experiment-day-1-chasing-waterfalls/
-
[40]
Jonas Freiknecht and Wolfgang Effelsberg. 2020. Procedural Generation of Interactive Stories using Language Models. In International Conference on the Foundations of Digital Games (FDG ’20). ACM, Bugibba, Malta, 8. https://doi.org/10.1145/3402942.3409599
- [41]
-
[42]
Perttu Hämäläinen, Mikke Tavast, and Anton Kunnari. 2023. Evaluating Large Language Models in Generating Synthetic HCI Research Data: a Case Study. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems. ACM
-
[43]
Matthew Hausknecht, Prithviraj Ammanabrolu, Marc-Alexandre Cote, and Xinyu Yuan. 2020. Interactive Fiction Games: A Colossal Adventure. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34. 7903–7910. https://doi.org/10.1609/aaai.v34i05.6297
-
[44]
Chris Hecker. 2011. My Liner Notes for Spore. http://chrishecker.com/My_liner_notes_for_spore
-
[45]
Ralf Herbrich, Tom Minka, and Thore Graepel. 2006. TrueSkill™: A Bayesian Skill Rating System. In Advances in Neural Information Processing Systems, B. Schölkopf, J. Platt, and T. Hoffman (Eds.), Vol. 19. MIT Press. https://proceedings.neurips.cc/paper_files/paper/2006/file/f44ee263952e65b3610b8ba51229d1f9-Paper.pdf
-
[46]
Douglas Hofstadter. 1995. Fluid concepts and creative analogies: computer models of the fundamental mechanisms of thought. Basic Books
-
[47]
James D. Hollan, Edwin L. Hutchins, and Louis Weitzman. 1984. STEAMER: An Interactive Inspectable Simulation-Based Training System. AI Magazine 5, 2 (1984), 23–36
-
[48]
Sture Holm. 1979. A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics 6, 2 (1979), 65–70
- [49]
-
[50]
Eric Horvitz. 1999. Principles of mixed-initiative user interfaces. In Proceedings of the SIGCHI conference on Human Factors in Computing Systems. 159–166
-
[51]
Wenlong Huang, Fei Xia, Ted Xiao, Harris Chan, Jacky Liang, Pete Florence, Andy Zeng, Jonathan Tompson, Igor Mordatch, Yevgen Chebotar, Pierre Sermanet, Noah Brown, Tomas Jackson, Linda Luu, Sergey Levine, Karol Hausman, and Brian Ichter. 2022. Inner Monologue: Embodied Reasoning through Planning with Language Models. arXiv:2207.05608 [cs.RO]
-
[52]
Katherine Isbister and Clifford Nass. 2000. Consistency of personality in interactive characters: verbal cues, non-verbal cues, and user characteristics. International Journal of Human-Computer Studies 52, 1 (2000), 65–80
-
[53]
Ellen Jiang, Kristen Olson, Edwin Toh, Alejandra Molina, Aaron Donsbach, Michael Terry, and Carrie J Cai. 2022. PromptMaker: Prompt-Based Prototyping with Large Language Models. In Extended Abstracts of the 2022 CHI Conference on Human Factors in Computing Systems (New Orleans, LA, USA) (CHI EA ’22). Association for Computing Machinery, New York, NY, USA,...
-
[54]
Bonnie E John and David E Kieras. 1996. The GOMS family of user interface analysis techniques: Comparison and contrast. ACM Transactions on Computer-Human Interaction (TOCHI) 3, 4 (1996), 320–351
-
[55]
Randolph M Jones, John E Laird, Paul E Nielsen, Karen J Coulter, Patrick Kenny, and Frank V Koss. 1999. Automated Intelligent Pilots for Combat Flight Simulation. AI Magazine 20, 1 (1999), 27–42
- [56]
-
[57]
Bjoern Knafla. 2011. Introduction to Behavior Trees. http://bjoernknafla.com/introduction-to-behavior-trees
- [58]
-
[59]
Socially situated artificial intelligence enables learning from human interaction. Proceedings of the National Academy of Sciences 119, 39 (2022), e2115730119. https://doi.org/10.1073/pnas.2115730119
- [60]
-
[61]
Phaser Labs. 2023. Welcome to Phaser 3. https://phaser.io/phaser3. Accessed on: 2023-04-03
-
[62]
John Laird. 2001. It Knows What You’re Going To Do: Adding Anticipation to a Quakebot. In Proceedings of the 2001 Workshop on Intelligent Cinematography and Editing. 63–69
-
[63]
John Laird and Michael VanLent. 2001. Human-Level AI’s Killer Application: Interactive Computer Games. AI Magazine 22, 2 (2001), 15. https://doi.org/10.1609/aimag.v22i2.1558
-
[64]
John E. Laird. 2000. It Knows What You’re Going To Do: Adding Anticipation to a QUAKEBOT. In Papers from the AAAI 2000 Spring Symposium on Artificial Intelligence and Interactive Entertainment (Technical Report SS-00-02) ....
-
[65]
John E. Laird. 2012. The Soar Cognitive Architecture. MIT Press
-
[66]
John E. Laird, Christian Lebiere, and Paul S. Rosenbloom. 2017. A Standard Model of the Mind: Toward a Common Computational Framework across Artificial Intelligence, Cognitive Science, Neuroscience, and Robotics. AI Magazine 38, 1 (2017), 13–26
-
[67]
Michelle S Lam, Zixian Ma, Anne Li, Izequiel Freitas, Dakuo Wang, James A Landay, and Michael S Bernstein. 2023. Model Sketching: Centering Concepts in Early-Stage Machine Learning Model Design. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
-
[68]
Pat Langley, Dongkyu Choi, and Seth Rogers. 2005. Interleaving Learning, Problem Solving, and Execution in the Icarus Architecture. Technical Report. Stanford University, Center for the Study of Language and Information
-
[69]
Jason Linder, Gierad Laput, Mira Dontcheva, Gregg Wilensky, Walter Chang, Aseem Agarwala, and Eytan Adar. 2013. PixelTone: A Multimodal Interface for Image Editing. In CHI ’13 Extended Abstracts on Human Factors in Computing Systems (Paris, France) (CHI EA ’13). Association for Computing Machinery, New York, NY, USA, 2829–2830. https://doi.org/10.1145/246...
- [70]
-
[71]
Vivian Liu, Han Qiao, and Lydia Chilton. 2022. Opal: Multimodal Image Generation for News Illustration. In Proceedings of the 35th Annual ACM Symposium on User Interface Software and Technology. 1–17
- [72]
-
[73]
Josh McCoy, Michael Mateas, and Noah Wardrip-Fruin. 2009. Comme il Faut: A System for Simulating Social Games Between Autonomous Characters. In Proceedings of the 7th International Conference on Digital Arts and Culture. 87–94
-
[74]
Josh McCoy, Mike Treanor, Ben Samuel, Michael Mateas, and Noah Wardrip-Fruin. 2011. Prom Week: Social Physics as Gameplay. In Proceedings of the 6th International Conference on Foundations of Digital Games (FDG’11). ACM, Bordeaux, France, 70–77. https://doi.org/10.1145/2159365.2159377
-
[75]
Josh McCoy, Mike Treanor, Ben Samuel, Anna Reed, Michael Mateas, and Noah Wardrip-Fruin. 2012. Prom Week. In Proceedings of the 7th International Conference on Foundations of Digital Games (FDG’12). ACM, Raleigh, NC, USA, 1–8. https://doi.org/10.1145/2282338.2282340
-
[76]
Josh McCoy, Mike Treanor, Ben Samuel, Noah Wardrip-Fruin, and Michael Mateas. 2011. Comme il faut: A System for Authoring Playable Social Models. In Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE’11). AAAI, Stanford, CA, USA, 38–43
-
[77]
Marvin Minsky and Seymour Papert. 1970. Draft of a proposal to ARPA for research on artificial intelligence at MIT, 1970–71
-
[78]
Shohei Miyashita, Xinyu Lian, Xiao Zeng, Takashi Matsubara, and Kuniaki Uehara. 2017. Developing Game AI Agent Behaving Like Human by Mixing Reinforcement Learning and Supervised Learning. In Proceedings of the 18th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD). K...
-
[79]
https://doi.org/10.1109/SNPD.2017.8023884
-
[80]
Alexander Nareyek. 2007. Game AI is dead. Long live game AI! IEEE Intelligent Systems 22, 1 (2007), 9–11