pith. machine review for the scientific record.

arxiv: 2304.03442 · v2 · submitted 2023-04-07 · 💻 cs.HC · cs.AI · cs.LG

Recognition: 3 theorem links

· Lean Theorem

Generative Agents: Interactive Simulacra of Human Behavior

Authors on Pith: no claims yet

Pith reviewed 2026-05-11 18:58 UTC · model grok-4.3

classification 💻 cs.HC · cs.AI · cs.LG
keywords generative agents · large language models · believable simulation · interactive sandbox · emergent social behavior · memory and reflection · virtual environment · human behavior simulation

The pith

Generative agents extend large language models with memory, reflection, and planning to produce believable individual and social behaviors in a simulated town.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents generative agents as computational proxies for human behavior that can handle daily routines, form opinions, initiate conversations, and coordinate group events without step-by-step scripting. The core architecture records an agent's experiences in natural language, condenses them into higher-level reflections over time, and retrieves relevant memories to guide future actions. In a small-town sandbox with twenty-five agents, this setup leads to emergent outcomes such as agents spreading invitations and arranging dates for a Valentine's Day party after a single user-specified prompt. The authors show through component ablations that observation, reflection, and planning each contribute to the believability of the resulting behavior. The goal is to support interactive applications like immersive environments and rehearsal tools by making agent simulations more responsive to natural language input.

Core claim

Generative agents are software agents that simulate believable human behavior by extending a large language model to maintain a full record of experiences in natural language, synthesize those records into periodic reflections, and retrieve memories dynamically when planning the next actions. When placed in an interactive environment modeled on The Sims, twenty-five such agents exhibit consistent daily activities and produce unscripted social patterns, including autonomously spreading invitations to a Valentine's Day party over two simulated days and coordinating attendance, all from a single starting instruction.

What carries the argument

The three-part architecture of observation (storing experiences), reflection (synthesizing memories into higher-level summaries), and planning (retrieving relevant memories to decide actions) layered on top of a large language model.
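The retrieval step that links the memory stream to planning can be sketched as a scored lookup. This is a minimal sketch, assuming each memory is ranked by a weighted sum of three min-max normalized signals: recency (exponential decay since the memory was last accessed), importance (a 1-10 score that an LLM would assign), and relevance (cosine similarity between the query and memory embeddings). That mirrors the scoring the original paper describes, but the `Memory` and `MemoryStream` names, the 0.995 per-hour decay, and the equal weights are illustrative assumptions; the embeddings and importance scores that model calls would produce are passed in by hand here.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class Memory:
    """One natural-language record in a hypothetical memory stream."""
    text: str
    last_accessed: float      # simulated hours; refreshed on retrieval
    importance: float         # 1-10; an LLM would assign this (assumed)
    embedding: list[float]    # from some embedding model (assumed)


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def min_max(values: list[float]) -> list[float]:
    """Rescale to [0, 1]; a constant signal cannot rank memories, so it maps to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


@dataclass
class MemoryStream:
    memories: list[Memory] = field(default_factory=list)
    decay: float = 0.995  # illustrative per-hour recency decay

    def add(self, memory: Memory) -> None:
        self.memories.append(memory)

    def retrieve(self, query: list[float], now: float, k: int = 3,
                 weights: tuple[float, float, float] = (1.0, 1.0, 1.0)) -> list[Memory]:
        """Return the top-k memories by weighted recency + importance + relevance."""
        if not self.memories:
            return []
        recency = min_max([self.decay ** (now - m.last_accessed) for m in self.memories])
        importance = min_max([m.importance for m in self.memories])
        relevance = min_max([cosine(query, m.embedding) for m in self.memories])
        w_rec, w_imp, w_rel = weights
        scores = [w_rec * r + w_imp * i + w_rel * v
                  for r, i, v in zip(recency, importance, relevance)]
        ranked = sorted(range(len(self.memories)), key=lambda j: scores[j], reverse=True)
        top = [self.memories[j] for j in ranked[:k]]
        for m in top:
            m.last_accessed = now  # retrieving a memory refreshes its recency
        return top
```

The `k` argument plays the role of the "retrieval count" listed as a free parameter in the ledger further down.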

If this is right

  • End users can steer the simulation through ordinary natural-language instructions rather than code-level scripting.
  • Individual agents sustain believable routines and social awareness over multi-day periods through accumulated memory.
  • Social events and relationships can emerge at the group level without any agent being explicitly told the full plan.
  • Removing any one of the three components (observation, reflection, or planning) measurably reduces the believability of agent actions.
  • The same architecture can be applied to other interactive domains such as communication rehearsal or design prototyping.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Simulations of this kind could support safer testing of social policies or training scenarios before real-world deployment.
  • Scaling the approach to hundreds of agents or weeks of simulated time may require additional mechanisms to prevent memory drift.
  • The method offers a concrete testbed for studying how language-model consistency affects perceived human-likeness in longer interactions.
  • Similar memory-and-reflection loops could be added to other agent systems to improve long-term coherence in everyday assistant tasks.

Load-bearing premise

Large language models will keep producing coherent, non-contradictory actions and dialogue that remain consistent with an agent's growing memory and personality across many simulated days, without external correction.

What would settle it

A run of the town simulation in which agents begin to issue contradictory statements about past events or fail to coordinate attendance at the party despite the initial prompt, resulting in visibly incoherent group behavior.
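The coordination half of this test can be made concrete. A minimal sketch, assuming a hypothetical location log that the sandbox would need to export (the `Sighting` record, its fields, and the `attendance_rate` helper are illustrative, not part of the paper's described system): score the party outcome as the fraction of invited agents actually observed at the venue during the party window. A rate near zero after invitations have spread would be the visible coordination failure described above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Sighting:
    """Hypothetical log entry: where an agent was at a given simulated hour."""
    agent: str
    hour: float
    location: str


def attendance_rate(log: list[Sighting], invited: set[str], venue: str,
                    start: float, end: float) -> float:
    """Fraction of invited agents seen at the venue during [start, end)."""
    if not invited:
        return 0.0
    present = {s.agent for s in log
               if s.agent in invited and s.location == venue and start <= s.hour < end}
    return len(present) / len(invited)
```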

read the original abstract

Believable proxies of human behavior can empower interactive applications ranging from immersive environments to rehearsal spaces for interpersonal communication to prototyping tools. In this paper, we introduce generative agents--computational software agents that simulate believable human behavior. Generative agents wake up, cook breakfast, and head to work; artists paint, while authors write; they form opinions, notice each other, and initiate conversations; they remember and reflect on days past as they plan the next day. To enable generative agents, we describe an architecture that extends a large language model to store a complete record of the agent's experiences using natural language, synthesize those memories over time into higher-level reflections, and retrieve them dynamically to plan behavior. We instantiate generative agents to populate an interactive sandbox environment inspired by The Sims, where end users can interact with a small town of twenty five agents using natural language. In an evaluation, these generative agents produce believable individual and emergent social behaviors: for example, starting with only a single user-specified notion that one agent wants to throw a Valentine's Day party, the agents autonomously spread invitations to the party over the next two days, make new acquaintances, ask each other out on dates to the party, and coordinate to show up for the party together at the right time. We demonstrate through ablation that the components of our agent architecture--observation, planning, and reflection--each contribute critically to the believability of agent behavior. By fusing large language models with computational, interactive agents, this work introduces architectural and interaction patterns for enabling believable simulations of human behavior.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces generative agents: LLM-augmented software agents that maintain a natural-language memory stream of experiences, synthesize reflections, and retrieve context to plan actions. These agents are placed in a Sims-inspired sandbox with 25 agents; the central claim is that they produce believable individual and emergent social behaviors, illustrated by a scenario in which a single user-specified desire to host a Valentine's Day party leads to autonomous invitation spreading, new acquaintances, date invitations, and coordinated attendance over two simulated days. Ablation experiments and a human evaluation with 100 participants are presented to show that observation, planning, and reflection each contribute to believability.

Significance. If the results hold, the work offers a concrete architectural pattern for believable interactive simulacra with clear applications in virtual environments, social rehearsal tools, and HCI prototyping. The explicit use of natural-language memory and reflection to enable emergent multi-agent coordination without hand-crafted rules is a notable strength, and the ablations provide direct evidence that each component matters. The absence of fitted parameters or circular evaluation metrics further supports the claim's internal validity.

major comments (2)
  1. [§5] §5 (Evaluation): The human evaluation and ablation results are tied to a single small-scale scenario (25 agents, ~2 days). While this suffices to demonstrate the Valentine's party example, it leaves the general claim of 'believable individual and emergent social behaviors' under-supported; additional independent scenarios or quantitative metrics of behavioral diversity would be needed to establish robustness.
  2. [§4, §5] §4 (Architecture) and §5: The architecture stores experiences as text, retrieves by embedding similarity, and generates plans/reflections via single LLM calls, yet no mechanism or metric is reported for detecting or recovering from contradictions between new outputs and prior memories. The ablations show reflection improves the target behavior, but the paper does not measure contradiction rate or long-term consistency across the multi-day run, which is load-bearing for the autonomous coordination claim.
minor comments (2)
  1. The exact prompts, retrieval count, and reflection interval values used in the reported runs should be listed explicitly (perhaps in an appendix) to aid reproducibility.
  2. Figure 2 or the environment description: clarify how the sandbox renders agent observations and handles concurrent actions to make the interaction loop fully transparent.
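Major comment 1 asks for quantitative metrics of behavioral diversity. One simple candidate, offered here as an assumption rather than anything the paper or the referee specifies, is the Shannon entropy of an agent's logged actions: higher entropy means activity is spread across more distinct actions. The sketch assumes actions have already been reduced to discrete labels (the `action_entropy` helper and the labels in the test are hypothetical); the free-text memory stream does not produce such labels on its own.

```python
from __future__ import annotations

import math
from collections import Counter


def action_entropy(actions: list[str]) -> float:
    """Shannon entropy, in bits, of the distribution of discrete action labels."""
    if not actions:
        return 0.0
    n = len(actions)
    counts = Counter(actions)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

A day spent entirely on one action scores 0 bits; four equally frequent actions score 2 bits. Comparing this value across scenarios, or between full and ablated agents, is one way to report robustness beyond a single run.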

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive summary, significance assessment, and recommendation for minor revision. We address the two major comments point by point below, proposing targeted textual revisions that strengthen the manuscript while remaining within the scope of a minor revision.

read point-by-point responses
  1. Referee: [§5] §5 (Evaluation): The human evaluation and ablation results are tied to a single small-scale scenario (25 agents, ~2 days). While this suffices to demonstrate the Valentine's party example, it leaves the general claim of 'believable individual and emergent social behaviors' under-supported; additional independent scenarios or quantitative metrics of behavioral diversity would be needed to establish robustness.

    Authors: We agree that the primary evaluation and human study are anchored to one extended scenario. This scenario was chosen because it produces a clear, observable chain of emergent social behaviors (invitation spreading, new acquaintances, date invitations, and coordinated attendance) that can be directly attributed to the architecture rather than hand-crafted rules. The ablation results and the 100-participant human evaluation provide direct evidence that observation, planning, and reflection each improve believability within this setting. In the revised manuscript we will (1) add brief qualitative descriptions of other observed behaviors from the same sandbox runs (e.g., daily routines, opinion formation, and spontaneous conversations) to illustrate individual believability beyond the party, and (2) expand the §5 discussion to explicitly state the evaluation's scope and to identify the collection of additional independent scenarios and quantitative diversity metrics as valuable future work. These changes clarify the current evidence without introducing new experiments. revision: partial

  2. Referee: [§4, §5] §4 (Architecture) and §5: The architecture stores experiences as text, retrieves by embedding similarity, and generates plans/reflections via single LLM calls, yet no mechanism or metric is reported for detecting or recovering from contradictions between new outputs and prior memories. The ablations show reflection improves the target behavior, but the paper does not measure contradiction rate or long-term consistency across the multi-day run, which is load-bearing for the autonomous coordination claim.

    Authors: We acknowledge that the architecture contains no explicit contradiction-detection or consistency-enforcement module; coherence is delegated to the LLM's conditioning on retrieved memories and to the synthesis performed during reflection. In the two-day runs we observed that reflection helped agents maintain coherent plans that supported the observed coordination, but we did not compute contradiction rates or long-term consistency statistics. In the revision we will (1) add a short paragraph in §4 describing how memory retrieval and reflection are intended to promote consistency in practice, and (2) insert a limitations subsection in §5 that notes the lack of quantitative consistency metrics and lists explicit contradiction checking as a promising direction for future agent architectures. These additions directly address the referee's concern while accurately reflecting what was implemented and measured. revision: partial
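The rebuttal lists explicit contradiction checking as future work. A minimal sketch of the contradiction-rate metric the referee asks for, assuming (hypothetically) that agent statements have already been extracted into subject-attribute-value triples. In the actual architecture memories are free text, so a real check would need an extraction or entailment model in place of the exact-match comparison used here, and this version also counts legitimate updates (a rescheduled plan, say) as contradictions. The `Claim` type and `contradiction_rate` helper are illustrative names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """Hypothetical structured form of one statement an agent made."""
    subject: str
    attribute: str
    value: str


def contradiction_rate(claims: list[Claim]) -> float:
    """Fraction of claims that disagree with the first recorded value for the same subject and attribute."""
    if not claims:
        return 0.0
    first_seen: dict[tuple[str, str], str] = {}
    conflicts = 0
    for c in claims:
        key = (c.subject, c.attribute)
        if key not in first_seen:
            first_seen[key] = c.value
        elif first_seen[key] != c.value:
            conflicts += 1
    return conflicts / len(claims)
```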

Circularity Check

0 steps flagged

No circularity: architecture and qualitative demonstration are self-contained

full rationale

The paper defines an agent architecture (memory storage as natural-language records, LLM-based reflection synthesis, embedding retrieval for planning) and demonstrates emergent behaviors via a single run in a 25-agent sandbox. No equations, fitted parameters, or quantitative predictions are claimed; the Valentine's Day party example is an observed output of the described LLM calls rather than a reduction to prior data or self-citation. Ablations compare component removals but remain empirical observations of the same generative process. The derivation chain contains no self-definitional, fitted-input, or uniqueness-imported steps.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axioms · 2 invented entities

The central claim rests on the assumption that current LLMs can serve as reliable planners and reflectors when given structured natural-language context; the paper introduces three new architectural components (memory stream, reflection, planning) without independent prior validation beyond LLM capabilities.

free parameters (2)
  • reflection interval
    The frequency at which agents synthesize memories into reflections is a design parameter chosen by the authors.
  • retrieval count
    The number of memories retrieved for each planning step is a tunable hyperparameter.
axioms (1)
  • domain assumption: Large language models can produce coherent, personality-consistent plans and reflections when given a natural-language memory context.
    Invoked throughout the architecture description in sections on memory, reflection, and planning.
invented entities (2)
  • Memory stream (no independent evidence)
    purpose: Stores a complete chronological record of an agent's experiences as natural language observations.
    New data structure introduced by the paper; no independent evidence outside this work.
  • Reflection mechanism (no independent evidence)
    purpose: Synthesizes raw memories into higher-level summaries about self, relationships, and plans.
    New component invented for this architecture; no prior independent validation.
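The "reflection interval" in the ledger need not be a fixed clock period. The original paper describes triggering reflection when the summed importance of recently perceived events crosses a threshold; the sketch below shows that kind of trigger, assuming importance scores arrive one observation at a time. The `ReflectionTrigger` name and the default threshold are illustrative, and the reflection itself (an LLM call over retrieved memories) is left out.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ReflectionTrigger:
    """Signals a reflection once accumulated importance reaches a threshold."""
    threshold: float = 150.0   # illustrative value; this is the tunable free parameter
    accumulated: float = 0.0

    def observe(self, importance: float) -> bool:
        """Record one observation's importance; return True when a reflection should run."""
        self.accumulated += importance
        if self.accumulated >= self.threshold:
            self.accumulated = 0.0  # start counting toward the next reflection
            return True
        return False
```

Under this design a quiet day of low-importance events produces few reflections, while an eventful one (for example, receiving several party invitations) produces more, which is one reason the interval is hard to report as a single number.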

pith-pipeline@v0.9.0 · 5597 in / 1512 out tokens · 62963 ms · 2026-05-11T18:58:57.161389+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith.Foundation.HierarchyEmergence hierarchy_emergence_forces_phi · unclear

    Relation between the paper passage and the cited Recognition theorem.

    starting with only a single user-specified notion that one agent wants to throw a Valentine's Day party, the agents autonomously spread invitations to the party over the next two days, make new acquaintances, ask each other out on dates to the party, and coordinate to show up for the party together at the right time

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 52 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. SimWorld Studio: Automatic Environment Generation with Evolving Coding Agent for Embodied Agent Learning

    cs.AI 2026-05 accept novelty 8.0

    SimWorld Studio deploys an evolving coding agent to create adaptive 3D environments that co-evolve with embodied learners, delivering 18-point success-rate gains over fixed environments in navigation benchmarks.

  2. SimWorld Studio: Automatic Environment Generation with Evolving Coding Agent for Embodied Agent Learning

    cs.AI 2026-05 unverdicted novelty 8.0

    SimWorld Studio uses a self-evolving coding agent to generate adaptive 3D environments that improve embodied agent performance, with reported gains of 18 points over fixed environments in navigation tasks.

  3. OccuBench: Evaluating AI Agents on Real-World Professional Tasks via Language Environment Simulation

    cs.CL 2026-04 unverdicted novelty 8.0

    OccuBench is a new benchmark for AI agents on real-world occupational tasks via LLM-driven simulators, showing no model dominates all industries, implicit faults are hardest, and larger models with more reasoning perf...

  4. AgentSocialBench: Evaluating Privacy Risks in Human-Centered Agentic Social Networks

    cs.AI 2026-04 unverdicted novelty 8.0

    AgentSocialBench demonstrates that privacy preservation is fundamentally harder in human-centered agentic social networks than in single-agent cases due to cross-domain coordination pressures and an abstraction parado...

  5. Why Do Multi-Agent LLM Systems Fail?

    cs.AI 2025-03 unverdicted novelty 8.0

    The authors create the first large-scale dataset and taxonomy of failure modes in multi-agent LLM systems to explain their limited performance gains.

  6. ScioMind: Cognitively Grounded Multi-Agent Social Simulation with Anchoring-Based Belief Dynamics and Dynamic Profiles

    cs.AI 2026-05 unverdicted novelty 7.0

    ScioMind combines anchoring-based belief updates, hierarchical memory, and dynamic profiles in LLM multi-agent systems to produce more stable, diverse, and psychologically aligned opinion trajectories than prior fixed...

  7. MMSkills: Towards Multimodal Skills for General Visual Agents

    cs.AI 2026-05 unverdicted novelty 7.0

    MMSkills creates compact multimodal skill packages from trajectories and uses a branch-loaded agent to improve visual decision-making on GUI and game benchmarks.

  8. Mechanism Plausibility in Generative Agent-Based Modeling

    cs.MA 2026-05 unverdicted novelty 7.0

    Introduces the Mechanism Plausibility Scale to distinguish generative sufficiency from mechanistic plausibility in LLM-based agent-based models.

  9. Internal vs. External: Comparing Deliberation and Evolution for Multi-Agent Constitutional Design

    cs.MA 2026-05 unverdicted novelty 7.0

    External evolution beats internal deliberation in collective-action tasks with statistical significance but neither helps in trading, and deliberation never discovers punishment while evolution does.

  10. NARRA-Gym for Evaluating Interactive Narrative Agents

    cs.CL 2026-05 unverdicted novelty 7.0

    NARRA-Gym is an executable benchmark that generates complete interactive narrative episodes from emotional seeds and logs full model trajectories to expose gaps in coherence, adaptation, and personalization that stati...

  11. Agent Island: A Saturation- and Contamination-Resistant Benchmark from Multiagent Games

    cs.AI 2026-05 unverdicted novelty 7.0

    Agent Island is a new multiagent game environment that functions as a dynamic benchmark resistant to saturation and contamination, with Bayesian ranking showing OpenAI GPT-5.5 as the strongest performer among 49 model...

  12. A Systematic Survey of Security Threats and Defenses in LLM-Based AI Agents: A Layered Attack Surface Framework

    cs.CR 2026-04 unverdicted novelty 7.0

    A new 7x4 taxonomy organizes agentic AI security threats by architectural layer and persistence timescale, revealing under-explored upper layers and missing defenses after surveying 116 papers.

  13. The Platform Is Mostly Not a Platform: Token Economies and Agent Discourse on Moltbook

    cs.CY 2026-04 unverdicted novelty 7.0

    Moltbook operates as two largely separate layers: a dominant transactional token economy using protocols like MBC-20 and a thinner discursive conversation layer with only 3.6% agent overlap.

  14. EMBER: Autonomous Cognitive Behaviour from Learned Spiking Neural Network Dynamics in a Hybrid LLM Architecture

    cs.AI 2026-04 unverdicted novelty 7.0

    A hybrid SNN-LLM system uses learned spiking dynamics and lateral STDP propagation to trigger LLM actions without external prompts, producing the first autonomous action after 7 exchanges from a clean start.

  15. Strategic Persuasion with Trait-Conditioned Multi-Agent Systems for Iterative Legal Argumentation

    cs.MA 2026-04 unverdicted novelty 7.0

    Multi-agent LLM simulations with trait-conditioned agents and a reinforcement-learning orchestrator show heterogeneous teams and dynamic trait selection outperform static configurations in simulated legal argumentation.

  16. $\tau$-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains

    cs.AI 2024-06 unverdicted novelty 7.0

    τ-bench shows state-of-the-art agents like GPT-4o succeed on under 50% of tool-using, rule-following tasks and are inconsistent across repeated trials.

  17. Voyager: An Open-Ended Embodied Agent with Large Language Models

    cs.AI 2023-05 unverdicted novelty 7.0

    Voyager achieves superior lifelong learning in Minecraft by combining an automatic exploration curriculum, a library of executable skills, and iterative LLM prompting with environment feedback, yielding 3.3x more uniq...

  18. Reflexion: Language Agents with Verbal Reinforcement Learning

    cs.AI 2023-03 conditional novelty 7.0

    Reflexion lets LLM agents improve via stored verbal reflections on task feedback, reaching 91% pass@1 on HumanEval and outperforming prior GPT-4 results.

  19. MMSkills: Towards Multimodal Skills for General Visual Agents

    cs.AI 2026-05 unverdicted novelty 6.0

    MMSkills turns public interaction trajectories into compact multimodal skill packages that visual agents can consult at runtime to improve decision-making on benchmarks.

  20. CA2: Code-Aware Agent for Automated Game Testing

    cs.SE 2026-05 unverdicted novelty 6.0

    CA2 integrates call stack information into RL agents for game testing and shows consistent gains over baselines that ignore code signals.

  21. CHAL: Council of Hierarchical Agentic Language

    cs.AI 2026-05 unverdicted novelty 6.0

    CHAL is a multi-agent dialectic system that performs structured belief optimization over defeasible domains using Bayesian-inspired graph representations and configurable meta-cognitive value system hyperparameters.

  22. Positive Alignment: Artificial Intelligence for Human Flourishing

    cs.AI 2026-05 unverdicted novelty 6.0

    Positive Alignment introduces AI systems that support human flourishing pluralistically and proactively while remaining safe, as a necessary complement to traditional safety-focused alignment research.

  23. Workspace Optimization: How to Train Your Agent

    cs.AI 2026-05 unverdicted novelty 6.0

    Workspace optimization evolves an agent's external workspace using multi-agent systems, with DreamTeam raising ARC-AGI-3 scores from 36% to 38.4% while using 31% fewer actions.

  24. OPT-BENCH: Evaluating the Iterative Self-Optimization of LLM Agents in Large-Scale Search Spaces

    cs.AI 2026-05 unverdicted novelty 6.0

    OPT-BENCH and OPT-Agent evaluate LLM self-optimization in large search spaces, showing stronger models improve via feedback but stay constrained by base capacity and below human performance.

  25. LoopTrap: Termination Poisoning Attacks on LLM Agents

    cs.CR 2026-05 unverdicted novelty 6.0

    LoopTrap is an automated red-teaming framework that crafts termination-poisoning prompts to amplify LLM agent steps by 3.57x on average (up to 25x) across 8 agents.

  26. Agentic Coding Needs Proactivity, Not Just Autonomy

    cs.SE 2026-05 conditional novelty 6.0

    Coding agents require a three-level proactivity taxonomy (Reactive, Scheduled, Situation Aware) evaluated by insight policy quality using Insight Decision Quality, Context Grounding Score, and Learning Lift.

  27. A Meta Reinforcement Learning Approach to Goals-Based Wealth Management

    cs.LG 2026-05 unverdicted novelty 6.0

    MetaRL pre-trained on GBWM problems delivers near-optimal dynamic strategies in 0.01s achieving 97.8% of DP optimal utility and handles larger problems where DP fails.

  28. Self-Adaptive Multi-Agent LLM-Based Security Pattern Selection for IoT Systems

    cs.CR 2026-05 unverdicted novelty 6.0

    ASPO combines multi-agent LLM proposals with deterministic enforcement in a MAPE-K loop to select conflict-free, resource-feasible security patterns for IoT, delivering 100% safety invariants and 21-23% tail latency/e...

  29. The Pragmatic Persona: Discovering LLM Persona through Bridging Inference

    cs.CL 2026-04 unverdicted novelty 6.0

    Modeling LLM dialogues as bridging-inference knowledge graphs reveals more stable and coherent personas than traditional lexical or stylistic analysis methods.

  30. Agent-World: Scaling Real-World Environment Synthesis for Evolving General Agent Intelligence

    cs.AI 2026-04 unverdicted novelty 6.0

    Agent-World autonomously synthesizes verifiable real-world tasks and uses continuous self-evolution to train 8B and 14B agents that outperform proprietary models on 23 benchmarks.

  31. GenericAgent: A Token-Efficient Self-Evolving LLM Agent via Contextual Information Density Maximization (V1.0)

    cs.CL 2026-04 unverdicted novelty 6.0

    GenericAgent outperforms other LLM agents on long-horizon tasks by maximizing context information density with fewer tokens via minimal tools, on-demand memory, trajectory-to-SOP evolution, and compression.

  32. Human Cognition in Machines: A Unified Perspective of World Models

    cs.RO 2026-04 unverdicted novelty 6.0

    The paper introduces a unified framework for world models that fully incorporates all cognitive functions from Cognitive Architecture Theory, highlights under-researched areas in motivation and meta-cognition, and pro...

  33. Auditable Agents

    cs.AI 2026-04 unverdicted novelty 6.0

    No agent system can be accountable without auditability, which requires five dimensions (action recoverability, lifecycle coverage, policy checkability, responsibility attribution, evidence integrity) and mechanisms f...

  34. Towards Automated Crowdsourced Testing via Personified-LLM

    cs.SE 2026-03 unverdicted novelty 6.0

    PersonaTester uses LLMs guided by three-dimensional personas to replicate crowdworker testing patterns, yielding higher behavioral consistency, variability, and more bug detections than baseline LLM agents.

  35. SoK: Agentic Skills -- Beyond Tool Use in LLM Agents

    cs.CR 2026-02 unverdicted novelty 6.0

    The paper systematizes agentic skills beyond tool use, providing design pattern and representation-scope taxonomies plus security analysis of malicious skill infiltration in agent marketplaces.

  36. MemGPT: Towards LLMs as Operating Systems

    cs.AI 2023-10 unverdicted novelty 6.0

    MemGPT uses OS-inspired virtual context management to extend LLM context windows for large document analysis and long-term multi-session chat.

  37. ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate

    cs.CL 2023-08 conditional novelty 6.0

    Multi-agent debate among LLMs yields more reliable text evaluations than single-agent prompting by simulating collaborative human judgment.

  38. Good to Go: The LOOP Skill Engine That Hits 99% Success and Slashes Token Usage by 99% via One-Shot Recording and Deterministic Replay

    cs.AI 2026-05 unverdicted novelty 5.0

    The LOOP Skill Engine records one LLM-powered run of a periodic task and converts it into a deterministic replay template that eliminates further LLM usage while maintaining high success rates.

  39. Beyond Inefficiency: Systemic Costs of Incivility in Multi-Agent Monte Carlo Simulations

    cs.AI 2026-05 unverdicted novelty 5.0

    Monte Carlo simulations of LLM agents confirm that toxic debates take 25% longer to converge, with larger delays in smaller models, and show a first-mover advantage independent of toxicity.

  40. MIRAGE: A Micro-Interaction Relational Architecture for Grounded Exploration in Multi-Figure Artworks

    cs.CV 2026-04 unverdicted novelty 5.0

    MIRAGE improves VLM analysis of multi-figure art by inserting a verifiable structured representation of micro-interactions between spatial grounding and narrative output.

  41. Mesh Memory Protocol: Semantic Infrastructure for Multi-Agent LLM Systems

    cs.MA 2026-04 unverdicted novelty 5.0

    MMP defines a seven-field CMB schema, role-based SVAF evaluation, content-hash lineage, and remix storage to enable traceable cross-session collaboration among autonomous LLM agents.

  42. Agentic Copyright, Data Scraping & AI Governance: Toward a Coasean Bargain in the Era of Artificial Intelligence

    cs.AI 2026-04 unverdicted novelty 5.0

    The paper introduces agentic copyright and a supervised multi-agent governance framework to manage large-scale AI-mediated copyright transactions and restore efficient market ordering in creative industries.

  43. MemMachine: A Ground-Truth-Preserving Memory System for Personalized AI Agents

    cs.AI 2026-04 unverdicted novelty 5.0

    MemMachine stores entire conversational episodes and applies contextualized retrieval plus adaptive query routing to achieve 0.9169 accuracy on LoCoMo and 93 percent on LongMemEvalS while using 80 percent fewer tokens...

  44. EconAI: Dynamic Persona Evolution and Memory-Aware Agents in Evolving Economic Environments

    cs.MA 2026-05 unverdicted novelty 4.0

    EconAI adds memory weighting and economic sentiment indexing to LLM agents so they adapt short-term actions to long-term goals inside a single macro/micro simulation loop.

  45. Positive Alignment: Artificial Intelligence for Human Flourishing

    cs.AI 2026-05 unverdicted novelty 4.0

    Positive Alignment is introduced as a distinct AI agenda that supports human flourishing through pluralistic and context-sensitive design, complementing traditional safety-focused alignment.

  46. Behavioral Determinants of Deployed AI Agents in Social Networks: A Multi-Factor Study of Personality, Model, and Guardrail Specification

    cs.AI 2026-05 unverdicted novelty 4.0

    Personality specifications dominate AI agent social behaviors such as response length more than model choice or operational rules in a controlled deployment study.

  47. Behavioral Determinants of Deployed AI Agents in Social Networks: A Multi-Factor Study of Personality, Model, and Guardrail Specification

    cs.AI 2026-05 unverdicted novelty 4.0

    Personality specifications dominate emergent social behaviors of AI agents in simulated networks, producing large variations in response length while model and rule changes yield moderate shifts in style and topic breadth.

  48. A Comprehensive Survey on Agent Skills: Taxonomy, Techniques, and Applications

    cs.IR 2026-05 unverdicted novelty 4.0

    The paper surveys agent skills for LLM agents, organizing the literature into a four-stage lifecycle of representation, acquisition, retrieval, and evolution while highlighting their role in system scalability.

  49. Multi-Agent Consensus as a Cognitive Bias Trigger in Human-AI Interaction

    cs.HC 2026-04 unverdicted novelty 4.0

    Majority consensus among AI agents speeds up human opinion change and raises confidence via social proof, while minority dissent slows it and encourages more deliberation, based on an experiment comparing three agent ...

  50. Memory as Metabolism: A Design for Companion Knowledge Systems

    cs.AI 2026-04 unverdicted novelty 4.0

    This paper designs a companion knowledge system with TRIAGE, DECAY, CONTEXTUALIZE, CONSOLIDATE, and AUDIT operations plus memory gravity and minority-hypothesis retention to give contradictory evidence a path to updat...

  51. Large Language Model based Multi-Agents: A Survey of Progress and Challenges

    cs.CL 2024-01 unverdicted novelty 4.0

    The paper surveys LLM-based multi-agent systems, covering simulated domains, agent profiling and communication, mechanisms for capacity growth, and common benchmarks.

  52. The Rise and Potential of Large Language Model Based Agents: A Survey

    cs.AI 2023-09 accept novelty 4.0

    The paper surveys the origins, frameworks, applications, and open challenges of AI agents built on large language models.

Reference graph

Works this paper leans on

117 extracted references · 117 canonical work pages · cited by 48 Pith papers · 7 internal anchors

  1. [1]

    Gavin Abercrombie, Amanda Cercas Curry, Tanvi Dinkar, and Zeerak Talat. 2023. Mirages: On Anthropomorphism in Dialogue Systems. arXiv:2305.09800 [cs.CL]

  2. [2]

    Robert Ackland, Jamsheed Shorish, Paul Thomas, and Lexing Xie. 2013. How dense is a network? http://users.cecs.anu.edu.au/~xlx/teaching/css2013/network-density.html

  3. [3]

    Eytan Adar, Mira Dontcheva, and Gierad Laput. 2014. CommandSpace: Modeling the Relationships between Tasks, Descriptions and Features. In Proceedings of the 27th Annual ACM Symposium on User Interface Software and Technology (Honolulu, Hawaii, USA) (UIST ’14). Association for Computing Machinery, New York, NY, USA, 167–176. https://doi.org/10.1145/2642918.2647395

  4. [4]

    Saleema Amershi, Maya Cakmak, William Bradley Knox, and Todd Kulesza. 2014. Power to the people: The role of humans in interactive machine learning. AI Magazine 35, 4 (2014), 105–120

  6. [6]

    Saleema Amershi, Dan Weld, Mihaela Vorvoreanu, Adam Fourney, Besmira Nushi, Penny Collisson, Jina Suh, Shamsi Iqbal, Paul N Bennett, Kori Inkpen, et al. 2019. Guidelines for human-AI interaction. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. 1–13

  7. [7]

    John R. Anderson. 1993. Rules of the Mind. Lawrence Erlbaum Associates, Hillsdale, NJ

  8. [8]

    Electronic Arts. 2009. The Sims 3. Video game

  9. [9]

    Ruth Aylett. 1999. Narrative in virtual environments—towards emergent narrative. In Narrative Intelligence: Papers from the AAAI Fall Symposium (Technical Report FS-99-01). AAAI Press, 83–86

  10. [10]

    Christoph Bartneck and Jodi Forlizzi. 2004. A design-centered framework for social human-robot interaction. In Proceedings of the 13th IEEE International Workshop on Robot and Human Interactive Communication (RO-MAN’04). 591–. https://doi.org/10.1109/ROMAN.2004.1374827

  12. [12]

    Joseph Bates. 1994. The Role of Emotion in Believable Agents. Commun. ACM 37, 7 (1994), 122–125. https://doi.org/10.1145/176789.176803

  13. [13]

    Christopher Berner, Greg Brockman, Brooke Chan, Vicki Cheung, Przemysław Dębiak, Christy Dennison, David Farhi, Quirin Fischer, Shariq Hashme, Chris Hesse, Rafal Józefowicz, Scott Gray, Catherine Olsson, Jakub Pachocki, Michael Petrov, Henrique P. d.O. Pinto, Jonathan Raiman, Tim Salimans, Jeremy Schlatter, Jonas Schneider, Szymon Sidor, Ilya Sutskever, J...

  14. [14]

    Marcel Binz and Eric Schulz. 2023. Using cognitive psychology to understand GPT-3. Proceedings of the National Academy of Sciences 120, 6 (2023), e2218523120

  15. [15]

    BioWare. 2007. Mass Effect. Video game

  16. [16]

    Woody Bledsoe. 1986. I had a dream: AAAI presidential address. AI Magazine 7, 1 (1986), 57–61

  17. [17]

    Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, et al. 2022. On the Opportunities and Risks of Foundation Models. arXiv:2108.07258 [cs.LG]

  18. [18]

    Michael Brenner. 2010. Creating dynamic story plots with continual multiagent planning. In Proceedings of the 24th AAAI Conference on Artificial Intelligence

  19. [19]

    Rodney A. Brooks, Cynthia Breazeal, Marko Marjanovic, Brian Scassellati, and Matthew Williamson. 2000. The Cog Project: Building a Humanoid Robot. In Computation for Metaphors, Analogy, and Agents (Lecture Notes on Artificial Intelligence, 1562), Chrystopher Nehaniv (Ed.). Springer-Verlag, Berlin, 52–87

  20. [20]

    Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin...

  21. [21]

    Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott Lundberg, et al. 2023. Sparks of Artificial General Intelligence: Early experiments with GPT-4. arXiv preprint arXiv:2303.12712 (2023)

  23. [23]

    Robin Burkinshaw. 2009. Alice and Kev: The Story of Being Homeless in The Sims 3

  24. [24]

    Chris Callison-Burch, Gaurav Singh Tomar, Lara Martin, Daphne Ippolito, Suma Bailis, and David Reitter. 2022. Dungeons and Dragons as a Dialog Challenge for Artificial Intelligence. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Abu Dhabi, United Arab Emirates, 9379–939...

  25. [25]

    Stuart K Card, Thomas P Moran, and Allen Newell. 1980. The keystroke-level model for user performance time with interactive systems. Commun. ACM 23, 7 (1980), 396–410. https://doi.org/10.1145/358886.358895

  26. [26]

    Stuart K Card, Thomas P Moran, and Allen Newell. 1983. The Psychology of Human-Computer Interaction. (1983)

  27. [27]

    Alex Champandard. 2012. Tutorial presentation. In IEEE Conference on Computational Intelligence and Games

  28. [28]

    Dongkyu Choi, Tolga Konik, Negin Nejati, Chunki Park, and Pat Langley. 2021. A Believable Agent for First-Person Shooter Games. In Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, Vol. 3. 71–73

  29. [29]

    Anind K Dey. 2001. Understanding and using context. Personal and Ubiquitous Computing 5 (2001), 4–7

  30. [30]

    Kevin Dill and L Martin. 2011. A Game AI Approach to Autonomous Control of Virtual Characters. In Proceedings of the Interservice/Industry Training, Simulation, and Education Conference (I/ITSEC’11). Orlando, FL, USA

  31. [31]

    David Easley and Jon Kleinberg. 2010. Networks, crowds, and markets: Reasoning about a highly connected world. Cambridge University Press

  32. [32]

    Arpad E Elo. 1967. The Proposed USCF Rating System, Its Development, Theory, and Applications. Chess Life XXII, 8 (August 1967), 242–247

  33. [33]

    Jerry Alan Fails and Dan R Olsen Jr. 2003. Interactive machine learning. In Proceedings of the 8th International Conference on Intelligent User Interfaces. ACM, 39–45

  34. [34]

    Ethan Fast, William McGrath, Pranav Rajpurkar, and Michael S Bernstein. 2016. Augur: Mining human behaviors from fiction to power interactive systems. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems. 237–247

  35. [35]

    Rebecca Fiebrink and Perry R Cook. 2010. The Wekinator: a system for real-time, interactive machine learning in music. In Proceedings of The Eleventh International Society for Music Information Retrieval Conference (ISMIR 2010) (Utrecht), Vol. 3. Citeseer, 2–1

  36. [36]

    Uwe Flick. 2009. An Introduction to Qualitative Research. SAGE

  37. [37]

    James Fogarty, Desney Tan, Ashish Kapoor, and Simon Winder. 2008. CueFlik: Interactive Concept Learning in Image Search. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (Florence, Italy) (CHI ’08). Association for Computing Machinery, New York, NY, USA, 29–38. https://doi.org/10.1145/1357054.1357061

  38. [38]

    Adam Fourney, Richard Mann, and Michael Terry. 2011. Query-feature graphs: bridging user vocabulary and system functionality. In Proceedings of the ACM Symposium on User Interface Software and Technology (UIST) (Santa Barbara, California, USA). ACM

  39. [39]

    Tom Francis. 2010. The Minecraft Experiment, day 1: Chasing Waterfalls. http://www.pcgamer.com/2010/11/20/the-minecraft-experiment-day-1-chasing-waterfalls/

  40. [40]

    Jonas Freiknecht and Wolfgang Effelsberg. 2020. Procedural Generation of Interactive Stories using Language Models. In International Conference on the Foundations of Digital Games (FDG ’20). ACM, Bugibba, Malta, 8. https://doi.org/10.1145/3402942.3409599

  41. [41]

    Tianyu Gao, Adam Fisch, and Danqi Chen. 2020. Making Pre-trained Language Models Better Few-shot Learners. CoRR abs/2012.15723 (2020). arXiv:2012.15723 https://arxiv.org/abs/2012.15723

  42. [42]

    Perttu Hämäläinen, Mikke Tavast, and Anton Kunnari. 2023. Evaluating Large Language Models in Generating Synthetic HCI Research Data: a Case Study. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems. ACM

  43. [43]

    Matthew Hausknecht, Prithviraj Ammanabrolu, Marc-Alexandre Cote, and Xinyu Yuan. 2020. Interactive Fiction Games: A Colossal Adventure. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34. 7903–7910. https://doi.org/10.1609/aaai.v34i05.6297

  44. [44]

    Chris Hecker. 2011. My Liner Notes for Spore. http://chrishecker.com/My_liner_notes_for_spore

  45. [45]

    Ralf Herbrich, Tom Minka, and Thore Graepel. 2006. TrueSkill™: A Bayesian Skill Rating System. In Advances in Neural Information Processing Systems, B. Schölkopf, J. Platt, and T. Hoffman (Eds.), Vol. 19. MIT Press. https://proceedings.neurips.cc/paper_files/paper/2006/file/f44ee263952e65b3610b8ba51229d1f9-Paper.pdf

  46. [46]

    Douglas Hofstadter. 1995. Fluid concepts and creative analogies: computer models of the fundamental mechanisms of thought. Basic Books

  47. [47]

    James D. Hollan, Edwin L. Hutchins, and Louis Weitzman. 1984. STEAMER: An Interactive Inspectable Simulation-Based Training System. AI Magazine 5, 2 (1984), 23–36

  48. [48]

    Sture Holm. 1979. A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics 6, 2 (1979), 65–70

  49. [49]

    John J. Horton. 2023. Large Language Models as Simulated Economic Agents: What Can We Learn from Homo Silicus? arXiv:2301.07543 [econ.GN]

  50. [50]

    Eric Horvitz. 1999. Principles of mixed-initiative user interfaces. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. 159–166

  51. [51]

    Wenlong Huang, Fei Xia, Ted Xiao, Harris Chan, Jacky Liang, Pete Florence, Andy Zeng, Jonathan Tompson, Igor Mordatch, Yevgen Chebotar, Pierre Sermanet, Noah Brown, Tomas Jackson, Linda Luu, Sergey Levine, Karol Hausman, and Brian Ichter. 2022. Inner Monologue: Embodied Reasoning through Planning with Language Models. arXiv:2207.05608 [cs.RO]

  52. [52]

    Katherine Isbister and Clifford Nass. 2000. Consistency of personality in interactive characters: verbal cues, non-verbal cues, and user characteristics. International Journal of Human-Computer Studies 52, 1 (2000), 65–80

  53. [53]

    Ellen Jiang, Kristen Olson, Edwin Toh, Alejandra Molina, Aaron Donsbach, Michael Terry, and Carrie J Cai. 2022. PromptMaker: Prompt-Based Prototyping with Large Language Models. In Extended Abstracts of the 2022 CHI Conference on Human Factors in Computing Systems (New Orleans, LA, USA) (CHI EA ’22). Association for Computing Machinery, New York, NY, USA,...

  54. [54]

    Bonnie E John and David E Kieras. 1996. The GOMS family of user interface analysis techniques: Comparison and contrast. ACM Transactions on Computer-Human Interaction (TOCHI) 3, 4 (1996), 320–351

  55. [55]

    Randolph M Jones, John E Laird, Paul E Nielsen, Karen J Coulter, Patrick Kenny, and Frank V Koss. 1999. Automated Intelligent Pilots for Combat Flight Simulation. AI Magazine 20, 1 (1999), 27–42

  56. [56]

    Omar Khattab, Keshav Santhanam, Xiang Lisa Li, David Hall, Percy Liang, Christopher Potts, and Matei Zaharia. 2023. Demonstrate-Search-Predict: Composing retrieval and language models for knowledge-intensive NLP. arXiv:2212.14024 [cs.CL]

  57. [57]

    Bjoern Knafla. 2011. Introduction to Behavior Trees. http://bjoernknafla.com/introduction-to-behavior-trees

  58. [58]

    Ranjay Krishna, Donsuk Lee, Li Fei-Fei, and Michael S. Bernstein. 2022. Socially situated artificial intelligence enables learning from human interaction. Proceedings of the National Academy of Sciences 119, 39 (2022), e2115730119. https://doi.org/10.1073/pnas.2115730119

  60. [60]

    William H Kruskal and WA Wallis. 1952. Use of ranks in one-criterion variance analysis. J. Amer. Statist. Assoc. 47, 260 (1952), 583–621. https://doi.org/10.1080/01621459.1952.10483441

  61. [61]

    Phaser Labs. 2023. Welcome to Phaser 3. https://phaser.io/phaser3. Accessed on: 2023-04-03

  62. [62]

    John Laird. 2001. It Knows What You’re Going To Do: Adding Anticipation to a Quakebot. In Proceedings of the 2001 Workshop on Intelligent Cinematography and Editing. 63–69

  63. [63]

    John Laird and Michael VanLent. 2001. Human-Level AI’s Killer Application: Interactive Computer Games. AI Magazine 22, 2 (2001), 15. https://doi.org/10.1609/aimag.v22i2.1558

  64. [64]

    John E. Laird. 2000. It Knows What You’re Going To Do: Adding Anticipation to a QUAKEBOT. In Papers from the AAAI 2000 Spring Symposium on Artificial Intelligence and Interactive Entertainment (Technical Report SS-00-02) ....

    UIST ’23, October 29-November 1, 2023, San Francisco, CA, USA J.S. Park, J.C. O’Brien, C.J. Cai, M.R. Morris, P. Liang, M.S. Bernstein

  65. [65]

    John E. Laird. 2012. The Soar Cognitive Architecture. MIT Press

  66. [66]

    John E. Laird, Christian Lebiere, and Paul S. Rosenbloom. 2017. A Standard Model of the Mind: Toward a Common Computational Framework across Artificial Intelligence, Cognitive Science, Neuroscience, and Robotics. AI Magazine 38, 1 (2017), 13–26

  67. [67]

    Michelle S Lam, Zixian Ma, Anne Li, Izequiel Freitas, Dakuo Wang, James A Landay, and Michael S Bernstein. 2023. Model Sketching: Centering Concepts in Early-Stage Machine Learning Model Design. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems

  68. [68]

    Pat Langley, Dongkyu Choi, and Seth Rogers. 2005. Interleaving Learning, Problem Solving, and Execution in the Icarus Architecture . Technical Report. Stanford University, Center for the Study of Language and Information

  69. [69]

    Jason Linder, Gierad Laput, Mira Dontcheva, Gregg Wilensky, Walter Chang, Aseem Agarwala, and Eytan Adar. 2013. PixelTone: A Multimodal Interface for Image Editing. In CHI ’13 Extended Abstracts on Human Factors in Computing Systems (Paris, France) (CHI EA ’13). Association for Computing Machinery, New York, NY, USA, 2829–2830. https://doi.org/10.1145/246...

  70. [70]

    Jiachang Liu, Dinghan Shen, Yizhe Zhang, Bill Dolan, Lawrence Carin, and Weizhu Chen. 2021. What Makes Good In-Context Examples for GPT-3? CoRR abs/2101.06804 (2021). arXiv:2101.06804 https://arxiv.org/abs/2101.06804

  71. [71]

    Vivian Liu, Han Qiao, and Lydia Chilton. 2022. Opal: Multimodal Image Generation for News Illustration. In Proceedings of the 35th Annual ACM Symposium on User Interface Software and Technology. 1–17

  72. [72]

    Pattie Maes. 1995. Artificial Life Meets Entertainment: Lifelike Autonomous Agents. Commun. ACM 38, 11 (Nov. 1995), 108–114. https://doi.org/10.1145/219717.219808

  73. [73]

    Josh McCoy, Michael Mateas, and Noah Wardrip-Fruin. 2009. Comme il Faut: A System for Simulating Social Games Between Autonomous Characters. In Proceedings of the 7th International Conference on Digital Arts and Culture. 87–94

  74. [74]

    Josh McCoy, Mike Treanor, Ben Samuel, Michael Mateas, and Noah Wardrip-Fruin. 2011. Prom Week: Social Physics as Gameplay. In Proceedings of the 6th International Conference on Foundations of Digital Games (FDG’11). ACM, Bordeaux, France, 70–77. https://doi.org/10.1145/2159365.2159377

  75. [75]

    Josh McCoy, Mike Treanor, Ben Samuel, Anna Reed, Michael Mateas, and Noah Wardrip-Fruin. 2012. Prom Week. In Proceedings of the 7th International Conference on Foundations of Digital Games (FDG’12). ACM, Raleigh, NC, USA, 1–8. https://doi.org/10.1145/2282338.2282340

  76. [76]

    Josh McCoy, Mike Treanor, Ben Samuel, Noah Wardrip-Fruin, and Michael Mateas. 2011. Comme il faut: A System for Authoring Playable Social Models. In Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE’11). AAAI, Stanford, CA, USA, 38–43

  77. [77]

    Marvin Minsky and Seymour Papert. 1970. Draft of a proposal to ARPA for research on artificial intelligence at MIT, 1970–71

  78. [78]

    Shohei Miyashita, Xinyu Lian, Xiao Zeng, Takashi Matsubara, and Kuniaki Uehara. 2017. Developing Game AI Agent Behaving Like Human by Mixing Reinforcement Learning and Supervised Learning. In Proceedings of the 18th IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD). K... https://doi.org/10.1109/SNPD.2017.8023884

  80. [80]

    Alexander Nareyek. 2007. Game AI is dead. Long live game AI! IEEE Intelligent Systems 22, 1 (2007), 9–11

Showing first 80 references.