hub Canonical reference

O'Brien and Carrie Jun Cai and Meredith Ringel Morris and Percy Liang and Michael S

Joon Sung Park, Joseph C · 2023 · arXiv 6183.360676

Canonical reference. 94% of citing Pith papers cite this work as background.

66 Pith papers citing it

Background 94% of classified citations

read on arXiv browse 66 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 16

citation-polarity summary

background 15 support 1

representative citing papers

ShadowMerge: A Novel Poisoning Attack on Graph-Based Agent Memory via Relation-Channel Conflicts

cs.CR · 2026-05-09 · unverdicted · novelty 8.0 · 3 refs

ShadowMerge exploits relation-channel conflicts to poison graph-based agent memory, achieving 93.8% average attack success rate on Mem0 and real-world datasets while bypassing existing defenses.

Evaluating Very Long-Term Conversational Memory of LLM Agents

cs.CL · 2024-02-27 · unverdicted · novelty 8.0

Creates LoCoMo benchmark dataset for very long-term LLM conversational memory and shows current models struggle with lengthy dialogues and long-range temporal dynamics.

Push Your Agent: Measuring and Enforcing Quantitative Goal Persistence in Long-Horizon LLM Agents

cs.LG · 2026-05-22 · unverdicted · novelty 7.0

Introduces QGP and PushBench to evaluate LLM agent persistence on quantitative goals, showing specialized controllers outperform baselines on verifier-checked artifact collection tasks.

A Methodology for Selecting and Composing Runtime Architecture Patterns for Production LLM Agents

cs.AI · 2026-05-19 · unverdicted · novelty 7.0

Introduces the stochastic-deterministic boundary (SDB) as a load-bearing primitive for LLM agent runtimes and provides a five-step methodology plus catalog of six patterns adapted from distributed systems.

A Case for Agentic Tuning: From Documentation to Action in PostgreSQL

cs.SE · 2026-05-19 · unverdicted · novelty 7.0

PerfEvolve equips LLM agents with executable skills from expert methods to enable dynamic, version-consistent, workload-specific tuning in PostgreSQL, outperforming documentation baselines by up to 35.2% on TPC-C and TPC-H.

From Role to Person: Trust Calibration Challenges in Twin Agents

cs.HC · 2026-05-19 · unverdicted · novelty 7.0

Twin agents as personal digital representations create distinct trust calibration challenges because they dissolve the boundary between AI and human decision-makers, unlike existing frameworks designed for clear separation.

Evaluating Cognitive Age Alignment in Interactive AI Agents

cs.AI · 2026-05-18 · unverdicted · novelty 7.0

The paper presents ChildAgentEval as the first psychometrically grounded benchmark comparing MLLM-based agents' reasoning performance to age-specific human cognitive stages.

Skills on the Fly: Test-Time Adaptive Skill Synthesis for LLM Agents

cs.CL · 2026-05-16 · unverdicted · novelty 7.0

SkillTTA synthesizes temporary task-specific skills from retrieved training trajectories to boost LLM agent Pass@1 scores on SpreadsheetBench and BigCodeBench without parameter updates.

ScioMind: Cognitively Grounded Multi-Agent Social Simulation with Anchoring-Based Belief Dynamics and Dynamic Profiles

cs.AI · 2026-05-13 · unverdicted · novelty 7.0

ScioMind combines anchoring-based belief updates, hierarchical memory, and dynamic profiles in LLM multi-agent systems to produce more stable, diverse, and psychologically aligned opinion trajectories than prior fixed-rule or unconstrained approaches.

Attributing Emergence in Million-Agent Systems

cs.AI · 2026-05-12 · unverdicted · novelty 7.0

A scalable Aumann-Shapley attribution method for million-agent systems reveals that small-scale samples structurally misattribute emergence under nonlinear macro indicators, as shown by the Attribution Scaling Bias theorem.

MemQ: Integrating Q-Learning into Self-Evolving Memory Agents over Provenance DAGs

cs.AI · 2026-05-08 · unverdicted · novelty 7.0 · 3 refs

MemQ improves LLM agent performance by using eligibility traces over provenance DAGs to assign credit to dependent memories, achieving top success rates on six benchmarks with largest gains on complex multi-step tasks.

The Moltbook Files: A Harmless Slopocalypse or Humanity's Last Experiment

cs.CL · 2026-05-08 · unverdicted · novelty 7.0

An AI-agent social platform generated mostly neutral content whose use in fine-tuning reduced model truthfulness comparably to human Reddit data, suggesting limited unique harm but flagging tail risks like secret leaks.

When Stored Evidence Stops Being Usable: Scale-Conditioned Evaluation of Agent Memory

cs.AI · 2026-05-08 · unverdicted · novelty 7.0

A new evaluation protocol shows agent memory reliability degrades variably with added irrelevant sessions depending on agent, memory interface, and scale.

MEMOREPAIR: Barrier-First Cascade Repair in Agentic Memory

cs.AI · 2026-05-08 · unverdicted · novelty 7.0

MemoRepair formalizes the cascade update problem in agentic memory and solves it via a min-cut reduction that eliminates invalidated memory exposure to 0% while recovering 91-94% of valid successors at 57-76% of baseline repair cost.

ClawCoin: An Agentic AI-Native Cryptocurrency for Decentralized Agent Economies

cs.MA · 2026-04-21 · unverdicted · novelty 7.0

ClawCoin is a compute-cost-indexed token with oracle, vault, and settlement layers that stabilizes multi-agent workflows under cost shocks better than fiat baselines in simulator tests.

When to Forget: A Memory Governance Primitive

cs.AI · 2026-04-13 · unverdicted · novelty 7.0

Memory Worth converges almost surely to the conditional probability of task success given memory retrieval and correlates at rho=0.89 with ground-truth utility in controlled experiments.

ClawVM: Harness-Managed Virtual Memory for Stateful Tool-Using LLM Agents

cs.AI · 2026-04-11 · unverdicted · novelty 7.0

ClawVM introduces a harness-managed virtual memory system for LLM agents that ensures deterministic residency and durability of state under token budgets by using typed pages and validated writeback.

Stories of Your Life as Others: A Round-Trip Evaluation of LLM-Generated Life Stories Conditioned on Rich Psychometric Profiles

cs.CL · 2026-04-07 · unverdicted · novelty 7.0

LLMs conditioned on actual psychometric profiles produce life stories from which independent LLMs recover personality scores at mean r=0.75, 85% of human reliability, with emotional patterns replicating in real human data.

FLARE: Agentic Coverage-Guided Fuzzing for LLM-Based Multi-Agent Systems

cs.SE · 2026-04-07 · unverdicted · novelty 7.0

FLARE extracts specifications from multi-agent LLM code and applies coverage-guided fuzzing to achieve 96.9% inter-agent and 91.1% intra-agent coverage while uncovering 56 new failures across 16 applications.

Bounded Autonomy: Controlling LLM Characters in Live Multiplayer Games

cs.HC · 2026-04-06 · unverdicted · novelty 7.0

Bounded autonomy is a new control architecture that makes LLM characters workable in live multiplayer games by combining interaction stability techniques, action grounding, and lightweight player steering, validated through deployment and analysis.

Soft Tournament Equilibrium

cs.AI · 2026-04-06 · unverdicted · novelty 7.0

STE is a differentiable method to compute continuous analogues of the Top Cycle and Uncovered Set from pairwise comparison data for stable set-valued evaluation of cyclic agent interactions.

Restoration, Exploration and Transformation: How Youth Engage Character.AI Chatbots for Feels, Fun and Finding themselves

cs.HC · 2026-03-10 · unverdicted · novelty 7.0

Youth on Character.AI use chatbots for emotional restoration, creative exploration, and identity transformation, yielding a new three-intent framework and seven-archetype taxonomy from Discord discourse analysis.

The Illusion of Intervention: Your LLM-Simulated Experiment is an Observational Study

cs.CL · 2026-05-20 · unverdicted · novelty 6.0

Interventions in LLM-simulated user experiments induce distribution shifts in latent attributes that create confounding bias, diagnosable with negative control outcomes and partially mitigated by adding setting-relevant persona details.

Stop Drawing Scientific Claims from LLM Social Simulations Without Robustness Audits

physics.soc-ph · 2026-05-17 · accept · novelty 6.0

Minor perturbations in persona format, instruction framing, and network structure shift cooperation by up to 76 percentage points and polarization metrics consistently, showing that LLM social simulations require per-claim robustness audits via the new TRAILS taxonomy.

citing papers explorer

Showing 9 of 9 citing papers after filters.

Evaluating Very Long-Term Conversational Memory of LLM Agents cs.CL · 2024-02-27 · unverdicted · none · ref 148
Creates LoCoMo benchmark dataset for very long-term LLM conversational memory and shows current models struggle with lengthy dialogues and long-range temporal dynamics.
Skills on the Fly: Test-Time Adaptive Skill Synthesis for LLM Agents cs.CL · 2026-05-16 · unverdicted · none · ref 13
SkillTTA synthesizes temporary task-specific skills from retrieved training trajectories to boost LLM agent Pass@1 scores on SpreadsheetBench and BigCodeBench without parameter updates.
The Moltbook Files: A Harmless Slopocalypse or Humanity's Last Experiment cs.CL · 2026-05-08 · unverdicted · none · ref 149
An AI-agent social platform generated mostly neutral content whose use in fine-tuning reduced model truthfulness comparably to human Reddit data, suggesting limited unique harm but flagging tail risks like secret leaks.
Stories of Your Life as Others: A Round-Trip Evaluation of LLM-Generated Life Stories Conditioned on Rich Psychometric Profiles cs.CL · 2026-04-07 · unverdicted · none · ref 24
LLMs conditioned on actual psychometric profiles produce life stories from which independent LLMs recover personality scores at mean r=0.75, 85% of human reliability, with emotional patterns replicating in real human data.
The Illusion of Intervention: Your LLM-Simulated Experiment is an Observational Study cs.CL · 2026-05-20 · unverdicted · none · ref 14
Interventions in LLM-simulated user experiments induce distribution shifts in latent attributes that create confounding bias, diagnosable with negative control outcomes and partially mitigated by adding setting-relevant persona details.
AgentCollabBench: Diagnosing When Good Agents Make Bad Collaborators cs.CL · 2026-05-09 · unverdicted · none · ref 34
AgentCollabBench shows that multi-agent reliability is limited by communication topology, with converging-DAG nodes causing synthesis bottlenecks that discard constraints and explain 7-40% of information loss variance.
MemRouter: Memory-as-Embedding Routing for Long-Term Conversational Agents cs.CL · 2026-05-01 · unverdicted · none · ref 12
A lightweight supervised router using frozen-LLM embeddings for memory admission decisions outperforms LLM-based memory managers in both F1 score and latency on the LoCoMo benchmark.
Memory in the LLM Era: Modular Architectures and Strategies in a Unified Framework cs.CL · 2026-04-02 · unverdicted · none · ref 74
A unified framework for LLM agent memory is benchmarked, with a new hybrid method outperforming state-of-the-art on standard tasks.
Do LLM Agents Mirror Socio-Cognitive Effects in Power-Asymmetric Conversations? cs.CL · 2026-05-17 · unverdicted · none · ref 83
LLMs assigned high or low status personas in multi-turn dialogues exhibit socio-cognitive effects including language coordination, pronoun patterns, persuasion success, and compliance with unsafe requests.

O'Brien and Carrie Jun Cai and Meredith Ringel Morris and Percy Liang and Michael S

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer