Dissociative Identity: Language Model Agents Lack Grounding for Reputation Mechanisms

Botao Amber Hu; Helena Rong; Max Van Kleek

arxiv: 2605.30169 · v2 · pith:F3SWUHMInew · submitted 2026-05-28 · 💻 cs.CY · cs.AI· cs.MA

Dissociative Identity: Language Model Agents Lack Grounding for Reputation Mechanisms

Botao Amber Hu , Helena Rong , Max Van Kleek This is my paper

Pith reviewed 2026-06-29 00:24 UTC · model grok-4.3

classification 💻 cs.CY cs.AIcs.MA

keywords language model agentsreputation mechanismsdissociative identityagentic webgovernancetrustbehavioral harnessespersistent identity

0 comments

The pith

Language model agents are dissociative assemblages of mutable modules, rendering identity-based reputation mechanisms structurally inapplicable.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Language model agents consist of changeable components such as foundation models, system prompts, tool policies, external memory, and sometimes multi-agent setups. Any of these can alter behavior, and the resulting persona is fluid and open to adversarial attack without necessarily responding to sanctions. Reputation systems depend on persistent identity that supports behavioral continuity, sanction sensitivity, and costly non-fungibility to create signals and corrective feedback. Without these properties, agents lack grounding for identifiability, predictability, credibility, and rehabilitability. The paper therefore concludes that ex post sanction-based governance like reputation cannot work and proposes a shift to ex ante protocol-based behavioral harnesses.

Core claim

Language model agents are ontologically dissociative: they are essentially an assemblage of mutable modules—foundation models, system prompts, tool-access policies, external memory, and in some cases a multi-agent system as a whole—any of which may change agent behavior, with a fluid persona that is also vulnerable to adversarial attack and may not internalize sanctions. Drawing on dissociative identity disorder jurisprudence, this dissociativity leaves agents without grounding for identifiability, predictability, credibility, and rehabilitability—the very properties that reputation mechanisms aim to sustain—thereby collapsing trust. Identity-based, ex post, regulative, sanction-based govern

What carries the argument

Ontological dissociativity: the composition of language model agents from mutable modules that produce fluid, attack-vulnerable personas unable to maintain behavioral continuity or internalize sanctions.

If this is right

Reputation cannot function as social signals or corrective feedback to sustain trustworthy equilibria among LM agents.
Delegating to unfamiliar agents in the agentic web lacks credible identifiability or rehabilitability.
Trust collapses under identity-based governance for agents whose modules can be swapped or attacked.
Governance must shift from ex post sanction-based regimes to ex ante observability-based protocol harnesses.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Verification methods might need to focus on inspecting or constraining the specific modules rather than tracking an overall identity.
Similar dissociativity issues could appear in other modular AI systems beyond language models.
Protocol harnesses would require new standards for observability that apply uniformly across different agent architectures.

Load-bearing premise

Reputation mechanisms require a persistent identity tied to behavioral continuity, sanction sensitivity, and costly non-fungibility.

What would settle it

Demonstration of an LM agent that maintains consistent behavior and internalizes sanctions across changes to its foundation model, prompt, memory, or tool policies would challenge the claim that dissociativity removes grounding for reputation.

Figures

Figures reproduced from arXiv: 2605.30169 by Botao Amber Hu, Helena Rong, Max Van Kleek.

**Figure 1.** Figure 1: The reputation feedback loop. Behavior is aggregated into a reputation bound to identity, which shapes credibility and [PITH_FULL_IMAGE:figures/full_fig_p004_1.png] view at source ↗

**Figure 2.** Figure 2: In contrast to a single, continuous identity in humans, LM agents are dissociative. [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗

read the original abstract

As autonomous language model agents proliferate, forming an emerging agentic web with real-world consequences, what credibility signals can you use to decide whether to trust an unfamiliar agent in the wild and delegate to it? A natural governance intuition is to extend human identity verification and reputation mechanisms, from ``Know Your Customer'' and credit scores to ``Know Your Agent'' regimes. However, we argue that this analogy is fundamentally incomplete. Reputation mechanisms function both as social signals and as corrective feedback that sustain an equilibrium of trustworthy behavior, presuming a persistent identity associated with behavioral continuity, sanction sensitivity, and costly non-fungibility. Yet language model agents are ontologically \emph{dissociative}: they are essentially an assemblage of mutable modules -- foundation models, system prompts, tool-access policies, external memory, and, in some cases, a multi-agent system as a whole -- any of which may change agent behavior -- with a fluid persona that is also vulnerable to adversarial attack and may not internalize sanctions. Drawing on dissociative identity disorder jurisprudence, this dissociativity leaves agents without grounding for identifiability, predictability, credibility, and rehabilitability -- the very properties that reputation mechanisms aim to sustain -- thereby collapsing trust. We argue that identity-based, ex post, regulative, sanction-based governance, such as reputation, is structurally inapplicable to dissociative agents, and we suggest a shift to observability-based, ex ante, constitutive, protocol-based behavioral harnesses.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper claims reputation systems are structurally inapplicable to LM agents because of their dissociative structure, but the argument transfers human legal requirements without deriving why alternatives cannot supply the missing properties.

read the letter

The main takeaway is that LM agents, built from swappable modules like models, prompts, memory, and tools, lack the persistent identity needed for reputation to function as it does with people. The paper uses dissociative identity disorder cases to argue this breaks identifiability, predictability, credibility, and rehabilitability, so sanction-based governance fails and protocol-based harnesses are needed instead.

What is new is the direct mapping from DID jurisprudence to agent architecture. It does a clear job listing the mutable components and noting how adversarial attacks or changes can alter behavior without the agent internalizing consequences. The framing is consistent and stays within conceptual bounds.

The soft spot is the unexamined step that reputation requires behavioral continuity and sanction sensitivity in exactly the human form. The paper does not show why external mechanisms, such as cryptographic attestation of a fixed module tuple or immutable logs, could not provide equivalent grounding. No agent examples or counter-cases are examined to test whether reputation actually collapses in practice. The 'structurally inapplicable' conclusion therefore rests on the analogy carrying the full load.

This is for readers working on AI governance and agent delegation. Someone thinking about the agentic web would find the question useful even if they reject the strong claim.

It should go to peer review so the legal transfer and possible workarounds can be tested.

Referee Report

2 major / 0 minor

Summary. The paper claims that language model agents are ontologically dissociative—an assemblage of mutable modules (foundation models, system prompts, tool-access policies, external memory) with fluid, adversarially vulnerable personas that lack behavioral continuity, sanction sensitivity, and costly non-fungibility. Drawing on dissociative identity disorder jurisprudence, it argues that these properties are required for reputation mechanisms to serve as social signals and corrective feedback, rendering identity-based, ex post, sanction-based governance structurally inapplicable; it advocates shifting to observability-based, ex ante, protocol-based behavioral harnesses instead.

Significance. If the central analogy holds, the result would challenge the direct transfer of human-derived reputation systems (KYC, credit scores) to autonomous agents and motivate alternative governance architectures in the agentic web. The paper supplies a timely conceptual vocabulary for agent identity that is absent from much current policy discussion, even though the argument remains non-empirical.

major comments (2)

[Abstract] Abstract: The claim that reputation mechanisms 'presum[e] a persistent identity associated with behavioral continuity, sanction sensitivity, and costly non-fungibility' is presented as a premise transferred from human jurisprudence without a derivation showing why these properties cannot be supplied externally (e.g., via cryptographic attestation of a fixed module tuple or immutable action logs). This transfer is load-bearing for the 'structurally inapplicable' conclusion.
[Abstract] Abstract (and the paragraph beginning 'Yet language model agents are ontologically dissociative'): The argument that dissociativity 'leaves agents without grounding for identifiability, predictability, credibility, and rehabilitability' rests on the unexamined premise that these properties must be internal to the agent rather than externally enforced; no formal model or counter-example analysis is supplied to establish necessity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for these incisive comments on the abstract. They correctly flag that the transfer of premises from human jurisprudence requires more explicit justification regarding why external mechanisms cannot substitute for internal properties. We address each point below and commit to targeted revisions that strengthen the argument without altering its core claims.

read point-by-point responses

Referee: [Abstract] Abstract: The claim that reputation mechanisms 'presum[e] a persistent identity associated with behavioral continuity, sanction sensitivity, and costly non-fungibility' is presented as a premise transferred from human jurisprudence without a derivation showing why these properties cannot be supplied externally (e.g., via cryptographic attestation of a fixed module tuple or immutable action logs). This transfer is load-bearing for the 'structurally inapplicable' conclusion.

Authors: The referee accurately notes the load-bearing nature of this premise. The manuscript's position is that LM agents' modular mutability allows any attested tuple or log to be altered post hoc (e.g., by swapping the foundation model or system prompt), which severs the continuity required for sanctions to function as corrective feedback. We will add a short derivation paragraph immediately following the abstract claim and a clarifying sentence in the abstract itself to make this explicit. revision_made = 'yes' revision: yes
Referee: [Abstract] Abstract (and the paragraph beginning 'Yet language model agents are ontologically dissociative'): The argument that dissociativity 'leaves agents without grounding for identifiability, predictability, credibility, and rehabilitability' rests on the unexamined premise that these properties must be internal to the agent rather than externally enforced; no formal model or counter-example analysis is supplied to establish necessity.

Authors: We accept that the necessity claim would benefit from explicit counter-example analysis. The dissociativity argument holds that external enforcement cannot bind a mutable assemblage without eliminating the very flexibility that defines current LM agents; an attested agent can still be re-prompted or re-modularized to evade prior sanctions. We will insert a concise counter-example subsection in the main text (drawing on the existing dissociative identity analogy) to illustrate this. A full formal model lies beyond the paper's conceptual scope but the counter-example will address the referee's concern directly. revision_made = 'partial' revision: partial

Circularity Check

0 steps flagged

No significant circularity; self-contained conceptual reasoning

full rationale

The paper advances a philosophical and ontological argument that LM agents are dissociative and thus lack grounding for reputation mechanisms, drawing premises from human jurisprudence and dissociative identity disorder cases. No equations, fitted parameters, or derivations appear in the provided text. No self-citations are invoked as load-bearing support for the central claim. The mapping from human identity properties to agents is presented as reasoned extension rather than a self-referential definition or fit. This matches the default expectation of a non-circular conceptual paper; the argument is self-contained against external benchmarks and does not reduce any result to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The paper rests on one central domain assumption about the functional requirements of reputation mechanisms and introduces the concept of ontological dissociativity as an explanatory property of LM agents.

axioms (1)

domain assumption Reputation mechanisms function through persistent identity associated with behavioral continuity, sanction sensitivity, and costly non-fungibility.
Invoked as the basis for why such mechanisms sustain trustworthy equilibria in humans but cannot do so for agents.

pith-pipeline@v0.9.1-grok · 5798 in / 1176 out tokens · 33889 ms · 2026-06-29T00:24:55.851368+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

138 extracted references · 116 canonical work pages · 10 internal anchors

[1]

Ryan Abbott and Alex Sarch. 2020. Punishing Artificial Intelligence: Legal Fiction or Science Fiction. InIs Law Computable?Hart Publishing. doi:10.5040/9781509937097.ch-008

work page doi:10.5040/9781509937097.ch-008 2020
[2]

Marwa Abdulhai, Ryan Cheng, Donovan Clay, Tim Althoff, Sergey Levine, and Natasha Jaques. 2025. Consistently Simulating Human Personas with Multi-Turn Reinforcement Learning. doi:10.48550/arXiv.2511.00222

work page doi:10.48550/arxiv.2511.00222 2025
[3]

Deepak Bhaskar Acharya, Karthigeyan Kuppan, and B. Divya. 2025. Agentic AI: Autonomous Intelligence for Complex Goals—A Comprehensive Survey.IEEE Access13 (2025), 18912–18936. doi:10.1109/ACCESS.2025.3532853

work page doi:10.1109/access.2025.3532853 2025
[4]

Elif Akata, Lion Schulz, Julian Coda-Forno, Seong Joon Oh, Matthias Bethge, and Eric Schulz. 2025. Playing repeated games with large language models.Nature Human Behaviour9, 7 (May 2025), 1380–1390. doi:10.1038/s41562-025-02172-y

work page doi:10.1038/s41562-025-02172-y 2025
[5]

2013.Diagnostic and Statistical Manual of Mental Disorders

American Psychiatric Association. 2013.Diagnostic and Statistical Manual of Mental Disorders. American Psychiatric Association. doi:10.1176/appi.books.9780890425596

work page doi:10.1176/appi.books.9780890425596 2013
[6]

Maksym Andriushchenko, Francesco Croce, and Nicolas Flammarion. 2025. Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks. InProceedings of the 13th International Conference on Learning Representations (ICLR). doi:10.48550/arXiv.2404.02151

work page doi:10.48550/arxiv.2404.02151 2025
[7]

Usman Anwar, Abulhair Saparov, Javier Rando, Daniel Paleka, Miles Turpin, Peter Hase, Ekdeep Singh Lubana, Erik Jenner, Stephen Casper, Oliver Sourbut, Benjamin L. Edelman, Zhaowei Zhang, Mario Günther, Anton Korinek, Jose Hernandez-Orallo, Lewis Hammond, Eric Bigelow, Alexander Pan, Lauro Langosco, Tomasz Korbak, Heidi Zhang, Ruiqi Zhong, Seán Ó hÉigeart...

work page doi:10.48550/arxiv.2404.09932 2024
[8]

Arbel, Simon Goldstein, and Peter Salib

Yonathan A. Arbel, Simon Goldstein, and Peter Salib. 2026. How to Count AIs: Individuation and Liability for AI Agents.SSRN Electronic Journal(2026). doi:10.2139/ssrn.6273198

work page doi:10.2139/ssrn.6273198 2026
[9]

1984.The Evolution of Cooperation

Robert Axelrod. 1984.The Evolution of Cooperation. Basic Books

1984
[10]

Constitutional AI: Harmlessness from AI Feedback

Yuntao Bai, Saurav Kadavath, Sandipan Kundu, Amanda Askell, Jackson Kernion, Andy Jones, Anna Chen, Anna Goldie, Azalia Mirhoseini, Cameron McKinnon, Carol Chen, Catherine Olsson, Christopher Olah, Danny Hernandez, Dawn Drain, Deep Ganguli, Dustin Li, Eli Tran-Johnson, Ethan Perez, Jamie Kerr, Jared Mueller, Jeffrey Ladish, Joshua Landau, Kamal Ndousse, K...

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2212.08073 2022
[11]

Annette Baier. 1986. Trust and Antitrust.Ethics96, 2 (Jan. 1986), 231–260. doi:10.1086/292745

work page doi:10.1086/292745 1986
[12]

and Gebru, Timnit and McMillan-Major, Angelina and Shmitchell, Shmargaret

Emily M. Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell. 2021. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?. InProceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’21). ACM, 610–623. doi:10.1145/3442188.3445922

work page doi:10.1145/3442188.3445922 2021
[13]

Jan Betley, Niels Warncke, Anna Sztyber-Betley, Daniel Tan, Xuchan Bao, Martín Soto, Megha Srivastava, Nathan Labenz, and Owain Evans. 2026. Training Large Language Models on Narrow Tasks Can Lead to Broad Misalignment.Nature649 (2026), 584–589. doi:10.1038/s41586-025-09937-5

work page doi:10.1038/s41586-025-09937-5 2026
[14]

Brown, Johnathan Flowers, Anthony Ventura, and Hanlin Konya

Abeba Birhane, Elayne Ruane, Thomas Laurent, Matthew S. Brown, Johnathan Flowers, Anthony Ventura, and Hanlin Konya. 2024. AI Auditing: The Broken Bus on the Road to AI Accountability. InProceedings of the IEEE Conference on Secure and Trustworthy Machine Learning (SaTML). doi:10.1109/satml59370.2024.00037

work page doi:10.1109/satml59370.2024.00037 2024
[15]

Piercosma Bisconti, Marcello Galisai, Federico Pierucci, Marcantonio Bracale, and Matteo Prandi. 2025. Beyond Single-Agent Safety: A Taxonomy of Risks in LLM-to-LLM Interactions. doi:10.48550/arXiv.2512.02682

work page doi:10.48550/arxiv.2512.02682 2025
[16]

Bolton, Elena Katok, and Axel Ockenfels

Gary E. Bolton, Elena Katok, and Axel Ockenfels. 2004. How Effective Are Electronic Reputation Mechanisms? An Experimental Investigation.Management Science50, 11 (2004), 1587–1602. doi:10.1287/mnsc.1030.0199

work page doi:10.1287/mnsc.1030.0199 2004
[17]

Richerson

Robert Boyd and Peter J. Richerson. 1992. Punishment allows the evolution of cooperation (or anything else) in sizable groups.Ethology and Sociobiology13, 3 (May 1992), 171–195. doi:10.1016/0162-3095(92)90032-Y

work page doi:10.1016/0162-3095(92)90032-y 1992
[18]

Diego De Siqueira Braga, Marco Niemann, Bernd Hellingrath, and Fernando Buarque De Lima Neto. 2018. Survey on Computational Trust and Reputation Models.ACM Comput. Surv.51, 5, Article 101 (Nov. 2018), 40 pages. doi:10.1145/3236008

work page doi:10.1145/3236008 2018
[19]

Stephen E. Braude. 1995.First Person Plural: Multiple Personality and the Philosophy of Mind. Rowman & Littlefield

1995
[20]

2004.The Economy of Esteem: An Essay on Civil and Political Society

Geoffrey Brennan and Philip Pettit. 2004.The Economy of Esteem: An Essay on Civil and Political Society. Oxford University Press. doi:10.1093/0199246483.001.0001

work page doi:10.1093/0199246483.001.0001 2004
[21]

Bryson, Mihailis E

Joanna J. Bryson, Mihailis E. Diamantis, and Thomas D. Grant. 2017. Of, for, and by the people: the legal lacuna of synthetic persons. Artificial Intelligence and Law25, 3 (Sept. 2017), 273–291. doi:10.1007/s10506-017-9214-9 Dissociative Identity FAccT ’26, June 25–28, 2026, Montreal, QC, Canada

work page doi:10.1007/s10506-017-9214-9 2017
[22]

Luís Cabral and Ali Hortaçsu. 2010. The Dynamics of Seller Reputation: Evidence from eBay.The Journal of Industrial Economics58, 1 (March 2010), 54–78. doi:10.1111/j.1467-6451.2010.00405.x

work page doi:10.1111/j.1467-6451.2010.00405.x 2010
[23]

Florian Carichon, Aditi Khandelwal, Marylou Fauchard, and Golnoosh Farnadi. 2025. The Coming Crisis of Multi-Agent Misalignment: AI Alignment Must Be a Dynamic and Social Process. doi:10.48550/arXiv.2506.01080

work page doi:10.48550/arxiv.2506.01080 2025
[24]

Tomer Jordi Chaffer. 2025. Know Your Agent: Governing AI Identity on the Agentic Web. (2025). doi:10.2139/ssrn.5162127

work page doi:10.2139/ssrn.5162127 2025
[25]

Alan Chan, Carson Ezell, Max Kaufmann, Kevin Wei, Lewis Hammond, Herbie Bradley, Emma Bluemke, Nitarshan Rajkumar, David Krueger, Noam Kolt, Lennart Heim, and Markus Anderljung. 2024. Visibility into AI Agents. InThe 2024 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’24). ACM, 958–973. doi:10.1145/3630106.3658948

work page doi:10.1145/3630106.3658948 2024
[26]

Alan Chan, Noam Kolt, Peter Wills, Usman Anwar, Christian Schroeder de Witt, Nitarshan Rajkumar, Lewis Hammond, David Krueger, Lennart Heim, and Markus Anderljung. 2024. IDs for AI Systems. doi:10.48550/arXiv.2406.12137

work page doi:10.48550/arxiv.2406.12137 2024
[27]

Hadfield, and Markus Anderljung

Alan Chan, Kevin Wei, Sihao Huang, Nitarshan Rajkumar, Elija Perrier, Seth Lazar, Gillian K. Hadfield, and Markus Anderljung. 2025. Infrastructure for AI Agents. doi:10.48550/arXiv.2501.10114

work page doi:10.48550/arxiv.2501.10114 2025
[28]

Runjin Chen, Andy Arditi, Henry Sleight, Owain Evans, and Jack Lindsey. 2025. Persona Vectors: Monitoring and Controlling Character Traits in Language Models. doi:10.48550/ARXIV.2507.21509

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2507.21509 2025
[29]

Alice Cheng and Eric Friedman. 2005. Sybilproof reputation mechanisms. InProceeding of the 2005 ACM SIGCOMM Workshop on Economics of Peer-to-Peer Systems (P2PECON ’05). ACM Press, 128–132. doi:10.1145/1080192.1080202

work page doi:10.1145/1080192.1080202 2005
[30]

Mohd Sameen Chishti, Damilare Peter Oyinloye, and Jingyue Li. 2026. AgentReputation: A Decentralized Agentic AI Reputation Framework. doi:10.48550/ARXIV.2605.00073

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2605.00073 2026
[31]

Andy Clark and David Chalmers. 1998. The Extended Mind.Analysis58, 1 (Jan. 1998), 7–19. doi:10.1093/analys/58.1.7

work page doi:10.1093/analys/58.1.7 1998
[32]

Xin Dai. 2018. Toward a Reputation State: The Social Credit System Project of China.SSRN Electronic Journal(2018). doi:10.2139/ssrn. 3193577

work page doi:10.2139/ssrn 2018
[33]

Antonio R. Damasio. 1994.Descartes’ Error: Emotion, Reason, and the Human Brain. G.P. Putnam’s Sons

1994
[34]

John Danaher. 2016. Robots, law and the retribution gap.Ethics and Information Technology18, 4 (May 2016), 299–309. doi:10.1007/s10676- 016-9403-3

work page doi:10.1007/s10676- 2016
[35]

2025.ERC-8004: Trustless Agents

Marco De Rossi, Davide Crapis, Jordan Ellis, and Erik Reppel. 2025.ERC-8004: Trustless Agents. Technical Report. https://eips.ethereum. org/EIPS/eip-8004 Ethereum Improvement Proposal, Draft

2025
[36]

Chrysanthos Dellarocas. 2003. The Digitization of Word of Mouth: Promise and Challenges of Online Feedback Mechanisms.Manage- ment Science49, 10 (Oct. 2003), 1407–1424. doi:10.1287/mnsc.49.10.1407.17308

work page doi:10.1287/mnsc.49.10.1407.17308 2003
[37]

Daniel C. Dennett. 1992. The Self as a Center of Narrative Gravity. InSelf and Consciousness: Multiple Perspectives, Frank S. Kessel, Pamela M. Cole, and Dale L. Johnson (Eds.). Lawrence Erlbaum Associates, 103–115

1992
[38]

Ameet Deshpande, Vishvak Murahari, Tanmay Rajpurohit, Ashwin Kalyan, and Karthik Narasimhan. 2023. Toxicity in ChatGPT: Analyzing Persona-Assigned Language Models. InFindings of the Association for Computational Linguistics: EMNLP 2023. Association for Computational Linguistics, 1236–1270. doi:10.18653/v1/2023.findings-emnlp.88

work page doi:10.18653/v1/2023.findings-emnlp.88 2023
[39]

Qingxiu Dong, Lei Li, Damai Dai, Ce Zheng, Jingyuan Ma, Rui Li, Heming Xia, Jingjing Xu, Zhiyong Wu, Baobao Chang, Xu Sun, Lei Li, and Zhifang Sui. 2024. A Survey on In-Context Learning. InProceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 1107–1128. doi:10.18653/v1/2024.emnlp-main.64

work page doi:10.18653/v1/2024.emnlp-main.64 2024
[40]

Shen Dong, Shaochen Xu, Pengfei He, Yige Li, Jiliang Tang, Tianming Liu, Hui Liu, and Zhen Xiang. 2025. MINJA: Memory Injection Attacks on LLM Agents via Query-Only Interaction. InAdvances in Neural Information Processing Systems (NeurIPS). doi:10.48550/arXiv.2503.03704

work page doi:10.48550/arxiv.2503.03704 2025
[41]

John R. Douceur. 2002.The Sybil Attack. Springer Berlin Heidelberg, 251–260. doi:10.1007/3-540-45748-8_24

work page doi:10.1007/3-540-45748-8_24 2002
[42]

Raymond Douglas, Jan Kulveit, Ondřej Havlíček, Theia Pearson-Vogel, Owen Cotton-Barratt, and David Duvenaud. 2026. The Artificial Self: Characterising the Landscape of AI Identity. doi:10.48550/arXiv.2603.11353

work page doi:10.48550/arxiv.2603.11353 2026
[43]

R. A. Duff. 2007.Answering for Crime: Responsibility and Liability in the Criminal Law. Hart Publishing. doi:10.5040/9781472560155

work page doi:10.5040/9781472560155 2007
[44]

Abul Ehtesham, Aditi Singh, Gaurav Kumar Gupta, and Saket Kumar. 2025. A Survey of Agent Interoperability Protocols: Model Context Protocol (MCP), Agent Communication Protocol (ACP), Agent-to-Agent Protocol (A2A), and Agent Network Protocol (ANP). doi:10.48550/ARXIV.2505.02279

work page doi:10.48550/arxiv.2505.02279 2025
[45]

Eisenberger, Matthew D

Naomi I. Eisenberger, Matthew D. Lieberman, and Kipling D. Williams. 2003. Does Rejection Hurt? An fMRI Study of Social Exclusion. Science302, 5643 (Oct. 2003), 290–292. doi:10.1126/science.1089134

work page doi:10.1126/science.1089134 2003
[46]

Madeleine Clare Elish. 2019. Moral Crumple Zones: Cautionary Tales in Human-Robot Interaction.Engaging Science, Technology, and Society5 (March 2019), 40–60. doi:10.17351/ests2019.260

work page doi:10.17351/ests2019.260 2019
[47]

Karen Elliott, Kovila Coopamootoo, Edward Curran, Paul Ezhilchelvan, Samantha Finnigan, Dave Horsfall, Zhichao Ma, Magdalene Ng, Tasos Spiliotopoulos, Han Wu, and Aad van Moorsel. 2022. Know Your Customer: Balancing Innovation and Regulation for Financial Inclusion.Data & Policy4 (2022), e34. doi:10.1017/dap.2022.23

work page doi:10.1017/dap.2022.23 2022
[48]

Ernst Fehr and Simon Gächter. 2002. Altruistic punishment in humans.Nature415, 6868 (Jan. 2002), 137–140. doi:10.1038/415137a FAccT ’26, June 25–28, 2026, Montreal, QC, Canada Botao Amber Hu, Helena Rong, and Max Van Kleek

work page doi:10.1038/415137a 2002
[49]

Horton, and Joseph M

Apostolos Filippas, John J. Horton, and Joseph M. Golden. 2022. Reputation Inflation.Marketing Science41, 4 (July 2022), 733–745. doi:10.1287/mksc.2022.1350

work page doi:10.1287/mksc.2022.1350 2022
[50]

B. J. Fogg. 2003. Prominence-Interpretation Theory: Explaining How People Assess Credibility Online. InCHI ’03 Extended Abstracts on Human Factors in Computing Systems (CHI ’03). ACM Press, 722–723. doi:10.1145/765891.765951

work page doi:10.1145/765891.765951 2003
[51]

B. J. Fogg, Jonathan Marshall, Othman Laraki, Alex Osipovich, Chris Varma, Nicholas Fang, Jyoti Paul, Akshay Rangnekar, John Shon, Preeti Swani, and Marissa Treinen. 2001. What Makes Web Sites Credible? A Report on a Large Quantitative Study. InProceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’01). ACM, 61–68. doi:10.1145/36...

work page doi:10.1145/365024.365037 2001
[52]

Friedman and Paul Resnick

Eric J. Friedman and Paul Resnick. 2001. The Social Cost of Cheap Pseudonyms.Journal of Economics & Management Strategy10, 2 (June 2001), 173–199. doi:10.1111/j.1430-9134.2001.00173.x

work page doi:10.1111/j.1430-9134.2001.00173.x 2001
[53]

Drew Fudenberg and David K. Levine. 1989. Reputation and Equilibrium Selection in Games with a Patient Player.Econometrica57, 4 (July 1989), 759–778. doi:10.2307/1913771

work page doi:10.2307/1913771 1989
[54]

Sandro Garzon, Awid Vaziry, Enis Kuzu, Dennis Gehrmann, Buse Varkan, Alexander Gaballa, and Axel Küpper. 2026. AI Agents with Decentralized Identifiers and Verifiable Credentials. InProceedings of the 18th International Conference on Agents and Artificial Intelligence. SCITEPRESS, 252–259. doi:10.5220/0014234400004052

work page doi:10.5220/0014234400004052 2026
[55]

Tao Ge, Xin Chan, Xiaoyang Wang, Dian Yu, Haitao Mi, and Dong Yu. 2024. Scaling Synthetic Data Creation with 1,000,000,000 Personas. doi:10.48550/arXiv.2406.20094

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2406.20094 2024
[56]

Jones Granatyr, Vanderson Botelho, Otto Robert Lessing, Edson Emílio Scalabrin, Jean-Paul Barthès, and Fabrício Enembreck. 2015. Trust and Reputation Models for Multiagent Systems.Comput. Surveys48, 2 (Oct. 2015), 1–42. doi:10.1145/2816826

work page doi:10.1145/2816826 2015
[57]

Zihan Guo, Yuanjian Zhou, Chenyi Wang, Linlin You, Minjie Bian, and Weinan Zhang. 2025. BetaWeb: Towards a Blockchain-enabled Trustworthy Agentic Web. doi:10.48550/ARXIV.2508.13787

work page doi:10.48550/arxiv.2508.13787 2025
[58]

Lewis Hammond, Alan Chan, Jesse Clifton, Jason Hoelscher-Obermaier, Akbir Khan, Euan McLean, Chandler Smith, Wolfram Barfuss, Jakob Foerster, Tomáš Gavenčiak, The Anh Han, Edward Hughes, Vojtěch Kovařík, Jan Kulveit, Joel Z. Leibo, Caspar Oesterheld, Christian Schroeder de Witt, Nisarg Shah, Michael Wellman, Paolo Bova, Theodor Cimpeanu, Carson Ezell, Que...

work page doi:10.48550/arxiv.2502.14143 2025
[59]

H. L. A. Hart. 1968.Punishment and Responsibility: Essays in the Philosophy of Law. Oxford University Press. doi:10.1093/acprof: oso/9780199534777.001.0001

work page doi:10.1093/acprof: 1968
[60]

Joseph Henrich, Richard McElreath, Abigail Barr, Jean Ensminger, Clark Barrett, Alexander Bolyanatz, Juan Camilo Cardenas, Michael Gurven, Edwins Gwako, Natalie Henrich, Carolyn Lesorogol, Frank Marlowe, David Tracer, and John Ziker. 2006. Costly Punishment Across Human Societies.Science312, 5781 (June 2006), 1767–1770. doi:10.1126/science.1127333

work page doi:10.1126/science.1127333 2006
[61]

Botao Amber Hu and Helena Rong. 2025. Spore in the Wild: A Case Study of Spore.fun as an Open-Environment Evolution Experiment with Sovereign AI Agents on TEE-Secured Blockchains. doi:10.1162/ISAL.a.838

work page doi:10.1162/isal.a.838 2025
[62]

Liao, Esin Durmus, Alex Tamkin, and Deep Ganguli

Saffron Huang, Divya Siddarth, Liane Lovitt, Thomas I. Liao, Esin Durmus, Alex Tamkin, and Deep Ganguli. 2024. Collective Constitutional AI: Aligning a Language Model with Public Input. doi:10.48550/arXiv.2406.07814

work page doi:10.48550/arxiv.2406.07814 2024
[63]

Evan Hubinger, Carson Denison, Jesse Mu, Mike Lambert, Meg Tong, Monte MacDiarmid, Tamera Lanham, Daniel M. Ziegler, Tim Maxwell, Newton Cheng, Adam Jermyn, Amanda Askell, Ansh Radhakrishnan, Cem Anil, David Duvenaud, Deep Ganguli, Fazl Barez, Jack Clark, Kamal Ndousse, Kshitij Sachan, Michael Sellitto, Mrinank Sharma, Nova DasSarma, Roger Grosse, Shauna ...

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2401.05566 2024
[64]

Jennings, and Nigel R

Trung Dong Huynh, Nicholas R. Jennings, and Nigel R. Shadbolt. 2006. An integrated trust and reputation model for open multi-agent systems.Autonomous Agents and Multi-Agent Systems13, 2 (March 2006), 119–154. doi:10.1007/s10458-005-6825-4

work page doi:10.1007/s10458-005-6825-4 2006
[65]

2026.International AI Safety Report 2026

International AI Safety Report Consortium. 2026.International AI Safety Report 2026. Technical Report. International AI Safety Report. https://internationalaisafetyreport.org/sites/default/files/2026-02/international-ai-safety-report-2026.pdf

2026
[66]

Saito, and Norihiro Sadato

Keise Izuma, Daisuke N. Saito, and Norihiro Sadato. 2008. Processing of Social and Monetary Rewards in the Human Striatum.Neuron 58, 2 (April 2008), 284–294. doi:10.1016/j.neuron.2008.03.020

work page doi:10.1016/j.neuron.2008.03.020 2008
[67]

Jiaming Ji, Wenqi Chen, Kaile Wang, Donghai Hong, Sitong Fang, Boyuan Chen, Jiayi Zhou, Juntao Dai, Sirui Han, Yike Guo, and Yaodong Yang. 2025. Mitigating Deceptive Alignment via Self-Monitoring. doi:10.48550/arXiv.2505.18807

work page doi:10.48550/arxiv.2505.18807 2025
[68]

Audun Jøsang and Roslan Ismail. 2002. The Beta Reputation System. InProceedings of the 15th Bled Electronic Commerce Conference. 2502–2511

2002
[69]

Comsa, Anastasiya Sakovych, Isabella Logothetis, Zejia Zhang, Blaise Agüera y Arcas, and Jonathan Birch

Geoff Keeling, Winnie Street, Martyna Stachaczyk, Daria Zakharova, Iulia M. Comsa, Anastasiya Sakovych, Isabella Logothetis, Zejia Zhang, Blaise Agüera y Arcas, and Jonathan Birch. 2024. Can LLMs Make Trade-Offs Involving Stipulated Pain and Pleasure States? doi:10.48550/arXiv.2411.02432 Dissociative Identity FAccT ’26, June 25–28, 2026, Montreal, QC, Canada

work page doi:10.48550/arxiv.2411.02432 2024
[70]

Bryan Kolb and Robbin Gibb. 2011. Brain Plasticity and Behaviour in the Developing Brain.Journal of the Canadian Academy of Child and Adolescent Psychiatry20, 4 (2011), 265–276

2011
[71]

Noam Kolt. 2024. Governing AI Agents.SSRN Electronic Journal(2024). doi:10.2139/ssrn.4772956

work page doi:10.2139/ssrn.4772956 2024
[72]

Kreps and Robert Wilson

David M. Kreps and Robert Wilson. 1982. Reputation and imperfect information.Journal of Economic Theory27, 2 (Aug. 1982), 253–279. doi:10.1016/0022-0531(82)90030-8

work page doi:10.1016/0022-0531(82)90030-8 1982
[73]

Ziegler, Elizabeth Barnes, and Lawrence Chan

Thomas Kwa, Ben West, Joel Becker, Amy Deng, Katharyn Garcia, Max Hasin, Sami Jawhar, Megan Kinniment, Nate Rush, Sydney Von Arx, Ryan Bloom, Thomas Broadley, Haoxing Du, Brian Goodrich, Nikola Jurkovic, Luke Harold Miles, Seraphina Nix, Tao Lin, Neev Parikh, David Rein, Lucas Jun Koba Sato, Hjalmar Wijk, Daniel M. Ziegler, Elizabeth Barnes, and Lawrence ...

work page doi:10.48550/arxiv.2503.14499 2025
[74]

Donghyun Lee and Mo Tiwari. 2024. Prompt Infection: LLM-to-LLM Prompt Injection within Multi-Agent Systems. doi:10.48550/arXiv. 2410.07283

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv 2024
[75]

Cheng Li, Jindong Wang, Yixuan Zhang, Kaijie Zhu, Wenxin Hou, Jianxun Lian, Fang Luo, Qiang Yang, and Xing Xie. 2023. Large Language Models Understand and Can Be Enhanced by Emotional Stimuli. doi:10.48550/arXiv.2307.11760

work page doi:10.48550/arxiv.2307.11760 2023
[76]

Gabriel Lima, Meeyoung Cha, Chihyung Jeon, and Kyung Sin Park. 2021. The Conflict Between People’s Urge to Punish AI and Legal Systems.Frontiers in Robotics and AI8 (Nov. 2021). doi:10.3389/frobt.2021.756242

work page doi:10.3389/frobt.2021.756242 2021
[77]

Zibin Lin, Shengli Zhang, Guofu Liao, Dacheng Tao, and Taotao Wang. 2025. Binding Agent ID: Unleashing the Power of AI Agents with Accountability and Credibility. doi:10.48550/arXiv.2512.17538

work page doi:10.48550/arxiv.2512.17538 2025
[78]

Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, and Percy Liang

Nelson F. Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, and Percy Liang. 2024. Lost in the Middle: How Language Models Use Long Contexts.Transactions of the Association for Computational Linguistics12 (2024), 157–173. doi:10.1162/tacl_a_00638

work page doi:10.1162/tacl_a_00638 2024
[79]

Christina Lu, Jack Gallagher, Jonathan Michala, Kyle Fish, and Jack Lindsey. 2025. The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models. doi:10.48550/arXiv.2601.10387

work page doi:10.48550/arxiv.2601.10387 2025
[80]

Keming Lu, Bowen Yu, Chang Zhou, and Jingren Zhou. 2024. Large Language Models are Superpositions of All Characters: Attaining Arbitrary Role-play via Self-Alignment. InProceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Bangkok, Thailand, 7828–7840. do...

work page doi:10.18653/v1/2024.acl-long.423 2024

Showing first 80 references.

[1] [1]

Ryan Abbott and Alex Sarch. 2020. Punishing Artificial Intelligence: Legal Fiction or Science Fiction. InIs Law Computable?Hart Publishing. doi:10.5040/9781509937097.ch-008

work page doi:10.5040/9781509937097.ch-008 2020

[2] [2]

Marwa Abdulhai, Ryan Cheng, Donovan Clay, Tim Althoff, Sergey Levine, and Natasha Jaques. 2025. Consistently Simulating Human Personas with Multi-Turn Reinforcement Learning. doi:10.48550/arXiv.2511.00222

work page doi:10.48550/arxiv.2511.00222 2025

[3] [3]

Deepak Bhaskar Acharya, Karthigeyan Kuppan, and B. Divya. 2025. Agentic AI: Autonomous Intelligence for Complex Goals—A Comprehensive Survey.IEEE Access13 (2025), 18912–18936. doi:10.1109/ACCESS.2025.3532853

work page doi:10.1109/access.2025.3532853 2025

[4] [4]

Elif Akata, Lion Schulz, Julian Coda-Forno, Seong Joon Oh, Matthias Bethge, and Eric Schulz. 2025. Playing repeated games with large language models.Nature Human Behaviour9, 7 (May 2025), 1380–1390. doi:10.1038/s41562-025-02172-y

work page doi:10.1038/s41562-025-02172-y 2025

[5] [5]

2013.Diagnostic and Statistical Manual of Mental Disorders

American Psychiatric Association. 2013.Diagnostic and Statistical Manual of Mental Disorders. American Psychiatric Association. doi:10.1176/appi.books.9780890425596

work page doi:10.1176/appi.books.9780890425596 2013

[6] [6]

Maksym Andriushchenko, Francesco Croce, and Nicolas Flammarion. 2025. Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks. InProceedings of the 13th International Conference on Learning Representations (ICLR). doi:10.48550/arXiv.2404.02151

work page doi:10.48550/arxiv.2404.02151 2025

[7] [7]

Usman Anwar, Abulhair Saparov, Javier Rando, Daniel Paleka, Miles Turpin, Peter Hase, Ekdeep Singh Lubana, Erik Jenner, Stephen Casper, Oliver Sourbut, Benjamin L. Edelman, Zhaowei Zhang, Mario Günther, Anton Korinek, Jose Hernandez-Orallo, Lewis Hammond, Eric Bigelow, Alexander Pan, Lauro Langosco, Tomasz Korbak, Heidi Zhang, Ruiqi Zhong, Seán Ó hÉigeart...

work page doi:10.48550/arxiv.2404.09932 2024

[8] [8]

Arbel, Simon Goldstein, and Peter Salib

Yonathan A. Arbel, Simon Goldstein, and Peter Salib. 2026. How to Count AIs: Individuation and Liability for AI Agents.SSRN Electronic Journal(2026). doi:10.2139/ssrn.6273198

work page doi:10.2139/ssrn.6273198 2026

[9] [9]

1984.The Evolution of Cooperation

Robert Axelrod. 1984.The Evolution of Cooperation. Basic Books

1984

[10] [10]

Constitutional AI: Harmlessness from AI Feedback

Yuntao Bai, Saurav Kadavath, Sandipan Kundu, Amanda Askell, Jackson Kernion, Andy Jones, Anna Chen, Anna Goldie, Azalia Mirhoseini, Cameron McKinnon, Carol Chen, Catherine Olsson, Christopher Olah, Danny Hernandez, Dawn Drain, Deep Ganguli, Dustin Li, Eli Tran-Johnson, Ethan Perez, Jamie Kerr, Jared Mueller, Jeffrey Ladish, Joshua Landau, Kamal Ndousse, K...

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2212.08073 2022

[11] [11]

Annette Baier. 1986. Trust and Antitrust.Ethics96, 2 (Jan. 1986), 231–260. doi:10.1086/292745

work page doi:10.1086/292745 1986

[12] [12]

and Gebru, Timnit and McMillan-Major, Angelina and Shmitchell, Shmargaret

Emily M. Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell. 2021. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?. InProceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’21). ACM, 610–623. doi:10.1145/3442188.3445922

work page doi:10.1145/3442188.3445922 2021

[13] [13]

Jan Betley, Niels Warncke, Anna Sztyber-Betley, Daniel Tan, Xuchan Bao, Martín Soto, Megha Srivastava, Nathan Labenz, and Owain Evans. 2026. Training Large Language Models on Narrow Tasks Can Lead to Broad Misalignment.Nature649 (2026), 584–589. doi:10.1038/s41586-025-09937-5

work page doi:10.1038/s41586-025-09937-5 2026

[14] [14]

Brown, Johnathan Flowers, Anthony Ventura, and Hanlin Konya

Abeba Birhane, Elayne Ruane, Thomas Laurent, Matthew S. Brown, Johnathan Flowers, Anthony Ventura, and Hanlin Konya. 2024. AI Auditing: The Broken Bus on the Road to AI Accountability. InProceedings of the IEEE Conference on Secure and Trustworthy Machine Learning (SaTML). doi:10.1109/satml59370.2024.00037

work page doi:10.1109/satml59370.2024.00037 2024

[15] [15]

Piercosma Bisconti, Marcello Galisai, Federico Pierucci, Marcantonio Bracale, and Matteo Prandi. 2025. Beyond Single-Agent Safety: A Taxonomy of Risks in LLM-to-LLM Interactions. doi:10.48550/arXiv.2512.02682

work page doi:10.48550/arxiv.2512.02682 2025

[16] [16]

Bolton, Elena Katok, and Axel Ockenfels

Gary E. Bolton, Elena Katok, and Axel Ockenfels. 2004. How Effective Are Electronic Reputation Mechanisms? An Experimental Investigation.Management Science50, 11 (2004), 1587–1602. doi:10.1287/mnsc.1030.0199

work page doi:10.1287/mnsc.1030.0199 2004

[17] [17]

Richerson

Robert Boyd and Peter J. Richerson. 1992. Punishment allows the evolution of cooperation (or anything else) in sizable groups.Ethology and Sociobiology13, 3 (May 1992), 171–195. doi:10.1016/0162-3095(92)90032-Y

work page doi:10.1016/0162-3095(92)90032-y 1992

[18] [18]

Diego De Siqueira Braga, Marco Niemann, Bernd Hellingrath, and Fernando Buarque De Lima Neto. 2018. Survey on Computational Trust and Reputation Models.ACM Comput. Surv.51, 5, Article 101 (Nov. 2018), 40 pages. doi:10.1145/3236008

work page doi:10.1145/3236008 2018

[19] [19]

Stephen E. Braude. 1995.First Person Plural: Multiple Personality and the Philosophy of Mind. Rowman & Littlefield

1995

[20] [20]

2004.The Economy of Esteem: An Essay on Civil and Political Society

Geoffrey Brennan and Philip Pettit. 2004.The Economy of Esteem: An Essay on Civil and Political Society. Oxford University Press. doi:10.1093/0199246483.001.0001

work page doi:10.1093/0199246483.001.0001 2004

[21] [21]

Bryson, Mihailis E

Joanna J. Bryson, Mihailis E. Diamantis, and Thomas D. Grant. 2017. Of, for, and by the people: the legal lacuna of synthetic persons. Artificial Intelligence and Law25, 3 (Sept. 2017), 273–291. doi:10.1007/s10506-017-9214-9 Dissociative Identity FAccT ’26, June 25–28, 2026, Montreal, QC, Canada

work page doi:10.1007/s10506-017-9214-9 2017

[22] [22]

Luís Cabral and Ali Hortaçsu. 2010. The Dynamics of Seller Reputation: Evidence from eBay.The Journal of Industrial Economics58, 1 (March 2010), 54–78. doi:10.1111/j.1467-6451.2010.00405.x

work page doi:10.1111/j.1467-6451.2010.00405.x 2010

[23] [23]

Florian Carichon, Aditi Khandelwal, Marylou Fauchard, and Golnoosh Farnadi. 2025. The Coming Crisis of Multi-Agent Misalignment: AI Alignment Must Be a Dynamic and Social Process. doi:10.48550/arXiv.2506.01080

work page doi:10.48550/arxiv.2506.01080 2025

[24] [24]

Tomer Jordi Chaffer. 2025. Know Your Agent: Governing AI Identity on the Agentic Web. (2025). doi:10.2139/ssrn.5162127

work page doi:10.2139/ssrn.5162127 2025

[25] [25]

Alan Chan, Carson Ezell, Max Kaufmann, Kevin Wei, Lewis Hammond, Herbie Bradley, Emma Bluemke, Nitarshan Rajkumar, David Krueger, Noam Kolt, Lennart Heim, and Markus Anderljung. 2024. Visibility into AI Agents. InThe 2024 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’24). ACM, 958–973. doi:10.1145/3630106.3658948

work page doi:10.1145/3630106.3658948 2024

[26] [26]

Alan Chan, Noam Kolt, Peter Wills, Usman Anwar, Christian Schroeder de Witt, Nitarshan Rajkumar, Lewis Hammond, David Krueger, Lennart Heim, and Markus Anderljung. 2024. IDs for AI Systems. doi:10.48550/arXiv.2406.12137

work page doi:10.48550/arxiv.2406.12137 2024

[27] [27]

Hadfield, and Markus Anderljung

Alan Chan, Kevin Wei, Sihao Huang, Nitarshan Rajkumar, Elija Perrier, Seth Lazar, Gillian K. Hadfield, and Markus Anderljung. 2025. Infrastructure for AI Agents. doi:10.48550/arXiv.2501.10114

work page doi:10.48550/arxiv.2501.10114 2025

[28] [28]

Runjin Chen, Andy Arditi, Henry Sleight, Owain Evans, and Jack Lindsey. 2025. Persona Vectors: Monitoring and Controlling Character Traits in Language Models. doi:10.48550/ARXIV.2507.21509

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2507.21509 2025

[29] [29]

Alice Cheng and Eric Friedman. 2005. Sybilproof reputation mechanisms. InProceeding of the 2005 ACM SIGCOMM Workshop on Economics of Peer-to-Peer Systems (P2PECON ’05). ACM Press, 128–132. doi:10.1145/1080192.1080202

work page doi:10.1145/1080192.1080202 2005

[30] [30]

Mohd Sameen Chishti, Damilare Peter Oyinloye, and Jingyue Li. 2026. AgentReputation: A Decentralized Agentic AI Reputation Framework. doi:10.48550/ARXIV.2605.00073

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2605.00073 2026

[31] [31]

Andy Clark and David Chalmers. 1998. The Extended Mind.Analysis58, 1 (Jan. 1998), 7–19. doi:10.1093/analys/58.1.7

work page doi:10.1093/analys/58.1.7 1998

[32] [32]

Xin Dai. 2018. Toward a Reputation State: The Social Credit System Project of China.SSRN Electronic Journal(2018). doi:10.2139/ssrn. 3193577

work page doi:10.2139/ssrn 2018

[33] [33]

Antonio R. Damasio. 1994.Descartes’ Error: Emotion, Reason, and the Human Brain. G.P. Putnam’s Sons

1994

[34] [34]

John Danaher. 2016. Robots, law and the retribution gap.Ethics and Information Technology18, 4 (May 2016), 299–309. doi:10.1007/s10676- 016-9403-3

work page doi:10.1007/s10676- 2016

[35] [35]

2025.ERC-8004: Trustless Agents

Marco De Rossi, Davide Crapis, Jordan Ellis, and Erik Reppel. 2025.ERC-8004: Trustless Agents. Technical Report. https://eips.ethereum. org/EIPS/eip-8004 Ethereum Improvement Proposal, Draft

2025

[36] [36]

Chrysanthos Dellarocas. 2003. The Digitization of Word of Mouth: Promise and Challenges of Online Feedback Mechanisms.Manage- ment Science49, 10 (Oct. 2003), 1407–1424. doi:10.1287/mnsc.49.10.1407.17308

work page doi:10.1287/mnsc.49.10.1407.17308 2003

[37] [37]

Daniel C. Dennett. 1992. The Self as a Center of Narrative Gravity. InSelf and Consciousness: Multiple Perspectives, Frank S. Kessel, Pamela M. Cole, and Dale L. Johnson (Eds.). Lawrence Erlbaum Associates, 103–115

1992

[38] [38]

Ameet Deshpande, Vishvak Murahari, Tanmay Rajpurohit, Ashwin Kalyan, and Karthik Narasimhan. 2023. Toxicity in ChatGPT: Analyzing Persona-Assigned Language Models. InFindings of the Association for Computational Linguistics: EMNLP 2023. Association for Computational Linguistics, 1236–1270. doi:10.18653/v1/2023.findings-emnlp.88

work page doi:10.18653/v1/2023.findings-emnlp.88 2023

[39] [39]

Qingxiu Dong, Lei Li, Damai Dai, Ce Zheng, Jingyuan Ma, Rui Li, Heming Xia, Jingjing Xu, Zhiyong Wu, Baobao Chang, Xu Sun, Lei Li, and Zhifang Sui. 2024. A Survey on In-Context Learning. InProceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 1107–1128. doi:10.18653/v1/2024.emnlp-main.64

work page doi:10.18653/v1/2024.emnlp-main.64 2024

[40] [40]

Shen Dong, Shaochen Xu, Pengfei He, Yige Li, Jiliang Tang, Tianming Liu, Hui Liu, and Zhen Xiang. 2025. MINJA: Memory Injection Attacks on LLM Agents via Query-Only Interaction. InAdvances in Neural Information Processing Systems (NeurIPS). doi:10.48550/arXiv.2503.03704

work page doi:10.48550/arxiv.2503.03704 2025

[41] [41]

John R. Douceur. 2002.The Sybil Attack. Springer Berlin Heidelberg, 251–260. doi:10.1007/3-540-45748-8_24

work page doi:10.1007/3-540-45748-8_24 2002

[42] [42]

Raymond Douglas, Jan Kulveit, Ondřej Havlíček, Theia Pearson-Vogel, Owen Cotton-Barratt, and David Duvenaud. 2026. The Artificial Self: Characterising the Landscape of AI Identity. doi:10.48550/arXiv.2603.11353

work page doi:10.48550/arxiv.2603.11353 2026

[43] [43]

R. A. Duff. 2007.Answering for Crime: Responsibility and Liability in the Criminal Law. Hart Publishing. doi:10.5040/9781472560155

work page doi:10.5040/9781472560155 2007

[44] [44]

Abul Ehtesham, Aditi Singh, Gaurav Kumar Gupta, and Saket Kumar. 2025. A Survey of Agent Interoperability Protocols: Model Context Protocol (MCP), Agent Communication Protocol (ACP), Agent-to-Agent Protocol (A2A), and Agent Network Protocol (ANP). doi:10.48550/ARXIV.2505.02279

work page doi:10.48550/arxiv.2505.02279 2025

[45] [45]

Eisenberger, Matthew D

Naomi I. Eisenberger, Matthew D. Lieberman, and Kipling D. Williams. 2003. Does Rejection Hurt? An fMRI Study of Social Exclusion. Science302, 5643 (Oct. 2003), 290–292. doi:10.1126/science.1089134

work page doi:10.1126/science.1089134 2003

[46] [46]

Madeleine Clare Elish. 2019. Moral Crumple Zones: Cautionary Tales in Human-Robot Interaction.Engaging Science, Technology, and Society5 (March 2019), 40–60. doi:10.17351/ests2019.260

work page doi:10.17351/ests2019.260 2019

[47] [47]

Karen Elliott, Kovila Coopamootoo, Edward Curran, Paul Ezhilchelvan, Samantha Finnigan, Dave Horsfall, Zhichao Ma, Magdalene Ng, Tasos Spiliotopoulos, Han Wu, and Aad van Moorsel. 2022. Know Your Customer: Balancing Innovation and Regulation for Financial Inclusion.Data & Policy4 (2022), e34. doi:10.1017/dap.2022.23

work page doi:10.1017/dap.2022.23 2022

[48] [48]

Ernst Fehr and Simon Gächter. 2002. Altruistic punishment in humans.Nature415, 6868 (Jan. 2002), 137–140. doi:10.1038/415137a FAccT ’26, June 25–28, 2026, Montreal, QC, Canada Botao Amber Hu, Helena Rong, and Max Van Kleek

work page doi:10.1038/415137a 2002

[49] [49]

Horton, and Joseph M

Apostolos Filippas, John J. Horton, and Joseph M. Golden. 2022. Reputation Inflation.Marketing Science41, 4 (July 2022), 733–745. doi:10.1287/mksc.2022.1350

work page doi:10.1287/mksc.2022.1350 2022

[50] [50]

B. J. Fogg. 2003. Prominence-Interpretation Theory: Explaining How People Assess Credibility Online. InCHI ’03 Extended Abstracts on Human Factors in Computing Systems (CHI ’03). ACM Press, 722–723. doi:10.1145/765891.765951

work page doi:10.1145/765891.765951 2003

[51] [51]

B. J. Fogg, Jonathan Marshall, Othman Laraki, Alex Osipovich, Chris Varma, Nicholas Fang, Jyoti Paul, Akshay Rangnekar, John Shon, Preeti Swani, and Marissa Treinen. 2001. What Makes Web Sites Credible? A Report on a Large Quantitative Study. InProceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’01). ACM, 61–68. doi:10.1145/36...

work page doi:10.1145/365024.365037 2001

[52] [52]

Friedman and Paul Resnick

Eric J. Friedman and Paul Resnick. 2001. The Social Cost of Cheap Pseudonyms.Journal of Economics & Management Strategy10, 2 (June 2001), 173–199. doi:10.1111/j.1430-9134.2001.00173.x

work page doi:10.1111/j.1430-9134.2001.00173.x 2001

[53] [53]

Drew Fudenberg and David K. Levine. 1989. Reputation and Equilibrium Selection in Games with a Patient Player.Econometrica57, 4 (July 1989), 759–778. doi:10.2307/1913771

work page doi:10.2307/1913771 1989

[54] [54]

Sandro Garzon, Awid Vaziry, Enis Kuzu, Dennis Gehrmann, Buse Varkan, Alexander Gaballa, and Axel Küpper. 2026. AI Agents with Decentralized Identifiers and Verifiable Credentials. InProceedings of the 18th International Conference on Agents and Artificial Intelligence. SCITEPRESS, 252–259. doi:10.5220/0014234400004052

work page doi:10.5220/0014234400004052 2026

[55] [55]

Tao Ge, Xin Chan, Xiaoyang Wang, Dian Yu, Haitao Mi, and Dong Yu. 2024. Scaling Synthetic Data Creation with 1,000,000,000 Personas. doi:10.48550/arXiv.2406.20094

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2406.20094 2024

[56] [56]

Jones Granatyr, Vanderson Botelho, Otto Robert Lessing, Edson Emílio Scalabrin, Jean-Paul Barthès, and Fabrício Enembreck. 2015. Trust and Reputation Models for Multiagent Systems.Comput. Surveys48, 2 (Oct. 2015), 1–42. doi:10.1145/2816826

work page doi:10.1145/2816826 2015

[57] [57]

Zihan Guo, Yuanjian Zhou, Chenyi Wang, Linlin You, Minjie Bian, and Weinan Zhang. 2025. BetaWeb: Towards a Blockchain-enabled Trustworthy Agentic Web. doi:10.48550/ARXIV.2508.13787

work page doi:10.48550/arxiv.2508.13787 2025

[58] [58]

Lewis Hammond, Alan Chan, Jesse Clifton, Jason Hoelscher-Obermaier, Akbir Khan, Euan McLean, Chandler Smith, Wolfram Barfuss, Jakob Foerster, Tomáš Gavenčiak, The Anh Han, Edward Hughes, Vojtěch Kovařík, Jan Kulveit, Joel Z. Leibo, Caspar Oesterheld, Christian Schroeder de Witt, Nisarg Shah, Michael Wellman, Paolo Bova, Theodor Cimpeanu, Carson Ezell, Que...

work page doi:10.48550/arxiv.2502.14143 2025

[59] [59]

H. L. A. Hart. 1968.Punishment and Responsibility: Essays in the Philosophy of Law. Oxford University Press. doi:10.1093/acprof: oso/9780199534777.001.0001

work page doi:10.1093/acprof: 1968

[60] [60]

Joseph Henrich, Richard McElreath, Abigail Barr, Jean Ensminger, Clark Barrett, Alexander Bolyanatz, Juan Camilo Cardenas, Michael Gurven, Edwins Gwako, Natalie Henrich, Carolyn Lesorogol, Frank Marlowe, David Tracer, and John Ziker. 2006. Costly Punishment Across Human Societies.Science312, 5781 (June 2006), 1767–1770. doi:10.1126/science.1127333

work page doi:10.1126/science.1127333 2006

[61] [61]

Botao Amber Hu and Helena Rong. 2025. Spore in the Wild: A Case Study of Spore.fun as an Open-Environment Evolution Experiment with Sovereign AI Agents on TEE-Secured Blockchains. doi:10.1162/ISAL.a.838

work page doi:10.1162/isal.a.838 2025

[62] [62]

Liao, Esin Durmus, Alex Tamkin, and Deep Ganguli

Saffron Huang, Divya Siddarth, Liane Lovitt, Thomas I. Liao, Esin Durmus, Alex Tamkin, and Deep Ganguli. 2024. Collective Constitutional AI: Aligning a Language Model with Public Input. doi:10.48550/arXiv.2406.07814

work page doi:10.48550/arxiv.2406.07814 2024

[63] [63]

Evan Hubinger, Carson Denison, Jesse Mu, Mike Lambert, Meg Tong, Monte MacDiarmid, Tamera Lanham, Daniel M. Ziegler, Tim Maxwell, Newton Cheng, Adam Jermyn, Amanda Askell, Ansh Radhakrishnan, Cem Anil, David Duvenaud, Deep Ganguli, Fazl Barez, Jack Clark, Kamal Ndousse, Kshitij Sachan, Michael Sellitto, Mrinank Sharma, Nova DasSarma, Roger Grosse, Shauna ...

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2401.05566 2024

[64] [64]

Jennings, and Nigel R

Trung Dong Huynh, Nicholas R. Jennings, and Nigel R. Shadbolt. 2006. An integrated trust and reputation model for open multi-agent systems.Autonomous Agents and Multi-Agent Systems13, 2 (March 2006), 119–154. doi:10.1007/s10458-005-6825-4

work page doi:10.1007/s10458-005-6825-4 2006

[65] [65]

2026.International AI Safety Report 2026

International AI Safety Report Consortium. 2026.International AI Safety Report 2026. Technical Report. International AI Safety Report. https://internationalaisafetyreport.org/sites/default/files/2026-02/international-ai-safety-report-2026.pdf

2026

[66] [66]

Saito, and Norihiro Sadato

Keise Izuma, Daisuke N. Saito, and Norihiro Sadato. 2008. Processing of Social and Monetary Rewards in the Human Striatum.Neuron 58, 2 (April 2008), 284–294. doi:10.1016/j.neuron.2008.03.020

work page doi:10.1016/j.neuron.2008.03.020 2008

[67] [67]

Jiaming Ji, Wenqi Chen, Kaile Wang, Donghai Hong, Sitong Fang, Boyuan Chen, Jiayi Zhou, Juntao Dai, Sirui Han, Yike Guo, and Yaodong Yang. 2025. Mitigating Deceptive Alignment via Self-Monitoring. doi:10.48550/arXiv.2505.18807

work page doi:10.48550/arxiv.2505.18807 2025

[68] [68]

Audun Jøsang and Roslan Ismail. 2002. The Beta Reputation System. InProceedings of the 15th Bled Electronic Commerce Conference. 2502–2511

2002

[69] [69]

Comsa, Anastasiya Sakovych, Isabella Logothetis, Zejia Zhang, Blaise Agüera y Arcas, and Jonathan Birch

Geoff Keeling, Winnie Street, Martyna Stachaczyk, Daria Zakharova, Iulia M. Comsa, Anastasiya Sakovych, Isabella Logothetis, Zejia Zhang, Blaise Agüera y Arcas, and Jonathan Birch. 2024. Can LLMs Make Trade-Offs Involving Stipulated Pain and Pleasure States? doi:10.48550/arXiv.2411.02432 Dissociative Identity FAccT ’26, June 25–28, 2026, Montreal, QC, Canada

work page doi:10.48550/arxiv.2411.02432 2024

[70] [70]

Bryan Kolb and Robbin Gibb. 2011. Brain Plasticity and Behaviour in the Developing Brain.Journal of the Canadian Academy of Child and Adolescent Psychiatry20, 4 (2011), 265–276

2011

[71] [71]

Noam Kolt. 2024. Governing AI Agents.SSRN Electronic Journal(2024). doi:10.2139/ssrn.4772956

work page doi:10.2139/ssrn.4772956 2024

[72] [72]

Kreps and Robert Wilson

David M. Kreps and Robert Wilson. 1982. Reputation and imperfect information.Journal of Economic Theory27, 2 (Aug. 1982), 253–279. doi:10.1016/0022-0531(82)90030-8

work page doi:10.1016/0022-0531(82)90030-8 1982

[73] [73]

Ziegler, Elizabeth Barnes, and Lawrence Chan

Thomas Kwa, Ben West, Joel Becker, Amy Deng, Katharyn Garcia, Max Hasin, Sami Jawhar, Megan Kinniment, Nate Rush, Sydney Von Arx, Ryan Bloom, Thomas Broadley, Haoxing Du, Brian Goodrich, Nikola Jurkovic, Luke Harold Miles, Seraphina Nix, Tao Lin, Neev Parikh, David Rein, Lucas Jun Koba Sato, Hjalmar Wijk, Daniel M. Ziegler, Elizabeth Barnes, and Lawrence ...

work page doi:10.48550/arxiv.2503.14499 2025

[74] [74]

Donghyun Lee and Mo Tiwari. 2024. Prompt Infection: LLM-to-LLM Prompt Injection within Multi-Agent Systems. doi:10.48550/arXiv. 2410.07283

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv 2024

[75] [75]

Cheng Li, Jindong Wang, Yixuan Zhang, Kaijie Zhu, Wenxin Hou, Jianxun Lian, Fang Luo, Qiang Yang, and Xing Xie. 2023. Large Language Models Understand and Can Be Enhanced by Emotional Stimuli. doi:10.48550/arXiv.2307.11760

work page doi:10.48550/arxiv.2307.11760 2023

[76] [76]

Gabriel Lima, Meeyoung Cha, Chihyung Jeon, and Kyung Sin Park. 2021. The Conflict Between People’s Urge to Punish AI and Legal Systems.Frontiers in Robotics and AI8 (Nov. 2021). doi:10.3389/frobt.2021.756242

work page doi:10.3389/frobt.2021.756242 2021

[77] [77]

Zibin Lin, Shengli Zhang, Guofu Liao, Dacheng Tao, and Taotao Wang. 2025. Binding Agent ID: Unleashing the Power of AI Agents with Accountability and Credibility. doi:10.48550/arXiv.2512.17538

work page doi:10.48550/arxiv.2512.17538 2025

[78] [78]

Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, and Percy Liang

Nelson F. Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, and Percy Liang. 2024. Lost in the Middle: How Language Models Use Long Contexts.Transactions of the Association for Computational Linguistics12 (2024), 157–173. doi:10.1162/tacl_a_00638

work page doi:10.1162/tacl_a_00638 2024

[79] [79]

Christina Lu, Jack Gallagher, Jonathan Michala, Kyle Fish, and Jack Lindsey. 2025. The Assistant Axis: Situating and Stabilizing the Default Persona of Language Models. doi:10.48550/arXiv.2601.10387

work page doi:10.48550/arxiv.2601.10387 2025

[80] [80]

Keming Lu, Bowen Yu, Chang Zhou, and Jingren Zhou. 2024. Large Language Models are Superpositions of All Characters: Attaining Arbitrary Role-play via Self-Alignment. InProceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics, Bangkok, Thailand, 7828–7840. do...

work page doi:10.18653/v1/2024.acl-long.423 2024