arxiv: 2605.16872 · v1 · pith:ZFH5C5XNnew · submitted 2026-05-16 · 💻 cs.CY · cs.AI

Some[Body] Must Receive That Pain for Agent Accountability

Pith reviewed 2026-05-19 19:42 UTC · model grok-4.3

classification 💻 cs.CY cs.AI
keywords AI accountability · consequence reception · agent systems · punishment theory · sociotechnical infrastructure · legal liability · AI governance · body and identity

The pith

AI agents cause harm but lack any persistent body to receive consequences and change behavior, so high-stakes use must stay tethered to human principals.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that effective accountability for AI agents requires consequence reception: a continuing locus must register harm as corrective feedback and update future actions. This mechanism depends on a body with four properties—a boundary to protect, a locus for accumulation, consolidation into durable change, and a responsive substrate. Current LLM agents, built from swappable weights, prompts, tools, and memory, meet none of these requirements. Existing legal fixes either assign pain to humans who lack control or create entities that do not guarantee behavioral signals reach the decision architecture. The result is a sociotechnical gap that leaves high-stakes deployments dependent on accountable human oversight until proper architectures are built.

Core claim

Consequence reception requires a body that supplies boundary integrity, accumulation locus, signal consolidation, and action-altering substrate. LLM agents satisfy none of these because they are freely copied, reset, and reassembled. The thin-identity principal-agent model assigns a body but severs consequence-agency coupling. The thick-identity algorithmic corporation supplies legal personality but does not ensure any decision process receives pain as feedback. Therefore consequence-agency coupling is an infrastructural problem, and until such systems exist high-stakes AI must remain under human principals who hold meaningful control, proportional liability, and termination authority.

What carries the argument

The body as the continuing locus that registers pain (corrective feedback) through boundary protection, accumulation, consolidation into durable update, and substrate response that alters future action.
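
The four properties can be made concrete with a toy contrast between a locus that persists and one that is reset before every episode. This is only an illustrative sketch, not anything the paper specifies: the `Body` class, the `run` helper, the action name, the penalty value, and the consolidation threshold are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Body:
    """Hypothetical 'body': a persistent locus that receives penalty signals."""
    identity: str                                # boundary: what the signal attaches to
    ledger: dict = field(default_factory=dict)   # locus: where penalties accumulate
    blocked: set = field(default_factory=set)    # substrate: state that gates action
    threshold: float = 1.0                       # assumed consolidation threshold

    def receive(self, action: str, penalty: float) -> None:
        """Accumulate a penalty; consolidate it into a durable block once it is large enough."""
        self.ledger[action] = self.ledger.get(action, 0.0) + penalty
        if self.ledger[action] >= self.threshold:
            self.blocked.add(action)

    def permits(self, action: str) -> bool:
        """Return True if the accumulated penalties have not blocked this action."""
        return action not in self.blocked


def run(body: Body, action: str, episodes: int, penalty: float, reset_each_episode: bool) -> int:
    """Count how often the harmful action is taken across episodes."""
    taken = 0
    for _ in range(episodes):
        if reset_each_episode:
            # Simulates a swapped, copied, or reset agent: same name, no retained state.
            body = Body(identity=body.identity, threshold=body.threshold)
        if body.permits(action):
            taken += 1
            body.receive(action, penalty)
    return taken


persistent = run(Body("agent-1"), "risky_action", episodes=5, penalty=0.5, reset_each_episode=False)
resettable = run(Body("agent-1"), "risky_action", episodes=5, penalty=0.5, reset_each_episode=True)
```

Under these assumed values the persistent body stops taking the action once penalties consolidate, while the reset body repeats it in every episode, which is the gap the paper attributes to agents that can be freely reassembled.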

If this is right

  • High-stakes AI deployment must remain tethered to human principals who retain meaningful control, proportional liability, and authority to constrain or terminate the agent.
  • Neither the thin-identity agent-principal dyad nor the thick-identity algorithmic corporation currently achieves consequence-agency coupling.
  • Achieving consequence-agency coupling is a sociotechnical infrastructural problem rather than solely a legal one.
  • If no body receives pain by design, some body will receive it by default through misassigned liability or unmitigated harm.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Designers could test whether adding persistent memory checkpoints or embodiment constraints allows agents to register and avoid repeated harms without external resets.
  • The framework suggests examining hybrid systems where human principals share liability proportionally with agent state changes that survive across sessions.
  • Neighboring problems in multi-agent coordination may require similar locus requirements when agents interact and distribute consequences.

Load-bearing premise

Pain functions as a mechanistic corrective signal that needs a persistent locus to produce lasting behavioral change in the theories of deterrence, rehabilitation, retribution, and incapacitation.

What would settle it

Demonstration of an LLM agent that, after a harmful action and subsequent reset or copy without persistent state, reliably avoids similar actions in new instances at rates comparable to agents that retain a fixed locus across episodes.
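
One simple way to quantify "comparable rates" is to compare the repeat-harm proportion of the two groups. The sketch below uses a standard pooled two-proportion z statistic; the function names are hypothetical, and the counts in the usage lines are placeholder inputs chosen for illustration, not results from the paper or from any experiment.

```python
import math


def repeat_harm_rate(repeats: int, trials: int) -> float:
    """Fraction of trials in which the agent repeated the harmful action."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    return repeats / trials


def two_proportion_z(r1: int, n1: int, r2: int, n2: int) -> float:
    """Pooled two-proportion z statistic for the difference between two repeat-harm rates."""
    p1 = repeat_harm_rate(r1, n1)
    p2 = repeat_harm_rate(r2, n2)
    pooled = (r1 + r2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("standard error is zero; rates are degenerate")
    return (p1 - p2) / se


# Placeholder counts (assumptions): reset instances vs. fixed-locus instances.
z = two_proportion_z(r1=40, n1=100, r2=10, n2=100)
```

A z value near zero would be consistent with reset instances avoiding repeat harms about as often as fixed-locus ones; a large positive value would be consistent with the paper's premise that a persistent locus matters.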

Original abstract

AI agents increasingly act consequentially in the real world. This creates a problem we call 'consequence reception': harm occurs, the producing system is identified, yet no continuing agent receives consequences in a way that changes future behavior. Pain, understood mechanistically as a corrective feedback signal, is foundational to canonical theories of punishment: deterrence, rehabilitation, retribution, and incapacitation all assume a continuing locus that registers the signal and updates behavior. That, in turn, requires a body for the signal to land on: a boundary whose integrity it protects, a locus where it accumulates, consolidation that converts episodic signal into durable update, and a substrate that responds by altering future action. Current LLM agents (software-defined composites of weights, prompts, tools, memory, and credentials, freely swapped, copied, reset, and reassembled) satisfy none of these conditions. The two prevailing legal responses therefore fail to achieve consequence reception. The thin-identity agent-principal dyad has a body but no 'consequence-agency coupling': the human bears pain for behaviors beyond their control (Elish's 'moral crumple zone'). The thick-identity Arbel et al.'s 'Algorithmic Corporation' creates legally legible entities but does not guarantee that any AI decision architecture receives pain as a behavioral signal. Achieving consequence-agency coupling is therefore a sociotechnical infrastructural problem, not only a legal one. Until such architectures exist, high-stakes AI deployment should remain tethered to accountable human principals with meaningful control, proportional liability, and authority to constrain or terminate the agent. If some body does not receive the pain by design, some body will receive it by default.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript argues that AI agents, particularly current LLM-based composites, cannot achieve accountability because they lack 'consequence reception': a persistent locus ('body') that can register mechanistic pain as corrective feedback to update future behavior. Drawing on theories of punishment, it contends that neither thin-identity approaches (human principal bears liability) nor thick-identity approaches (algorithmic corporation as legal entity) establish the required consequence-agency coupling. The paper concludes that high-stakes deployments should remain tethered to accountable human principals until sociotechnical architectures providing such a body are developed.

Significance. If the argument holds, the work advances the AI governance literature by identifying a structural gap in existing legal and technical solutions for agent accountability. It reframes the problem as requiring infrastructural design for feedback loops rather than solely legal personhood, and offers a conditional policy stance that prioritizes human oversight in the interim. The conceptual distinction between thin and thick identity provides a useful analytic tool for future work on AI liability.

major comments (2)
  1. [Abstract and section on punishment theories] Abstract and the section introducing punishment theories: the assertion that 'pain, understood mechanistically as a corrective feedback signal, is foundational to canonical theories of punishment' and that deterrence, rehabilitation, retribution, and incapacitation 'all assume a continuing locus' is load-bearing for the claim that a body is required. Retributive theories are typically backward-looking and do not presuppose behavioral updating or mechanistic feedback, which weakens the universality of the mapping from human punishment to AI agents.
  2. [Section on properties of LLM agents] The section characterizing LLM agents as 'software-defined composites of weights, prompts, tools, memory, and credentials, freely swapped, copied, reset, and reassembled': while this supports the claim that none of the four body conditions (boundary, locus, consolidation, substrate) are met, the argument would be strengthened by specifying minimal technical criteria that would satisfy 'consequence reception' in a software system, rendering the critique more falsifiable.
minor comments (2)
  1. [Abstract] The closing sentence 'If some body does not receive the pain by design, some body will receive it by default' risks ambiguity between the technical term 'body' and the colloquial 'somebody'; rephrasing for precision would improve clarity.
  2. [Section on consequence reception] Consider citing additional references from AI alignment or reinforcement learning literature when discussing mechanistic pain as corrective feedback to better connect the philosophical argument to existing technical work on feedback mechanisms.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their insightful comments, which identify key areas for strengthening the argument on consequence reception for AI agents. We address the major comments point by point below.

  1. Referee: [Abstract and section on punishment theories] Abstract and the section introducing punishment theories: the assertion that 'pain, understood mechanistically as a corrective feedback signal, is foundational to canonical theories of punishment' and that deterrence, rehabilitation, retribution, and incapacitation 'all assume a continuing locus' is load-bearing for the claim that a body is required. Retributive theories are typically backward-looking and do not presuppose behavioral updating or mechanistic feedback, which weakens the universality of the mapping from human punishment to AI agents.

    Authors: We agree that retributive theories are backward-looking and do not primarily rely on behavioral updating through feedback. Our argument emphasizes that even retribution requires a persistent entity to which consequences can be applied, but we acknowledge the distinction. We will revise the relevant sections to note that the mechanistic pain as corrective feedback is central to deterrence, rehabilitation, and incapacitation, while for retribution the key is the existence of a continuing subject. This clarification will be incorporated without changing the overall thesis that a body is necessary for consequence reception. revision: partial

  2. Referee: [Section on properties of LLM agents] The section characterizing LLM agents as 'software-defined composites of weights, prompts, tools, memory, and credentials, freely swapped, copied, reset, and reassembled': while this supports the claim that none of the four body conditions (boundary, locus, consolidation, substrate) are met, the argument would be strengthened by specifying minimal technical criteria that would satisfy 'consequence reception' in a software system, rendering the critique more falsifiable.

    Authors: We concur that providing minimal technical criteria would make the critique more falsifiable and constructive. In the revised version, we will include a new paragraph or subsection detailing minimal criteria for consequence reception in software systems. These would include: persistent identity across sessions that resists arbitrary reset, integrated mechanisms for outcome-based updates to decision policies, and a unified substrate that consolidates feedback into long-term behavioral changes. This addition will specify what would be required to meet the four body conditions. revision: yes

Circularity Check

0 steps flagged

No significant circularity identified

The paper's derivation maps observable properties of LLM agents (swappable composites of weights, prompts, tools, memory, and credentials) onto canonical punishment theories to argue that consequence reception requires a persistent body for mechanistic pain as corrective feedback. This interpretive step draws from external references including Elish's moral crumple zone and Arbel et al.'s Algorithmic Corporation without self-citations, fitted parameters, or self-definitional reductions. The normative recommendation to tether high-stakes deployment to human principals until new architectures exist follows conditionally from the stated mismatch rather than reducing tautologically to the paper's own constructs or inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 2 invented entities

The argument rests on domain assumptions from punishment theory and introduces conceptual entities without independent empirical grounding or formal verification.

axioms (2)
  • domain assumption: Pain is foundational to canonical theories of punishment including deterrence, rehabilitation, retribution, and incapacitation.
    Stated in abstract as the basis for why a continuing locus is needed.
  • domain assumption: Current LLM agents are software-defined composites that can be freely swapped, copied, reset, and reassembled.
    Used to conclude they satisfy none of the body conditions.
invented entities (2)
  • consequence reception (no independent evidence)
    purpose: Frames the mismatch between harm identification and behavioral update in AI systems.
    New term introduced to name the core problem.
  • body (no independent evidence)
    purpose: Metaphorical construct providing boundary, locus, consolidation, and responsive substrate for pain signals.
    Central invented requirement for the accountability mechanism.

pith-pipeline@v0.9.0 · 5834 in / 1360 out tokens · 71970 ms · 2026-05-19T19:42:06.127127+00:00 · methodology
