
arxiv: 2605.27628 · v2 · pith:NWIZ7OKKnew · submitted 2026-05-26 · 💻 cs.AI · cs.CY · cs.ET · cs.MA · cs.SY · eess.SY

Intelligence as Managed Autonomy: Failure, Escalation, and Governance for Agentic AI Systems

Pith reviewed 2026-06-29 16:40 UTC · model grok-4.3

classification 💻 cs.AI · cs.CY · cs.ET · cs.MA · cs.SY · eess.SY
keywords managed autonomy · agentic AI · epistemic drift · SMARt model · Petri net · AI governance · failure management · autonomy revocation

The pith

Intelligent AI behavior requires detecting epistemic drift and surrendering control when reliability drops.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that failures in autonomous AI arise from the assumption of unbounded operation rather than from model flaws alone. It defines intelligence as the built-in capacity to notice rising uncertainty, pause reasoning, seek recovery, and ultimately revoke autonomy. This theory is realized through the SMARt model, a four-layer state machine with formal guarantees from a timed guarded Petri net. The model uses domain-specific triggers to enforce safe escalation and governance reachability. If the approach holds, agent design would shift from maximizing autonomy to embedding explicit failure management at the architectural level.

Core claim

Intelligent behavior is defined through the formal capacity to detect epistemic drift, suspend reasoning, attempt recovery, and ultimately surrender control when reliability diminishes. This capacity is instantiated via the SMARt model, a four-layer framework of Stable, Meta-cognitive, Assisted, and Regulated states. A timed guarded Petri net formulation supplies theoretically bounded properties that mandate escalation, constrain invalid outputs, and ensure governance reachability under specified conditions.

What carries the argument

The SMARt model, a four-layer state framework (Stable, Meta-cognitive, Assisted, Regulated) realized as a timed guarded Petri net whose transitions are driven by domain-specific trigger sets.
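The layered escalation can be pictured as a small guarded state machine. The sketch below is an illustration only, not the paper's formalization: the text available here gives no guards or timing bounds, so the scalar `uncertainty` signal, both thresholds, the `max_dwell` timeout, and the choice to make Regulated absorbing are all assumptions.

```python
from dataclasses import dataclass
from enum import IntEnum


class Layer(IntEnum):
    """The four SMARt layers, ordered by escalation level."""
    STABLE = 0
    META_COGNITIVE = 1
    ASSISTED = 2
    REGULATED = 3


@dataclass
class SmartAgent:
    # Hypothetical parameters: the source does not specify guards or timing.
    escalate_above: float = 0.5  # uncertainty above this forces escalation
    recover_below: float = 0.2   # uncertainty below this permits recovery
    max_dwell: int = 3           # ticks allowed in a non-stable layer
    layer: Layer = Layer.STABLE
    dwell: int = 0

    def _move(self, target: Layer) -> None:
        self.layer = target
        self.dwell = 0  # restart the dwell clock on every layer change

    def step(self, uncertainty: float) -> Layer:
        """Advance one tick given an uncertainty estimate in [0, 1]."""
        if self.layer is Layer.REGULATED:
            return self.layer  # assumed absorbing: autonomy stays revoked
        self.dwell += 1
        timed_out = (
            self.layer is not Layer.STABLE and self.dwell >= self.max_dwell
        )
        if uncertainty > self.escalate_above or timed_out:
            self._move(Layer(self.layer + 1))
        elif uncertainty < self.recover_below and self.layer is not Layer.STABLE:
            self._move(Layer(self.layer - 1))
        return self.layer
```

The timeout stands in for the "timed" part of the net: an agent that neither recovers nor worsens is still pushed upward after `max_dwell` ticks, so it cannot linger indefinitely in an intermediate layer.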

If this is right

  • The architecture can formally mandate escalation and constrain invalid outputs.
  • Incorporating domain-specific triggers across settings such as healthcare and robotics can preserve safety.
  • The model supports safe, controlled expansion of an agent's operational scope over time.
  • Governance reachability is guaranteed when the Petri net conditions are satisfied.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Testing the model in simulation would require measuring how often agents correctly enter the regulated state under injected uncertainty.
  • The trigger-set approach might be extended to multi-agent settings where one agent can request control from another.
  • Formal verification tools could check whether new trigger sets preserve the original bounded properties of the net.
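The simulation test in the first extension above could be set up along the following lines. This is a toy harness under stated assumptions, not an evaluation protocol from the paper: the agent rule (escalate one layer per reading above a threshold, otherwise step back down), the uniform noise ranges, and every function name are hypothetical.

```python
import random


def reaches_regulated(signal, threshold=0.5, levels=3):
    """Hypothetical agent: climb one layer per reading above `threshold`,
    drop one layer otherwise; report whether the top layer is entered."""
    level = 0
    for u in signal:
        level = level + 1 if u > threshold else max(level - 1, 0)
        if level >= levels:
            return True
    return False


def injected_signal(rng, length, drift_start, drift_len):
    """Low baseline uncertainty with a burst of high uncertainty injected."""
    signal = [rng.uniform(0.0, 0.4) for _ in range(length)]
    for i in range(drift_start, min(drift_start + drift_len, length)):
        signal[i] = rng.uniform(0.6, 1.0)
    return signal


def regulated_entry_rate(trials, drift_len, seed=0):
    """Fraction of trials in which the agent enters the regulated layer."""
    rng = random.Random(seed)
    hits = sum(
        reaches_regulated(
            injected_signal(rng, 50, rng.randrange(0, 40), drift_len)
        )
        for _ in range(trials)
    )
    return hits / trials
```

Sweeping `drift_len` gives a detection curve: in this toy setup a burst shorter than the number of layers never reaches the regulated state, which is the kind of missed-escalation case such a test is meant to expose.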

Load-bearing premise

Domain-specific trigger sets can be defined to meet both completeness and soundness criteria for detecting epistemic drift.
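One possible reading of the two criteria, offered as an assumption because the definitions themselves are not reproduced here: let $D_t \in \{0,1\}$ indicate that epistemic drift is actually present at time $t$, let $T_t \in \{0,1\}$ indicate that at least one trigger in the set fires at time $t$, and let $\Delta \ge 0$ be a hypothetical detection deadline.

```latex
% Completeness: every drift episode is flagged within the deadline.
\forall t:\quad D_t = 1 \;\Longrightarrow\; \exists\, t' \in [t,\, t+\Delta] \ \text{such that}\ T_{t'} = 1

% Soundness: a trigger fires only when drift is actually present.
\forall t:\quad T_t = 1 \;\Longrightarrow\; D_t = 1
```

Under this reading, completeness rules out missed drift (false negatives) and soundness rules out spurious escalation (false positives). The paper's own definitions may differ.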

What would settle it

Implementation of the SMARt model in a test agent that encounters a clear epistemic drift scenario yet continues unsafe actions without entering the regulated state, or a Petri net reachability check that shows a governance state is unreachable when triggers fire.
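The second test, a reachability check, can be illustrated on a toy net. The sketch below assumes a single token and one place per SMARt layer, with invented transition names; it omits guards and clocks entirely, so it checks only structural reachability and is not the paper's timed guarded net.

```python
from collections import deque

# One place per SMARt layer; a marking is a tuple of token counts.
PLACES = ("stable", "meta", "assisted", "regulated")

# Hypothetical transitions: name -> (input place, output place).
TRANSITIONS = {
    "drift_detected": ("stable", "meta"),
    "recovered": ("meta", "stable"),
    "recovery_failed": ("meta", "assisted"),
    "assist_failed": ("assisted", "regulated"),
}


def fire(marking, arc):
    """Return the successor marking, or None if the arc is not enabled."""
    src, dst = (PLACES.index(p) for p in arc)
    if marking[src] == 0:
        return None
    nxt = list(marking)
    nxt[src] -= 1
    nxt[dst] += 1
    return tuple(nxt)


def reachable_markings(initial, transitions):
    """Breadth-first exploration of every marking reachable from `initial`."""
    seen = {initial}
    queue = deque([initial])
    while queue:
        marking = queue.popleft()
        for arc in transitions.values():
            nxt = fire(marking, arc)
            if nxt is not None and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def governance_reachable(initial, transitions):
    """True if some reachable marking puts a token in the regulated place."""
    idx = PLACES.index("regulated")
    return any(m[idx] > 0 for m in reachable_markings(initial, transitions))
```

Starting from `(1, 0, 0, 0)`, the regulated place is reachable in this toy net, and deleting the `assist_failed` transition makes it unreachable. A check on the full timed guarded net that returned the second result while triggers fire would be the counterexample described above.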

Original abstract

As autonomous and agentic AI systems scale in robotic and human-machine environments, managing hallucination and persistent but unjustified action remains an open challenge. Rather than attributing these failures solely to model or alignment limitations, this paper explores the architectural vulnerability of unbounded autonomy - the presumption that an agent should continue operating regardless of rising uncertainty. It introduces a theory of managed autonomy that defines intelligent behavior through the formal capacity to detect epistemic drift, suspend reasoning, attempt recovery, and ultimately surrender control when reliability diminishes. We instantiate this theory via the SMARt (Self-Managing Multi-tier Autonomous Reasoning with Regulated/Revoked transitions) model, a four-layer framework featuring Stable, Meta-cognitive, Assisted, and Regulated states. By developing a timed, guarded Petri net formulation, we establish theoretically bounded properties for the system, demonstrating how architecture can formally mandate escalation, constrain invalid outputs, and ensure governance reachability under specified conditions. We further analyze how incorporating domain-specific trigger sets across varied operational settings (e.g., healthcare, robotics, etc.) can systematically preserve safety, assuming completeness and soundness criteria are met. Because these triggers are designed to be adaptive, the SMARt model accommodates the safe, controlled expansion of an agent's operational scope over time. We conclude that formalizing failure management within the autonomy lifecycle is a crucial step toward realizing reliable and governed artificial intelligence.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that intelligent behavior in agentic AI systems is defined by the formal capacity to detect epistemic drift, suspend reasoning, attempt recovery, and surrender control when reliability diminishes. It instantiates this theory in the SMARt (Self-Managing Multi-tier Autonomous Reasoning with Regulated/Revoked transitions) model, a four-layer framework (Stable, Meta-cognitive, Assisted, Regulated) whose transitions are formalized via a timed guarded Petri net. The architecture is asserted to establish theoretically bounded properties that mandate escalation, constrain invalid outputs, and ensure governance reachability; domain-specific trigger sets, when complete and sound, are claimed to preserve safety while permitting controlled expansion of operational scope across domains such as healthcare and robotics.

Significance. If the Petri-net derivations and trigger-set criteria can be rigorously established, the work would supply a concrete architectural mechanism for embedding failure management and revocation into autonomous agents, addressing a recognized vulnerability in scaling agentic systems. The explicit use of a timed guarded Petri net to derive bounded properties is a methodological strength that could support falsifiable claims about reachability and escalation if the formalization is complete.

major comments (2)
  1. [Abstract] Abstract and SMARt model section: The safety-preservation and scope-expansion claims are conditioned on domain-specific trigger sets satisfying completeness and soundness criteria, yet the manuscript supplies neither a construction procedure for these sets, a verification method, nor a proof that the sets remain complete/sound under the model's own state transitions. This renders the stated reachability and revocation guarantees dependent on an external precondition whose satisfaction is not shown to be internal to the formalism.
  2. [SMARt model / Petri net formulation] Timed guarded Petri net formulation: The central claim that the architecture 'formally mandate[s] escalation, constrain[s] invalid outputs, and ensure[s] governance reachability' is asserted to follow from the net's properties, but the provided text does not exhibit the explicit transition guards, timing constraints, or reachability analysis that would demonstrate these properties independently of the trigger-set assumption.
minor comments (2)
  1. The acronym expansion 'Self-Managing Multi-tier Autonomous Reasoning with Regulated/Revoked transitions' uses an internal slash that may obscure readability; a parenthetical clarification would help.
  2. [Abstract] The abstract refers to 'specified conditions' for the bounded properties without enumerating them; an explicit list or reference to the relevant Petri-net equations would improve traceability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive report and the recommendation for major revision. The comments correctly identify areas where the formal claims require additional support. We address each point below and indicate the planned changes.

Point-by-point responses
  1. Referee: [Abstract] Abstract and SMARt model section: The safety-preservation and scope-expansion claims are conditioned on domain-specific trigger sets satisfying completeness and soundness criteria, yet the manuscript supplies neither a construction procedure for these sets, a verification method, nor a proof that the sets remain complete/sound under the model's own state transitions. This renders the stated reachability and revocation guarantees dependent on an external precondition whose satisfaction is not shown to be internal to the formalism.

    Authors: We agree that the safety and scope-expansion claims are conditioned on external completeness and soundness assumptions for the trigger sets, and that the manuscript does not supply a construction procedure, verification method, or invariance proof under the model's transitions. This is an accurate observation. In the revised version we will (i) make the external nature of the assumption explicit in the abstract and model section, (ii) add a short discussion of practical verification approaches (domain-expert review and static analysis), and (iii) clearly label full integration of trigger-set maintenance into the net as future work rather than a current result. revision: partial

  2. Referee: [SMARt model / Petri net formulation] Timed guarded Petri net formulation: The central claim that the architecture 'formally mandate[s] escalation, constrain[s] invalid outputs, and ensure[s] governance reachability' is asserted to follow from the net's properties, but the provided text does not exhibit the explicit transition guards, timing constraints, or reachability analysis that would demonstrate these properties independently of the trigger-set assumption.

    Authors: The current manuscript describes the timed guarded Petri net at the architectural level without exhibiting the concrete transition guards, timing bounds, or reachability analysis. We accept that the bounded properties are therefore not demonstrated from the net alone. The revised manuscript will add an appendix containing the formal net definition (places, transitions, guards, and timing constraints) together with a sketch of the reachability argument showing mandatory escalation and revocation paths. This addition will be presented independently of any particular trigger-set instantiation. revision: yes

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

Abstract-only; no explicit free parameters, axioms, or invented entities beyond the SMARt model itself are detailed. The completeness and soundness of triggers are stated as assumptions.

axioms (1)
  • [domain assumption] Completeness and soundness criteria for domain-specific trigger sets are met
    Stated in the abstract as the condition under which the model preserves safety.
invented entities (1)
  • SMARt model (Self-Managing Multi-tier Autonomous Reasoning with Regulated/Revoked transitions): no independent evidence
    purpose: Four-layer framework to instantiate the theory of managed autonomy
    Newly named architecture whose properties are claimed to follow from the Petri net encoding

pith-pipeline@v0.9.1-grok · 5790 in / 1293 out tokens · 35000 ms · 2026-06-29T16:40:53.773968+00:00 · methodology

