pith. sign in

arxiv: 2606.04903 · v1 · pith:3555BZEOnew · submitted 2026-06-03 · 💻 cs.LO · cs.AI· cs.MA· cs.PL

Provably Auditable and Safe LLM Agents from Human-Authored Ontologies

Pith reviewed 2026-06-28 03:47 UTC · model grok-4.3

classification 💻 cs.LO cs.AIcs.MAcs.PL
keywords LLM agentstyped lambda calculusauditabilityontologiesBasic Formal OntologyAgentic Reduxhealthcare compliancesecurity vulnerability disclosure
0
0 comments X

The pith

Typed lambda calculus proves that Agentic Redux LLM agent executions are semantically correct with append-only audit ledgers on suitable domains.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces an LLM agent architecture for use with nontrivial problem domains that require linear auditability. It proves using the typed lambda calculus that on appropriate domains the executions are semantically guaranteed to be correct with decisions in an append-only ledger. It also introduces a methodology for creating agent frameworks by having human experts ontologize the domain with Basic Formal Ontology and derive roles for agents and humans. Example domains include healthcare billing compliance and security vulnerability disclosure. This setup aims to ensure that agents follow assigned roles without deviation while maintaining full traceability.

Core claim

Agentic Redux is an architecture in which LLM agents operate according to roles derived from human-authored ontologies in Basic Formal Ontology. Using the typed lambda calculus, the paper proves that when run on domains that can be appropriately formalized, the executions of these agents are semantically guaranteed to be correct, and all decisions are recorded in an append-only ledger. This provides provable auditability and safety for applications in healthcare billing compliance and security vulnerability disclosure.

What carries the argument

Agentic Redux architecture combined with Ontology-First Agent Design, where typed lambda calculus supplies semantic guarantees for correctness and an append-only ledger supplies linear auditability.

If this is right

  • In healthcare billing compliance, agent decisions can be verified against formal rules with complete traceability.
  • In security vulnerability disclosure, every step of the process remains recorded and correct by construction.
  • Human experts can define agent roles through ontology rather than direct programming of rules.
  • The architecture separates domain formalization from LLM execution to enforce role adherence.
  • The same proof technique applies to any domain that admits a suitable typed lambda calculus model.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The method could apply to other regulated domains such as finance if they admit similar formalizations.
  • The append-only ledger structure may integrate with existing compliance logging systems.
  • Testing the approach on additional domains would check whether the lambda calculus proofs scale beyond the two examples.
  • The separation of ontology creation from agent execution may limit certain forms of unintended behavior.

Load-bearing premise

Problem domains can be formalized in the typed lambda calculus such that LLM agent executions stay inside the model and follow the ontology-derived roles without deviation.

What would settle it

An execution trace from one of the domains in which an agent produces a decision outside the ontology-derived role or omits a step from the ledger would falsify the semantic guarantee.

Figures

Figures reproduced from arXiv: 2606.04903 by Aaron Sterling.

Figure 1
Figure 1. Figure 1: The Agentic Redux architecture. Sub-agents receive isolated state slices and propose actions to a meta-agent, which adjudicates proposals against declared invariants be￾fore committing state transitions. Code is available at https://github.com/Thistleseeds/ agentic-redux 1.3 Agentic Redux 1.3.1 Agentic Redux Architecture This paper introduces the agent architecture Agentic Redux. It is inspired by the fron… view at source ↗
Figure 2
Figure 2. Figure 2: Ontology-First Agent Design. The ontologies referenced in this paper were pro [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Role Derivation procedure. to agents of the classic Confinement Problem[18], restricts both what an agent can observe and what an agent can directly modify: an agent sees only the minimal slice of state needed to perform its work, and cannot directly modify state at all, since all state changes are mediated by the meta-agent. 2.3 Derivation of System Invariants Once we have roles and slices of state that w… view at source ↗
Figure 4
Figure 4. Figure 4: Deriving invariants, then assigning the cross-cutting invariants to the Agentic Redux [PITH_FULL_IMAGE:figures/full_fig_p017_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Agentic Redux expanded with a Counselor Queue for the UDT Compliance probem [PITH_FULL_IMAGE:figures/full_fig_p018_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: The independent continuant Vulnerability and its relations, in the Security Vulnera [PITH_FULL_IMAGE:figures/full_fig_p019_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: A comparison between the happy paths of the independent continuant’s journey in [PITH_FULL_IMAGE:figures/full_fig_p020_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: The Agentic Redux architecture for the Security Vulnerability Disclosure ontology. [PITH_FULL_IMAGE:figures/full_fig_p021_8.png] view at source ↗
read the original abstract

We introduce the LLM agent architecture Agentic Redux, intended for use with nontrivial problem domains that require linear auditability. Using the typed lambda calculus, we prove that, run on appropriate domains, Agentic Redux executions are semantically guaranteed to be correct, with all decisions recorded in an append-only ledger. We present two production-grade appropriate domains, in healthcare billing compliance, and security vulnerability disclosure. Working code for Agentic Redux run on both domains is available in a supporting code repository. We also introduce Ontology-First Agent Design, a methodology for creation of agent frameworks on a problem domain, in which a human expert ontologizes the problem domain with Basic Formal Ontology, and then assigns an LLM to derive roles that agents and humans-in-the-loop can fill, in order to work the problems in the domain.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper introduces Agentic Redux, an LLM agent architecture for nontrivial domains requiring linear auditability. It claims that, using the typed lambda calculus, executions on appropriate domains are semantically guaranteed to be correct with all decisions recorded in an append-only ledger. It presents Ontology-First Agent Design, in which a human expert uses Basic Formal Ontology (BFO) to ontologize the domain and an LLM derives roles for agents and humans-in-the-loop. Two production-grade domains are given (healthcare billing compliance and security vulnerability disclosure) along with working code.

Significance. If the claimed proof and enforcement mechanisms were fully specified and verified, the work would offer a notable contribution to verifiable LLM agents by combining human-authored ontologies with formal semantics for auditability in high-stakes domains. The availability of working code on concrete domains is a positive element.

major comments (3)
  1. [Abstract] Abstract: The central claim that 'using the typed lambda calculus, we prove that... Agentic Redux executions are semantically guaranteed to be correct' is asserted without any derivation, type system definition, reduction rules, or formal model of how LLM outputs are constrained to well-typed terms. This is load-bearing for the 'provably' guarantee.
  2. The description of Ontology-First Agent Design and the two domains does not provide an explicit mechanism (e.g., runtime type checker, reduction to lambda terms, or rejection of out-of-model outputs) that would ensure every LLM-generated action remains a valid typed term whose reduction is recorded in the ledger. Without this, the formal model describes an ideal agent rather than the actual LLM-driven system.
  3. The weakest assumption—that problem domains can be suitably formalized in typed lambda calculus so that LLM executions remain inside the model—is stated but not supported by any concrete type assignment, ontology-to-lambda mapping, or example derivation for the healthcare or security domains.
minor comments (1)
  1. [Abstract] The abstract and introduction use the term 'Agentic Redux' and 'Ontology-First Agent Design' without an initial definition or citation to prior work on BFO or lambda calculus in agent systems.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive report. We address each major comment below and commit to revisions that will strengthen the formal content of the manuscript.

read point-by-point responses
  1. Referee: [Abstract] Abstract: The central claim that 'using the typed lambda calculus, we prove that... Agentic Redux executions are semantically guaranteed to be correct' is asserted without any derivation, type system definition, reduction rules, or formal model of how LLM outputs are constrained to well-typed terms. This is load-bearing for the 'provably' guarantee.

    Authors: We agree that the abstract states the central claim without accompanying formal details. The current manuscript does not contain the type system definition, reduction rules, or proof derivation. We will revise by adding a new section (or expanding Section 3) that defines the typed lambda calculus, presents the reduction rules, describes the formal model for constraining LLM outputs to well-typed terms, and includes a proof sketch of semantic correctness. revision: yes

  2. Referee: The description of Ontology-First Agent Design and the two domains does not provide an explicit mechanism (e.g., runtime type checker, reduction to lambda terms, or rejection of out-of-model outputs) that would ensure every LLM-generated action remains a valid typed term whose reduction is recorded in the ledger. Without this, the formal model describes an ideal agent rather than the actual LLM-driven system.

    Authors: This observation is correct. The manuscript describes the overall architecture and the append-only ledger but does not specify the runtime enforcement mechanism. We will add an explicit subsection detailing the runtime type checker, the process by which LLM outputs are reduced to lambda terms or rejected if they fall outside the model, and how valid reductions are recorded in the ledger. revision: yes

  3. Referee: The weakest assumption—that problem domains can be suitably formalized in typed lambda calculus so that LLM executions remain inside the model—is stated but not supported by any concrete type assignment, ontology-to-lambda mapping, or example derivation for the healthcare or security domains.

    Authors: We acknowledge that the manuscript states the assumption at a high level without concrete supporting examples. We will add an appendix containing explicit type assignments, ontology-to-lambda mappings, and example derivations for both the healthcare billing compliance and security vulnerability disclosure domains. revision: yes

Circularity Check

0 steps flagged

No significant circularity; formal proof relies on external typed lambda calculus

full rationale

The paper's central claim is a proof that Agentic Redux executions are semantically guaranteed to be correct when run on appropriate domains, using the typed lambda calculus as the formal system. This is a standard external mathematical framework, not derived from or equivalent to the paper's own inputs, fitted parameters, or self-referential definitions. The Ontology-First methodology involves human-authored BFO ontologies with LLM role derivation, but the proof is conditional on the stated assumption that domains can be formalized such that executions remain within the model; this assumption is not smuggled in via self-citation or reduced by construction to the result itself. No load-bearing self-citations, ansatzes, or renamings of known results appear in the derivation chain. The result is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 2 invented entities

The paper relies on standard formal methods and introduces new architectural concepts without additional fitted parameters visible in the abstract.

axioms (2)
  • standard math Typed lambda calculus can model and prove semantic correctness of agent executions on appropriate domains.
    Invoked as the basis for the proof of correctness in the abstract.
  • domain assumption Basic Formal Ontology can be used by human experts to structure problem domains for subsequent LLM role derivation.
    Central to the Ontology-First Agent Design methodology described.
invented entities (2)
  • Agentic Redux no independent evidence
    purpose: LLM agent architecture providing semantic correctness guarantees and append-only audit ledger.
    Newly introduced architecture in the paper.
  • Ontology-First Agent Design no independent evidence
    purpose: Methodology for creating agent frameworks by human ontology authoring followed by LLM role derivation.
    New methodology proposed in the paper.

pith-pipeline@v0.9.1-grok · 5662 in / 1409 out tokens · 38225 ms · 2026-06-28T03:47:44.288204+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

39 extracted references · 9 canonical work pages

  1. [1]

    Redux: A JS library for predictable and maintainable global state man- agement, 2015–2026

    Dan Abramov. Redux: A JS library for predictable and maintainable global state man- agement, 2015–2026. URLhttps://redux.js.org/

  2. [2]

    Vera: A language designed for machines to write, 2026

    Alasdair Allan. Vera: A language designed for machines to write, 2026. URLhttps: //veralang.dev/

  3. [3]

    Spear.Building Ontologies with Basic Formal Ontology

    Robert Arp, Barry Smith, and Andrew D. Spear.Building Ontologies with Basic Formal Ontology. MIT Press, 2015. doi:10.7551/mitpress/9780262527811.001.0001

  4. [4]

    Llms4ol 2025 overview: The 2nd large language models for ontology learning challenge

    Hamed Babaei Giglou, Jennifer D’Souza, Nandana Mihindukulasooriya, and S¨ oren Auer. Llms4ol 2025 overview: The 2nd large language models for ontology learning challenge. Open Conference Proceedings, 6, Oct. 2025. doi:10.52825/ocp.v6i.2913. URLhttps:// www.tib-op.org/ojs/index.php/ocp/article/view/2913

  5. [5]

    Franklin, Ali Ghodsi, Joseph M

    Peter Bailis, Alan Fekete, Michael J. Franklin, Ali Ghodsi, Joseph M. Hellerstein, and Ion Stoica. Coordination avoidance in database systems. InProceedings of the VLDB Endowment, 2014. doi:10.14778/2735508.2735509

  6. [6]

    Bakker, Daan L

    Roos M. Bakker, Daan L. Di Scala, Maaike H. T. de Boer, and Stephan A. Raaijmakers. Ontology learning with LLMs: A benchmark study on axiom identification, 2025. URL https://arxiv.org/abs/2512.05594

  7. [7]

    A critique of ANSI SQL isolation levels.SIGMOD Rec., 24(2):1–10, May 1995

    Hal Berenson, Phil Bernstein, Jim Gray, Jim Melton, Elizabeth O’Neil, and Patrick O’Neil. A critique of ANSI SQL isolation levels.SIGMOD Rec., 24(2):1–10, May 1995. ISSN 0163-

  8. [8]

    URLhttps://doi.org/10.1145/568271.223785

    doi:10.1145/568271.223785. URLhttps://doi.org/10.1145/568271.223785

  9. [9]

    Blackhat LLMs, 2026

    Nicholas Carlini. Blackhat LLMs, 2026. URLhttps://youtu.be/1sd26pWhfmg

  10. [10]

    Evaluating and mitigating the growing risk of LLM- discovered 0-days, 2026

    Nicholas Carlini, Keane Lucas, Evyatar Ben Asher, Newton Cheng, Hasnain Lakhani, David Forsythe, and Kyla Guru. Evaluating and mitigating the growing risk of LLM- discovered 0-days, 2026. URLhttps://red.anthropic.com/2026/zero-days/

  11. [11]

    Risk agents

    CashApp/Block. Risk agents. Minutes of meeting of ROOST Coop working group, 2026. URLhttps://github.com/roostorg/coop/discussions/171

  12. [12]

    Scaling managed agents: Decoupling the brain from the hands,

    Anthropic Engineering. Scaling managed agents: Decoupling the brain from the hands,

  13. [13]

    URLhttps://www.anthropic.com/engineering/managed-agents

  14. [14]

    Basic Formal Ontology, 2002–2026

    Barry Smith et al. Basic Formal Ontology, 2002–2026. URLhttps:// basic-formal-ontology.org/

  15. [15]

    Basic Formal Ontology 2.0, 2020

    Barry Smith et al. Basic Formal Ontology 2.0, 2020. URLhttps://raw. githubusercontent.com/BFO-ontology/BFO/master/docs/bfo2-reference/ BFO2-Reference.pdf. Direct link to pdf

  16. [16]

    Basic Formal Ontology ISO standard, 2021

    Barry Smith et al. Basic Formal Ontology ISO standard, 2021. URLhttps://standards. iso.org/iso-iec/21838/-2/ed-1/en/. 2021

  17. [17]

    LLMs4Life: Large language models for ontology learning in life sciences, 2024

    Nadeen Fathallah, Steffen Staab, and Alsayed Algergawy. LLMs4Life: Large language models for ontology learning in life sciences, 2024. URLhttps://arxiv.org/abs/2412. 02035

  18. [18]

    Ontology.Encyclopedia of Database Systems, 2009

    Tom Gruber. Ontology.Encyclopedia of Database Systems, 2009. doi:10.1007/978-0-387- 39940-9 1318. 22

  19. [19]

    O’Reilly Media, Sebastopol, CA, October 2025

    Valliappa Lakshmanan and Hannes Hapke.Generative AI Design Patterns. O’Reilly Media, Sebastopol, CA, October 2025. ISBN 9798341622654. URLhttps://www.oreilly.com/ library/view/generative-ai-design/9798341622654/

  20. [20]

    Butler W. Lampson. A note on the confinement problem.Commun. ACM, 16(10):613–615, October 1973. ISSN 0001-0782. doi:10.1145/362375.362389. URLhttps://doi.org/10. 1145/362375.362389

  21. [21]

    On- tology generation using large language models

    Anna Sofia Lippolis, Mohammad Javad Saeedizade, Robin Keskis¨ arkk¨ a, Sara Zuppiroli, Miguel Ceriani, Aldo Gangemi, Eva Blomqvist, and Andrea Giovanni Nuzzolese. On- tology generation using large language models. InThe Semantic Web: 22nd Eu- ropean Semantic Web Conference, ESWC 2025, Portoroz, Slovenia, June 1–5, 2025, Proceedings, Part I, pages 321–341,...

  22. [22]

    Claude code found a linux vulnerability hidden for 23 years, 2026

    Max Lynch. Claude code found a linux vulnerability hidden for 23 years, 2026. URL https://mtlynch.io/claude-code-found-linux-vulnerability/

  23. [23]

    Do LLMs really adapt to domains? an ontology learning perspective

    Huu Tan Mai, Cuong Xuan Chu, and Heiko Paulheim. Do LLMs really adapt to domains? an ontology learning perspective. In Gianluca Demartini, Katja Hose, Maribel Acosta, Matteo Palmonari, Gong Cheng, Hala Skaf-Molli, Nicolas Ferranti, Daniel Hern´ andez, and Aidan Hogan, editors,The Semantic Web – ISWC 2024, pages 126–143, Cham, 2025. Springer Nature Switzer...

  24. [24]

    Microsoft agent framework, 2026

    Microsoft. Microsoft agent framework, 2026. URLhttps://github.com/microsoft/ agent-framework

  25. [25]

    Pierce.Types and Programming Languages

    Benjamin C. Pierce.Types and Programming Languages. MIT Press, 2002. URLhttps: //mitpress.mit.edu/9780262162098/types-and-programming-languages/

  26. [26]

    Osprey, 2026

    ROOST. Osprey, 2026. URLhttps://github.com/roostorg/osprey

  27. [27]

    Awesome LLM apps, 2026

    Shubham Saboo. Awesome LLM apps, 2026. URLhttps://github.com/Shubhamsaboo/ awesome-llm-apps

  28. [28]

    Different coding agent strategies are better at different jobs, 2026

    Aaron Sterling. Different coding agent strategies are better at different jobs, 2026. In preparation

  29. [29]

    AHCCCS covered behavioral health services guide, 2025

    Arizona Health Care Cost Containment System. AHCCCS covered behavioral health services guide, 2025. URLhttps://www.azahcccs.gov/PlansProviders/Downloads/ MedicalCodingResources/AHCCCSCoveredBHServicesManual.pdf

  30. [30]

    Wright and Matthias Felleisen

    Andrew K. Wright and Matthias Felleisen. A syntactic approach to type soundness.In- formation and Computation, 115(1):38–94, 1994. doi:10.1006/inco.1994.1093

  31. [31]

    a list of state components a process reads or writes

    Xiao Zhang, Huiyuan Lai, Qianru Meng, and Johan Bos. OntoURL: A benchmark for evaluating large language models on symbolic ontological understanding, reasoning and learning, 2025. URLhttps://arxiv.org/abs/2505.11031. A The Agentic Redux Calculus Note about the proofs in this Appendix:The text in the main body of the paper is 99% text I wrote myself. The t...

  32. [32]

    consultωto select a roler∈Roles D (thearbitration choice)

  33. [33]

    compute the sliceσ=project D(s.domain, r)

  34. [34]

    invokeα r(σ, ω) to obtain an actiona

  35. [35]

    At most one (r, a) pair is in flight to the commit discipline at any time

    return (r, a). At most one (r, a) pair is in flight to the commit discipline at any time. Simultaneity among real-world agents is absorbed into the arbitration choice in Step 1. Definition A.35(Agentic Redux commit discipline).TheAgentic Redux commit discipline κAgentic Redux is the function that, on input (s, a, t, ω):

  36. [36]

    computes thecandidate domain stateˆs≜applyMutation D(s.domain, a, t)

  37. [37]

    walksinvariants D in list order, evaluating each invariant on ˆs

  38. [38]

    halts on the first invariantι k ∈invariants D whose evaluation on ˆsdoes not returnpass: •ifι k(ˆs) =reject(m), appends tos.audita single entry with tagrejected, actiona, timestampt, and witness slotSome(id(ι k), m), and returns the framework state with domainandqueuecomponents unchanged; •ifι k(ˆs) =escalate(m), appends tos.audita single entry with tages...

  39. [39]

    what happens if two agents act at once?

    if every invariant returnspasson ˆs, appends tos.audita single entry with tagapproved, actiona, timestampt, and witness slotNone, and returns the framework state withdomain component ˆsandqueuecomponent unchanged. Each branch appends exactly one audit entry; the approve and reject branches leave the queue unchanged, and the escalate branch appends exactly...