pith. sign in

arxiv: 2605.23951 · v1 · pith:A3AZ7I7Znew · submitted 2026-05-09 · 💻 cs.AI · cs.LO· cs.MA

Methods for Formal Verification of Agent Skills: Three Layers Toward a Mechanically Checkable Capability-Containment Proof

Pith reviewed 2026-06-30 22:55 UTC · model grok-4.3

classification 💻 cs.AI cs.LOcs.MA
keywords agent skillsformal verificationcapability containmentstatic analysisrefinement typesmodel checkingLLM agents
0
0 comments X

The pith

Three composable methods raise agent skills from declared to formally verified capability containment.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper defines a semantics for skill behaviour that separates the deterministic script side from the non-deterministic LLM side. It then supplies three methods—static analysis over an effect lattice, a refinement type system for tool calls, and SMT-bounded model checking—that together establish capability containment. These methods reuse existing tools and extend the SKILL.md format so that verification becomes mechanical. The composition covers the runtime threat model except for the residual case of LLM refusal to act, which the parent biconditional already catches at session boundaries.

Core claim

The three layers of sound static capability-containment analysis via abstract interpretation, a refinement type system that rejects out-of-manifest tool calls, and SMT-bounded model checking against the biconditional criterion together soundlessly cover the parent paper's threat model modulo the single residual of LLM refusal to act.

What carries the argument

Three composable verification layers over a semantics that treats skill execution as a deterministic script reachable through a non-deterministic LLM side.

If this is right

  • Existing tools such as Z3, Semgrep and refinement-type checkers suffice for formal skill verification.
  • The resulting proof artifact extends the SKILL.md convention without requiring new infrastructure.
  • Verification becomes a zero-dependency process that ships as JavaScript modules with unit tests.
  • Operators can mechanically check containment without building custom verifiers for each skill.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same layering could be applied to skill manifests in agent systems that use different runtimes.
  • Running the methods on a larger corpus of real skills would reveal practical bounds on verification time.
  • Incorporating refusal behaviour directly into the model checking layer would eliminate the residual case.

Load-bearing premise

The chosen semantics for skill behaviour matches how an LLM-driven runtime actually consumes the skill.

What would settle it

A skill whose script passes all three layers yet produces a capability violation inside the runtime's transaction-buffer horizon.

read the original abstract

The companion paper introduced a four-level verification lattice on agent-skill manifests (unverified, declared, tested, formal) and left the top level aspirational. This paper closes that gap. We give a precise semantics for skill behaviour faithful to how a skill is consumed by an LLM-driven runtime (a deterministic script-side reachable through a non-deterministic LLM-side), state the verification problem as a capability-containment property over that semantics, and present three composable methods that together raise a skill from declared or tested to formal: (1) sound static capability-containment analysis of the script-side via abstract interpretation over a small effect lattice; (2) a refinement type system for tool-call envelopes that mechanically rejects any call whose statically-inferred capability is not in the manifest's declared set; (3) SMT-bounded model checking against the parent paper's biconditional correctness criterion, with the bound chosen so any counter-example fitting the runtime's transaction-buffer horizon is exhibited as a concrete trace. We prove the three layers composed soundly cover the parent paper's threat model modulo a single residual (the LLM's freedom to refuse to act) that the parent paper's runtime biconditional catches at session boundary. The methods reuse existing well-engineered tools (Z3, Semgrep, CodeQL, refinement-type checkers, mechanised proof assistants) rather than asking operators to build new ones, and the proof-carrying artifact extends the existing SKILL.md convention. All three methods plus the bundle producer and re-checker ship as zero-dependency JavaScript modules in the open-source enclawed framework (https://github.com/metereconsulting/enclawed; project page https://www.enclawed.com/), with 53 unit tests and an end-to-end CLI demo on a sample skill.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper claims to close the aspirational top level of a four-level verification lattice for agent-skill manifests by defining a precise semantics for skill behaviour (deterministic script-side reachable via non-deterministic LLM-side), formulating the problem as a capability-containment property, and presenting three composable methods—sound abstract interpretation over an effect lattice, a refinement type system for tool-call envelopes, and SMT-bounded model checking against the parent paper's biconditional—that together yield a mechanically checkable proof covering the parent threat model modulo only the LLM refusal residual. The methods reuse existing tools (Z3, Semgrep, etc.), extend the SKILL.md convention with proof-carrying artifacts, and are shipped as zero-dependency JS modules with 53 unit tests and an end-to-end demo.

Significance. If the central soundness claim holds, the work supplies a concrete, tool-reusing route to formal capability containment for LLM-driven agent skills without requiring operators to develop new verifiers. Strengths include the explicit reuse of well-engineered off-the-shelf components, the open-source release with unit tests and CLI demo, and the machine-checkable nature of the three-layer composition; these elements support reproducibility and practical adoption if the faithfulness assumption is adequately justified.

major comments (1)
  1. [Semantics (abstract and § on semantics)] Semantics section (as described in the abstract and introduction): the manuscript asserts that the provided semantics is 'faithful to how a skill is consumed by an LLM-driven runtime' and that the three layers 'compose soundly' to cover the threat model modulo the refusal residual, but supplies no explicit encoding of LLM-side non-determinism (tool-call envelope selection, refusal interaction with the transaction buffer) nor any validation against observed LLM traces. This assumption is load-bearing for the transfer of the containment property to the actual runtime.
minor comments (2)
  1. [Abstract / Introduction] The abstract refers to 'the parent paper' without a numbered citation; the introduction should include an explicit reference to the companion work when stating the threat model and biconditional criterion.
  2. [Implementation / § on methods] The claim that the methods 'reuse existing well-engineered tools' is repeated; a single consolidated table listing each tool, the layer that uses it, and the precise guarantee it supplies would improve clarity.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the careful and constructive review. The single major comment identifies a genuine gap in the explicit treatment of LLM-side non-determinism within the semantics section. We address it below and will revise the manuscript accordingly.

read point-by-point responses
  1. Referee: [Semantics (abstract and § on semantics)] Semantics section (as described in the abstract and introduction): the manuscript asserts that the provided semantics is 'faithful to how a skill is consumed by an LLM-driven runtime' and that the three layers 'compose soundly' to cover the threat model modulo the refusal residual, but supplies no explicit encoding of LLM-side non-determinism (tool-call envelope selection, refusal interaction with the transaction buffer) nor any validation against observed LLM traces. This assumption is load-bearing for the transfer of the containment property to the actual runtime.

    Authors: We agree that the manuscript would be strengthened by an explicit encoding of the non-deterministic LLM-side choices. In the revised version we will add a dedicated subsection to the semantics section that defines (i) the non-deterministic selection of tool-call envelopes and (ii) the interaction of refusal decisions with the transaction buffer, both expressed as nondeterministic operators over the deterministic script-side transition relation. The three verification layers will then be shown to be sound with respect to this extended semantics. Regarding validation against observed LLM traces, the semantics is intentionally derived from the runtime biconditional of the parent paper rather than from new empirical data; we will add a short discussion clarifying this derivation and noting that direct trace validation lies outside the scope of the present formal-methods contribution. These changes will make the load-bearing faithfulness assumption fully explicit without altering the core technical claims. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

full rationale

The paper states a semantics for skill behaviour as an explicit modelling assumption ('faithful to how a skill is consumed by an LLM-driven runtime (a deterministic script-side reachable through a non-deterministic LLM-side)'), defines the verification problem over that semantics, and supplies three independent methods (abstract interpretation, refinement types, SMT model checking) that reuse off-the-shelf tools. The composition proof is claimed to cover the parent paper's threat model modulo one residual; the parent reference supplies the target biconditional and threat model but does not reduce the present methods or their soundness argument to a self-citation chain or to quantities defined by the inputs. No equations, fitted parameters, or ansatzes are shown that would make any 'prediction' or 'result' equivalent to its inputs by construction. The work is therefore self-contained against external benchmarks (Z3, Semgrep, etc.).

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no information is available on free parameters, axioms, or invented entities.

pith-pipeline@v0.9.1-grok · 5881 in / 1167 out tokens · 38902 ms · 2026-06-30T22:55:59.632133+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

56 extracted references · 1 linked inside Pith

  1. [1]

    Cedar policy language

    Amazon Web Services. Cedar policy language. https://www.cedarpolicy.com/, 2024. Formally-verified policy language for runtime authorisation; paired with a Lean/Rust ver- ification effort

  2. [2]

    AgentHarm: A benchmark for measuring harmfulness of LLM agents

    Maksym Andriushchenko, Alexandra Souly, Mateusz Dziemian, Derek Duenas, Maxwell Lin, Justin Wang, Dan Hendrycks, Andy Bailey, Yinzhi Kang, Kellin Pelrine, Anton Albert, Fabien Roger, Tim Riedel, Alexandre Lacoste, Anastasiia Skripko, Yixin Wang, Jiaxin Ji, Christopher Berner, Mary Phuong, Hongseok Liu, Iain Murray, Jamie Hayes, Helen Toner, David Krueger,...

  3. [3]

    Claude Skills: User-defined capabilities for Claude

    Anthropic. Claude Skills: User-defined capabilities for Claude. https://www.anthropic. com/news/skills, 2024. Production skill artefact format with SKILL.md convention; human curation, no formal-property attestation. 25

  4. [4]

    Modelcontextprotocol(MCP)specification

    Anthropic. Modelcontextprotocol(MCP)specification. https://modelcontextprotocol.io/,

  5. [5]

    Open protocol for connecting AI assistants to tools, data sources, and skill-like artefacts

  6. [6]

    Andrew W. Appel. Foundational proof-carrying code. InProceedings of LICS, 2001

  7. [7]

    Constitutional AI: Harmlessness from AI feedback

    Yuntao Bai, Saurav Kadavath, Sandipan Kundu, Amanda Askell, Jackson Kernion, Andy Jones, AnnaChen, AnnaGoldie, AzaliaMirhoseini, CameronMcKinnon, CarolChen, CatherineOlsson, Christopher Olah, Danny Hernandez, Dawn Drain, et al. Constitutional AI: Harmlessness from AI feedback. InarXiv:2212.08073, 2022. Training-time alignment via principle-derived self-cr...

  8. [8]

    Model checking probabilistic systems.Handbook of Model Checking, Chapter 28, 2018

    Christel Baier, Luca de Alfaro, Vojtěch Forejt, and Marta Kwiatkowska. Model checking probabilistic systems.Handbook of Model Checking, Chapter 28, 2018

  9. [9]

    Introduction to runtime verification.Lectures on Runtime Verification, LNCS 10457, 2018

    Ezio Bartocci, Yliès Falcone, Adrian Francalanza, and Giles Reger. Introduction to runtime verification.Lectures on Runtime Verification, LNCS 10457, 2018

  10. [10]

    Clarke, and Yunshan Zhu

    Armin Biere, Alessandro Cimatti, Edmund M. Clarke, and Yunshan Zhu. Symbolic model checking without BDDs.Proceedings of TACAS, 1999

  11. [11]

    Wasmtime: A standalone WebAssembly runtime with capability-based sandboxing

    Bytecode Alliance. Wasmtime: A standalone WebAssembly runtime with capability-based sandboxing. InBytecode Alliance technical report, 2024.https://wasmtime.dev/

  12. [12]

    Cristian Cadar, Daniel Dunbar, and Dawson R. Engler. KLEE: Unassisted and automatic generation of high-coverage tests for complex systems programs.Proceedings of OSDI, 2008

  13. [13]

    AgentPoison: Red-teaming LLM agents via poisoning memory or knowledge bases.arXiv preprint arXiv:2407.12784, 2025

    Zhaorun Chen, Zhen Xiang, Chaowei Xiao, Dawn Song, and Bo Li. AgentPoison: Red-teaming LLM agents via poisoning memory or knowledge bases.arXiv preprint arXiv:2407.12784, 2025

  14. [14]

    Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints

    Patrick Cousot and Radhia Cousot. Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints. InProceedings of POPL, 1977

  15. [15]

    Cover and Joy A

    Thomas M. Cover and Joy A. Thomas.Elements of Information Theory. Wiley-Interscience, 2nd edition, 2006

  16. [16]

    Z3: An efficient SMT solver

    Leonardo de Moura and Nikolaj Bjørner. Z3: An efficient SMT solver. InProceedings of TACAS, 2008

  17. [17]

    Dorothy E. Denning. A lattice model of secure information flow. volume 19, pages 236–243, 1976

  18. [18]

    Formalizing the LLM as a probabilistic transducer for verification under uncertain inputs

    Yiyang Dong, Vicente Calvo Quintela, Yu Tang, Jiaxiong Liu, Yuhua Cao, and Yang Liu. Formalizing the LLM as a probabilistic transducer for verification under uncertain inputs. In Proceedings of the 1st International Workshop on Formal Methods for LLM-based Systems, 2025

  19. [19]

    GenAI against humanity: Nefarious applications of generative artificial intelligence and large language models.Journal of Computational Social Science, 7:549–569, 2024

    Emilio Ferrara. GenAI against humanity: Nefarious applications of generative artificial intelligence and large language models.Journal of Computational Social Science, 7:549–569, 2024

  20. [20]

    CodeQL: Variant analysis for software security, 2025.https://codeql.github.com/

    GitHub. CodeQL: Variant analysis for software security, 2025.https://codeql.github.com/. 26

  21. [21]

    DART: Directed automated random testing

    Patrice Godefroid, Nils Klarlund, and Koushik Sen. DART: Directed automated random testing. Proceedings of PLDI, 2005

  22. [22]

    Gary A. Kildall. A unified approach to global program optimization. InProceedings of POPL, 1973

  23. [23]

    PRISM4.0: Verificationofprobabilistic real-time systems

    MartaKwiatkowska, GethinNorman, andDavidParker. PRISM4.0: Verificationofprobabilistic real-time systems. InProceedings of CAV, 2011

  24. [24]

    LangChain: Build context-aware reasoning applications

    LangChain, Inc. LangChain: Build context-aware reasoning applications. https://www. langchain.com/, 2024. Agent-orchestration framework with tool integration; no static-analysis or proof layer

  25. [25]

    Formal verification of a realistic compiler.Communications of the ACM, 52(7): 107–115, 2009

    Xavier Leroy. Formal verification of a realistic compiler.Communications of the ACM, 52(7): 107–115, 2009

  26. [26]

    Sigstore: A new standard for signing, verifying, and protecting software

    Linux Foundation, Sigstore Project. Sigstore: A new standard for signing, verifying, and protecting software. https://www.sigstore.dev/, 2024. Keyless signing infrastructure for software artefacts; attests provenance, not behaviour

  27. [27]

    Type inference for refinements and effects.Proceedings of POPL, 2015

    Luísa Lourenço and Luís Caires. Type inference for refinements and effects.Proceedings of POPL, 2015

  28. [28]

    Lucassen and David K

    John M. Lucassen and David K. Gifford. Polymorphic effect systems.Proceedings of POPL, 1988

  29. [29]

    enclawed: A configurable, sector-neutral hardening framework for personal-class ai assistant gateways, 2026

    Alfredo Metere. enclawed: A configurable, sector-neutral hardening framework for personal-class ai assistant gateways, 2026. Open-source reference implementation. https://github.com/metereconsulting/enclawed

  30. [30]

    Skills as verifiable artifacts: A trust schema and a biconditional correctness criterion for human-in-the-loop agent runtimes, 2026

    Alfredo Metere. Skills as verifiable artifacts: A trust schema and a biconditional correctness criterion for human-in-the-loop agent runtimes, 2026. URLhttps://arxiv.org/pdf/2605. 00424v1. Companion paper

  31. [31]

    enclawed project page.https://www.enclawed.com/, 2026

    Metere Consulting, LLC. enclawed project page.https://www.enclawed.com/, 2026. Docu- mentation, downloads, and the open-source distribution of the enclawed framework

  32. [32]

    Pyright: Static type checker for Python, 2025.https://github.com/microsoft/ pyright

    Microsoft. Pyright: Static type checker for Python, 2025.https://github.com/microsoft/ pyright

  33. [33]

    AutoGen: Multi-agent conversation framework

    Microsoft Research. AutoGen: Multi-agent conversation framework. https://microsoft. github.io/autogen/, 2024. Multi-agent orchestration; tool calls go through agent conversa- tions, no formal-property attestation

  34. [34]

    Miller.Robust composition: Towards a unified approach to access control and concur- rency control

    Mark S. Miller.Robust composition: Towards a unified approach to access control and concur- rency control. PhD thesis, Johns Hopkins University, 2006

  35. [35]

    ATLAS: Adversarial threat landscape for ai systems, 2024

    MITRE Corporation. ATLAS: Adversarial threat landscape for ai systems, 2024. https: //atlas.mitre.org/

  36. [36]

    George C. Necula. Proof-carrying code. InProceedings of POPL, 1997. 27

  37. [37]

    Supply-chain levels for software artefacts (SLSA).https: //slsa.dev/, 2024

    Open Source Security Foundation. Supply-chain levels for software artefacts (SLSA).https: //slsa.dev/, 2024. Tiered supply-chain integrity framework (Levels 0–3); a build-process attestation, not a behavioural one

  38. [38]

    Function calling and tool use

    OpenAI. Function calling and tool use. https://platform.openai.com/docs/guides/ function-calling, 2024. JSON-Schema-typed function declarations for LLM tool invocation; the closest production-deployed peer to skill manifests

  39. [39]

    GPTs and the GPT store

    OpenAI. GPTs and the GPT store. https://openai.com/blog/introducing-gpts, 2024. Custom-GPT marketplace; review-based governance, no formal containment proof

  40. [40]

    LLM AI cybersecurity & governance checklist (top 10)

    OWASP Foundation. LLM AI cybersecurity & governance checklist (top 10). OWASP, 2024

  41. [41]

    Semgrep: Lightweight static analysis for many languages, 2025.https://semgrep.dev/

    r2c. Semgrep: Lightweight static analysis for many languages, 2025.https://semgrep.dev/

  42. [42]

    Springer Series in Synergetics

    Hannes Risken.The Fokker–Planck Equation: Methods of Solution and Applications. Springer Series in Synergetics. Springer, 2nd edition, 1996

  43. [43]

    Alexander Robey, Eric Wong, Hamed Hassani, and George J. Pappas. SmoothLLM: Defending large language models against jailbreaking attacks. InProceedings of NeurIPS, 2023

  44. [44]

    Rondon, Ming Kawaguchi, and Ranjit Jhala

    Patrick M. Rondon, Ming Kawaguchi, and Ranjit Jhala. Liquid types. InProceedings of PLDI, 2008

  45. [45]

    Andrei Sabelfeld and Andrew C. Myers. Language-based information-flow security.IEEE Journal on Selected Areas in Communications, 21(1):5–19, 2003

  46. [46]

    The update framework (TUF) specification, 2024

    Justin Samuel and Justin Cappos. The update framework (TUF) specification, 2024. Specifica- tion version 1.0.x

  47. [47]

    Open policy agent (OPA): Policy as code

    Tim Sandall. Open policy agent (OPA): Policy as code. InProceedings of KubeCon / Cloud- Native Computing Foundation, 2018.https://www.openpolicyagent.org/

  48. [48]

    Claude E. Shannon. A mathematical theory of communication.Bell System Technical Journal, 27(3):379–423, 623–656, 1948

  49. [49]

    Reasoning about a machine with local capabilities

    Lau Skorstengaard, Dominique Devriese, and Lars Birkedal. Reasoning about a machine with local capabilities. InProceedings of POPL, 2018

  50. [50]

    Strogatz.Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering

    Steven H. Strogatz.Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering. Westview Press, 2nd edition, 2014

  51. [51]

    Secure distributed programming with value-dependent types

    Nikhil Swamy, Juan Chen, Cédric Fournet, Pierre-Yves Strub, Karthikeyan Bhargavan, and Jean Yang. Secure distributed programming with value-dependent types. InProceedings of ICFP, 2011

  52. [52]

    The object-capability discipline, 2025.http://erights.org/ elib/capability/ode/index.html

    The Object-Capability Wiki. The object-capability discipline, 2025.http://erights.org/ elib/capability/ode/index.html

  53. [53]

    in-toto: Providing farm-to-table guarantees for bits and bytes

    Santiago Torres-Arias, Hammad Afzali, Trishank Karthik Kuppusamy, Reza Curtmola, and Justin Cappos. in-toto: Providing farm-to-table guarantees for bits and bytes. https:// in-toto.io/, 2019. Provenance-tracking framework underlying SLSA. 28

  54. [54]

    Seidel, Ranjit Jhala, Dimitrios Vytiniotis, and Simon Peyton Jones

    Niki Vazou, Eric L. Seidel, Ranjit Jhala, Dimitrios Vytiniotis, and Simon Peyton Jones. Refinement types for Haskell.Proceedings of ICFP, 2014

  55. [55]

    A sound type system for secure flow analysis.Journal of Computer Security, 4(2–3):167–187, 1996

    Dennis Volpano, Geoffrey Smith, and Cynthia Irvine. A sound type system for secure flow analysis.Journal of Computer Security, 4(2–3):167–187, 1996

  56. [56]

    The WebAssembly system interface (WASI), 2024.https: //wasi.dev/

    WebAssembly Community Group. The WebAssembly system interface (WASI), 2024.https: //wasi.dev/. 29