Methods for Formal Verification of Agent Skills: Three Layers Toward a Mechanically Checkable Capability-Containment Proof
Pith reviewed 2026-06-30 22:55 UTC · model grok-4.3
The pith
Three composable methods raise agent skills from declared to formally verified capability containment.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The three layers of sound static capability-containment analysis via abstract interpretation, a refinement type system that rejects out-of-manifest tool calls, and SMT-bounded model checking against the biconditional criterion together soundlessly cover the parent paper's threat model modulo the single residual of LLM refusal to act.
What carries the argument
Three composable verification layers over a semantics that treats skill execution as a deterministic script reachable through a non-deterministic LLM side.
If this is right
- Existing tools such as Z3, Semgrep and refinement-type checkers suffice for formal skill verification.
- The resulting proof artifact extends the SKILL.md convention without requiring new infrastructure.
- Verification becomes a zero-dependency process that ships as JavaScript modules with unit tests.
- Operators can mechanically check containment without building custom verifiers for each skill.
Where Pith is reading between the lines
- The same layering could be applied to skill manifests in agent systems that use different runtimes.
- Running the methods on a larger corpus of real skills would reveal practical bounds on verification time.
- Incorporating refusal behaviour directly into the model checking layer would eliminate the residual case.
Load-bearing premise
The chosen semantics for skill behaviour matches how an LLM-driven runtime actually consumes the skill.
What would settle it
A skill whose script passes all three layers yet produces a capability violation inside the runtime's transaction-buffer horizon.
read the original abstract
The companion paper introduced a four-level verification lattice on agent-skill manifests (unverified, declared, tested, formal) and left the top level aspirational. This paper closes that gap. We give a precise semantics for skill behaviour faithful to how a skill is consumed by an LLM-driven runtime (a deterministic script-side reachable through a non-deterministic LLM-side), state the verification problem as a capability-containment property over that semantics, and present three composable methods that together raise a skill from declared or tested to formal: (1) sound static capability-containment analysis of the script-side via abstract interpretation over a small effect lattice; (2) a refinement type system for tool-call envelopes that mechanically rejects any call whose statically-inferred capability is not in the manifest's declared set; (3) SMT-bounded model checking against the parent paper's biconditional correctness criterion, with the bound chosen so any counter-example fitting the runtime's transaction-buffer horizon is exhibited as a concrete trace. We prove the three layers composed soundly cover the parent paper's threat model modulo a single residual (the LLM's freedom to refuse to act) that the parent paper's runtime biconditional catches at session boundary. The methods reuse existing well-engineered tools (Z3, Semgrep, CodeQL, refinement-type checkers, mechanised proof assistants) rather than asking operators to build new ones, and the proof-carrying artifact extends the existing SKILL.md convention. All three methods plus the bundle producer and re-checker ship as zero-dependency JavaScript modules in the open-source enclawed framework (https://github.com/metereconsulting/enclawed; project page https://www.enclawed.com/), with 53 unit tests and an end-to-end CLI demo on a sample skill.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to close the aspirational top level of a four-level verification lattice for agent-skill manifests by defining a precise semantics for skill behaviour (deterministic script-side reachable via non-deterministic LLM-side), formulating the problem as a capability-containment property, and presenting three composable methods—sound abstract interpretation over an effect lattice, a refinement type system for tool-call envelopes, and SMT-bounded model checking against the parent paper's biconditional—that together yield a mechanically checkable proof covering the parent threat model modulo only the LLM refusal residual. The methods reuse existing tools (Z3, Semgrep, etc.), extend the SKILL.md convention with proof-carrying artifacts, and are shipped as zero-dependency JS modules with 53 unit tests and an end-to-end demo.
Significance. If the central soundness claim holds, the work supplies a concrete, tool-reusing route to formal capability containment for LLM-driven agent skills without requiring operators to develop new verifiers. Strengths include the explicit reuse of well-engineered off-the-shelf components, the open-source release with unit tests and CLI demo, and the machine-checkable nature of the three-layer composition; these elements support reproducibility and practical adoption if the faithfulness assumption is adequately justified.
major comments (1)
- [Semantics (abstract and § on semantics)] Semantics section (as described in the abstract and introduction): the manuscript asserts that the provided semantics is 'faithful to how a skill is consumed by an LLM-driven runtime' and that the three layers 'compose soundly' to cover the threat model modulo the refusal residual, but supplies no explicit encoding of LLM-side non-determinism (tool-call envelope selection, refusal interaction with the transaction buffer) nor any validation against observed LLM traces. This assumption is load-bearing for the transfer of the containment property to the actual runtime.
minor comments (2)
- [Abstract / Introduction] The abstract refers to 'the parent paper' without a numbered citation; the introduction should include an explicit reference to the companion work when stating the threat model and biconditional criterion.
- [Implementation / § on methods] The claim that the methods 'reuse existing well-engineered tools' is repeated; a single consolidated table listing each tool, the layer that uses it, and the precise guarantee it supplies would improve clarity.
Simulated Author's Rebuttal
We thank the referee for the careful and constructive review. The single major comment identifies a genuine gap in the explicit treatment of LLM-side non-determinism within the semantics section. We address it below and will revise the manuscript accordingly.
read point-by-point responses
-
Referee: [Semantics (abstract and § on semantics)] Semantics section (as described in the abstract and introduction): the manuscript asserts that the provided semantics is 'faithful to how a skill is consumed by an LLM-driven runtime' and that the three layers 'compose soundly' to cover the threat model modulo the refusal residual, but supplies no explicit encoding of LLM-side non-determinism (tool-call envelope selection, refusal interaction with the transaction buffer) nor any validation against observed LLM traces. This assumption is load-bearing for the transfer of the containment property to the actual runtime.
Authors: We agree that the manuscript would be strengthened by an explicit encoding of the non-deterministic LLM-side choices. In the revised version we will add a dedicated subsection to the semantics section that defines (i) the non-deterministic selection of tool-call envelopes and (ii) the interaction of refusal decisions with the transaction buffer, both expressed as nondeterministic operators over the deterministic script-side transition relation. The three verification layers will then be shown to be sound with respect to this extended semantics. Regarding validation against observed LLM traces, the semantics is intentionally derived from the runtime biconditional of the parent paper rather than from new empirical data; we will add a short discussion clarifying this derivation and noting that direct trace validation lies outside the scope of the present formal-methods contribution. These changes will make the load-bearing faithfulness assumption fully explicit without altering the core technical claims. revision: yes
Circularity Check
No significant circularity; derivation is self-contained
full rationale
The paper states a semantics for skill behaviour as an explicit modelling assumption ('faithful to how a skill is consumed by an LLM-driven runtime (a deterministic script-side reachable through a non-deterministic LLM-side)'), defines the verification problem over that semantics, and supplies three independent methods (abstract interpretation, refinement types, SMT model checking) that reuse off-the-shelf tools. The composition proof is claimed to cover the parent paper's threat model modulo one residual; the parent reference supplies the target biconditional and threat model but does not reduce the present methods or their soundness argument to a self-citation chain or to quantities defined by the inputs. No equations, fitted parameters, or ansatzes are shown that would make any 'prediction' or 'result' equivalent to its inputs by construction. The work is therefore self-contained against external benchmarks (Z3, Semgrep, etc.).
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Cedar policy language
Amazon Web Services. Cedar policy language. https://www.cedarpolicy.com/, 2024. Formally-verified policy language for runtime authorisation; paired with a Lean/Rust ver- ification effort
2024
-
[2]
AgentHarm: A benchmark for measuring harmfulness of LLM agents
Maksym Andriushchenko, Alexandra Souly, Mateusz Dziemian, Derek Duenas, Maxwell Lin, Justin Wang, Dan Hendrycks, Andy Bailey, Yinzhi Kang, Kellin Pelrine, Anton Albert, Fabien Roger, Tim Riedel, Alexandre Lacoste, Anastasiia Skripko, Yixin Wang, Jiaxin Ji, Christopher Berner, Mary Phuong, Hongseok Liu, Iain Murray, Jamie Hayes, Helen Toner, David Krueger,...
2025
-
[3]
Claude Skills: User-defined capabilities for Claude
Anthropic. Claude Skills: User-defined capabilities for Claude. https://www.anthropic. com/news/skills, 2024. Production skill artefact format with SKILL.md convention; human curation, no formal-property attestation. 25
2024
-
[4]
Modelcontextprotocol(MCP)specification
Anthropic. Modelcontextprotocol(MCP)specification. https://modelcontextprotocol.io/,
-
[5]
Open protocol for connecting AI assistants to tools, data sources, and skill-like artefacts
-
[6]
Andrew W. Appel. Foundational proof-carrying code. InProceedings of LICS, 2001
2001
-
[7]
Constitutional AI: Harmlessness from AI feedback
Yuntao Bai, Saurav Kadavath, Sandipan Kundu, Amanda Askell, Jackson Kernion, Andy Jones, AnnaChen, AnnaGoldie, AzaliaMirhoseini, CameronMcKinnon, CarolChen, CatherineOlsson, Christopher Olah, Danny Hernandez, Dawn Drain, et al. Constitutional AI: Harmlessness from AI feedback. InarXiv:2212.08073, 2022. Training-time alignment via principle-derived self-cr...
Pith/arXiv arXiv 2022
-
[8]
Model checking probabilistic systems.Handbook of Model Checking, Chapter 28, 2018
Christel Baier, Luca de Alfaro, Vojtěch Forejt, and Marta Kwiatkowska. Model checking probabilistic systems.Handbook of Model Checking, Chapter 28, 2018
2018
-
[9]
Introduction to runtime verification.Lectures on Runtime Verification, LNCS 10457, 2018
Ezio Bartocci, Yliès Falcone, Adrian Francalanza, and Giles Reger. Introduction to runtime verification.Lectures on Runtime Verification, LNCS 10457, 2018
2018
-
[10]
Clarke, and Yunshan Zhu
Armin Biere, Alessandro Cimatti, Edmund M. Clarke, and Yunshan Zhu. Symbolic model checking without BDDs.Proceedings of TACAS, 1999
1999
-
[11]
Wasmtime: A standalone WebAssembly runtime with capability-based sandboxing
Bytecode Alliance. Wasmtime: A standalone WebAssembly runtime with capability-based sandboxing. InBytecode Alliance technical report, 2024.https://wasmtime.dev/
2024
-
[12]
Cristian Cadar, Daniel Dunbar, and Dawson R. Engler. KLEE: Unassisted and automatic generation of high-coverage tests for complex systems programs.Proceedings of OSDI, 2008
2008
-
[13]
Zhaorun Chen, Zhen Xiang, Chaowei Xiao, Dawn Song, and Bo Li. AgentPoison: Red-teaming LLM agents via poisoning memory or knowledge bases.arXiv preprint arXiv:2407.12784, 2025
arXiv 2025
-
[14]
Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints
Patrick Cousot and Radhia Cousot. Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints. InProceedings of POPL, 1977
1977
-
[15]
Cover and Joy A
Thomas M. Cover and Joy A. Thomas.Elements of Information Theory. Wiley-Interscience, 2nd edition, 2006
2006
-
[16]
Z3: An efficient SMT solver
Leonardo de Moura and Nikolaj Bjørner. Z3: An efficient SMT solver. InProceedings of TACAS, 2008
2008
-
[17]
Dorothy E. Denning. A lattice model of secure information flow. volume 19, pages 236–243, 1976
1976
-
[18]
Formalizing the LLM as a probabilistic transducer for verification under uncertain inputs
Yiyang Dong, Vicente Calvo Quintela, Yu Tang, Jiaxiong Liu, Yuhua Cao, and Yang Liu. Formalizing the LLM as a probabilistic transducer for verification under uncertain inputs. In Proceedings of the 1st International Workshop on Formal Methods for LLM-based Systems, 2025
2025
-
[19]
GenAI against humanity: Nefarious applications of generative artificial intelligence and large language models.Journal of Computational Social Science, 7:549–569, 2024
Emilio Ferrara. GenAI against humanity: Nefarious applications of generative artificial intelligence and large language models.Journal of Computational Social Science, 7:549–569, 2024
2024
-
[20]
CodeQL: Variant analysis for software security, 2025.https://codeql.github.com/
GitHub. CodeQL: Variant analysis for software security, 2025.https://codeql.github.com/. 26
2025
-
[21]
DART: Directed automated random testing
Patrice Godefroid, Nils Klarlund, and Koushik Sen. DART: Directed automated random testing. Proceedings of PLDI, 2005
2005
-
[22]
Gary A. Kildall. A unified approach to global program optimization. InProceedings of POPL, 1973
1973
-
[23]
PRISM4.0: Verificationofprobabilistic real-time systems
MartaKwiatkowska, GethinNorman, andDavidParker. PRISM4.0: Verificationofprobabilistic real-time systems. InProceedings of CAV, 2011
2011
-
[24]
LangChain: Build context-aware reasoning applications
LangChain, Inc. LangChain: Build context-aware reasoning applications. https://www. langchain.com/, 2024. Agent-orchestration framework with tool integration; no static-analysis or proof layer
2024
-
[25]
Formal verification of a realistic compiler.Communications of the ACM, 52(7): 107–115, 2009
Xavier Leroy. Formal verification of a realistic compiler.Communications of the ACM, 52(7): 107–115, 2009
2009
-
[26]
Sigstore: A new standard for signing, verifying, and protecting software
Linux Foundation, Sigstore Project. Sigstore: A new standard for signing, verifying, and protecting software. https://www.sigstore.dev/, 2024. Keyless signing infrastructure for software artefacts; attests provenance, not behaviour
2024
-
[27]
Type inference for refinements and effects.Proceedings of POPL, 2015
Luísa Lourenço and Luís Caires. Type inference for refinements and effects.Proceedings of POPL, 2015
2015
-
[28]
Lucassen and David K
John M. Lucassen and David K. Gifford. Polymorphic effect systems.Proceedings of POPL, 1988
1988
-
[29]
enclawed: A configurable, sector-neutral hardening framework for personal-class ai assistant gateways, 2026
Alfredo Metere. enclawed: A configurable, sector-neutral hardening framework for personal-class ai assistant gateways, 2026. Open-source reference implementation. https://github.com/metereconsulting/enclawed
2026
-
[30]
Skills as verifiable artifacts: A trust schema and a biconditional correctness criterion for human-in-the-loop agent runtimes, 2026
Alfredo Metere. Skills as verifiable artifacts: A trust schema and a biconditional correctness criterion for human-in-the-loop agent runtimes, 2026. URLhttps://arxiv.org/pdf/2605. 00424v1. Companion paper
2026
-
[31]
enclawed project page.https://www.enclawed.com/, 2026
Metere Consulting, LLC. enclawed project page.https://www.enclawed.com/, 2026. Docu- mentation, downloads, and the open-source distribution of the enclawed framework
2026
-
[32]
Pyright: Static type checker for Python, 2025.https://github.com/microsoft/ pyright
Microsoft. Pyright: Static type checker for Python, 2025.https://github.com/microsoft/ pyright
2025
-
[33]
AutoGen: Multi-agent conversation framework
Microsoft Research. AutoGen: Multi-agent conversation framework. https://microsoft. github.io/autogen/, 2024. Multi-agent orchestration; tool calls go through agent conversa- tions, no formal-property attestation
2024
-
[34]
Miller.Robust composition: Towards a unified approach to access control and concur- rency control
Mark S. Miller.Robust composition: Towards a unified approach to access control and concur- rency control. PhD thesis, Johns Hopkins University, 2006
2006
-
[35]
ATLAS: Adversarial threat landscape for ai systems, 2024
MITRE Corporation. ATLAS: Adversarial threat landscape for ai systems, 2024. https: //atlas.mitre.org/
2024
-
[36]
George C. Necula. Proof-carrying code. InProceedings of POPL, 1997. 27
1997
-
[37]
Supply-chain levels for software artefacts (SLSA).https: //slsa.dev/, 2024
Open Source Security Foundation. Supply-chain levels for software artefacts (SLSA).https: //slsa.dev/, 2024. Tiered supply-chain integrity framework (Levels 0–3); a build-process attestation, not a behavioural one
2024
-
[38]
Function calling and tool use
OpenAI. Function calling and tool use. https://platform.openai.com/docs/guides/ function-calling, 2024. JSON-Schema-typed function declarations for LLM tool invocation; the closest production-deployed peer to skill manifests
2024
-
[39]
GPTs and the GPT store
OpenAI. GPTs and the GPT store. https://openai.com/blog/introducing-gpts, 2024. Custom-GPT marketplace; review-based governance, no formal containment proof
2024
-
[40]
LLM AI cybersecurity & governance checklist (top 10)
OWASP Foundation. LLM AI cybersecurity & governance checklist (top 10). OWASP, 2024
2024
-
[41]
Semgrep: Lightweight static analysis for many languages, 2025.https://semgrep.dev/
r2c. Semgrep: Lightweight static analysis for many languages, 2025.https://semgrep.dev/
2025
-
[42]
Springer Series in Synergetics
Hannes Risken.The Fokker–Planck Equation: Methods of Solution and Applications. Springer Series in Synergetics. Springer, 2nd edition, 1996
1996
-
[43]
Alexander Robey, Eric Wong, Hamed Hassani, and George J. Pappas. SmoothLLM: Defending large language models against jailbreaking attacks. InProceedings of NeurIPS, 2023
2023
-
[44]
Rondon, Ming Kawaguchi, and Ranjit Jhala
Patrick M. Rondon, Ming Kawaguchi, and Ranjit Jhala. Liquid types. InProceedings of PLDI, 2008
2008
-
[45]
Andrei Sabelfeld and Andrew C. Myers. Language-based information-flow security.IEEE Journal on Selected Areas in Communications, 21(1):5–19, 2003
2003
-
[46]
The update framework (TUF) specification, 2024
Justin Samuel and Justin Cappos. The update framework (TUF) specification, 2024. Specifica- tion version 1.0.x
2024
-
[47]
Open policy agent (OPA): Policy as code
Tim Sandall. Open policy agent (OPA): Policy as code. InProceedings of KubeCon / Cloud- Native Computing Foundation, 2018.https://www.openpolicyagent.org/
2018
-
[48]
Claude E. Shannon. A mathematical theory of communication.Bell System Technical Journal, 27(3):379–423, 623–656, 1948
1948
-
[49]
Reasoning about a machine with local capabilities
Lau Skorstengaard, Dominique Devriese, and Lars Birkedal. Reasoning about a machine with local capabilities. InProceedings of POPL, 2018
2018
-
[50]
Strogatz.Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering
Steven H. Strogatz.Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering. Westview Press, 2nd edition, 2014
2014
-
[51]
Secure distributed programming with value-dependent types
Nikhil Swamy, Juan Chen, Cédric Fournet, Pierre-Yves Strub, Karthikeyan Bhargavan, and Jean Yang. Secure distributed programming with value-dependent types. InProceedings of ICFP, 2011
2011
-
[52]
The object-capability discipline, 2025.http://erights.org/ elib/capability/ode/index.html
The Object-Capability Wiki. The object-capability discipline, 2025.http://erights.org/ elib/capability/ode/index.html
2025
-
[53]
in-toto: Providing farm-to-table guarantees for bits and bytes
Santiago Torres-Arias, Hammad Afzali, Trishank Karthik Kuppusamy, Reza Curtmola, and Justin Cappos. in-toto: Providing farm-to-table guarantees for bits and bytes. https:// in-toto.io/, 2019. Provenance-tracking framework underlying SLSA. 28
2019
-
[54]
Seidel, Ranjit Jhala, Dimitrios Vytiniotis, and Simon Peyton Jones
Niki Vazou, Eric L. Seidel, Ranjit Jhala, Dimitrios Vytiniotis, and Simon Peyton Jones. Refinement types for Haskell.Proceedings of ICFP, 2014
2014
-
[55]
A sound type system for secure flow analysis.Journal of Computer Security, 4(2–3):167–187, 1996
Dennis Volpano, Geoffrey Smith, and Cynthia Irvine. A sound type system for secure flow analysis.Journal of Computer Security, 4(2–3):167–187, 1996
1996
-
[56]
The WebAssembly system interface (WASI), 2024.https: //wasi.dev/
WebAssembly Community Group. The WebAssembly system interface (WASI), 2024.https: //wasi.dev/. 29
2024
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.