Methods for Formal Verification of Agent Skills: Three Layers Toward a Mechanically Checkable Capability-Containment Proof

Alfredo Metere

arxiv: 2605.23951 · v1 · pith:A3AZ7I7Znew · submitted 2026-05-09 · 💻 cs.AI · cs.LO· cs.MA

Methods for Formal Verification of Agent Skills: Three Layers Toward a Mechanically Checkable Capability-Containment Proof

Alfredo Metere This is my paper

Pith reviewed 2026-06-30 22:55 UTC · model grok-4.3

classification 💻 cs.AI cs.LOcs.MA

keywords agent skillsformal verificationcapability containmentstatic analysisrefinement typesmodel checkingLLM agents

0 comments

The pith

Three composable methods raise agent skills from declared to formally verified capability containment.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper defines a semantics for skill behaviour that separates the deterministic script side from the non-deterministic LLM side. It then supplies three methods—static analysis over an effect lattice, a refinement type system for tool calls, and SMT-bounded model checking—that together establish capability containment. These methods reuse existing tools and extend the SKILL.md format so that verification becomes mechanical. The composition covers the runtime threat model except for the residual case of LLM refusal to act, which the parent biconditional already catches at session boundaries.

Core claim

The three layers of sound static capability-containment analysis via abstract interpretation, a refinement type system that rejects out-of-manifest tool calls, and SMT-bounded model checking against the biconditional criterion together soundlessly cover the parent paper's threat model modulo the single residual of LLM refusal to act.

What carries the argument

Three composable verification layers over a semantics that treats skill execution as a deterministic script reachable through a non-deterministic LLM side.

If this is right

Existing tools such as Z3, Semgrep and refinement-type checkers suffice for formal skill verification.
The resulting proof artifact extends the SKILL.md convention without requiring new infrastructure.
Verification becomes a zero-dependency process that ships as JavaScript modules with unit tests.
Operators can mechanically check containment without building custom verifiers for each skill.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same layering could be applied to skill manifests in agent systems that use different runtimes.
Running the methods on a larger corpus of real skills would reveal practical bounds on verification time.
Incorporating refusal behaviour directly into the model checking layer would eliminate the residual case.

Load-bearing premise

The chosen semantics for skill behaviour matches how an LLM-driven runtime actually consumes the skill.

What would settle it

A skill whose script passes all three layers yet produces a capability violation inside the runtime's transaction-buffer horizon.

read the original abstract

The companion paper introduced a four-level verification lattice on agent-skill manifests (unverified, declared, tested, formal) and left the top level aspirational. This paper closes that gap. We give a precise semantics for skill behaviour faithful to how a skill is consumed by an LLM-driven runtime (a deterministic script-side reachable through a non-deterministic LLM-side), state the verification problem as a capability-containment property over that semantics, and present three composable methods that together raise a skill from declared or tested to formal: (1) sound static capability-containment analysis of the script-side via abstract interpretation over a small effect lattice; (2) a refinement type system for tool-call envelopes that mechanically rejects any call whose statically-inferred capability is not in the manifest's declared set; (3) SMT-bounded model checking against the parent paper's biconditional correctness criterion, with the bound chosen so any counter-example fitting the runtime's transaction-buffer horizon is exhibited as a concrete trace. We prove the three layers composed soundly cover the parent paper's threat model modulo a single residual (the LLM's freedom to refuse to act) that the parent paper's runtime biconditional catches at session boundary. The methods reuse existing well-engineered tools (Z3, Semgrep, CodeQL, refinement-type checkers, mechanised proof assistants) rather than asking operators to build new ones, and the proof-carrying artifact extends the existing SKILL.md convention. All three methods plus the bundle producer and re-checker ship as zero-dependency JavaScript modules in the open-source enclawed framework (https://github.com/metereconsulting/enclawed; project page https://www.enclawed.com/), with 53 unit tests and an end-to-end CLI demo on a sample skill.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper delivers three practical, tool-reusing layers for mechanically checking capability containment on agent skills, with shipped code, but the semantics-to-runtime link is the key assumption to verify.

read the letter

This paper closes the formal tier of the companion lattice by defining a semantics for skills that splits deterministic script behavior from non-deterministic LLM choice, then giving three layers—abstract interpretation over an effect lattice, refinement types on tool envelopes, and SMT-bounded checking against the biconditional—that compose into a soundness claim covering the threat model except for refusal.

It does the useful work of making the methods immediately runnable: they call Z3, Semgrep, CodeQL and existing refinement checkers, extend SKILL.md, and ship as zero-dependency JS modules with 53 tests plus a CLI demo. That turns an aspirational level into something an operator can apply without building new infrastructure.

The soft spot is the faithfulness assumption. The containment proof only transfers if the model accurately captures how an LLM runtime actually selects and executes tool calls within the transaction buffer. The abstract asserts this without showing the encoding details or any trace validation against real LLM outputs, so the central claim rests on that modeling step. If the non-determinism is under-approximated, the proved property does not reach the deployed system.

The citation pattern is clean and the work stays within the prior threat model rather than inflating scope. This is for people already working on verifiable agent manifests who want concrete next steps beyond testing. It shows clear engagement with the lattice and supplies falsifiable artifacts.

I would send it to peer review; the combination of new methods, composition proof, and open code is enough to justify referee time even if the semantics section needs tightening.

Referee Report

1 major / 2 minor

Summary. The paper claims to close the aspirational top level of a four-level verification lattice for agent-skill manifests by defining a precise semantics for skill behaviour (deterministic script-side reachable via non-deterministic LLM-side), formulating the problem as a capability-containment property, and presenting three composable methods—sound abstract interpretation over an effect lattice, a refinement type system for tool-call envelopes, and SMT-bounded model checking against the parent paper's biconditional—that together yield a mechanically checkable proof covering the parent threat model modulo only the LLM refusal residual. The methods reuse existing tools (Z3, Semgrep, etc.), extend the SKILL.md convention with proof-carrying artifacts, and are shipped as zero-dependency JS modules with 53 unit tests and an end-to-end demo.

Significance. If the central soundness claim holds, the work supplies a concrete, tool-reusing route to formal capability containment for LLM-driven agent skills without requiring operators to develop new verifiers. Strengths include the explicit reuse of well-engineered off-the-shelf components, the open-source release with unit tests and CLI demo, and the machine-checkable nature of the three-layer composition; these elements support reproducibility and practical adoption if the faithfulness assumption is adequately justified.

major comments (1)

[Semantics (abstract and § on semantics)] Semantics section (as described in the abstract and introduction): the manuscript asserts that the provided semantics is 'faithful to how a skill is consumed by an LLM-driven runtime' and that the three layers 'compose soundly' to cover the threat model modulo the refusal residual, but supplies no explicit encoding of LLM-side non-determinism (tool-call envelope selection, refusal interaction with the transaction buffer) nor any validation against observed LLM traces. This assumption is load-bearing for the transfer of the containment property to the actual runtime.

minor comments (2)

[Abstract / Introduction] The abstract refers to 'the parent paper' without a numbered citation; the introduction should include an explicit reference to the companion work when stating the threat model and biconditional criterion.
[Implementation / § on methods] The claim that the methods 'reuse existing well-engineered tools' is repeated; a single consolidated table listing each tool, the layer that uses it, and the precise guarantee it supplies would improve clarity.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the careful and constructive review. The single major comment identifies a genuine gap in the explicit treatment of LLM-side non-determinism within the semantics section. We address it below and will revise the manuscript accordingly.

read point-by-point responses

Referee: [Semantics (abstract and § on semantics)] Semantics section (as described in the abstract and introduction): the manuscript asserts that the provided semantics is 'faithful to how a skill is consumed by an LLM-driven runtime' and that the three layers 'compose soundly' to cover the threat model modulo the refusal residual, but supplies no explicit encoding of LLM-side non-determinism (tool-call envelope selection, refusal interaction with the transaction buffer) nor any validation against observed LLM traces. This assumption is load-bearing for the transfer of the containment property to the actual runtime.

Authors: We agree that the manuscript would be strengthened by an explicit encoding of the non-deterministic LLM-side choices. In the revised version we will add a dedicated subsection to the semantics section that defines (i) the non-deterministic selection of tool-call envelopes and (ii) the interaction of refusal decisions with the transaction buffer, both expressed as nondeterministic operators over the deterministic script-side transition relation. The three verification layers will then be shown to be sound with respect to this extended semantics. Regarding validation against observed LLM traces, the semantics is intentionally derived from the runtime biconditional of the parent paper rather than from new empirical data; we will add a short discussion clarifying this derivation and noting that direct trace validation lies outside the scope of the present formal-methods contribution. These changes will make the load-bearing faithfulness assumption fully explicit without altering the core technical claims. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

full rationale

The paper states a semantics for skill behaviour as an explicit modelling assumption ('faithful to how a skill is consumed by an LLM-driven runtime (a deterministic script-side reachable through a non-deterministic LLM-side)'), defines the verification problem over that semantics, and supplies three independent methods (abstract interpretation, refinement types, SMT model checking) that reuse off-the-shelf tools. The composition proof is claimed to cover the parent paper's threat model modulo one residual; the parent reference supplies the target biconditional and threat model but does not reduce the present methods or their soundness argument to a self-citation chain or to quantities defined by the inputs. No equations, fitted parameters, or ansatzes are shown that would make any 'prediction' or 'result' equivalent to its inputs by construction. The work is therefore self-contained against external benchmarks (Z3, Semgrep, etc.).

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no information is available on free parameters, axioms, or invented entities.

pith-pipeline@v0.9.1-grok · 5881 in / 1167 out tokens · 38902 ms · 2026-06-30T22:55:59.632133+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

56 extracted references · 1 linked inside Pith

[1]

Cedar policy language

Amazon Web Services. Cedar policy language. https://www.cedarpolicy.com/, 2024. Formally-verified policy language for runtime authorisation; paired with a Lean/Rust ver- ification effort

2024
[2]

AgentHarm: A benchmark for measuring harmfulness of LLM agents

Maksym Andriushchenko, Alexandra Souly, Mateusz Dziemian, Derek Duenas, Maxwell Lin, Justin Wang, Dan Hendrycks, Andy Bailey, Yinzhi Kang, Kellin Pelrine, Anton Albert, Fabien Roger, Tim Riedel, Alexandre Lacoste, Anastasiia Skripko, Yixin Wang, Jiaxin Ji, Christopher Berner, Mary Phuong, Hongseok Liu, Iain Murray, Jamie Hayes, Helen Toner, David Krueger,...

2025
[3]

Claude Skills: User-defined capabilities for Claude

Anthropic. Claude Skills: User-defined capabilities for Claude. https://www.anthropic. com/news/skills, 2024. Production skill artefact format with SKILL.md convention; human curation, no formal-property attestation. 25

2024
[4]

Modelcontextprotocol(MCP)specification

Anthropic. Modelcontextprotocol(MCP)specification. https://modelcontextprotocol.io/,
[5]

Open protocol for connecting AI assistants to tools, data sources, and skill-like artefacts
[6]

Andrew W. Appel. Foundational proof-carrying code. InProceedings of LICS, 2001

2001
[7]

Constitutional AI: Harmlessness from AI feedback

Yuntao Bai, Saurav Kadavath, Sandipan Kundu, Amanda Askell, Jackson Kernion, Andy Jones, AnnaChen, AnnaGoldie, AzaliaMirhoseini, CameronMcKinnon, CarolChen, CatherineOlsson, Christopher Olah, Danny Hernandez, Dawn Drain, et al. Constitutional AI: Harmlessness from AI feedback. InarXiv:2212.08073, 2022. Training-time alignment via principle-derived self-cr...

Pith/arXiv arXiv 2022
[8]

Model checking probabilistic systems.Handbook of Model Checking, Chapter 28, 2018

Christel Baier, Luca de Alfaro, Vojtěch Forejt, and Marta Kwiatkowska. Model checking probabilistic systems.Handbook of Model Checking, Chapter 28, 2018

2018
[9]

Introduction to runtime verification.Lectures on Runtime Verification, LNCS 10457, 2018

Ezio Bartocci, Yliès Falcone, Adrian Francalanza, and Giles Reger. Introduction to runtime verification.Lectures on Runtime Verification, LNCS 10457, 2018

2018
[10]

Clarke, and Yunshan Zhu

Armin Biere, Alessandro Cimatti, Edmund M. Clarke, and Yunshan Zhu. Symbolic model checking without BDDs.Proceedings of TACAS, 1999

1999
[11]

Wasmtime: A standalone WebAssembly runtime with capability-based sandboxing

Bytecode Alliance. Wasmtime: A standalone WebAssembly runtime with capability-based sandboxing. InBytecode Alliance technical report, 2024.https://wasmtime.dev/

2024
[12]

Cristian Cadar, Daniel Dunbar, and Dawson R. Engler. KLEE: Unassisted and automatic generation of high-coverage tests for complex systems programs.Proceedings of OSDI, 2008

2008
[13]

AgentPoison: Red-teaming LLM agents via poisoning memory or knowledge bases.arXiv preprint arXiv:2407.12784, 2025

Zhaorun Chen, Zhen Xiang, Chaowei Xiao, Dawn Song, and Bo Li. AgentPoison: Red-teaming LLM agents via poisoning memory or knowledge bases.arXiv preprint arXiv:2407.12784, 2025

arXiv 2025
[14]

Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints

Patrick Cousot and Radhia Cousot. Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints. InProceedings of POPL, 1977

1977
[15]

Cover and Joy A

Thomas M. Cover and Joy A. Thomas.Elements of Information Theory. Wiley-Interscience, 2nd edition, 2006

2006
[16]

Z3: An efficient SMT solver

Leonardo de Moura and Nikolaj Bjørner. Z3: An efficient SMT solver. InProceedings of TACAS, 2008

2008
[17]

Dorothy E. Denning. A lattice model of secure information flow. volume 19, pages 236–243, 1976

1976
[18]

Formalizing the LLM as a probabilistic transducer for verification under uncertain inputs

Yiyang Dong, Vicente Calvo Quintela, Yu Tang, Jiaxiong Liu, Yuhua Cao, and Yang Liu. Formalizing the LLM as a probabilistic transducer for verification under uncertain inputs. In Proceedings of the 1st International Workshop on Formal Methods for LLM-based Systems, 2025

2025
[19]

GenAI against humanity: Nefarious applications of generative artificial intelligence and large language models.Journal of Computational Social Science, 7:549–569, 2024

Emilio Ferrara. GenAI against humanity: Nefarious applications of generative artificial intelligence and large language models.Journal of Computational Social Science, 7:549–569, 2024

2024
[20]

CodeQL: Variant analysis for software security, 2025.https://codeql.github.com/

GitHub. CodeQL: Variant analysis for software security, 2025.https://codeql.github.com/. 26

2025
[21]

DART: Directed automated random testing

Patrice Godefroid, Nils Klarlund, and Koushik Sen. DART: Directed automated random testing. Proceedings of PLDI, 2005

2005
[22]

Gary A. Kildall. A unified approach to global program optimization. InProceedings of POPL, 1973

1973
[23]

PRISM4.0: Verificationofprobabilistic real-time systems

MartaKwiatkowska, GethinNorman, andDavidParker. PRISM4.0: Verificationofprobabilistic real-time systems. InProceedings of CAV, 2011

2011
[24]

LangChain: Build context-aware reasoning applications

LangChain, Inc. LangChain: Build context-aware reasoning applications. https://www. langchain.com/, 2024. Agent-orchestration framework with tool integration; no static-analysis or proof layer

2024
[25]

Formal verification of a realistic compiler.Communications of the ACM, 52(7): 107–115, 2009

Xavier Leroy. Formal verification of a realistic compiler.Communications of the ACM, 52(7): 107–115, 2009

2009
[26]

Sigstore: A new standard for signing, verifying, and protecting software

Linux Foundation, Sigstore Project. Sigstore: A new standard for signing, verifying, and protecting software. https://www.sigstore.dev/, 2024. Keyless signing infrastructure for software artefacts; attests provenance, not behaviour

2024
[27]

Type inference for refinements and effects.Proceedings of POPL, 2015

Luísa Lourenço and Luís Caires. Type inference for refinements and effects.Proceedings of POPL, 2015

2015
[28]

Lucassen and David K

John M. Lucassen and David K. Gifford. Polymorphic effect systems.Proceedings of POPL, 1988

1988
[29]

enclawed: A configurable, sector-neutral hardening framework for personal-class ai assistant gateways, 2026

Alfredo Metere. enclawed: A configurable, sector-neutral hardening framework for personal-class ai assistant gateways, 2026. Open-source reference implementation. https://github.com/metereconsulting/enclawed

2026
[30]

Skills as verifiable artifacts: A trust schema and a biconditional correctness criterion for human-in-the-loop agent runtimes, 2026

Alfredo Metere. Skills as verifiable artifacts: A trust schema and a biconditional correctness criterion for human-in-the-loop agent runtimes, 2026. URLhttps://arxiv.org/pdf/2605. 00424v1. Companion paper

2026
[31]

enclawed project page.https://www.enclawed.com/, 2026

Metere Consulting, LLC. enclawed project page.https://www.enclawed.com/, 2026. Docu- mentation, downloads, and the open-source distribution of the enclawed framework

2026
[32]

Pyright: Static type checker for Python, 2025.https://github.com/microsoft/ pyright

Microsoft. Pyright: Static type checker for Python, 2025.https://github.com/microsoft/ pyright

2025
[33]

AutoGen: Multi-agent conversation framework

Microsoft Research. AutoGen: Multi-agent conversation framework. https://microsoft. github.io/autogen/, 2024. Multi-agent orchestration; tool calls go through agent conversa- tions, no formal-property attestation

2024
[34]

Miller.Robust composition: Towards a unified approach to access control and concur- rency control

Mark S. Miller.Robust composition: Towards a unified approach to access control and concur- rency control. PhD thesis, Johns Hopkins University, 2006

2006
[35]

ATLAS: Adversarial threat landscape for ai systems, 2024

MITRE Corporation. ATLAS: Adversarial threat landscape for ai systems, 2024. https: //atlas.mitre.org/

2024
[36]

George C. Necula. Proof-carrying code. InProceedings of POPL, 1997. 27

1997
[37]

Supply-chain levels for software artefacts (SLSA).https: //slsa.dev/, 2024

Open Source Security Foundation. Supply-chain levels for software artefacts (SLSA).https: //slsa.dev/, 2024. Tiered supply-chain integrity framework (Levels 0–3); a build-process attestation, not a behavioural one

2024
[38]

Function calling and tool use

OpenAI. Function calling and tool use. https://platform.openai.com/docs/guides/ function-calling, 2024. JSON-Schema-typed function declarations for LLM tool invocation; the closest production-deployed peer to skill manifests

2024
[39]

GPTs and the GPT store

OpenAI. GPTs and the GPT store. https://openai.com/blog/introducing-gpts, 2024. Custom-GPT marketplace; review-based governance, no formal containment proof

2024
[40]

LLM AI cybersecurity & governance checklist (top 10)

OWASP Foundation. LLM AI cybersecurity & governance checklist (top 10). OWASP, 2024

2024
[41]

Semgrep: Lightweight static analysis for many languages, 2025.https://semgrep.dev/

r2c. Semgrep: Lightweight static analysis for many languages, 2025.https://semgrep.dev/

2025
[42]

Springer Series in Synergetics

Hannes Risken.The Fokker–Planck Equation: Methods of Solution and Applications. Springer Series in Synergetics. Springer, 2nd edition, 1996

1996
[43]

Alexander Robey, Eric Wong, Hamed Hassani, and George J. Pappas. SmoothLLM: Defending large language models against jailbreaking attacks. InProceedings of NeurIPS, 2023

2023
[44]

Rondon, Ming Kawaguchi, and Ranjit Jhala

Patrick M. Rondon, Ming Kawaguchi, and Ranjit Jhala. Liquid types. InProceedings of PLDI, 2008

2008
[45]

Andrei Sabelfeld and Andrew C. Myers. Language-based information-flow security.IEEE Journal on Selected Areas in Communications, 21(1):5–19, 2003

2003
[46]

The update framework (TUF) specification, 2024

Justin Samuel and Justin Cappos. The update framework (TUF) specification, 2024. Specifica- tion version 1.0.x

2024
[47]

Open policy agent (OPA): Policy as code

Tim Sandall. Open policy agent (OPA): Policy as code. InProceedings of KubeCon / Cloud- Native Computing Foundation, 2018.https://www.openpolicyagent.org/

2018
[48]

Claude E. Shannon. A mathematical theory of communication.Bell System Technical Journal, 27(3):379–423, 623–656, 1948

1948
[49]

Reasoning about a machine with local capabilities

Lau Skorstengaard, Dominique Devriese, and Lars Birkedal. Reasoning about a machine with local capabilities. InProceedings of POPL, 2018

2018
[50]

Strogatz.Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering

Steven H. Strogatz.Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering. Westview Press, 2nd edition, 2014

2014
[51]

Secure distributed programming with value-dependent types

Nikhil Swamy, Juan Chen, Cédric Fournet, Pierre-Yves Strub, Karthikeyan Bhargavan, and Jean Yang. Secure distributed programming with value-dependent types. InProceedings of ICFP, 2011

2011
[52]

The object-capability discipline, 2025.http://erights.org/ elib/capability/ode/index.html

The Object-Capability Wiki. The object-capability discipline, 2025.http://erights.org/ elib/capability/ode/index.html

2025
[53]

in-toto: Providing farm-to-table guarantees for bits and bytes

Santiago Torres-Arias, Hammad Afzali, Trishank Karthik Kuppusamy, Reza Curtmola, and Justin Cappos. in-toto: Providing farm-to-table guarantees for bits and bytes. https:// in-toto.io/, 2019. Provenance-tracking framework underlying SLSA. 28

2019
[54]

Seidel, Ranjit Jhala, Dimitrios Vytiniotis, and Simon Peyton Jones

Niki Vazou, Eric L. Seidel, Ranjit Jhala, Dimitrios Vytiniotis, and Simon Peyton Jones. Refinement types for Haskell.Proceedings of ICFP, 2014

2014
[55]

A sound type system for secure flow analysis.Journal of Computer Security, 4(2–3):167–187, 1996

Dennis Volpano, Geoffrey Smith, and Cynthia Irvine. A sound type system for secure flow analysis.Journal of Computer Security, 4(2–3):167–187, 1996

1996
[56]

The WebAssembly system interface (WASI), 2024.https: //wasi.dev/

WebAssembly Community Group. The WebAssembly system interface (WASI), 2024.https: //wasi.dev/. 29

2024

[1] [1]

Cedar policy language

Amazon Web Services. Cedar policy language. https://www.cedarpolicy.com/, 2024. Formally-verified policy language for runtime authorisation; paired with a Lean/Rust ver- ification effort

2024

[2] [2]

AgentHarm: A benchmark for measuring harmfulness of LLM agents

Maksym Andriushchenko, Alexandra Souly, Mateusz Dziemian, Derek Duenas, Maxwell Lin, Justin Wang, Dan Hendrycks, Andy Bailey, Yinzhi Kang, Kellin Pelrine, Anton Albert, Fabien Roger, Tim Riedel, Alexandre Lacoste, Anastasiia Skripko, Yixin Wang, Jiaxin Ji, Christopher Berner, Mary Phuong, Hongseok Liu, Iain Murray, Jamie Hayes, Helen Toner, David Krueger,...

2025

[3] [3]

Claude Skills: User-defined capabilities for Claude

Anthropic. Claude Skills: User-defined capabilities for Claude. https://www.anthropic. com/news/skills, 2024. Production skill artefact format with SKILL.md convention; human curation, no formal-property attestation. 25

2024

[4] [4]

Modelcontextprotocol(MCP)specification

Anthropic. Modelcontextprotocol(MCP)specification. https://modelcontextprotocol.io/,

[5] [5]

Open protocol for connecting AI assistants to tools, data sources, and skill-like artefacts

[6] [6]

Andrew W. Appel. Foundational proof-carrying code. InProceedings of LICS, 2001

2001

[7] [7]

Constitutional AI: Harmlessness from AI feedback

Yuntao Bai, Saurav Kadavath, Sandipan Kundu, Amanda Askell, Jackson Kernion, Andy Jones, AnnaChen, AnnaGoldie, AzaliaMirhoseini, CameronMcKinnon, CarolChen, CatherineOlsson, Christopher Olah, Danny Hernandez, Dawn Drain, et al. Constitutional AI: Harmlessness from AI feedback. InarXiv:2212.08073, 2022. Training-time alignment via principle-derived self-cr...

Pith/arXiv arXiv 2022

[8] [8]

Model checking probabilistic systems.Handbook of Model Checking, Chapter 28, 2018

Christel Baier, Luca de Alfaro, Vojtěch Forejt, and Marta Kwiatkowska. Model checking probabilistic systems.Handbook of Model Checking, Chapter 28, 2018

2018

[9] [9]

Introduction to runtime verification.Lectures on Runtime Verification, LNCS 10457, 2018

Ezio Bartocci, Yliès Falcone, Adrian Francalanza, and Giles Reger. Introduction to runtime verification.Lectures on Runtime Verification, LNCS 10457, 2018

2018

[10] [10]

Clarke, and Yunshan Zhu

Armin Biere, Alessandro Cimatti, Edmund M. Clarke, and Yunshan Zhu. Symbolic model checking without BDDs.Proceedings of TACAS, 1999

1999

[11] [11]

Wasmtime: A standalone WebAssembly runtime with capability-based sandboxing

Bytecode Alliance. Wasmtime: A standalone WebAssembly runtime with capability-based sandboxing. InBytecode Alliance technical report, 2024.https://wasmtime.dev/

2024

[12] [12]

Cristian Cadar, Daniel Dunbar, and Dawson R. Engler. KLEE: Unassisted and automatic generation of high-coverage tests for complex systems programs.Proceedings of OSDI, 2008

2008

[13] [13]

AgentPoison: Red-teaming LLM agents via poisoning memory or knowledge bases.arXiv preprint arXiv:2407.12784, 2025

Zhaorun Chen, Zhen Xiang, Chaowei Xiao, Dawn Song, and Bo Li. AgentPoison: Red-teaming LLM agents via poisoning memory or knowledge bases.arXiv preprint arXiv:2407.12784, 2025

arXiv 2025

[14] [14]

Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints

Patrick Cousot and Radhia Cousot. Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints. InProceedings of POPL, 1977

1977

[15] [15]

Cover and Joy A

Thomas M. Cover and Joy A. Thomas.Elements of Information Theory. Wiley-Interscience, 2nd edition, 2006

2006

[16] [16]

Z3: An efficient SMT solver

Leonardo de Moura and Nikolaj Bjørner. Z3: An efficient SMT solver. InProceedings of TACAS, 2008

2008

[17] [17]

Dorothy E. Denning. A lattice model of secure information flow. volume 19, pages 236–243, 1976

1976

[18] [18]

Formalizing the LLM as a probabilistic transducer for verification under uncertain inputs

Yiyang Dong, Vicente Calvo Quintela, Yu Tang, Jiaxiong Liu, Yuhua Cao, and Yang Liu. Formalizing the LLM as a probabilistic transducer for verification under uncertain inputs. In Proceedings of the 1st International Workshop on Formal Methods for LLM-based Systems, 2025

2025

[19] [19]

GenAI against humanity: Nefarious applications of generative artificial intelligence and large language models.Journal of Computational Social Science, 7:549–569, 2024

Emilio Ferrara. GenAI against humanity: Nefarious applications of generative artificial intelligence and large language models.Journal of Computational Social Science, 7:549–569, 2024

2024

[20] [20]

CodeQL: Variant analysis for software security, 2025.https://codeql.github.com/

GitHub. CodeQL: Variant analysis for software security, 2025.https://codeql.github.com/. 26

2025

[21] [21]

DART: Directed automated random testing

Patrice Godefroid, Nils Klarlund, and Koushik Sen. DART: Directed automated random testing. Proceedings of PLDI, 2005

2005

[22] [22]

Gary A. Kildall. A unified approach to global program optimization. InProceedings of POPL, 1973

1973

[23] [23]

PRISM4.0: Verificationofprobabilistic real-time systems

MartaKwiatkowska, GethinNorman, andDavidParker. PRISM4.0: Verificationofprobabilistic real-time systems. InProceedings of CAV, 2011

2011

[24] [24]

LangChain: Build context-aware reasoning applications

LangChain, Inc. LangChain: Build context-aware reasoning applications. https://www. langchain.com/, 2024. Agent-orchestration framework with tool integration; no static-analysis or proof layer

2024

[25] [25]

Formal verification of a realistic compiler.Communications of the ACM, 52(7): 107–115, 2009

Xavier Leroy. Formal verification of a realistic compiler.Communications of the ACM, 52(7): 107–115, 2009

2009

[26] [26]

Sigstore: A new standard for signing, verifying, and protecting software

Linux Foundation, Sigstore Project. Sigstore: A new standard for signing, verifying, and protecting software. https://www.sigstore.dev/, 2024. Keyless signing infrastructure for software artefacts; attests provenance, not behaviour

2024

[27] [27]

Type inference for refinements and effects.Proceedings of POPL, 2015

Luísa Lourenço and Luís Caires. Type inference for refinements and effects.Proceedings of POPL, 2015

2015

[28] [28]

Lucassen and David K

John M. Lucassen and David K. Gifford. Polymorphic effect systems.Proceedings of POPL, 1988

1988

[29] [29]

enclawed: A configurable, sector-neutral hardening framework for personal-class ai assistant gateways, 2026

Alfredo Metere. enclawed: A configurable, sector-neutral hardening framework for personal-class ai assistant gateways, 2026. Open-source reference implementation. https://github.com/metereconsulting/enclawed

2026

[30] [30]

Skills as verifiable artifacts: A trust schema and a biconditional correctness criterion for human-in-the-loop agent runtimes, 2026

Alfredo Metere. Skills as verifiable artifacts: A trust schema and a biconditional correctness criterion for human-in-the-loop agent runtimes, 2026. URLhttps://arxiv.org/pdf/2605. 00424v1. Companion paper

2026

[31] [31]

enclawed project page.https://www.enclawed.com/, 2026

Metere Consulting, LLC. enclawed project page.https://www.enclawed.com/, 2026. Docu- mentation, downloads, and the open-source distribution of the enclawed framework

2026

[32] [32]

Pyright: Static type checker for Python, 2025.https://github.com/microsoft/ pyright

Microsoft. Pyright: Static type checker for Python, 2025.https://github.com/microsoft/ pyright

2025

[33] [33]

AutoGen: Multi-agent conversation framework

Microsoft Research. AutoGen: Multi-agent conversation framework. https://microsoft. github.io/autogen/, 2024. Multi-agent orchestration; tool calls go through agent conversa- tions, no formal-property attestation

2024

[34] [34]

Miller.Robust composition: Towards a unified approach to access control and concur- rency control

Mark S. Miller.Robust composition: Towards a unified approach to access control and concur- rency control. PhD thesis, Johns Hopkins University, 2006

2006

[35] [35]

ATLAS: Adversarial threat landscape for ai systems, 2024

MITRE Corporation. ATLAS: Adversarial threat landscape for ai systems, 2024. https: //atlas.mitre.org/

2024

[36] [36]

George C. Necula. Proof-carrying code. InProceedings of POPL, 1997. 27

1997

[37] [37]

Supply-chain levels for software artefacts (SLSA).https: //slsa.dev/, 2024

Open Source Security Foundation. Supply-chain levels for software artefacts (SLSA).https: //slsa.dev/, 2024. Tiered supply-chain integrity framework (Levels 0–3); a build-process attestation, not a behavioural one

2024

[38] [38]

Function calling and tool use

OpenAI. Function calling and tool use. https://platform.openai.com/docs/guides/ function-calling, 2024. JSON-Schema-typed function declarations for LLM tool invocation; the closest production-deployed peer to skill manifests

2024

[39] [39]

GPTs and the GPT store

OpenAI. GPTs and the GPT store. https://openai.com/blog/introducing-gpts, 2024. Custom-GPT marketplace; review-based governance, no formal containment proof

2024

[40] [40]

LLM AI cybersecurity & governance checklist (top 10)

OWASP Foundation. LLM AI cybersecurity & governance checklist (top 10). OWASP, 2024

2024

[41] [41]

Semgrep: Lightweight static analysis for many languages, 2025.https://semgrep.dev/

r2c. Semgrep: Lightweight static analysis for many languages, 2025.https://semgrep.dev/

2025

[42] [42]

Springer Series in Synergetics

Hannes Risken.The Fokker–Planck Equation: Methods of Solution and Applications. Springer Series in Synergetics. Springer, 2nd edition, 1996

1996

[43] [43]

Alexander Robey, Eric Wong, Hamed Hassani, and George J. Pappas. SmoothLLM: Defending large language models against jailbreaking attacks. InProceedings of NeurIPS, 2023

2023

[44] [44]

Rondon, Ming Kawaguchi, and Ranjit Jhala

Patrick M. Rondon, Ming Kawaguchi, and Ranjit Jhala. Liquid types. InProceedings of PLDI, 2008

2008

[45] [45]

Andrei Sabelfeld and Andrew C. Myers. Language-based information-flow security.IEEE Journal on Selected Areas in Communications, 21(1):5–19, 2003

2003

[46] [46]

The update framework (TUF) specification, 2024

Justin Samuel and Justin Cappos. The update framework (TUF) specification, 2024. Specifica- tion version 1.0.x

2024

[47] [47]

Open policy agent (OPA): Policy as code

Tim Sandall. Open policy agent (OPA): Policy as code. InProceedings of KubeCon / Cloud- Native Computing Foundation, 2018.https://www.openpolicyagent.org/

2018

[48] [48]

Claude E. Shannon. A mathematical theory of communication.Bell System Technical Journal, 27(3):379–423, 623–656, 1948

1948

[49] [49]

Reasoning about a machine with local capabilities

Lau Skorstengaard, Dominique Devriese, and Lars Birkedal. Reasoning about a machine with local capabilities. InProceedings of POPL, 2018

2018

[50] [50]

Strogatz.Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering

Steven H. Strogatz.Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering. Westview Press, 2nd edition, 2014

2014

[51] [51]

Secure distributed programming with value-dependent types

Nikhil Swamy, Juan Chen, Cédric Fournet, Pierre-Yves Strub, Karthikeyan Bhargavan, and Jean Yang. Secure distributed programming with value-dependent types. InProceedings of ICFP, 2011

2011

[52] [52]

The object-capability discipline, 2025.http://erights.org/ elib/capability/ode/index.html

The Object-Capability Wiki. The object-capability discipline, 2025.http://erights.org/ elib/capability/ode/index.html

2025

[53] [53]

in-toto: Providing farm-to-table guarantees for bits and bytes

Santiago Torres-Arias, Hammad Afzali, Trishank Karthik Kuppusamy, Reza Curtmola, and Justin Cappos. in-toto: Providing farm-to-table guarantees for bits and bytes. https:// in-toto.io/, 2019. Provenance-tracking framework underlying SLSA. 28

2019

[54] [54]

Seidel, Ranjit Jhala, Dimitrios Vytiniotis, and Simon Peyton Jones

Niki Vazou, Eric L. Seidel, Ranjit Jhala, Dimitrios Vytiniotis, and Simon Peyton Jones. Refinement types for Haskell.Proceedings of ICFP, 2014

2014

[55] [55]

A sound type system for secure flow analysis.Journal of Computer Security, 4(2–3):167–187, 1996

Dennis Volpano, Geoffrey Smith, and Cynthia Irvine. A sound type system for secure flow analysis.Journal of Computer Security, 4(2–3):167–187, 1996

1996

[56] [56]

The WebAssembly system interface (WASI), 2024.https: //wasi.dev/

WebAssembly Community Group. The WebAssembly system interface (WASI), 2024.https: //wasi.dev/. 29

2024