LLM2Ltac mines symbolic tactics from 11,725 Coq theorems using LLMs and integrates them into CoqHammer, improving proof rates by 23.87% on 6,199 theorems from four large verification projects.
seL4: formal verification of an OS kernel,
7 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 2polarities
background 2representative citing papers
AutoSOUP automates component-level memory-safety verification by generating Safety-Oriented Unit Proofs via three techniques and a hybrid LLM-plus-program-synthesis architecture called LLM-As-Function-Call.
WybeCoder interleaves code generation, invariant synthesis, and proof construction to produce verified imperative programs, solving 74% of Verina tasks and 62% of Clever tasks while surpassing prior results.
AutoRocq is an LLM agent that learns proofs on-the-fly by collaborating with the Rocq prover to verify programs on SV-COMP benchmarks and Linux kernel modules.
Human-Certified Module Repositories (HCMRs) are proposed as a new architectural model blending human oversight with automated analysis to certify reusable software modules for safe assembly by humans and AI agents.
citing papers explorer
-
A Learning Method for Symbolic Systems Using Large Language Models
LLM2Ltac mines symbolic tactics from 11,725 Coq theorems using LLMs and integrates them into CoqHammer, improving proof rates by 23.87% on 6,199 theorems from four large verification projects.
-
AutoSOUP: Safety-Oriented Unit Proof Generation for Component-level Memory-Safety Verification
AutoSOUP automates component-level memory-safety verification by generating Safety-Oriented Unit Proofs via three techniques and a hybrid LLM-plus-program-synthesis architecture called LLM-As-Function-Call.
-
WybeCoder: Verified Imperative Code Generation
WybeCoder interleaves code generation, invariant synthesis, and proof construction to produce verified imperative programs, solving 74% of Verina tasks and 62% of Clever tasks while surpassing prior results.
-
Agentic Verification of Software Systems
AutoRocq is an LLM agent that learns proofs on-the-fly by collaborating with the Rocq prover to verify programs on SV-COMP benchmarks and Linux kernel modules.
-
Human-Certified Module Repositories for the AI Age
Human-Certified Module Repositories (HCMRs) are proposed as a new architectural model blending human oversight with automated analysis to certify reusable software modules for safe assembly by humans and AI agents.
- Algebraic Semantics of Governed Execution: Monoidal Categories, Effect Algebras, and Coterminous Boundaries
- Effect-Transparent Governance for AI Workflow Architectures: Semantic Preservation, Expressive Minimality, and Decidability Boundaries