KVerus: Scalable and Resilient Formal Verification Proof Generation for Rust Code

Lin Huang; Minghua Wang; Tao Wei; Xinyi Wan; Yanhao Wang; Yuwei Liu

arxiv: 2605.03822 · v1 · submitted 2026-05-05 · 💻 cs.SE · cs.CR

KVerus: Scalable and Resilient Formal Verification Proof Generation for Rust Code

Yuwei Liu , Xinyi Wan , Yanhao Wang , Minghua Wang , Lin Huang , Tao Wei This is my paper

Pith reviewed 2026-05-07 15:47 UTC · model grok-4.3

classification 💻 cs.SE cs.CR

keywords formal verificationRustproof generationdynamic knowledge baseself-adaptive verificationcross-file dependencieslemma indexingproof repair

0 comments

The pith

KVerus builds a dynamic knowledge base to generate and repair formal proofs for Rust code across changing dependencies and toolchains.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents KVerus as a retrieval-augmented system for Verus-based formal verification of Rust programs. It identifies the Semantic-Structural Gap as the core barrier that prevents standard language models from producing reliable proofs in real evolving codebases. KVerus counters this by automatically constructing and updating a knowledge base that combines program dependency analysis, semantic indexing of lemmas, and error-driven refinement loops. This lets the system synthesize proofs for cross-file dependencies and fix them when the code or verification environment changes. The approach is evaluated on both isolated functions and full repository modules, with additional results from applying it to an operating-system kernel.

Core claim

KVerus constructs a dynamic knowledge base of code metadata, lemma semantics, and toolchain specifics. By combining dependency-aware program analysis, semantic lemma indexing, and error-driven self-refinement, it navigates intricate cross-file dependencies to synthesize proofs and automatically repair proofs when faced with common evolutionary changes.

What carries the argument

Dynamic knowledge base built from dependency analysis, semantic lemma indexing, and error-driven self-refinement, which supports self-adaptive proof synthesis and repair across module boundaries.

If this is right

Verifies 80.2 percent of tasks on single-file benchmarks, compared with 56.9 percent for the prior AutoVerus system.
Reaches 51.0 percent success on repository-level benchmarks containing cross-file dependencies, versus 4.5 percent for multi-round prompting.
Produces upstream-accepted proofs for 23 previously unverified functions representing 21 percent of the proof code in a kernel memory-management module.
Degrades less than prior methods when the verification toolchain receives breaking updates.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same dependency-aware indexing and repair loop could be reused with other formal verifiers that expose similar error messages and lemma libraries.
Teams maintaining verified libraries would spend less time on proof repair after routine refactors if the knowledge base is kept current.
Extending the base to track performance or security invariants alongside functional proofs could support combined assurance arguments in one workflow.

Load-bearing premise

The assumption that analysis of dependencies and past errors will keep producing a knowledge base sufficient to bridge semantic code patterns with rigid structural proof requirements even as modules and toolchains continue to evolve.

What would settle it

A sequence of cross-module code changes that introduce new structural dependencies not captured by the existing analysis or indexing, after which KVerus repeatedly fails to produce or repair correct proofs without manual lemma additions.

Figures

Figures reproduced from arXiv: 2605.03822 by Lin Huang, Minghua Wang, Tao Wei, Xinyi Wan, Yanhao Wang, Yuwei Liu.

**Figure 1.** Figure 1: The workflow of KVerus. 3.1 Preprocessor: Code and Lemma Preprocessing The Preprocessor module analyzes the source code of the target verification library to construct a metadata dependency graph and to index lemma functions for subsequent retrieval and proof synthesis. Metadata Dependency Graph. We build metadata dependency graph as a typed directed graph over Verus/Rust language objects. Each node is a … view at source ↗

**Figure 2.** Figure 2: Error types after the refinement view at source ↗

read the original abstract

Formal verification provides the highest assurance of software correctness and security, but its application to large-scale, evolving systems remains a major challenge. While large language models (LLMs) have shown promise in automating proof generation, they often fail in real-world settings due to their inability to handle complex cross-module dependencies or changes in the codebase or the verification toolchain. We identify the fundamental problem as the Semantic-Structural Gap: LLMs operate on semantic code patterns, whereas formal verification is governed by rigid structural dependencies, a disconnect that leads to brittle, unsustainable proofs. To bridge this gap, we propose a new paradigm of self-adaptive verification and present KVerus, a retrieval-augmented system for Verus-based Rust verification that can adapt to an evolving software environment. KVerus constructs a dynamic knowledge base of code metadata, lemma semantics, and toolchain specifics. By combining dependency-aware program analysis, semantic lemma indexing, and error-driven self-refinement, it can navigate intricate cross-file dependencies to synthesize proofs and automatically repair proofs when faced with common evolutionary changes. Across three single-file benchmarks, KVerus verifies 80.2% of tasks, outperforming the state-of-the-art AutoVerus (56.9%) and degrades less than AutoVerus under breaking Verus updates. On three repository-level benchmarks with cross-file dependencies, KVerus achieves a 51.0% success rate, compared to 4.5% for a multi-round prompting baseline. Finally, on the Asterinas Rust OS kernel, KVerus produces upstream-accepted proofs that verify 23 previously unverified functions (21.0% of proof code) in the memory-management module. KVerus represents a significant step towards making formal verification a scalable and sustainable practice for modern, security-critical software.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

KVerus shows concrete gains in Verus proof generation by linking dependency analysis to LLM retrieval and error-driven repair, with solid benchmark numbers and a real kernel contribution.

read the letter

KVerus improves automated proof generation for Rust under Verus by building a dynamic knowledge base from dependency analysis, semantic lemma indexing, and error feedback to handle code changes and cross-file dependencies. The main advance is the self-adaptive loop that couples these pieces rather than treating retrieval or repair in isolation. On three single-file benchmarks it reaches 80.2% success versus 56.9% for AutoVerus, and on three repository-level tasks it hits 51% where a multi-round prompting baseline gets 4.5%. It also produced 23 upstream-accepted proofs covering 21% of the proof code in Asterinas memory management, and it degrades less than the baseline when Verus itself updates. The paper supplies implementation details and external acceptance as evidence, which strengthens the empirical case. The evaluations look reproducible enough on the reported tasks, with no obvious internal contradictions in how the results are presented. The framing of a Semantic-Structural Gap is reasonable for explaining why plain LLMs struggle, though it is not a formally defined quantity. Soft spots are minor but worth noting: success still varies with the underlying LLM, and the benchmarks, while named, would benefit from more explicit discussion of how representative they are of arbitrary evolutionary changes. The Asterinas result is a strong practical signal, but it covers one module, so broader claims about arbitrary large codebases rest on the assumption that the knowledge-base approach scales without heavy manual lemma work. This paper is aimed at researchers and tool builders working on LLM-assisted formal verification for systems code. Readers focused on Rust, Verus, or scalable proof automation will get direct value from the numbers and the repair mechanism. It has enough grounded evaluation and a clear problem statement to deserve serious referee time rather than desk rejection.

Referee Report

0 major / 3 minor

Summary. The manuscript introduces KVerus, a retrieval-augmented system for Verus-based Rust verification. It constructs a dynamic knowledge base via dependency-aware program analysis, semantic lemma indexing, and error-driven self-refinement to address the Semantic-Structural Gap between LLM semantic patterns and rigid formal dependencies. The system is evaluated on three single-file benchmarks (80.2% success vs. 56.9% for AutoVerus), three repository-level benchmarks with cross-file dependencies (51.0% vs. 4.5% for multi-round prompting), and a case study on the Asterinas kernel where it produces 23 upstream-accepted proofs covering 21% of the memory-management module's proof code. It also shows reduced degradation under Verus toolchain updates.

Significance. If the reported results hold, KVerus represents a meaningful advance toward scalable, resilient formal verification for evolving Rust codebases. The concrete empirical gains over baselines, combined with the real-world contribution of upstream-accepted proofs to an OS kernel, provide tangible evidence of practical utility. The emphasis on automatic adaptation to cross-module changes and toolchain evolution is a notable strength, and the paper supplies implementation details and degradation results that support reproducibility of the core claims.

minor comments (3)

The abstract and evaluation sections refer to 'three single-file benchmarks' and 'three repository-level benchmarks' without naming them or providing a high-level summary table of task counts and characteristics; adding this would improve immediate readability and allow readers to assess scope more quickly.
The introduction of the 'Semantic-Structural Gap' is presented as a fundamental problem but lacks a concise formal characterization or pointer to related concepts in the literature; a short definitional paragraph early in the paper would clarify its scope.
Figure captions and axis labels in the benchmark comparison plots could be expanded to explicitly state the number of tasks per benchmark and the exact baseline configurations used, reducing the need to cross-reference the text.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive and constructive review, which accurately captures the core contributions of KVerus in addressing the Semantic-Structural Gap for scalable Rust verification. The recommendation for minor revision is appreciated, and we note the emphasis on empirical gains, real-world impact on the Asterinas kernel, and reproducibility. Since the report lists no specific major comments, we have no point-by-point rebuttals to provide. We will incorporate minor editorial improvements in the revised version to further enhance clarity and implementation details.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper's central claims consist of empirical success rates (80.2% on single-file benchmarks, 51.0% on repository-level tasks, and 23 upstream-accepted proofs in Asterinas) measured against external baselines and third-party code. No equations, fitted parameters, self-definitional constructs, or load-bearing self-citations appear in the derivation; the Semantic-Structural Gap is introduced as a descriptive framing rather than a formal premise that the results are forced to satisfy. Implementation details for dependency analysis, semantic indexing, and error-driven refinement are presented as independent engineering choices whose effectiveness is validated externally, leaving the reported outcomes non-tautological.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The approach rests on the premise that LLMs guided by structural metadata can produce correct proofs and that the identified Semantic-Structural Gap is the dominant failure mode; no free parameters or new physical entities are introduced.

axioms (1)

domain assumption LLMs operating on semantic patterns can be made to respect rigid structural dependencies of formal verification when supplied with dependency metadata and error feedback
This is the central bridging claim stated in the abstract as the solution to the Semantic-Structural Gap.

invented entities (1)

Semantic-Structural Gap no independent evidence
purpose: Conceptual explanation for why standard LLM prompting fails on formal verification tasks involving cross-module dependencies and toolchain evolution
Introduced in the abstract as the fundamental problem the system is designed to solve; no independent empirical test of the gap itself is described.

pith-pipeline@v0.9.0 · 5643 in / 1602 out tokens · 47653 ms · 2026-05-07T15:47:44.743899+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

41 extracted references · 41 canonical work pages

[1]

https://model-checking.github.io/ verify-rust-std

Verify Rust Standard Library Effort, 2024. https://model-checking.github.io/ verify-rust-std

work page 2024
[2]

Alphaverus: Bootstrapping formally verified code generation through self-improving translation and tree- finement

Pranjal Aggarwal, Bryan Parno, and Sean Welleck. Alphaverus: Bootstrapping formally verified code generation through self-improving translation and tree- finement. InForty-second International Conference on Machine Learning (ICML 2025), 2025

work page 2025
[3]

Claude Sonnet 4, 2025

Anthropic. Claude Sonnet 4, 2025. https://www.anthropic.com/news/claude-4

work page 2025
[4]

The prusti project: Formal verification for rust

Vytautas Astrauskas, Aurel Bíl`y, Jonáš Fiala, Zachary Grannan, Christoph Math- eja, Peter Müller, Federico Poli, and Alexander J Summers. The prusti project: Formal verification for rust. InNASA Formal Methods Symposium, pages 88–108. Springer, 2022

work page 2022
[5]

Springer Science & Business Media, 2013

Yves Bertot and Pierre Castéran.Interactive theorem proving and program de- velopment: Coq’Art: the calculus of inductive constructions. Springer Science & Business Media, 2013

work page 2013
[6]

LangChain, October 2022

Harrison Chase. LangChain, October 2022. https://github.com/langchain-ai/ langchain

work page 2022
[7]

Automated proof generation for rust code via self-evolution

Tianyu Chen, Shuai Lu, Shan Lu, Yeyun Gong, Chenyuan Yang, Xuheng Li, Md Rakib Hossain Misu, Hao Yu, Nan Duan, Peng Cheng, et al. Automated proof generation for rust code via self-evolution. InThe Thirteenth International Conference on Learning Representations (ICLR 25), 2025

work page 2025
[8]

Atmosphere: Practical verified kernels with rust and verus

Xiangdong Chen, Zhaofeng Li, Jerry Zhang, Vikram Narayanan, and Anton Burtsev. Atmosphere: Practical verified kernels with rust and verus. SOSP ’25, page 752–767, New York, NY, USA, 2025. Association for Computing Machinery

work page 2025
[9]

Z3: An efficient smt solver

Leonardo De Moura and Nikolaj Bjørner. Z3: An efficient smt solver. InInter- national conference on Tools and Algorithms for the Construction and Analysis of Systems, pages 337–340. Springer, 2008

work page 2008
[10]

Creusot: A foundry for the deductive verification of rust programs

Xavier Denis, Jacques-Henri Jourdan, and Claude Marché. Creusot: A foundry for the deductive verification of rust programs. InInternational Conference on Formal Engineering Methods, pages 90–105. Springer, 2022

work page 2022
[11]

CertiKOS: An extensible architecture for building certified concurrent OS kernels

Ronghui Gu, Zhong Shao, Hao Chen, Xiongnan Newman Wu, Jieung Kim, Vilhelm Sjöberg, and David Costanzo. CertiKOS: An extensible architecture for building certified concurrent OS kernels. In12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), pages 653–669, 2016

work page 2016
[12]

Leaf: Modularity for temporary sharing in separation logic.Proceedings of the ACM on Programming Languages, 7(OOPSLA2):31–58, 2023

Travis Hance, Jon Howell, Oded Padon, and Bryan Parno. Leaf: Modularity for temporary sharing in separation logic.Proceedings of the ACM on Programming Languages, 7(OOPSLA2):31–58, 2023

work page 2023
[13]

Shard- ing the state machine: Automated modular reasoning for complex concurrent systems

Travis Hance, Yi Zhou, Andrea Lattuada, Reto Achermann, Alex Conway, Ryan Stutsman, Gerd Zellweger, Chris Hawblitzel, Jon Howell, and Bryan Parno. Shard- ing the state machine: Automated modular reasoning for complex concurrent systems. In17th USENIX Symposium on Operating Systems Design and Implemen- tation (OSDI 23), pages 911–929, 2023

work page 2023
[14]

Miri: Practical undefined behavior detection for rust

Ralf Jung, Benjamin Kimock, Christian Poveda, Eduardo Sánchez Muñoz, Oli Scherer, and Qian Wang. Miri: Practical undefined behavior detection for rust. Proc. ACM Program. Lang., 10(POPL), January 2026

work page 2026
[15]

Iris: Monoids and invariants as an orthogonal basis for concurrent reasoning.ACM SIGPLAN Notices, 50(1):637–650, 2015

Ralf Jung, David Swasey, Filip Sieczkowski, Kasper Svendsen, Aaron Turon, Lars Birkedal, and Derek Dreyer. Iris: Monoids and invariants as an orthogonal basis for concurrent reasoning.ACM SIGPLAN Notices, 50(1):637–650, 2015. KVerus: Scalable and Resilient Formal Verification Proof Generation for Rust Code Conference’17, July 2017, Washington, DC, USA

work page 2015
[16]

seL4: Formal verification of an OS kernel

Gerwin Klein, Kevin Elphinstone, Gernot Heiser, June Andronick, David Cock, Philip Derrin, Dhammika Elkaduwe, Kai Engelhardt, Rafal Kolanski, Michael Norrish, et al. seL4: Formal verification of an OS kernel. InProceedings of the ACM SIGOPS 22nd symposium on Operating systems principles, pages 207–220, 2009

work page 2009
[17]

Verus: A practical foundation for systems verification

Andrea Lattuada, Travis Hance, Jay Bosamiya, Matthias Brun, Chanhee Cho, Hay- ley LeBlanc, Pranav Srinivasan, Reto Achermann, Tej Chajed, Chris Hawblitzel, et al. Verus: A practical foundation for systems verification. InProceedings of the ACM SIGOPS 30th Symposium on Operating Systems Principles, pages 438–454, 2024

work page 2024
[18]

Verus: Verifying rust programs using linear ghost types.Proceedings of the ACM on Programming Languages, 7(OOPSLA1):286–315, 2023

Andrea Lattuada, Travis Hance, Chanhee Cho, Matthias Brun, Isitha Subasinghe, Yi Zhou, Jon Howell, Bryan Parno, and Chris Hawblitzel. Verus: Verifying rust programs using linear ghost types.Proceedings of the ACM on Programming Languages, 7(OOPSLA1):286–315, 2023

work page 2023
[19]

Dafny: An automatic program verifier for functional correct- ness

K Rustan M Leino. Dafny: An automatic program verifier for functional correct- ness. InInternational conference on logic for programming artificial intelligence and reasoning, pages 348–370. Springer, 2010

work page 2010
[20]

Trigger selection strategies to stabi- lize program verifiers

K Rustan M Leino and Clément Pit-Claudel. Trigger selection strategies to stabi- lize program verifiers. InInternational Conference on Computer Aided Verification, pages 361–381. Springer, 2016

work page 2016
[21]

Compcert-a formally verified optimizing compiler

Xavier Leroy, Sandrine Blazy, Daniel Kästner, Bernhard Schommer, Markus Pister, and Christian Ferdinand. Compcert-a formally verified optimizing compiler. In ERTS 2016: Embedded Real Time Software and Systems, 8th European Congress, 2016

work page 2016
[22]

Linear types for large-scale systems verification.Proceed- ings of the ACM on Programming Languages, 6(OOPSLA1):1–28, 2022

Jialin Li, Andrea Lattuada, Yi Zhou, Jonathan Cameron, Jon Howell, Bryan Parno, and Chris Hawblitzel. Linear types for large-scale systems verification.Proceed- ings of the ACM on Programming Languages, 6(OOPSLA1):1–28, 2022

work page 2022
[23]

Proof automation with large language models

Minghai Lu, Benjamin Delaware, and Tianyi Zhang. Proof automation with large language models. InProceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, pages 1509–1520, 2024

work page 2024
[24]

Specgen: Automated generation of formal program specifications via large language models

Lezhi Ma, Shangqing Liu, Yi Li, Xiaofei Xie, and Lei Bu. Specgen: Automated generation of formal program specifications via large language models. In2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE), pages 16–28. IEEE, 2025

work page 2025
[25]

The lean 4 theorem prover and programming language

Leonardo de Moura and Sebastian Ullrich. The lean 4 theorem prover and programming language. InInternational Conference on Automated Deduction, pages 625–635. Springer, 2021

work page 2021
[26]

Laurel: Unblocking automated verification with large language models.Proceedings of the ACM on Programming Languages, 9(OOPSLA1):1519– 1545, 2025

Eric Mugnier, Emmanuel Anaya Gonzalez, Nadia Polikarpova, Ranjit Jhala, and Zhou Yuanyuan. Laurel: Unblocking automated verification with large language models.Proceedings of the ACM on Programming Languages, 9(OOPSLA1):1519– 1545, 2025

work page 2025
[27]

Springer, 2002

Tobias Nipkow, Markus Wenzel, and Lawrence C Paulson.Isabelle/HOL: a proof assistant for higher-order logic. Springer, 2002

work page 2002
[28]

Asterinas: A Linux ABI-Compatible, Rust-Based Framekernel OS with a Small and Sound TCB

Yuke Peng, Hongliang Tian, Zhang Junyang, Ruihan Li, Chengjun Chen, Jianfeng Jiang, Jinyi Xian, Xiaolin Wang, Chenren Xu, Diyu Zhou, et al. Asterinas: A Linux ABI-Compatible, Rust-Based Framekernel OS with a Small and Sound TCB. In2025 USENIX Annual Technical Conference (USENIX ATC 25), 2025

work page 2025
[29]

Owlc: Compiling security pro- tocols to verified, secure, high-performance libraries.Cryptology ePrint Archive, 2025

Pratap Singh, Joshua Gancher, and Bryan Parno. Owlc: Compiling security pro- tocols to verified, secure, high-performance libraries.Cryptology ePrint Archive, 2025

work page 2025
[30]

Anvil: Verifying liveness of cluster management controllers

Xudong Sun, Wenjie Ma, Jiawei Tyler Gu, Zicheng Ma, Tej Chajed, Jon Howell, Andrea Lattuada, Oded Padon, Lalith Suresh, Adriana Szekeres, et al. Anvil: Verifying liveness of cluster management controllers. In18th USENIX Symposium on Operating Systems Design and Implementation (OSDI 24), pages 649–666, 2024

work page 2024
[31]

Mirai: an abstract interpreter for the rust compiler’s mid-level intermediate representation (mir)., 2025

MIRAI Developement Team. Mirai: an abstract interpreter for the rust compiler’s mid-level intermediate representation (mir)., 2025. https://github.com/endorlabs/ MIRAI

work page 2025
[32]

The rocq proof assistant v9.1.0, 2025

The Rocq Development Team. The rocq proof assistant v9.1.0, 2025. https://rocq- prover.org

work page 2025
[33]

A verus compiler front-end for ides, 2026

The Verus Developement Team. A verus compiler front-end for ides, 2026. https://github.com/verus-lang/verus-analyzer

work page 2026
[34]

Rango: Adap- tive retrieval-augmented proving for automated software verification

Kyle Thompson, Nuno Saavedra, Pedro Carrott, Kevin Fisher, Alex Sanchez- Stern, Yuriy Brun, João F Ferreira, Sorin Lerner, and Emily First. Rango: Adap- tive retrieval-augmented proving for automated software verification. In2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE), pages 347–359. IEEE, 2025

work page 2025
[35]

Verifying dynamic trait objects in rust

Alexa VanHattum, Daniel Schwartz-Narbonne, Nathan Chong, and Adrian Samp- son. Verifying dynamic trait objects in rust. InProceedings of the 44th International Conference on Software Engineering: Software Engineering in Practice, pages 321– 330, 2022

work page 2022
[36]

Unsafecop: Towards memory safety for real-world unsafe rust co de with p ractical bounded model checking

Minghua Wang, Jingling Xue, Lin Huang, Yuan Zi, and Tao Wei. Unsafecop: Towards memory safety for real-world unsafe rust co de with p ractical bounded model checking. InInternational Symposium on Formal Methods, pages 307–324. Springer, 2024

work page 2024
[37]

Autoverus: Automated proof generation for rust code.Proceedings of the ACM on Programming Languages, 9(OOPSLA2):3454–3482, 2025

Chenyuan Yang, Xuheng Li, Md Rakib Hossain Misu, Jianan Yao, Weidong Cui, Yeyun Gong, Chris Hawblitzel, Shuvendu Lahiri, Jacob R Lorch, Shuai Lu, et al. Autoverus: Automated proof generation for rust code.Proceedings of the ACM on Programming Languages, 9(OOPSLA2):3454–3482, 2025

work page 2025
[38]

CortenMM: Efficient Memory Management with Strong Correctness Guarantees

Junyang Zhang, Xiangcan Xu, Yonghao Zou, Zhe Tang, Xinyi Wan, Kang Hu, Siyuan Wang, Wenbo Xu, Di Wang, Hao Chen, Hongliang Tian, Lin Huang, Shoumeng Yan, Yuval Tamir, Yingwei Luo, Xiaolin Wang, Huashan Yu, Zhenlin Wang, and Diyu Zhou. CortenMM: Efficient Memory Management with Strong Correctness Guarantees. InProceedings of the ACM SIGOPS 31th Symposium o...

work page 2025
[39]

Vst-a: A foundationally sound annotation verifier.Proceedings of the ACM on Programming Languages (POPL), 8:2069–2098, 2024

Litao Zhou, Jianxing Qin, Qinshi Wang, Andrew W Appel, and Qinxiang Cao. Vst-a: A foundationally sound annotation verifier.Proceedings of the ACM on Programming Languages (POPL), 8:2069–2098, 2024

work page 2069
[40]

VeriSMo: A verified security module for confidential VMs

Ziqiao Zhou, Weiteng Chen, Sishuai Gong, Chris Hawblitzel, Weidong Cui, et al. VeriSMo: A verified security module for confidential VMs. In18th USENIX Symposium on Operating Systems Design and Implementation (OSDI 24), pages 599–614, 2024. A Proof Failures Caused by Incorrect Specifications While verifying the state machine of Asterinas’ RCU (Read-Copy- U...

work page 2024
[41]

We highlight the corrected lines added by human expert

]; 18} 19} Listing 7: Proof failure due to incorrect specification of protocol_lock_skip. We highlight the corrected lines added by human expert. Conference’17, July 2017, Washington, DC, USA Yuwei Liu, Xinyi Wan, Yanhao Wang, Minghua Wang, Lin Huang, and Tao Wei 1#[inductive(initialize)] 2fninitialize_inductive(post:Self, cpu_num:CpuId) { 3assert(post.wf...

work page 2017

[1] [1]

https://model-checking.github.io/ verify-rust-std

Verify Rust Standard Library Effort, 2024. https://model-checking.github.io/ verify-rust-std

work page 2024

[2] [2]

Alphaverus: Bootstrapping formally verified code generation through self-improving translation and tree- finement

Pranjal Aggarwal, Bryan Parno, and Sean Welleck. Alphaverus: Bootstrapping formally verified code generation through self-improving translation and tree- finement. InForty-second International Conference on Machine Learning (ICML 2025), 2025

work page 2025

[3] [3]

Claude Sonnet 4, 2025

Anthropic. Claude Sonnet 4, 2025. https://www.anthropic.com/news/claude-4

work page 2025

[4] [4]

The prusti project: Formal verification for rust

Vytautas Astrauskas, Aurel Bíl`y, Jonáš Fiala, Zachary Grannan, Christoph Math- eja, Peter Müller, Federico Poli, and Alexander J Summers. The prusti project: Formal verification for rust. InNASA Formal Methods Symposium, pages 88–108. Springer, 2022

work page 2022

[5] [5]

Springer Science & Business Media, 2013

Yves Bertot and Pierre Castéran.Interactive theorem proving and program de- velopment: Coq’Art: the calculus of inductive constructions. Springer Science & Business Media, 2013

work page 2013

[6] [6]

LangChain, October 2022

Harrison Chase. LangChain, October 2022. https://github.com/langchain-ai/ langchain

work page 2022

[7] [7]

Automated proof generation for rust code via self-evolution

Tianyu Chen, Shuai Lu, Shan Lu, Yeyun Gong, Chenyuan Yang, Xuheng Li, Md Rakib Hossain Misu, Hao Yu, Nan Duan, Peng Cheng, et al. Automated proof generation for rust code via self-evolution. InThe Thirteenth International Conference on Learning Representations (ICLR 25), 2025

work page 2025

[8] [8]

Atmosphere: Practical verified kernels with rust and verus

Xiangdong Chen, Zhaofeng Li, Jerry Zhang, Vikram Narayanan, and Anton Burtsev. Atmosphere: Practical verified kernels with rust and verus. SOSP ’25, page 752–767, New York, NY, USA, 2025. Association for Computing Machinery

work page 2025

[9] [9]

Z3: An efficient smt solver

Leonardo De Moura and Nikolaj Bjørner. Z3: An efficient smt solver. InInter- national conference on Tools and Algorithms for the Construction and Analysis of Systems, pages 337–340. Springer, 2008

work page 2008

[10] [10]

Creusot: A foundry for the deductive verification of rust programs

Xavier Denis, Jacques-Henri Jourdan, and Claude Marché. Creusot: A foundry for the deductive verification of rust programs. InInternational Conference on Formal Engineering Methods, pages 90–105. Springer, 2022

work page 2022

[11] [11]

CertiKOS: An extensible architecture for building certified concurrent OS kernels

Ronghui Gu, Zhong Shao, Hao Chen, Xiongnan Newman Wu, Jieung Kim, Vilhelm Sjöberg, and David Costanzo. CertiKOS: An extensible architecture for building certified concurrent OS kernels. In12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), pages 653–669, 2016

work page 2016

[12] [12]

Leaf: Modularity for temporary sharing in separation logic.Proceedings of the ACM on Programming Languages, 7(OOPSLA2):31–58, 2023

Travis Hance, Jon Howell, Oded Padon, and Bryan Parno. Leaf: Modularity for temporary sharing in separation logic.Proceedings of the ACM on Programming Languages, 7(OOPSLA2):31–58, 2023

work page 2023

[13] [13]

Shard- ing the state machine: Automated modular reasoning for complex concurrent systems

Travis Hance, Yi Zhou, Andrea Lattuada, Reto Achermann, Alex Conway, Ryan Stutsman, Gerd Zellweger, Chris Hawblitzel, Jon Howell, and Bryan Parno. Shard- ing the state machine: Automated modular reasoning for complex concurrent systems. In17th USENIX Symposium on Operating Systems Design and Implemen- tation (OSDI 23), pages 911–929, 2023

work page 2023

[14] [14]

Miri: Practical undefined behavior detection for rust

Ralf Jung, Benjamin Kimock, Christian Poveda, Eduardo Sánchez Muñoz, Oli Scherer, and Qian Wang. Miri: Practical undefined behavior detection for rust. Proc. ACM Program. Lang., 10(POPL), January 2026

work page 2026

[15] [15]

Iris: Monoids and invariants as an orthogonal basis for concurrent reasoning.ACM SIGPLAN Notices, 50(1):637–650, 2015

Ralf Jung, David Swasey, Filip Sieczkowski, Kasper Svendsen, Aaron Turon, Lars Birkedal, and Derek Dreyer. Iris: Monoids and invariants as an orthogonal basis for concurrent reasoning.ACM SIGPLAN Notices, 50(1):637–650, 2015. KVerus: Scalable and Resilient Formal Verification Proof Generation for Rust Code Conference’17, July 2017, Washington, DC, USA

work page 2015

[16] [16]

seL4: Formal verification of an OS kernel

Gerwin Klein, Kevin Elphinstone, Gernot Heiser, June Andronick, David Cock, Philip Derrin, Dhammika Elkaduwe, Kai Engelhardt, Rafal Kolanski, Michael Norrish, et al. seL4: Formal verification of an OS kernel. InProceedings of the ACM SIGOPS 22nd symposium on Operating systems principles, pages 207–220, 2009

work page 2009

[17] [17]

Verus: A practical foundation for systems verification

Andrea Lattuada, Travis Hance, Jay Bosamiya, Matthias Brun, Chanhee Cho, Hay- ley LeBlanc, Pranav Srinivasan, Reto Achermann, Tej Chajed, Chris Hawblitzel, et al. Verus: A practical foundation for systems verification. InProceedings of the ACM SIGOPS 30th Symposium on Operating Systems Principles, pages 438–454, 2024

work page 2024

[18] [18]

Verus: Verifying rust programs using linear ghost types.Proceedings of the ACM on Programming Languages, 7(OOPSLA1):286–315, 2023

Andrea Lattuada, Travis Hance, Chanhee Cho, Matthias Brun, Isitha Subasinghe, Yi Zhou, Jon Howell, Bryan Parno, and Chris Hawblitzel. Verus: Verifying rust programs using linear ghost types.Proceedings of the ACM on Programming Languages, 7(OOPSLA1):286–315, 2023

work page 2023

[19] [19]

Dafny: An automatic program verifier for functional correct- ness

K Rustan M Leino. Dafny: An automatic program verifier for functional correct- ness. InInternational conference on logic for programming artificial intelligence and reasoning, pages 348–370. Springer, 2010

work page 2010

[20] [20]

Trigger selection strategies to stabi- lize program verifiers

K Rustan M Leino and Clément Pit-Claudel. Trigger selection strategies to stabi- lize program verifiers. InInternational Conference on Computer Aided Verification, pages 361–381. Springer, 2016

work page 2016

[21] [21]

Compcert-a formally verified optimizing compiler

Xavier Leroy, Sandrine Blazy, Daniel Kästner, Bernhard Schommer, Markus Pister, and Christian Ferdinand. Compcert-a formally verified optimizing compiler. In ERTS 2016: Embedded Real Time Software and Systems, 8th European Congress, 2016

work page 2016

[22] [22]

Linear types for large-scale systems verification.Proceed- ings of the ACM on Programming Languages, 6(OOPSLA1):1–28, 2022

Jialin Li, Andrea Lattuada, Yi Zhou, Jonathan Cameron, Jon Howell, Bryan Parno, and Chris Hawblitzel. Linear types for large-scale systems verification.Proceed- ings of the ACM on Programming Languages, 6(OOPSLA1):1–28, 2022

work page 2022

[23] [23]

Proof automation with large language models

Minghai Lu, Benjamin Delaware, and Tianyi Zhang. Proof automation with large language models. InProceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, pages 1509–1520, 2024

work page 2024

[24] [24]

Specgen: Automated generation of formal program specifications via large language models

Lezhi Ma, Shangqing Liu, Yi Li, Xiaofei Xie, and Lei Bu. Specgen: Automated generation of formal program specifications via large language models. In2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE), pages 16–28. IEEE, 2025

work page 2025

[25] [25]

The lean 4 theorem prover and programming language

Leonardo de Moura and Sebastian Ullrich. The lean 4 theorem prover and programming language. InInternational Conference on Automated Deduction, pages 625–635. Springer, 2021

work page 2021

[26] [26]

Laurel: Unblocking automated verification with large language models.Proceedings of the ACM on Programming Languages, 9(OOPSLA1):1519– 1545, 2025

Eric Mugnier, Emmanuel Anaya Gonzalez, Nadia Polikarpova, Ranjit Jhala, and Zhou Yuanyuan. Laurel: Unblocking automated verification with large language models.Proceedings of the ACM on Programming Languages, 9(OOPSLA1):1519– 1545, 2025

work page 2025

[27] [27]

Springer, 2002

Tobias Nipkow, Markus Wenzel, and Lawrence C Paulson.Isabelle/HOL: a proof assistant for higher-order logic. Springer, 2002

work page 2002

[28] [28]

Asterinas: A Linux ABI-Compatible, Rust-Based Framekernel OS with a Small and Sound TCB

Yuke Peng, Hongliang Tian, Zhang Junyang, Ruihan Li, Chengjun Chen, Jianfeng Jiang, Jinyi Xian, Xiaolin Wang, Chenren Xu, Diyu Zhou, et al. Asterinas: A Linux ABI-Compatible, Rust-Based Framekernel OS with a Small and Sound TCB. In2025 USENIX Annual Technical Conference (USENIX ATC 25), 2025

work page 2025

[29] [29]

Owlc: Compiling security pro- tocols to verified, secure, high-performance libraries.Cryptology ePrint Archive, 2025

Pratap Singh, Joshua Gancher, and Bryan Parno. Owlc: Compiling security pro- tocols to verified, secure, high-performance libraries.Cryptology ePrint Archive, 2025

work page 2025

[30] [30]

Anvil: Verifying liveness of cluster management controllers

Xudong Sun, Wenjie Ma, Jiawei Tyler Gu, Zicheng Ma, Tej Chajed, Jon Howell, Andrea Lattuada, Oded Padon, Lalith Suresh, Adriana Szekeres, et al. Anvil: Verifying liveness of cluster management controllers. In18th USENIX Symposium on Operating Systems Design and Implementation (OSDI 24), pages 649–666, 2024

work page 2024

[31] [31]

Mirai: an abstract interpreter for the rust compiler’s mid-level intermediate representation (mir)., 2025

MIRAI Developement Team. Mirai: an abstract interpreter for the rust compiler’s mid-level intermediate representation (mir)., 2025. https://github.com/endorlabs/ MIRAI

work page 2025

[32] [32]

The rocq proof assistant v9.1.0, 2025

The Rocq Development Team. The rocq proof assistant v9.1.0, 2025. https://rocq- prover.org

work page 2025

[33] [33]

A verus compiler front-end for ides, 2026

The Verus Developement Team. A verus compiler front-end for ides, 2026. https://github.com/verus-lang/verus-analyzer

work page 2026

[34] [34]

Rango: Adap- tive retrieval-augmented proving for automated software verification

Kyle Thompson, Nuno Saavedra, Pedro Carrott, Kevin Fisher, Alex Sanchez- Stern, Yuriy Brun, João F Ferreira, Sorin Lerner, and Emily First. Rango: Adap- tive retrieval-augmented proving for automated software verification. In2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE), pages 347–359. IEEE, 2025

work page 2025

[35] [35]

Verifying dynamic trait objects in rust

Alexa VanHattum, Daniel Schwartz-Narbonne, Nathan Chong, and Adrian Samp- son. Verifying dynamic trait objects in rust. InProceedings of the 44th International Conference on Software Engineering: Software Engineering in Practice, pages 321– 330, 2022

work page 2022

[36] [36]

Unsafecop: Towards memory safety for real-world unsafe rust co de with p ractical bounded model checking

Minghua Wang, Jingling Xue, Lin Huang, Yuan Zi, and Tao Wei. Unsafecop: Towards memory safety for real-world unsafe rust co de with p ractical bounded model checking. InInternational Symposium on Formal Methods, pages 307–324. Springer, 2024

work page 2024

[37] [37]

Autoverus: Automated proof generation for rust code.Proceedings of the ACM on Programming Languages, 9(OOPSLA2):3454–3482, 2025

Chenyuan Yang, Xuheng Li, Md Rakib Hossain Misu, Jianan Yao, Weidong Cui, Yeyun Gong, Chris Hawblitzel, Shuvendu Lahiri, Jacob R Lorch, Shuai Lu, et al. Autoverus: Automated proof generation for rust code.Proceedings of the ACM on Programming Languages, 9(OOPSLA2):3454–3482, 2025

work page 2025

[38] [38]

CortenMM: Efficient Memory Management with Strong Correctness Guarantees

Junyang Zhang, Xiangcan Xu, Yonghao Zou, Zhe Tang, Xinyi Wan, Kang Hu, Siyuan Wang, Wenbo Xu, Di Wang, Hao Chen, Hongliang Tian, Lin Huang, Shoumeng Yan, Yuval Tamir, Yingwei Luo, Xiaolin Wang, Huashan Yu, Zhenlin Wang, and Diyu Zhou. CortenMM: Efficient Memory Management with Strong Correctness Guarantees. InProceedings of the ACM SIGOPS 31th Symposium o...

work page 2025

[39] [39]

Vst-a: A foundationally sound annotation verifier.Proceedings of the ACM on Programming Languages (POPL), 8:2069–2098, 2024

Litao Zhou, Jianxing Qin, Qinshi Wang, Andrew W Appel, and Qinxiang Cao. Vst-a: A foundationally sound annotation verifier.Proceedings of the ACM on Programming Languages (POPL), 8:2069–2098, 2024

work page 2069

[40] [40]

VeriSMo: A verified security module for confidential VMs

Ziqiao Zhou, Weiteng Chen, Sishuai Gong, Chris Hawblitzel, Weidong Cui, et al. VeriSMo: A verified security module for confidential VMs. In18th USENIX Symposium on Operating Systems Design and Implementation (OSDI 24), pages 599–614, 2024. A Proof Failures Caused by Incorrect Specifications While verifying the state machine of Asterinas’ RCU (Read-Copy- U...

work page 2024

[41] [41]

We highlight the corrected lines added by human expert

]; 18} 19} Listing 7: Proof failure due to incorrect specification of protocol_lock_skip. We highlight the corrected lines added by human expert. Conference’17, July 2017, Washington, DC, USA Yuwei Liu, Xinyi Wan, Yanhao Wang, Minghua Wang, Lin Huang, and Tao Wei 1#[inductive(initialize)] 2fninitialize_inductive(post:Self, cpu_num:CpuId) { 3assert(post.wf...

work page 2017