KVerus: Scalable and Resilient Formal Verification Proof Generation for Rust Code
Pith reviewed 2026-05-07 15:47 UTC · model grok-4.3
The pith
KVerus builds a dynamic knowledge base to generate and repair formal proofs for Rust code across changing dependencies and toolchains.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
KVerus constructs a dynamic knowledge base of code metadata, lemma semantics, and toolchain specifics. By combining dependency-aware program analysis, semantic lemma indexing, and error-driven self-refinement, it navigates intricate cross-file dependencies to synthesize proofs and automatically repair proofs when faced with common evolutionary changes.
What carries the argument
Dynamic knowledge base built from dependency analysis, semantic lemma indexing, and error-driven self-refinement, which supports self-adaptive proof synthesis and repair across module boundaries.
If this is right
- Verifies 80.2 percent of tasks on single-file benchmarks, compared with 56.9 percent for the prior AutoVerus system.
- Reaches 51.0 percent success on repository-level benchmarks containing cross-file dependencies, versus 4.5 percent for multi-round prompting.
- Produces upstream-accepted proofs for 23 previously unverified functions representing 21 percent of the proof code in a kernel memory-management module.
- Degrades less than prior methods when the verification toolchain receives breaking updates.
Where Pith is reading between the lines
- The same dependency-aware indexing and repair loop could be reused with other formal verifiers that expose similar error messages and lemma libraries.
- Teams maintaining verified libraries would spend less time on proof repair after routine refactors if the knowledge base is kept current.
- Extending the base to track performance or security invariants alongside functional proofs could support combined assurance arguments in one workflow.
Load-bearing premise
The assumption that analysis of dependencies and past errors will keep producing a knowledge base sufficient to bridge semantic code patterns with rigid structural proof requirements even as modules and toolchains continue to evolve.
What would settle it
A sequence of cross-module code changes that introduce new structural dependencies not captured by the existing analysis or indexing, after which KVerus repeatedly fails to produce or repair correct proofs without manual lemma additions.
Figures
read the original abstract
Formal verification provides the highest assurance of software correctness and security, but its application to large-scale, evolving systems remains a major challenge. While large language models (LLMs) have shown promise in automating proof generation, they often fail in real-world settings due to their inability to handle complex cross-module dependencies or changes in the codebase or the verification toolchain. We identify the fundamental problem as the Semantic-Structural Gap: LLMs operate on semantic code patterns, whereas formal verification is governed by rigid structural dependencies, a disconnect that leads to brittle, unsustainable proofs. To bridge this gap, we propose a new paradigm of self-adaptive verification and present KVerus, a retrieval-augmented system for Verus-based Rust verification that can adapt to an evolving software environment. KVerus constructs a dynamic knowledge base of code metadata, lemma semantics, and toolchain specifics. By combining dependency-aware program analysis, semantic lemma indexing, and error-driven self-refinement, it can navigate intricate cross-file dependencies to synthesize proofs and automatically repair proofs when faced with common evolutionary changes. Across three single-file benchmarks, KVerus verifies 80.2% of tasks, outperforming the state-of-the-art AutoVerus (56.9%) and degrades less than AutoVerus under breaking Verus updates. On three repository-level benchmarks with cross-file dependencies, KVerus achieves a 51.0% success rate, compared to 4.5% for a multi-round prompting baseline. Finally, on the Asterinas Rust OS kernel, KVerus produces upstream-accepted proofs that verify 23 previously unverified functions (21.0% of proof code) in the memory-management module. KVerus represents a significant step towards making formal verification a scalable and sustainable practice for modern, security-critical software.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces KVerus, a retrieval-augmented system for Verus-based Rust verification. It constructs a dynamic knowledge base via dependency-aware program analysis, semantic lemma indexing, and error-driven self-refinement to address the Semantic-Structural Gap between LLM semantic patterns and rigid formal dependencies. The system is evaluated on three single-file benchmarks (80.2% success vs. 56.9% for AutoVerus), three repository-level benchmarks with cross-file dependencies (51.0% vs. 4.5% for multi-round prompting), and a case study on the Asterinas kernel where it produces 23 upstream-accepted proofs covering 21% of the memory-management module's proof code. It also shows reduced degradation under Verus toolchain updates.
Significance. If the reported results hold, KVerus represents a meaningful advance toward scalable, resilient formal verification for evolving Rust codebases. The concrete empirical gains over baselines, combined with the real-world contribution of upstream-accepted proofs to an OS kernel, provide tangible evidence of practical utility. The emphasis on automatic adaptation to cross-module changes and toolchain evolution is a notable strength, and the paper supplies implementation details and degradation results that support reproducibility of the core claims.
minor comments (3)
- The abstract and evaluation sections refer to 'three single-file benchmarks' and 'three repository-level benchmarks' without naming them or providing a high-level summary table of task counts and characteristics; adding this would improve immediate readability and allow readers to assess scope more quickly.
- The introduction of the 'Semantic-Structural Gap' is presented as a fundamental problem but lacks a concise formal characterization or pointer to related concepts in the literature; a short definitional paragraph early in the paper would clarify its scope.
- Figure captions and axis labels in the benchmark comparison plots could be expanded to explicitly state the number of tasks per benchmark and the exact baseline configurations used, reducing the need to cross-reference the text.
Simulated Author's Rebuttal
We thank the referee for the positive and constructive review, which accurately captures the core contributions of KVerus in addressing the Semantic-Structural Gap for scalable Rust verification. The recommendation for minor revision is appreciated, and we note the emphasis on empirical gains, real-world impact on the Asterinas kernel, and reproducibility. Since the report lists no specific major comments, we have no point-by-point rebuttals to provide. We will incorporate minor editorial improvements in the revised version to further enhance clarity and implementation details.
Circularity Check
No significant circularity
full rationale
The paper's central claims consist of empirical success rates (80.2% on single-file benchmarks, 51.0% on repository-level tasks, and 23 upstream-accepted proofs in Asterinas) measured against external baselines and third-party code. No equations, fitted parameters, self-definitional constructs, or load-bearing self-citations appear in the derivation; the Semantic-Structural Gap is introduced as a descriptive framing rather than a formal premise that the results are forced to satisfy. Implementation details for dependency analysis, semantic indexing, and error-driven refinement are presented as independent engineering choices whose effectiveness is validated externally, leaving the reported outcomes non-tautological.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption LLMs operating on semantic patterns can be made to respect rigid structural dependencies of formal verification when supplied with dependency metadata and error feedback
invented entities (1)
-
Semantic-Structural Gap
no independent evidence
Reference graph
Works this paper leans on
-
[1]
https://model-checking.github.io/ verify-rust-std
Verify Rust Standard Library Effort, 2024. https://model-checking.github.io/ verify-rust-std
work page 2024
-
[2]
Pranjal Aggarwal, Bryan Parno, and Sean Welleck. Alphaverus: Bootstrapping formally verified code generation through self-improving translation and tree- finement. InForty-second International Conference on Machine Learning (ICML 2025), 2025
work page 2025
-
[3]
Anthropic. Claude Sonnet 4, 2025. https://www.anthropic.com/news/claude-4
work page 2025
-
[4]
The prusti project: Formal verification for rust
Vytautas Astrauskas, Aurel Bíl`y, Jonáš Fiala, Zachary Grannan, Christoph Math- eja, Peter Müller, Federico Poli, and Alexander J Summers. The prusti project: Formal verification for rust. InNASA Formal Methods Symposium, pages 88–108. Springer, 2022
work page 2022
-
[5]
Springer Science & Business Media, 2013
Yves Bertot and Pierre Castéran.Interactive theorem proving and program de- velopment: Coq’Art: the calculus of inductive constructions. Springer Science & Business Media, 2013
work page 2013
-
[6]
Harrison Chase. LangChain, October 2022. https://github.com/langchain-ai/ langchain
work page 2022
-
[7]
Automated proof generation for rust code via self-evolution
Tianyu Chen, Shuai Lu, Shan Lu, Yeyun Gong, Chenyuan Yang, Xuheng Li, Md Rakib Hossain Misu, Hao Yu, Nan Duan, Peng Cheng, et al. Automated proof generation for rust code via self-evolution. InThe Thirteenth International Conference on Learning Representations (ICLR 25), 2025
work page 2025
-
[8]
Atmosphere: Practical verified kernels with rust and verus
Xiangdong Chen, Zhaofeng Li, Jerry Zhang, Vikram Narayanan, and Anton Burtsev. Atmosphere: Practical verified kernels with rust and verus. SOSP ’25, page 752–767, New York, NY, USA, 2025. Association for Computing Machinery
work page 2025
-
[9]
Leonardo De Moura and Nikolaj Bjørner. Z3: An efficient smt solver. InInter- national conference on Tools and Algorithms for the Construction and Analysis of Systems, pages 337–340. Springer, 2008
work page 2008
-
[10]
Creusot: A foundry for the deductive verification of rust programs
Xavier Denis, Jacques-Henri Jourdan, and Claude Marché. Creusot: A foundry for the deductive verification of rust programs. InInternational Conference on Formal Engineering Methods, pages 90–105. Springer, 2022
work page 2022
-
[11]
CertiKOS: An extensible architecture for building certified concurrent OS kernels
Ronghui Gu, Zhong Shao, Hao Chen, Xiongnan Newman Wu, Jieung Kim, Vilhelm Sjöberg, and David Costanzo. CertiKOS: An extensible architecture for building certified concurrent OS kernels. In12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), pages 653–669, 2016
work page 2016
-
[12]
Travis Hance, Jon Howell, Oded Padon, and Bryan Parno. Leaf: Modularity for temporary sharing in separation logic.Proceedings of the ACM on Programming Languages, 7(OOPSLA2):31–58, 2023
work page 2023
-
[13]
Shard- ing the state machine: Automated modular reasoning for complex concurrent systems
Travis Hance, Yi Zhou, Andrea Lattuada, Reto Achermann, Alex Conway, Ryan Stutsman, Gerd Zellweger, Chris Hawblitzel, Jon Howell, and Bryan Parno. Shard- ing the state machine: Automated modular reasoning for complex concurrent systems. In17th USENIX Symposium on Operating Systems Design and Implemen- tation (OSDI 23), pages 911–929, 2023
work page 2023
-
[14]
Miri: Practical undefined behavior detection for rust
Ralf Jung, Benjamin Kimock, Christian Poveda, Eduardo Sánchez Muñoz, Oli Scherer, and Qian Wang. Miri: Practical undefined behavior detection for rust. Proc. ACM Program. Lang., 10(POPL), January 2026
work page 2026
-
[15]
Ralf Jung, David Swasey, Filip Sieczkowski, Kasper Svendsen, Aaron Turon, Lars Birkedal, and Derek Dreyer. Iris: Monoids and invariants as an orthogonal basis for concurrent reasoning.ACM SIGPLAN Notices, 50(1):637–650, 2015. KVerus: Scalable and Resilient Formal Verification Proof Generation for Rust Code Conference’17, July 2017, Washington, DC, USA
work page 2015
-
[16]
seL4: Formal verification of an OS kernel
Gerwin Klein, Kevin Elphinstone, Gernot Heiser, June Andronick, David Cock, Philip Derrin, Dhammika Elkaduwe, Kai Engelhardt, Rafal Kolanski, Michael Norrish, et al. seL4: Formal verification of an OS kernel. InProceedings of the ACM SIGOPS 22nd symposium on Operating systems principles, pages 207–220, 2009
work page 2009
-
[17]
Verus: A practical foundation for systems verification
Andrea Lattuada, Travis Hance, Jay Bosamiya, Matthias Brun, Chanhee Cho, Hay- ley LeBlanc, Pranav Srinivasan, Reto Achermann, Tej Chajed, Chris Hawblitzel, et al. Verus: A practical foundation for systems verification. InProceedings of the ACM SIGOPS 30th Symposium on Operating Systems Principles, pages 438–454, 2024
work page 2024
-
[18]
Andrea Lattuada, Travis Hance, Chanhee Cho, Matthias Brun, Isitha Subasinghe, Yi Zhou, Jon Howell, Bryan Parno, and Chris Hawblitzel. Verus: Verifying rust programs using linear ghost types.Proceedings of the ACM on Programming Languages, 7(OOPSLA1):286–315, 2023
work page 2023
-
[19]
Dafny: An automatic program verifier for functional correct- ness
K Rustan M Leino. Dafny: An automatic program verifier for functional correct- ness. InInternational conference on logic for programming artificial intelligence and reasoning, pages 348–370. Springer, 2010
work page 2010
-
[20]
Trigger selection strategies to stabi- lize program verifiers
K Rustan M Leino and Clément Pit-Claudel. Trigger selection strategies to stabi- lize program verifiers. InInternational Conference on Computer Aided Verification, pages 361–381. Springer, 2016
work page 2016
-
[21]
Compcert-a formally verified optimizing compiler
Xavier Leroy, Sandrine Blazy, Daniel Kästner, Bernhard Schommer, Markus Pister, and Christian Ferdinand. Compcert-a formally verified optimizing compiler. In ERTS 2016: Embedded Real Time Software and Systems, 8th European Congress, 2016
work page 2016
-
[22]
Jialin Li, Andrea Lattuada, Yi Zhou, Jonathan Cameron, Jon Howell, Bryan Parno, and Chris Hawblitzel. Linear types for large-scale systems verification.Proceed- ings of the ACM on Programming Languages, 6(OOPSLA1):1–28, 2022
work page 2022
-
[23]
Proof automation with large language models
Minghai Lu, Benjamin Delaware, and Tianyi Zhang. Proof automation with large language models. InProceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, pages 1509–1520, 2024
work page 2024
-
[24]
Specgen: Automated generation of formal program specifications via large language models
Lezhi Ma, Shangqing Liu, Yi Li, Xiaofei Xie, and Lei Bu. Specgen: Automated generation of formal program specifications via large language models. In2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE), pages 16–28. IEEE, 2025
work page 2025
-
[25]
The lean 4 theorem prover and programming language
Leonardo de Moura and Sebastian Ullrich. The lean 4 theorem prover and programming language. InInternational Conference on Automated Deduction, pages 625–635. Springer, 2021
work page 2021
-
[26]
Eric Mugnier, Emmanuel Anaya Gonzalez, Nadia Polikarpova, Ranjit Jhala, and Zhou Yuanyuan. Laurel: Unblocking automated verification with large language models.Proceedings of the ACM on Programming Languages, 9(OOPSLA1):1519– 1545, 2025
work page 2025
-
[27]
Tobias Nipkow, Markus Wenzel, and Lawrence C Paulson.Isabelle/HOL: a proof assistant for higher-order logic. Springer, 2002
work page 2002
-
[28]
Asterinas: A Linux ABI-Compatible, Rust-Based Framekernel OS with a Small and Sound TCB
Yuke Peng, Hongliang Tian, Zhang Junyang, Ruihan Li, Chengjun Chen, Jianfeng Jiang, Jinyi Xian, Xiaolin Wang, Chenren Xu, Diyu Zhou, et al. Asterinas: A Linux ABI-Compatible, Rust-Based Framekernel OS with a Small and Sound TCB. In2025 USENIX Annual Technical Conference (USENIX ATC 25), 2025
work page 2025
-
[29]
Pratap Singh, Joshua Gancher, and Bryan Parno. Owlc: Compiling security pro- tocols to verified, secure, high-performance libraries.Cryptology ePrint Archive, 2025
work page 2025
-
[30]
Anvil: Verifying liveness of cluster management controllers
Xudong Sun, Wenjie Ma, Jiawei Tyler Gu, Zicheng Ma, Tej Chajed, Jon Howell, Andrea Lattuada, Oded Padon, Lalith Suresh, Adriana Szekeres, et al. Anvil: Verifying liveness of cluster management controllers. In18th USENIX Symposium on Operating Systems Design and Implementation (OSDI 24), pages 649–666, 2024
work page 2024
-
[31]
MIRAI Developement Team. Mirai: an abstract interpreter for the rust compiler’s mid-level intermediate representation (mir)., 2025. https://github.com/endorlabs/ MIRAI
work page 2025
-
[32]
The rocq proof assistant v9.1.0, 2025
The Rocq Development Team. The rocq proof assistant v9.1.0, 2025. https://rocq- prover.org
work page 2025
-
[33]
A verus compiler front-end for ides, 2026
The Verus Developement Team. A verus compiler front-end for ides, 2026. https://github.com/verus-lang/verus-analyzer
work page 2026
-
[34]
Rango: Adap- tive retrieval-augmented proving for automated software verification
Kyle Thompson, Nuno Saavedra, Pedro Carrott, Kevin Fisher, Alex Sanchez- Stern, Yuriy Brun, João F Ferreira, Sorin Lerner, and Emily First. Rango: Adap- tive retrieval-augmented proving for automated software verification. In2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE), pages 347–359. IEEE, 2025
work page 2025
-
[35]
Verifying dynamic trait objects in rust
Alexa VanHattum, Daniel Schwartz-Narbonne, Nathan Chong, and Adrian Samp- son. Verifying dynamic trait objects in rust. InProceedings of the 44th International Conference on Software Engineering: Software Engineering in Practice, pages 321– 330, 2022
work page 2022
-
[36]
Minghua Wang, Jingling Xue, Lin Huang, Yuan Zi, and Tao Wei. Unsafecop: Towards memory safety for real-world unsafe rust co de with p ractical bounded model checking. InInternational Symposium on Formal Methods, pages 307–324. Springer, 2024
work page 2024
-
[37]
Chenyuan Yang, Xuheng Li, Md Rakib Hossain Misu, Jianan Yao, Weidong Cui, Yeyun Gong, Chris Hawblitzel, Shuvendu Lahiri, Jacob R Lorch, Shuai Lu, et al. Autoverus: Automated proof generation for rust code.Proceedings of the ACM on Programming Languages, 9(OOPSLA2):3454–3482, 2025
work page 2025
-
[38]
CortenMM: Efficient Memory Management with Strong Correctness Guarantees
Junyang Zhang, Xiangcan Xu, Yonghao Zou, Zhe Tang, Xinyi Wan, Kang Hu, Siyuan Wang, Wenbo Xu, Di Wang, Hao Chen, Hongliang Tian, Lin Huang, Shoumeng Yan, Yuval Tamir, Yingwei Luo, Xiaolin Wang, Huashan Yu, Zhenlin Wang, and Diyu Zhou. CortenMM: Efficient Memory Management with Strong Correctness Guarantees. InProceedings of the ACM SIGOPS 31th Symposium o...
work page 2025
-
[39]
Litao Zhou, Jianxing Qin, Qinshi Wang, Andrew W Appel, and Qinxiang Cao. Vst-a: A foundationally sound annotation verifier.Proceedings of the ACM on Programming Languages (POPL), 8:2069–2098, 2024
work page 2069
-
[40]
VeriSMo: A verified security module for confidential VMs
Ziqiao Zhou, Weiteng Chen, Sishuai Gong, Chris Hawblitzel, Weidong Cui, et al. VeriSMo: A verified security module for confidential VMs. In18th USENIX Symposium on Operating Systems Design and Implementation (OSDI 24), pages 599–614, 2024. A Proof Failures Caused by Incorrect Specifications While verifying the state machine of Asterinas’ RCU (Read-Copy- U...
work page 2024
-
[41]
We highlight the corrected lines added by human expert
]; 18} 19} Listing 7: Proof failure due to incorrect specification of protocol_lock_skip. We highlight the corrected lines added by human expert. Conference’17, July 2017, Washington, DC, USA Yuwei Liu, Xinyi Wan, Yanhao Wang, Minghua Wang, Lin Huang, and Tao Wei 1#[inductive(initialize)] 2fninitialize_inductive(post:Self, cpu_num:CpuId) { 3assert(post.wf...
work page 2017
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.