pith. sign in

arxiv: 2605.03822 · v1 · submitted 2026-05-05 · 💻 cs.SE · cs.CR

KVerus: Scalable and Resilient Formal Verification Proof Generation for Rust Code

Pith reviewed 2026-05-07 15:47 UTC · model grok-4.3

classification 💻 cs.SE cs.CR
keywords formal verificationRustproof generationdynamic knowledge baseself-adaptive verificationcross-file dependencieslemma indexingproof repair
0
0 comments X

The pith

KVerus builds a dynamic knowledge base to generate and repair formal proofs for Rust code across changing dependencies and toolchains.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents KVerus as a retrieval-augmented system for Verus-based formal verification of Rust programs. It identifies the Semantic-Structural Gap as the core barrier that prevents standard language models from producing reliable proofs in real evolving codebases. KVerus counters this by automatically constructing and updating a knowledge base that combines program dependency analysis, semantic indexing of lemmas, and error-driven refinement loops. This lets the system synthesize proofs for cross-file dependencies and fix them when the code or verification environment changes. The approach is evaluated on both isolated functions and full repository modules, with additional results from applying it to an operating-system kernel.

Core claim

KVerus constructs a dynamic knowledge base of code metadata, lemma semantics, and toolchain specifics. By combining dependency-aware program analysis, semantic lemma indexing, and error-driven self-refinement, it navigates intricate cross-file dependencies to synthesize proofs and automatically repair proofs when faced with common evolutionary changes.

What carries the argument

Dynamic knowledge base built from dependency analysis, semantic lemma indexing, and error-driven self-refinement, which supports self-adaptive proof synthesis and repair across module boundaries.

If this is right

  • Verifies 80.2 percent of tasks on single-file benchmarks, compared with 56.9 percent for the prior AutoVerus system.
  • Reaches 51.0 percent success on repository-level benchmarks containing cross-file dependencies, versus 4.5 percent for multi-round prompting.
  • Produces upstream-accepted proofs for 23 previously unverified functions representing 21 percent of the proof code in a kernel memory-management module.
  • Degrades less than prior methods when the verification toolchain receives breaking updates.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same dependency-aware indexing and repair loop could be reused with other formal verifiers that expose similar error messages and lemma libraries.
  • Teams maintaining verified libraries would spend less time on proof repair after routine refactors if the knowledge base is kept current.
  • Extending the base to track performance or security invariants alongside functional proofs could support combined assurance arguments in one workflow.

Load-bearing premise

The assumption that analysis of dependencies and past errors will keep producing a knowledge base sufficient to bridge semantic code patterns with rigid structural proof requirements even as modules and toolchains continue to evolve.

What would settle it

A sequence of cross-module code changes that introduce new structural dependencies not captured by the existing analysis or indexing, after which KVerus repeatedly fails to produce or repair correct proofs without manual lemma additions.

Figures

Figures reproduced from arXiv: 2605.03822 by Lin Huang, Minghua Wang, Tao Wei, Xinyi Wan, Yanhao Wang, Yuwei Liu.

Figure 1
Figure 1. Figure 1: The workflow of KVerus. 3.1 Preprocessor: Code and Lemma Preprocessing The Preprocessor module analyzes the source code of the target ver￾ification library to construct a metadata dependency graph and to index lemma functions for subsequent retrieval and proof synthesis. Metadata Dependency Graph. We build metadata dependency graph as a typed directed graph over Verus/Rust language objects. Each node is a … view at source ↗
Figure 2
Figure 2. Figure 2: Error types after the refinement view at source ↗
read the original abstract

Formal verification provides the highest assurance of software correctness and security, but its application to large-scale, evolving systems remains a major challenge. While large language models (LLMs) have shown promise in automating proof generation, they often fail in real-world settings due to their inability to handle complex cross-module dependencies or changes in the codebase or the verification toolchain. We identify the fundamental problem as the Semantic-Structural Gap: LLMs operate on semantic code patterns, whereas formal verification is governed by rigid structural dependencies, a disconnect that leads to brittle, unsustainable proofs. To bridge this gap, we propose a new paradigm of self-adaptive verification and present KVerus, a retrieval-augmented system for Verus-based Rust verification that can adapt to an evolving software environment. KVerus constructs a dynamic knowledge base of code metadata, lemma semantics, and toolchain specifics. By combining dependency-aware program analysis, semantic lemma indexing, and error-driven self-refinement, it can navigate intricate cross-file dependencies to synthesize proofs and automatically repair proofs when faced with common evolutionary changes. Across three single-file benchmarks, KVerus verifies 80.2% of tasks, outperforming the state-of-the-art AutoVerus (56.9%) and degrades less than AutoVerus under breaking Verus updates. On three repository-level benchmarks with cross-file dependencies, KVerus achieves a 51.0% success rate, compared to 4.5% for a multi-round prompting baseline. Finally, on the Asterinas Rust OS kernel, KVerus produces upstream-accepted proofs that verify 23 previously unverified functions (21.0% of proof code) in the memory-management module. KVerus represents a significant step towards making formal verification a scalable and sustainable practice for modern, security-critical software.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript introduces KVerus, a retrieval-augmented system for Verus-based Rust verification. It constructs a dynamic knowledge base via dependency-aware program analysis, semantic lemma indexing, and error-driven self-refinement to address the Semantic-Structural Gap between LLM semantic patterns and rigid formal dependencies. The system is evaluated on three single-file benchmarks (80.2% success vs. 56.9% for AutoVerus), three repository-level benchmarks with cross-file dependencies (51.0% vs. 4.5% for multi-round prompting), and a case study on the Asterinas kernel where it produces 23 upstream-accepted proofs covering 21% of the memory-management module's proof code. It also shows reduced degradation under Verus toolchain updates.

Significance. If the reported results hold, KVerus represents a meaningful advance toward scalable, resilient formal verification for evolving Rust codebases. The concrete empirical gains over baselines, combined with the real-world contribution of upstream-accepted proofs to an OS kernel, provide tangible evidence of practical utility. The emphasis on automatic adaptation to cross-module changes and toolchain evolution is a notable strength, and the paper supplies implementation details and degradation results that support reproducibility of the core claims.

minor comments (3)
  1. The abstract and evaluation sections refer to 'three single-file benchmarks' and 'three repository-level benchmarks' without naming them or providing a high-level summary table of task counts and characteristics; adding this would improve immediate readability and allow readers to assess scope more quickly.
  2. The introduction of the 'Semantic-Structural Gap' is presented as a fundamental problem but lacks a concise formal characterization or pointer to related concepts in the literature; a short definitional paragraph early in the paper would clarify its scope.
  3. Figure captions and axis labels in the benchmark comparison plots could be expanded to explicitly state the number of tasks per benchmark and the exact baseline configurations used, reducing the need to cross-reference the text.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive and constructive review, which accurately captures the core contributions of KVerus in addressing the Semantic-Structural Gap for scalable Rust verification. The recommendation for minor revision is appreciated, and we note the emphasis on empirical gains, real-world impact on the Asterinas kernel, and reproducibility. Since the report lists no specific major comments, we have no point-by-point rebuttals to provide. We will incorporate minor editorial improvements in the revised version to further enhance clarity and implementation details.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper's central claims consist of empirical success rates (80.2% on single-file benchmarks, 51.0% on repository-level tasks, and 23 upstream-accepted proofs in Asterinas) measured against external baselines and third-party code. No equations, fitted parameters, self-definitional constructs, or load-bearing self-citations appear in the derivation; the Semantic-Structural Gap is introduced as a descriptive framing rather than a formal premise that the results are forced to satisfy. Implementation details for dependency analysis, semantic indexing, and error-driven refinement are presented as independent engineering choices whose effectiveness is validated externally, leaving the reported outcomes non-tautological.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The approach rests on the premise that LLMs guided by structural metadata can produce correct proofs and that the identified Semantic-Structural Gap is the dominant failure mode; no free parameters or new physical entities are introduced.

axioms (1)
  • domain assumption LLMs operating on semantic patterns can be made to respect rigid structural dependencies of formal verification when supplied with dependency metadata and error feedback
    This is the central bridging claim stated in the abstract as the solution to the Semantic-Structural Gap.
invented entities (1)
  • Semantic-Structural Gap no independent evidence
    purpose: Conceptual explanation for why standard LLM prompting fails on formal verification tasks involving cross-module dependencies and toolchain evolution
    Introduced in the abstract as the fundamental problem the system is designed to solve; no independent empirical test of the gap itself is described.

pith-pipeline@v0.9.0 · 5643 in / 1602 out tokens · 47653 ms · 2026-05-07T15:47:44.743899+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

41 extracted references · 41 canonical work pages

  1. [1]

    https://model-checking.github.io/ verify-rust-std

    Verify Rust Standard Library Effort, 2024. https://model-checking.github.io/ verify-rust-std

  2. [2]

    Alphaverus: Bootstrapping formally verified code generation through self-improving translation and tree- finement

    Pranjal Aggarwal, Bryan Parno, and Sean Welleck. Alphaverus: Bootstrapping formally verified code generation through self-improving translation and tree- finement. InForty-second International Conference on Machine Learning (ICML 2025), 2025

  3. [3]

    Claude Sonnet 4, 2025

    Anthropic. Claude Sonnet 4, 2025. https://www.anthropic.com/news/claude-4

  4. [4]

    The prusti project: Formal verification for rust

    Vytautas Astrauskas, Aurel Bíl`y, Jonáš Fiala, Zachary Grannan, Christoph Math- eja, Peter Müller, Federico Poli, and Alexander J Summers. The prusti project: Formal verification for rust. InNASA Formal Methods Symposium, pages 88–108. Springer, 2022

  5. [5]

    Springer Science & Business Media, 2013

    Yves Bertot and Pierre Castéran.Interactive theorem proving and program de- velopment: Coq’Art: the calculus of inductive constructions. Springer Science & Business Media, 2013

  6. [6]

    LangChain, October 2022

    Harrison Chase. LangChain, October 2022. https://github.com/langchain-ai/ langchain

  7. [7]

    Automated proof generation for rust code via self-evolution

    Tianyu Chen, Shuai Lu, Shan Lu, Yeyun Gong, Chenyuan Yang, Xuheng Li, Md Rakib Hossain Misu, Hao Yu, Nan Duan, Peng Cheng, et al. Automated proof generation for rust code via self-evolution. InThe Thirteenth International Conference on Learning Representations (ICLR 25), 2025

  8. [8]

    Atmosphere: Practical verified kernels with rust and verus

    Xiangdong Chen, Zhaofeng Li, Jerry Zhang, Vikram Narayanan, and Anton Burtsev. Atmosphere: Practical verified kernels with rust and verus. SOSP ’25, page 752–767, New York, NY, USA, 2025. Association for Computing Machinery

  9. [9]

    Z3: An efficient smt solver

    Leonardo De Moura and Nikolaj Bjørner. Z3: An efficient smt solver. InInter- national conference on Tools and Algorithms for the Construction and Analysis of Systems, pages 337–340. Springer, 2008

  10. [10]

    Creusot: A foundry for the deductive verification of rust programs

    Xavier Denis, Jacques-Henri Jourdan, and Claude Marché. Creusot: A foundry for the deductive verification of rust programs. InInternational Conference on Formal Engineering Methods, pages 90–105. Springer, 2022

  11. [11]

    CertiKOS: An extensible architecture for building certified concurrent OS kernels

    Ronghui Gu, Zhong Shao, Hao Chen, Xiongnan Newman Wu, Jieung Kim, Vilhelm Sjöberg, and David Costanzo. CertiKOS: An extensible architecture for building certified concurrent OS kernels. In12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), pages 653–669, 2016

  12. [12]

    Leaf: Modularity for temporary sharing in separation logic.Proceedings of the ACM on Programming Languages, 7(OOPSLA2):31–58, 2023

    Travis Hance, Jon Howell, Oded Padon, and Bryan Parno. Leaf: Modularity for temporary sharing in separation logic.Proceedings of the ACM on Programming Languages, 7(OOPSLA2):31–58, 2023

  13. [13]

    Shard- ing the state machine: Automated modular reasoning for complex concurrent systems

    Travis Hance, Yi Zhou, Andrea Lattuada, Reto Achermann, Alex Conway, Ryan Stutsman, Gerd Zellweger, Chris Hawblitzel, Jon Howell, and Bryan Parno. Shard- ing the state machine: Automated modular reasoning for complex concurrent systems. In17th USENIX Symposium on Operating Systems Design and Implemen- tation (OSDI 23), pages 911–929, 2023

  14. [14]

    Miri: Practical undefined behavior detection for rust

    Ralf Jung, Benjamin Kimock, Christian Poveda, Eduardo Sánchez Muñoz, Oli Scherer, and Qian Wang. Miri: Practical undefined behavior detection for rust. Proc. ACM Program. Lang., 10(POPL), January 2026

  15. [15]

    Iris: Monoids and invariants as an orthogonal basis for concurrent reasoning.ACM SIGPLAN Notices, 50(1):637–650, 2015

    Ralf Jung, David Swasey, Filip Sieczkowski, Kasper Svendsen, Aaron Turon, Lars Birkedal, and Derek Dreyer. Iris: Monoids and invariants as an orthogonal basis for concurrent reasoning.ACM SIGPLAN Notices, 50(1):637–650, 2015. KVerus: Scalable and Resilient Formal Verification Proof Generation for Rust Code Conference’17, July 2017, Washington, DC, USA

  16. [16]

    seL4: Formal verification of an OS kernel

    Gerwin Klein, Kevin Elphinstone, Gernot Heiser, June Andronick, David Cock, Philip Derrin, Dhammika Elkaduwe, Kai Engelhardt, Rafal Kolanski, Michael Norrish, et al. seL4: Formal verification of an OS kernel. InProceedings of the ACM SIGOPS 22nd symposium on Operating systems principles, pages 207–220, 2009

  17. [17]

    Verus: A practical foundation for systems verification

    Andrea Lattuada, Travis Hance, Jay Bosamiya, Matthias Brun, Chanhee Cho, Hay- ley LeBlanc, Pranav Srinivasan, Reto Achermann, Tej Chajed, Chris Hawblitzel, et al. Verus: A practical foundation for systems verification. InProceedings of the ACM SIGOPS 30th Symposium on Operating Systems Principles, pages 438–454, 2024

  18. [18]

    Verus: Verifying rust programs using linear ghost types.Proceedings of the ACM on Programming Languages, 7(OOPSLA1):286–315, 2023

    Andrea Lattuada, Travis Hance, Chanhee Cho, Matthias Brun, Isitha Subasinghe, Yi Zhou, Jon Howell, Bryan Parno, and Chris Hawblitzel. Verus: Verifying rust programs using linear ghost types.Proceedings of the ACM on Programming Languages, 7(OOPSLA1):286–315, 2023

  19. [19]

    Dafny: An automatic program verifier for functional correct- ness

    K Rustan M Leino. Dafny: An automatic program verifier for functional correct- ness. InInternational conference on logic for programming artificial intelligence and reasoning, pages 348–370. Springer, 2010

  20. [20]

    Trigger selection strategies to stabi- lize program verifiers

    K Rustan M Leino and Clément Pit-Claudel. Trigger selection strategies to stabi- lize program verifiers. InInternational Conference on Computer Aided Verification, pages 361–381. Springer, 2016

  21. [21]

    Compcert-a formally verified optimizing compiler

    Xavier Leroy, Sandrine Blazy, Daniel Kästner, Bernhard Schommer, Markus Pister, and Christian Ferdinand. Compcert-a formally verified optimizing compiler. In ERTS 2016: Embedded Real Time Software and Systems, 8th European Congress, 2016

  22. [22]

    Linear types for large-scale systems verification.Proceed- ings of the ACM on Programming Languages, 6(OOPSLA1):1–28, 2022

    Jialin Li, Andrea Lattuada, Yi Zhou, Jonathan Cameron, Jon Howell, Bryan Parno, and Chris Hawblitzel. Linear types for large-scale systems verification.Proceed- ings of the ACM on Programming Languages, 6(OOPSLA1):1–28, 2022

  23. [23]

    Proof automation with large language models

    Minghai Lu, Benjamin Delaware, and Tianyi Zhang. Proof automation with large language models. InProceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, pages 1509–1520, 2024

  24. [24]

    Specgen: Automated generation of formal program specifications via large language models

    Lezhi Ma, Shangqing Liu, Yi Li, Xiaofei Xie, and Lei Bu. Specgen: Automated generation of formal program specifications via large language models. In2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE), pages 16–28. IEEE, 2025

  25. [25]

    The lean 4 theorem prover and programming language

    Leonardo de Moura and Sebastian Ullrich. The lean 4 theorem prover and programming language. InInternational Conference on Automated Deduction, pages 625–635. Springer, 2021

  26. [26]

    Laurel: Unblocking automated verification with large language models.Proceedings of the ACM on Programming Languages, 9(OOPSLA1):1519– 1545, 2025

    Eric Mugnier, Emmanuel Anaya Gonzalez, Nadia Polikarpova, Ranjit Jhala, and Zhou Yuanyuan. Laurel: Unblocking automated verification with large language models.Proceedings of the ACM on Programming Languages, 9(OOPSLA1):1519– 1545, 2025

  27. [27]

    Springer, 2002

    Tobias Nipkow, Markus Wenzel, and Lawrence C Paulson.Isabelle/HOL: a proof assistant for higher-order logic. Springer, 2002

  28. [28]

    Asterinas: A Linux ABI-Compatible, Rust-Based Framekernel OS with a Small and Sound TCB

    Yuke Peng, Hongliang Tian, Zhang Junyang, Ruihan Li, Chengjun Chen, Jianfeng Jiang, Jinyi Xian, Xiaolin Wang, Chenren Xu, Diyu Zhou, et al. Asterinas: A Linux ABI-Compatible, Rust-Based Framekernel OS with a Small and Sound TCB. In2025 USENIX Annual Technical Conference (USENIX ATC 25), 2025

  29. [29]

    Owlc: Compiling security pro- tocols to verified, secure, high-performance libraries.Cryptology ePrint Archive, 2025

    Pratap Singh, Joshua Gancher, and Bryan Parno. Owlc: Compiling security pro- tocols to verified, secure, high-performance libraries.Cryptology ePrint Archive, 2025

  30. [30]

    Anvil: Verifying liveness of cluster management controllers

    Xudong Sun, Wenjie Ma, Jiawei Tyler Gu, Zicheng Ma, Tej Chajed, Jon Howell, Andrea Lattuada, Oded Padon, Lalith Suresh, Adriana Szekeres, et al. Anvil: Verifying liveness of cluster management controllers. In18th USENIX Symposium on Operating Systems Design and Implementation (OSDI 24), pages 649–666, 2024

  31. [31]

    Mirai: an abstract interpreter for the rust compiler’s mid-level intermediate representation (mir)., 2025

    MIRAI Developement Team. Mirai: an abstract interpreter for the rust compiler’s mid-level intermediate representation (mir)., 2025. https://github.com/endorlabs/ MIRAI

  32. [32]

    The rocq proof assistant v9.1.0, 2025

    The Rocq Development Team. The rocq proof assistant v9.1.0, 2025. https://rocq- prover.org

  33. [33]

    A verus compiler front-end for ides, 2026

    The Verus Developement Team. A verus compiler front-end for ides, 2026. https://github.com/verus-lang/verus-analyzer

  34. [34]

    Rango: Adap- tive retrieval-augmented proving for automated software verification

    Kyle Thompson, Nuno Saavedra, Pedro Carrott, Kevin Fisher, Alex Sanchez- Stern, Yuriy Brun, João F Ferreira, Sorin Lerner, and Emily First. Rango: Adap- tive retrieval-augmented proving for automated software verification. In2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE), pages 347–359. IEEE, 2025

  35. [35]

    Verifying dynamic trait objects in rust

    Alexa VanHattum, Daniel Schwartz-Narbonne, Nathan Chong, and Adrian Samp- son. Verifying dynamic trait objects in rust. InProceedings of the 44th International Conference on Software Engineering: Software Engineering in Practice, pages 321– 330, 2022

  36. [36]

    Unsafecop: Towards memory safety for real-world unsafe rust co de with p ractical bounded model checking

    Minghua Wang, Jingling Xue, Lin Huang, Yuan Zi, and Tao Wei. Unsafecop: Towards memory safety for real-world unsafe rust co de with p ractical bounded model checking. InInternational Symposium on Formal Methods, pages 307–324. Springer, 2024

  37. [37]

    Autoverus: Automated proof generation for rust code.Proceedings of the ACM on Programming Languages, 9(OOPSLA2):3454–3482, 2025

    Chenyuan Yang, Xuheng Li, Md Rakib Hossain Misu, Jianan Yao, Weidong Cui, Yeyun Gong, Chris Hawblitzel, Shuvendu Lahiri, Jacob R Lorch, Shuai Lu, et al. Autoverus: Automated proof generation for rust code.Proceedings of the ACM on Programming Languages, 9(OOPSLA2):3454–3482, 2025

  38. [38]

    CortenMM: Efficient Memory Management with Strong Correctness Guarantees

    Junyang Zhang, Xiangcan Xu, Yonghao Zou, Zhe Tang, Xinyi Wan, Kang Hu, Siyuan Wang, Wenbo Xu, Di Wang, Hao Chen, Hongliang Tian, Lin Huang, Shoumeng Yan, Yuval Tamir, Yingwei Luo, Xiaolin Wang, Huashan Yu, Zhenlin Wang, and Diyu Zhou. CortenMM: Efficient Memory Management with Strong Correctness Guarantees. InProceedings of the ACM SIGOPS 31th Symposium o...

  39. [39]

    Vst-a: A foundationally sound annotation verifier.Proceedings of the ACM on Programming Languages (POPL), 8:2069–2098, 2024

    Litao Zhou, Jianxing Qin, Qinshi Wang, Andrew W Appel, and Qinxiang Cao. Vst-a: A foundationally sound annotation verifier.Proceedings of the ACM on Programming Languages (POPL), 8:2069–2098, 2024

  40. [40]

    VeriSMo: A verified security module for confidential VMs

    Ziqiao Zhou, Weiteng Chen, Sishuai Gong, Chris Hawblitzel, Weidong Cui, et al. VeriSMo: A verified security module for confidential VMs. In18th USENIX Symposium on Operating Systems Design and Implementation (OSDI 24), pages 599–614, 2024. A Proof Failures Caused by Incorrect Specifications While verifying the state machine of Asterinas’ RCU (Read-Copy- U...

  41. [41]

    We highlight the corrected lines added by human expert

    ]; 18} 19} Listing 7: Proof failure due to incorrect specification of protocol_lock_skip. We highlight the corrected lines added by human expert. Conference’17, July 2017, Washington, DC, USA Yuwei Liu, Xinyi Wan, Yanhao Wang, Minghua Wang, Lin Huang, and Tao Wei 1#[inductive(initialize)] 2fninitialize_inductive(post:Self, cpu_num:CpuId) { 3assert(post.wf...