Recognition: no theorem link
ENCRUST: Encapsulated Substitution and Agentic Refinement on a Live Scaffold for Safe C-to-Rust Translation
Pith reviewed 2026-05-10 20:33 UTC · model grok-4.3
The pith
ENCRUST uses ABI-preserving wrappers and agentic refinement to translate real-world C programs to safe Rust while preserving test correctness and reducing unsafe constructs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ENCRUST decouples boundary adaptation from function logic via an ABI-preserving wrapper that splits each function into a caller-transparent shim retaining the original raw-pointer signature and a safe inner function targeted by the LLM with a clean prompt. A deterministic type-directed elimination pass then removes the wrappers after successful translation. Phase two uses an LLM agent operating on the whole integrated codebase under a baseline-aware verification gate to resolve remaining unsafe constructs such as static mut globals, skipped wrapper pairs, and failed translations. Evaluation on 7 GNU Coreutils programs and 8 libraries from the Laertes benchmark shows substantial unsafe-constr
What carries the argument
The ABI-preserving wrapper pattern that maintains original signatures for callers while exposing clean signatures to the LLM for safe inner functions, enabling independent per-function translation and rollback before whole-program agentic refinement.
If this is right
- Functions can receive new type signatures independently without forcing updates to every caller.
- Any per-function translation that fails triggers automatic rollback while leaving the rest of the project intact.
- Unsafe constructs that span multiple units, such as static mutable globals, become addressable through agentic whole-program reasoning.
- The final Rust output remains compilable under real dependency graphs and matches original behavior on all test vectors.
- Safety gains apply uniformly to both utility programs and reusable libraries in the evaluated set.
Where Pith is reading between the lines
- The wrapper-and-elimination pattern could be reused for other interface-preserving refactorings even without an LLM.
- Combining the verification gate with existing static analyzers might catch issues the current test vectors miss.
- The approach could be tested on programs known to contain memory bugs to check whether the process removes the original defects.
- Developers might inspect intermediate scaffold states to steer the agent when the automated path stalls on large codebases.
Load-bearing premise
The LLM will consistently generate code that is semantically equivalent to the original C and free of new memory-safety problems when guided by the wrapper prompts and verification gate.
What would settle it
A program that passes every original test vector after translation but is later shown by a tool such as Miri to contain a use-after-free or data race.
Figures
read the original abstract
We present Encapsulated Substitution and Agentic Refinement on a Live Scaffold for Safe C-to-Rust Translation, a two-phase pipeline for translating real-world C projects to safe Rust. Existing approaches either produce unsafe output without memory-safety guarantees or translate functions in isolation, failing to detect cross-unit type mismatches or handle unsafe constructs requiring whole-program reasoning. Furthermore, function-level LLM pipelines require coordinated caller updates when type signatures change, while project-scale systems often fail to produce compilable output under real-world dependency complexity. Encrust addresses these limitations by decoupling boundary adaptation from function logic via an Application Binary Interface (ABI)-preserving wrapper pattern and validating each intermediate state against the integrated codebase. Phase 1 (Encapsulated Substitution) translates each function using an ABI-preserving wrapper that splits it into two components: a caller-transparent shim retaining the original raw-pointer signature, and a safe inner function targeted by the LLM with a clean, scope-limited prompt. This enables independent per-function type changes with automatic rollback on failure, without coordinated caller updates. A deterministic, type-directed wrapper elimination pass then removes wrappers after successful translation. Phase 2 (Agentic Refinement) resolves unsafe constructs beyond per-function scope, including static mut globals, skipped wrapper pairs, and failed translations, using an LLM agent operating on the whole codebase under a baseline-aware verification gate. We evaluate Encrust on 7 GNU Coreutils programs and 8 libraries from the Laertes benchmark, showing substantial unsafe-construct reduction across all 15 programs while maintaining full test-vector correctness.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents ENCRUST, a two-phase pipeline for safe C-to-Rust translation of real-world programs. Phase 1 (Encapsulated Substitution) employs ABI-preserving wrappers to decouple caller shims from LLM-targeted safe inner functions, enabling independent per-function translation and deterministic wrapper elimination. Phase 2 (Agentic Refinement) uses an LLM agent on the whole codebase to resolve remaining unsafe constructs (e.g., static mut globals) under a baseline-aware verification gate. Evaluation on 7 GNU Coreutils programs and 8 Laertes libraries reports substantial unsafe-construct reduction across all 15 programs while preserving full test-vector correctness.
Significance. If the results hold, this provides a practical systems contribution to automated C-to-Rust translation by addressing cross-unit type mismatches and whole-program unsafe constructs without requiring coordinated manual updates. The encapsulated wrapper pattern and live-scaffold validation are strengths that improve over isolated function translation or fully unsafe outputs in prior work. The empirical evaluation on established benchmarks like Coreutils and Laertes supplies concrete, reproducible evidence of applicability to non-trivial codebases.
major comments (1)
- [Evaluation] Evaluation section: The central claim that the pipeline produces memory-safe, semantically equivalent Rust code rests on 'full test-vector correctness' after wrapper elimination and agentic refinement. However, the manuscript provides no quantitative details on test coverage (e.g., line/branch coverage percentages), differential testing against the original C, or use of tools such as Miri to detect latent undefined behavior or new memory-safety violations in unexercised paths. This is load-bearing because real-world C programs frequently contain pointer arithmetic and cross-unit interactions where test suites offer only partial coverage; passing tests alone does not rule out semantic drift introduced by the LLM phases.
minor comments (2)
- [Abstract] The abstract and introduction could more explicitly quantify the unsafe-construct reduction (e.g., average percentage drop or per-program counts) rather than stating 'substantial' to strengthen the impact statement.
- [Phase 1 description] Notation for the ABI wrapper components (shim vs. inner function) is introduced clearly in the text but would benefit from a small diagram or pseudocode listing in the Phase 1 description for readers unfamiliar with the pattern.
Simulated Author's Rebuttal
We thank the referee for the positive assessment of ENCRUST's contributions and the constructive major comment on evaluation. We address the concern directly below and will revise the manuscript to improve transparency on verification scope.
read point-by-point responses
-
Referee: [Evaluation] Evaluation section: The central claim that the pipeline produces memory-safe, semantically equivalent Rust code rests on 'full test-vector correctness' after wrapper elimination and agentic refinement. However, the manuscript provides no quantitative details on test coverage (e.g., line/branch coverage percentages), differential testing against the original C, or use of tools such as Miri to detect latent undefined behavior or new memory-safety violations in unexercised paths. This is load-bearing because real-world C programs frequently contain pointer arithmetic and cross-unit interactions where test suites offer only partial coverage; passing tests alone does not rule out semantic drift introduced by the LLM phases.
Authors: We agree that test-vector correctness alone provides only partial evidence of semantic equivalence and does not fully rule out latent issues in unexercised paths. The evaluation used the standard, extensive test suites of the GNU Coreutils and Laertes benchmarks, which are the established validation mechanisms for these programs. To strengthen the manuscript, we will revise the Evaluation section to report available coverage statistics from the benchmark documentation and add an explicit discussion of limitations, including the lack of Miri runs and differential testing. The live-scaffold validation and baseline-aware gate in Phase 2 already enforce per-step test passage against the original C, which mitigates some drift risk, but we will clarify that this does not replace broader static or dynamic analysis techniques. revision: partial
Circularity Check
No circularity: empirical systems evaluation with independent test-based claims
full rationale
The paper describes a two-phase LLM-assisted C-to-Rust translation pipeline (Encapsulated Substitution followed by Agentic Refinement) and supports its claims solely through empirical evaluation on 15 external programs (7 GNU Coreutils + 8 Laertes libraries). No mathematical derivations, fitted parameters, self-referential definitions, or load-bearing self-citations appear in the provided text. The central result—unsafe-construct reduction while preserving test-vector correctness—is presented as an observed outcome of running the pipeline, not as a quantity derived from or equivalent to its own inputs by construction. The evaluation uses standard external benchmarks and test suites whose coverage properties are independent of the paper's method. This is a standard applied-systems paper whose validity rests on falsifiable experimental outcomes rather than any closed logical loop.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption LLM agents can reliably translate and refine C constructs to safe Rust when provided with ABI-preserving wrappers and baseline verification gates
Reference graph
Works this paper leans on
-
[1]
Periklis Akritidis et al . 2010. Cling: A memory allocator to mitigate dangling pointers. In19th USENIX Security Symposium (USENIX Security 10)
2010
-
[2]
Xuemeng Cai, Jiakun Liu, Xiping Huang, Yijun Yu, Haitao Wu, Chunmiao Li, Bo Wang, Imam Nur Bani Yusuf, and Lingxiao Jiang. 2025. Rustmap: Towards project-scale c-to-rust migration via program analysis and llm. InInternational Conference on Engineering of Complex Computer Systems. Springer, 283–302
2025
- [3]
-
[4]
Mehmet Emre, Ryan Schroeder, Kyle Dewey, and Ben Hardekopf. 2021. Translating C to safer Rust.Proceedings of the ACM on Programming Languages5, OOPSLA (2021), 1–29
2021
- [5]
-
[6]
Muhammad Farrukh, Smeet Shah, Baris Coskun, and Michalis Polychronakis. 2025. Safetrans: Llm-assisted transpilation from c to rust.arXiv preprint arXiv:2505.10708(2025)
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[7]
Andrea Fioraldi, Dominik Maier, Heiko Eißfeldt, and Marc Heuse. 2020. {AFL++}: Combining incremental steps of fuzzing research. In14th USENIX workshop on offensive technologies (WOOT 20)
2020
-
[8]
Galois. 2018. C2Rust. https://galois.com/blog/2018/08/c2rust/
2018
-
[9]
Jaemin Hong and Sukyoung Ryu. 2025. Type-migrating C-to-Rust translation using a large language model.Empirical Software Engineering30, 1 (2025), 3
2025
-
[10]
Ralf Jung, Jacques-Henri Jourdan, Robbert Krebbers, and Derek Dreyer. 2018. RustBelt: Securing the Foundations of the Rust Programming Language.Proceedings of the ACM on Programming Languages2, POPL (2018), 66:1–66:34. doi:10.1145/3158154
-
[11]
Michael Ling, Yijun Yu, Haitao Wu, Yuan Wang, James R Cordy, and Ahmed E Hassan. 2022. In rust we trust: a transpiler from unsafe c to safer rust. InProceedings of the ACM/IEEE 44th international conference on software engineering: companion proceedings. 354–355
2022
- [12]
-
[13]
Feng Luo, Kexing Ji, Cuiyun Gao, Shuzheng Gao, Jia Feng, Kui Liu, Xin Xia, and Michael R Lyu. 2025. Integrating Rules and Semantics for LLM-Based C-to-Rust Translation. In2025 IEEE International Conference on Software Maintenance and Evolution (ICSME). IEEE, 685–696
2025
-
[14]
Nicholas D. Matsakis and Felix S. Klock. 2014. The Rust Language. InProceedings of the 2014 ACM SIGAda Annual Conference on High Integrity Language Technology (HILT ’14). ACM, 103–104. doi:10.1145/2663171.2663188
-
[15]
Vikram Nitin, Rahul Krishna, Luiz Lemos do Valle, and Baishakhi Ray. 2025. C2 SAFERRUST: Transforming C Projects into Safer Rust with NeuroSymbolic Techniques.IEEE Transactions on Software Engineering(2025)
2025
-
[16]
Oleksii Oleksenko, Dmitrii Kuvaiskii, Pramod Bhatotia, Pascal Felber, and Christof Fetzer. 2017. Intel MPX explained: An empirical study of intel MPX and software-based bounds checking approaches.arXiv preprint arXiv:1702.00719 (2017). Proc. ACM Program. Lang., Vol. 1, No. 1, Article . Publication date: April 2018. ENCRUST: Encapsulated Substitution and A...
- [17]
- [18]
-
[19]
László Szekeres, Mathias Payer, Tao Wei, and Dawn Song. 2013. SoK: Eternal War in Memory. InProceedings of the 2013 IEEE Symposium on Security and Privacy (SP ’13). IEEE Computer Society, 48–62. doi:10.1109/SP.2013.13
-
[20]
Chaofan Wang, Tingrui Yu, Beijun Shen, Jie Wang, Dong Chen, Wenrui Zhang, Yuling Shi, Chen Xie, and Xiaodong Gu
- [21]
- [22]
-
[23]
Hanliang Zhang, Cristina David, Yijun Yu, and Meng Wang. 2023. Ownership guided C to Rust translation. In International Conference on Computer Aided Verification. Springer, 459–482
2023
-
[24]
Tianyang Zhou, Ziyi Zhang, Haowen Lin, Somesh Jha, Mihai Christodorescu, Kirill Levchenko, and Varun Chan- drasekaran. 2025. SACTOR: LLM-Driven Correct and Idiomatic C to Rust Translation with Static Analysis and FFI-Based Verification.arXiv preprint arXiv:2503.12511(2025). Received 20 February 2007; revised 12 March 2009; accepted 5 June 2009 Proc. ACM P...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.