arxiv: 2604.04527 · v1 · submitted 2026-04-06 · 💻 cs.SE · cs.AI · cs.PL

Recognition: no theorem link

ENCRUST: Encapsulated Substitution and Agentic Refinement on a Live Scaffold for Safe C-to-Rust Translation

HoHyun Sim, Hyeonjoong Cho, Ali Shokri, Zhoulai Fu, Binoy Ravindran


Pith reviewed 2026-05-10 20:33 UTC · model grok-4.3

classification 💻 cs.SE · cs.AI · cs.PL

keywords C-to-Rust translation · memory safety · LLM agent · ABI wrapper · unsafe code reduction · program migration · software modernization · Rust refactoring


The pith

ENCRUST uses ABI-preserving wrappers and agentic refinement to translate real-world C programs to safe Rust while preserving test correctness and reducing unsafe constructs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a two-phase pipeline for converting complex C code into memory-safe Rust. The first phase wraps each function so that callers see the original interface while an LLM rewrites a safe inner version, allowing changes without coordinated updates elsewhere. The second phase deploys an LLM agent to fix any remaining unsafe elements that cross function boundaries, always checking the result against the original test suite. A sympathetic reader would care because large C codebases are difficult and risky to rewrite by hand, and leaving unsafe blocks in Rust still exposes programs to memory errors. If the pipeline succeeds as described, it provides a structured way to modernize legacy code with fewer safety compromises.

Core claim

ENCRUST decouples boundary adaptation from function logic via an ABI-preserving wrapper. The wrapper splits each function into two parts. The first is a caller-transparent shim that retains the original raw-pointer signature. The second is a safe inner function, which the LLM targets with a clean prompt. A deterministic, type-directed elimination pass then removes the wrappers after successful translation. Phase two uses an LLM agent operating on the whole integrated codebase under a baseline-aware verification gate. The agent resolves remaining unsafe constructs such as static mut globals, skipped wrapper pairs, and failed translations. Evaluation on 7 GNU Coreutils programs and 8 libraries from the Laertes benchmark shows substantial unsafe-construct reduction across all 15 programs while maintaining full test-vector correctness.
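The following is a minimal sketch of what a baseline-aware verification gate could check. The paper does not spell out the gate's interface here, so several pieces are assumptions made for illustration:

- the `Snapshot` type;
- the `gate_accepts` function;
- the acceptance rule, under which a candidate must compile, keep every test the baseline already passed, and not raise the unsafe-construct count.

```rust
use std::collections::BTreeSet;

/// Result of building and testing one state of the integrated crate.
/// Hypothetical type for illustration; not the paper's data model.
#[derive(Debug, Clone)]
pub struct Snapshot {
    pub compiles: bool,
    pub passing_tests: BTreeSet<String>,
    pub unsafe_count: usize,
}

/// Assumed acceptance rule: the candidate must compile, must not lose any
/// test that already passed at the baseline (tests failing at baseline are
/// not required to pass), and must not increase the unsafe-construct count.
pub fn gate_accepts(baseline: &Snapshot, candidate: &Snapshot) -> bool {
    candidate.compiles
        && baseline.passing_tests.is_subset(&candidate.passing_tests)
        && candidate.unsafe_count <= baseline.unsafe_count
}
```

Comparing against the baseline's passing set, rather than requiring every test to pass, is what would let such a gate accept edits on a codebase whose original test run was not fully green.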

What carries the argument

The ABI-preserving wrapper pattern that maintains original signatures for callers while exposing clean signatures to the LLM for safe inner functions, enabling independent per-function translation and rollback before whole-program agentic refinement.
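The wrapper pattern can be pictured with a small hypothetical C function, `int sum_array(const int *arr, size_t len)`. The sketch below assumes a slice-based safe signature. The function names and the choice of `&[c_int]` are illustrative and are not taken from the paper.

The shim keeps the raw-pointer signature, so existing callers are untouched. All pointer-to-slice adaptation stays in the shim, which leaves the safe inner function as the only part an LLM would need to write.

```rust
use std::os::raw::c_int;

/// Safe inner function: the part the LLM is prompted to write. It sees a
/// slice, so no raw pointer or separate length appears in its signature.
pub fn sum_array_safe(values: &[c_int]) -> c_int {
    values.iter().sum()
}

/// Caller-transparent shim: keeps the original C-style signature, so callers
/// elsewhere in the crate compile unchanged while the inner function evolves.
///
/// # Safety
/// `arr` must be valid for reads of `len` consecutive `c_int`s, or `len` must be 0.
pub unsafe extern "C" fn sum_array(arr: *const c_int, len: usize) -> c_int {
    // `slice::from_raw_parts` must not receive a null pointer, even for len 0.
    if arr.is_null() || len == 0 {
        return sum_array_safe(&[]);
    }
    // Boundary adaptation lives here, not in the safe function.
    let values = unsafe { std::slice::from_raw_parts(arr, len) };
    sum_array_safe(values)
}
```

C callers must still link against the shim, so a production version would also need to keep the original exported symbol name, for example via a `no_mangle` attribute. That detail is omitted here.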

If this is right

Functions can receive new type signatures independently without forcing updates to every caller.
Any per-function translation that fails triggers automatic rollback while leaving the rest of the project intact.
Unsafe constructs that span multiple units, such as static mutable globals, become addressable through agentic whole-program reasoning.
The final Rust output remains compilable under real dependency graphs and matches original behavior on all test vectors.
Safety gains apply uniformly to both utility programs and reusable libraries in the evaluated set.
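For the static-mutable-global case, one common safe replacement is an atomic for scalar state and a `Mutex` for aggregate state. This choice is an assumption; the paper's agent may choose differently. The globals `COUNTER` and `LAST_NAME` and their C counterparts are hypothetical.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Mutex;

// Hypothetical C globals being replaced:
//   static unsigned int counter;   /* bumped by several functions */
//   static char last_name[64];     /* shared scratch buffer */
// A literal translation yields `static mut` items, and every read or write of
// a `static mut` needs an `unsafe` block at the use site.

/// Scalar global: an atomic gives safe shared mutation without `unsafe`.
static COUNTER: AtomicU32 = AtomicU32::new(0);

/// Equivalent of `++counter`: returns the incremented value, wrapping on
/// overflow like C unsigned arithmetic. Relaxed ordering suffices because the
/// counter is not used to publish other data.
pub fn bump_counter() -> u32 {
    COUNTER.fetch_add(1, Ordering::Relaxed).wrapping_add(1)
}

/// Aggregate global: a Mutex serialises access to non-atomic state.
static LAST_NAME: Mutex<String> = Mutex::new(String::new());

pub fn set_last_name(name: &str) {
    let mut guard = LAST_NAME.lock().unwrap();
    guard.clear();
    guard.push_str(name);
}

pub fn last_name() -> String {
    LAST_NAME.lock().unwrap().clone()
}
```

Each such replacement touches every function that reads or writes the global, possibly across files. That cross-unit reach is why the change needs whole-program reasoning rather than a per-function prompt.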

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The wrapper-and-elimination pattern could be reused for other interface-preserving refactorings even without an LLM.
Combining the verification gate with existing static analyzers might catch issues the current test vectors miss.
The approach could be tested on programs known to contain memory bugs to check whether the process removes the original defects.
Developers might inspect intermediate scaffold states to steer the agent when the automated path stalls on large codebases.

Load-bearing premise

The LLM will consistently generate code that is semantically equivalent to the original C and free of new memory-safety problems when guided by the wrapper prompts and verification gate.

What would settle it

A program that passes every original test vector after translation but is later shown by a tool such as Miri to contain a use-after-free or data race.

Figures

Figures reproduced from arXiv: 2604.04527 by Ali Shokri, Binoy Ravindran, HoHyun Sim, Hyeonjoong Cho, Zhoulai Fu.

**Figure 1.** Encrust two-phase translation pipeline. Phase 1 (left): for each function, the LLM generates a wrapper/safe-function pair verified through a compile-and-test loop; successfully translated pairs are collected into rust_safe/. After all functions are translated, type-directed wrapper elimination rewrites call sites to invoke safe functions directly, yielding a wrapper-free crate rust_safe_remap/. Phase 2 (ri…
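The per-function loop in Phase 1 can be sketched as a driver with three behaviours:

- it asks a generator for candidates;
- it feeds the previous failure back as feedback;
- it rolls a function back to its original form after a fixed attempt budget.

Everything here is hypothetical: `phase1`, `Check`, `Phase1Report`, the `max_attempts` budget, and the closures standing in for the LLM call and the compile-and-test step.

```rust
/// Outcome of compiling and testing one candidate pair (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Check {
    Pass,
    CompileError,
    TestFailure,
}

/// Which functions were kept as translated pairs and which were rolled back.
#[derive(Debug, Default)]
pub struct Phase1Report {
    pub translated: Vec<String>,
    pub rolled_back: Vec<String>,
}

/// For each function, request up to `max_attempts` candidates from `generate`,
/// passing the previous failure as feedback, and keep the first candidate that
/// `check` accepts. Functions with no accepted candidate are rolled back,
/// which leaves the rest of the crate untouched.
pub fn phase1<G, C>(functions: &[&str], max_attempts: usize, mut generate: G, mut check: C) -> Phase1Report
where
    G: FnMut(&str, Option<Check>) -> String,
    C: FnMut(&str, &str) -> Check,
{
    let mut report = Phase1Report::default();
    for &name in functions {
        let mut feedback = None;
        let mut accepted = false;
        for _ in 0..max_attempts {
            let candidate = generate(name, feedback);
            match check(name, &candidate) {
                Check::Pass => {
                    accepted = true;
                    break;
                }
                failure => feedback = Some(failure),
            }
        }
        if accepted {
            report.translated.push(name.to_string());
        } else {
            report.rolled_back.push(name.to_string());
        }
    }
    report
}
```

Passing the generator and checker as closures keeps the control flow testable without an LLM or a compiler in the loop. In a real pipeline they would wrap model calls and build/test runs.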

Original abstract

We present Encapsulated Substitution and Agentic Refinement on a Live Scaffold for Safe C-to-Rust Translation, a two-phase pipeline for translating real-world C projects to safe Rust. Existing approaches either produce unsafe output without memory-safety guarantees or translate functions in isolation, failing to detect cross-unit type mismatches or handle unsafe constructs requiring whole-program reasoning. Furthermore, function-level LLM pipelines require coordinated caller updates when type signatures change, while project-scale systems often fail to produce compilable output under real-world dependency complexity. Encrust addresses these limitations by decoupling boundary adaptation from function logic via an Application Binary Interface (ABI)-preserving wrapper pattern and validating each intermediate state against the integrated codebase. Phase 1 (Encapsulated Substitution) translates each function using an ABI-preserving wrapper that splits it into two components: a caller-transparent shim retaining the original raw-pointer signature, and a safe inner function targeted by the LLM with a clean, scope-limited prompt. This enables independent per-function type changes with automatic rollback on failure, without coordinated caller updates. A deterministic, type-directed wrapper elimination pass then removes wrappers after successful translation. Phase 2 (Agentic Refinement) resolves unsafe constructs beyond per-function scope, including static mut globals, skipped wrapper pairs, and failed translations, using an LLM agent operating on the whole codebase under a baseline-aware verification gate. We evaluate Encrust on 7 GNU Coreutils programs and 8 libraries from the Laertes benchmark, showing substantial unsafe-construct reduction across all 15 programs while maintaining full test-vector correctness.
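What type-directed wrapper elimination does to a call site can be illustrated with a before/after pair. The sketch assumes a rewrite rule that collapses a `(ptr, len)` argument pair built from one slice back into that slice. This is a guess at the kind of pattern such a pass matches, not the paper's actual rule set. `count_newlines` and its callers are hypothetical.

```rust
use std::os::raw::c_char;

/// Safe inner function produced in Phase 1.
fn count_newlines_safe(buf: &[u8]) -> usize {
    buf.iter().filter(|&&b| b == b'\n').count()
}

/// Phase 1 shim with the original C-style signature.
unsafe fn count_newlines(buf: *const c_char, len: usize) -> usize {
    if buf.is_null() || len == 0 {
        return 0;
    }
    let bytes = unsafe { std::slice::from_raw_parts(buf as *const u8, len) };
    count_newlines_safe(bytes)
}

/// Call site before elimination: the caller already holds a slice, but it has
/// to split it into (pointer, length) and enter `unsafe` to reach the shim.
fn lines_before(input: &[u8]) -> usize {
    unsafe { count_newlines(input.as_ptr() as *const c_char, input.len()) }
}

/// Call site after elimination: both arguments came from one slice, so the
/// rewrite hands that slice straight to the safe function and the `unsafe`
/// block disappears. Once no caller references the shim, it can be deleted.
fn lines_after(input: &[u8]) -> usize {
    count_newlines_safe(input)
}
```

Because the rewrite is driven by argument types rather than by an LLM, it can be applied deterministically and checked by the compiler. This matches the abstract's description of the pass as deterministic and type-directed.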

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Desk Editor's Note (private letter to a colleague)

Encrust's ABI-wrapper decoupling plus agentic whole-program cleanup is a practical engineering move past isolated-function translators, though the safety story still rests on test passing rather than stronger verification.


The main thing to know is that this paper describes a two-phase pipeline. The first phase is per-function translation behind ABI-preserving wrappers, which split each function into a raw-pointer shim and a safe inner target. The second is an agentic refinement step that cleans up globals, skipped cases, and failures across the whole codebase. The wrappers let the authors change types independently and roll back without touching callers; a deterministic elimination pass then removes them. That setup directly targets the cross-unit and dependency problems that break simpler LLM pipelines on real projects. They show it on 7 Coreutils programs and 8 Laertes libraries, with big drops in unsafe constructs and all tests still passing. The design choices around the shim/inner split and the live-scaffold validation are the clearest advance over prior isolated or unsafe-output work. The approach also handles the coordination headache that comes with signature changes in a straightforward way.

The soft spot is exactly what the stress test flags: test-vector correctness after refinement does not rule out semantic drift or new memory issues on unexercised paths. Real C code has pointer arithmetic and static state that tests often miss, and the abstract gives no details on Miri runs, differential testing, or failure-mode analysis. The baseline-aware gate is described only at a high level, so we do not yet see mechanical proof that safety is enforced beyond compilation and the given tests.

This is for people building or evaluating automated C-to-Rust migration tools. A reader who wants concrete patterns for LLM-assisted refactoring will find usable ideas here. The paper deserves a serious referee because the problem matters and the approach is grounded in actual limitations of earlier systems. Even so, the evaluation section will need more evidence on coverage and error rates before the safety claims land solidly.

Referee Report

1 major / 2 minor

Summary. The manuscript presents ENCRUST, a two-phase pipeline for safe C-to-Rust translation of real-world programs. Phase 1 (Encapsulated Substitution) employs ABI-preserving wrappers to decouple caller shims from LLM-targeted safe inner functions, enabling independent per-function translation and deterministic wrapper elimination. Phase 2 (Agentic Refinement) uses an LLM agent on the whole codebase to resolve remaining unsafe constructs (e.g., static mut globals) under a baseline-aware verification gate. Evaluation on 7 GNU Coreutils programs and 8 Laertes libraries reports substantial unsafe-construct reduction across all 15 programs while preserving full test-vector correctness.

Significance. If the results hold, this provides a practical systems contribution to automated C-to-Rust translation by addressing cross-unit type mismatches and whole-program unsafe constructs without requiring coordinated manual updates. The encapsulated wrapper pattern and live-scaffold validation are strengths that improve over isolated function translation or fully unsafe outputs in prior work. The empirical evaluation on established benchmarks like Coreutils and Laertes supplies concrete, reproducible evidence of applicability to non-trivial codebases.

major comments (1)

[Evaluation] Evaluation section: The central claim that the pipeline produces memory-safe, semantically equivalent Rust code rests on 'full test-vector correctness' after wrapper elimination and agentic refinement. However, the manuscript provides no quantitative details on test coverage (e.g., line/branch coverage percentages), differential testing against the original C, or use of tools such as Miri to detect latent undefined behavior or new memory-safety violations in unexercised paths. This is load-bearing because real-world C programs frequently contain pointer arithmetic and cross-unit interactions where test suites offer only partial coverage; passing tests alone does not rule out semantic drift introduced by the LLM phases.
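One way to address the referee's differential-testing point is to run the original and translated implementations on the same generated inputs and report the first disagreement. For self-containment, the sketch below compares two Rust closures. Against real C code, the reference side would be the original binary or an FFI call into it. The `first_divergence` function and the xorshift input generator are illustrative assumptions, not part of the paper's evaluation.

```rust
/// Minimal xorshift64 generator so the harness needs no external crates.
struct XorShift64(u64);

impl XorShift64 {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Small alphabet so short random inputs still hit whitespace edge cases.
const ALPHABET: &[u8] = b"ab \n";

/// Feed `cases` random byte strings to both implementations and return the
/// first input on which their outputs differ, or `None` if they always agree.
pub fn first_divergence<F, G>(reference: F, candidate: G, cases: usize, seed: u64) -> Option<Vec<u8>>
where
    F: Fn(&[u8]) -> u64,
    G: Fn(&[u8]) -> u64,
{
    // xorshift never leaves the all-zero state, so force a nonzero seed.
    let mut rng = XorShift64(seed.max(1));
    for _ in 0..cases {
        let len = (rng.next() % 64) as usize;
        let input: Vec<u8> = (0..len)
            .map(|_| ALPHABET[(rng.next() % ALPHABET.len() as u64) as usize])
            .collect();
        if reference(&input[..]) != candidate(&input[..]) {
            return Some(input);
        }
    }
    None
}
```

Agreement on random inputs only raises confidence, and only for the input space the generator reaches. It complements, rather than replaces, coverage measurement and Miri runs on the translated code.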

minor comments (2)

[Abstract] The abstract and introduction could more explicitly quantify the unsafe-construct reduction (e.g., average percentage drop or per-program counts) rather than stating 'substantial' to strengthen the impact statement.
[Phase 1 description] Notation for the ABI wrapper components (shim vs. inner function) is introduced clearly in the text but would benefit from a small diagram or pseudocode listing in the Phase 1 description for readers unfamiliar with the pattern.
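Quantifying the reduction, as the first minor comment asks, needs a counting rule and a percentage. The rule below is a naive lexical approximation chosen for illustration, and the paper's own metric may differ:

- it counts only the `unsafe` keyword;
- it skips `//` comments;
- it ignores block comments, string literals, raw-pointer dereferences, and `static mut`.

```rust
/// Count `unsafe` as a whole identifier, ignoring text after `//` on a line.
/// Hypothetical metric: block comments and string literals are not handled.
pub fn count_unsafe(src: &str) -> usize {
    src.lines()
        .map(|line| line.split("//").next().unwrap_or(""))
        .map(|code| {
            code.split(|c: char| !(c.is_alphanumeric() || c == '_'))
                .filter(|tok| *tok == "unsafe")
                .count()
        })
        .sum()
}

/// Percentage reduction from `before` to `after`; `None` when `before` is 0.
pub fn percent_reduction(before: usize, after: usize) -> Option<f64> {
    if before == 0 {
        return None;
    }
    Some((before as f64 - after as f64) / before as f64 * 100.0)
}
```

For example, a program that goes from 4 counted constructs to 1 shows a 75% reduction. Per-program counts reported alongside such percentages would make the "substantial" claim checkable.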

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the positive assessment of ENCRUST's contributions and the constructive major comment on evaluation. We address the concern directly below and will revise the manuscript to improve transparency on verification scope.


Referee: [Evaluation] Evaluation section: The central claim that the pipeline produces memory-safe, semantically equivalent Rust code rests on 'full test-vector correctness' after wrapper elimination and agentic refinement. However, the manuscript provides no quantitative details on test coverage (e.g., line/branch coverage percentages), differential testing against the original C, or use of tools such as Miri to detect latent undefined behavior or new memory-safety violations in unexercised paths. This is load-bearing because real-world C programs frequently contain pointer arithmetic and cross-unit interactions where test suites offer only partial coverage; passing tests alone does not rule out semantic drift introduced by the LLM phases.

Authors: We agree that test-vector correctness alone provides only partial evidence of semantic equivalence and does not fully rule out latent issues in unexercised paths. The evaluation used the standard, extensive test suites of the GNU Coreutils and Laertes benchmarks, which are the established validation mechanisms for these programs. To strengthen the manuscript, we will revise the Evaluation section to report available coverage statistics from the benchmark documentation and add an explicit discussion of limitations, including the lack of Miri runs and differential testing. The live-scaffold validation and baseline-aware gate in Phase 2 already enforce per-step test passage against the original C, which mitigates some drift risk, but we will clarify that this does not replace broader static or dynamic analysis techniques.

Revision: partial.

Circularity Check

0 steps flagged

No circularity: empirical systems evaluation with independent test-based claims


The paper describes a two-phase LLM-assisted C-to-Rust translation pipeline (Encapsulated Substitution followed by Agentic Refinement) and supports its claims solely through empirical evaluation on 15 external programs (7 GNU Coreutils + 8 Laertes libraries). No mathematical derivations, fitted parameters, self-referential definitions, or load-bearing self-citations appear in the provided text. The central result—unsafe-construct reduction while preserving test-vector correctness—is presented as an observed outcome of running the pipeline, not as a quantity derived from or equivalent to its own inputs by construction. The evaluation uses standard external benchmarks and test suites whose coverage properties are independent of the paper's method. This is a standard applied-systems paper whose validity rests on falsifiable experimental outcomes rather than any closed logical loop.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claims rest on the effectiveness of LLM prompting for code translation and the semantic preservation of the ABI wrapper pattern and elimination pass; no free parameters or invented entities are introduced.

axioms (1)

Domain assumption: LLM agents can reliably translate and refine C constructs to safe Rust when provided with ABI-preserving wrappers and baseline verification gates.
The pipeline depends on LLM performance for both per-function substitution and whole-program fixes.

pith-pipeline@v0.9.0 · 5606 in / 1294 out tokens · 28245 ms · 2026-05-10T20:33:33.004219+00:00 · methodology

