Documentation-Guided Agentic Codebase Migration from C to Rust

Anh Nguyen Hoang; Bach Le; Minh Le-Anh; Nghi D. Q. Bui

arxiv: 2605.14634 · v3 · pith:XP2EYFBBnew · submitted 2026-05-14 · 💻 cs.SE

Pith reviewed 2026-05-20 21:14 UTC · model grok-4.3

keywords: C to Rust migration · agentic code generation · documentation-guided translation · repository-level migration · LLM-based code translation · legacy codebase modernization · Rust memory safety

The pith

Architecture-aware documentation guides agents to migrate entire C repositories to Rust

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces RustPrint, a framework that turns a C codebase into detailed architecture-aware documentation capturing module structure, data flow, APIs, and design rationale, then treats this documentation as a blueprint for coding agents. The agents use the blueprint to plan Rust crates, implement modules, verify compilation, reduce unsafe code, and refine the output by spotting mismatches when new documentation is generated from the Rust side and by running translated tests to catch runtime issues. Experiments on eight real C repositories from 11K to 84K lines show the system produces compilable Rust code for every target under both open-weight and closed-weight LLM backbones, where earlier translators fail entirely. A reader would care because legacy C code often needs memory safety improvements that manual migration cannot scale to large repositories.

Core claim

RustPrint first converts the source C repository into architecture-aware documentation that captures module structure, data flow, APIs, and design rationale. Coding agents then use this documentation as a blueprint to plan Rust crates, implement modules, check for compilability, reduce unsafe code, and iteratively refine the translated code. The system compares the documentation generated from the Rust output to the source documentation to identify mismatches for repair and also translates and runs the original test suites to guide fixes based on runtime failures.
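The control flow described above can be sketched as a small loop. This is a minimal illustration, not RustPrint's implementation: every name (`migrate`, `generate_docs`, `translate`, `compiles`, `diff_docs`, `run_tests`, `repair`) is hypothetical and is passed in as a plain callable. The cap of five iterations mirrors the 0–5 range in the Figure 3 caption, but whether the paper stops there is an assumption.

```python
from typing import Callable, List


def migrate(
    c_repo: str,
    generate_docs: Callable[[str], str],
    translate: Callable[[str], str],
    compiles: Callable[[str], bool],
    diff_docs: Callable[[str, str], List[str]],
    run_tests: Callable[[str], List[str]],
    repair: Callable[[str, List[str]], str],
    max_iters: int = 5,  # assumed cap, not taken from the paper
) -> str:
    """Documentation-guided refinement loop; all callables are hypothetical stand-ins."""
    blueprint = generate_docs(c_repo)   # source documentation acts as the blueprint
    rust_repo = translate(blueprint)    # plan crates and implement modules
    for _ in range(max_iters):
        signals: List[str] = []
        if not compiles(rust_repo):
            signals.append("compile error")
        else:
            # Compare docs regenerated from the Rust side with the blueprint.
            signals += diff_docs(blueprint, generate_docs(rust_repo))
            # Run the translated test suite and collect failures.
            signals += run_tests(rust_repo)
        if not signals:
            break                       # nothing left to repair
        rust_repo = repair(rust_repo, signals)
    return rust_repo
```

Passing the stages in as callables keeps the sketch agnostic about which model or toolchain sits behind each step; only the ordering of the signals (compile first, then documentation and tests) is being illustrated.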

What carries the argument

Architecture-aware documentation generated from the source repository, used as a migration blueprint that agents follow for planning, implementation, compilation checks, and repair via documentation mismatches and test failures.

If this is right

RustPrint produces compilable Rust code for every one of the eight tested C repositories under both open-weight and closed-weight LLM backbones.
With the Kimi-K2-Instruct backbone the system reaches 93.26 percent feature preservation and 95.17 percent cross-evaluation test pass rate, exceeding the agentic Claude Code baseline.
Prior LLM-based translators Self-Repair and EvoC2Rust fail to produce repository-wide compilable output on the same targets.
Documentation mismatches between source and translated versions, together with test-suite failures, supply targeted repair signals that improve the final Rust code.
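To make the idea of documentation mismatches as repair signals concrete, here is a minimal sketch. It assumes the documentation has already been reduced to a mapping from module name to a set of documented API names; that representation, the function `doc_mismatches`, and the sample module and API names are all hypothetical, since the paper's documentation is richer than a name list.

```python
from typing import Dict, List, Set


def doc_mismatches(source: Dict[str, Set[str]], target: Dict[str, Set[str]]) -> List[str]:
    """List APIs documented for the C source but absent from the Rust-side docs."""
    signals = []
    for module in sorted(source):
        missing = source[module] - target.get(module, set())
        for api in sorted(missing):
            signals.append(f"{module}: missing {api}")
    return signals


# Hypothetical documentation summaries for the two sides.
c_docs = {"parser": {"parse_header", "parse_body"}, "io": {"read_all"}}
rust_docs = {"parser": {"parse_header"}, "io": {"read_all"}}
print(doc_mismatches(c_docs, rust_docs))  # ['parser: missing parse_body']
```

Each returned string is a targeted hint an agent could act on, in contrast to a single pass/fail verdict for the whole repository.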

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same documentation-first coordination could be tried for other source-to-target language pairs if comparable architecture documentation can be extracted automatically.
Performance may vary with the fidelity of the initial documentation, so controlled tests that degrade the blueprint detail would reveal how much accuracy is required.
The repair loop that compares generated documentation and runs tests might apply to other agentic coding tasks such as large-scale refactoring or feature addition.

Load-bearing premise

The architecture-aware documentation generated from the source repository accurately captures module structure, data flow, APIs, and design rationale in sufficient detail to serve as an effective migration blueprint that agents can use for planning and repair.

What would settle it

Running the framework on one of the same repositories but with deliberately incomplete or inaccurate documentation and checking whether the agents still produce fully compilable, feature-preserving Rust code would test whether the blueprint quality is essential.
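One way such a degradation experiment could be set up is sketched below. It assumes the blueprint can be summarized as a mapping from module name to documented API names and removes a reproducible random subset of entries. The function `degrade_blueprint`, its parameters, and the mapping format are hypothetical and are not part of the paper.

```python
import random
from typing import Dict, Set


def degrade_blueprint(
    docs: Dict[str, Set[str]], keep_fraction: float, seed: int = 0
) -> Dict[str, Set[str]]:
    """Return a copy of the blueprint with a random subset of documented APIs removed."""
    if not 0.0 <= keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in [0, 1]")
    rng = random.Random(seed)  # fixed seed so a run can be repeated exactly
    entries = sorted((m, a) for m, apis in docs.items() for a in apis)
    keep = set(rng.sample(entries, round(len(entries) * keep_fraction)))
    return {m: {a for a in apis if (m, a) in keep} for m, apis in docs.items()}
```

Sweeping `keep_fraction` from 1.0 downward and re-running the migration at each level would show how quickly compilation and feature preservation fall off as the blueprint loses detail.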

Figures

Figures reproduced from arXiv: 2605.14634 by Anh Nguyen Hoang, Bach Le, Minh Le-Anh, Nghi D. Q. Bui.

**Figure 2.** Per-repository feature preservation scores (%), comparing RustPrint to ClaudeCode under …

**Figure 3.** Feature preservation across refinement iterations (0–5) for RustPrint on the eight benchmark …

**Figure 4.** SafeRate (A) and SafeRate (F) across translation methods and model backbones

Original abstract

Migrating legacy C repositories to Rust promises stronger memory safety, but existing translators often work at the level of files or functions and miss architectural intent. We present RustPrint, a documentation-guided agentic framework for repository-level C-to-Rust migration. RustPrint first converts the source repository into architecture-aware documentation and treats it as a migration blueprint capturing module structure, data flow, APIs, and design rationale. Coding agents then use this blueprint to plan crates, implement modules, check compilability, reduce unsafe code, and iteratively refine the translated repository. RustPrint next compares documentation from the Rust output against the source documentation and uses mismatches as repair signals. It also translates and runs source test suites so runtime failures can guide targeted fixes. Experiments on eight real-world C repositories ranging from 11K to 84K LoC show that RustPrint compiles every target under both an open-weight (Kimi-K2-Instruct) and a closed-weight (GPT-5.4) backbone, while prior LLM-based translators (Self-Repair, EvoC2Rust) fail repository-wide. With the open-weight Kimi-K2-Instruct backbone, RustPrint exceeds an agentic Claude Code baseline on feature preservation (93.26% vs. 52.52%) and on cross-evaluation test pass rate (95.17% vs. 79.85%). These results suggest that documentation-guided coordination is a useful direction for scalable codebase migration.
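The head-to-head gaps implied by the abstract's figures can be worked out directly. The short sketch below only restates the reported percentages and subtracts them. The helper name `gap_points` is introduced here for illustration.

```python
# Reported scores (%) with the Kimi-K2-Instruct backbone, copied from the abstract:
# (RustPrint, agentic Claude Code baseline).
reported = {
    "feature preservation": (93.26, 52.52),
    "cross-evaluation test pass rate": (95.17, 79.85),
}


def gap_points(rustprint: float, baseline: float) -> float:
    """Difference in percentage points, rounded to two decimals."""
    return round(rustprint - baseline, 2)


for metric, (rp, cc) in reported.items():
    print(f"{metric}: +{gap_points(rp, cc):.2f} percentage points")
# feature preservation: +40.74 percentage points
# cross-evaluation test pass rate: +15.32 percentage points
```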

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

RustPrint centers LLM-generated architecture docs as the blueprint for agentic C-to-Rust migration and reports full compilation plus strong preservation metrics on eight real repos where prior translators fail.

The core idea is straightforward: turn the source C repo into architecture-aware documentation that spells out modules, data flow, APIs, and rationale, then hand that document to coding agents as the main planning artifact. The agents plan crates, write code, run compilability checks, and fix mismatches by comparing the new Rust docs back to the original ones. They also port and run the test suite for runtime signals.

On eight real C projects between 11K and 84K LoC the method compiles everything under both an open-weight model (Kimi-K2-Instruct) and GPT-5.4, while Self-Repair and EvoC2Rust do not. With the open model it also beats an agentic Claude Code baseline on feature preservation (93.26% vs. 52.52%) and cross-evaluation test pass rate (95.17% vs. 79.85%). That scale and the concrete head-to-head numbers are the clearest contribution so far in this line of work.

The documentation step plus the mismatch-repair loop is presented as the distinguishing mechanism, and the abstract gives enough detail on the pipeline to see how it differs from file- or function-level translators. The experiments use real repositories and report both compilation success and two preservation metrics, which is more than most prior LLM migration papers manage.

The main soft spot is the lack of any direct check on whether the generated documentation actually captures the important invariants and unsafe patterns. There is no precision/recall number against ground-truth headers, no human rating of rationale coverage, and no ablation that removes the documentation component. Without those, it remains possible that the gains come mainly from the extra repair iterations or the test-suite feedback rather than from the blueprint itself.

The paper is aimed at people working on repository-scale automated migration and on agentic coding systems. The empirical results on actual codebases are solid enough to justify a referee's time, even if the documentation-fidelity question needs tightening in revision.

Referee Report

2 major / 2 minor

Summary. The paper introduces RustPrint, a documentation-guided agentic framework for repository-level C-to-Rust migration. It first generates architecture-aware documentation from the source C codebase to serve as a migration blueprint capturing module structure, data flow, APIs, and design rationale. LLM-based coding agents then use this blueprint to plan crates, implement modules, check compilability, reduce unsafe code, and iteratively refine the translation. The framework compares documentation from the Rust output against the source for repair signals and translates/runs source test suites to guide fixes. Experiments on eight real-world C repositories (11K–84K LoC) report that RustPrint achieves full compilation success under both Kimi-K2-Instruct and GPT-5.4 backbones, while prior methods (Self-Repair, EvoC2Rust) fail repository-wide; with the open-weight backbone it also outperforms an agentic Claude Code baseline on feature preservation (93.26% vs. 52.52%) and cross-evaluation test pass rate (95.17% vs. 79.85%).

Significance. If the results hold under scrutiny, the work provides empirical support for documentation-guided agentic coordination as a scalable approach to repository-wide migration, addressing a key limitation of prior file- or function-level translators that miss architectural intent. The evaluation on multiple large, real-world repositories and across open- and closed-weight LLM backbones is a clear strength, as is the use of concrete, multi-faceted metrics (compilation success, feature preservation, and runtime test pass rates) rather than synthetic benchmarks. These elements position the paper as a useful contribution to automated software migration and LLM-agent tooling in software engineering.

major comments (2)

[§3] §3 (Documentation Generation and Blueprint Usage): the central claim attributes repository-wide compilation and the 93.26% feature-preservation gain to the architecture-aware documentation serving as an effective migration blueprint, yet the manuscript reports no quantitative fidelity metric (e.g., API extraction precision/recall against ground-truth headers or human-rated coverage of cross-module invariants) and no ablation that removes the documentation component while retaining the agentic repair loops and test feedback.
[§4.2] §4.2 (Baseline Comparisons): the reported superiority over the agentic Claude Code baseline (93.26% vs. 52.52% feature preservation) does not specify whether the baseline was given equivalent access to source-derived architectural documentation or the same iterative repair and test-execution harness; without this control, the performance delta cannot be confidently attributed to the documentation-guided mechanism.
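The fidelity metric requested in the first major comment could be computed along the following lines. This is a sketch under stated assumptions: the API names are invented, the ground truth is taken to be the set of function names declared in the C headers, and `precision_recall` is a hypothetical helper rather than anything defined in the manuscript.

```python
from typing import Set, Tuple


def precision_recall(extracted: Set[str], ground_truth: Set[str]) -> Tuple[float, float]:
    """Precision and recall of documented API names against header declarations."""
    true_pos = len(extracted & ground_truth)
    precision = true_pos / len(extracted) if extracted else 0.0
    recall = true_pos / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Hypothetical example: one API is missed and one is hallucinated.
header_apis = {"buf_new", "buf_free", "buf_push", "buf_len"}
documented_apis = {"buf_new", "buf_free", "buf_push", "buf_clear"}
print(precision_recall(documented_apis, header_apis))  # (0.75, 0.75)
```

Low recall would mean the blueprint omits APIs the agents then cannot plan for. Low precision would mean it describes APIs that do not exist in the source.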

minor comments (2)

[Abstract] The abstract and §4 refer to “cross-evaluation test pass rate” without a concise definition or pointer to the exact protocol used to generate and execute the cross-evaluated test suites.
[§4] Figure captions and table headers in the experimental section would benefit from explicit column definitions (e.g., what “feature preservation” counts as a preserved feature) to improve reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive assessment of the work's significance and for the constructive comments. We address each major comment below and describe the revisions we will make to strengthen the claims regarding the documentation blueprint and baseline controls.

Referee: [§3] §3 (Documentation Generation and Blueprint Usage): the central claim attributes repository-wide compilation and the 93.26% feature-preservation gain to the architecture-aware documentation serving as an effective migration blueprint, yet the manuscript reports no quantitative fidelity metric (e.g., API extraction precision/recall against ground-truth headers or human-rated coverage of cross-module invariants) and no ablation that removes the documentation component while retaining the agentic repair loops and test feedback.

Authors: We agree that a quantitative fidelity metric for the generated documentation and an ablation isolating its contribution would provide stronger support for the central claim. In the revised manuscript we will add (i) precision/recall metrics for API and module-structure extraction against ground-truth headers on a representative subset of the eight repositories and (ii) an ablation in which the agentic loops and test-feedback harness operate without the architecture-aware documentation. These additions will be reported in an expanded §3 and §5. (Revision: yes)

Referee: [§4.2] §4.2 (Baseline Comparisons): the reported superiority over the agentic Claude Code baseline (93.26% vs. 52.52% feature preservation) does not specify whether the baseline was given equivalent access to source-derived architectural documentation or the same iterative repair and test-execution harness; without this control, the performance delta cannot be confidently attributed to the documentation-guided mechanism.

Authors: The agentic Claude Code baseline was run with the identical iterative repair and test-execution harness used by RustPrint but without access to the source-derived architectural documentation. We will revise §4.2 to state this configuration explicitly and to clarify that the documentation blueprint is the sole differing component. If the referee considers an additional controlled run necessary, we can perform it in the revision. (Revision: yes)

Circularity Check

0 steps flagged

No circularity: empirical evaluation on external benchmarks

The paper describes an agentic migration system (RustPrint) that generates architecture-aware documentation from C repositories and uses it to guide LLM agents for translation, compilation checking, and repair. All load-bearing claims rest on direct experimental measurements across eight real-world repositories (11K–84K LoC), including repository-wide compilation success, feature preservation (93.26%), and cross-evaluation test pass rates (95.17%), compared against independent baselines (Self-Repair, EvoC2Rust, agentic Claude Code). No equations, fitted parameters, self-citations, or uniqueness theorems are invoked to derive results; the architecture-aware documentation is treated as an input artifact whose effectiveness is measured externally rather than assumed by construction. The work is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The approach rests primarily on the domain assumption that LLMs can reliably interpret and act on architecture-aware documentation for large-scale code changes; no free parameters or invented physical entities are introduced.

axioms (1)

domain assumption: Large language models can effectively interpret architecture-aware documentation to coordinate repository-level code planning, implementation, and repair.
This assumption underpins the agentic use of the generated blueprint and is invoked throughout the framework description.

invented entities (1)

Architecture-aware documentation as migration blueprint (no independent evidence)
purpose: Captures module structure, data flow, APIs, and design rationale to guide agents in producing faithful Rust translations.
Core new artifact introduced by the framework; no independent evidence outside the paper's experiments is provided.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Paper passage: "RustPrint first converts the source repository into architecture-aware documentation and treats it as a migration blueprint capturing module structure, data flow, APIs, and design rationale."

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "Experiments on eight real-world C repositories ranging from 11K to 84K LoC"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.