Raw Pointer Rewriting with LLMs for Translating C to Safer Rust

Chengpeng Wang; Mingwei Zheng; Pengxiang Huang; Xiangyu Zhang; Xuwei Liu; Yifei Gao

arXiv: 2505.04852 · v3 · submitted 2025-05-07 · 💻 cs.SE · cs.AI · cs.PL


Pith reviewed 2026-05-22 15:32 UTC · model grok-4.3

classification 💻 cs.SE · cs.AI · cs.PL

keywords: raw pointer rewriting · C to Rust translation · LLM prompting · memory safety · unsafe code elimination · pointer lifting · code change analysis · transpilation repair

The pith

PR2 uses decision-tree LLM prompting to lift raw pointers into native Rust constructs, eliminating 18.57% of the local raw pointers in C2RUST-translated code across 28 projects.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper introduces PR2, a technique for reducing unsafe raw pointers in Rust code that tools like C2RUST generate from C. PR2 guides an LLM with decision-tree prompts to rewrite local raw pointers into native Rust constructs such as references or collections, then applies code change analysis to repair the compilation and runtime errors the rewriting introduces. Evaluated on 28 real-world C projects with gpt-4o-mini, the method removes 18.57% of local raw pointers across those projects, at an average of 5.02 hours and $1.13 per project. A reader would care because fewer raw pointers move the output closer to Rust's memory-safety guarantees, while the automated repairs aim to preserve functional behavior as measured by each project's existing tests.
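To make "lifting" concrete, here is a minimal hand-written sketch, not actual output from PR2 or C2RUST: a C function `void div_mod(int a, int b, int *q, int *r)` rendered first in the raw-pointer style C2RUST typically emits (simplified to `i32` in place of `libc::c_int`), then with its out-parameters lifted to mutable references. The function names are illustrative assumptions.

```rust
/// Raw-pointer style, as a C-to-Rust transpiler might emit it.
/// Callers must pass valid, non-null, writable pointers and `b != 0`;
/// the compiler cannot check any of this.
pub unsafe extern "C" fn div_mod_raw(a: i32, b: i32, q: *mut i32, r: *mut i32) {
    unsafe {
        *q = a / b;
        *r = a % b;
    }
}

/// Lifted version: the out-parameters become `&mut i32`, so validity,
/// non-nullness, and exclusive access are enforced by the borrow checker.
/// `b != 0` is still a precondition (division by zero panics in Rust).
pub fn div_mod_lifted(a: i32, b: i32, q: &mut i32, r: &mut i32) {
    *q = a / b;
    *r = a % b;
}
```

At call sites the difference is the disappearance of `unsafe`: `div_mod_lifted(17, 5, &mut q, &mut r)` needs no unsafe block, whereas every call to `div_mod_raw` does.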

Core claim

PR2 employs decision-tree-based prompting to guide the pointer lifting process and leverages code change analysis to repair errors introduced during rewriting, addressing errors encountered both at compilation and during test case execution. Across the 28 evaluated projects, PR2 eliminates 18.57% of local raw pointers.
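The summary does not spell out how PR2's code change analysis works. One plausible, much-simplified reading, sketched here as an assumption rather than the authors' algorithm, is to diff a function before and after rewriting and use that diff to decide which compiler errors sit on lines the rewrite introduced, so that those errors can be fed back to the LLM for repair. The helpers `changed_lines` and `triage_errors` are hypothetical.

```rust
/// Marks each line of `after` that the rewrite introduced, i.e. every line
/// not matched by a longest common subsequence (LCS) with `before`.
pub fn changed_lines(before: &[&str], after: &[&str]) -> Vec<bool> {
    let (n, m) = (before.len(), after.len());
    // lcs[i][j] = LCS length of before[i..] and after[j..].
    let mut lcs = vec![vec![0usize; m + 1]; n + 1];
    for i in (0..n).rev() {
        for j in (0..m).rev() {
            lcs[i][j] = if before[i] == after[j] {
                lcs[i + 1][j + 1] + 1
            } else {
                lcs[i + 1][j].max(lcs[i][j + 1])
            };
        }
    }
    // Walk one optimal alignment; unmatched `after` lines stay marked as changed.
    let mut changed = vec![true; m];
    let (mut i, mut j) = (0, 0);
    while i < n && j < m {
        if before[i] == after[j] {
            changed[j] = false;
            i += 1;
            j += 1;
        } else if lcs[i + 1][j] >= lcs[i][j + 1] {
            i += 1;
        } else {
            j += 1;
        }
    }
    changed
}

/// Splits 1-based compiler-error line numbers (relative to `after`) into
/// errors on rewritten lines and errors elsewhere.
pub fn triage_errors(changed: &[bool], error_lines: &[usize]) -> (Vec<usize>, Vec<usize>) {
    error_lines
        .iter()
        .copied()
        .partition(|&l| l >= 1 && l <= changed.len() && changed[l - 1])
}
```

Errors in the first group point at code the LLM wrote and are natural targets for a repair prompt; errors in the second group may instead stem from surrounding code, such as callers that still expect the old raw-pointer signature.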

What carries the argument

Decision-tree-based prompting for LLM-guided pointer lifting combined with code change analysis for error repair.

If this is right

Translated Rust programs contain fewer raw pointers and therefore likely require fewer unsafe blocks.
The safety guarantees of C-to-Rust transpilation improve without manual pointer refactoring.
The automated process scales to multiple real-world projects at modest time and monetary cost.
The same repair-driven rewriting can target other unsafe Rust patterns beyond pointers.
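The "modest cost" point can be made concrete. Assuming the reported 5.02 hours and $1.13 are arithmetic means over the 28 projects (the abstract says "on average"), the totals for the whole evaluation follow directly:

```latex
\begin{aligned}
\text{total time} &\approx 28 \times 5.02\ \text{h} = 140.56\ \text{h} \;(\approx 5.9\ \text{days}),\\
\text{total cost} &\approx 28 \times \$1.13 = \$31.64.
\end{aligned}
```

The time figure is a sum of per-project runs; if projects were processed in parallel, wall-clock time for the whole set would be lower.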

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Integrating PR2 into C2RUST could produce safer default outputs for new translations.
The prompting and repair strategy might extend to lifting other C idioms that map poorly to safe Rust.
Combining the approach with static pointer analysis could raise the elimination rate above 18.57%.
Testing the same pipeline with different LLMs or larger projects could reveal its scalability limits.

Load-bearing premise

LLM-generated pointer rewrites, once repaired via code-change analysis, produce functionally equivalent and safe Rust without introducing new memory-safety violations that are not caught by the existing test suites.

What would settle it

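Running the rewritten Rust programs under a dynamic undefined-behavior checker such as Miri, on inputs outside the original test suites, and checking whether new pointer-related violations appear. Because Miri only checks the code paths a run actually executes, the choice of inputs matters as much as the tool.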
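As a small illustration of testing on inputs outside a fixed suite, the sketch below differentially compares a hypothetical raw-pointer function `reverse_raw` (in the style C2RUST emits for `void reverse(int *a, int n)`) with a lifted slice version on pseudo-randomly generated inputs. The functions, the generator, and the trial counts are assumptions for illustration. Running the same check with `cargo miri test` on a nightly toolchain would additionally let Miri flag undefined behavior in the unsafe code, but only on the paths these inputs exercise.

```rust
/// Raw-pointer style translation of `void reverse(int *a, int n)`.
/// Safety: `a` must point to `n` valid, writable, initialized `i32`s.
pub unsafe extern "C" fn reverse_raw(a: *mut i32, n: i32) {
    let (mut i, mut j) = (0isize, n as isize - 1);
    while i < j {
        unsafe {
            let t = *a.offset(i);
            *a.offset(i) = *a.offset(j);
            *a.offset(j) = t;
        }
        i += 1;
        j -= 1;
    }
}

/// Lifted version: the (pointer, length) pair becomes one mutable slice.
pub fn reverse_lifted(a: &mut [i32]) {
    let n = a.len();
    for i in 0..n / 2 {
        a.swap(i, n - 1 - i);
    }
}

/// xorshift64 step: a tiny deterministic PRNG so the sketch needs no crates.
fn xorshift(state: &mut u64) -> u64 {
    let mut x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    x
}

/// Runs `trials` random inputs through both versions and returns the first
/// input on which they disagree, or `None` if they always agree.
pub fn differential_check(trials: usize, seed: u64) -> Option<Vec<i32>> {
    let mut state = seed.max(1); // xorshift must not start from 0
    for _ in 0..trials {
        let len = (xorshift(&mut state) % 16) as usize;
        let input: Vec<i32> = (0..len).map(|_| xorshift(&mut state) as i32).collect();

        let mut expected = input.clone();
        let n = expected.len() as i32;
        // SAFETY: the pointer and length describe the same live, initialized Vec.
        unsafe { reverse_raw(expected.as_mut_ptr(), n) };

        let mut actual = input.clone();
        reverse_lifted(&mut actual);

        if expected != actual {
            return Some(input);
        }
    }
    None
}
```

Returning the disagreeing input, rather than a bare boolean, gives a concrete counterexample that could be added to the project's regression tests.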

Original abstract

There has been a growing interest in translating C code to Rust due to Rust's robust memory and thread safety guarantees. Tools such as C2RUST enable syntax-guided transpilation from C to semantically equivalent Rust code. However, the resulting Rust programs often rely heavily on unsafe constructs, particularly raw pointers, which undermines Rust's safety guarantees. This paper aims to improve the memory safety of Rust programs generated by C2RUST by eliminating raw pointers. Specifically, we propose a raw pointer rewriting technique that lifts raw pointers in individual functions to appropriate Rust data structures. Technically, PR2 employs decision-tree-based prompting to guide the pointer lifting process. It also leverages code change analysis to guide the repair of errors introduced during rewriting, effectively addressing errors encountered during compilation and test case execution. We implement PR2 and evaluate it using gpt-4o-mini on 28 real-world C projects. It is shown that PR2 successfully eliminates 18.57% of local raw pointers across these projects, significantly enhancing the safety of the translated Rust code. On average, PR2 completes the transformation of a project in 5.02 hours, at a cost of $1.13.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this section is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript presents PR2, a technique that applies decision-tree-based LLM prompting (using gpt-4o-mini) together with code-change analysis to lift raw pointers in C2Rust-translated Rust functions to safer Rust constructs. On 28 real-world C projects the method is reported to eliminate 18.57% of local raw pointers while requiring on average 5.02 hours and $1.13 per project.

Significance. If the rewrites preserve functional equivalence and introduce no undetected memory-safety violations, the work supplies a concrete, costed pipeline that directly attacks the dominant source of unsafe code in C-to-Rust transpilation. The evaluation on external open-source projects and the explicit measurement of wall-clock time and monetary cost are strengths that would be useful to practitioners.

major comments (3)

[Evaluation] Evaluation section: functional equivalence and safety are asserted after compilation and execution of the original test suites, yet no Miri runs, AddressSanitizer/ThreadSanitizer instrumentation, or manual inspection of the rewritten functions are described. Any aliasing or use-after-free introduced by the LLM that is not exercised by the existing tests would remain undetected, directly undermining the central claim that safety is “significantly enhanced.”
[Results] Results paragraph reporting 18.57%: the aggregate figure is given without per-project breakdown, standard deviation, or confidence interval. Because the 28 projects vary widely in size and pointer density, it is impossible to judge whether the reported improvement is consistent or driven by a few outliers.
[Approach] Approach section on decision-tree prompting: the strategy for constructing the decision tree and for mapping pointer-lifting choices to Rust types is described only at a high level. Without an explicit algorithm or example trace, it is difficult to assess reproducibility or to determine whether the prompting systematically avoids introducing new unsafe patterns.

minor comments (2)

[Abstract] Abstract: the phrase “significantly enhancing the safety” is not supported by any metric beyond raw-pointer count; a brief qualifier on the verification method would improve precision.
[Evaluation] Table or figure captions: several tables list project names and pointer counts but omit the total number of functions or lines of code per project, making it hard to normalize the 18.57% figure.
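The request for a per-project breakdown matters because a pooled rate (total eliminated over total pointers) and a mean of per-project rates can differ sharply when project sizes vary, and the summary does not say which of the two the 18.57% figure is. The sketch below computes both, plus a sample standard deviation; the `ProjectCounts` struct is a hypothetical shape for the data, not the paper's.

```rust
/// Raw-pointer counts for one project (hypothetical structure).
pub struct ProjectCounts {
    pub before: u32,     // local raw pointers before rewriting (assumed > 0)
    pub eliminated: u32, // local raw pointers removed by the rewrite
}

/// Per-project elimination rates, eliminated / before.
pub fn rates(projects: &[ProjectCounts]) -> Vec<f64> {
    projects
        .iter()
        .map(|p| f64::from(p.eliminated) / f64::from(p.before))
        .collect()
}

/// Pooled rate: total eliminated over total pointers, so large projects dominate.
pub fn pooled_rate(projects: &[ProjectCounts]) -> f64 {
    let eliminated: u64 = projects.iter().map(|p| u64::from(p.eliminated)).sum();
    let total: u64 = projects.iter().map(|p| u64::from(p.before)).sum();
    eliminated as f64 / total as f64
}

/// Mean and sample standard deviation (n - 1 denominator); needs at least two values.
pub fn mean_and_sample_sd(xs: &[f64]) -> (f64, f64) {
    let n = xs.len() as f64;
    let mean = xs.iter().sum::<f64>() / n;
    let var = xs.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1.0);
    (mean, var.sqrt())
}
```

With purely hypothetical counts (100 of 1,000 pointers eliminated in one large project, 5 of 10 in each of two small ones), the mean per-project rate is about 36.7% while the pooled rate is about 10.8%, which is why a breakdown and spread are needed to interpret a single aggregate.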

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We address each major comment point by point below, indicating where revisions will be made to strengthen the paper.

Point-by-point responses

Referee: [Evaluation] Evaluation section: functional equivalence and safety are asserted after compilation and execution of the original test suites, yet no Miri runs, AddressSanitizer/ThreadSanitizer instrumentation, or manual inspection of the rewritten functions are described. Any aliasing or use-after-free introduced by the LLM that is not exercised by the existing tests would remain undetected, directly undermining the central claim that safety is “significantly enhanced.”

Authors: We agree that the current evaluation, which relies on successful compilation and passage of existing test suites, does not exhaustively rule out all possible memory-safety issues such as undetected aliasing or use-after-free errors. In the revised manuscript we will add Miri runs on the rewritten functions (where applicable) and report the results, providing stronger evidence for the safety improvements.
revision: yes
Referee: [Results] Results paragraph reporting 18.57%: the aggregate figure is given without per-project breakdown, standard deviation, or confidence interval. Because the 28 projects vary widely in size and pointer density, it is impossible to judge whether the reported improvement is consistent or driven by a few outliers.

Authors: We acknowledge that an aggregate statistic alone makes it difficult to assess consistency across heterogeneous projects. We will revise the Results section to include a per-project breakdown of pointer elimination rates, the standard deviation, and a confidence interval for the mean improvement.
revision: yes
Referee: [Approach] Approach section on decision-tree prompting: the strategy for constructing the decision tree and for mapping pointer-lifting choices to Rust types is described only at a high level. Without an explicit algorithm or example trace, it is difficult to assess reproducibility or to determine whether the prompting systematically avoids introducing new unsafe patterns.

Authors: The decision tree is built by classifying pointer usage patterns (e.g., array access, struct fields, mutability) and mapping them to Rust constructs such as references or Box. We will expand the Approach section with an explicit algorithm description and an example prompting trace to improve reproducibility and demonstrate how unsafe patterns are avoided.
revision: yes
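The described approach (classify how a pointer is used, then map that class to a Rust construct) can be illustrated with a small decision tree in which each node is a yes/no question that, in a prompting setting, would be put to the LLM. The questions, the `LiftTarget` enum, and the mapping below are hypothetical and deliberately simplified (no lifetimes, no struct-field pointers); they are not the tree PR2 actually uses.

```rust
// Hypothetical yes/no questions; in a prompting setting each would be sent to
// the LLM together with the function under rewrite.
const Q_OWNS: &str = "Does the pointer own a heap allocation (e.g. from malloc)?";
const Q_ARITH: &str = "Is the pointer indexed or advanced with pointer arithmetic?";
const Q_WRITE: &str = "Is anything written through the pointer?";
const Q_NULL: &str = "Can the pointer be null at this point?";

/// Candidate safe Rust construct for a lifted raw pointer.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum LiftTarget {
    Ref { mutable: bool, nullable: bool }, // &T / &mut T, in Option<..> if nullable
    Slice { mutable: bool },               // &[T] / &mut [T]
    Boxed,                                 // Box<T>
    Vector,                                // Vec<T>
}

impl LiftTarget {
    /// Renders the target as a Rust type for element type `elem`.
    pub fn rust_type(&self, elem: &str) -> String {
        match self {
            LiftTarget::Ref { mutable, nullable } => {
                let r = if *mutable { format!("&mut {elem}") } else { format!("&{elem}") };
                if *nullable { format!("Option<{r}>") } else { r }
            }
            LiftTarget::Slice { mutable: true } => format!("&mut [{elem}]"),
            LiftTarget::Slice { mutable: false } => format!("&[{elem}]"),
            LiftTarget::Boxed => format!("Box<{elem}>"),
            LiftTarget::Vector => format!("Vec<{elem}>"),
        }
    }
}

/// Walks the decision tree, calling `ask` at each node; `ask` stands in for an LLM query.
pub fn choose_target<F: FnMut(&str) -> bool>(mut ask: F) -> LiftTarget {
    if ask(Q_OWNS) {
        // Owning pointer: a buffer becomes Vec<T>, a single object Box<T>.
        if ask(Q_ARITH) { LiftTarget::Vector } else { LiftTarget::Boxed }
    } else if ask(Q_ARITH) {
        // Borrowed buffer: a (pointer, length) pair becomes a slice.
        LiftTarget::Slice { mutable: ask(Q_WRITE) }
    } else {
        // Borrowed single object: a reference, optional if it may be null.
        let mutable = ask(Q_WRITE);
        let nullable = ask(Q_NULL);
        LiftTarget::Ref { mutable, nullable }
    }
}
```

Passing a closure that answers from canned facts, such as `choose_target(|q| q.contains("arithmetic") || q.contains("written"))`, yields `LiftTarget::Slice { mutable: true }`, rendered as `&mut [i32]` for `i32` elements. In an LLM-driven setting the closure would instead send the question, together with the function body, to the model and parse a yes/no answer.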

Circularity Check

0 steps flagged

No circularity: empirical results measured directly on external projects

Full rationale

The paper's central result is an empirical measurement: PR2 eliminates 18.57% of local raw pointers across 28 real-world C projects, obtained by running the implemented technique (decision-tree prompting plus code-change repair) on external open-source codebases and counting the observed changes. No derivation chain reduces a claimed prediction or first-principles result to its own inputs by construction; there are no equations, fitted parameters renamed as predictions, self-definitional loops, or load-bearing self-citations that justify the headline outcome. The reported percentage and runtime/cost figures are direct observations from the evaluation procedure, which can be reproduced on the same projects without reference to any internal definition of the result itself.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach rests on the empirical capability of current LLMs to perform correct semantic rewrites when guided by a decision tree; no new mathematical constants or entities are introduced.

axioms (1)

domain assumption: LLMs prompted with decision trees can reliably identify and lift raw pointers to equivalent safe Rust structures
The entire rewriting pipeline depends on this unproven but central modeling assumption about LLM behavior.

pith-pipeline@v0.9.0 · 5755 in / 1203 out tokens · 41656 ms · 2026-05-22T15:32:58.027186+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Paper passage: "PR2 employs decision-tree-based prompting to guide the pointer lifting process... eliminates 18.57% of local raw pointers"

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "leverages code change analysis to guide the repair of errors"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

ORBIT: Guided Agentic Orchestration for Autonomous C-to-Rust Transpilation
cs.SE 2026-04 unverdicted novelty 6.0

ORBIT achieves 100% compilation success and 91.7% test success on 24 mostly large programs from CRUST-Bench by using dependency-aware orchestration and iterative verification, outperforming prior static and baseline tools.
Project-Level C-to-Rust Translation via Pointer Knowledge Graphs
cs.SE 2025-10 unverdicted novelty 6.0

PtrTrans builds a Pointer Knowledge Graph with points-to flows, struct abstractions, and Rust annotations to guide LLMs toward project-level C-to-Rust translations that cut unsafe code by 99.9% and raise functional co...
Dependency-Guided Repository-Level C-to-Rust Translation with Reinforcement Alignment
cs.SE 2026-04 unverdicted novelty 5.0

DepTrans translates entire C repositories to Rust at 60.7% compilation success and 43.5% functional accuracy by combining reinforcement-aligned syntax training with dependency-guided iterative refinement.