
arxiv: 2505.10708 · v2 · submitted 2025-05-15 · 💻 cs.CR · cs.SE

SafeTrans: LLM-assisted Transpilation from C to Rust

Pith reviewed 2026-05-22 14:09 UTC · model grok-4.3

classification 💻 cs.CR cs.SE
keywords LLM · transpilation · C to Rust · iterative repair · few-shot prompting · memory safety · vulnerability persistence · code generation

The pith

LLMs can transpile C code to Rust at 80 percent success when an iterative repair loop supplies error-specific context and example fixes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether large language models can automate the conversion of existing C programs into memory-safe Rust code. It introduces a framework that first produces an initial translation and then repeatedly prompts the model with details about each compilation or runtime error plus short example fixes for that error category. This guided repair raises the success rate from 54 percent to 80 percent on the strongest model across more than 2,600 test programs and two real projects. The work also tracks security implications and finds that several classes of C vulnerabilities survive in the generated Rust. A sympathetic reader cares because manual porting of large C codebases is slow and error-prone, so any reliable automation could accelerate the shift to safer systems languages.

Core claim

SafeTrans demonstrates that an LLM-based transpiler equipped with a few-shot guided repair loop can convert C programs into compilable and runnable Rust in the majority of cases. The loop feeds the model contextual error information and one or more concrete code examples that illustrate the correct resolution for each error type, allowing successive corrections without unbounded iteration. On 2,653 C programs the best model reaches 80 percent successful translations after repair, up from 54 percent without it, and the same pattern holds for two larger real-world C projects. The study additionally reports that common C vulnerability patterns, such as buffer issues, continue to be present in the translated Rust.

What carries the argument

The few-shot guided repair technique, which augments each repair prompt with error-type context plus example code snippets that show the proper fix for that specific error category.
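The repair technique described above can be sketched roughly as follows. This is a minimal illustration, not the SafeTrans implementation: the `ErrorKind` categories, the `Diagnostic` shape, the prompt wording, and the `model` and `check` callbacks (stand-ins for an LLM call and a compile-and-run oracle) are all assumptions introduced here.

```rust
use std::collections::HashMap;

/// Coarse error categories. Hypothetical names, not the paper's taxonomy.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum ErrorKind {
    BorrowCheck,
    TypeMismatch,
    RuntimePanic,
}

/// One failure reported by the compile-and-run check (hypothetical shape).
#[derive(Debug, Clone)]
struct Diagnostic {
    kind: ErrorKind,
    message: String,
}

/// Build a repair prompt from the failing code, the error text, and the
/// example fixes registered for that error category.
fn build_repair_prompt(
    rust_src: &str,
    diag: &Diagnostic,
    few_shot: &HashMap<ErrorKind, Vec<&str>>,
) -> String {
    let mut prompt = format!(
        "The Rust code below fails with a {:?} error:\n{}\n\nCode:\n{}\n",
        diag.kind, diag.message, rust_src
    );
    if let Some(examples) = few_shot.get(&diag.kind) {
        for (i, example) in examples.iter().enumerate() {
            prompt.push_str(&format!("\nExample fix {}:\n{}\n", i + 1, example));
        }
    }
    prompt.push_str("\nReturn the corrected Rust code only.\n");
    prompt
}

/// Repair until `check` passes or `max_iters` repair calls have been used.
/// Returns the accepted code together with the number of repair calls made.
fn repair_loop<M, C>(
    initial: String,
    mut model: M,
    check: C,
    few_shot: &HashMap<ErrorKind, Vec<&str>>,
    max_iters: usize,
) -> Option<(String, usize)>
where
    M: FnMut(&str) -> String,
    C: Fn(&str) -> Result<(), Diagnostic>,
{
    let mut current = initial;
    for used in 0..=max_iters {
        match check(&current) {
            Ok(()) => return Some((current, used)),
            // Budget exhausted: give up instead of iterating without bound.
            Err(_) if used == max_iters => return None,
            Err(diag) => current = model(&build_repair_prompt(&current, &diag, few_shot)),
        }
    }
    None
}
```

Passing the model and the oracle as closures keeps the loop independent of any particular LLM, and the explicit `max_iters` cap is what bounds the iteration. A caller would supply a closure that queries an LLM for `model` and one that compiles and runs the candidate for `check`.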

If this is right

  • Iterative repair produces large gains in successful translations for the strongest models but smaller gains for weaker ones.
  • Certain C vulnerability classes, such as buffer overflows, can persist in the generated Rust code.
  • The framework succeeds on both synthetic test suites and actual open-source C projects.
  • Different LLMs exhibit distinct baseline translation quality and different improvement curves under repair.
  • The same error-guided prompting pattern can be reused with any code-generating LLM.
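To make the vulnerability-persistence point concrete, here is a hypothetical sketch of one way a C buffer overflow could survive translation, assuming the output keeps raw-pointer copies inside `unsafe`. It does not reproduce any example from the paper; `BUF_LEN`, `copy_unchecked`, and `copy_checked` are names invented for illustration.

```rust
/// Fixed-size destination, mirroring a C `char buf[8]`.
const BUF_LEN: usize = 8;

/// Literal translation of an unchecked C `memcpy(buf, src, n)`.
/// It compiles, but nothing stops `src.len()` from exceeding `BUF_LEN`,
/// in which case the copy writes past the buffer (undefined behavior).
///
/// # Safety
/// The caller must guarantee `src.len() <= BUF_LEN`.
unsafe fn copy_unchecked(buf: &mut [u8; BUF_LEN], src: &[u8]) {
    unsafe {
        std::ptr::copy_nonoverlapping(src.as_ptr(), buf.as_mut_ptr(), src.len());
    }
}

/// Idiomatic alternative: the length check is explicit and an oversized
/// input becomes an error value instead of a memory write.
fn copy_checked(buf: &mut [u8; BUF_LEN], src: &[u8]) -> Result<(), String> {
    let dst = buf
        .get_mut(..src.len())
        .ok_or_else(|| format!("input of {} bytes exceeds {}-byte buffer", src.len(), BUF_LEN))?;
    dst.copy_from_slice(src);
    Ok(())
}
```

Both functions pass a compile-and-run oracle on well-formed inputs, which is why a success metric based only on compilation and test behavior would not distinguish them.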

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the repair examples can be generated automatically from error logs rather than hand-written, the whole process could scale to millions of lines of legacy code without manual curation.
  • The survival of vulnerabilities indicates that a post-translation static analysis or security-focused prompt stage would be a natural next addition.
  • The technique may transfer to other language pairs where compiler errors provide reliable, structured feedback for iterative correction.
  • Measuring not only compilation success but also runtime behavior equivalence on larger benchmarks would clarify how much functional fidelity is preserved.

Load-bearing premise

Supplying contextual information and a few example code snippets for each error type will reliably steer the LLM to correct, non-regressive fixes without introducing new errors or requiring unbounded iterations.

What would settle it

Apply the identical repair loop and example set to an independent collection of several thousand fresh C programs and measure whether the final success rate stays near 80 percent or falls sharply while tracking whether the same vulnerability classes persist.

Figures

Figures reproduced from arXiv: 2505.10708 by Baris Coskun, Michalis Polychronakis, Muhammad Farrukh, Tapti Palit.

Figure 1: High-level architecture of SafeTrans’ transpilation, compilation, repair, and validation pipeline.
Figure 2: Percentage of correct Rust translations out of 2,653
Figure 3: Final translation success rate and error breakdown
Original abstract

Rust is a strong contender for a memory-safe alternative to C as a "systems" language, but porting the vast amount of existing C code to Rust remains daunting. In this paper, we evaluate the potential of large language models (LLMs) to automate the transpilation of C code to idiomatic Rust. We present SafeTrans, a generic framework that leverages LLMs to i) transpile C code into Rust, and ii) iteratively repair compilation and runtime errors. A key novelty of our approach is a few-shot guided repair technique for translation errors, which provides contextual information and example code snippets for specific error types, guiding the LLM toward the correct solution. Another novel aspect of our work is the evaluation of the security implications of the transpilation process, showing how some vulnerability classes in C persist in the translated Rust code. SafeTrans was evaluated with six leading LLMs on 2,653 C programs and two real-world C projects. Our results show that iterative repair improves the rate of successful translations from 54% to 80% for the best-performing LLM (gpt-4o).

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper introduces SafeTrans, a framework that uses LLMs to transpile C programs to Rust and then applies iterative repair of compilation and runtime errors via few-shot prompts supplying error-specific context and code examples. It evaluates the approach on 2,653 C programs plus two real-world projects using six LLMs, reports that iterative repair raises successful translation rates from 54% to 80% for gpt-4o, and examines how certain C vulnerability classes can persist after translation.

Significance. If the empirical results hold, the work provides concrete evidence that LLM-based iterative repair can meaningfully improve automated C-to-Rust transpilation at scale. The large program corpus, multi-LLM comparison, use of standard compilation/runtime oracles, and security analysis together offer a useful data point on both the promise and the limitations of LLM-assisted migration to memory-safe languages.

major comments (1)
  1. [Evaluation / Results] The headline result (54% → 80% for gpt-4o) rests on the iterative repair loop. The manuscript reports only final success rates and does not quantify average or maximum iterations per program, convergence behavior, or the frequency of regressions in which a repair introduces a new error type outside the supplied few-shot examples. This information is needed to evaluate whether the repair process remains practical under realistic iteration caps.
minor comments (2)
  1. [Evaluation] Clarify whether the 2,653 programs include only synthetic benchmarks or also subsets of the two real-world projects, and ensure table captions and text use identical program counts.
  2. [Security Analysis] The security section would benefit from a short table enumerating the vulnerability classes examined and the fraction that remained after translation for each LLM.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback and positive assessment of our work. We address the major comment below and will incorporate additional analysis of the iterative repair process in the revised manuscript.

Point-by-point responses
  1. Referee: [Evaluation / Results] The headline result (54% → 80% for gpt-4o) rests on the iterative repair loop. The manuscript reports only final success rates and does not quantify average or maximum iterations per program, convergence behavior, or the frequency of regressions in which a repair introduces a new error type outside the supplied few-shot examples. This information is needed to evaluate whether the repair process remains practical under realistic iteration caps.

    Authors: We agree that quantifying the behavior of the iterative repair loop would provide a more complete evaluation of practicality. In the revised manuscript we will add a dedicated subsection (and associated figures/tables) reporting the average and maximum number of repair iterations required for successful translations across the full corpus for each of the six LLMs. We will also present cumulative success-rate curves to illustrate convergence behavior. For regressions, we will analyze our experimental logs to report the frequency with which a repair step introduces a new error category outside the supplied few-shot examples and discuss how the error-specific prompting strategy limits such regressions. These additions will be based on data already collected during our experiments. revision: yes
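The statistics promised in this response could be computed from repair traces along the following lines. This is a sketch under assumptions: the `RepairLog` schema, the use of string labels for error categories, and the definition of a regression step (a step that introduces a category absent from the previous step and not covered by any few-shot example) are all choices made here for illustration, not the authors' log format or definitions.

```rust
use std::collections::HashSet;

/// Repair trace for one program (hypothetical schema).
/// `errors_per_step[i]` holds the error categories present after step `i`;
/// step 0 is the initial translation and an empty set means the program passed.
struct RepairLog {
    errors_per_step: Vec<HashSet<String>>,
}

/// Aggregate statistics over all traces.
struct LoopStats {
    successes: usize,
    mean_iters: f64,
    max_iters: usize,
    regression_steps: usize,
}

/// Convenience constructor for a set of category labels.
fn step(categories: &[&str]) -> HashSet<String> {
    categories.iter().map(|c| c.to_string()).collect()
}

/// Number of repair iterations before the first passing step, if any.
fn iterations_to_success(log: &RepairLog) -> Option<usize> {
    log.errors_per_step.iter().position(|errors| errors.is_empty())
}

/// Mean and maximum iterations over successful programs, plus the number of
/// steps that introduce a new category outside the `covered` set.
fn summarize(logs: &[RepairLog], covered: &HashSet<String>) -> LoopStats {
    let iters: Vec<usize> = logs.iter().filter_map(iterations_to_success).collect();
    let regression_steps = logs
        .iter()
        .flat_map(|log| log.errors_per_step.windows(2))
        .filter(|pair| {
            pair[1].iter().any(|c| !pair[0].contains(c) && !covered.contains(c))
        })
        .count();
    let successes = iters.len();
    let mean_iters = if successes == 0 {
        0.0
    } else {
        iters.iter().sum::<usize>() as f64 / successes as f64
    };
    LoopStats {
        successes,
        mean_iters,
        max_iters: iters.iter().copied().max().unwrap_or(0),
        regression_steps,
    }
}

/// Cumulative success curve: entry `k` is the fraction of all programs
/// that pass within `k` repair iterations.
fn cumulative_success(logs: &[RepairLog], max_k: usize) -> Vec<f64> {
    let iters: Vec<usize> = logs.iter().filter_map(iterations_to_success).collect();
    let total = logs.len().max(1) as f64;
    (0..=max_k)
        .map(|k| iters.iter().filter(|&&n| n <= k).count() as f64 / total)
        .collect()
}
```

Reading the curve at a given `k` shows what success rate would remain under an iteration cap of `k`, which is the practicality question the referee raises.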

Circularity Check

0 steps flagged

No circularity: empirical success rates measured against external oracles

full rationale

The paper reports empirical measurements of transpilation success rates (54% to 80% with iterative repair) obtained by running SafeTrans on 2,653 C programs and two real-world projects using external LLMs (gpt-4o and others) and standard compilation/runtime oracles. No mathematical derivations, equations, or first-principles claims exist that could reduce to self-definitions or fitted inputs. The few-shot repair technique is described as a prompting strategy evaluated directly against observable error fixes, with no self-citation chains or uniqueness theorems invoked to justify core results. The evaluation is self-contained and independently falsifiable via the same external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework depends on the domain assumption that LLMs respond productively to error-specific few-shot examples; no new physical or mathematical entities are postulated and no parameters are fitted to the success metric itself.

axioms (1)
  • domain assumption: LLMs can be guided to produce correct code repairs when given targeted examples and context for specific error categories.
    This premise underpins the iterative repair loop and is invoked to explain the jump from 54% to 80% success.



Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. ORBIT: Guided Agentic Orchestration for Autonomous C-to-Rust Transpilation

    cs.SE 2026-04 unverdicted novelty 6.0

    ORBIT achieves 100% compilation success and 91.7% test success on 24 mostly large programs from CRUST-Bench by using dependency-aware orchestration and iterative verification, outperforming prior static and baseline tools.

  2. ENCRUST: Encapsulated Substitution and Agentic Refinement on a Live Scaffold for Safe C-to-Rust Translation

    cs.SE 2026-04 unverdicted novelty 6.0

    ENCRUST decouples C-to-Rust translation via ABI wrappers and agentic refinement to reduce unsafe constructs across 15 real programs while preserving full test correctness.

  3. Project-Level C-to-Rust Translation via Pointer Knowledge Graphs

    cs.SE 2025-10 unverdicted novelty 6.0

    PtrTrans builds a Pointer Knowledge Graph with points-to flows, struct abstractions, and Rust annotations to guide LLMs toward project-level C-to-Rust translations that cut unsafe code by 99.9% and raise functional co...

  4. Search-Based Multi-Trajectory Refinement for Safe C-to-Rust Translation with Large Language Models

    cs.PL 2025-05 unverdicted novelty 5.0

    LAC2R uses MCTS to systematically explore multiple LLM refinement trajectories for C-to-Rust translation and reports superior safety and correctness on small-scale benchmarks.

ReCode ’26, April 12–18, 2026, Rio de Janeiro, Brazil · Farrukh et al.