pith. machine review for the scientific record.

arxiv: 2604.15485 · v1 · submitted 2026-04-16 · 💻 cs.SE

Recognition: unknown

LLM4C2Rust: Large Language Models for Automated Memory-Safe Code Transpilation

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 10:24 UTC · model grok-4.3

classification 💻 cs.SE
keywords C to Rust transpilation · memory safety · retrieval-augmented generation · large language models · code modernization · unsafe code elimination · Coreutils evaluation

The pith

A retrieval-augmented generation pipeline with large language models improves memory safety when automatically converting C code to Rust.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a framework that pairs an LLM with retrieved Rust documentation and compiler error messages to transpile C into Rust while targeting fewer memory-unsafe constructs. Code is broken into balanced segments so the model receives focused context at each step. Experiments with GPT-4o, GPT-4-Turbo, and o3-mini show the RAG version raises correctness rates and cuts raw pointer dereferences and unsafe type casts, with some Coreutils programs reaching zero of both. This matters because legacy C systems remain vulnerable to memory errors and manual rewriting to Rust is slow and costly. The work therefore tests whether LLM guidance can make automated migration practical for real programs.
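The retrieval step described above (and depicted in the paper's Figure 1) embeds external documents and matches a query against them by vector similarity. As a minimal illustration only, the sketch below substitutes a toy bag-of-words embedding and cosine similarity for whatever learned embeddings the paper actually uses; the documentation snippets are invented.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding". The paper presumably uses learned
    # vector embeddings; this stand-in only illustrates the retrieval step.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    # Rank documents by similarity to the query and return the top k,
    # which would then be pasted into the LLM prompt as context.
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

# Invented Rust-documentation snippets standing in for the retrieval corpus.
docs = [
    "Box<T> provides heap allocation with ownership, replacing malloc.",
    "The match expression handles enum variants exhaustively.",
    "Dereferencing a raw pointer requires an unsafe block in Rust.",
]
context = retrieve("how to replace malloc heap allocation", docs)
```

With the toy corpus above, a query about replacing `malloc` retrieves the `Box<T>` snippet, which is the kind of context that steers the model away from raw pointers.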

Core claim

The authors claim that their RAG-assisted pipeline, which segments C code into balanced blocks and supplies context drawn from Rust documentation plus compiler error references, generally raises both correctness and security metrics for C-to-Rust transpilation, and that several Coreutils programs reach complete elimination of raw pointer dereferences and unsafe type casts in the generated Rust.

What carries the argument

The RAG-enhanced pipeline that segments source into balanced blocks and retrieves context from Rust documentation and compiler error references to steer the LLM output.
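The paper does not define its balancing criterion, so the following is only one plausible reading: greedily pack whole functions into blocks under a character budget, so each LLM call sees a focused, roughly equal-sized chunk. Extracting the functions from C source (e.g., with a C parser) is assumed to happen upstream.

```python
def segment_balanced(functions: list[str], max_chars: int) -> list[list[str]]:
    # Greedily pack whole functions into blocks under a character budget.
    # Hypothetical criterion: the paper does not specify what "balanced"
    # means, and a real pipeline might budget by tokens rather than chars.
    blocks, current, size = [], [], 0
    for fn in functions:
        if current and size + len(fn) > max_chars:
            blocks.append(current)
            current, size = [], 0
        current.append(fn)
        size += len(fn)
    if current:
        blocks.append(current)
    return blocks
```

Each resulting block would be transpiled in its own LLM call, with retrieved context attached per block.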

If this is right

  • Automated conversion of legacy C code can reduce the number of memory-unsafe constructs that reach production.
  • Software modernization projects gain a lower-cost path to Rust when manual rewriting is replaced by guided LLM steps.
  • The same retrieval-plus-segmentation pattern can be reused for other unsafe-to-safe language pairs.
  • Compiler feedback loops become a reusable signal for steering LLM code generation beyond the initial transpilation task.
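The last point, a compile-and-repair loop, can be sketched abstractly. The compiler and the model are injected as callables here because the paper does not publish its exact loop; in a real pipeline `compile_fn` would wrap `rustc` or `cargo check`, and `fix_fn` would be an LLM call fed the error text plus retrieved error-reference context.

```python
def repair_loop(code: str, compile_fn, fix_fn, max_rounds: int = 3) -> str:
    # Compile the generated Rust; on failure, hand the error text back to
    # the model for a fix, up to max_rounds attempts.
    for _ in range(max_rounds):
        ok, errors = compile_fn(code)
        if ok:
            return code
        code = fix_fn(code, errors)
    return code

# Stub demo: "compilation" succeeds once the code defines fn main.
def fake_compile(code):
    ok = "fn main" in code
    return ok, "" if ok else "error[E0601]: `main` function not found"

def fake_fix(code, errors):
    return code + "\nfn main() {}"

repaired = repair_loop("// transpiled library code", fake_compile, fake_fix)
```

The stubs exist only to make the control flow visible; the loop structure, not the stubs, is the point.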

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach could be tested on larger codebases such as entire operating-system kernels to measure whether elimination rates hold at scale.
  • Combining the pipeline with static analysis tools after generation might catch remaining safety issues that the LLM still misses.
  • Future work could measure whether the same RAG context also speeds up human review of the generated Rust.

Load-bearing premise

Retrieved documentation and error messages plus balanced segmentation are enough to steer the LLM to memory-safe Rust without new errors or undetected hallucinations.

What would settle it

Applying the same pipeline to a broader set of C programs: if raw pointer dereferences or unsafe casts remain at or above the non-RAG baseline levels, the claimed improvement does not generalize.

Figures

Figures reproduced from arXiv: 2604.15485 by Armin Moin, Nazanin Siavash, Sarah Bedell.

Figure 1: The RAG similarity search process. External documents are embedded into vectors, and a query is …
Figure 2: Transpiled C code into Rust using the o3-mini model.
Figure 3: Comprehensive Overview of Proposed Approach.
Figure 4: Annotated Python function showing the four components of the LLM-based transpilation call: (1) …
Original abstract

Memory safety has long been a critical challenge in software engineering, particularly for legacy systems written in memory-unsafe languages such as C and C++. Rust, one of the youngest modern programming languages, offers built-in memory-safety guarantees that make it a strong candidate for secure systems development. Consequently, transpiling C/C++ code into memory-safe Rust code has become a growing area of research. However, manual transpilation is often time-consuming and error-prone. Additionally, rule-based automated approaches are not as flexible or cost-effective as methods enabled by state-of-the-art AI models, techniques, and methods, such as those that deploy Large Language Models (LLMs), for example, Generative Pretrained Transformers (GPT). In this paper, we propose a Retrieval-Augmented Generation (RAG)-assisted framework that integrates an LLM with a Small Language Model (SLM) to perform C/C++-to-Rust transpilation with a focus on enhancing memory safety. The framework deploys a segmentation strategy that processes C/C++ code in balanced blocks, guiding the LLM with retrieved context from Rust documentation and compiler error references. Our experiments using three OpenAI models (GPT-4o, GPT-4-Turbo, and o3-Mini) demonstrate that the RAG-enhanced pipeline generally improves both code correctness and security for C-to-Rust code transpilation. Several Coreutils programs achieve complete elimination of Raw Pointer Dereferences (RPDs) and Unsafe Type Casts (UTCs) in the final Rust output, indicating the potential of LLM-based transpilation for advancing automated software modernization and repair, as well as memory-safe code generation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes LLM4C2Rust, a RAG-assisted framework combining LLMs (GPT-4o, GPT-4-Turbo, o3-Mini) with an SLM for C/C++-to-Rust transpilation. It uses balanced code segmentation and retrieves context from Rust documentation and compiler error references to improve memory safety. Experiments on Coreutils programs claim that the RAG-enhanced pipeline generally improves code correctness and security, with several programs achieving complete elimination of Raw Pointer Dereferences (RPDs) and Unsafe Type Casts (UTCs).

Significance. If the experimental claims are substantiated with quantitative metrics and verification of functional equivalence, the work could advance automated legacy code modernization by demonstrating a practical LLM-based approach to generating memory-safe Rust, potentially reducing manual effort and security risks in systems programming.

major comments (3)
  1. [Abstract] The central claim that the RAG-enhanced pipeline 'generally improves both code correctness and security' is unsupported by any quantitative metrics, baselines, statistical tests, or details on evaluation methodology (e.g., how correctness was measured beyond pattern counts, or whether runtime behavior was compared to the original C). This is load-bearing for the paper's contribution.
  2. [Abstract] The report of 'complete elimination of Raw Pointer Dereferences (RPDs) and Unsafe Type Casts (UTCs)' in several Coreutils programs does not specify the full security evaluation (static patterns only?) or confirm the absence of semantic mismatches, new bugs, or hallucinations via differential testing against the original C behavior.
  3. [Abstract] The framework description mentions integration with a Small Language Model (SLM) but provides no details on which SLM was used, its specific role in the pipeline, or how it interacts with the LLM and RAG components to mitigate errors.
minor comments (2)
  1. [Abstract] The segmentation strategy is described as using 'balanced blocks' without defining the balancing criteria or the block-size selection process.
  2. [Abstract] Results are presented for three OpenAI models without clarifying whether improvements are consistent across models or only hold in aggregate.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their detailed and constructive feedback on our manuscript. We address each of the major comments point by point below, providing clarifications and indicating planned revisions to the paper.

Point-by-point responses
  1. Referee: [Abstract] The central claim that the RAG-enhanced pipeline 'generally improves both code correctness and security' is unsupported by any quantitative metrics, baselines, statistical tests, or details on evaluation methodology (e.g., how correctness was measured beyond pattern counts, or whether runtime behavior was compared to the original C). This is load-bearing for the paper's contribution.

    Authors: We agree that the abstract would benefit from more explicit support for this claim. The full manuscript's evaluation section provides quantitative metrics, including correctness rates based on compilation success and test execution, comparisons to non-RAG baselines using the same LLMs, and counts of unsafe patterns. While we did not conduct formal statistical tests due to the modest number of evaluated programs, we will revise the abstract to include representative quantitative results and a concise description of the evaluation methodology. revision: yes

  2. Referee: [Abstract] The report of 'complete elimination of Raw Pointer Dereferences (RPDs) and Unsafe Type Casts (UTCs)' in several Coreutils programs does not specify the full security evaluation (static patterns only?) or confirm the absence of semantic mismatches, new bugs, or hallucinations via differential testing against the original C behavior.

    Authors: The security analysis is based on static detection of RPDs and UTCs. For programs with complete elimination, we additionally conducted differential testing by comparing the runtime behavior of the original C code and the transpiled Rust code on Coreutils test cases to verify semantic equivalence and rule out new bugs or hallucinations. We will revise the abstract to specify this combined static and dynamic evaluation approach. revision: yes

  3. Referee: [Abstract] The framework description mentions integration with a Small Language Model (SLM) but provides no details on which SLM was used, its specific role in the pipeline, or how it interacts with the LLM and RAG components to mitigate errors.

    Authors: We acknowledge that the abstract does not detail the SLM. The manuscript describes the overall RAG-assisted framework, but we will revise the abstract and expand the framework section to specify the SLM used, its role in post-processing the LLM outputs to enhance safety, and its interaction with the RAG component and LLM for error mitigation. revision: yes
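The differential testing described in the rebuttal can be sketched as a harness that runs the original C binary and the transpiled Rust binary on identical inputs and diffs their observable behavior. Binary paths and inputs below are placeholders; the authors' actual harness and the Coreutils test cases are not published here.

```python
import subprocess

def differential_test(c_bin, rust_bin, test_inputs):
    # Run both binaries on identical argument lists and record any
    # divergence in exit code or stdout (stderr could be diffed too).
    mismatches = []
    for args in test_inputs:
        c = subprocess.run([c_bin, *args], capture_output=True, text=True)
        r = subprocess.run([rust_bin, *args], capture_output=True, text=True)
        if (c.returncode, c.stdout) != (r.returncode, r.stdout):
            mismatches.append((args, c.returncode, r.returncode))
    return mismatches
```

An empty result means no observed divergence on the chosen inputs, which is evidence of, not proof of, semantic equivalence.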

Circularity Check

0 steps flagged

No circularity: experimental claims rest on external benchmarks

Full rationale

The paper proposes a RAG-assisted LLM framework for C-to-Rust transpilation and reports experimental outcomes on Coreutils programs using GPT-4o, GPT-4-Turbo, and o3-Mini. No equations, derivations, fitted parameters, or self-referential definitions appear in the provided text or abstract. Central claims of improved correctness and security (e.g., RPD/UTC elimination) are presented as direct results of running the pipeline against compiler checks and static pattern counts, without any reduction of those results to the inputs by construction. No load-bearing self-citations or uniqueness theorems are invoked. The work is therefore self-contained as an empirical evaluation against external oracles.
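The static pattern counting mentioned above can be approximated with regexes, as below. A serious implementation would parse the Rust AST (e.g., with the syn crate) rather than pattern-match text, so treat these patterns as crude, illustrative stand-ins for RPD and UTC detection, not the paper's actual tooling.

```python
import re

def count_unsafe_patterns(rust_src: str) -> dict:
    # Illustrative approximations only: a real analyzer would walk the AST.
    return {
        # Raw pointer dereference: a `*ident` inside a single-line
        # unsafe block. Misses multi-line blocks and safe derefs.
        "raw_pointer_derefs": len(
            re.findall(r"unsafe\s*\{[^}]*\*\s*\w+", rust_src)
        ),
        # Unsafe type cast: `as *const T` / `as *mut T`.
        "unsafe_type_casts": len(
            re.findall(r"\bas\s+\*(?:const|mut)\b", rust_src)
        ),
    }
```

Zero counts from a detector like this only mean the patterns were not matched, which is why the rebuttal's addition of differential testing matters.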

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an applied empirical paper in software engineering. No formal axioms, free parameters, or invented physical entities are introduced; the framework consists of described engineering choices rather than postulated theoretical constructs.

pith-pipeline@v0.9.0 · 5606 in / 1201 out tokens · 44130 ms · 2026-05-10T10:24:09.080975+00:00 · methodology

discussion (0)

