LLM4C2Rust: Large Language Models for Automated Memory-Safe Code Transpilation
Pith reviewed 2026-05-10 10:24 UTC · model grok-4.3
The pith
A retrieval-augmented generation pipeline with large language models improves memory safety when automatically converting C code to Rust.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors claim that their RAG-assisted pipeline, which segments C code into balanced blocks and supplies context drawn from Rust documentation plus compiler error references, generally raises both correctness and security metrics for C-to-Rust transpilation, and that several Coreutils programs reach complete elimination of raw pointer dereferences and unsafe type casts in the generated Rust.
What carries the argument
The RAG-enhanced pipeline that segments source into balanced blocks and retrieves context from Rust documentation and compiler error references to steer the LLM output.
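The segmentation step could look like the following sketch. The paper does not define its balancing criteria (a point raised in the minor comments below), so splitting at top-level function boundaries and packing units under a character budget are assumptions made here for illustration:

```python
def segment_balanced(c_source: str, max_chars: int = 2000) -> list[str]:
    """Split C source into roughly balanced blocks.

    Hypothetical sketch: top-level units end at a closing brace in
    column 0; units are then greedily packed into blocks no larger
    than max_chars. The paper's actual criteria are unspecified.
    """
    units, current = [], []
    for line in c_source.splitlines(keepends=True):
        current.append(line)
        if line.startswith("}"):          # closes a top-level unit
            units.append("".join(current))
            current = []
    if current:
        units.append("".join(current))

    blocks, buf = [], ""
    for unit in units:
        if buf and len(buf) + len(unit) > max_chars:
            blocks.append(buf)            # budget exceeded: start a new block
            buf = ""
        buf += unit
    if buf:
        blocks.append(buf)
    return blocks
```

Each block would then be transpiled independently, with retrieved context attached; a smaller budget yields more, smaller blocks at the cost of less cross-function context per prompt.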
If this is right
- Automated conversion of legacy C code can reduce the number of memory-unsafe constructs that reach production.
- Software modernization projects gain a lower-cost path to Rust when manual rewriting is replaced by guided LLM steps.
- The same retrieval-plus-segmentation pattern can be reused for other unsafe-to-safe language pairs.
- Compiler feedback loops become a reusable signal for steering LLM code generation beyond the initial transpilation task.
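The compiler-feedback pattern in the last bullet can be sketched as a generic repair loop. The `rustc` invocation, loop interface, and round limit below are assumptions; the LLM step is stubbed as a callable:

```python
import pathlib
import subprocess
import tempfile

def compile_errors(rust_code: str) -> list[str]:
    """Compile a standalone Rust file and return its error lines.

    Requires rustc on PATH; the flags are an illustrative minimum,
    not the paper's actual build configuration.
    """
    with tempfile.TemporaryDirectory() as d:
        src = pathlib.Path(d) / "main.rs"
        src.write_text(rust_code)
        proc = subprocess.run(
            ["rustc", "--edition=2021", "--out-dir", d, str(src)],
            capture_output=True, text=True,
        )
        return [ln for ln in proc.stderr.splitlines() if ln.startswith("error")]

def feedback_loop(code, check, regenerate, max_rounds=3):
    """Feed checker errors back to a generator until the code passes.

    `check` returns a list of errors (empty on success); `regenerate`
    stands in for the LLM repair step. Returns (code, success).
    """
    for _ in range(max_rounds):
        errors = check(code)
        if not errors:
            return code, True
        code = regenerate(code, errors)
    return code, not check(code)
```

In the paper's setting `check` would be `compile_errors` and `regenerate` a prompted LLM call that receives the failing code plus the retrieved error references.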
Where Pith is reading between the lines
- The approach could be tested on larger codebases such as entire operating-system kernels to measure whether elimination rates hold at scale.
- Combining the pipeline with static analysis tools after generation might catch remaining safety issues that the LLM still misses.
- Future work could measure whether the same RAG context also speeds up human review of the generated Rust.
Load-bearing premise
Retrieved documentation and error messages plus balanced segmentation are enough to steer the LLM to memory-safe Rust without new errors or undetected hallucinations.
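A minimal stand-in for that retrieval step, assuming a plain bag-of-words index over documentation snippets (the paper does not specify its retriever or embedding model, so the scoring below is illustrative only):

```python
from collections import Counter

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k corpus snippets with the most word overlap
    with the query. A crude proxy for the RAG retrieval step."""
    q = Counter(query.lower().split())

    def overlap(doc: str) -> int:
        d = Counter(doc.lower().split())
        return sum(min(q[w], d[w]) for w in q)

    return sorted(corpus, key=overlap, reverse=True)[:k]
```

The retrieved snippets would be prepended to the transpilation prompt for each code block; a real system would use tokenized or embedding-based similarity rather than whitespace word overlap.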
What would settle it
Applying the same pipeline to a broader set of C programs: the claim fails if raw pointer dereference or unsafe cast counts remain at or above the non-RAG baseline, and gains support if they fall below it.
Original abstract
Memory safety has long been a critical challenge in software engineering, particularly for legacy systems written in memory-unsafe languages such as C and C++. Rust, one of the youngest modern programming languages, offers built-in memory-safety guarantees that make it a strong candidate for secure systems development. Consequently, transpiling C/C++ code into memory-safe Rust code has become a growing area of research. However, manual transpilation is often time-consuming and error-prone. Additionally, rule-based automated approaches are not as flexible or cost-effective as methods enabled by state-of-the-art AI models, techniques, and methods, such as those that deploy Large Language Models (LLMs), for example, Generative Pretrained Transformers (GPT). In this paper, we propose a Retrieval-Augmented Generation (RAG)-assisted framework that integrates an LLM with a Small Language Model (SLM) to perform C/C++-to-Rust transpilation with a focus on enhancing memory safety. The framework deploys a segmentation strategy that processes C/C++ code in balanced blocks, guiding the LLM with retrieved context from Rust documentation and compiler error references. Our experiments using three OpenAI models (GPT-4o, GPT-4-Turbo, and o3-Mini) demonstrate that the RAG-enhanced pipeline generally improves both code correctness and security for C-to-Rust code transpilation. Several Coreutils programs achieve complete elimination of Raw Pointer Dereferences (RPDs) and Unsafe Type Casts (UTCs) in the final Rust output, indicating the potential of LLM-based transpilation for advancing automated software modernization and repair, as well as memory-safe code generation.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes LLM4C2Rust, a RAG-assisted framework combining LLMs (GPT-4o, GPT-4-Turbo, o3-Mini) with an SLM for C/C++-to-Rust transpilation. It uses balanced code segmentation and retrieves context from Rust documentation and compiler error references to improve memory safety. Experiments on Coreutils programs claim that the RAG-enhanced pipeline generally improves code correctness and security, with several programs achieving complete elimination of Raw Pointer Dereferences (RPDs) and Unsafe Type Casts (UTCs).
Significance. If the experimental claims are substantiated with quantitative metrics and verification of functional equivalence, the work could advance automated legacy code modernization by demonstrating a practical LLM-based approach to generating memory-safe Rust, potentially reducing manual effort and security risks in systems programming.
major comments (3)
- [Abstract] Abstract: The central claim that the RAG-enhanced pipeline 'generally improves both code correctness and security' is unsupported by any quantitative metrics, baselines, statistical tests, or details on evaluation methodology (e.g., how correctness was measured beyond pattern counts or whether runtime behavior was compared to the original C). This is load-bearing for the paper's contribution.
- [Abstract] Abstract: The report of 'complete elimination of Raw Pointer Dereferences (RPDs) and Unsafe Type Casts (UTCs)' in several Coreutils programs does not specify the full security evaluation (static patterns only?) or confirm absence of semantic mismatches, new bugs, or hallucinations via differential testing against original C behavior.
- [Abstract] Abstract: The framework description mentions integration with a Small Language Model (SLM) but provides no details on which SLM, its specific role in the pipeline, or how it interacts with the LLM and RAG components to mitigate errors.
minor comments (2)
- [Abstract] Abstract: The segmentation strategy is described as using 'balanced blocks' without defining the balancing criteria or block size selection process.
- [Abstract] Abstract: Results are presented for three OpenAI models without clarifying whether improvements are consistent across models or aggregated.
Simulated Author's Rebuttal
We thank the referee for their detailed and constructive feedback on our manuscript. We address each of the major comments point by point below, providing clarifications and indicating planned revisions to the paper.
Point-by-point responses
Referee: [Abstract] Abstract: The central claim that the RAG-enhanced pipeline 'generally improves both code correctness and security' is unsupported by any quantitative metrics, baselines, statistical tests, or details on evaluation methodology (e.g., how correctness was measured beyond pattern counts or whether runtime behavior was compared to the original C). This is load-bearing for the paper's contribution.
Authors: We agree that the abstract would benefit from more explicit support for this claim. The full manuscript's evaluation section provides quantitative metrics, including correctness rates based on compilation success and test execution, comparisons to non-RAG baselines using the same LLMs, and counts of unsafe patterns. While we did not conduct formal statistical tests due to the modest number of evaluated programs, we will revise the abstract to include representative quantitative results and a concise description of the evaluation methodology. revision: yes
Referee: [Abstract] Abstract: The report of 'complete elimination of Raw Pointer Dereferences (RPDs) and Unsafe Type Casts (UTCs)' in several Coreutils programs does not specify the full security evaluation (static patterns only?) or confirm absence of semantic mismatches, new bugs, or hallucinations via differential testing against original C behavior.
Authors: The security analysis is based on static detection of RPDs and UTCs. For programs with complete elimination, we additionally conducted differential testing by comparing the runtime behavior of the original C code and the transpiled Rust code on Coreutils test cases to verify semantic equivalence and rule out new bugs or hallucinations. We will revise the abstract to specify this combined static and dynamic evaluation approach. revision: yes
Referee: [Abstract] Abstract: The framework description mentions integration with a Small Language Model (SLM) but provides no details on which SLM, its specific role in the pipeline, or how it interacts with the LLM and RAG components to mitigate errors.
Authors: We acknowledge that the abstract does not detail the SLM. The manuscript describes the overall RAG-assisted framework, but we will revise the abstract and expand the framework section to specify the SLM used, its role in post-processing the LLM outputs to enhance safety, and its interaction with the RAG component and LLM for error mitigation. revision: yes
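The differential testing the authors describe, comparing runtime behavior of the original C binary and the transpiled Rust binary, can be sketched as follows. Binary paths and test cases are placeholders, and the comparison here is limited to exit status and standard output:

```python
import subprocess

def differential_test(c_bin: str, rust_bin: str,
                      cases: list[list[str]]) -> bool:
    """Run both binaries on the same argument lists and report
    whether exit status and stdout agree on every case.

    Sketch only: a fuller harness would also compare stderr,
    stdin-driven behavior, and filesystem effects.
    """
    for args in cases:
        a = subprocess.run([c_bin, *args], capture_output=True, text=True)
        b = subprocess.run([rust_bin, *args], capture_output=True, text=True)
        if (a.returncode, a.stdout) != (b.returncode, b.stdout):
            return False
    return True
```

For Coreutils, `cases` would be drawn from the suite's own test inputs, so agreement gives evidence of semantic equivalence on the exercised paths only.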
Circularity Check
No circularity: experimental claims rest on external benchmarks
Full rationale
The paper proposes a RAG-assisted LLM framework for C-to-Rust transpilation and reports experimental outcomes on Coreutils programs using GPT-4o, GPT-4-Turbo, and o3-Mini. No equations, derivations, fitted parameters, or self-referential definitions appear in the provided text or abstract. Central claims of improved correctness and security (e.g., RPD/UTC elimination) are presented as direct results of running the pipeline against compiler checks and static pattern counts, without any reduction of those results to the inputs by construction. No load-bearing self-citations or uniqueness theorems are invoked. The work is therefore self-contained as an empirical evaluation against external oracles.
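The static pattern counts the rationale refers to can be approximated as follows. The paper does not publish its exact detection rules, so these regexes are illustrative stand-ins for counting Raw Pointer Dereferences (RPDs) and Unsafe Type Casts (UTCs) in generated Rust:

```python
import re

# Crude, hypothetical patterns: a dereference inside an unsafe block,
# and a cast to a raw pointer type. A real checker would parse the AST.
RPD_PAT = re.compile(r"unsafe\s*\{[^}]*\*\s*[A-Za-z_]")
UTC_PAT = re.compile(r"\bas\s+\*\s*(?:mut|const)\b")

def count_unsafe_patterns(rust_code: str) -> dict[str, int]:
    """Count RPD- and UTC-like patterns in a Rust source string."""
    return {
        "rpd": len(RPD_PAT.findall(rust_code)),
        "utc": len(UTC_PAT.findall(rust_code)),
    }
```

"Complete elimination" in the paper's sense would correspond to both counts reaching zero on the final Rust output, which is exactly the kind of external, syntactic oracle the circularity check describes.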
Axiom & Free-Parameter Ledger
Empty: the provided text introduces no axioms, equations, or fitted parameters (see the circularity rationale above).