ReCodeAgent: A Multi-Agent Workflow for Language-agnostic Translation and Validation of Large-scale Repositories
Pith reviewed 2026-05-10 17:23 UTC · model grok-4.3
The pith
ReCodeAgent is the first multi-agent system to deliver high-success-rate, language-agnostic translation and validation for large code repositories.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ReCodeAgent is an autonomous multi-agent approach for language-agnostic repository-level code translation and validation. Users only need to provide the project in the source PL and specify the target PL for ReCodeAgent to automatically translate and validate the entire repository. ReCodeAgent is the first technique to achieve high translation success rates across many PLs.
What carries the argument
Multi-agent workflow that synthesizes code across programming languages and autonomously invokes each language's existing analysis and validation tools.
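The mechanism above hinges on per-language tool dispatch: the agent stays language-agnostic by delegating validation to each PL's existing toolchain. A minimal sketch of that dispatch, with a hypothetical validator table (the concrete tools ReCodeAgent invokes are not specified in this review):

```python
import subprocess

# Hypothetical mapping from target language to an existing off-the-shelf
# checker; the actual toolchain ReCodeAgent uses may differ.
VALIDATORS = {
    "rust": ["cargo", "check"],
    "go": ["go", "vet", "./..."],
    "python": ["python", "-m", "compileall", "-q", "."],
}

def validate(target_lang: str, workdir: str) -> bool:
    """Dispatch to the target language's own toolchain, keeping the agent PL-agnostic."""
    cmd = VALIDATORS.get(target_lang)
    if cmd is None:
        raise ValueError(f"no validator registered for {target_lang}")
    # The agent inspects returncode/stderr to decide whether to iterate.
    return subprocess.run(cmd, cwd=workdir, capture_output=True).returncode == 0
```

Adding a new target language then means registering one more entry in the table rather than re-engineering the workflow, which is the crux of the language-agnosticism claim.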
If this is right
- Translation and validation succeed across six languages and four pairs without per-pair custom engineering.
- Test pass rates on ground-truth tests improve by 60.8 percent over four prior techniques.
- Average cost stays at 15.3 dollars per project of roughly 2,000 lines.
- Multi-agent design raises test pass rate by 40.4 percent and shortens trajectories by 28 percent relative to single-agent versions.
Where Pith is reading between the lines
- Large legacy codebases could be migrated to newer languages with far less manual porting effort.
- The same autonomous workflow pattern may apply to other repository-scale tasks such as refactoring or security hardening.
- Success on the tested language set suggests the method could scale to additional languages if the agentic tool-use remains reliable.
Load-bearing premise
That providing only the source project and target PL is sufficient for fully autonomous, high-success-rate translation and validation of large-scale repositories without language-pair-specific engineering or human oversight.
What would settle it
Running ReCodeAgent on repositories written in a programming language outside the six evaluated ones and observing whether high test pass rates are maintained without adding custom tools or human intervention.
Original abstract
Most repository-level code translation and validation techniques have been evaluated on a single source-target programming language (PL) pair, owing to the complex engineering effort required to adapt new PL pairs. Programming agents can enable PL-agnosticism in repository-level code translation and validation: they can synthesize code across many PLs and autonomously use existing tools specific to each PL's analysis. However, state-of-the-art has yet to offer a fully autonomous agentic approach for repository-level code translation and validation of large-scale programs. This paper proposes ReCodeAgent, an autonomous multi-agent approach for language-agnostic repository-level code translation and validation. Users only need to provide the project in the source PL and specify the target PL for ReCodeAgent to automatically translate and validate the entire repository. ReCodeAgent is the first technique to achieve high translation success rates across many PLs. We compare the effectiveness of ReCodeAgent with four alternative neuro-symbolic and agentic approaches to translate 118 real-world projects, with 1,975 LoC and 43 translation units for each project, on average. The projects cover 6 PLs (C, Go, Java, JavaScript, Python, and Rust) and 4 PL pairs (C-Rust, Go-Rust, Java-Python, Python-JavaScript). Our results demonstrate that ReCodeAgent consistently outperforms prior techniques on translation correctness, improving test pass rate by 60.8% on ground-truth tests, with an average cost of $15.3. We also perform process-centric analysis of ReCodeAgent trajectories to confirm its procedural efficiency. Finally, we investigate how the design choices (a multi-agent vs. single-agent architecture) influence ReCodeAgent performance: on average, the test pass rate drops by 40.4%, and trajectories become 28% longer and persistently inefficient.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes ReCodeAgent, a multi-agent workflow designed to enable language-agnostic translation and validation of large-scale code repositories. The approach requires only the source project and the target programming language as input, aiming to autonomously handle translation across multiple programming languages without pair-specific engineering. The evaluation involves translating 118 real-world projects (average 1,975 LoC, 43 translation units) across 6 PLs and 4 pairs, comparing against four neuro-symbolic and agentic baselines. Key results include a 60.8% improvement in test pass rate on ground-truth tests and an average cost of $15.3, along with an ablation showing that replacing the multi-agent architecture with a single agent drops the test pass rate by 40.4%.
Significance. If the results hold under rigorous scrutiny, this work represents a meaningful advance in repository-level code translation by demonstrating a scalable, agent-based method that reduces reliance on language-pair-specific adaptations. The inclusion of process-centric trajectory analysis and cost metrics strengthens the practical implications for software maintenance and migration tasks. The multi-agent vs. single-agent comparison provides useful insights into agentic system design.
major comments (2)
- [Evaluation] The central claim that ReCodeAgent is language-agnostic and fully autonomous relies on experiments limited to four language pairs (C-Rust, Go-Rust, Java-Python, Python-JavaScript). No results are provided for additional pairs or for demonstrating that the workflow succeeds on unseen pairs without any pair-specific configuration or human intervention in environment setup (e.g., compilers, dependency resolution). This is load-bearing for the 'across many PLs' assertion in the abstract.
- [Abstract and Evaluation] The reported 60.8% improvement in test pass rate lacks accompanying details on experimental protocol, such as how ground-truth tests were obtained, project selection criteria, number of runs, or statistical tests for significance. Without these, it is difficult to assess the reliability of the performance claims.
minor comments (2)
- [Abstract] The abstract mentions 'high translation success rates across many PLs' but the evaluation is on four pairs; consider qualifying this in the abstract for accuracy.
- [Throughout] Ensure consistent use of terminology for 'translation units' and clarify how they are defined in the methodology section.
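One plausible definition of a translation unit, sketched here with Python's `ast` module purely as an illustration of the granularity question the referee raises (the paper's own definition may differ):

```python
import ast

def translation_units(source: str) -> list[str]:
    """Treat each top-level function or class as a candidate translation unit."""
    tree = ast.parse(source)
    return [
        node.name
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    ]
```

Under this assumption, a 1,975-LoC project averaging 43 units implies roughly 45 lines per unit; the methodology section should state whether units are functions, files, or something coarser.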
Simulated Author's Rebuttal
We thank the referee for their insightful review and the recommendation for major revision. We have carefully considered the comments and provide point-by-point responses below, along with our plans for revisions to address the concerns.
Point-by-point responses
Referee: [Evaluation] The central claim that ReCodeAgent is language-agnostic and fully autonomous relies on experiments limited to four language pairs (C-Rust, Go-Rust, Java-Python, Python-JavaScript). No results are provided for additional pairs or for demonstrating that the workflow succeeds on unseen pairs without any pair-specific configuration or human intervention in environment setup (e.g., compilers, dependency resolution). This is load-bearing for the 'across many PLs' assertion in the abstract.
Authors: We agree that expanding the evaluation to more language pairs would provide stronger evidence for the language-agnostic claim. However, the current experiments already demonstrate the approach across four diverse pairs involving six languages, with no pair-specific configurations or manual interventions in the workflow—the agents autonomously manage tool usage and environment setup for each target language. The design is intentionally general, as described in Section 3. To further address this, we will revise the abstract to more precisely state the scope of our evaluation (multiple pairs across six PLs) and add a discussion on the generalizability of the multi-agent workflow to unseen pairs based on its architecture. We cannot add new experimental results for additional pairs at this stage without significant additional resources, but the existing results support the autonomy claim. revision: partial
Referee: [Abstract and Evaluation] The reported 60.8% improvement in test pass rate lacks accompanying details on experimental protocol, such as how ground-truth tests were obtained, project selection criteria, number of runs, or statistical tests for significance. Without these, it is difficult to assess the reliability of the performance claims.
Authors: We acknowledge the need for greater transparency in the experimental protocol. In the revised version of the manuscript, we will expand the Evaluation section (Section 4) to include: detailed information on how ground-truth tests were sourced from the original project repositories; the criteria used for selecting the 118 real-world projects (e.g., popularity, presence of comprehensive test suites, and diversity across languages); the number of experimental runs performed (we conducted multiple runs to mitigate stochasticity in agent behavior); and results of statistical significance tests (such as paired t-tests or Wilcoxon tests) to support the 60.8% improvement claim. These additions will allow readers to better evaluate the reliability of our findings. revision: yes
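The significance tests the authors promise can be run on nothing more than paired per-project pass rates. A minimal, stdlib-only sketch using an exact two-sided sign test (the paper may instead use the paired t-test or Wilcoxon test it names; the numbers below are illustrative, not the paper's data):

```python
from math import comb

def sign_test_p(agent: list[float], baseline: list[float]) -> float:
    """Two-sided exact sign test on paired per-project pass rates; ties are dropped."""
    wins = sum(a > b for a, b in zip(agent, baseline))
    losses = sum(a < b for a, b in zip(agent, baseline))
    n = wins + losses
    k = max(wins, losses)
    # Two-sided p-value: 2 * P(X >= k) for X ~ Binomial(n, 0.5), capped at 1.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2**n
    return min(1.0, 2 * tail)
```

With only five projects all favoring the agent, the p-value is 0.0625, which illustrates why per-pair sample sizes and the chosen test matter for the 60.8% claim.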
Circularity Check
No circularity; empirical results rest on external benchmarks and direct comparisons.
Full rationale
The paper is an empirical evaluation of a multi-agent system for code translation. It reports measured improvements (e.g., 60.8% higher test pass rate) from running ReCodeAgent and four baselines on 118 real-world projects spanning four language pairs. No equations, derivations, fitted parameters, or first-principles predictions appear in the provided text. Claims of language-agnostic behavior and outperformance are grounded in the experimental outcomes rather than any self-definitional loop, renamed known result, or load-bearing self-citation chain. The evaluation design is self-contained against the stated benchmarks and does not reduce any central result to its own inputs by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Repository-level code translation and validation can be performed autonomously by agents that synthesize code and use existing PL-specific analysis tools without language-pair-specific engineering.
invented entities (1)
- ReCodeAgent multi-agent workflow (no independent evidence)