TransAgent: Enhancing LLM-Based Code Translation via Fine-Grained Execution Alignment

Hanlin Wang; Weitong Chen; Xin Peng; Yiling Lou; Zhenpeng Chen; Zhiqiang Yuan

arxiv: 2409.19894 · v5 · submitted 2024-09-30 · 💻 cs.SE · cs.AI

Zhiqiang Yuan, Weitong Chen, Hanlin Wang, Xin Peng, Zhenpeng Chen, Yiling Lou

Pith reviewed 2026-05-23 20:53 UTC · model grok-4.3

keywords: code translation · large language models · multi-agent systems · execution alignment · error localization · program repair · software maintenance

The pith

TransAGENT corrects errors in LLM code translations by using multi-agent fine-grained execution alignment to locate faulty blocks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces TransAGENT, a multi-agent system that improves LLM-based code translation by identifying and fixing errors through detailed execution comparisons between source and target programs. Traditional learning methods struggle with insufficient parallel data, and even strong LLMs produce translations with syntax and semantic flaws that limit their use in software maintenance. By constructing a new benchmark of recent tasks, the work shows that alignment-based localization leads to measurable gains in accuracy and repair tasks. A sympathetic reader would care because reliable cross-language code movement remains a practical bottleneck in development workflows.

Core claim

TransAGENT is a novel multi-agent system that eliminates errors during LLM-based code translation. The main insight is to localize error-prone code blocks via fine-grained execution alignment between source and target code. Evaluated on a newly constructed benchmark of recent programming tasks to mitigate data leakage, TransAGENT outperforms the latest UniTrans by up to 33.3% in translation accuracy and achieves an average improvement of 56.7% over Agentless in program repair performance, with ablation studies and tests across LLMs confirming its effectiveness and generalizability.

What carries the argument

Fine-grained execution alignment between source and target code, performed by a multi-agent system to localize error-prone blocks.
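
A minimal sketch of how block-level execution alignment could localize a faulty block, assuming the source and target programs have already been split into blocks, that a mapping between the blocks exists, and that the values observed after each block have been recorded. The names `BlockRun`, `first_divergent_block`, and the trace format are hypothetical illustrations, not the paper's implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BlockRun:
    """One executed block and the variable values observed right after it."""
    block_id: str
    values: dict


def first_divergent_block(source_trace, target_trace, block_map) -> Optional[str]:
    """Walk both traces in lockstep and return the first suspicious target block.

    block_map maps every source block id to its aligned target block id
    (assumed to be complete). Returns None when no difference is observed.
    """
    for step, src in enumerate(source_trace):
        expected_id = block_map[src.block_id]
        if step >= len(target_trace):
            return expected_id  # the target run stopped before reaching this block
        tgt = target_trace[step]
        if tgt.block_id != expected_id or tgt.values != src.values:
            return expected_id  # control flow or recorded values differ here
    return None


# Hypothetical traces: the target's second block computes a different value.
block_map = {"S1": "T1", "S2": "T2", "S3": "T3"}
source = [BlockRun("S1", {"total": 0}), BlockRun("S2", {"total": 6}), BlockRun("S3", {"result": 3})]
target = [BlockRun("T1", {"total": 0}), BlockRun("T2", {"total": 5}), BlockRun("T3", {"result": 2})]
suspect = first_divergent_block(source, target, block_map)
```

Stopping at the first divergence is a deliberate simplification: later differences are often consequences of the first one, so only the earliest mismatch is reported for repair.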

If this is right

Translation accuracy rises by as much as 33.3 percent relative to the prior UniTrans method.
Program repair performance improves by an average of 56.7 percent compared with the Agentless baseline.
The gains hold when the underlying LLM is swapped, indicating broad applicability.
A fresh benchmark of recent tasks reduces the risk that reported numbers reflect memorized training data.
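
To make the leakage argument concrete, here is a small sketch of date-based task filtering. The paper excerpt quoted on this page says the tasks were released after August 2023; using the last day of that month as the cutoff, along with the `Task` type and the task names, is an assumption made for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Task:
    name: str
    released: date


def filter_recent(tasks, cutoff):
    """Keep only tasks released strictly after the cutoff date."""
    return [t for t in tasks if t.released > cutoff]


# Assumed cutoff: end of August 2023 (the exact day is not stated on this page).
CUTOFF = date(2023, 8, 31)

# Hypothetical tasks, not taken from the benchmark.
tasks = [
    Task("old_task", date(2023, 5, 1)),
    Task("boundary_task", date(2023, 8, 31)),
    Task("new_task", date(2024, 1, 15)),
]
recent = filter_recent(tasks, CUTOFF)
```

A date filter only reduces the chance that a model saw the task during training; it does not rule out leakage for models with later training cutoffs.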

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The alignment technique could be applied to same-language code repair tasks where test coverage is sparse.
If alignment succeeds with partial executions, it may support migration of legacy systems that lack comprehensive test suites.
Integration with other agent workflows for code generation could create end-to-end pipelines for cross-language refactoring.

Load-bearing premise

Fine-grained execution alignment between source and target code can reliably localize error-prone blocks even without complete test suites and without creating alignment artifacts that hide real differences.

What would settle it

A set of translated programs where the alignment step marks a block as correct yet the block still produces wrong outputs on valid inputs, or marks an incorrect block while missing the actual error location.
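
One way to run such a check, sketched under the assumption that each translated program has a manually labeled set of truly faulty blocks and a single block flagged by the alignment step (or none). The category names and the function `classify_localization` are hypothetical.

```python
from collections import Counter


def classify_localization(flagged, faulty):
    """Compare the flagged block (or None) with the set of truly faulty blocks."""
    if not faulty:
        return "correct_clean" if flagged is None else "false_alarm"
    if flagged is None:
        return "missed_error"   # alignment reported no difference, yet a block is wrong
    return "hit" if flagged in faulty else "wrong_block"


def summarize(cases):
    """Count outcomes over (flagged_block, faulty_blocks) pairs."""
    return Counter(classify_localization(flagged, faulty) for flagged, faulty in cases)


# Hypothetical labeled cases.
cases = [(3, {3}), (None, {2}), (1, {4}), (None, set())]
summary = summarize(cases)
```

A high share of `missed_error` or `wrong_block` outcomes would be the evidence against the load-bearing premise described above.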

Figures

Figures reproduced from arXiv: 2409.19894 by Hanlin Wang, Weitong Chen, Xin Peng, Yiling Lou, Zhenpeng Chen, Zhiqiang Yuan.

**Figure 3.** Workflow of TRANSAGENT. Excerpt: … behaviors from its aligned block in the source program; and then it leverages LLMs to specifically fix the error block with the observed runtime difference. Semantic Error Fixer is novel in fixing the semantic errors during code translation in such a fine-grained way. In particular, whenever the target program passing all the generated tests, the workflow terminates and the target pr…

**Figure 4.** Source Python and Ground-truth Java Program of …

**Figure 5.** Prompts in Syntax Error Fixer. Excerpt: The patched target program would further go to syntax validation of the next iteration. The iterative process terminates when (i) there are no syntax errors or (ii) there are the same syntax errors occurring at the same buggy location as the previous iteration (to avoid being stuck in an endless loop). Otherwise, if there are syntax errors different from the previous iteration…

**Figure 6.** Prompts in Coder Aligner. Excerpt: … error information (i.e., Semantic Patch Generation). Different from previous LLM-based code translation work [12], [11] that directly leverages LLMs to fix semantic errors without pinpointing the suspicious location, Semantic Error Fixer can (i) not only narrow down the fixing space by pinpointing the error target block (ii) but also provide detailed error information about the ru…

**Figure 7.** Excerpt: Figure 7a illustrates the vanilla fix

**Figure 7.** Prompts in Semantic Patch Generation. Excerpt: … tasks in different programming languages, which are released after August 2023. Specifically, we focus on three popular programming languages, i.e., Java, Python, and C++. As the solutions in these websites typically come with only two or three test cases, which can be insufficient for guaranteeing the semantic correctness of code [38], we further leverage gpt-4o-mini […

**Figure 8.** Example of Mapping Results of TransMap and T…
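
The Figure 5 excerpt describes an iterative syntax-fixing loop that stops when no syntax errors remain or when the same error recurs at the same location. A minimal sketch of that control flow follows; the `check` and `patch` callables stand in for a compiler and an LLM call, and `max_iters` is an added safety bound, so all three are assumptions rather than details from the paper.

```python
def fix_syntax(program, check, patch, max_iters=10):
    """Repeatedly validate and patch a program.

    check(program) returns None when the program is syntactically valid,
    otherwise an (error_message, location) tuple.
    patch(program, error) returns a patched program.
    Returns (program, fixed) where fixed says whether validation passed.
    """
    previous = None
    for _ in range(max_iters):
        error = check(program)
        if error is None:
            return program, True    # (i) no syntax errors remain
        if error == previous:
            return program, False   # (ii) same error at the same location: stop
        program = patch(program, error)
        previous = error
    return program, False           # assumed safety bound reached


# Toy stand-ins: the "compiler" only checks parenthesis balance.
def toy_check(program):
    missing = program.count("(") - program.count(")")
    return None if missing == 0 else ("unbalanced parentheses", len(program))


def toy_patch(program, error):
    return program + ")"


fixed_program, ok = fix_syntax("print((1 + 2", toy_check, toy_patch)
stuck_program, stuck_ok = fix_syntax("f(", toy_check, lambda program, error: program)
```

The second call shows the early exit: a patch that changes nothing reproduces the same error at the same location, so the loop gives up instead of spinning.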

Original abstract

Code translation transforms code between programming languages while preserving functionality, which is critical in software development and maintenance. While traditional learning-based code translation methods have limited effectiveness due to the lack of sufficient parallel training data, Large Language Models (LLMs) have recently advanced this field with their strong code generation and comprehension capabilities. However, code translated by LLMs still suffers from diverse quality issues, such as syntax and semantic errors. In this work, we propose TransAGENT, a novel multi-agent system that eliminates the errors during LLM-based code translation. The main insight of TransAGENT is to localize error-prone code blocks via fine-grained execution alignment between source and target code. We evaluate TransAGENT on a newly constructed benchmark of recent programming tasks to mitigate data leakage. TransAGENT outperforms the latest UniTrans by up to 33.3% in translation accuracy and achieves an average improvement of 56.7% over Agentless in program repair performance. We also conduct an ablation study and evaluate TransAGENT across different LLMs, demonstrating its effectiveness and strong generalizability.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

TransAgent combines multi-agent orchestration with execution alignment to cut errors in LLM code translation and reports clear gains on a new benchmark, but the evaluation needs more detail to pin down how robust the deltas really are.

TransAgent is a multi-agent system that localizes translation errors by aligning fine-grained execution traces between source and target code. The headline results are the 33.3% accuracy lift over UniTrans and the 56.7% repair improvement over Agentless on a benchmark built from recent tasks to limit leakage. That combination of agents plus alignment for error spotting is the concrete new piece, even though agents and execution feedback have appeared separately before. The authors also run ablations and check the system on different LLMs, which helps show the gains are not tied to one model. Those steps are useful and keep the work grounded in practice rather than abstract claims. The evaluation stays empirical with no fitted parameters or circular definitions, so the argument rests on the external comparisons.

The main soft spot is the benchmark itself. Without seeing exactly how the tasks were chosen and filtered, it is hard to judge whether the reported numbers could shift under different sampling or if the alignment step introduces its own artifacts when test coverage is incomplete. The abstract does not include variance or significance numbers, so the size of the improvement is still a bit opaque. Those are fixable issues rather than fatal ones.

The paper is aimed at people building LLM agents for code maintenance and migration. Anyone already working on translation pipelines or repair loops will get a usable system sketch and a fresh test set to compare against. It is coherent on its own terms and engages the prior literature without obvious contradictions. I would send it to peer review so referees can check the benchmark construction and the alignment mechanics in detail.

Referee Report

3 major / 2 minor

Summary. The paper introduces TransAGENT, a multi-agent system for LLM-based code translation that localizes error-prone blocks via fine-grained execution alignment between source and target code. It constructs a new benchmark of recent programming tasks to reduce data leakage, reports up to 33.3% higher translation accuracy than UniTrans and 56.7% average improvement over Agentless on program repair, and includes ablation studies plus evaluations across multiple LLMs to demonstrate generalizability.

Significance. If the empirical gains hold under scrutiny, the work offers a practical mechanism for improving semantic fidelity in cross-language translation by grounding LLM outputs in execution traces rather than static analysis alone. The new benchmark construction is a constructive contribution for the field, and the multi-agent framing with explicit alignment could generalize to other code maintenance tasks.

major comments (3)

[§3] §3 (Method), execution alignment procedure: the central claim that fine-grained alignment reliably localizes errors without complete test suites is not supported by a concrete algorithm or pseudocode; the description leaves open how partial traces are matched and whether alignment artifacts could mask semantic differences, which directly underpins the reported accuracy deltas.
[§4.1] §4.1 (Benchmark), Table 1: the construction details for the new benchmark (task selection criteria, leakage mitigation steps, and test-suite coverage statistics) are insufficient to assess whether the 33.3% and 56.7% gains are robust or sensitive to post-hoc choices; no inter-rater agreement or leakage audit is reported.
[§4.2] §4.2 (Results), accuracy and repair tables: the improvements are presented as point estimates without statistical significance tests, confidence intervals, or variance across random seeds; this weakens the claim that TransAGENT consistently outperforms the baselines.
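
Inter-rater agreement of the kind requested in the benchmark comment is commonly reported as Cohen's κ. The sketch below computes it for two raters with the standard formula, (observed agreement minus chance agreement) divided by (1 minus chance agreement). The labels are hypothetical and do not come from the paper.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(labels_a)
    # Fraction of items on which the raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance from each rater's label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label for every item
    return (observed - expected) / (1 - expected)


# Hypothetical manual leakage checks by two raters on six tasks.
rater_1 = ["leak", "clean", "clean", "leak", "clean", "clean"]
rater_2 = ["leak", "clean", "leak", "leak", "clean", "clean"]
kappa = cohens_kappa(rater_1, rater_2)
```

In this toy case the raters agree on five of six items, chance agreement is 0.5, and κ works out to 2/3.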

minor comments (2)

[Abstract] The abstract and §1 use “up to 33.3%” and “average improvement of 56.7%” without clarifying whether these are relative or absolute gains or on which exact metric subsets.
[Figure 3] Figure 3 (agent workflow) would benefit from explicit labeling of the execution-alignment step and data flow between agents.
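
The relative-versus-absolute ambiguity raised in the first minor comment can be shown with a short calculation. The accuracy values below are hypothetical and chosen only to show how far the two readings can diverge; they are not the paper's numbers.

```python
def absolute_gain(baseline, system):
    """Difference in percentage points."""
    return system - baseline


def relative_gain(baseline, system):
    """Improvement as a percentage of the baseline value."""
    if baseline == 0:
        raise ValueError("relative gain is undefined for a zero baseline")
    return (system - baseline) / baseline * 100


# Hypothetical accuracies in percent.
baseline_acc, system_acc = 50.0, 60.0
points = absolute_gain(baseline_acc, system_acc)    # 10.0 percentage points
percent = relative_gain(baseline_acc, system_acc)   # 20.0 percent relative
```

The same pair of results can therefore be described as a 10-point gain or a 20% gain, which is why the metric definition needs to be stated.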

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify key aspects of the method, benchmark, and evaluation. We address each major point below and will incorporate revisions to strengthen the manuscript.

Referee: [§3] §3 (Method), execution alignment procedure: the central claim that fine-grained alignment reliably localizes errors without complete test suites is not supported by a concrete algorithm or pseudocode; the description leaves open how partial traces are matched and whether alignment artifacts could mask semantic differences, which directly underpins the reported accuracy deltas.

Authors: We agree that a formal algorithmic description would improve clarity and reproducibility. In the revised manuscript we will add pseudocode (as a new Algorithm 1 in §3) that explicitly specifies the trace-matching procedure for partial executions, the similarity metric used, and safeguards against masking semantic differences (e.g., by requiring both syntactic and semantic equivalence checks on aligned blocks).

revision: yes

Referee: [§4.1] §4.1 (Benchmark), Table 1: the construction details for the new benchmark (task selection criteria, leakage mitigation steps, and test-suite coverage statistics) are insufficient to assess whether the 33.3% and 56.7% gains are robust or sensitive to post-hoc choices; no inter-rater agreement or leakage audit is reported.

Authors: We acknowledge the need for greater transparency. The revised §4.1 and Table 1 will include: (i) explicit task-selection criteria (problems posted after August 2023 on LeetCode/Codeforces with at least three test cases), (ii) leakage-mitigation steps (timestamp filtering plus manual overlap checks against common pre-training corpora), (iii) test-suite coverage statistics (average number of tests per task and branch coverage), and (iv) results of a leakage audit together with inter-rater agreement (Cohen’s κ) for any manual verification steps.

revision: yes

Referee: [§4.2] §4.2 (Results), accuracy and repair tables: the improvements are presented as point estimates without statistical significance tests, confidence intervals, or variance across random seeds; this weakens the claim that TransAGENT consistently outperforms the baselines.

Authors: We agree that statistical rigor is required. The revised §4.2 will report: (i) paired statistical significance tests (Wilcoxon signed-rank) with p-values, (ii) 95% confidence intervals computed via bootstrap resampling, and (iii) standard deviation across five random seeds for both translation accuracy and program-repair metrics.

revision: yes
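
As an illustration of two of the statistics named in the last response, the sketch below computes a percentile bootstrap confidence interval over per-task paired differences and a standard deviation across seeds, using only the Python standard library. The data, the number of resamples, and the function name `bootstrap_ci` are hypothetical; the Wilcoxon signed-rank test is not shown.

```python
import random
import statistics


def bootstrap_ci(diffs, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of paired differences."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(diffs)
    # Resample the differences with replacement and record each resample's mean.
    means = sorted(
        statistics.fmean(rng.choices(diffs, k=n)) for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical per-task differences: +1 if only the new system passes,
# -1 if only the baseline passes, 0 if both agree.
diffs = [1, 0, 1, 0, 0, 1, 0, 0, 1, -1, 0, 1]
ci_low, ci_high = bootstrap_ci(diffs)

# Hypothetical accuracy (percent) from five runs with different seeds.
seed_scores = [60.0, 62.0, 61.0, 59.0, 63.0]
seed_std = statistics.stdev(seed_scores)
```

Resampling the per-task differences, rather than the two systems independently, keeps the pairing between systems on each task, which is the relevant comparison when both are evaluated on the same benchmark.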

Circularity Check

0 steps flagged

No significant circularity detected

The paper presents an empirical multi-agent system (TransAGENT) whose core contribution is fine-grained execution alignment for error localization in LLM code translation. Evaluation relies on a newly constructed benchmark and direct comparisons to external baselines (UniTrans, Agentless) with reported accuracy deltas. No equations, fitted parameters renamed as predictions, self-definitional constructs, or load-bearing self-citations appear in the provided text. The derivation chain consists of system design followed by external benchmarking; all performance claims are falsifiable against independent test suites and do not reduce to internal definitions or prior author work by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an applied empirical system paper. No free parameters, mathematical axioms, or invented physical entities are invoked; the contribution is an engineered workflow whose correctness is asserted via benchmark results.

Forward citations

Cited by 7 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Beyond Translation Accuracy: Addressing False Failures in LLM-Based Code Translation
cs.SE 2026-05 unverdicted novelty 7.0

Many reported failures in LLM-based code translation are false negatives due to evaluation pipeline issues such as improper compilation flags, missing library links, and unconfigured runtime environments rather than i...

uGen: An Agentic Framework for Generating Microarchitectural Attack PoCs
cs.CR 2026-05 unverdicted novelty 6.0

uGen is the first retrieval-augmented multi-agent LLM framework for generating functionally correct microarchitectural attack PoCs, reporting up to 100% success on Spectre-v1 and 80% on Prime+Probe at low cost.

Project-Level C-to-Rust Translation via Pointer Knowledge Graphs
cs.SE 2025-10 unverdicted novelty 6.0

PtrTrans builds a Pointer Knowledge Graph with points-to flows, struct abstractions, and Rust annotations to guide LLMs toward project-level C-to-Rust translations that cut unsafe code by 99.9% and raise functional co...

Neural Code Translation of Legacy Code: APL to C#
cs.SE 2026-05 unverdicted novelty 5.0

Guided LLM strategies with custom datasets and execution-based verification enable functional APL-to-C# translation across a range of program complexities.

Boosting Automatic Java-to-Cangjie Translation with Multi-Stage LLM Training and Error Repair
cs.SE 2026-05 unverdicted novelty 5.0

Multi-stage LLM training plus compiler-guided error repair boosts functional equivalence in Java-to-Cangjie translation by 6.06% over prior methods despite scarce parallel data.

Beyond Translation Accuracy: Addressing False Failures in LLM-Based Code Translation
cs.SE 2026-05 unverdicted novelty 5.0

A large-scale study finds that many LLM code translation failures are false negatives due to improper evaluation configurations rather than incorrect translations.

Large Language Model-Based Agents for Software Engineering: A Survey
cs.SE 2024-09 unverdicted novelty 4.0

A literature survey that collects and categorizes 124 papers on LLM-based agents for software engineering from SE and agent perspectives.

Reference graph

Works this paper leans on

79 extracted references · 79 canonical work pages · cited by 6 Pith papers · 2 internal anchors

[1] Sindre Grønstøl Haugeland, Phu Hong Nguyen, Hui Song, and Franck Chauvel. Migrating monoliths to microservices-based customizable multi-tenant cloud-native apps. In 47th Euromicro Conference on Software Engineering and Advanced Applications, SEAA 2021, Palermo, Italy, September 1-3, 2021, pages 170–177. IEEE, 2021.
[2] Transforming monolithic applications to microservices with mono2micro. In 2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE), pages 3–3, 2021.
[3] Roberto Rodriguez Echeverria, Fernando Macias, Victor Manuel Pavon, Jose Maria Conejero, and Fernando Sanchez Figueroa. Legacy web application modernization by generating a rest service layer. IEEE Latin America Transactions, 13(7):2379–2383, 2015.
[4] Mahdi Fahmideh, Farhad Daneshgar, Ghassan Beydoun, and Fethi A. Rabhi. Challenges in migrating legacy software systems to the cloud an empirical study. CoRR, abs/2004.10724, 2020.
[5] Vikram Nitin, Shubhi Asthana, Baishakhi Ray, and Rahul Krishna. CARGO: ai-guided dependency analysis for migrating monolithic applications to microservices architecture. In 37th IEEE/ACM International Conference on Automated Software Engineering, ASE 2022, Rochester, MI, USA, October 10-14, 2022, pages 20:1–20:12. ACM, 2022.
[6] Baptiste Rozière, Marie-Anne Lachaux, Lowik Chanussot, and Guillaume Lample. Unsupervised translation of programming languages. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, 2020.
[7] Baptiste Rozière, Jie Zhang, François Charton, Mark Harman, Gabriel Synnaeve, and Guillaume Lample. Leveraging automated unit tests for unsupervised code translation. In The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29,
[8] Marc Szafraniec, Baptiste Rozière, Hugh Leather, Patrick Labatut, François Charton, and Gabriel Synnaeve. Code translation with compiler representations. In The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023. OpenReview.net, 2023.
[9] Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, and Kai-Wei Chang. Summarize and generate to back-translate: Unsupervised translation of programming languages. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2023, Dubrovnik, Croatia, May 2-6, 2023, pages 1520–1534. Association f...
[10] Yiqing Xie, Atharva Naik, Daniel Fried, and Carolyn P. Rosé. Data augmentation for code translation with comparable corpora and multiple references. In Findings of the Association for Computational Linguistics: EMNLP 2023, Singapore, December 6-10, 2023, pages 13725–13739. Association for Computational Linguistics, 2023.
[11] Rangeet Pan, Ali Reza Ibrahimzada, Rahul Krishna, Divya Sankar, et al. Lost in translation: A study of bugs introduced by large language models while translating code. In Proceedings of the 46th IEEE/ACM International Conference on Software Engineering, ICSE 2024, Lisbon, Portugal, April 14-20, 2024, pages 82:1–82:13. ACM, 2024.
[12] Zhen Yang, Fang Liu, Zhongxing Yu, Jacky Wai Keung, Jia Li, Shuo Liu, Yifan Hong, Xiaoxue Ma, Zhi Jin, and Ge Li. Exploring and unleashing the power of large language models in automated code translation. Proc. ACM Softw. Eng., 1(FSE):1585–1608, 2024.
[13] Junkai Chen, Zhiyuan Pan, Xing Hu, Zhenhao Li, Ge Li, and Xin Xia. Reasoning runtime behavior of a program with llm: How far are we? arXiv e-prints, 2024.
[14] Junwei Liu, Kaixin Wang, Yixuan Chen, Xin Peng, Zhenpeng Chen, Lingming Zhang, and Yiling Lou. Large language model-based agents for software engineering: A survey, 2024.
[15] Bo Wang, Ruishi Li, Mingkai Li, and Prateek Saxena. Transmap: Pinpointing mistakes in neural code translation. In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2023, San Francisco, CA, USA, December 3-9, 2023, pages 999–1011. ACM, 2023.
[16] deepseek-coder-6.7b instruct. 2023.
[17] minimumArrayLength. 2024.01.
[18] minOperations. 2024.03.
[19] minOrAfterOperations. 2024.01.
[20] T. Sherwood, E. Perelman, and B. Calder. Basic block distribution analysis to find periodic behavior and simulation points in applications. In Proceedings 2001 International Conference on Parallel Architectures and Compilation Techniques, pages 3–14, 2001.
[21] Quanjun Zhang, Chunrong Fang, Tongke Zhang, Bowen Yu, Weisong Sun, and Zhenyu Chen. Gamma: Revisiting template-based automated program repair via mask prediction. In 38th IEEE/ACM International Conference on Automated Software Engineering, ASE 2023, Luxembourg, September 11-15, 2023, pages 535–547. IEEE, 2023.
[22] Yuxiang Wei, Chunqiu Steven Xia, and Lingming Zhang. Copiloting the copilots: Fusing large language models with completion engines for automated program repair. In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2023, San Francisco, CA, USA, December 3-9, 2023, ...
[23] Chunqiu Steven Xia, Yifeng Ding, and Lingming Zhang. The plastic surgery hypothesis in the era of large language models. In 38th IEEE/ACM International Conference on Automated Software Engineering, ASE 2023, Luxembourg, September 11-15, 2023, pages 522–534. IEEE, 2023.
[24] Chunqiu Steven Xia and Lingming Zhang. Less training, more repairing please: revisiting automated program repair via zero-shot learning. In Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2022, Singapore, Singapore, November 14-18, 2022, pages 959–971. ACM, 2022.
[25] Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed H. Chi, Quoc V. Le, and Denny Zhou. Chain-of-thought prompting elicits reasoning in large language models. In Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, Novemb...
[26] Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc V. Le, Ed H. Chi, Sharan Narang, Aakanksha Chowdhery, and Denny Zhou. Self-consistency improves chain of thought reasoning in language models. In The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023. OpenReview.net, 2023.
[27] Zihan Yu, Liang He, Zhen Wu, Xinyu Dai, and Jiajun Chen. Towards better chain-of-thought prompting strategies: A survey. CoRR, abs/2310.04959, 2023.
[28] Marie-Anne Lachaux, Baptiste Rozière, Marc Szafraniec, and Guillaume Lample. DOBF: A deobfuscation pre-training objective for programming languages. In Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, virtual, pages 14967–14979, 2021.
[29] Shuai Lu, Daya Guo, Shuo Ren, Junjie Huang, Alexey Svyatkovskiy, Ambrosio Blanco, Colin B. Clement, Dawn Drain, et al. Codexglue: A machine learning benchmark dataset for code understanding and generation. In Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, NeurIPS Datasets and Benchmarks 2021, December 2021...
[30] Wasi Uddin Ahmad, Md Golam Rahman Tushar, Saikat Chakraborty, and Kai-Wei Chang. AVATAR: A parallel corpus for java-python program translation. In Findings of the Association for Computational Linguistics: ACL 2023, Toronto, Canada, July 9-14, 2023, pages 2268–
[31] Association for Computational Linguistics, 2023.
[32] Anh Tuan Nguyen, Tung Thanh Nguyen, and Tien N. Nguyen. Lexical statistical machine translation for language migration. In Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, ESEC/FSE’13, Saint Petersburg, Russian Federation, August 18-26, 2013, pages 651–654. ACM, 2013.
[33] Xinyun Chen, Chang Liu, and Dawn Song. Tree-to-tree neural networks for program translation. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Workshop Track Proceedings. OpenReview.net, 2018.
[34] Ruchir Puri, David S. Kung, Geert Janssen, Wei Zhang, Giacomo Domeniconi, Vladimir Zolotov, et al. Project codenet: A large-scale AI for code dataset for learning a diversity of coding tasks. CoRR, abs/2105.12655, 2021.
[35] Ming Zhu, Karthik Suresh, and Chandan K. Reddy. Multilingual code snippets training for program translation. In Thirty-Sixth AAAI Conference on Artificial Intelligence, AAAI 2022, Thirty-Fourth Conference on Innovative Applications of Artificial Intelligence, IAAI 2022, The Twelveth Symposium on Educational Advances in Artificial Intelligence, EAAI 2022...
[36] Jiawei Liu, Chunqiu Steven Xia, Yuyao Wang, and Lingming Zhang. Is your code generated by chatgpt really correct? rigorous evaluation of large language models for code generation. In Alice Oh, Tristan Naumann, Amir Globerson, Kate Saenko, Moritz Hardt, and Sergey Levine, editors, Advances in Neural Information Processing Systems 36: Annual Conference on N...
[37] Llama-3-8B-Instruct. 2023.
[38] Shuo Ren, Daya Guo, Shuai Lu, Long Zhou, Shujie Liu, Duyu Tang, Neel Sundaresan, Ming Zhou, Ambrosio Blanco, and Shuai Ma. CodeBLEU: a method for automatic evaluation of code synthesis. CoRR, abs/2009.10297, 2020.
[39] Ravindra Singh and Naurang Singh Mangat. Elements of survey sampling, volume 15. Springer Science & Business Media, 2013.
[40] Dennis Wackerly, William Mendenhall, and Richard L Scheaffer. Mathematical statistics with applications. Cengage Learning, 2014.
[41] J. Richard Landis and Gary G. Koch. An application of hierarchical kappa-type statistics in the assessment of majority agreement among multiple observers. Biometrics, 33(2):363–374, 1977.
[42]

cxgo: C to Go transpiler. 2024

work page 2024
[43]

https://github.com/mono/sharpen, 2020

Sharpen. https://github.com/mono/sharpen, 2020

work page 2020
[44]

https://github.com/paulirwin/JavaToCSharp, 2024

JavaToCSharp. https://github.com/paulirwin/JavaToCSharp, 2024

work page 2024
[45]

Anh Tuan Nguyen, Tung Thanh Nguyen, and Tien N. Nguyen. Migrating code with statistical machine translation. In 36th International Con- ference on Software Engineering, ICSE ’14, Companion Proceedings, Hyderabad, India, May 31 - June 07, 2014, pages 544–547. ACM, 2014

work page 2014
[46]

Svetoslav Karaivanov, Veselin Raychev, and Martin T. Vechev. Phrase- based statistical translation of programming languages. In Onward! 2014, Proceedings of the 2014 ACM International Symposium on New Ideas, New Paradigms, and Reflections on Programming & Software, part of SPLASH ’14, Portland, OR, USA, October 20-24, 2014 , pages 173–184. ACM, 2014

[47]

Yusuke Oda, Hiroyuki Fudaba, Graham Neubig, Hideaki Hata, Sakriani Sakti, Tomoki Toda, and Satoshi Nakamura. Learning to generate pseudo-code from source code using statistical machine translation (T). In 30th IEEE/ACM International Conference on Automated Software Engineering, ASE 2015, Lincoln, NE, USA, November 9-13, 2015, pages 574–584. IEEE Computer...

[48]

Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Pondé de Oliveira Pinto, Jared Kaplan, Harri Edwards, and et al. Evaluating large language models trained on code. CoRR, abs/2107.03374, 2021

[49]

Yihong Dong, Xue Jiang, Zhi Jin, and Ge Li. Self-collaboration code generation via chatgpt. CoRR, abs/2304.07590, 2023

[50]

Zhiqiang Yuan, Mingwei Liu, Shiji Ding, Kaixin Wang, Yixuan Chen, Xin Peng, and Yiling Lou. Evaluating and improving chatgpt for unit test generation. Proc. ACM Softw. Eng., 1(FSE):1703–1726, 2024

[51]

Zhiyu Fan, Xiang Gao, Martin Mirchev, Abhik Roychoudhury, and Shin Hwei Tan. Automated repair of programs from large language models. In 45th IEEE/ACM International Conference on Software Engineering, ICSE 2023, Melbourne, Australia, May 14-20, 2023, pages 1469–1481. IEEE, 2023

[52]

Chunqiu Steven Xia, Yuxiang Wei, and Lingming Zhang. Automated program repair in the era of large pre-trained language models. In 45th IEEE/ACM International Conference on Software Engineering, ICSE 2023, Melbourne, Australia, May 14-20, 2023, pages 1482–1494. IEEE, 2023

[53]

Toufique Ahmed and Premkumar T. Devanbu. Few-shot training llms for project-specific code-summarization. In 37th IEEE/ACM International Conference on Automated Software Engineering, ASE 2022, Rochester, MI, USA, October 10-14, 2022, pages 177:1–177:5. ACM, 2022

[54]

Mingyang Geng, Shangwen Wang, Dezun Dong, Haotian Wang, Ge Li, Zhi Jin, Xiaoguang Mao, and Xiangke Liao. An empirical study on using large language models for multi-intent comment generation. CoRR, abs/2304.11384, 2023

[55]

Jinhan Kim, Gabin An, Robert Feldt, and Shin Yoo. Ahead of time mutation based fault localisation using statistical inference. In 32nd IEEE International Symposium on Software Reliability Engineering, ISSRE 2021, Wuhan, China, October 25-28, 2021, pages 253–263. IEEE, 2021

[56]

Mike Papadakis and Yves Le Traon. Metallaxis-fl: mutation-based fault localization. Softw. Test. Verification Reliab., 25(5-7):605–628, 2015

[57]

Yonghao Wu, Zheng Li, Yong Liu, and Xiang Chen. FATOC: bug isolation based multi-fault localization by using OPTICS clustering. J. Comput. Sci. Technol., 35(5):979–998, 2020

[58]

Amr Mansour Mohsen, Hesham A. Hassan, Khaled Wassif, Ramadan Moawad, and Soha Makady. Enhancing bug localization using phase-based approach. IEEE Access, 11:35901–35913, 2023

[59]

Agnieszka Ciborowska and Kostadin Damevski. Fast changeset-based bug localization with BERT. In 44th IEEE/ACM 44th International Conference on Software Engineering, ICSE 2022, Pittsburgh, PA, USA, May 25-27, 2022, pages 946–957. ACM, 2022

[60]

Ziye Zhu, Yu Wang, and Yun Li. Trobo: A novel deep transfer model for enhancing cross-project bug localization. In Knowledge Science, Engineering and Management - 14th International Conference, KSEM 2021, Tokyo, Japan, August 14-16, 2021, Proceedings, Part I, volume 12815 of Lecture Notes in Computer Science, pages 529–541. Springer, 2021

[61]

Sungmin Kang, Gabin An, and Shin Yoo. A preliminary evaluation of llm-based fault localization. CoRR, abs/2308.05487, 2023

[62]

Xiangyu Zhang, Neelam Gupta, and Rajiv Gupta. Pruning dynamic slices with confidence. In Proceedings of the ACM SIGPLAN 2006 Conference on Programming Language Design and Implementation, Ottawa, Ontario, Canada, June 11-14, 2006, pages 169–180. ACM, 2006

[63]

Weidong Cui, Xinyang Ge, Baris Kasikci, Ben Niu, Upamanyu Sharma, Ruoyu Wang, and Insu Yun. REPT: reverse debugging of failures in deployed software. In 13th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2018, Carlsbad, CA, USA, October 8-10, 2018, pages 17–32. USENIX Association, 2018

[64]

Jiajun Jiang, Yingfei Xiong, Hongyu Zhang, Qing Gao, and Xiangqun Chen. Shaping program repair space with existing patches and similar code. In Proceedings of the 27th ACM SIGSOFT International Symposium on Software Testing and Analysis, ISSTA 2018, Amsterdam, The Netherlands, July 16-21, 2018, pages 298–309. ACM, 2018

[65]

Yuan Yuan and Wolfgang Banzhaf. ARJA: automated repair of java programs via multi-objective genetic programming. IEEE Trans. Software Eng., 46(10):1040–1067, 2020

[66]

Matias Martinez and Martin Monperrus. ASTOR: a program repair library for java (demo). In Proceedings of the 25th International Symposium on Software Testing and Analysis, ISSTA 2016, Saarbrücken, Germany, July 18-20, 2016, pages 441–444. ACM, 2016

[67]

Yingfei Xiong, Jie Wang, Runfa Yan, Jiachen Zhang, Shi Han, Gang Huang, and Lu Zhang. Precise condition synthesis for program repair. In Proceedings of the 39th International Conference on Software Engineering, ICSE 2017, Buenos Aires, Argentina, May 20-28, 2017, pages 416–426. IEEE / ACM, 2017

[68]

Jifeng Xuan, Matias Martinez, Favio Demarco, Maxime Clément, and et al. Nopol: Automatic repair of conditional statement bugs in java programs. CoRR, abs/1811.04211, 2018

[69]

Matias Martinez and Martin Monperrus. Ultra-large repair search space with automatically mined templates: The cardumen mode of astor. In Search-Based Software Engineering - 10th International Symposium, SSBSE 2018, Montpellier, France, September 8-9, 2018, Proceedings, volume 11036 of Lecture Notes in Computer Science, pages 65–86. Springer, 2018

[70]

Kui Liu, Anil Koyuncu, Dongsun Kim, and Tegawendé F. Bissyandé. Tbar: revisiting template-based automated program repair. In Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis, ISSTA 2019, Beijing, China, July 15-19, 2019, pages 31–42. ACM, 2019

[71]

Anil Koyuncu, Kui Liu, Tegawendé F. Bissyandé, Dongsun Kim, Jacques Klein, Martin Monperrus, and Yves Le Traon. Fixminer: Mining relevant fix patterns for automated program repair. Empir. Softw. Eng., 25(3):1980–2024, 2020

[72]

Kui Liu, Anil Koyuncu, Dongsun Kim, and Tegawendé F. Bissyandé. AVATAR: fixing semantic bugs with fix patterns of static analysis violations. In 26th IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2019, Hangzhou, China, February 24-27, 2019, pages 456–467. IEEE, 2019

[73]

Zimin Chen, Steve Kommrusch, Michele Tufano, Louis-Noël Pouchet, Denys Poshyvanyk, and Martin Monperrus. Sequencer: Sequence-to-sequence learning for end-to-end program repair. IEEE Trans. Software Eng., 47(9):1943–1959, 2021

[74]

Thibaud Lutellier, Hung Viet Pham, Lawrence Pang, Yitong Li, Moshi Wei, and Lin Tan. Coconut: combining context-aware neural translation models using ensemble for program repair. In ISSTA ’20: 29th ACM SIGSOFT International Symposium on Software Testing and Analysis, Virtual Event, USA, July 18-22, 2020, pages 101–114. ACM, 2020

[75]

Qihao Zhu, Zeyu Sun, Wenjie Zhang, Yingfei Xiong, and Lu Zhang. Tare: Type-aware neural program repair. In 45th IEEE/ACM International Conference on Software Engineering, ICSE 2023, Melbourne, Australia, May 14-20, 2023, pages 1443–1455. IEEE, 2023

[76]

Quanjun Zhang, Chunrong Fang, Yuxiang Ma, Weisong Sun, and Zhenyu Chen. A survey of learning-based automated program repair. ACM Trans. Softw. Eng. Methodol., 33(2):55:1–55:69, 2024

[77]

Quanjun Zhang, Chunrong Fang, Bowen Yu, Weisong Sun, Tongke Zhang, and Zhenyu Chen. Pre-trained model-based automated software vulnerability repair: How far are we? IEEE Trans. Dependable Secur. Comput., 21(4):2507–2525, 2024

[78]

Pantazis Deligiannis, Akash Lal, Nikita Mehrotra, and Aseem Rastogi. Fixing rust compilation errors using llms. CoRR, abs/2308.05177, 2023

[79]

Harshit Joshi, José Pablo Cambronero Sánchez, Sumit Gulwani, Vu Le, Gust Verbruggen, and Ivan Radicek. Repair is nearly generation: Multilingual program repair with llms. In Thirty-Seventh AAAI Conference on Artificial Intelligence, AAAI 2023, Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence, IAAI 2023, Thirteenth Sympos...

[1]

Sindre Grønstøl Haugeland, Phu Hong Nguyen, Hui Song, and Franck Chauvel. Migrating monoliths to microservices-based customizable multi-tenant cloud-native apps. In 47th Euromicro Conference on Software Engineering and Advanced Applications, SEAA 2021, Palermo, Italy, September 1-3, 2021, pages 170–177. IEEE, 2021

[2]

Transforming monolithic applications to microservices with mono2micro. In 2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE), pages 3–3, 2021

[3]

Roberto Rodriguez Echeverria, Fernando Macias, Victor Manuel Pavon, Jose Maria Conejero, and Fernando Sanchez Figueroa. Legacy web application modernization by generating a rest service layer. IEEE Latin America Transactions, 13(7):2379–2383, 2015

[4]

Mahdi Fahmideh, Farhad Daneshgar, Ghassan Beydoun, and Fethi A. Rabhi. Challenges in migrating legacy software systems to the cloud an empirical study. CoRR, abs/2004.10724, 2020

[5]

Vikram Nitin, Shubhi Asthana, Baishakhi Ray, and Rahul Krishna. CARGO: ai-guided dependency analysis for migrating monolithic applications to microservices architecture. In 37th IEEE/ACM International Conference on Automated Software Engineering, ASE 2022, Rochester, MI, USA, October 10-14, 2022, pages 20:1–20:12. ACM, 2022

[6]

Baptiste Rozière, Marie-Anne Lachaux, Lowik Chanussot, and Guillaume Lample. Unsupervised translation of programming languages. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, 2020

[7]

Baptiste Rozière, Jie Zhang, François Charton, Mark Harman, Gabriel Synnaeve, and Guillaume Lample. Leveraging automated unit tests for unsupervised code translation. In The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29,

[8]

Marc Szafraniec, Baptiste Rozière, Hugh Leather, Patrick Labatut, François Charton, and Gabriel Synnaeve. Code translation with compiler representations. In The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023. OpenReview.net, 2023

[9]

Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, and Kai-Wei Chang. Summarize and generate to back-translate: Unsupervised translation of programming languages. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2023, Dubrovnik, Croatia, May 2-6, 2023, pages 1520–1534. Association f...

[10]

Yiqing Xie, Atharva Naik, Daniel Fried, and Carolyn P. Rosé. Data augmentation for code translation with comparable corpora and multiple references. In Findings of the Association for Computational Linguistics: EMNLP 2023, Singapore, December 6-10, 2023, pages 13725–13739. Association for Computational Linguistics, 2023

[11]

Rangeet Pan, Ali Reza Ibrahimzada, Rahul Krishna, Divya Sankar, and et al. Lost in translation: A study of bugs introduced by large language models while translating code. In Proceedings of the 46th IEEE/ACM International Conference on Software Engineering, ICSE 2024, Lisbon, Portugal, April 14-20, 2024, pages 82:1–82:13. ACM, 2024

[12]

Zhen Yang, Fang Liu, Zhongxing Yu, Jacky Wai Keung, Jia Li, Shuo Liu, Yifan Hong, Xiaoxue Ma, Zhi Jin, and Ge Li. Exploring and unleashing the power of large language models in automated code translation. Proc. ACM Softw. Eng., 1(FSE):1585–1608, 2024

[13]

Junkai Chen, Zhiyuan Pan, Xing Hu, Zhenhao Li, Ge Li, and Xin Xia. Reasoning runtime behavior of a program with llm: How far are we? arXiv e-prints, 2024

[14]

Junwei Liu, Kaixin Wang, Yixuan Chen, Xin Peng, Zhenpeng Chen, Lingming Zhang, and Yiling Lou. Large language model-based agents for software engineering: A survey, 2024

[15]

Bo Wang, Ruishi Li, Mingkai Li, and Prateek Saxena. Transmap: Pinpointing mistakes in neural code translation. In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2023, San Francisco, CA, USA, December 3-9, 2023, pages 999–1011. ACM, 2023

[16]

deepseek-coder-6.7b instruct. 2023

[17]

minimumArrayLength. 2024.01

[18]

minOperations. 2024.03

[19]

minOrAfterOperations. 2024.01

[20]

T. Sherwood, E. Perelman, and B. Calder. Basic block distribution analysis to find periodic behavior and simulation points in applications. In Proceedings 2001 International Conference on Parallel Architectures and Compilation Techniques, pages 3–14, 2001

[21]

Quanjun Zhang, Chunrong Fang, Tongke Zhang, Bowen Yu, Weisong Sun, and Zhenyu Chen. Gamma: Revisiting template-based automated program repair via mask prediction. In 38th IEEE/ACM International Conference on Automated Software Engineering, ASE 2023, Luxembourg, September 11-15, 2023, pages 535–547. IEEE, 2023

[22]

Yuxiang Wei, Chunqiu Steven Xia, and Lingming Zhang. Copiloting the copilots: Fusing large language models with completion engines for automated program repair. In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2023, San Francisco, CA, USA, December 3-9, 2023, ...

[23]

Chunqiu Steven Xia, Yifeng Ding, and Lingming Zhang. The plastic surgery hypothesis in the era of large language models. In 38th IEEE/ACM International Conference on Automated Software Engineering, ASE 2023, Luxembourg, September 11-15, 2023, pages 522–534. IEEE, 2023

[24]

Chunqiu Steven Xia and Lingming Zhang. Less training, more repairing please: revisiting automated program repair via zero-shot learning. In Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2022, Singapore, Singapore, November 14-18, 2022, pages 959–971. ACM, 2022

[25]

Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed H. Chi, Quoc V. Le, and Denny Zhou. Chain-of-thought prompting elicits reasoning in large language models. In Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, Novemb...

[26]

Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc V. Le, Ed H. Chi, Sharan Narang, Aakanksha Chowdhery, and Denny Zhou. Self-consistency improves chain of thought reasoning in language models. In The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023. OpenReview.net, 2023

[27]

Zihan Yu, Liang He, Zhen Wu, Xinyu Dai, and Jiajun Chen. Towards better chain-of-thought prompting strategies: A survey. CoRR, abs/2310.04959, 2023

[28]

Marie-Anne Lachaux, Baptiste Rozière, Marc Szafraniec, and Guillaume Lample. DOBF: A deobfuscation pre-training objective for programming languages. In Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6-14, 2021, virtual, pages 14967–14979, 2021

[29]

Shuai Lu, Daya Guo, Shuo Ren, Junjie Huang, Alexey Svyatkovskiy, Ambrosio Blanco, Colin B. Clement, Dawn Drain, and et al. Codexglue: A machine learning benchmark dataset for code understanding and generation. In Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, NeurIPS Datasets and Benchmarks 2021, December 2021...

[30]

Wasi Uddin Ahmad, Md Golam Rahman Tushar, Saikat Chakraborty, and Kai-Wei Chang. AVATAR: A parallel corpus for java-python program translation. In Findings of the Association for Computational Linguistics: ACL 2023, Toronto, Canada, July 9-14, 2023, pages 2268–

[31]

Association for Computational Linguistics, 2023

[32]

Anh Tuan Nguyen, Tung Thanh Nguyen, and Tien N. Nguyen. Lexical statistical machine translation for language migration. In Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, ESEC/FSE’13, Saint Petersburg, Russian Federation, August 18-26, 2013, pages 651–654. ACM, 2013

[33]

Xinyun Chen, Chang Liu, and Dawn Song. Tree-to-tree neural networks for program translation. In 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Workshop Track Proceedings. OpenReview.net, 2018

[34]

Ruchir Puri, David S. Kung, Geert Janssen, Wei Zhang, Giacomo Domeniconi, Vladimir Zolotov, and et al. Project codenet: A large-scale AI for code dataset for learning a diversity of coding tasks. CoRR, abs/2105.12655, 2021

[35]

Ming Zhu, Karthik Suresh, and Chandan K. Reddy. Multilingual code snippets training for program translation. In Thirty-Sixth AAAI Conference on Artificial Intelligence, AAAI 2022, Thirty-Fourth Conference on Innovative Applications of Artificial Intelligence, IAAI 2022, The Twelveth Symposium on Educational Advances in Artificial Intelligence, EAAI 2022...

[36]

Jiawei Liu, Chunqiu Steven Xia, Yuyao Wang, and Lingming Zhang. Is your code generated by chatgpt really correct? rigorous evaluation of large language models for code generation. In Alice Oh, Tristan Naumann, Amir Globerson, Kate Saenko, Moritz Hardt, and Sergey Levine, editors, Advances in Neural Information Processing Systems 36: Annual Conference on N...

[37]

Llama-3-8B-Instruct. 2023

work page 2023

[38] [38]

CodeBLEU: a Method for Automatic Evaluation of Code Synthesis

Shuo Ren, Daya Guo, Shuai Lu, Long Zhou, Shujie Liu, Duyu Tang, Neel Sundaresan, Ming Zhou, Ambrosio Blanco, and Shuai Ma. Code- bleu: a method for automatic evaluation of code synthesis. CoRR, abs/2009.10297, 2020

work page internal anchor Pith review Pith/arXiv arXiv 2009

[39] [39]

Elements of survey sampling, volume 15

Ravindra Singh and Naurang Singh Mangat. Elements of survey sampling, volume 15. Springer Science & Business Media, 2013

work page 2013

[40] [40]

Math- ematical statistics with applications

Dennis Wackerly, William Mendenhall, and Richard L Scheaffer. Math- ematical statistics with applications . Cengage Learning, 2014

work page 2014

[41] [41]

Richard Landis and Gary G

J. Richard Landis and Gary G. Koch. An application of hierarchical kappa-type statistics in the assessment of majority agreement among multiple observers. Biometrics, 33(2):363–374, 1977

work page 1977

[42] [42]

cxgo: C to Go transpiler. 2024

work page 2024

[43] [43]

https://github.com/mono/sharpen, 2020

Sharpen. https://github.com/mono/sharpen, 2020

work page 2020

[44] [44]

https://github.com/paulirwin/JavaToCSharp, 2024

JavaToCSharp. https://github.com/paulirwin/JavaToCSharp, 2024

work page 2024

[45] [45]

Anh Tuan Nguyen, Tung Thanh Nguyen, and Tien N. Nguyen. Migrating code with statistical machine translation. In 36th International Con- ference on Software Engineering, ICSE ’14, Companion Proceedings, Hyderabad, India, May 31 - June 07, 2014, pages 544–547. ACM, 2014

work page 2014

[46] [46]

Svetoslav Karaivanov, Veselin Raychev, and Martin T. Vechev. Phrase- based statistical translation of programming languages. In Onward! 2014, Proceedings of the 2014 ACM International Symposium on New Ideas, New Paradigms, and Reflections on Programming & Software, part of SPLASH ’14, Portland, OR, USA, October 20-24, 2014 , pages 173–184. ACM, 2014

work page 2014

[47] [47]

Learning to generate pseudo-code from source code using statistical machine translation (T)

Yusuke Oda, Hiroyuki Fudaba, Graham Neubig, Hideaki Hata, Sakriani Sakti, Tomoki Toda, and Satoshi Nakamura. Learning to generate pseudo-code from source code using statistical machine translation (T). In 30th IEEE/ACM International Conference on Automated Software Engineering, ASE 2015, Lincoln, NE, USA, November 9-13, 2015 , pages 574–584. IEEE Computer...

work page 2015

[48] [48]

Evaluating Large Language Models Trained on Code

Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Pond ´e de Oliveira Pinto, Jared Kaplan, Harri Edwards, and et al. Evaluating large language models trained on code. CoRR, abs/2107.03374, 2021

work page internal anchor Pith review Pith/arXiv arXiv 2021

[49] [49]

Self-collaboration code generation via chatgpt

Yihong Dong, Xue Jiang, Zhi Jin, and Ge Li. Self-collaboration code generation via chatgpt. CoRR, abs/2304.07590, 2023

work page arXiv 2023

[50] [50]

Evaluating and improving chatgpt for unit test generation

Zhiqiang Yuan, Mingwei Liu, Shiji Ding, Kaixin Wang, Yixuan Chen, Xin Peng, and Yiling Lou. Evaluating and improving chatgpt for unit test generation. Proc. ACM Softw. Eng. , 1(FSE):1703–1726, 2024

work page 2024

[51] [51]

Automated repair of programs from large language models

Zhiyu Fan, Xiang Gao, Martin Mirchev, Abhik Roychoudhury, and Shin Hwei Tan. Automated repair of programs from large language models. In 45th IEEE/ACM International Conference on Software Engineering, ICSE 2023, Melbourne, Australia, May 14-20, 2023 , pages 1469–1481. IEEE, 2023

work page 2023

[52] [52]

Automated program repair in the era of large pre-trained language models

Chunqiu Steven Xia, Yuxiang Wei, and Lingming Zhang. Automated program repair in the era of large pre-trained language models. In 45th IEEE/ACM International Conference on Software Engineering, ICSE 2023, Melbourne, Australia, May 14-20, 2023 , pages 1482–1494. IEEE, 2023

work page 2023

[53] [53]

Toufique Ahmed and Premkumar T. Devanbu. Few-shot training llms for project-specific code-summarization. In 37th IEEE/ACM International Conference on Automated Software Engineering, ASE 2022, Rochester, MI, USA, October 10-14, 2022 , pages 177:1–177:5. ACM, 2022

work page 2022

[54] [54]

An empirical study on using large language models for multi-intent comment generation

Mingyang Geng, Shangwen Wang, Dezun Dong, Haotian Wang, Ge Li, Zhi Jin, Xiaoguang Mao, and Xiangke Liao. An empirical study on using large language models for multi-intent comment generation. CoRR, abs/2304.11384, 2023

work page arXiv 2023

[55] [55]

Ahead of time mutation based fault localisation using statistical inference

Jinhan Kim, Gabin An, Robert Feldt, and Shin Yoo. Ahead of time mutation based fault localisation using statistical inference. In 32nd IEEE International Symposium on Software Reliability Engineering, ISSRE 2021, Wuhan, China, October 25-28, 2021, pages 253–263. IEEE, 2021

work page 2021

[56] [56]

Metallaxis-fl: mutation-based fault localization

Mike Papadakis and Yves Le Traon. Metallaxis-fl: mutation-based fault localization. Softw. Test. Verification Reliab., 25(5-7):605–628, 2015

work page 2015

[57] [57]

FATOC: bug isolation based multi-fault localization by using OPTICS clustering

Yonghao Wu, Zheng Li, Yong Liu, and Xiang Chen. FATOC: bug isolation based multi-fault localization by using OPTICS clustering. J. Comput. Sci. Technol., 35(5):979–998, 2020

work page 2020

[58] [58]

Hassan, Khaled Wassif, Ramadan Moawad, and Soha Makady

Amr Mansour Mohsen, Hesham A. Hassan, Khaled Wassif, Ramadan Moawad, and Soha Makady. Enhancing bug localization using phase- based approach. IEEE Access, 11:35901–35913, 2023

work page 2023

[59] [59]

Fast changeset-based bug localization with BERT

Agnieszka Ciborowska and Kostadin Damevski. Fast changeset-based bug localization with BERT. In 44th IEEE/ACM 44th International Conference on Software Engineering, ICSE 2022, Pittsburgh, PA, USA, May 25-27, 2022 , pages 946–957. ACM, 2022

work page 2022

[60] [60]

Trobo: A novel deep transfer model for enhancing cross-project bug localization

Ziye Zhu, Yu Wang, and Yun Li. Trobo: A novel deep transfer model for enhancing cross-project bug localization. In Knowledge Science, Engineering and Management - 14th International Conference, KSEM 2021, Tokyo, Japan, August 14-16, 2021, Proceedings, Part I , volume 12815 of Lecture Notes in Computer Science , pages 529–541. Springer, 2021

work page 2021

[61] [61]

A preliminary evaluation of llm-based fault localization

Sungmin Kang, Gabin An, and Shin Yoo. A preliminary evaluation of llm-based fault localization. CoRR, abs/2308.05487, 2023

work page arXiv 2023

[62] [62]

Pruning dynamic slices with confidence

Xiangyu Zhang, Neelam Gupta, and Rajiv Gupta. Pruning dynamic slices with confidence. In Proceedings of the ACM SIGPLAN 2006 Conference on Programming Language Design and Implementation, Ottawa, Ontario, Canada, June 11-14, 2006 , pages 169–180. ACM, 2006

work page 2006

[63] [63]

REPT: reverse debugging of failures in deployed software

Weidong Cui, Xinyang Ge, Baris Kasikci, Ben Niu, Upamanyu Sharma, Ruoyu Wang, and Insu Yun. REPT: reverse debugging of failures in deployed software. In 13th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2018, Carlsbad, CA, USA, October 8-10, 2018, pages 17–32. USENIX Association, 2018

work page 2018

[64] [64]

Shaping program repair space with existing patches and similar code

Jiajun Jiang, Yingfei Xiong, Hongyu Zhang, Qing Gao, and Xiangqun Chen. Shaping program repair space with existing patches and similar code. In Proceedings of the 27th ACM SIGSOFT International Sympo- sium on Software Testing and Analysis, ISSTA 2018, Amsterdam, The Netherlands, July 16-21, 2018 , pages 298–309. ACM, 2018

work page 2018

[65] [65]

ARJA: automated repair of java pro- grams via multi-objective genetic programming

Yuan Yuan and Wolfgang Banzhaf. ARJA: automated repair of java pro- grams via multi-objective genetic programming. IEEE Trans. Software Eng., 46(10):1040–1067, 2020

work page 2020

[66] [66]

ASTOR: a program repair library for java (demo)

Matias Martinez and Martin Monperrus. ASTOR: a program repair library for java (demo). In Proceedings of the 25th International Symposium on Software Testing and Analysis, ISSTA 2016, Saarbr¨ucken, Germany, July 18-20, 2016 , pages 441–444. ACM, 2016

work page 2016

[67] [67]

Precise condition synthesis for program repair

Yingfei Xiong, Jie Wang, Runfa Yan, Jiachen Zhang, Shi Han, Gang Huang, and Lu Zhang. Precise condition synthesis for program repair. In Proceedings of the 39th International Conference on Software Engi- neering, ICSE 2017, Buenos Aires, Argentina, May 20-28, 2017 , pages 416–426. IEEE / ACM, 2017

work page 2017

[68] [68]

Nopol: Automatic repair of conditional statement bugs in java programs

Jifeng Xuan, Matias Martinez, Favio Demarco, Maxime Cl ´ement, and et al. Nopol: Automatic repair of conditional statement bugs in java programs. CoRR, abs/1811.04211, 2018

[69]

Ultra-large repair search space with automatically mined templates: The cardumen mode of astor

Matias Martinez and Martin Monperrus. Ultra-large repair search space with automatically mined templates: The cardumen mode of astor. In Search-Based Software Engineering - 10th International Symposium, SSBSE 2018, Montpellier, France, September 8-9, 2018, Proceedings, volume 11036 of Lecture Notes in Computer Science, pages 65–86. Springer, 2018

[70]

Tbar: revisiting template-based automated program repair

Kui Liu, Anil Koyuncu, Dongsun Kim, and Tegawendé F. Bissyandé. Tbar: revisiting template-based automated program repair. In Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis, ISSTA 2019, Beijing, China, July 15-19, 2019, pages 31–42. ACM, 2019

[71]

Fixminer: Mining relevant fix patterns for automated program repair

Anil Koyuncu, Kui Liu, Tegawendé F. Bissyandé, Dongsun Kim, Jacques Klein, Martin Monperrus, and Yves Le Traon. Fixminer: Mining relevant fix patterns for automated program repair. Empir. Softw. Eng., 25(3):1980–2024, 2020

[72]

AVATAR: fixing semantic bugs with fix patterns of static analysis violations

Kui Liu, Anil Koyuncu, Dongsun Kim, and Tegawendé F. Bissyandé. AVATAR: fixing semantic bugs with fix patterns of static analysis violations. In 26th IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2019, Hangzhou, China, February 24-27, 2019, pages 456–467. IEEE, 2019

[73]

Sequencer: Sequence-to-sequence learning for end-to-end program repair

Zimin Chen, Steve Kommrusch, Michele Tufano, Louis-Noël Pouchet, Denys Poshyvanyk, and Martin Monperrus. Sequencer: Sequence-to-sequence learning for end-to-end program repair. IEEE Trans. Software Eng., 47(9):1943–1959, 2021

[74]

Coconut: combining context-aware neural translation models using ensemble for program repair

Thibaud Lutellier, Hung Viet Pham, Lawrence Pang, Yitong Li, Moshi Wei, and Lin Tan. Coconut: combining context-aware neural translation models using ensemble for program repair. In ISSTA ’20: 29th ACM SIGSOFT International Symposium on Software Testing and Analysis, Virtual Event, USA, July 18-22, 2020, pages 101–114. ACM, 2020

[75]

Tare: Type-aware neural program repair

Qihao Zhu, Zeyu Sun, Wenjie Zhang, Yingfei Xiong, and Lu Zhang. Tare: Type-aware neural program repair. In 45th IEEE/ACM International Conference on Software Engineering, ICSE 2023, Melbourne, Australia, May 14-20, 2023, pages 1443–1455. IEEE, 2023

[76]

A survey of learning-based automated program repair

Quanjun Zhang, Chunrong Fang, Yuxiang Ma, Weisong Sun, and Zhenyu Chen. A survey of learning-based automated program repair. ACM Trans. Softw. Eng. Methodol., 33(2):55:1–55:69, 2024

[77]

Pre-trained model-based automated software vulnerability repair: How far are we?

Quanjun Zhang, Chunrong Fang, Bowen Yu, Weisong Sun, Tongke Zhang, and Zhenyu Chen. Pre-trained model-based automated software vulnerability repair: How far are we? IEEE Trans. Dependable Secur. Comput., 21(4):2507–2525, 2024

[78]

Fixing rust compilation errors using llms

Pantazis Deligiannis, Akash Lal, Nikita Mehrotra, and Aseem Rastogi. Fixing rust compilation errors using llms. CoRR, abs/2308.05177, 2023

[79]

Repair is nearly generation: Multilingual program repair with llms

Harshit Joshi, José Pablo Cambronero Sánchez, Sumit Gulwani, Vu Le, Gust Verbruggen, and Ivan Radicek. Repair is nearly generation: Multilingual program repair with llms. In Thirty-Seventh AAAI Conference on Artificial Intelligence, AAAI 2023, Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence, IAAI 2023, Thirteenth Sympos...
