TransAgent: Enhancing LLM-Based Code Translation via Fine-Grained Execution Alignment
Pith reviewed 2026-05-23 20:53 UTC · model grok-4.3
The pith
TransAGENT corrects errors in LLM code translations by using multi-agent fine-grained execution alignment to locate faulty blocks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
TransAGENT is a novel multi-agent system that eliminates errors during LLM-based code translation. The main insight is to localize error-prone code blocks via fine-grained execution alignment between source and target code. Evaluated on a newly constructed benchmark of recent programming tasks to mitigate data leakage, TransAGENT outperforms the latest UniTrans by up to 33.3% in translation accuracy and achieves an average improvement of 56.7% over Agentless in program repair performance, with ablation studies and tests across LLMs confirming its effectiveness and generalizability.
What carries the argument
Fine-grained execution alignment between source and target code, performed by a multi-agent system to localize error-prone blocks.
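The summary above does not spell out the alignment algorithm, so the following is only a minimal sketch of the general idea under simple assumptions: both programs have already been split into blocks that map one-to-one, and each run records the variable values observed at the end of every executed block. The names `BlockState`, `first_divergent_block`, `block_map`, and the trace format are hypothetical and are not taken from the paper.

```python
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical trace format: one (block_id, variable snapshot) pair per executed block.
BlockState = Tuple[str, Dict[str, Any]]


def first_divergent_block(
    source_trace: List[BlockState],
    target_trace: List[BlockState],
    block_map: Dict[str, str],
) -> Optional[str]:
    """Return the id of the first target block whose observed state differs
    from the mapped source block, or None if the two traces agree."""
    for i, (src_block, src_state) in enumerate(source_trace):
        if i >= len(target_trace):
            # Target stopped early (e.g. a crash): report the block that should have run.
            return block_map[src_block]
        tgt_block, tgt_state = target_trace[i]
        if tgt_block != block_map[src_block]:
            # Control flow diverged: report the block the target actually executed.
            return tgt_block
        if tgt_state != src_state:
            # Same block, different values: this block is the suspect.
            return tgt_block
    if len(target_trace) > len(source_trace):
        # Target kept executing after the source finished.
        return target_trace[len(source_trace)][0]
    return None


if __name__ == "__main__":
    block_map = {"S1": "T1", "S2": "T2"}
    source = [("S1", {"n": 7}), ("S2", {"avg": 3})]    # e.g. integer division in the source
    target = [("T1", {"n": 7}), ("T2", {"avg": 3.5})]  # e.g. float division in the target
    print(first_divergent_block(source, target, block_map))
```

In this toy run the first mismatch is in block `T2`, so only that block would be handed to a repair step. A real system would also need a value-comparison rule that tolerates legitimate cross-language differences in types and representations, which this sketch ignores.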
If this is right
- Translation accuracy rises by as much as 33.3% over the prior UniTrans method; the abstract does not say whether this gain is relative or absolute.
- Program repair performance improves by an average of 56.7% over the Agentless baseline.
- The gains hold when the underlying LLM is swapped, indicating broad applicability.
- A fresh benchmark of recent tasks reduces the risk that reported numbers reflect memorized training data.
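The leakage argument in the last bullet depends on selecting tasks released after the evaluated models' training data was collected. A minimal sketch of such a timestamp filter follows, assuming each task carries a release date and a list of test cases. `Task`, `select_recent_tasks`, the cutoff date, and the minimum-test threshold are illustrative assumptions, not the paper's actual pipeline.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Task:
    name: str
    released: date
    tests: List[str] = field(default_factory=list)


def select_recent_tasks(tasks: List[Task], cutoff: date, min_tests: int = 1) -> List[Task]:
    """Keep tasks released strictly after `cutoff` that have at least `min_tests` tests."""
    return [t for t in tasks if t.released > cutoff and len(t.tests) >= min_tests]


if __name__ == "__main__":
    tasks = [
        Task("old_task", date(2022, 6, 1), ["t1", "t2", "t3"]),
        Task("recent_task", date(2024, 1, 15), ["t1", "t2", "t3"]),
        Task("recent_untested", date(2024, 3, 2), []),
    ]
    cutoff = date(2023, 12, 31)  # assumed training-data cutoff, for illustration only
    print([t.name for t in select_recent_tasks(tasks, cutoff, min_tests=3)])
```

A date filter of this kind reduces the chance of memorized solutions but cannot rule it out, since recent tasks may restate older problems.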
Where Pith is reading between the lines
- The alignment technique could be applied to same-language code repair tasks where test coverage is sparse.
- If alignment succeeds with partial executions, it may support migration of legacy systems that lack comprehensive test suites.
- Integration with other agent workflows for code generation could create end-to-end pipelines for cross-language refactoring.
Load-bearing premise
Fine-grained execution alignment between source and target code can reliably localize error-prone blocks even without complete test suites and without creating alignment artifacts that hide real differences.
What would settle it
A set of translated programs where the alignment step marks a block as correct yet the block still produces wrong outputs on valid inputs, or marks an incorrect block while missing the actual error location.
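One way to operationalize this check is sketched below, under the assumption that the truly faulty blocks of each translated program are known (for example from manual inspection). `Verdict` and `localization_errors` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Verdict:
    program: str
    flagged_blocks: Set[str]  # blocks the alignment step marked as faulty
    faulty_blocks: Set[str]   # ground-truth faulty blocks (assumed to be known)


def localization_errors(verdicts: List[Verdict]) -> Dict[str, List[str]]:
    """List programs where alignment missed a real fault or flagged a correct block."""
    missed = [v.program for v in verdicts if v.faulty_blocks - v.flagged_blocks]
    spurious = [v.program for v in verdicts if v.flagged_blocks - v.faulty_blocks]
    return {"missed_fault": missed, "spurious_flag": spurious}


if __name__ == "__main__":
    verdicts = [
        Verdict("p1", flagged_blocks={"B2"}, faulty_blocks={"B2"}),  # correct localization
        Verdict("p2", flagged_blocks=set(), faulty_blocks={"B3"}),   # fault marked as correct
        Verdict("p3", flagged_blocks={"B1"}, faulty_blocks={"B4"}),  # wrong block flagged
    ]
    print(localization_errors(verdicts))
```

A non-empty `missed_fault` list corresponds to the first failure mode described above, and a program that appears in both lists corresponds to the second.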
Original abstract
Code translation transforms code between programming languages while preserving functionality, which is critical in software development and maintenance. While traditional learning-based code translation methods have limited effectiveness due to the lack of sufficient parallel training data, Large Language Models (LLMs) have recently advanced this field with their strong code generation and comprehension capabilities. However, code translated by LLMs still suffers from diverse quality issues, such as syntax and semantic errors. In this work, we propose TransAGENT, a novel multi-agent system that eliminates the errors during LLM-based code translation. The main insight of TransAGENT is to localize error-prone code blocks via fine-grained execution alignment between source and target code. We evaluate TransAGENT on a newly constructed benchmark of recent programming tasks to mitigate data leakage. TransAGENT outperforms the latest UniTrans by up to 33.3% in translation accuracy and achieves an average improvement of 56.7% over Agentless in program repair performance. We also conduct an ablation study and evaluate TransAGENT across different LLMs, demonstrating its effectiveness and strong generalizability.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces TransAGENT, a multi-agent system for LLM-based code translation that localizes error-prone blocks via fine-grained execution alignment between source and target code. It constructs a new benchmark of recent programming tasks to reduce data leakage, reports up to 33.3% higher translation accuracy than UniTrans and 56.7% average improvement over Agentless on program repair, and includes ablation studies plus evaluations across multiple LLMs to demonstrate generalizability.
Significance. If the empirical gains hold under scrutiny, the work offers a practical mechanism for improving semantic fidelity in cross-language translation by grounding LLM outputs in execution traces rather than static analysis alone. The new benchmark construction is a constructive contribution for the field, and the multi-agent framing with explicit alignment could generalize to other code maintenance tasks.
major comments (3)
- [§3] §3 (Method), execution alignment procedure: the central claim that fine-grained alignment reliably localizes errors without complete test suites is not supported by a concrete algorithm or pseudocode; the description leaves open how partial traces are matched and whether alignment artifacts could mask semantic differences, which directly underpins the reported accuracy deltas.
- [§4.1] §4.1 (Benchmark), Table 1: the construction details for the new benchmark (task selection criteria, leakage mitigation steps, and test-suite coverage statistics) are insufficient to assess whether the 33.3% and 56.7% gains are robust or sensitive to post-hoc choices; no inter-rater agreement or leakage audit is reported.
- [§4.2] §4.2 (Results), accuracy and repair tables: the improvements are presented as point estimates without statistical significance tests, confidence intervals, or variance across random seeds; this weakens the claim that TransAGENT consistently outperforms the baselines.
minor comments (2)
- [Abstract] The abstract and §1 use “up to 33.3%” and “average improvement of 56.7%” without clarifying whether these are relative or absolute gains or on which exact metric subsets.
- [Figure 2] Figure 2 (agent workflow) would benefit from explicit labeling of the execution-alignment step and data flow between agents.
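The distinction raised in the first minor comment matters numerically. The short worked example below uses made-up accuracies (not figures from the paper) to show how far the two readings of the same improvement can diverge; the function names are illustrative.

```python
def absolute_gain(baseline: float, improved: float) -> float:
    """Difference in percentage points."""
    return improved - baseline


def relative_gain(baseline: float, improved: float) -> float:
    """Improvement expressed as a percentage of the baseline."""
    return (improved - baseline) / baseline * 100.0


if __name__ == "__main__":
    baseline, improved = 50.0, 60.0  # illustrative accuracies in percent, not paper data
    print(f"absolute: {absolute_gain(baseline, improved):.1f} points")
    print(f"relative: {relative_gain(baseline, improved):.1f}%")
```

With these invented numbers the same improvement reads as 10.0 points absolute or 20.0% relative, which is why the report asks the authors to state which one they mean.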
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which help clarify key aspects of the method, benchmark, and evaluation. We address each major point below and will incorporate revisions to strengthen the manuscript.
- Referee: [§3] §3 (Method), execution alignment procedure: the central claim that fine-grained alignment reliably localizes errors without complete test suites is not supported by a concrete algorithm or pseudocode; the description leaves open how partial traces are matched and whether alignment artifacts could mask semantic differences, which directly underpins the reported accuracy deltas.
  Authors: We agree that a formal algorithmic description would improve clarity and reproducibility. In the revised manuscript we will add pseudocode (as a new Algorithm 1 in §3) that explicitly specifies the trace-matching procedure for partial executions, the similarity metric used, and safeguards against masking semantic differences (e.g., by requiring both syntactic and semantic equivalence checks on aligned blocks).
  revision: yes
- Referee: [§4.1] §4.1 (Benchmark), Table 1: the construction details for the new benchmark (task selection criteria, leakage mitigation steps, and test-suite coverage statistics) are insufficient to assess whether the 33.3% and 56.7% gains are robust or sensitive to post-hoc choices; no inter-rater agreement or leakage audit is reported.
  Authors: We acknowledge the need for greater transparency. The revised §4.1 and Table 1 will include: (i) explicit task-selection criteria (problems posted after 2023 on LeetCode/Codeforces with at least three test cases), (ii) leakage-mitigation steps (timestamp filtering plus manual overlap checks against common pre-training corpora), (iii) test-suite coverage statistics (average number of tests per task and branch coverage), and (iv) results of a leakage audit together with inter-rater agreement (Cohen’s κ) for any manual verification steps.
  revision: yes
- Referee: [§4.2] §4.2 (Results), accuracy and repair tables: the improvements are presented as point estimates without statistical significance tests, confidence intervals, or variance across random seeds; this weakens the claim that TransAGENT consistently outperforms the baselines.
  Authors: We agree that statistical rigor is required. The revised §4.2 will report: (i) paired statistical significance tests (Wilcoxon signed-rank) with p-values, (ii) 95% confidence intervals computed via bootstrap resampling, and (iii) standard deviation across five random seeds for both translation accuracy and program-repair metrics.
  revision: yes
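Part of the statistics promised in the last response can be sketched with the Python standard library alone. The example below computes a percentile-bootstrap 95% confidence interval for the mean of paired per-task differences and the standard deviation across seeds. All scores are invented, `bootstrap_ci_mean_diff` is a hypothetical helper, and the Wilcoxon signed-rank test is deliberately left out (SciPy's `scipy.stats.wilcoxon` would be a common choice, assuming SciPy is available).

```python
import random
import statistics
from typing import List, Tuple


def bootstrap_ci_mean_diff(
    a: List[float],
    b: List[float],
    n_resamples: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap CI for the mean of the paired differences a[i] - b[i]."""
    if len(a) != len(b) or not a:
        raise ValueError("a and b must be non-empty and of equal length")
    rng = random.Random(seed)
    diffs = [x - y for x, y in zip(a, b)]
    means = []
    for _ in range(n_resamples):
        # Resample the paired differences with replacement.
        sample = [rng.choice(diffs) for _ in diffs]
        means.append(statistics.fmean(sample))
    means.sort()
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


if __name__ == "__main__":
    # Invented per-task pass indicators (1 = all tests pass) for two systems.
    system_a = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
    system_b = [1, 0, 0, 1, 0, 1, 0, 1, 1, 0]
    print("95% CI for mean difference:", bootstrap_ci_mean_diff(system_a, system_b))

    # Invented accuracies from five seeds of one system.
    seed_accuracies = [0.71, 0.69, 0.73, 0.70, 0.72]
    print("std across seeds:", statistics.stdev(seed_accuracies))
```

If the resulting interval excludes zero, the paired difference is unlikely to be a resampling artifact; with only a handful of tasks, as in this toy input, the interval will be wide.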
Circularity Check
No significant circularity detected
The paper presents an empirical multi-agent system (TransAGENT) whose core contribution is fine-grained execution alignment for error localization in LLM code translation. Evaluation relies on a newly constructed benchmark and direct comparisons to external baselines (UniTrans, Agentless) with reported accuracy deltas. No equations, fitted parameters renamed as predictions, self-definitional constructs, or load-bearing self-citations appear in the provided text. The derivation chain consists of system design followed by external benchmarking; all performance claims are falsifiable against independent test suites and do not reduce to internal definitions or prior author work by construction.
Forward citations
Cited by 7 Pith papers
- Beyond Translation Accuracy: Addressing False Failures in LLM-Based Code Translation
  Many reported failures in LLM-based code translation are false negatives due to evaluation pipeline issues such as improper compilation flags, missing library links, and unconfigured runtime environments rather than i...
- uGen: An Agentic Framework for Generating Microarchitectural Attack PoCs
  uGen is the first retrieval-augmented multi-agent LLM framework for generating functionally correct microarchitectural attack PoCs, reporting up to 100% success on Spectre-v1 and 80% on Prime+Probe at low cost.
- Project-Level C-to-Rust Translation via Pointer Knowledge Graphs
  PtrTrans builds a Pointer Knowledge Graph with points-to flows, struct abstractions, and Rust annotations to guide LLMs toward project-level C-to-Rust translations that cut unsafe code by 99.9% and raise functional co...
- Neural Code Translation of Legacy Code: APL to C#
  Guided LLM strategies with custom datasets and execution-based verification enable functional APL-to-C# translation across a range of program complexities.
- Boosting Automatic Java-to-Cangjie Translation with Multi-Stage LLM Training and Error Repair
  Multi-stage LLM training plus compiler-guided error repair boosts functional equivalence in Java-to-Cangjie translation by 6.06% over prior methods despite scarce parallel data.
- Beyond Translation Accuracy: Addressing False Failures in LLM-Based Code Translation
  A large-scale study finds that many LLM code translation failures are false negatives due to improper evaluation configurations rather than incorrect translations.
- Large Language Model-Based Agents for Software Engineering: A Survey
  A literature survey that collects and categorizes 124 papers on LLM-based agents for software engineering from SE and agent perspectives.