MR-Adopt: Automatic Deduction of Input Transformation Function for Metamorphic Testing

Congying Xu; Hengcheng Zhu; Jialun Cao; Jiarong Wu; Shing-Chi Cheung; Songqiang Chen; Valerio Terragni

arXiv: 2408.15815 · v2 · submitted 2024-08-28 · 💻 cs.SE

Pith reviewed 2026-05-23 22:15 UTC · model grok-4.3

classification 💻 cs.SE

keywords metamorphic testing · input transformation deduction · LLM code generation · test adequacy · metamorphic relations · software testing automation · data flow analysis

The pith

MR-Adopt deduces input transformations from hard-coded metamorphic test cases to allow reuse with new inputs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces MR-Adopt to extract reusable input transformation functions from test cases that encode metamorphic relations but hard-code the inputs. It uses large language models to generate additional source and follow-up input pairs from the single available example, and these pairs guide the generation of candidate transformations. The candidates are refined with data-flow analysis to remove MR-irrelevant code, and the best one is chosen by checking it against the encoded output relations. The approach produces transformations applicable to all experimental source inputs for 72.00% of the encoded MRs, which the paper reports as 33.33% more than vanilla GPT-3.5. Using the transformations raises line coverage by 10.62% and mutation score by 18.91%.

Core claim

MR-Adopt automatically deduces the input transformation from the hard-coded source and follow-up inputs in encoded MR test cases. With typically only one pair available, LLMs generate additional source-followup pairs to guide generalizable transformations, which are refined by removing irrelevant code via data-flow analysis and selected based on the output relations.
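To make the distinction concrete, here is a minimal sketch of a test that hard-codes both inputs, next to a version with an explicit input transformation. The function under test (`math.sin`), the relation sin(x) = sin(pi - x), and every name below are illustrative assumptions, not examples taken from the paper.

```python
import math

def test_hard_coded():
    # Both inputs are literals, so the rule "follow_up = pi - source"
    # is only implicit and the test cannot be rerun on new inputs.
    source = 0.5
    follow_up = 2.641592653589793  # pi - 0.5, written out by hand
    assert math.isclose(math.sin(source), math.sin(follow_up), abs_tol=1e-9)

def transform(source: float) -> float:
    # Hypothetical deduced input transformation for the relation above.
    return math.pi - source

def check_mr(source: float) -> bool:
    # The same output relation, now reusable with any source input.
    follow_up = transform(source)
    return math.isclose(math.sin(source), math.sin(follow_up), abs_tol=1e-9)

new_sources = [0.0, 0.5, 1.0, -2.0, 10.0]
results = [check_mr(s) for s in new_sources]
```

Once `transform` exists, new source inputs can come from any generator, which is what lets the encoded relation contribute to coverage beyond the original pair.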

What carries the argument

LLM generation of additional input pairs combined with data-flow refinement and output-relation selection to deduce general input transformations.
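The selection step can be pictured as scoring each candidate transformation by how many source inputs lead to a follow-up input that satisfies the encoded output relation. The sketch below is an assumption about how such scoring might look; the candidates, the function under test (`abs`), and the equality relation are invented for illustration, and the paper's actual ranking criteria may differ.

```python
def sut(x: int) -> int:
    # Stand-in for the method under test.
    return abs(x)

def output_relation(source_out: int, follow_up_out: int) -> bool:
    # Encoded output relation in this toy example: outputs must be equal.
    return source_out == follow_up_out

def score(transform, sources) -> int:
    """Count the source inputs on which the output relation holds."""
    passed = 0
    for s in sources:
        try:
            if output_relation(sut(s), sut(transform(s))):
                passed += 1
        except Exception:
            pass  # a crashing candidate simply fails on that input
    return passed

# Hypothetical candidates derived from the single example pair (3, -3).
candidates = [
    ("negate", lambda x: -x),        # generalizes to other inputs
    ("constant", lambda x: -3),      # overfits to the one example
    ("increment", lambda x: x + 1),  # does not encode the relation
]
sources = [3, 0, -7, 12]
scores = {name: score(fn, sources) for name, fn in candidates}
best_name = max(scores, key=scores.get)
```

The overfitted `constant` candidate passes only on the original example, which is why extra source inputs matter for telling candidates apart.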

If this is right

Input transformations work for all experimental source inputs in 72.00% of encoded MRs.
This rate is 33.33% higher than with vanilla GPT-3.5.
Encoded MR-based test cases increase line coverage by 10.62%.
Mutation scores rise by 18.91% when using the generated transformations.
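The phrase "33.33% higher" can be read as percentage points or as a relative increase, and the implied GPT-3.5 baseline differs between the two. The short calculation below works out both readings from the two figures quoted above; it does not settle which reading the paper intends.

```python
mr_adopt_rate = 72.00  # % of encoded MRs, as quoted above
gain = 33.33           # "33.33% higher", as quoted above

# Reading 1: the gain is in percentage points.
baseline_if_points = mr_adopt_rate - gain

# Reading 2: the gain is relative to the baseline rate.
baseline_if_relative = mr_adopt_rate / (1 + gain / 100)

print(f"percentage-point reading: baseline = {baseline_if_points:.2f}%")
print(f"relative reading:         baseline = {baseline_if_relative:.2f}%")
```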

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Developers could apply this to existing test suites to expand coverage without rewriting relations.
The method might extend to other forms of property-based testing where relations are partially specified.
Integration with test generation frameworks could automate more of the metamorphic testing process.

Load-bearing premise

LLM-generated additional input pairs are representative enough to produce transformations that generalize, and data-flow analysis removes only irrelevant code without losing key mapping logic.
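The pruning half of this premise can be illustrated with a toy backward slice: starting from the returned value, keep only the assignments it depends on. This is a simplified sketch over straight-line Python code using the standard `ast` module (Python 3.9+ for `ast.unparse`), assuming a function that ends in a single `return`. It is not the paper's analysis, which is not detailed in this summary.

```python
import ast

def names_in(node: ast.AST) -> set:
    """All variable names mentioned anywhere inside an AST node."""
    return {n.id for n in ast.walk(node) if isinstance(n, ast.Name)}

def prune_irrelevant(func_src: str) -> str:
    """Drop simple assignments that the return value does not depend on."""
    tree = ast.parse(func_src)
    func = tree.body[0]
    needed, kept = set(), []
    # Walk the body from last statement to first, tracking needed names.
    for stmt in reversed(func.body):
        simple_assign = isinstance(stmt, ast.Assign) and all(
            isinstance(t, ast.Name) for t in stmt.targets
        )
        if simple_assign:
            targets = {t.id for t in stmt.targets}
            if targets & needed:
                needed -= targets
                needed |= names_in(stmt.value)
                kept.append(stmt)
            # otherwise the assignment feeds nothing we need: drop it
        else:
            # Returns and anything not understood are kept conservatively.
            needed |= names_in(stmt)
            kept.append(stmt)
    func.body = list(reversed(kept))
    return ast.unparse(tree)

# Hypothetical LLM-generated candidate with two irrelevant statements.
candidate = '''
def transform(source):
    debug = len(source)
    reversed_items = source[::-1]
    unused = debug * 2
    return reversed_items
'''
pruned = prune_irrelevant(candidate)
```

The premise fails exactly where such a slice is wrong: if a dropped statement had a side effect the mapping relies on, the pruned transformation would no longer match the relation.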

What would settle it

Applying MR-Adopt to a collection of hard-coded MR test cases from unseen projects and finding that fewer than half yield transformations applicable to all new source inputs.

Figures

Figures reproduced from arXiv: 2408.15815 by Congying Xu, Hengcheng Zhu, Jialun Cao, Jiarong Wu, Shing-Chi Cheung, Songqiang Chen, Valerio Terragni.

**Figure 1.** Overview of MR-Adopt.

**Figure 2.** An overview of …

Original abstract

While a recent study reveals that many developer-written test cases can encode a reusable Metamorphic Relation (MR), over 70% of them directly hard-code the source input and follow-up input in the encoded relation. Such encoded MRs, which do not contain an explicit input transformation to transform the source inputs to corresponding follow-up inputs, cannot be reused with new source inputs to enhance test adequacy. In this paper, we propose MR-Adopt (Automatic Deduction Of inPut Transformation) to automatically deduce the input transformation from the hard-coded source and follow-up inputs, aiming to enable the encoded MRs to be reused with new source inputs. With typically only one pair of source and follow-up inputs available in an MR-encoded test case as the example, we leveraged LLMs to understand the intention of the test case and generate additional examples of source-followup input pairs. This helps to guide the generation of input transformations generalizable to multiple source inputs. Besides, to mitigate the issue that LLMs generate erroneous code, we refine LLM-generated transformations by removing MR-irrelevant code elements with data-flow analysis. Finally, we assess candidate transformations based on encoded output relations and select the best transformation as the result. Evaluation results show that MR-Adopt can generate input transformations applicable to all experimental source inputs for 72.00% of encoded MRs, which is 33.33% more than using vanilla GPT-3.5. By incorporating MR-Adopt-generated input transformations, encoded MR-based test cases can effectively enhance the test adequacy, increasing the line coverage and mutation score by 10.62% and 18.91%, respectively.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

MR-Adopt gives a concrete pipeline to recover reusable input transformations from hard-coded metamorphic relations, beating plain GPT-3.5 on applicability and showing measurable coverage gains.

The core takeaway is that this work turns a common limitation in metamorphic testing into something more practical: many developer tests hard-code source and follow-up inputs inside an MR, so they cannot be applied to new inputs. MR-Adopt uses an LLM to synthesize extra example pairs, prunes the generated code with data-flow analysis, and picks the best candidate by checking the output relation. That combination is not in the prior MT literature they cite, and the reported lift (72% applicability, stated as 33.33% more than vanilla GPT-3.5, plus gains of 10.62% in line coverage and 18.91% in mutation score) is the main empirical result worth noting.

The pipeline is straightforward and directly targets reuse, which is a real pain point when test suites already contain encoded MRs. The authors also acknowledge that LLMs can produce bad code and try to mitigate it with static pruning, which is a reasonable engineering step.

On the soft side, the abstract gives percentages without stating how many subjects were used, how they were selected, or whether results were averaged over multiple LLM runs or prompt variations. Those details matter for judging whether the 33.33% gain over GPT-3.5 holds up or is sensitive to the particular test cases chosen. The method also assumes the single provided pair plus LLM-generated ones are enough to learn a general transformation; if the original pair is atypical, the data-flow step might still keep irrelevant logic. Nothing in the description suggests circularity or invented metrics, and the approach stays empirical rather than claiming a formal guarantee.

This paper is mainly for researchers already working on metamorphic testing or automated test reuse who want a practical next step beyond manual MR encoding. A reader who cares about LLM-assisted program analysis in testing will get concrete numbers and a clear method to compare against.

It is solid enough on its own terms to deserve a serious referee rather than a desk reject; the experimental claims can be checked once the full subject list and variance data are in front of someone. I would send it out for review.

Referee Report

1 major / 0 minor

Summary. The paper presents MR-Adopt, an automated approach to deduce reusable input transformation functions from encoded Metamorphic Relations (MRs) that hard-code source and follow-up inputs. The method uses LLMs to generate additional source-followup input pairs from the single available example in a test case, applies data-flow analysis to refine the generated code by removing irrelevant elements, and selects the best transformation by evaluating it against the encoded output relations. Evaluation on encoded MRs shows that MR-Adopt produces transformations applicable to all experimental source inputs for 72.00% of cases (33.33% higher than vanilla GPT-3.5), and that incorporating these transformations increases line coverage by 10.62% and mutation score by 18.91%.

Significance. If the empirical results hold under rigorous controls, the work addresses a clear practical barrier to reusing encoded MRs in metamorphic testing, potentially allowing a large fraction of existing developer-written tests to be applied to new inputs. The hybrid use of LLMs for example generation combined with static data-flow refinement is a pragmatic strength, and the focus on generalizability from minimal examples aligns with real-world test maintenance needs. The reported gains in coverage and mutation score indicate measurable test-adequacy benefits.

major comments (1)

[Evaluation] Evaluation section: the central claims of 72.00% applicability, 33.33% improvement over vanilla GPT-3.5, 10.62% coverage gain, and 18.91% mutation-score gain are presented without reporting the total number of encoded MRs examined, subject-program selection criteria, number of LLM runs, variance or statistical significance, or explicit controls for prompt sensitivity. These details are load-bearing for assessing whether the percentages reliably support the reusability and adequacy claims.
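One way to report the run-to-run stability asked for here is to repeat the experiment and give the mean and standard deviation of the applicability rate. The per-run outcomes below are made-up placeholder values chosen only to exercise the code; they are not results from the paper, and the helper name is hypothetical.

```python
from statistics import mean, stdev

def applicability_rate(outcomes) -> float:
    """Share (in %) of MRs whose transformation worked on every source input."""
    return 100.0 * sum(outcomes) / len(outcomes)

# Placeholder data: each inner list is one LLM run, each bool one encoded MR
# (True = the chosen transformation applied to all source inputs).
runs = [
    [True, True, False, True],
    [True, False, False, True],
    [True, True, True, True],
]

rates = [applicability_rate(r) for r in runs]
summary = (mean(rates), stdev(rates))  # sample standard deviation
print(f"applicability: {summary[0]:.2f}% +/- {summary[1]:.2f} over {len(runs)} runs")
```

Reporting the spread alongside the mean shows whether a headline rate is stable or depends on a single favorable run.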

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed feedback on the evaluation section. We agree that additional methodological details are needed to support the reported results and will revise the manuscript to address this.

Referee: [Evaluation] Evaluation section: the central claims of 72.00% applicability, 33.33% improvement over vanilla GPT-3.5, 10.62% coverage gain, and 18.91% mutation-score gain are presented without reporting the total number of encoded MRs examined, subject-program selection criteria, number of LLM runs, variance or statistical significance, or explicit controls for prompt sensitivity. These details are load-bearing for assessing whether the percentages reliably support the reusability and adequacy claims.

Authors: We agree that these details are essential for evaluating the reliability of the empirical claims. The current manuscript presents the aggregate percentages without explicitly stating the underlying experimental parameters. In the revised version, we will expand the Evaluation section (likely in a new subsection on experimental setup and threats to validity) to report: the total number of encoded MRs examined, the criteria and process for selecting subject programs, the number of LLM runs (including any repetition for stability), observed variance across runs if applicable, any statistical significance testing performed, and explicit controls or sensitivity analysis for prompt variations. This will allow readers to better assess the generalizability of the 72% applicability rate and the coverage/mutation improvements.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity

The paper presents a multi-stage heuristic pipeline (LLM-based pair generation, data-flow refinement, output-relation selection) whose success metrics (72% applicability, coverage/mutation gains) are measured empirically against external benchmarks and test suites. No derivation reduces by construction to fitted parameters, self-citations, or renamed inputs; the central claims rest on observable program behavior and LLM outputs rather than internal redefinition.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The contribution rests on the empirical capability of current LLMs to infer test intent from code and on the correctness of standard data-flow algorithms; no new mathematical constants or entities are introduced.

axioms (2)

domain assumption: Large language models can produce additional valid source-followup input pairs that reflect the intended metamorphic relation when given only the test code.
Invoked when the method asks the LLM to generate extra examples to guide transformation synthesis.

domain assumption: Data-flow analysis can correctly identify and excise MR-irrelevant statements without removing logic required for a correct input mapping.
Invoked in the refinement step that removes erroneous code elements produced by the LLM.



Reference graph

Works this paper leans on

61 extracted references · 61 canonical work pages · 5 internal anchors

[1]

Rajeev Alur, Rastislav Bodík, Garvit Juniwal, Milo M. K. Martin, Mukund Raghothaman, Sanjit A. Seshia, Rishabh Singh, Armando Solar-Lezama, Emina Torlak, and Abhishek Udupa. 2013. Syntax-guided synthesis. In Formal Methods in Computer-Aided Design, FMCAD 2013, Portland, OR, USA, October 20-23, 2013 . IEEE, 1–8. https://ieeexplore.ieee.org/document/6679385/

work page arXiv 2013
[2]

Jialun Cao, Wuqi Zhang, and Shing-Chi Cheung. 2024. Concerned with Data Contamination? Assessing Countermeasures in Code Language Model. CoRR abs/2403.16898 (2024). https://doi.org/10.48550/ARXIV.2403.16898 arXiv:2403.16898

work page doi:10.48550/arxiv.2403.16898 2024
[3]

Junkai Chen, Xing Hu, Zhenhao Li, Cuiyun Gao, Xin Xia, and David Lo. 2024. Code Search is All You Need? Improving Code Suggestions with Code Search. In Proceedings of the IEEE/ACM 46th International Conference on Software Engineering (ICSE ’24). Association for Computing Machinery, New York, NY, USA, Article 73, 13 pages. https://doi.org/10.1145/3597503.3639085

work page doi:10.1145/3597503.3639085 2024
[4]

Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Pondé de Oliveira Pinto, and et al. 2021. Evaluating Large Language Models Trained on Code. CoRR abs/2107.03374 (2021). arXiv:2107.03374 https://arxiv.org/abs/ 2107.03374

work page internal anchor Pith review Pith/arXiv arXiv 2021
[5]

Songqiang Chen, Shuo Jin, and Xiaoyuan Xie. 2021. Testing Your Question Answering Software via Asking Recursively. In 36th IEEE/ACM International Conference on Automated Software Engineering, ASE 2021, Melbourne, Australia, November 15-19, 2021 . IEEE, 104–116. https://doi.org/10.1109/ASE51524.2021. 9678670

work page doi:10.1109/ase51524.2021 2021
[6]

Tsong Yueh Chen, Fei-Ching Kuo, Huai Liu, Pak-Lok Poon, Dave Towey, T. H. Tse, and Zhi Quan Zhou. 2018. Metamorphic Testing: A Review of Challenges and Opportunities. ACM Comput. Surv. 51, 1 (2018), 4:1–4:27. https://doi.org/10. 1145/3143561

work page 2018
[7]

Shihan Dou, Yan Liu, Haoxiang Jia, Limao Xiong, Enyu Zhou, Wei Shen, Junjie Shan, Caishuang Huang, Xiao Wang, Xiaoran Fan, Zhiheng Xi, Yuhao Zhou, Tao Ji, Rui Zheng, Qi Zhang, Xuanjing Huang, and Tao Gui. 2024. StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feed- back. CoRR abs/2402.01391 (2024). https://doi.org/10.48550/ARXI...

work page doi:10.48550/arxiv.2402.01391 2024
[8]

Xueying Du, Mingwei Liu, Kaixin Wang, Hanlin Wang, Junwei Liu, Yixuan Chen, Jiayi Feng, Chaofeng Sha, Xin Peng, and Yiling Lou. 2024. Evaluating Large Language Models in Class-Level Code Generation. In Proceedings of the 46th IEEE/ACM International Conference on Software Engineering, ICSE 2024, Lisbon, Portugal, April 14-20, 2024 . ACM, 81:1–81:13. https:...

work page doi:10.1145/3597503 2024
[9]

Aryaz Eghbali and Michael Pradel. 2024. De-Hallucinator: Iterative Grounding for LLM-Based Code Completion. CoRR abs/2401.01701 (2024). https://doi.org/ 10.48550/ARXIV.2401.01701 arXiv:2401.01701

work page doi:10.48550/arxiv.2401.01701 2024
[10]

Gordon Fraser and Andrea Arcuri. 2011. EvoSuite: automatic test suite generation for object-oriented software. In SIGSOFT/FSE’11 19th ACM SIGSOFT Symposium on the Foundations of Software Engineering (FSE-19) and ESEC’11: 13th European Software Engineering Conference (ESEC-13), Szeged, Hungary, September 5-9, 2011 , Tibor Gyimóthy and Andreas Zeller (Eds.)...

work page arXiv 2011
[11]

Mingyang Geng, Shangwen Wang, Dezun Dong, Haotian Wang, Ge Li, Zhi Jin, Xiaoguang Mao, and Xiangke Liao. 2024. Large Language Models are Few-Shot Summarizers: Multi-Intent Comment Generation via In-Context Learning. In Proceedings of the 46th IEEE/ACM International Conference on Software Engineering, ICSE 2024, Lisbon, Portugal, April 14-20, 2024 . ACM, 3...

work page doi:10.1145/3597503.3608134 2024
[12]

Sumit Gulwani. 2011. Automating string processing in spreadsheets using input- output examples. In Proceedings of the 38th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2011, Austin, TX, USA, January 26-28, 2011, Thomas Ball and Mooly Sagiv (Eds.). ACM, 317–330. https://doi.org/ 10.1145/1926385.1926423

work page doi:10.1145/1926385.1926423 2011
[13]

Daya Guo, Qihao Zhu, Dejian Yang, Zhenda Xie, Kai Dong, Wentao Zhang, Guanting Chen, Xiao Bi, Y. Wu, Y. K. Li, Fuli Luo, Yingfei Xiong, and Wen- feng Liang. 2024. DeepSeek-Coder: When the Large Language Model Meets Programming - The Rise of Code Intelligence. CoRR abs/2401.14196 (2024). https://doi.org/10.48550/ARXIV.2401.14196 arXiv:2401.14196

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2401.14196 2024
[14]

Ari Holtzman, Jan Buys, Li Du, Maxwell Forbes, and Yejin Choi. 2020. The Curious Case of Neural Text Degeneration. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020 . OpenReview.net. https://openreview.net/forum?id=rygGQyrFvH

work page 2020
[15]

Kaifeng Huang, Bihuan Chen, Congying Xu, Ying Wang, Bowen Shi, Xin Peng, Yijian Wu, and Yang Liu. 2022. Characterizing usages, updates and risks of third-party libraries in Java projects. Empir. Softw. Eng. 27, 4 (2022), 90. https: //doi.org/10.1007/s10664-022-10131-8

work page doi:10.1007/s10664-022-10131-8 2022
[16]

Maliheh Izadi, Jonathan Katzy, Tim van Dam, Marc Otten, Razvan Mihai Popescu, and Arie van Deursen. 2024. Language Models for Code Completion: A Practical Evaluation. In Proceedings of the 46th IEEE/ACM International Conference on Software Engineering, ICSE 2024, Lisbon, Portugal, April 14-20, 2024 . ACM, 79:1– 79:13. https://doi.org/10.1145/3597503.3639138

work page doi:10.1145/3597503.3639138 2024
[17]

Shuyang Jiang, Yuhao Wang, and Yu Wang. 2023. SelfEvolve: A Code Evolution Framework via Large Language Models. CoRR abs/2306.02907 (2023). https: //doi.org/10.48550/ARXIV.2306.02907 arXiv:2306.02907

work page doi:10.48550/arxiv.2306.02907 2023
[18]

Vu Le, Mehrdad Afshari, and Zhendong Su. 2014. Compiler validation via equivalence modulo inputs. In ACM SIGPLAN Conference on Programming Lan- guage Design and Implementation, PLDI ’14, Edinburgh, United Kingdom - June 09 - 11, 2014 , Michael F. P. O’Boyle and Keshav Pingali (Eds.). ACM, 216–226. https://doi.org/10.1145/2594291.2594334

work page doi:10.1145/2594291.2594334 2014
[19]

Lahiri, and Siddhartha Sen

Caroline Lemieux, Jeevana Priya Inala, Shuvendu K. Lahiri, and Siddhartha Sen

work page
[20]

In 45th IEEE/ACM International Conference on Software Engineering, ICSE 2023, Melbourne, Australia, May 14-20, 2023

CodaMosa: Escaping Coverage Plateaus in Test Generation with Pre-trained Large Language Models. In 45th IEEE/ACM International Conference on Software Engineering, ICSE 2023, Melbourne, Australia, May 14-20, 2023 . IEEE, 919–931. https://doi.org/10.1109/ICSE48619.2023.00085

work page doi:10.1109/icse48619.2023.00085 2023
[21]

Chengshu Li, Jacky Liang, Andy Zeng, Xinyun Chen, Karol Hausman, Dorsa Sadigh, Sergey Levine, Li Fei-Fei, Fei Xia, and Brian Ichter. 2023. Chain of Code: Reasoning with a Language Model-Augmented Code Emulator. CoRR abs/2312.04474 (2023). https://doi.org/10.48550/ARXIV.2312.04474 arXiv:2312.04474

work page doi:10.48550/arxiv.2312.04474 2023
[22]

Raymond Li, Loubna Ben Allal, Yangtian Zi, Niklas Muennighoff, Denis Kocetkov, and et al. 2023. StarCoder: may the source be with you! CoRR abs/2305.06161 (2023). https://doi.org/10.48550/ARXIV.2305.06161 arXiv:2305.06161

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2305.06161 2023
[23]

Competition-Level Code Generation with AlphaCode

Yujia Li, David H. Choi, Junyoung Chung, Nate Kushman, Julian Schrit- twieser, and et al. 2022. Competition-Level Code Generation with Alpha- Code. CoRR abs/2203.07814 (2022). https://doi.org/10.48550/ARXIV.2203.07814 arXiv:2203.07814

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2203.07814 2022
[24]

Mikael Lindvall, Dharmalingam Ganesan, Ragnar Ardal, and Robert E. Wiegand

work page
[25]

In 37th IEEE/ACM International Conference on Software Engineering, ICSE 2015, Florence, Italy, May 16-24, 2015, Volume 2 , Antonia Bertolino, Ger- ardo Canfora, and Sebastian G

Metamorphic Model-Based Testing Applied on NASA DAT - An Experi- ence Report. In 37th IEEE/ACM International Conference on Software Engineering, ICSE 2015, Florence, Italy, May 16-24, 2015, Volume 2 , Antonia Bertolino, Ger- ardo Canfora, and Sebastian G. Elbaum (Eds.). IEEE Computer Society, 129–138. https://doi.org/10.1109/ICSE.2015.348

work page doi:10.1109/icse.2015.348 2015
[26]

Huai Liu, Fei-Ching Kuo, Dave Towey, and Tsong Yueh Chen. 2014. How Ef- fectively Does Metamorphic Testing Alleviate the Oracle Problem? IEEE Trans. Software Eng. 40, 1 (2014), 4–22. https://doi.org/10.1109/TSE.2013.46

work page doi:10.1109/tse.2013.46 2014
[27]

Ziyang Luo, Can Xu, Pu Zhao, Qingfeng Sun, Xiubo Geng, Wenxiang Hu, Chongyang Tao, Jing Ma, Qingwei Lin, and Daxin Jiang. 2023. WizardCoder: Em- powering Code Large Language Models with Evol-Instruct. CoRR abs/2306.08568 (2023). https://doi.org/10.48550/ARXIV.2306.08568 arXiv:2306.08568

work page doi:10.48550/arxiv.2306.08568 2023
[28]

Haoyang Ma, Qingchao Shen, Yongqiang Tian, Junjie Chen, and Shing-Chi Che- ung. 2023. Fuzzing Deep Learning Compilers with HirGen. In Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis, ISSTA 2023, Seattle, W A, USA, July 17-21, 2023, René Just and Gordon Fraser (Eds.). ACM, 248–260. https://doi.org/10.1145/359792...

work page doi:10.1145/3597926.3598053 2023
[29]

Lipeng Ma, Weidong Yang, Bo Xu, Sihang Jiang, Ben Fei, Jiaqing Liang, Mingjie Zhou, and Yanghua Xiao. 2024. KnowLog: Knowledge Enhanced Pre-trained Language Model for Log Understanding. In Proceedings of the 46th IEEE/ACM International Conference on Software Engineering, ICSE 2024, Lisbon, Portugal, April 14-20, 2024. ACM, 32:1–32:13. https://doi.org/10.1...

work page doi:10.1145/3597503.3623304 2024
[30]

Qiuyang Mang, Aoyang Fang, Boxi Yu, Hanfei Chen, and Pinjia He. 2024. Testing Graph Database Systems via Equivalent Query Rewriting. In Proceedings of the 46th IEEE/ACM International Conference on Software Engineering, ICSE 2024, Lisbon, Portugal, April 14-20, 2024 . ACM, 143:1–143:12. https://doi.org/10.1145/ 3597503.3639200

work page arXiv 2024
[31]

Hellendoorn, Bogdan Vasilescu, and Brad A

Daye Nam, Andrew Macvean, Vincent J. Hellendoorn, Bogdan Vasilescu, and Brad A. Myers. 2024. Using an LLM to Help With Code Understanding. In Proceedings of the 46th IEEE/ACM International Conference on Software Engineering, ICSE 2024, Lisbon, Portugal, April 14-20, 2024 . ACM, 97:1–97:13. https://doi.org/ 10.1145/3597503.3639187

work page doi:10.1145/3597503.3639187 2024
[32]

Wang, and Xi Victoria Lin

Ansong Ni, Srini Iyer, Dragomir Radev, Veselin Stoyanov, Wen-Tau Yih, Sida I. Wang, and Xi Victoria Lin. 2023. LEVER: Learning to Verify Language-to-Code Generation with Execution. In International Conference on Machine Learning, ICML 2023, 23-29 July 2023, Honolulu, Hawaii, USA (Proceedings of Machine Learn- ing Research, Vol. 202), Andreas Krause, Emma ...

work page 2023
[33]

Carlos Pacheco and Michael D. Ernst. 2007. Randoop: feedback-directed random testing for Java. In Companion to the 22nd Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2007, October 21-25, 2007, Montreal, Quebec, Canada , Richard P. Gabriel, David F. Bacon, Cristina Videira Lopes, and Guy L. Steel...

work page doi:10.1145/1297846.1297902 2007
[34]

Lahiri, and Mike Kaufman

Rangeet Pan, Vu Le, Nachiappan Nagappan, Sumit Gulwani, Shuvendu K. Lahiri, and Mike Kaufman. 2021. Can Program Synthesis be Used to Learn Merge Conflict MR-Adopt: Automatic Deduction of Input Transformation Function for Metamorphic Testing ASE’24, Oct 27 – Nov 1, 2024, Sacramento, California, United States Resolutions? An Empirical Analysis. In 43rd IEEE...

work page doi:10.1109/icse43902.2021.00077 2021
[35]

Sergio Segura, Gordon Fraser, Ana Belén Sánchez, and Antonio Ruiz Cortés

work page
[36]

IEEE Trans

A Survey on Metamorphic Testing. IEEE Trans. Software Eng. 42, 9 (2016), 805–824. https://doi.org/10.1109/TSE.2016.2532875

work page doi:10.1109/tse.2016.2532875 2016
[37]

Sergio Segura, José Antonio Parejo, Javier Troya, and Antonio Ruiz Cortés. 2018. Metamorphic Testing of RESTful Web APIs. IEEE Trans. Software Eng. 44, 11 (2018), 1083–1099. https://doi.org/10.1109/TSE.2017.2764464

work page doi:10.1109/tse.2017.2764464 2018
[38]

Sergio Segura, José Antonio Parejo, Javier Troya, and Antonio Ruiz Cortés. 2018. Metamorphic testing of RESTful web APIs. InProceedings of the 40th International Conference on Software Engineering, ICSE 2018, Gothenburg, Sweden, May 27 - June 03, 2018, Michel Chaudron, Ivica Crnkovic, Marsha Chechik, and Mark Harman (Eds.). ACM, 882. https://doi.org/10.11...

work page doi:10.1145/3180155.3182528 2018
[39]

Bo Shen, Jiaxin Zhang, Taihong Chen, Daoguang Zan, Bing Geng, An Fu, Muhan Zeng, Ailun Yu, Jichuan Ji, Jingyang Zhao, Yuenan Guo, and Qianxiang Wang

work page
[40]

CoRR abs/2307.14936 (2023)

PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback. CoRR abs/2307.14936 (2023). https://doi.org/10.48550/ARXIV.2307. 14936 arXiv:2307.14936

work page doi:10.48550/arxiv.2307 2023
[41]

Seung Yeob Shin, Fabrizio Pastore, Domenico Bianculli, and Alexandra Baicoianu

work page
[43]

Chengnian Sun, Vu Le, and Zhendong Su. 2016. Finding compiler bugs via live code mutation. In Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2016, part of SPLASH 2016, Amsterdam, The Netherlands, October 30 - November 4, 2016 , Eelco Visser and Yannis Smaragdakis (E...

work page doi:10.1145/2983990.2984038 2016
[44]

Chang-Ai Sun, Yiqiang Liu, Zuoyi Wang, and W. K. Chan. 2016. 𝜇MT: a data mutation directed metamorphic relation acquisition methodology. In Proceedings of the 1st International Workshop on Metamorphic Testing, MET@ICSE 2016, Austin, Texas, USA, May 16, 2016 . ACM, 12–18. https://doi.org/10.1145/2896971.2896974

work page doi:10.1145/2896971.2896974 2016
[45]

Yutian Tang, Zhijie Liu, Zhichao Zhou, and Xiapu Luo. 2024. ChatGPT vs SBST: A Comparative Assessment of Unit Test Suite Generation. IEEE Transactions on Software Engineering (2024), 1–19. https://doi.org/10.1109/TSE.2024.3382365

work page doi:10.1109/tse.2024.3382365 2024
[46]

MR-Adopt. 2024. MR-Adopt. Retrieved June 6, 2024 from https://mr-adopt. github.io/

work page 2024
[47]

Christos Tsigkanos, Pooja Rani, Sebastian Müller, and Timo Kehrer. 2023. Variable Discovery with Large Language Models for Metamorphic Testing of Scientific Software. In Computational Science - ICCS 2023 - 23rd International Conference, Prague, Czech Republic, July 3-5, 2023, Proceedings, Part I (Lecture Notes in Com- puter Science, Vol. 14073) , Jirí Mik...

work page doi:10.1007/978-3-031-35995-8_23 2023
[48]

Ying Wang, Bihuan Chen, Kaifeng Huang, Bowen Shi, Congying Xu, Xin Peng, Yijian Wu, and Yang Liu. 2020. An Empirical Study of Usages, Updates and Risks of Third-Party Libraries in Java Projects. In IEEE International Conference on Software Maintenance and Evolution, ICSME 2020, Adelaide, Australia, September 28 - October 2, 2020. IEEE, 35–45. https://doi....

work page doi:10.1109/icsme46990.2020.00014 2020
[49]

Taylor Webb, Keith J Holyoak, and Hongjing Lu. 2023. Emergent analogical reasoning in large language models. Nature Human Behaviour 7, 9 (2023), 1526– 1541

work page 2023
[50]

Chi, Quoc V

Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed H. Chi, Quoc V. Le, and Denny Zhou. 2022. Chain-of-Thought Prompt- ing Elicits Reasoning in Large Language Models. In Advances in Neural Infor- mation Processing Systems 35: Annual Conference on Neural Information Pro- cessing Systems 2022, NeurIPS 2022, New Orleans, LA, USA...

work page 2022
[51]
