arxiv: 2304.07548 · v5 · submitted 2023-04-15 · 💻 cs.SE

MR-Scout: Automated Synthesis of Metamorphic Relations from Existing Test Cases

Pith reviewed 2026-05-24 09:25 UTC · model grok-4.3

classification 💻 cs.SE
keywords metamorphic testing · metamorphic relations · test case synthesis · automated test generation · software testing · open source projects · test coverage · mutation testing

The pith

MR-Scout automatically turns developer test cases into reusable metamorphic relations that generate new tests for similar programs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces MR-Scout, which mines open-source projects for test cases that already encode metamorphic relations and converts those relations into parameterized, reusable methods. These codified relations are then filtered for quality before being applied to generate additional tests. Experiments show the method located over 11,000 MR-encoded test cases across 701 projects, and more than 97 percent of the codified relations evaluated proved to be high quality. When the relations are used to create new tests, line coverage rises 13.52 percent and mutation score rises 9.42 percent, even on programs that already have developer-written tests. A separate study finds that 55.76 to 76.92 percent of the relations are readily understandable by developers.

Core claim

MR-Scout discovers MR-encoded test cases in existing OSS test suites, synthesizes the embedded relations into codified parameterized methods, discards low-quality ones, and shows that the retained relations can be applied to other programs sharing similar functionality to produce new tests that measurably raise line coverage and mutation scores.

What carries the argument

The pipeline that identifies MR-encoded test cases (MTCs), synthesizes them into codified MRs, and filters them according to their effectiveness at generating new test cases.

If this is right

  • Codified MRs extracted from one program can be reused to test other programs with overlapping functionality.
  • Tests generated from the codified MRs raise line coverage by 13.52 percent and mutation score by 9.42 percent on programs that already have developer tests.
  • Over 97 percent of the synthesized relations pass quality checks for automated test generation.
  • Between 55.76 and 76.92 percent of the codified MRs are considered easily comprehensible by developers.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • A shared repository of such codified relations could serve as a starting library for metamorphic testing across many projects.
  • The technique might surface implementation differences between programs that claim the same functionality.
  • The same mining approach could be tried on other forms of implicit test knowledge beyond metamorphic relations.

Load-bearing premise

Metamorphic relations discovered in test cases written for one program can be safely transferred to other programs that merely share similar functionality without introducing false positives or missing domain-specific constraints.

What would settle it

A case in which a codified MR, when used to generate tests for a similar program, either accepts an implementation that violates the intended relation or rejects a correct implementation.
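A minimal Java sketch of that failure mode, using the stack relation from Figure 2 (push x, then pop, expect x). The class and the generic method `pushThenPopReturnsElement` are hypothetical, and `java.util.ArrayDeque` stands in for both programs; this illustrates the risk and is not part of MR-Scout. Mapping a correct FIFO queue's enqueue and dequeue onto push and pop makes the transferred relation reject a correct implementation:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Objects;
import java.util.function.BiConsumer;
import java.util.function.Function;

public final class TransferFalseAlarm {
    /**
     * Push/pop relation parameterized over the operations that play the roles of
     * "push" and "pop", so it can be transferred to another container type.
     * Returns true when popping right after pushing x yields x.
     */
    public static <S, T> boolean pushThenPopReturnsElement(
            S container, T x, BiConsumer<S, T> push, Function<S, T> pop) {
        push.accept(container, x);
        return Objects.equals(pop.apply(container), x);
    }

    public static void main(String[] args) {
        // Source program: a LIFO stack, where the relation holds.
        Deque<Integer> stack = new ArrayDeque<>(List.of(1, 2));
        boolean holdsOnStack = pushThenPopReturnsElement(stack, 3, Deque::push, Deque::pop);

        // "Similar" target: a correct FIFO queue whose enqueue/dequeue are mapped onto push/pop.
        // The queue returns its old head (1), so the transferred relation reports a violation
        // even though the queue behaves correctly.
        Deque<Integer> queue = new ArrayDeque<>(List.of(1, 2));
        boolean holdsOnQueue = pushThenPopReturnsElement(queue, 3, Deque::addLast, Deque::pollFirst);

        System.out.println(holdsOnStack + " " + holdsOnQueue); // prints "true false"
    }
}
```

On an empty queue the same relation passes, so whether the mismatch surfaces depends on the inputs used to exercise the transferred MR.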

Figures

Figures reproduced from arXiv: 2304.07548 by Congying Xu, Hengcheng Zhu, Jiarong Wu, Shing-Chi Cheung, Valerio Terragni.

Figure 1: A test case crafted from com.itextpdf.layout.renderer.TextRendererTest in project iText. Underlying MR: IF text2 = text1.setBold() THEN text1.width() ≤ text2.width().
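To show what "codified" means for this relation, here is a minimal Java sketch. `SimpleText` is a hypothetical stand-in, not the iText API, with an assumed width model; its `setBold()` returns a copy so that text1 and text2 can be compared side by side (how iText's own `setBold()` behaves is not shown on this page). The relation becomes a method parameterized by `text1`, so it can be re-run on any generated text:

```java
public final class BoldWidthMR {
    /**
     * Hypothetical stand-in for a text element (NOT the iText API): width grows with
     * length, and bold glyphs are assumed to be one unit wider than regular ones.
     */
    public static final class SimpleText {
        private final String content;
        private final boolean bold;

        public SimpleText(String content, boolean bold) {
            this.content = content;
            this.bold = bold;
        }

        /** Returns a bold copy, leaving this instance unchanged for comparison. */
        public SimpleText setBold() {
            return new SimpleText(content, true);
        }

        public double width() {
            return content.length() * (bold ? 7.0 : 6.0);
        }
    }

    /** Codified MR: IF text2 = text1.setBold() THEN text1.width() <= text2.width(). */
    public static void boldTextIsNotNarrower(SimpleText text1) {
        SimpleText text2 = text1.setBold();
        if (!(text1.width() <= text2.width())) {
            throw new AssertionError("MR violated: bold text is narrower than the original");
        }
    }

    public static void main(String[] args) {
        for (String s : new String[] {"", "a", "Hello, MR-Scout"}) {
            boldTextIsNotNarrower(new SimpleText(s, false));
        }
        System.out.println("relation held on all inputs");
    }
}
```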
Figure 2: A test case crafted from com.conversantmedia.util.concurrent.ConcurrentStackTest in project Disruptor. Underlying MR: x = stack.push(x).pop(). IF an element x is pushed onto a stack and the stack subsequently pops off the top element, THEN the element x should be the one popped.
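A sketch of the Figure 2 relation as a parameterized Java method, assuming `java.util.ArrayDeque` as the stack under test in place of Disruptor's `ConcurrentStack`; the class and method names are hypothetical. The original test fixes one stack and one element; lifting both into parameters is what lets the relation act as an oracle for new inputs:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Objects;

public final class PushPopMR {
    /**
     * Codified MR: for any stack and element x, pushing x and then popping must
     * return x. Throws AssertionError when the relation is violated.
     */
    public static <T> void pushThenPopReturnsElement(Deque<T> stack, T x) {
        stack.push(x);
        T popped = stack.pop();
        if (!Objects.equals(popped, x)) {
            throw new AssertionError("MR violated: pushed " + x + " but popped " + popped);
        }
    }

    public static void main(String[] args) {
        Deque<Integer> stack = new ArrayDeque<>(List.of(7, 8, 9));
        for (int x : new int[] {3, -1, 42}) {
            pushThenPopReturnsElement(stack, x);
        }
        // Each push is undone by the matching pop, so prior contents are untouched.
        System.out.println("stack = " + stack); // prints "stack = [7, 8, 9]"
    }
}
```

The check throws `AssertionError` explicitly rather than using Java's `assert` keyword, which is disabled unless the JVM runs with `-ea`.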
Figure 4: Overview of MR-Scout. For the Figure 2 test case, the encoded MR reads: IF inputs ⟨x1, x2⟩ have the relation x2.receObj = push(x1) (Rᵢ), THEN the output relation pop(x2) = x1.arg (Rₒ) is expected to be satisfied. In this test case, x1.receObj and x1.arg are implemented with stack1 and 3, and the invocation push(x1) is implemented as stack1.push(3); similarly, x2.receObj and pop(x2) are implemented with stack2 and stack2.pop().
Figure 5: Procedure of MR-Scout operating on the MTC simulateWidth().
Figure 6: Illustration of constructing a codified MR.
Figure 7: Distribution of 11,350 MR-encoded test cases (MTCs) in 701 projects w.r.t. the number and percentage… The number of MTCs per project varies widely, from a single MTC to 500.
Figure 8: Distribution of 21,574 MR instances w.r.t. the size of involved method invocations (|MI|) and the existence of an input transformation (IT). (a) Size of involved MI: |MI| = 2 in 64.13%, |MI| > 2 in 35.87%. (b) Existence of IT when |MI| = 2: with IT 27.80%, without IT 72.20%.
Figure 10: Distribution of generated valid inputs. Result (§4.3.2): of the 71 codified MRs output by MR-Scout, 69 (97.18%) are high quality and applicable to all valid inputs. The other two are low quality: 16 of their 24 valid inputs result in AssertionError alarms, and manual analysis confirmed that both MRs are indeed of low quality.
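The result above implies a quality check that runs each codified MR on generated valid inputs and looks for AssertionError alarms. A minimal Java sketch of that idea, assuming the program under test is correct so that any alarm on a valid input counts against the MR; the zero-alarm keep/drop rule and all names are hypothetical, and MR-Scout's actual filtering criterion may differ. The last line reproduces the 69/71 arithmetic:

```java
import java.util.List;
import java.util.Locale;
import java.util.function.Consumer;

public final class MrQualityFilter {
    /** Counts the valid inputs on which a codified MR raises an AssertionError alarm. */
    public static <I> int countAlarms(Consumer<I> codifiedMr, List<I> validInputs) {
        int alarms = 0;
        for (I input : validInputs) {
            try {
                codifiedMr.accept(input);
            } catch (AssertionError alarm) {
                alarms++;
            }
        }
        return alarms;
    }

    /** Hypothetical rule: keep an MR only if no valid input triggers an alarm. */
    public static <I> boolean isHighQuality(Consumer<I> codifiedMr, List<I> validInputs) {
        return countAlarms(codifiedMr, validInputs) == 0;
    }

    public static void main(String[] args) {
        List<Integer> inputs = List.of(-3, 0, 5, 10);
        // Sound relation: abs(x) == abs(-x) holds for every int.
        Consumer<Integer> sound = x -> {
            if (Math.abs(x) != Math.abs(-x)) throw new AssertionError("abs symmetry violated");
        };
        // Over-specific relation: "doubling increases the value" only holds for positive x.
        Consumer<Integer> overSpecific = x -> {
            if (!(2 * x > x)) throw new AssertionError("doubling did not increase " + x);
        };
        System.out.println(isHighQuality(sound, inputs));      // true
        System.out.println(countAlarms(overSpecific, inputs)); // 2 (x = -3 and x = 0)
        // Share of high-quality MRs among the 71 evaluated.
        System.out.println(String.format(Locale.ROOT, "%.2f%%", 100.0 * 69 / 71)); // 97.18%
    }
}
```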
Figure 11: Enhancement of test adequacy by codified-MR-based test suites.
Figure 12: Comparison of covered and killed mutants by developer-written…
Figure 13: Comprehensibility scores of 52 MR-Scout synthesized MRs (score: 1 = very difficult, 2 = difficult, 3 = easy, 4 = very easy to understand). From §5.1 (Threats to Validity): the evaluations of precision (RQ1) and comprehensibility (RQ4) depend on human judgment.
Original abstract

Metamorphic Testing (MT) alleviates the oracle problem by defining oracles based on metamorphic relations (MRs), that govern multiple related inputs and their outputs. However, designing MRs is challenging, as it requires domain-specific knowledge. This hinders the widespread adoption of MT. We observe that developer-written test cases can embed domain knowledge that encodes MRs. Such encoded MRs could be synthesized for testing not only their original programs but also other programs that share similar functionalities. In this paper, we propose MR-Scout to automatically synthesize MRs from test cases in open-source software (OSS) projects. MR-Scout first discovers MR-encoded test cases (MTCs), and then synthesizes the encoded MRs into parameterized methods (called codified MRs), and filters out MRs that demonstrate poor quality for new test case generation. MR-Scout discovered over 11,000 MTCs from 701 OSS projects. Experimental results show that over 97% of codified MRs are of high quality for automated test case generation, demonstrating the practical applicability of MR-Scout. Furthermore, codified-MRs-based tests effectively enhance the test adequacy of programs with developer-written tests, leading to 13.52% and 9.42% increases in line coverage and mutation score, respectively. Our qualitative study shows that 55.76% to 76.92% of codified MRs are easily comprehensible for developers.
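The abstract's definition, an MR governing multiple related inputs and their outputs, can be sketched in Java as below. This simplified shape (one input transformation, one output relation over two executions) and every name in it are assumptions for illustration; the MRs MR-Scout mines can involve more than two method invocations. The sine identity sin(x) = sin(π − x) is a textbook MR, not one taken from the paper:

```java
import java.util.function.BiPredicate;
import java.util.function.Function;
import java.util.function.UnaryOperator;

/**
 * Simplified MR shape: an input transformation derives a follow-up input from a
 * source input, and an output relation must hold between the two outputs.
 */
public final class MetamorphicRelation<I, O> {
    private final UnaryOperator<I> inputTransformation;
    private final BiPredicate<O, O> outputRelation;

    public MetamorphicRelation(UnaryOperator<I> inputTransformation, BiPredicate<O, O> outputRelation) {
        this.inputTransformation = inputTransformation;
        this.outputRelation = outputRelation;
    }

    /** Acts as the oracle: no expected output is needed, only the relation between outputs. */
    public boolean holds(Function<I, O> program, I sourceInput) {
        I followUpInput = inputTransformation.apply(sourceInput);
        O sourceOutput = program.apply(sourceInput);
        O followUpOutput = program.apply(followUpInput);
        return outputRelation.test(sourceOutput, followUpOutput);
    }

    public static void main(String[] args) {
        // sin(x) = sin(pi - x), compared with a small floating-point tolerance.
        MetamorphicRelation<Double, Double> sineSymmetry =
                new MetamorphicRelation<>(x -> Math.PI - x, (a, b) -> Math.abs(a - b) < 1e-9);
        for (double x : new double[] {0.0, 0.3, 1.0, 2.5}) {
            System.out.println(x + " -> " + sineSymmetry.holds(Math::sin, x));
        }
    }
}
```

Because `holds` needs no expected value, the same relation can be checked on any number of generated source inputs, which is the property the abstract says alleviates the oracle problem.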

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper presents MR-Scout, a technique to mine metamorphic relations (MRs) encoded in existing developer-written test cases from 701 OSS projects (yielding >11,000 MR-encoded test cases). It codifies these into parameterized methods, applies a quality filter for new test generation, and reports that >97% of the resulting codified MRs are high-quality; tests derived from them improve line coverage by 13.52% and mutation score by 9.42% on programs that already have developer tests. A qualitative study finds 55.76–76.92% of the MRs comprehensible to developers.

Significance. If the transferability and quality claims hold under independent validation, the work offers a scalable, artifact-driven route to MR acquisition that could materially increase adoption of metamorphic testing. The scale of the mining study and the inclusion of a developer-comprehensibility assessment are concrete strengths that distinguish it from purely synthetic MR generators.

major comments (3)
  1. [Abstract and §5] Abstract and §5 (evaluation): the 97% 'high-quality' figure, the 13.52% coverage gain, and the 9.42% mutation-score gain are presented without an explicit, independent oracle or validity check that the synthesized MR actually holds on the target programs rather than merely producing additional passing tests; coverage and mutation metrics alone cannot distinguish a sound MR from one that silently accepts incorrect behavior on the new implementation.
  2. [§4.2] §4.2 (transfer step): the criterion used to decide that a target program 'shares similar functionality' with the source of an MTC is not formalized, so it is impossible to assess whether domain-specific constraints present in the original tests but absent from the target are being violated by the transferred MR.
  3. [§5.1] §5.1 (experimental design): the paper does not describe the baseline MR generators, the statistical tests applied to the coverage/mutation deltas, or the sampling procedure for the programs used in the transfer experiment; without these details the quantitative claims cannot be reproduced or compared.
minor comments (2)
  1. [§3] The definition of 'codified MR' (parameterized method) should be accompanied by a small illustrative example in §3 so readers can see the exact syntactic form that is later filtered and reused.
  2. [§5] Table or figure captions in the evaluation section should explicitly state the number of programs, number of MRs, and number of generated tests underlying each reported percentage.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. The comments highlight important aspects of evaluation validity, transfer criteria, and experimental reproducibility. We address each major comment below and indicate planned revisions.

Point-by-point responses
  1. Referee: [Abstract and §5] Abstract and §5 (evaluation): the 97% 'high-quality' figure, the 13.52% coverage gain, and the 9.42% mutation-score gain are presented without an explicit, independent oracle or validity check that the synthesized MR actually holds on the target programs rather than merely producing additional passing tests; coverage and mutation metrics alone cannot distinguish a sound MR from one that silently accepts incorrect behavior on the new implementation.

    Authors: We acknowledge the distinction between utility (measured via coverage/mutation gains) and semantic soundness of transferred MRs. Our quality filter verifies that codified MRs generate passing tests on source programs, and gains are observed on targets with similar functionality. However, we agree these metrics do not independently confirm the MR holds for the target. We will revise §5 to explicitly define the quality criteria, clarify that coverage/mutation serve as proxies for utility rather than soundness, and add a limitations discussion with suggestions for future oracle-based validation. revision: partial

  2. Referee: [§4.2] §4.2 (transfer step): the criterion used to decide that a target program 'shares similar functionality' with the source of an MTC is not formalized, so it is impossible to assess whether domain-specific constraints present in the original tests but absent from the target are being violated by the transferred MR.

    Authors: The transfer relies on a heuristic matching of method signatures (names and parameter types) between source and target. We agree this is not formally defined, which limits assessment of constraint preservation. We will revise §4.2 to formalize the similarity criterion, state its assumptions explicitly, and discuss potential risks regarding domain-specific constraints. revision: yes

  3. Referee: [§5.1] §5.1 (experimental design): the paper does not describe the baseline MR generators, the statistical tests applied to the coverage/mutation deltas, or the sampling procedure for the programs used in the transfer experiment; without these details the quantitative claims cannot be reproduced or compared.

    Authors: These details were inadvertently omitted. We will revise §5.1 to describe the baseline MR generators, the statistical tests used for the deltas, and the sampling procedure for the transfer experiment programs, enabling reproducibility and comparison. revision: yes
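The second response describes the transfer step as heuristic matching of method names and parameter types; that description comes from the simulated rebuttal and is not confirmed by the paper text on this page. A minimal Java sketch of such a heuristic using reflection (`Class.getMethods`, `Class.getMethod`); the class and method names are hypothetical:

```java
import java.lang.reflect.Method;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Stack;

public final class SignatureMatcher {
    /**
     * Hypothetical similarity check: for every method name the MR uses, some public
     * method of the source class with that name must have a counterpart in the
     * target class with the same name and parameter types.
     */
    public static boolean sharesSignatures(Class<?> source, Class<?> target, List<String> methodNames) {
        for (String name : methodNames) {
            boolean matched = false;
            for (Method m : source.getMethods()) {
                if (!m.getName().equals(name)) {
                    continue;
                }
                try {
                    target.getMethod(name, m.getParameterTypes());
                    matched = true;
                    break;
                } catch (NoSuchMethodException e) {
                    // This overload has no counterpart in the target; try the next one.
                }
            }
            if (!matched) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> used = List.of("push", "pop");
        System.out.println(sharesSignatures(ArrayDeque.class, Stack.class, used));     // true
        System.out.println(sharesSignatures(ArrayDeque.class, ArrayList.class, used)); // false
    }
}
```

`ArrayDeque` and `Stack` match, yet a FIFO class exposing `push`/`pop` would match too: a signature check says nothing about behavior, which is the gap the referee's second comment points at.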

Circularity Check

0 steps flagged

No significant circularity; empirical measurements are independent of the synthesis method

Full rationale

The paper describes an empirical pipeline that extracts MTCs from OSS test suites, codifies MRs, filters them by a quality check for new test generation, and then measures line coverage and mutation score improvements on target programs. These percentages (97% high-quality, +13.52% coverage, +9.42% mutation) are presented as direct experimental outcomes rather than quantities defined in terms of the MR-Scout algorithm itself. No equations, fitted parameters, or self-citation chains are invoked to derive the central results; the evaluation relies on external program executions and standard coverage/mutation tools. The transferability claim is an empirical observation, not a self-referential definition.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Abstract-only review yields limited visibility into modeling choices. The approach implicitly assumes that test cases contain extractable MRs and that a quality filter can be defined without circular dependence on the target programs. No explicit free parameters, new entities, or non-standard axioms are stated.

axioms (1)
  • domain assumption Developer-written test cases encode domain-specific metamorphic relations that can be extracted and reused across programs with similar functionality.
    Stated in the fourth and fifth sentences of the abstract as the foundational observation.

pith-pipeline@v0.9.0 · 5800 in / 1448 out tokens · 20927 ms · 2026-05-24T09:25:59.720362+00:00 · methodology

Reference graph

Works this paper leans on

61 extracted references · 61 canonical work pages

  1. [1]

    John Ahlgren, Maria Eugenia Berezin, Kinga Bojarczuk, Elena Dulskyte, Inna Dvortsova, Johann George, Natalija Gucevska, Mark Harman, Maria Lomeli, Erik Meijer, Silvia Sapora, and Justin Spahr-Summers. 2021. Testing Web Enabled Simulation at Scale Using Metamorphic Testing. In 43rd IEEE/ACM International Conference on Software Engineering: Software Enginee...

  2. [2]

    Leonhard Applis, Annibale Panichella, and Arie van Deursen. 2021. Assessing Robustness of ML-Based Program Analysis Tools using Metamorphic Program Transformations. In 36th IEEE/ACM International Conference on Automated Software Engineering, ASE 2021, Melbourne, Australia, November 15-19, 2021. IEEE, 1377–1381. https://doi.org/10.1109/ASE51524.2021.9678706

  3. [3]

    Andrea Arcuri and Lionel C. Briand. 2014. A Hitchhiker’s guide to statistical tests for assessing randomized algorithms in software engineering. Softw. Test. Verification Reliab. 24, 3 (2014), 219–250. https://doi.org/10.1002/STVR.1486

  4. [4]

    Jon Ayerdi, Valerio Terragni, Aitor Arrieta, Paolo Tonella, Goiuria Sagardui, and Maite Arratibel. 2021. Generating metamorphic relations for cyber-physical systems with genetic programming: an industrial case study. In ESEC/FSE ’21: 29th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Athens, Gr...

  5. [5]

    Jon Ayerdi, Valerio Terragni, Aitor Arrieta, Paolo Tonella, Goiuria Sagardui, and Maite Arratibel. 2022. Evolutionary generation of metamorphic relations for cyber-physical systems. In GECCO ’22: Genetic and Evolutionary Computation Conference, Companion Volume, Boston, Massachusetts, USA, July 9 - 13, 2022 , Jonathan E. Fieldsend and Markus Wagner (Eds.)...

  6. [6]

    Arianna Blasi, Alessandra Gorla, Michael D. Ernst, Mauro Pezzè, and Antonio Carzaniga. 2021. MeMo: Automatically identifying metamorphic relations in Javadoc comments for test automation. J. Syst. Softw. 181 (2021), 111041. https://doi.org/10.1016/j.jss.2021.111041

  7. [7]

    Hudson Borges and Marco Túlio Valente. 2018. What’s in a GitHub Star? Understanding Repository Starring Practices in a Social Coding Platform. J. Syst. Softw. 146 (2018), 112–129. https://doi.org/10.1016/j.jss.2018.09.016

  8. [8]

    Cristian Cadar and Koushik Sen. 2013. Symbolic execution for software testing: three decades later. Commun. ACM 56, 2 (2013), 82–90. https://doi.org/10.1145/2408776.2408795

  9. [9]

    Jialun Cao, Meiziniu Li, Yeting Li, Ming Wen, Shing-Chi Cheung, and Haiming Chen. 2022. SemMT: A Semantic-Based Testing Approach for Machine Translation Systems. ACM Trans. Softw. Eng. Methodol. 31, 2 (2022), 34e:1–34e:36. https://doi.org/10.1145/3490488

  10. [10]

    T. Y. Chen, S. C. Cheung, and S. M. Yiu. 1998. Metamorphic Testing: A New Approach for Generating Next Test Cases. Technical Report HKUST-CS98-01, Department of Computer Science, The Hong Kong University of Science and Technology.

  11. [11]

    Tsong Yueh Chen, Fei-Ching Kuo, Huai Liu, Pak-Lok Poon, Dave Towey, T. H. Tse, and Zhi Quan Zhou. 2018. Metamorphic Testing: A Review of Challenges and Opportunities. ACM Comput. Surv. 51, 1 (2018), 4:1–4:27. https://doi.org/10.1145/3143561 ACM Trans. Softw. Eng. Methodol., Vol. 1, No. 1, Article 1. Publication date: March 2024. 1:26 Congying Xu, Valerio...

  12. [12]

    Tsong Yueh Chen, Pak-Lok Poon, and Xiaoyuan Xie. 2016. METRIC: METamorphic Relation Identification based on the Category-choice framework. J. Syst. Softw. 116 (2016), 177–190. https://doi.org/10.1016/j.jss.2015.07.037

  13. [13]

    Pedro Delgado-Pérez, Aurora Ramírez, Kevin J. Valle-Gómez, Inmaculada Medina-Bulo, and José Raúl Romero. 2023. InterEvo-TR: Interactive Evolutionary Test Generation With Readability Assessment. IEEE Trans. Software Eng. 49, 4 (2023), 2580–2596. https://doi.org/10.1109/TSE.2022.3227418

  14. [14]

    Alastair F. Donaldson. 2019. Metamorphic testing of Android graphics drivers. In Proceedings of the 4th International Workshop on Metamorphic Testing, MET@ICSE 2019, Montreal, QC, Canada, May 26, 2019 , Xiaoyuan Xie, Pak-Lok Poon, and Laura L. Pullum (Eds.). IEEE / ACM, 1. https://doi.org/10.1109/MET.2019.00008

  15. [15]

    Alastair F. Donaldson and Andrei Lascu. 2016. Metamorphic testing for (graphics) compilers. In Proceedings of the 1st International Workshop on Metamorphic Testing, MET@ICSE 2016, Austin, Texas, USA, May 16, 2016 . ACM, 44–47. https://doi.org/10.1145/2896971.2896978

  16. [16]

    EvoSuite. 2023. EvoSuite. Retrieved August 20, 2023 from https://www.evosuite.org/

  17. [17]

    Gordon Fraser and Andrea Arcuri. 2011. EvoSuite: automatic test suite generation for object-oriented software. In SIGSOFT/FSE’11 19th ACM SIGSOFT Symposium on the Foundations of Software Engineering (FSE-19) and ESEC’11: 13th European Software Engineering Conference (ESEC-13), Szeged, Hungary, September 5-9, 2011 , Tibor Gyimóthy and Andreas Zeller (Eds.)...

  18. [18]

    Gordon Fraser and Andrea Arcuri. 2013. EvoSuite: On the Challenges of Test Case Generation in the Real World. In Sixth IEEE International Conference on Software Testing, Verification and Validation, ICST 2013, Luxembourg, Luxembourg, March 18-22, 2013. IEEE Computer Society, 362–369. https://doi.org/10.1109/ICST.2013.51

  19. [19]

    Gordon Fraser and Andrea Arcuri. 2013. Whole Test Suite Generation. IEEE Trans. Software Eng. 39, 2 (2013), 276–291. https://doi.org/10.1109/TSE.2012.14

  20. [20]

    Gordon Fraser and Andreas Zeller. 2011. Generating parameterized unit tests. In Proceedings of the 20th International Symposium on Software Testing and Analysis, ISSTA 2011, Toronto, ON, Canada, July 17-21, 2011 , Matthew B. Dwyer and Frank Tip (Eds.). ACM, 364–374. https://doi.org/10.1145/2001420.2001464

  21. [21]

    Alessio Gambi, Gunel Jahangirova, Vincenzo Riccio, and Fiorella Zampetti. 2022. SBST Tool Competition 2022. In 15th IEEE/ACM International Workshop on Search-Based Software Testing, SBST@ICSE 2022, Pittsburgh, PA, USA, May 9, 2022 . IEEE, 25–32. https://doi.org/10.1145/3526072.3527538

  22. [22]

    GitHub. 2023. GitHub. Retrieved August 20, 2023 from https://github.com/

  23. [23]

    Grammarly. 2023. Grammarly. Retrieved August 20, 2023 from http://grammarly.com

  24. [24]

    Mark Harman, Yue Jia, and Yuanyuan Zhang. 2015. Achievements, Open Problems and Challenges for Search Based Software Testing. In 8th IEEE International Conference on Software Testing, Verification and Validation, ICST 2015, Graz, Austria, April 13-17, 2015. IEEE Computer Society, 1–12. https://doi.org/10.1109/ICST.2015.7102580

  25. [25]

    N. Alan Heckert, James J. Filliben, C. M. Croarkin, B. Hembree, William F. Guthrie, P. Tobias, and J. Prinz. 2002. Handbook 151: NIST/SEMATECH e-Handbook of Statistical Methods.

  26. [26]

    Kaifeng Huang, Bihuan Chen, Congying Xu, Ying Wang, Bowen Shi, Xin Peng, Yijian Wu, and Yang Liu. 2022. Characterizing usages, updates and risks of third-party libraries in Java projects. Empir. Softw. Eng. 27, 4 (2022), 90. https://doi.org/10.1007/s10664-022-10131-8

  27. [27]

    Gunel Jahangirova, David Clark, Mark Harman, and Paolo Tonella. 2016. Test oracle assessment and improvement. In Proceedings of the 25th International Symposium on Software Testing and Analysis, ISSTA 2016, Saarbrücken, Germany, July 18-20, 2016, Andreas Zeller and Abhik Roychoudhury (Eds.). ACM, 247–258. https://doi.org/10.1145/2931037.2931062

  28. [28]

    JUnit. 2023. JUnit 4. Retrieved August 20, 2023 from https://junit.org/junit4/javadoc/4.13/org/junit/Assert.html

  29. [29]

    JUnit. 2023. JUnit 5. Retrieved August 20, 2023 from https://junit.org/junit5/

  30. [30]

    JUnit. 2023. JUnit 5 Assertions. Retrieved August 20, 2023 from https://junit.org/junit5/docs/5.0.3/api/org/junit/jupiter/api/Assertions.html

  31. [31]

    Alexander Kampmann and Andreas Zeller. 2019. Carving parameterized unit tests. In Proceedings of the 41st International Conference on Software Engineering: Companion Proceedings, ICSE 2019, Montreal, QC, Canada, May 25-31, 2019, Joanne M. Atlee, Tevfik Bultan, and Jon Whittle (Eds.). IEEE / ACM, 248–249. https://doi.org/10.1109/ICSE-COMPANION.2019.00098

  32. [32]

    Upulee Kanewala and James M. Bieman. 2013. Using machine learning techniques to detect metamorphic relations for programs without test oracles. In IEEE 24th International Symposium on Software Reliability Engineering, ISSRE 2013, Pasadena, CA, USA, November 4-7, 2013 . IEEE Computer Society, 1–10. https://doi.org/10.1109/ISSRE.2013.6698899

  33. [33]

    Upulee Kanewala, James M. Bieman, and Asa Ben-Hur. 2016. Predicting metamorphic relations for testing scientific software: a machine learning approach using graph kernels. Softw. Test. Verification Reliab. 26, 3 (2016), 245–269. https://doi.org/10.1002/stvr.1594

  34. [34]

    Yun Lin, You Sheng Ong, Jun Sun, Gordon Fraser, and Jin Song Dong. 2021. Graph-based seed object synthesis for search-based unit testing. In ESEC/FSE ’21: 29th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Athens, Greece, August 23-28, 2021, Diomidis Spinellis, Georgios Gousios, ...

  35. [35]

    Mikael Lindvall, Adam A. Porter, Gudjon Magnusson, and Christoph Schulze. 2017. Metamorphic Model-Based Testing of Autonomous Systems. In 2nd IEEE/ACM International Workshop on Metamorphic Testing, MET@ICSE 2017, Buenos Aires, Argentina, May 22, 2017 . IEEE Computer Society, 35–41. https://doi.org/10.1109/MET.2017.6

  36. [36]

    Haoyang Ma, Qingchao Shen, Yongqiang Tian, Junjie Chen, and Shing-Chi Cheung. 2023. Fuzzing Deep Learning Compilers with HirGen. 248–260. https://doi.org/10.1145/3597926.3598053

  37. [37]

    Pingchuan Ma, Shuai Wang, and Jin Liu. 2020. Metamorphic Testing and Certified Mitigation of Fairness Violations in NLP Models. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI 2020 , Christian Bessiere (Ed.). ijcai.org, 458–465. https://doi.org/10.24963/ijcai.2020/64

  38. [38]

    OpenAI. 2023. ChatGPT. Retrieved August 20, 2023 from https://openai.com/blog/chatgpt

  39. [39]

    Oracle. 2023. Java Language Specification. Retrieved August 20, 2023 from https://docs.oracle.com/javase/specs/

  40. [40]

    Carlos Pacheco and Michael D. Ernst. 2007. Randoop: feedback-directed random testing for Java. In Companion to the 22nd Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA 2007, October 21-25, 2007, Montreal, Quebec, Canada , Richard P. Gabriel, David F. Bacon, Cristina Videira Lopes, and Guy L. Steel...

  41. [41]

    Matteo Paltenghi and Michael Pradel. 2023. MorphQ: Metamorphic Testing of the Qiskit Quantum Computing Platform. In 45th IEEE/ACM International Conference on Software Engineering, ICSE 2023, Melbourne, Australia, May 14-20, 2023 . IEEE, 2413–2424. https://doi.org/10.1109/ICSE48619.2023.00202

  42. [42]

    PITest. 2023. PITest. Retrieved August 20, 2023 from https://pitest.org/

  43. [43]

    Kun Qiu, Zheng Zheng, Tsong Yueh Chen, and Pak-Lok Poon. 2022. Theoretical and Empirical Analyses of the Effectiveness of Metamorphic Relation Composition. IEEE Trans. Software Eng. 48, 3 (2022), 1001–1017. https://doi.org/10.1109/TSE.2020.3009698

  44. [44]

    John A Rice. 2006. Mathematical statistics and data analysis . Cengage Learning

  45. [45]

    Sergio Segura, Amador Durán, Javier Troya, and Antonio Ruiz Cortés. 2017. A Template-Based Approach to Describing Metamorphic Relations. In 2nd IEEE/ACM International Workshop on Metamorphic Testing, MET@ICSE 2017, Buenos Aires, Argentina, May 22, 2017 . IEEE Computer Society, 3–9. https://doi.org/10.1109/MET.2017.3

  46. [46]

    Sergio Segura, Gordon Fraser, Ana Belén Sánchez, and Antonio Ruiz Cortés. 2016. A Survey on Metamorphic Testing. IEEE Trans. Software Eng. 42, 9 (2016), 805–824. https://doi.org/10.1109/TSE.2016.2532875

  47. [47]

    Sergio Segura, José Antonio Parejo, Javier Troya, and Antonio Ruiz Cortés. 2018. Metamorphic testing of RESTful web APIs. In Proceedings of the 40th International Conference on Software Engineering, ICSE 2018, Gothenburg, Sweden, May 27 - June 03, 2018 , Michel Chaudron, Ivica Crnkovic, Marsha Chechik, and Mark Harman (Eds.). ACM, 882. https://doi.org/10....

  48. [48]

    Chang-Ai Sun, An Fu, Pak-Lok Poon, Xiaoyuan Xie, Huai Liu, and Tsong Yueh Chen. 2021. METRIC+: A Metamorphic Relation Identification Technique Based on Input Plus Output Domains. IEEE Trans. Software Eng. 47, 9 (2021), 1764–1785. https://doi.org/10.1109/TSE.2019.2934848

  49. [49]

    Chang-Ai Sun, Yiqiang Liu, Zuoyi Wang, and W. K. Chan. 2016. μMT: a data mutation directed metamorphic relation acquisition methodology. In Proceedings of the 1st International Workshop on Metamorphic Testing, MET@ICSE 2016, Austin, Texas, USA, May 16, 2016. ACM, 12–18. https://doi.org/10.1145/2896971.2896974

  50. [50]

    Valerio Terragni, Gunel Jahangirova, Paolo Tonella, and Mauro Pezzè. 2020. Evolutionary improvement of assertion oracles. In ESEC/FSE ’20: 28th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Virtual Event, USA, November 8-13, 2020 , Prem Devanbu, Myra B. Cohen, and Thomas Zimmermann (Eds.). ACM...

  51. [51]

    TestNG. 2023. TestNG. Retrieved August 20, 2023 from https://testng.org/doc/

  52. [52]

    Suresh Thummalapenta, Madhuri R. Marri, Tao Xie, Nikolai Tillmann, and Jonathan de Halleux. 2011. Retrofitting Unit Tests for Parameterized Unit Testing. In Fundamental Approaches to Software Engineering - 14th International Conference, FASE 2011, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2011, Saarbrücken, G...

  53. [53]

    Yongqiang Tian, Shiqing Ma, Ming Wen, Yepang Liu, Shing-Chi Cheung, and Xiangyu Zhang. 2021. To what extent do DNN-based image classification models make unreliable inferences? Empir. Softw. Eng. 26, 4 (2021), 84. https://doi.org/10.1007/s10664-021-09985-1

  54. [54]

    MR-Scout. 2023. MR-Scout. Retrieved August 20, 2023 from https://mr-scout.github.io

  55. [55]

    Shuai Wang and Zhendong Su. 2020. Metamorphic Object Insertion for Testing Object Detection Systems. (2020), 1053–1065. https://doi.org/10.1145/3324884.3416584

  56. [56]

    Ying Wang, Bihuan Chen, Kaifeng Huang, Bowen Shi, Congying Xu, Xin Peng, Yijian Wu, and Yang Liu. 2020. An Empirical Study of Usages, Updates and Risks of Third-Party Libraries in Java Projects. In IEEE International Conference on Software Maintenance and Evolution, ICSME 2020, Adelaide, Australia, September 28 - October 2, 2020. IEEE, 35–45. ...

  57. [57]

    Dongwei Xiao, Zhibo Liu, Yuanyuan Yuan, Qi Pang, and Shuai Wang. 2022. Metamorphic Testing of Deep Learning Compilers. Proc. ACM Meas. Anal. Comput. Syst. 6, 1 (2022), 15:1–15:28. https://doi.org/10.1145/3508035

  58. [58]

    Bo Zhang, Hongyu Zhang, Junjie Chen, Dan Hao, and Pablo Moscato. 2019. Automatic Discovery and Cleansing of Numerical Metamorphic Relations. In 2019 IEEE International Conference on Software Maintenance and Evolution, ICSME 2019, Cleveland, OH, USA, September 29 - October 4, 2019 . IEEE, 235–245. https://doi.org/10.1109/ICSME.2019.00035

  59. [59]

    Jie Zhang, Junjie Chen, Dan Hao, Yingfei Xiong, Bing Xie, Lu Zhang, and Hong Mei. 2014. Search-based inference of polynomial metamorphic relations. In ACM/IEEE International Conference on Automated Software Engineering, ASE ’14, Vasteras, Sweden - September 15 - 19, 2014 , Ivica Crnkovic, Marsha Chechik, and Paul Grünbacher (Eds.). ACM, 701–712. https://d...

  60. [60]

    Zhi Quan Zhou, Liqun Sun, Tsong Yueh Chen, and Dave Towey. 2020. Metamorphic Relations for Enhancing System Understanding and Use. IEEE Trans. Software Eng. 46, 10 (2020), 1120–1154. https://doi.org/10.1109/TSE.2018.2876433

  61. [61]

    Hengcheng Zhu, Lili Wei, Ming Wen, Yepang Liu, Shing-Chi Cheung, Qin Sheng, and Cui Zhou. 2020. MockSniffer: Characterizing and Recommending Mocking Decisions for Unit Tests. In 35th IEEE/ACM International Conference on Automated Software Engineering, ASE 2020, Melbourne, Australia, September 21-25, 2020. IEEE, 436–447. https://doi.org/10.1145/3324884.3...