Metamorphic Testing: A New Approach for Generating Next Test Cases

S.C. Cheung; S.M. Yiu; T.Y. Chen

arxiv: 2002.12543 · v1 · pith:WNQRMZNMnew · submitted 2020-02-28 · 💻 cs.SE

keywords: test, software, errors, cases, testing, selection, successful, been

In software testing, a set of test cases is constructed according to some predefined selection criteria. The software is then examined against these test cases. Three interesting observations have been made on the current artifacts of software testing. Firstly, an error-revealing test case is considered useful while a successful test case which does not reveal software errors is usually not further investigated. Whether these successful test cases still contain useful information for revealing software errors has not been properly studied. Secondly, no matter how extensively testing has been conducted in the development phase, errors may still exist in the software [5]. These errors, if left undetected, may eventually cause damage to the production system. The study of techniques for uncovering software errors in the production phase is seldom addressed in the literature. Thirdly, as indicated by Weyuker in [6], the availability of test oracles is pragmatically unattainable in most situations. However, the availability of test oracles is generally assumed in conventional software testing techniques. In this paper, we propose a novel test case selection technique that derives new test cases from the successful ones. The selection aims at revealing software errors that are possibly left undetected in successful test cases which may be generated using some existing strategies. As such, the proposed technique augments the effectiveness of existing test selection strategies. The technique also helps uncover software errors in the production phase and can be used in the absence of test oracles.
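
The abstract gives no concrete instance of deriving new test cases from successful ones, so the following is only an illustrative sketch, not the authors' own example. It assumes the program under test computes sine and uses two standard identities, sin(π − x) = sin(x) and sin(−x) = −sin(x), as the relations that link a source input to its follow-up inputs. The names `follow_ups`, `check`, and `buggy_sin` are hypothetical.

```python
import math

def follow_ups(x):
    """Derive follow-up test inputs from a source input x.

    Each entry is (relation name, follow-up input, function mapping the
    source output to the output the follow-up run is expected to produce).
    The relations are assumed identities of sine, chosen for illustration.
    """
    return [
        ("sin(pi - x) == sin(x)", math.pi - x, lambda out: out),
        ("sin(-x) == -sin(x)", -x, lambda out: -out),
    ]

def check(program, x, tol=1e-9):
    """Run the source test case and its follow-ups; return violated relations.

    No oracle is consulted: only outputs of the program itself are compared.
    """
    source_out = program(x)
    violations = []
    for name, follow_input, expected_from in follow_ups(x):
        if abs(program(follow_input) - expected_from(source_out)) > tol:
            violations.append(name)
    return violations

def buggy_sin(x):
    # Hypothetical faulty program: a truncated Taylor series that is
    # accurate near 0 but wrong for larger arguments.
    return x - x**3 / 6 + x**5 / 120

if __name__ == "__main__":
    x = 0.1
    # The source test case looks successful: the output is close to the true value.
    print(abs(buggy_sin(x) - math.sin(x)) < 1e-9)
    # The follow-up cases derived from it expose the fault without an oracle.
    print(check(buggy_sin, x))
    print(check(math.sin, x))
```

In this sketch the input 0.1 on its own would pass an oracle-based check, while the follow-up input π − 0.1 derived from it violates the first relation. The second relation holds for the faulty program as well, which illustrates that the choice of relation determines which errors can be revealed.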

Forward citations

Cited by 14 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

A Set-Theoretic Approach to Detecting Logic Bugs in DBMS Inner Join Optimizations
cs.DB 2026-06 unverdicted novelty 7.0

JoinEquiv uses set-theoretic metamorphic relations to generate equivalent queries and detects 29 previously unknown logic bugs in inner join optimizations across MySQL, TiDB, DuckDB, and Percona, with 27 confirmed.
Detecting and Understanding Vulnerabilities in Fully Homomorphic Encryption Frameworks
cs.CR 2026-06 unverdicted novelty 7.0

HERTA is the first metamorphic-testing tool for FHE frameworks that found 21 previously unknown bugs across three industry frameworks, some already fixed by developers.
Tensor Algebraic Property Skeletons: Amplifying Property-Based Testing for AI Compilers
cs.SE 2026-06 unverdicted novelty 7.0

Propilot instantiates 20 tensor-algebra property skeletons into 4,579 executable PBTs for TVM, cutting redundancy 49% and surfacing semantic and numerical errors.
Social Bias in LLM-Generated Code: Benchmark and Mitigation
cs.SE 2026-05 unverdicted novelty 7.0

LLMs show up to 60.58% social bias in generated code; a new Fairness Monitor Agent cuts bias by 65.1% and raises functional correctness from 75.80% to 83.97%.
Capture the Flags: Family-Based Evaluation of Agentic LLMs via Semantics-Preserving Transformations
cs.SE 2026-02 unverdicted novelty 7.0

Agentic LLMs remain robust to renaming and insertion but degrade on composed transformations and deeper obfuscation in CTF tasks, enabled by a new Evolve-CTF tool for generating equivalent challenge families.
A Metamorphic Testing Perspective on Knowledge Distillation for Language Models of Code: Does the Student Deeply Mimic the Teacher?
cs.SE 2025-11 unverdicted novelty 6.0

Student models distilled from code language models often fail to deeply mimic teachers, showing up to 62% behavioral discrepancies and 285% worse drops under attacks that accuracy metrics miss.
Constrained Co-evolutionary Metamorphic Differential Testing for Autonomous Systems with an Interpretability Approach
cs.SE 2025-09 unverdicted novelty 6.0

CoCoMagic applies constrained cooperative co-evolution to metamorphic and differential testing to find up to 287% more distinct behavioral divergences in an end-to-end ADS than baseline search methods.
Minimum Complete MR Subsets under Semantic-Mutation Fault Models: A Support-Set Domination Boundary
cs.SE 2026-06 unverdicted novelty 5.0

The paper establishes a support-set domination boundary governed by kill-signature heterogeneity that separates when class-level MR abstraction suffices from when mutant-level minimization is required, proving the Min...
Multi-Agent LLM-based Metamorphic Testing for REST APIs
cs.SE 2026-05 unverdicted novelty 5.0

ARMeta uses multi-agent LLMs to identify and execute metamorphic relations for REST API testing, showing complementary coverage to scenario-based baselines on two public applications.
Multi-Agent Specification-based Metamorphic Testing of FMU-Based Simulations
cs.SE 2026-05 unverdicted novelty 5.0

A multi-agent LLM workflow extracts Given-When-Then metamorphic relations from specifications to generate and run tests on FMU simulations, demonstrated on a lube oil cooling system FMU.
MetaRA: Metamorphic Robustness Assessment for Multimodal Large Language Model-based Visual Question Answering Systems
cs.CV 2026-05 unverdicted novelty 5.0

MetaRA applies metamorphic testing to VQA tasks and shows that MLLM models exhibit sensitivity to linguistic perturbations and superficial visual cues not detected by conventional accuracy benchmarks.
An Evaluation of Chat Safety Moderations in Roblox
cs.CY 2026-05 unverdicted novelty 5.0

Roblox's automated chat moderation fails to catch numerous unsafe messages involving grooming, sexualization of minors, bullying, violence, self-harm, and sensitive information sharing, with users evading detection th...
An Evaluation of Chat Safety Moderations in Roblox
cs.CY 2026-05 unverdicted novelty 5.0

Roblox's chat moderation system allows many unsafe messages involving grooming, sexualizing minors, bullying, harassment, violence, self-harm, and sharing sensitive information to go undetected, with users using evasi...
Assessing, Exploiting, and Mitigating Syntactic Robustness Failures in LLM-Based Code Generation
cs.SE 2024-04 unverdicted novelty 5.0

LLM code generation lacks syntactic robustness on math-formula prompts, but formula-reduction pre-processing raises it from 54.05% to 74.42%.