Mutation Testing with Hyperproperties
Pith reviewed 2026-05-24 20:25 UTC · model grok-4.3
The pith
Hyperproperties formalize when a test kills a mutant so that model checkers can generate the tests.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Mutant killing is expressed as hyperproperties that hold between traces of the original system and its mutant; the resulting formulas are universal over arbitrary reactive models and mutants, so an off-the-shelf hyperproperty model checker directly yields the test cases.
What carries the argument
Hyperproperties that relate executions of the system and its mutant to express the semantic condition for killing.
If this is right
- Test generation for mutation testing reduces to a single model-checking call.
- The same definitions and tool apply without modification to deterministic and non-deterministic models.
- No separate algorithm is needed for each modeling language once the model is expressed in a language accepted by the checker.
- Existing mutation tools can be wrapped by a hyperproperty checker to produce tests.
Where Pith is reading between the lines
- The approach could be combined with other hyperproperty specifications already used in verification, such as those for information flow.
- Scalability would be limited by the state-space blow-up typical of hyperproperty checkers when the number of traces grows.
- The method might extend naturally to mutants that alter timing or probabilistic behavior if the underlying hyperproperty logic supports those features.
Load-bearing premise
The hyperproperties exactly match the intended meanings of mutant killing for both deterministic and non-deterministic cases.
What would settle it
A concrete test produced by the hyperproperty checker that fails to kill its mutant, or a test known to kill a mutant that the checker does not produce.
Figures
read the original abstract
We present a new method for model-based mutation-driven test case generation. Mutants are generated by making small syntactical modifications to the model or source code of the system under test. A test case kills a mutant if the behavior of the mutant deviates from the original system when running the test. In this work, we use hyperproperties-which allow to express relations between multiple executions-to formalize different notions of killing for both deterministic as well as non-deterministic models. The resulting hyperproperties are universal in the sense that they apply to arbitrary reactive models and mutants. Moreover, an off-the-shelf model checking tool for hyperproperties can be used to generate test cases. We evaluate our approach on a number of models expressed in two different modeling languages by generating tests using a state-of-the-art mutation testing tool.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents a method for model-based mutation testing that formalizes notions of mutant killing (for both deterministic and non-deterministic reactive models) as hyperproperties. It claims these hyperproperties are universal (apply to arbitrary models and mutants) and that off-the-shelf hyperproperty model checkers can generate killing test cases. The approach is evaluated by generating tests on models expressed in two modeling languages using a state-of-the-art mutation testing tool.
Significance. If the hyperproperty encodings are shown to be faithful, the work supplies a semantic bridge between mutation testing and hyperproperty verification, allowing reuse of existing model checkers for test generation. This is a practical strength, as is the claimed universality and the concrete evaluation on multiple modeling languages.
major comments (2)
- [Formalization section] Formalization section (hyperproperty definitions for nondeterministic killing): the quantifier alternation over traces does not come with an explicit argument or lemma showing that it exactly captures all cases in which a nondeterministic mutant can still produce a matching execution under some resolution of nondeterminism. This is load-bearing for the universality claim and the reduction to model checking.
- [Evaluation section] Evaluation section: no quantitative data (e.g., number of mutants killed, test-case sizes, or comparison against baseline mutation tools) is supplied in the manuscript, so it is impossible to assess whether the hyperproperty-based generation is feasible or superior in practice.
minor comments (2)
- Notation for trace relations is introduced without a dedicated preliminary section; a small table summarizing the different killing hyperproperties would improve readability.
- [Abstract] The abstract states that the hyperproperties 'apply to arbitrary reactive models,' but the evaluation is limited to two specific languages; a sentence clarifying the scope of the universality claim would help.
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major point below and will revise the manuscript to strengthen the formalization and evaluation sections.
read point-by-point responses
-
Referee: [Formalization section] Formalization section (hyperproperty definitions for nondeterministic killing): the quantifier alternation over traces does not come with an explicit argument or lemma showing that it exactly captures all cases in which a nondeterministic mutant can still produce a matching execution under some resolution of nondeterminism. This is load-bearing for the universality claim and the reduction to model checking.
Authors: We agree that an explicit lemma is needed to rigorously support the claim that the hyperproperty definitions exactly capture nondeterministic mutant killing. In the revised manuscript we will add a lemma (with proof sketch) establishing the equivalence between the quantified trace relation and the semantic definition of killing under all possible resolutions of nondeterminism. This will also reinforce the universality argument and the soundness of the reduction to hyperproperty model checking. revision: yes
-
Referee: [Evaluation section] Evaluation section: no quantitative data (e.g., number of mutants killed, test-case sizes, or comparison against baseline mutation tools) is supplied in the manuscript, so it is impossible to assess whether the hyperproperty-based generation is feasible or superior in practice.
Authors: The manuscript currently reports a qualitative evaluation across two modeling languages. To address the concern, the revised version will include quantitative results from the experiments (number of mutants killed, generated test-case sizes, and runtime) together with a comparison against the baseline mutation-testing tool used in the study. revision: yes
Circularity Check
No circularity: hyperproperty formalization builds on external model-checking tools
full rationale
The paper's core contribution is the definition of hyperproperties (in HyperLTL or similar) to encode mutant-killing relations for deterministic and nondeterministic models, followed by reduction to off-the-shelf model checkers for test generation. No equations, definitions, or claims in the provided abstract or description reduce by construction to fitted parameters, self-citations, or renamed inputs; the approach treats hyperproperties as an independent formalism applied to mutation testing. The derivation chain is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
https://git-service.ait.ac.at/sct-dse-public/ mutation-testing-with-hyperproperties
Mutation testing with hyperproperies benchmark models. https://git-service.ait.ac.at/sct-dse-public/ mutation-testing-with-hyperproperties . Uploaded: 2019-04-25
work page 2019
-
[2]
B. Aichernig, H. Brandl, E. J ¨obstl, W. Krenn, R. Schlick, and S. Tiran. MoMuT::UML model-based mutation testing for UML. In Software Testing, Verification and Validation (ICST), 2015 IEEE 8th International Conference on, ICST, pages 1–8, April 2015
work page 2015
-
[3]
Aichernig, Harald Brandl, Elisabeth J ¨obstl, Willibald Krenn, Rupert Schlick, and Stefan Tiran
Bernhard K. Aichernig, Harald Brandl, Elisabeth J ¨obstl, Willibald Krenn, Rupert Schlick, and Stefan Tiran. Killing strategies for model-based mutation testing. Softw. Test., Verif. Reliab., 25(8):716–748, 2015
work page 2015
-
[4]
Aichernig, Elisabeth J ¨obstl, and Stefan Tiran
Bernhard K. Aichernig, Elisabeth J ¨obstl, and Stefan Tiran. Model-based mutation testing via symbolic refinement checking. 2014
work page 2014
-
[5]
Using mutation to assess fault detection capability of model review
Paolo Arcaini, Angelo Gargantini, and Elvinia Riccobene. Using mutation to assess fault detection capability of model review. Softw. Test., Verif. Reliab., 25(5-7):629–652, 2015
work page 2015
-
[6]
Nuseen: A tool framework for the nusmv model checker
Paolo Arcaini, Angelo Gargantini, and Elvinia Riccobene. Nuseen: A tool framework for the nusmv model checker. In 2017 IEEE International Conference on Software Testing, Verification and Validation, ICST 2017, Tokyo, Japan, March 13-17, 2017, pages 476–483. IEEE Computer Society, 2017
work page 2017
-
[7]
Sound and quasi-complete detection of infeasible test requirements
S ´ebastien Bardin, Micka ¨el Delahaye, Robin David, Nikolai Kosmatov, Mike Papadakis, Yves Le Traon, and Jean-Yves Marion. Sound and quasi-complete detection of infeasible test requirements. In 8th IEEE International Conference on Software Testing, Verification and Validation, ICST 2015, Graz, Austria, April 13-17, 2015, pages 1–10, 2015
work page 2015
-
[8]
Efficient leveraging of sym- bolic execution to advanced coverage criteria
S ´ebastien Bardin, Nikolai Kosmatov, and Franc ¸ois Cheynier. Efficient leveraging of sym- bolic execution to advanced coverage criteria. In Seventh IEEE International Conference on Software Testing, Verification and Validation, ICST 2014, March 31 2014-April 4, 2014, Cleveland, Ohio, USA, pages 173–182, 2014
work page 2014
-
[9]
Armin Biere, Keijo Heljanko, and Siert Wieringa. AIGER 1.9 and beyond, 2011. Available at fmv.jku.at/hwmcc11/beyond1.pdf
work page 2011
-
[10]
Sergiy Boroday, Alexandre Petrenko, and Roland Groz. Can a model checker generate tests for non-deterministic systems? Electronic Notes in Theoretical Computer Science, 190(2):3– 19, 2007
work page 2007
-
[11]
Timothy A Budd, Richard J Lipton, Richard A DeMillo, and Frederick G Sayward. Mutation analysis. Technical report, DTIC Document, 1979
work page 1979
-
[12]
Vl2mv: A compiler from verilog to blif-mv
Szu-Tsung Cheng, Gary York, and Robert K Brayton. Vl2mv: A compiler from verilog to blif-mv. HSIS Distribution, 1993
work page 1993
-
[13]
Clarkson, Bernd Finkbeiner, Masoud Koleini, Kristopher K
Michael R. Clarkson, Bernd Finkbeiner, Masoud Koleini, Kristopher K. Micinski, Markus N. Rabe, and C ´esar S´anchez. Temporal Logics for Hyperproperties, pages 265–284. Springer Berlin Heidelberg, Berlin, Heidelberg, 2014
work page 2014
-
[14]
Michael R. Clarkson and Fred B. Schneider. Hyperproperties. Journal of Computer Security, 18(6):1157–1210, 2010
work page 2010
-
[15]
Model-based, mutation-driven test case generation via heuristic-guided branching search
Andreas Fellner, Willibald Krenn, Rupert Schlick, Thorsten Tarrach, and Georg Weis- senbacher. Model-based, mutation-driven test case generation via heuristic-guided branching search. In Jean-Pierre Talpin, Patricia Derler, and Klaus Schneider, editors,Formal Methods and Models for System Design (MEMOCODE), pages 56–66. ACM, 2017
work page 2017
-
[16]
Mghyper: Checking satisfiability of HyperLTL formulas beyond the∃∗∀∗ fragment
Bernd Finkbeiner, Christopher Hahn, and Tobias Hans. Mghyper: Checking satisfiability of HyperLTL formulas beyond the∃∗∀∗ fragment. In Shuvendu K. Lahiri and Chao Wang, ed- itors, Automated Technology for Verification and Analysis (ATVA), volume 11138 of Lecture Notes in Computer Science, pages 521–527. Springer, 2018
work page 2018
-
[17]
Bernd Finkbeiner, Markus N. Rabe, and C ´esar S ´anchez. Algorithms for model checking HyperLTL and HyperCTL∗. In Daniel Kroening and Corina S. P˘as˘areanu, editors, Computer Aided Verification (CAV), Lecture Notes in Computer Science, pages 30–48. Springer, 2015
work page 2015
-
[18]
Testing with model checkers: a survey
Gordon Fraser, Franz Wotawa, and Paul E Ammann. Testing with model checkers: a survey. Software Testing, Verification and Reliability, 19(3):215–261, 2009
work page 2009
-
[19]
Using model checking to generate tests from requirements specifications
Angelo Gargantini and Constance Heitmeyer. Using model checking to generate tests from requirements specifications. In ACM SIGSOFT Software Engineering Notes , volume 24, pages 146–162. Springer-Verlag, 1999
work page 1999
-
[20]
A temporal logic based theory of test coverage and generation
Hyoung Seok Hong, Insup Lee, Oleg Sokolsky, and Hasan Ural. A temporal logic based theory of test coverage and generation. InInternational Conference on Tools and Algorithms for the Construction and Analysis of Systems, pages 327–341. Springer, 2002
work page 2002
-
[21]
William E. Howden. Weak mutation testing and completeness of test sets. IEEE Trans. Software Eng., 8(4):371–379, 1982
work page 1982
-
[22]
Reducing concurrent analysis under a context bound to se- quential analysis
Akash Lal and Thomas Reps. Reducing concurrent analysis under a context bound to se- quential analysis. Formal Methods in System Design, 35(1):73–97, 2009
work page 2009
-
[23]
Generic and effective specification of structural test objectives
Micha ¨el Marcozzi, Micka¨el Delahaye, S´ebastien Bardin, Nikolai Kosmatov, and Virgile Pre- vosto. Generic and effective specification of structural test objectives. In 2017 IEEE In- ternational Conference on Software Testing, Verification and Validation, ICST 2017, Tokyo, Japan, March 13-17, 2017, pages 436–441, 2017
work page 2017
-
[24]
McMillan, Kenneth L. The SMV system. Technical Report CMU-CS-92-131, Carnegie Mellon University, 1992
work page 1992
-
[25]
A generalization of dijkstra’s calculus
Greg Nelson. A generalization of dijkstra’s calculus. ACM Transactions on Programming Languages and Systems (TOPLAS), 11(4):517–561, October 1989
work page 1989
-
[26]
A. Jefferson Offutt. Investigations of the software testing coupling effect. ACM Trans. Softw. Eng. Methodol., 1(1):5–20, 1992
work page 1992
-
[27]
Coverage based test-case generation using model checkers
Sanjai Rayadurgam and Mats Per Erik Heimdahl. Coverage based test-case generation using model checkers. In Engineering of Computer Based Systems (ECBS) , pages 83–91. IEEE, 2001
work page 2001
-
[28]
Test generation with inputs, outputs and repetitive quiescence
Jan Tretmans. Test generation with inputs, outputs and repetitive quiescence. Software - Concepts and Tools, 17(3):103–120, 1996
work page 1996
-
[29]
Test input generation with java pathfinder.ACM SIGSOFT Software Engineering Notes, 29(4):97–107, 2004
Willem Visser, Corina S Psreanu, and Sarfraz Khurshid. Test input generation with java pathfinder.ACM SIGSOFT Software Engineering Notes, 29(4):97–107, 2004
work page 2004
-
[30]
Mualloy: a mutation testing frame- work for alloy
Kaiyuan Wang, Allison Sullivan, and Sarfraz Khurshid. Mualloy: a mutation testing frame- work for alloy. In International Conference on Software Engineering: Companion (ICSE- Companion), pages 29–32. IEEE, 2018. A Appendix A.1 Verilog mutation operators Table 3: List of supported Verilog mutation operators (∗ marks bit-wise operations) Type Mutation Arith...
work page 2018
-
[31]
Π|=Sc(m) 2(¬mutπ) then p|I∪O∪X∈T (S)
-
[32]
Π|=Sc(m) 2(mutπ) then p|I∪O∪X ∈T (Sm)
-
[33]
Π|=Sc(m) 2(⋀ i∈API iπ↔ iπ′)) then p|I = q|I
-
[34]
The first two statements follow directly from the definition of conditional mu- tants
Π|=Sc(m) ♦(⋁ o∈APO ¬(oπ↔ oπ′)) then p|O̸= q|O Proof. The first two statements follow directly from the definition of conditional mu- tants. The latter two statements follow directly from the fact that API, APO uniquely characterize inputs and outputs. Proposition 6. Let the modelS with inputsI and outputsO be deterministic and the mutantSm be non-determinis...
-
[35]
2.T (Sc(m))|X+∪I∪O⊆T (D(Sc(m)))[1,∞]|X+∪I∪O
D(Sc(m)) is deterministic (up to mut). 2.T (Sc(m))|X+∪I∪O⊆T (D(Sc(m)))[1,∞]|X+∪I∪O
-
[36]
D(Sc(m))̸|= φ1(I,O) thenSm is equivalent. Proof. We show D(Sc(m)) is deterministic (up to mut). D(Sc(m)) has a unique (up to mut) initial state Xτ and initial output Oε, since we fix D(αc(m)) to be satisfiable by exactly this state and output. Assume that there are state X, successor X′ (both with respect to X∪{ mut}∪ {xτ}), inputs I (with respect toI∪{ nd}...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.