pith. sign in

arxiv: 1907.07368 · v1 · pith:DY2T3LMXnew · submitted 2019-07-17 · 💻 cs.LO · cs.SE

Mutation Testing with Hyperproperties

Pith reviewed 2026-05-24 20:25 UTC · model grok-4.3

classification 💻 cs.LO cs.SE
keywords mutation testinghyperpropertiesmodel checkingtest case generationreactive systemsnon-deterministic modelsmutant killing
0
0 comments X

The pith

Hyperproperties formalize when a test kills a mutant so that model checkers can generate the tests.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper defines hyperproperties that relate multiple executions to capture the condition under which a test distinguishes a mutant from the original model. These definitions cover deterministic and non-deterministic reactive systems and require no changes when the model or mutant changes. Because the hyperproperties are universal, any existing hyperproperty model checker can be used to produce the killing tests. A reader cares if this turns mutation testing into an instance of a standard verification problem rather than a collection of ad-hoc algorithms.

Core claim

Mutant killing is expressed as hyperproperties that hold between traces of the original system and its mutant; the resulting formulas are universal over arbitrary reactive models and mutants, so an off-the-shelf hyperproperty model checker directly yields the test cases.

What carries the argument

Hyperproperties that relate executions of the system and its mutant to express the semantic condition for killing.

If this is right

  • Test generation for mutation testing reduces to a single model-checking call.
  • The same definitions and tool apply without modification to deterministic and non-deterministic models.
  • No separate algorithm is needed for each modeling language once the model is expressed in a language accepted by the checker.
  • Existing mutation tools can be wrapped by a hyperproperty checker to produce tests.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The approach could be combined with other hyperproperty specifications already used in verification, such as those for information flow.
  • Scalability would be limited by the state-space blow-up typical of hyperproperty checkers when the number of traces grows.
  • The method might extend naturally to mutants that alter timing or probabilistic behavior if the underlying hyperproperty logic supports those features.

Load-bearing premise

The hyperproperties exactly match the intended meanings of mutant killing for both deterministic and non-deterministic cases.

What would settle it

A concrete test produced by the hyperproperty checker that fails to kill its mutant, or a test known to kill a mutant that the checker does not produce.

Figures

Figures reproduced from arXiv: 1907.07368 by Andreas Fellner, Georg Weissenbacher, Mitra Tabaei Befrouei.

Figure 1
Figure 1. Figure 1: Beverage machine running example conditions predicate), and δ is a formula over I ∪ O ∪ X ∪ X 0 (the transition relation predicate), where X 0 = {x 0 | x ∈ X } is a set of primed variables representing the successor states. An input I, output O, state X, and successor state X0 , respectively, is a mapping of I, O, X , and X 0 , respectively, to values in a fixed domain that includes the elements > and ⊥ (r… view at source ↗
Figure 2
Figure 2. Figure 2: Tool Pipeline of our Experiments – Non-deterministic assignments. Non-deterministic choice over a finite set of ele￾ments {x 0 1 , . . . x0 n}, as provided by SMV [24], can readily be converted into a case￾switch construct over nd. More generally, explicit non-deterministic assignments x := ? to state variables x [25] can be controlled by assigning the value of nd to x. – Non-deterministic schedulers. Non-… view at source ↗
read the original abstract

We present a new method for model-based mutation-driven test case generation. Mutants are generated by making small syntactical modifications to the model or source code of the system under test. A test case kills a mutant if the behavior of the mutant deviates from the original system when running the test. In this work, we use hyperproperties-which allow to express relations between multiple executions-to formalize different notions of killing for both deterministic as well as non-deterministic models. The resulting hyperproperties are universal in the sense that they apply to arbitrary reactive models and mutants. Moreover, an off-the-shelf model checking tool for hyperproperties can be used to generate test cases. We evaluate our approach on a number of models expressed in two different modeling languages by generating tests using a state-of-the-art mutation testing tool.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents a method for model-based mutation testing that formalizes notions of mutant killing (for both deterministic and non-deterministic reactive models) as hyperproperties. It claims these hyperproperties are universal (apply to arbitrary models and mutants) and that off-the-shelf hyperproperty model checkers can generate killing test cases. The approach is evaluated by generating tests on models expressed in two modeling languages using a state-of-the-art mutation testing tool.

Significance. If the hyperproperty encodings are shown to be faithful, the work supplies a semantic bridge between mutation testing and hyperproperty verification, allowing reuse of existing model checkers for test generation. This is a practical strength, as is the claimed universality and the concrete evaluation on multiple modeling languages.

major comments (2)
  1. [Formalization section] Formalization section (hyperproperty definitions for nondeterministic killing): the quantifier alternation over traces does not come with an explicit argument or lemma showing that it exactly captures all cases in which a nondeterministic mutant can still produce a matching execution under some resolution of nondeterminism. This is load-bearing for the universality claim and the reduction to model checking.
  2. [Evaluation section] Evaluation section: no quantitative data (e.g., number of mutants killed, test-case sizes, or comparison against baseline mutation tools) is supplied in the manuscript, so it is impossible to assess whether the hyperproperty-based generation is feasible or superior in practice.
minor comments (2)
  1. Notation for trace relations is introduced without a dedicated preliminary section; a small table summarizing the different killing hyperproperties would improve readability.
  2. [Abstract] The abstract states that the hyperproperties 'apply to arbitrary reactive models,' but the evaluation is limited to two specific languages; a sentence clarifying the scope of the universality claim would help.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and will revise the manuscript to strengthen the formalization and evaluation sections.

read point-by-point responses
  1. Referee: [Formalization section] Formalization section (hyperproperty definitions for nondeterministic killing): the quantifier alternation over traces does not come with an explicit argument or lemma showing that it exactly captures all cases in which a nondeterministic mutant can still produce a matching execution under some resolution of nondeterminism. This is load-bearing for the universality claim and the reduction to model checking.

    Authors: We agree that an explicit lemma is needed to rigorously support the claim that the hyperproperty definitions exactly capture nondeterministic mutant killing. In the revised manuscript we will add a lemma (with proof sketch) establishing the equivalence between the quantified trace relation and the semantic definition of killing under all possible resolutions of nondeterminism. This will also reinforce the universality argument and the soundness of the reduction to hyperproperty model checking. revision: yes

  2. Referee: [Evaluation section] Evaluation section: no quantitative data (e.g., number of mutants killed, test-case sizes, or comparison against baseline mutation tools) is supplied in the manuscript, so it is impossible to assess whether the hyperproperty-based generation is feasible or superior in practice.

    Authors: The manuscript currently reports a qualitative evaluation across two modeling languages. To address the concern, the revised version will include quantitative results from the experiments (number of mutants killed, generated test-case sizes, and runtime) together with a comparison against the baseline mutation-testing tool used in the study. revision: yes

Circularity Check

0 steps flagged

No circularity: hyperproperty formalization builds on external model-checking tools

full rationale

The paper's core contribution is the definition of hyperproperties (in HyperLTL or similar) to encode mutant-killing relations for deterministic and nondeterministic models, followed by reduction to off-the-shelf model checkers for test generation. No equations, definitions, or claims in the provided abstract or description reduce by construction to fitted parameters, self-citations, or renamed inputs; the approach treats hyperproperties as an independent formalism applied to mutation testing. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based on the abstract alone, no free parameters, axioms, or invented entities are specified. The approach relies on the prior existence of hyperproperties and hyperproperty model checkers from the literature.

pith-pipeline@v0.9.0 · 5660 in / 1170 out tokens · 43410 ms · 2026-05-24T20:25:31.583083+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

36 extracted references · 36 canonical work pages

  1. [1]

    https://git-service.ait.ac.at/sct-dse-public/ mutation-testing-with-hyperproperties

    Mutation testing with hyperproperies benchmark models. https://git-service.ait.ac.at/sct-dse-public/ mutation-testing-with-hyperproperties . Uploaded: 2019-04-25

  2. [2]

    Aichernig, H

    B. Aichernig, H. Brandl, E. J ¨obstl, W. Krenn, R. Schlick, and S. Tiran. MoMuT::UML model-based mutation testing for UML. In Software Testing, Verification and Validation (ICST), 2015 IEEE 8th International Conference on, ICST, pages 1–8, April 2015

  3. [3]

    Aichernig, Harald Brandl, Elisabeth J ¨obstl, Willibald Krenn, Rupert Schlick, and Stefan Tiran

    Bernhard K. Aichernig, Harald Brandl, Elisabeth J ¨obstl, Willibald Krenn, Rupert Schlick, and Stefan Tiran. Killing strategies for model-based mutation testing. Softw. Test., Verif. Reliab., 25(8):716–748, 2015

  4. [4]

    Aichernig, Elisabeth J ¨obstl, and Stefan Tiran

    Bernhard K. Aichernig, Elisabeth J ¨obstl, and Stefan Tiran. Model-based mutation testing via symbolic refinement checking. 2014

  5. [5]

    Using mutation to assess fault detection capability of model review

    Paolo Arcaini, Angelo Gargantini, and Elvinia Riccobene. Using mutation to assess fault detection capability of model review. Softw. Test., Verif. Reliab., 25(5-7):629–652, 2015

  6. [6]

    Nuseen: A tool framework for the nusmv model checker

    Paolo Arcaini, Angelo Gargantini, and Elvinia Riccobene. Nuseen: A tool framework for the nusmv model checker. In 2017 IEEE International Conference on Software Testing, Verification and Validation, ICST 2017, Tokyo, Japan, March 13-17, 2017, pages 476–483. IEEE Computer Society, 2017

  7. [7]

    Sound and quasi-complete detection of infeasible test requirements

    S ´ebastien Bardin, Micka ¨el Delahaye, Robin David, Nikolai Kosmatov, Mike Papadakis, Yves Le Traon, and Jean-Yves Marion. Sound and quasi-complete detection of infeasible test requirements. In 8th IEEE International Conference on Software Testing, Verification and Validation, ICST 2015, Graz, Austria, April 13-17, 2015, pages 1–10, 2015

  8. [8]

    Efficient leveraging of sym- bolic execution to advanced coverage criteria

    S ´ebastien Bardin, Nikolai Kosmatov, and Franc ¸ois Cheynier. Efficient leveraging of sym- bolic execution to advanced coverage criteria. In Seventh IEEE International Conference on Software Testing, Verification and Validation, ICST 2014, March 31 2014-April 4, 2014, Cleveland, Ohio, USA, pages 173–182, 2014

  9. [9]

    AIGER 1.9 and beyond, 2011

    Armin Biere, Keijo Heljanko, and Siert Wieringa. AIGER 1.9 and beyond, 2011. Available at fmv.jku.at/hwmcc11/beyond1.pdf

  10. [10]

    Can a model checker generate tests for non-deterministic systems? Electronic Notes in Theoretical Computer Science, 190(2):3– 19, 2007

    Sergiy Boroday, Alexandre Petrenko, and Roland Groz. Can a model checker generate tests for non-deterministic systems? Electronic Notes in Theoretical Computer Science, 190(2):3– 19, 2007

  11. [11]

    Mutation analysis

    Timothy A Budd, Richard J Lipton, Richard A DeMillo, and Frederick G Sayward. Mutation analysis. Technical report, DTIC Document, 1979

  12. [12]

    Vl2mv: A compiler from verilog to blif-mv

    Szu-Tsung Cheng, Gary York, and Robert K Brayton. Vl2mv: A compiler from verilog to blif-mv. HSIS Distribution, 1993

  13. [13]

    Clarkson, Bernd Finkbeiner, Masoud Koleini, Kristopher K

    Michael R. Clarkson, Bernd Finkbeiner, Masoud Koleini, Kristopher K. Micinski, Markus N. Rabe, and C ´esar S´anchez. Temporal Logics for Hyperproperties, pages 265–284. Springer Berlin Heidelberg, Berlin, Heidelberg, 2014

  14. [14]

    Clarkson and Fred B

    Michael R. Clarkson and Fred B. Schneider. Hyperproperties. Journal of Computer Security, 18(6):1157–1210, 2010

  15. [15]

    Model-based, mutation-driven test case generation via heuristic-guided branching search

    Andreas Fellner, Willibald Krenn, Rupert Schlick, Thorsten Tarrach, and Georg Weis- senbacher. Model-based, mutation-driven test case generation via heuristic-guided branching search. In Jean-Pierre Talpin, Patricia Derler, and Klaus Schneider, editors,Formal Methods and Models for System Design (MEMOCODE), pages 56–66. ACM, 2017

  16. [16]

    Mghyper: Checking satisfiability of HyperLTL formulas beyond the∃∗∀∗ fragment

    Bernd Finkbeiner, Christopher Hahn, and Tobias Hans. Mghyper: Checking satisfiability of HyperLTL formulas beyond the∃∗∀∗ fragment. In Shuvendu K. Lahiri and Chao Wang, ed- itors, Automated Technology for Verification and Analysis (ATVA), volume 11138 of Lecture Notes in Computer Science, pages 521–527. Springer, 2018

  17. [17]

    Rabe, and C ´esar S ´anchez

    Bernd Finkbeiner, Markus N. Rabe, and C ´esar S ´anchez. Algorithms for model checking HyperLTL and HyperCTL∗. In Daniel Kroening and Corina S. P˘as˘areanu, editors, Computer Aided Verification (CAV), Lecture Notes in Computer Science, pages 30–48. Springer, 2015

  18. [18]

    Testing with model checkers: a survey

    Gordon Fraser, Franz Wotawa, and Paul E Ammann. Testing with model checkers: a survey. Software Testing, Verification and Reliability, 19(3):215–261, 2009

  19. [19]

    Using model checking to generate tests from requirements specifications

    Angelo Gargantini and Constance Heitmeyer. Using model checking to generate tests from requirements specifications. In ACM SIGSOFT Software Engineering Notes , volume 24, pages 146–162. Springer-Verlag, 1999

  20. [20]

    A temporal logic based theory of test coverage and generation

    Hyoung Seok Hong, Insup Lee, Oleg Sokolsky, and Hasan Ural. A temporal logic based theory of test coverage and generation. InInternational Conference on Tools and Algorithms for the Construction and Analysis of Systems, pages 327–341. Springer, 2002

  21. [21]

    William E. Howden. Weak mutation testing and completeness of test sets. IEEE Trans. Software Eng., 8(4):371–379, 1982

  22. [22]

    Reducing concurrent analysis under a context bound to se- quential analysis

    Akash Lal and Thomas Reps. Reducing concurrent analysis under a context bound to se- quential analysis. Formal Methods in System Design, 35(1):73–97, 2009

  23. [23]

    Generic and effective specification of structural test objectives

    Micha ¨el Marcozzi, Micka¨el Delahaye, S´ebastien Bardin, Nikolai Kosmatov, and Virgile Pre- vosto. Generic and effective specification of structural test objectives. In 2017 IEEE In- ternational Conference on Software Testing, Verification and Validation, ICST 2017, Tokyo, Japan, March 13-17, 2017, pages 436–441, 2017

  24. [24]

    The SMV system

    McMillan, Kenneth L. The SMV system. Technical Report CMU-CS-92-131, Carnegie Mellon University, 1992

  25. [25]

    A generalization of dijkstra’s calculus

    Greg Nelson. A generalization of dijkstra’s calculus. ACM Transactions on Programming Languages and Systems (TOPLAS), 11(4):517–561, October 1989

  26. [26]

    Jefferson Offutt

    A. Jefferson Offutt. Investigations of the software testing coupling effect. ACM Trans. Softw. Eng. Methodol., 1(1):5–20, 1992

  27. [27]

    Coverage based test-case generation using model checkers

    Sanjai Rayadurgam and Mats Per Erik Heimdahl. Coverage based test-case generation using model checkers. In Engineering of Computer Based Systems (ECBS) , pages 83–91. IEEE, 2001

  28. [28]

    Test generation with inputs, outputs and repetitive quiescence

    Jan Tretmans. Test generation with inputs, outputs and repetitive quiescence. Software - Concepts and Tools, 17(3):103–120, 1996

  29. [29]

    Test input generation with java pathfinder.ACM SIGSOFT Software Engineering Notes, 29(4):97–107, 2004

    Willem Visser, Corina S Psreanu, and Sarfraz Khurshid. Test input generation with java pathfinder.ACM SIGSOFT Software Engineering Notes, 29(4):97–107, 2004

  30. [30]

    Mualloy: a mutation testing frame- work for alloy

    Kaiyuan Wang, Allison Sullivan, and Sarfraz Khurshid. Mualloy: a mutation testing frame- work for alloy. In International Conference on Software Engineering: Companion (ICSE- Companion), pages 29–32. IEEE, 2018. A Appendix A.1 Verilog mutation operators Table 3: List of supported Verilog mutation operators (∗ marks bit-wise operations) Type Mutation Arith...

  31. [31]

    Π|=Sc(m) 2(¬mutπ) then p|I∪O∪X∈T (S)

  32. [32]

    Π|=Sc(m) 2(mutπ) then p|I∪O∪X ∈T (Sm)

  33. [33]

    Π|=Sc(m) 2(⋀ i∈API iπ↔ iπ′)) then p|I = q|I

  34. [34]

    The first two statements follow directly from the definition of conditional mu- tants

    Π|=Sc(m) ♦(⋁ o∈APO ¬(oπ↔ oπ′)) then p|O̸= q|O Proof. The first two statements follow directly from the definition of conditional mu- tants. The latter two statements follow directly from the fact that API, APO uniquely characterize inputs and outputs. Proposition 6. Let the modelS with inputsI and outputsO be deterministic and the mutantSm be non-determinis...

  35. [35]

    2.T (Sc(m))|X+∪I∪O⊆T (D(Sc(m)))[1,∞]|X+∪I∪O

    D(Sc(m)) is deterministic (up to mut). 2.T (Sc(m))|X+∪I∪O⊆T (D(Sc(m)))[1,∞]|X+∪I∪O

  36. [36]

    D(Sc(m))̸|= φ1(I,O) thenSm is equivalent. Proof. We show D(Sc(m)) is deterministic (up to mut). D(Sc(m)) has a unique (up to mut) initial state Xτ and initial output Oε, since we fix D(αc(m)) to be satisfiable by exactly this state and output. Assume that there are state X, successor X′ (both with respect to X∪{ mut}∪ {xτ}), inputs I (with respect toI∪{ nd}...