
arxiv: 2607.00929 · v1 · pith:ZYUUYJF3new · submitted 2026-07-01 · 💻 cs.SE

Delta Debugging in the Absence of Test Oracles Through Metamorphic Testing

Pith reviewed 2026-07-02 08:28 UTC · model grok-4.3

classification 💻 cs.SE
keywords: delta debugging · metamorphic testing · test oracle problem · input minimization · property preservation · software testing · oracle-deficient programs

The pith

Delta debugging can minimize inputs for programs without test oracles by using metamorphic testing to check property preservation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Delta debugging shrinks a program input while keeping a target property such as a failure, but the process stops without an oracle that decides whether each candidate still shows the property. Many real programs are oracle-deficient because their outputs have no easy way to judge correctness. DDMT builds a replacement test function from metamorphic relations that compare related inputs and decide preservation without any oracle. The method plugs this function into the standard delta debugging loop. Tests across 66 subjects in both oracle-available and oracle-deficient settings show the new function often matches or beats the original on reduction size and number of queries run.
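The reduction loop that the test function plugs into can be sketched as follows. This is a simplified ddmin variant that only tries complements, not the paper's implementation; the function name `ddmin`, its signature, and the stand-in predicate `contains_3_and_7` are assumptions made for illustration. The point is that the loop only ever asks the test function a yes/no question, so any callable with that shape can be substituted.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def ddmin(data: Sequence[T], test: Callable[[List[T]], bool]) -> List[T]:
    """Shrink `data` while `test` keeps returning True (property preserved).

    Simplified variant: removes one chunk at a time and keeps the complement
    whenever the test function still accepts it.
    """
    current = list(data)
    if not test(current):
        raise ValueError("the starting input must exhibit the property")
    n = 2  # granularity: the input is split into roughly n chunks
    while len(current) >= 2:
        chunk = len(current) // n
        reduced = False
        start = 0
        while start < len(current):
            complement = current[:start] + current[start + chunk:]
            if test(complement):
                current = complement      # keep the smaller input
                n = max(n - 1, 2)         # coarsen granularity again
                reduced = True
                break
            start += chunk
        if not reduced:
            if n == len(current):         # already at single-element chunks
                break
            n = min(n * 2, len(current))  # refine granularity
    return current


# Stand-in test function (hypothetical): the "property" is that both 3 and 7
# are still present. Each call is recorded so queries can be counted.
calls: List[List[int]] = []


def contains_3_and_7(candidate: List[int]) -> bool:
    calls.append(list(candidate))
    return 3 in candidate and 7 in candidate


reduced = ddmin(list(range(10)), contains_3_and_7)
print(reduced, "after", len(calls), "queries")
```

Because `ddmin` treats `test` as an opaque callable, swapping an oracle-based predicate for a metamorphic one does not require touching the loop itself.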

Core claim

DDMT redesigns the test function inside delta debugging by establishing metamorphic relations that decide whether a candidate input still preserves the original property, then substitutes this oracle-independent function for the usual test function so the entire reduction procedure runs without access to any test oracle.

What carries the argument

The metamorphic testing-based test function that replaces the original test function and decides property preservation by checking relations between the original input and each reduced candidate.
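A minimal sketch of what such a test function might look like, under loudly stated assumptions: the faulty program `buggy_max`, the helper `make_mr_test`, and the reversal relation are hypothetical and do not come from the paper, and treating a violated relation as "property preserved" is one plausible reading of the method rather than its confirmed construction. The relation used is that reversing a list must not change its maximum, which can be checked without knowing the correct maximum.

```python
from typing import Callable, List, Optional


def buggy_max(xs: List[int]) -> Optional[int]:
    """Hypothetical faulty program: for inputs with two or more items it
    never inspects the last element."""
    if not xs:
        return None
    best = xs[0]
    for x in xs[1:-1]:  # fault: the slice stops before the last element
        if x > best:
            best = x
    return best


def make_mr_test(
    program: Callable[[List[int]], Optional[int]],
) -> Callable[[List[int]], bool]:
    """Build an oracle-free test function from the relation
    program(xs) == program(reversed(xs))."""

    def mr_test(candidate: List[int]) -> bool:
        source_output = program(candidate)
        follow_up_output = program(list(reversed(candidate)))
        # Assumed convention: a violated relation means the faulty
        # behaviour is still observable on this candidate.
        return source_output != follow_up_output

    return mr_test


mr_test = make_mr_test(buggy_max)
print(mr_test([3, 1, 4, 1, 5, 2, 6, 9]))  # relation violated on this input
print(mr_test([9, 1, 9]))                 # relation holds on this input
```

Note that `mr_test` never consults the expected maximum. It only compares two outputs of the same program, which is what makes it usable when no oracle is available.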

If this is right

  • DDMT applies delta debugging to oracle-deficient programs where output correctness cannot be checked directly.
  • Reduction effectiveness measured by final input size is often preserved or improved compared with standard delta debugging.
  • Number of test queries required during reduction is often preserved or improved.
  • Proper configuration choices yield performance gains over the compared delta debugging approaches.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Teams working on programs whose outputs resist direct checking could define domain-specific metamorphic relations and immediately gain automated input minimization.
  • The same substitution pattern might let other search-based debugging techniques operate without oracles.
  • If metamorphic relations are incomplete for some property, the reduced input may still contain irrelevant parts that a perfect oracle would have removed.

Load-bearing premise

Metamorphic relations can be written so the resulting test function correctly decides whether a reduced input still exhibits the target property.

What would settle it

A reduced input produced by DDMT that no longer exhibits the target property even though the metamorphic relation reports preservation, or a property-preserving input that the relation rejects.
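In an oracle-available setting, both kinds of counterexample can be searched for mechanically by comparing the metamorphic test function with a reference oracle over a pool of inputs. The sketch below is illustrative only: `audit`, `faulty_max`, `oracle_test`, and `mr_test` are hypothetical names, the faulty program is invented, and the target property is assumed to be "the program's output on this exact input is wrong".

```python
from itertools import product
from typing import Callable, Iterable, List, Optional, Sequence, Tuple

Predicate = Callable[[List[int]], bool]


def audit(
    mr_test: Predicate, oracle_test: Predicate, inputs: Iterable[Sequence[int]]
) -> Tuple[List[List[int]], List[List[int]]]:
    """Return (spurious, missed) disagreements between the two predicates."""
    spurious: List[List[int]] = []  # MR reports preservation, oracle disagrees
    missed: List[List[int]] = []    # oracle reports preservation, MR rejects
    for item in inputs:
        candidate = list(item)
        by_mr, by_oracle = mr_test(candidate), oracle_test(candidate)
        if by_mr and not by_oracle:
            spurious.append(candidate)
        elif by_oracle and not by_mr:
            missed.append(candidate)
    return spurious, missed


def faulty_max(xs: List[int]) -> Optional[int]:
    """Hypothetical faulty program: ignores the last element of longer inputs."""
    if not xs:
        return None
    return max(xs[:-1]) if len(xs) > 1 else xs[0]


def oracle_test(xs: List[int]) -> bool:
    # Reference oracle: the output on this exact input is wrong.
    return bool(xs) and faulty_max(xs) != max(xs)


def mr_test(xs: List[int]) -> bool:
    # Metamorphic relation: reversing the input must not change the output.
    return faulty_max(xs) != faulty_max(xs[::-1])


# Exhaustive pool of small inputs: all lists of length 0..3 over {0, 1, 2}.
small_inputs = [p for n in range(4) for p in product(range(3), repeat=n)]
spurious, missed = audit(mr_test, oracle_test, small_inputs)
print(len(spurious), "spurious,", len(missed), "missed")
```

On this toy program the relation rejects no input that the oracle accepts, but it does accept inputs such as `[1, 0]`, where the output on the input itself is correct and only the reversed follow-up exposes the fault. Whether that counts as a failure of the relation depends on how the target property is defined, which is the question the check above is meant to surface.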

Figures

Figures reproduced from arXiv: 2607.00929 by Mingyue Jiang, Tsong Yueh Chen, Yongqiang Tian.

Figure 1: The detailed procedure of applying ddmin to a faulty version of program printtokens. The test column reports the test outcome of individual inputs (the contents of which are highlighted in blue) in terms of passing or failing.
… size 3 after handling a sequence of 22 candidate inputs. The detailed minimization process of ddmin is shown in …
Figure 2: The procedure of DDMT for the motivating example.
Figure 3: Performance of DDMT with three different MRs on …
Figure 4: Detailed comparison results of DDMT and ddmin on the Siemens programs.
… (including the size, the number of queries, and the time cost) from ddmin and DDMT, respectively. A total of 1,577 comparisons were conducted, including 528 comparisons on ddmin and DDMT using the best MR, and 1,049 comparisons on ddmin and DDMT using the worst MR. We performed the Wilcoxon signed-rank test [29] to statistically compare …
Original abstract

Delta debugging provides an automatic way to minimize a program input while preserving a certain property. However, its effectiveness fundamentally relies on the availability of test oracles to determine whether a reduced input still preserves the specific property. Consequently, the oracle problem substantially limits the applicability of existing delta debugging techniques, particularly for oracle-deficient programs where output correctness cannot be directly determined. To address this problem, this paper proposes a novel approach, DDMT, to enhance the applicability of delta debugging, especially facilitating its application to oracle-deficient programs. Our key insight is to redesign an oracle-independent test function and incorporate it into the reduction procedure of delta debugging such that the property-preservation validation can be accomplished without requiring a test oracle. To this end, DDMT employs the technique of metamorphic testing, which is a property-based and oracle-independent testing method. It establishes a metamorphic testing-based test function, using it as a replacement for the original test function adopted by delta debugging. The experiments evaluate DDMT on 66 subjects across both oracle-available and oracle-deficient scenarios, with different delta debugging approaches. The results positively confirm that DDMT can enhance the applicability of delta debugging while often preserving or improving reduction effectiveness and query efficiency. Furthermore, compared to the relevant delta debugging approaches, DDMT is also able to achieve performance improvements with proper configurations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes DDMT, a technique that substitutes a metamorphic-testing-based test function for the oracle-dependent test predicate in delta debugging. This allows input minimization to proceed in oracle-deficient settings by using user-defined metamorphic relations to check property preservation. Experiments on 66 subjects across oracle-available and oracle-deficient scenarios with multiple delta-debugging variants are reported to show that DDMT extends applicability while often preserving or improving reduction effectiveness and query efficiency.

Significance. If the metamorphic relations can be shown to faithfully proxy the target properties, the work would meaningfully broaden delta debugging to the large class of programs where oracles are unavailable, addressing a well-known practical barrier. The approach is conceptually direct and the experimental scale (66 subjects) is reasonable, but the absence of any demonstration that the chosen MRs correctly encode the intended properties limits the strength of the claimed improvements.

major comments (2)
  1. [Abstract] Abstract: the central claim that DDMT 'establishes a metamorphic testing-based test function' as a replacement rests on the unexamined premise that suitable metamorphic relations exist and correctly decide property preservation for arbitrary oracle-deficient properties; no argument, construction procedure, or validation that the MR returns true exactly when the reduced input still satisfies the original property is supplied.
  2. [Abstract] Abstract (experimental claim): the statement that DDMT 'often preserv[es] or improv[es] reduction effectiveness and query efficiency' on 66 subjects is presented without any metrics, statistical tests, or discussion of confounds, so the quantitative support for the central claim cannot be evaluated.
minor comments (1)
  1. The abstract would be clearer if it briefly indicated how the metamorphic relations are constructed or selected for the evaluated subjects.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major comment below.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the central claim that DDMT 'establishes a metamorphic testing-based test function' as a replacement rests on the unexamined premise that suitable metamorphic relations exist and correctly decide property preservation for arbitrary oracle-deficient properties; no argument, construction procedure, or validation that the MR returns true exactly when the reduced input still satisfies the original property is supplied.

    Authors: DDMT is designed as a framework that integrates user-defined metamorphic relations (MRs) into delta debugging, following the standard practice in metamorphic testing where MRs are constructed from domain knowledge of the target property. The paper's contribution is the redesign of the test function and its embedding in the reduction algorithm rather than a general procedure for constructing MRs for arbitrary properties (which would be infeasible without property-specific insight). We acknowledge that the abstract does not sufficiently qualify this assumption. We will revise the abstract to state explicitly that DDMT's correctness depends on the fidelity of the supplied MRs and that the work assumes users provide appropriate relations based on their understanding of the property.
    Revision: yes

  2. Referee: [Abstract] Abstract (experimental claim): the statement that DDMT 'often preserv[es] or improv[es] reduction effectiveness and query efficiency' on 66 subjects is presented without any metrics, statistical tests, or discussion of confounds, so the quantitative support for the central claim cannot be evaluated.

    Authors: The abstract is intended as a concise summary; the full experimental section reports concrete metrics (reduction ratios, query counts), applies statistical tests such as the Wilcoxon signed-rank test for significance, and discusses potential confounds including subject diversity and MR selection. To address the concern directly, we will revise the abstract to include representative quantitative results and a brief reference to the statistical analysis performed.
    Revision: yes

Circularity Check

0 steps flagged

No circularity; direct methodological substitution evaluated empirically.

Full rationale

The paper proposes DDMT by replacing delta debugging's oracle-dependent test function with a metamorphic-testing-based predicate. No equations, fitted parameters, self-citations, or uniqueness theorems appear in the provided text. The central claim rests on the empirical results across 66 subjects rather than reducing by construction to its own definitions or inputs. This is a standard non-circular presentation of a technique substitution.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the assumption that metamorphic relations can be established for the target property and that they will serve as a faithful substitute for an oracle during reduction; no free parameters or invented entities are mentioned.

axioms (1)
  • Domain assumption: metamorphic relations can be defined to capture the property-preservation decision without an oracle.
    This premise is required for the metamorphic test function to replace the original oracle-based test function inside the delta-debugging procedure.

pith-pipeline@v0.9.1-grok · 5764 in / 1167 out tokens · 24116 ms · 2026-07-02T08:28:44.505241+00:00 · methodology


Reference graph

Works this paper leans on

44 extracted references

  1. A. Zeller and R. Hildebrandt, “Simplifying and isolating failure-inducing input,” IEEE Transactions on Software Engineering, vol. 28, no. 2, pp. 183–200, 2002.
  2. A. Zeller, Why programs fail: a guide to systematic debugging. Morgan Kaufmann, 2009.
  3. A. Zeller and R. Hildebrandt, “Simplifying and isolating failure-inducing input: A retrospective on delta debugging,” IEEE Transactions on Software Engineering, vol. 51, no. 3, pp. 820–824, 2025.
  4. A. Zeller, “Isolating cause-effect chains from computer programs,” ACM SIGSOFT Software Engineering Notes, vol. 27, no. 6, pp. 1–10, 2002.
  5. M. Burger and A. Zeller, “Minimizing reproduction of software failures,” in Proceedings of the 2011 International Symposium on Software Testing and Analysis, 2011, pp. 221–231.
  6. L. Clapp, O. Bastani, S. Anand, and A. Aiken, “Minimizing GUI event traces,” in Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, ser. FSE 2016, 2016, pp. 422–434.
  7. M. Hammoudi, B. Burg, G. Bae, and G. Rothermel, “On the use of delta debugging to reduce recordings and facilitate debugging of web applications,” in Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, ser. ESEC/FSE 2015, 2015, pp. 333–344.
  8. X. Song, Y. Wu, S. Liu, B. Chen, Y. Lin, and X. Peng, “C2d2: Extracting critical changes for real-world bugs with dependency-sensitive delta debugging,” in Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, 2024, pp. 300–312.
  9. X. Zhou, X. Peng, T. Xie, J. Sun, W. Li, C. Ji, and D. Ding, “Delta debugging microservice systems,” in Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, ser. ASE. New York, NY, USA: Association for Computing Machinery, 2018, pp. 802–807.
  10. H.-N. Zhu, M. N. Mansur, M. Schäf, Z. Chen, T. Lepoint, and W. Visser, “Delta debugging for llm-integrated systems,” in Proceedings of the IEEE/ACM 48th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP), 2026, pp. 70–80.
  11. G. Misherghi and Z. Su, “HDD: Hierarchical delta debugging,” in Proceedings of the 28th International Conference on Software Engineering, 2006, pp. 142–151.
  12. S. Herfert, J. Patra, and M. Pradel, “Automatically reducing tree-structured test inputs,” in Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering, 2017, pp. 861–871.
  13. C. Sun, Y. Li, Q. Zhang, T. Gu, and Z. Su, “Perses: Syntax-guided program reduction,” in Proceedings of the 40th International Conference on Software Engineering, 2018, pp. 361–371.
  14. G. Wang, R. Shen, J. Chen, Y. Xiong, and L. Zhang, “Probabilistic delta debugging,” in Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2021, pp. 881–892.
  15. X. Zhou, Z. Xu, M. Zhang, Y. Tian, and C. Sun, “Wdd: Weighted delta debugging,” in 2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE), 2025, pp. 1592–1603.
  16. L. Ren, X. Zhang, Z. Hua, Y. Jiang, X. He, Y. Xiong, and T. Xie, “Validity-preserving delta debugging via generator trace reduction,” ACM Transactions on Software Engineering and Methodology, vol. 34, no. 3, pp. 1–33, 2025.
  17. R. Hodován, Á. Kiss, and T. Gyimóthy, “Coarse hierarchical delta debugging,” in 2017 IEEE International Conference on Software Maintenance and Evolution (ICSME), Sep. 2017, pp. 194–203.
  18. A. Groce, M. A. Alipour, C. Zhang, Y. Chen, and J. Regehr, “Cause reduction: Delta debugging, even without bugs,” Software Testing, Verification and Reliability, vol. 26, no. 1, pp. 40–68, Jan. 2016.
  19. A. Christi, M. L. Olson, M. A. Alipour, and A. Groce, “Reduce before you localize: Delta-debugging and spectrum-based fault localization,” in 2018 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW), Oct. 2018, pp. 184–191.
  20. E. T. Barr, M. Harman, P. McMinn, M. Shahbaz, and S. Yoo, “The oracle problem in software testing: A survey,” IEEE Transactions on Software Engineering, vol. 41, no. 5, pp. 507–525, 2015.
  21. T. Y. Chen, S. C. Cheung, and S. M. Yiu, “Metamorphic testing: A new approach for generating next test cases,” Department of Computer Science, Hong Kong University of Science and Technology, Hong Kong, Tech. Rep. HKUST-CS98-01, 1998.
  22. S. Segura, G. Fraser, A. B. Sanchez, and A. Ruiz-Cortés, “A survey on metamorphic testing,” IEEE Transactions on Software Engineering, vol. 42, no. 9, pp. 805–824, 2016.
  23. T. Y. Chen, F.-C. Kuo, H. Liu, P.-L. Poon, D. Towey, T. H. Tse, and Z. Q. Zhou, “Metamorphic testing: A review of challenges and opportunities,” ACM Computing Surveys, vol. 51, no. 1, pp. 4:1–4:27, Jan. 2018.
  24. X. Xie, W. E. Wong, T. Y. Chen, and B. W. Xu, “Metamorphic slice: An application in spectrum-based fault localization,” Information and Software Technology, vol. 55, no. 5, pp. 866–879, 2013.
  25. T. Y. Chen, T. H. Tse, and Z. Q. Zhou, “Semi-proving: An integrated method for program proving, testing and debugging,” IEEE Transactions on Software Engineering, vol. 37, no. 1, pp. 109–125, 2011.
  26. M. Jiang, T. Y. Chen, Z. Q. Zhou, and Z. Ding, “Input test suites for program repair: A novel construction method based on metamorphic relations,” IEEE Transactions on Reliability, vol. 70, no. 1, pp. 285–303, 2021.
  27. M. Zhang, Z. Xu, Y. Tian, X. Cheng, and C. Sun, “Toward a better understanding of probabilistic delta debugging,” in 2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE), 2025, pp. 2024–2035.
  28. V. Le, M. Afshari, and Z. Su, “Compiler validation via equivalence modulo inputs,” ACM Sigplan Notices, vol. 49, no. 6, pp. 216–226, 2014.
  29. M. D. Riina, C. Stambaugh, N. Stambaugh, and K. E. Huber, “Continuous variable analyses: T-test, mann–whitney, wilcoxin rank,” in Translational radiation oncology. Elsevier, 2023, pp. 153–163.
  30. H. Liu, F.-C. Kuo, D. Towey, and T. Y. Chen, “How effectively does metamorphic testing alleviate the oracle problem?” IEEE Transactions on Software Engineering, vol. 40, no. 1, pp. 4–22, 2013.
  31. R. Li, H. Liu, P.-L. Poon, D. Towey, C.-A. Sun, Z. Zheng, Z. Q. Zhou, and T. Y. Chen, “Metamorphic relation generation: State of the art and research directions,” ACM Transactions on Software Engineering and Methodology, vol. 34, no. 5, pp. 1–25, 2025.
  32. D. Xiao, Z. Liu, Y. Yuan, Q. Pang, and S. Wang, “Metamorphic testing of deep learning compilers,” Proceedings of the ACM on Measurement and Analysis of Computing Systems, vol. 6, no. 1, pp. 1–28, 2022.
  33. Z. Peng, U. Kanewala, and N. Niu, “Contextual understanding and improvement of metamorphic testing in scientific software development,” in Proceedings of the 15th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), 2021, pp. 1–6.
  34. A. Dwarakanath, M. Ahuja, S. Sikand, R. M. Rao, R. J. C. Bose, N. Dubash, and S. Podder, “Identifying implementation bugs in machine learning based image classifiers using metamorphic testing,” in Proceedings of the 27th ACM SIGSOFT international symposium on software testing and analysis, 2018, pp. 118–128.
  35. N. B. Chaleshtari, F. Pastore, A. Goknil, and L. C. Briand, “Metamorphic testing for web system security,” IEEE Transactions on Software Engineering, vol. 49, no. 6, pp. 3430–3471, 2023.
  36. L. Lin, Q. Zhu, H. Chen, Z. Wang, R. Wu, and X. Xie, “Qtran: Extending metamorphic-oracle based logical bug detection techniques for multiple-dbms dialect support,” Proceedings of the ACM on Software Engineering, vol. 2, no. ISSTA, pp. 731–752, 2025.
  37. R. Hodován and Á. Kiss, “Modernizing hierarchical delta debugging,” in Proceedings of the 7th International Workshop on Automating Test Case Design, Selection, and Evaluation, 2016, pp. 31–37.
  38. Á. Kiss, R. Hodován, and T. Gyimóthy, “Hddr: a recursive variant of the hierarchical delta debugging algorithm,” in Proceedings of the 9th ACM SIGSOFT International Workshop on Automating TEST Case Design, Selection, and Evaluation, 2018, pp. 16–22.
  39. M. Zhang, Y. Tian, Z. Xu, Y. Dong, S. H. Tan, and C. Sun, “LPR: Large language models-aided program reduction,” in Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, 2024, pp. 261–273.
  40. T. Y. Chen, T. H. Tse, and Z. Q. Zhou, “Fault-based testing in the absence of an oracle,” in Proceedings of the 25th Annual International Computer Software and Applications Conference (COMPSAC’01), 2001, pp. 172–178.
  41. M. Jiang, T. Y. Chen, F.-C. Kuo, D. Towey, and Z. Ding, “A metamorphic testing approach for supporting program repair without the need for a test oracle,” Journal of Systems and Software, vol. 126, pp. 127–140, 2017.
  42. X. Niu, Y. Sun, H. Wu, G. Li, C. Nie, L. Yu, and X. Wang, “Enhance combinatorial testing with metamorphic relations,” IEEE Transactions on Software Engineering, vol. 48, no. 12, pp. 5007–5029, 2021.
  43. D. Sobania, M. Briesch, P. Röchner, and F. Rothlauf, “Mtgp: Combining metamorphic testing and genetic programming,” in European Conference on Genetic Programming (Part of EvoStar). Springer, 2023, pp. 324–338.
  44. “Experimental results. ddmt.” [Online]. Available: https://github.com/ymxl85/DDMT