pith. sign in

arxiv: 2605.19412 · v1 · pith:EWUU4HMRnew · submitted 2026-05-19 · 💻 cs.SE · cs.PL

DRReduce: Enhancing Syntax-Guided Program Reduction with Dependency Reconstruction

Pith reviewed 2026-05-20 04:44 UTC · model grok-4.3

classification 💻 cs.SE cs.PL
keywords program reductionsyntax-guided reductiondependency reconstructiontest case minimizationsoftware debugginglanguage-agnostic toolsC programsJava programs
0
0 comments X

The pith

DRReduce repairs broken semantic dependencies after syntax deletions to boost acceptance rates and shrink programs more effectively than pure syntax-guided reducers.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Program reduction simplifies large failure-inducing code into minimal test cases, but syntax-guided tools like Perses often reject deletions that break semantic dependencies, forcing costly backtracking. DRReduce adds a lightweight semantic layer that builds a dependency graph from the input program and reconstructs dependencies after each deletion to keep intermediate versions valid for the property checker. This hybrid approach then hands off to standard syntax-guided minimization. The result is substantially smaller final programs and shorter runtimes than other language-agnostic methods, while matching the quality of language-specific tools such as CReduce without any hardcoded language rules. Readers care because the technique makes general, grammar-based reduction practical for debugging across many languages.

Core claim

DRReduce constructs a semantic dependency graph from the input program, performs semantically coherent deletions with dependency reconstruction to preserve the semantic validity of intermediate programs, and delegates further minimization to a syntax-guided reducer. On real-world bug-triggering programs this yields average size reductions of 51.9 percent over Perses, 14.9 percent over WDD, and 19.8 percent over CDD while finishing faster on most cases; it produces results comparable to CReduce and Latra yet runs 3.3 times faster than CReduce and 1.2 times faster than Latra on average.

What carries the argument

Dependency reconstruction, which repairs program dependencies broken by a deletion in order to preserve the semantic validity of intermediate programs and increase the acceptance rate of the property checker.

If this is right

  • DRReduce produces final reduced programs that are on average more than 50 percent smaller than those from Perses.
  • The method finishes reduction faster than Perses, WDD, and CDD on the majority of tested programs.
  • Dependency reconstruction cuts the number of property-checker queries by roughly 80 percent and total reduction time by about 59 percent.
  • DRReduce reaches reduction quality comparable to language-specific tools while remaining fully language-agnostic.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same dependency-reconstruction idea could be inserted into other syntax-driven transformation tasks such as code simplification or refactoring pipelines.
  • Extending the dependency-graph construction to additional languages would directly test whether the efficiency gains generalize beyond C and Java.
  • If the overhead of graph construction remains low on very large inputs, the technique might enable reduction of programs that current syntax-only reducers cannot finish in reasonable time.

Load-bearing premise

A lightweight semantic dependency graph can be built from the input program and dependency reconstruction can reliably restore semantic validity after deletions without introducing new failures or excessive overhead.

What would settle it

Applying DRReduce and Perses to the same collection of real bug-triggering programs and observing that DRReduce does not produce smaller final token counts or require fewer property-checker queries on average would falsify the performance claims.

Figures

Figures reproduced from arXiv: 2605.19412 by Peng Liang, Qiong Feng, Wei Song, Xiaotian Ma, Yongqiang Tian.

Figure 2
Figure 2. Figure 2: The overall reduction result of WDD and DRReduce. effectiveness and efficiency compared to other language-agnostic methods, it still exhibits two key limitations on this example. Limitation 1: WDD generates invalid intermediate results, which limits efficiency. During node deletion, operations must often follow a strict order: removing elements out of sequence can violate syntactic or semantic dependencies… view at source ↗
Figure 5
Figure 5. Figure 5: DRReduce’s overall workflow a clean, semantically coherent core, Stage 3 encounters far fewer query invocations and converges faster than it would on the original input. The two stages are complementary, but the majority of DRReduce’s effectiveness gain over prior reducers comes from Stage 2’s dependency reconstruction— the core contribution of this work — which performs reductions that no purely syntactic… view at source ↗
Figure 6
Figure 6. Figure 6: Ablation study of dependency reconstruction across all 28 bug-triggering programs (w: with dependency [PITH_FULL_IMAGE:figures/full_fig_p017_6.png] view at source ↗
read the original abstract

Program reduction is a technique for simplifying large, failure-inducing programs into minimal reproducible test cases. Language-specific tools such as CReduce achieve strong performance by leveraging deep semantic knowledge of C/C++, but are tightly coupled to a single language family. Language-agnostic reducers such as Perses address this by applying syntax-guided search across any grammar, yet share a fundamental limitation: deleting a node or subtree in isolation often breaks semantic coherence causing the property checker to reject the deletion and forcing the reducer to backtrack, limiting overall reduction effectiveness and efficiency. In this paper, we propose DRReduce, a framework that bridges this gap by augmenting language-agnostic syntactic reduction with a lightweight semantic layer: dependency reconstruction, which repairs program dependencies broken by a deletion in order to preserve the semantic validity of intermediate programs and increase the acceptance rate of the property checker. DRReduce constructs a semantic dependency graph from the input program, performs semantically coherent deletions with dependency reconstruction, and delegates further minimization to a syntax-guided reducer. We implement DRReduce for C and Java and evaluate it on real-world bug-triggering programs. Compared to SOTA syntax-guided reducers, DRReduce achieves average size reductions of 51.9%, 14.9%, and 19.8% over Perses, WDD, and CDD respectively, while completing reduction faster on the majority of programs. Compared to language-specific tools, DRReduce achieves results comparable to CReduce and Latra without any language-specific transformation rules, at 3.3x and 1.2x higher efficiency than CReduce and Latra on average, respectively. An ablation study confirms that dependency reconstruction reduces query invocations by 80.2%, reduction time by 58.7%, and final token count by over 55.1%.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces DRReduce, a framework that augments language-agnostic syntax-guided program reduction with dependency reconstruction. It constructs a semantic dependency graph from the input program, performs deletions followed by reconstruction to repair broken dependencies and preserve semantic validity of intermediate programs, then delegates further minimization to a syntax-guided reducer. The authors claim average size reductions of 51.9%, 14.9%, and 19.8% over Perses, WDD, and CDD respectively, faster completion on most programs, and results comparable to CReduce and Latra (at 3.3x and 1.2x higher efficiency) without language-specific rules; an ablation study reports 80.2% fewer queries, 58.7% less time, and over 55.1% smaller final token count.

Significance. If the results hold and reductions preserve original failure-inducing behavior, the work would be significant for bridging syntax-guided and language-specific reduction without custom rules, potentially improving minimal test-case generation for debugging across languages. The ablation directly attributes gains to the reconstruction component.

major comments (2)
  1. [Evaluation] Evaluation section: the abstract and reported results give concrete percentage improvements and ablation numbers, yet supply no details on the number of subject programs, selection criteria, variance, or statistical tests. This makes it impossible to assess whether the 51.9%, 14.9%, and 19.8% gains are robust.
  2. [Dependency reconstruction description] Dependency reconstruction description (abstract and §3): the central claim rests on reconstruction increasing property-checker acceptance while preserving the original bug-triggering condition. No section verifies that reconstructed programs still reproduce the reported failure; if reconstruction alters data/control flow enough to eliminate the trigger, the checker may accept a non-triggering program, producing invalid reductions.
minor comments (1)
  1. [Abstract] The abstract could more explicitly state the exact number and characteristics of the real-world bug-triggering programs used in the evaluation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript to improve clarity and completeness.

read point-by-point responses
  1. Referee: [Evaluation] Evaluation section: the abstract and reported results give concrete percentage improvements and ablation numbers, yet supply no details on the number of subject programs, selection criteria, variance, or statistical tests. This makes it impossible to assess whether the 51.9%, 14.9%, and 19.8% gains are robust.

    Authors: We agree that the evaluation section would benefit from additional details to allow readers to assess robustness. The current manuscript reports results on real-world bug-triggering programs but does not explicitly state the exact count, selection criteria, variance, or statistical tests. In the revised version we will add these: the number of programs, how they were selected, standard deviations or ranges for the reduction metrics, and results of statistical significance tests such as the Wilcoxon signed-rank test. revision: yes

  2. Referee: [Dependency reconstruction description] Dependency reconstruction description (abstract and §3): the central claim rests on reconstruction increasing property-checker acceptance while preserving the original bug-triggering condition. No section verifies that reconstructed programs still reproduce the reported failure; if reconstruction alters data/control flow enough to eliminate the trigger, the checker may accept a non-triggering program, producing invalid reductions.

    Authors: We appreciate this concern regarding the validity of reductions. By design, DRReduce applies the property checker (which verifies reproduction of the original failure) after each reconstruction step; only programs that pass the checker are accepted for further reduction. This ensures that accepted programs continue to trigger the reported failure. However, the manuscript does not contain an explicit verification subsection or additional results confirming this for all cases. We will revise §3 to describe the post-reconstruction checker invocation more clearly and add a short verification paragraph in the evaluation section reporting that all final reduced programs reproduce the original failures. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical claims rest on external baselines

full rationale

The paper proposes DRReduce as a syntactic reducer augmented by dependency reconstruction and validates it solely via direct experimental comparison against independent external tools (Perses, WDD, CDD, CReduce, Latra) on real-world bug-triggering programs. No equations, fitted parameters, or predictions are defined inside the paper and then re-used as outputs; the acceptance-rate and size-reduction numbers are measured quantities, not tautological re-statements of the method itself. The dependency-reconstruction step is described as an algorithmic addition whose benefit is quantified by ablation and cross-tool runs rather than by self-definition or self-citation chains.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an empirical software-engineering contribution rather than a mathematical derivation. No free parameters, axioms, or invented entities are introduced in the abstract.

pith-pipeline@v0.9.0 · 5863 in / 1216 out tokens · 37561 ms · 2026-05-20T04:44:13.993628+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

30 extracted references · 30 canonical work pages

  1. [1]

    Stefanos Chaliasos, Thodoris Sotiropoulos, Diomidis Spinellis, Arthur Gervais, Benjamin Livshits, and Dimitris Mitropoulos. 2022. Finding typing compiler bugs. InProceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation (PLDI). ACM, 183–198

  2. [2]

    GNU Compiler Collection Contributors. 2026. Reporting Bugs. https://gcc.gnu.org/bugs/

  3. [3]

    Qiong Feng, Xiaotian Ma, Ziyuan Feng, Marat Akhin, Wei Song, and Peng Liang. 2025. Finding Compiler Bugs through Cross-Language Code Generator and Differential Testing.Proceedings of the ACM on Programming Languages9, OOPSLA2 (2025), 2843–2869

  4. [4]

    Golnaz Gharachorlu and Nick Sumner. 2021. Leveraging models to reduce test cases in software repositories. In Proceedings of the 18th IEEE/ACM International Conference on Mining Software Repositories (MSR). IEEE, 230–241

  5. [5]

    JetBrains. 2024. Program Structure Interface. https://plugins.jetbrains.com/docs/intellij/psi.html

  6. [6]

    Christian Gram Kalhauge and Jens Palsberg. 2019. Binary reduction of dependency graphs. InProceedings of the 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE). ACM, 556–566

  7. [7]

    Christian Gram Kalhauge and Jens Palsberg. 2021. Logical bytecode reduction. InProceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation (PLDI). ACM, 1003–1016

  8. [8]

    Christopher Lidbury, Andrei Lascu, Nathan Chong, and Alastair F Donaldson. 2015. Many-core compiler fuzzing. ACM SIGPLAN Notices50, 6 (2015), 65–76

  9. [9]

    LLVM Project. 2026. How to Submit a Bug Report. https://llvm.org/docs/HowToSubmitABug.html

  10. [10]

    Ghassan Misherghi and Zhendong Su. 2006. HDD: Hierarchical delta debugging. InProceedings of the 28th International Conference on Software Engineering (ICSE). ACM, 142–151

  11. [11]

    Loi Ngo Duc Nguyen, Tahiatul Islam, Theron Wang, Sam Lenz, and Martin Kellogg. 2025. Static Program Reduction via Type-Directed Slicing.Proceedings of the ACM on Software Engineering2, ISSTA (2025), 2068–2090

  12. [12]

    Diego Trevi no Ferrer. 2019. LLVM-Reduce for testcase reduction. https://llvm.org/devmtg/2019-10/talk-abstracts. html#tech22

  13. [13]

    John Regehr, Yang Chen, Pascal Cuoq, Eric Eide, Chucky Ellison, and Xuejun Yang. 2012. Test-case reduction for C compiler bugs. InProceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI). ACM, 335–346

  14. [14]

    Luyao Ren, Xing Zhang, Ziyue Hua, Yanyan Jiang, Xiao He, Yingfei Xiong, and Tao Xie. 2025. Validity-Preserving Delta Debugging via Generator Trace Reduction.ACM Transactions on Software Engineering and Methodology34, 3 (2025), 1–33

  15. [15]

    Daniil Stepanov, Marat Akhin, and Mikhail Belyaev. 2019. ReduKtor: How We Stopped Worrying About Bugs in Kotlin Compiler. InProceedings of the 34th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 317–326

  16. [16]

    Chengnian Sun, Vu Le, and Zhendong Su. 2016. Finding compiler bugs via live code mutation. InProceedings of the 31st ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA). ACM, 849–863

  17. [17]

    Chengnian Sun, Yuanbo Li, Qirun Zhang, Tianxiao Gu, and Zhendong Su. 2018. Perses: Syntax-guided program reduction. InProceedings of the 40th International Conference on Software Engineering (ICSE). ACM, 361–371

  18. [18]

    Guancheng Wang, Ruobing Shen, Junjie Chen, Yingfei Xiong, and Lu Zhang. 2021. Probabilistic Delta Debugging. InProceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE). ACM, 881–892. 22 Feng et al

  19. [19]

    Ohlsson, Björn Regnell, and Anders Wesslén

    Claes Wohlin, Per Runeson, Martin Höst, Magnus C. Ohlsson, Björn Regnell, and Anders Wesslén. 2012.Experimentation in Software Engineering. Springer

  20. [20]

    Zhenyang Xu, Yongqiang Tian, Mengxiao Zhang, and Chengnian Sun. 2025. Boosting Program Reduction with the Missing Piece of Syntax-Guided Transformation.Proceedings of the ACM on Programming Languages9, OOPSLA2 (2025), 86–112

  21. [21]

    Zhenyang Xu, Yongqiang Tian, Mengxiao Zhang, Jiarui Zhang, Puzhuo Liu, Yu Jiang, and Chengnian Sun. 2025. T-rec: Fine-grained language-agnostic program reduction guided by lexical syntax.ACM Transactions on Software Engineering and Methodology34, 2 (2025), 1–31

  22. [22]

    Zhenyang Xu, Yongqiang Tian, Mengxiao Zhang, Gaosen Zhao, Yu Jiang, and Chengnian Sun. 2023. Pushing the limit of 1-minimality of language-agnostic program reduction.Proceedings of the ACM on Programming Languages7, OOPSLA1 (2023), 636–664

  23. [23]

    Zhenyang Xu, Yiran Wang, Yongqiang Tian, Mengxiao Zhang, and Chengnian Sun. 2025. Latra: A Template-Based Language-Agnostic Transformation Framework for Effective Program Reduction. InProceedings of the 40th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 2274–2285

  24. [24]

    Xuejun Yang, Yang Chen, Eric Eide, and John Regehr. 2011. Finding and understanding bugs in C compilers. In Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI). ACM, 283–294

  25. [25]

    Andreas Zeller and Ralf Hildebrandt. 2002. Simplifying and isolating failure-inducing input.IEEE Transactions on Software Engineering28, 2 (2002), 183–200

  26. [26]

    Jiang Zhang, Shuai Wang, Manuel Rigger, Pinjia He, and Zhendong Su. 2021. SANRAZOR: Reducing Redundant Sanitizer Checks in C/C++ Programs. InProceedings of the 15th USENIX Symposium on Operating Systems Design and Implementation (OSDI). USENIX Association, 479–494

  27. [27]

    Mengxiao Zhang, Yongqiang Tian, Zhenyang Xu, Yiwen Dong, Shin Hwei Tan, and Chengnian Sun. 2024. LPR: Large language models-aided program reduction. InProceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA). ACM, 261–273

  28. [28]

    Mengxiao Zhang, Zhenyang Xu, Yongqiang Tian, Xinru Cheng, and Chengnian Sun. 2025. Toward a Better Under- standing of Probabilistic Delta Debugging. InProceedings of the 47th International Conference on Software Engineering (ICSE). ACM, 2024–2035

  29. [29]

    Mengxiao Zhang, Zhenyang Xu, Yongqiang Tian, Yu Jiang, and Chengnian Sun. 2023. PPR: Pairwise Program Reduction. InProceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE). ACM, 338–349

  30. [30]

    Xintong Zhou, Zhenyang Xu, Mengxiao Zhang, Yongqiang Tian, and Chengnian Sun. 2025. WDD: Weighted Delta Debugging. InProceedings of the 47th IEEE/ACM International Conference on Software Engineering (ICSE). IEEE, 1592– 1603