Static Program Slicing Using Language Models With Dataflow-Aware Pretraining and Constrained Decoding
Pith reviewed 2026-05-12 02:10 UTC · model grok-4.3
The pith
Static program slicing becomes more accurate when language models receive dataflow-aware pretraining and constrained decoding.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Sliceformer reformulates static program slicing as a sequence-to-sequence task and improves dependency modeling and hallucination control through two innovations: dataflow-aware pretraining objectives that leverage data flow graphs via dataflow-preserving statement permutation and dataflow-aware span corruption, plus a constrained decoding mechanism that enforces lexical and syntactic constraints during generation.
What carries the argument
Dataflow-aware pretraining on data flow graphs combined with lexical-syntactic constrained decoding inside a small seq2seq language model.
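The provided text names the permutation objective but not its construction. A minimal sketch, assuming "dataflow-preserving" means any reordering in which every def-use edge still points forward, i.e., a random topological order of the statement-level DFG; the function name and interface below are illustrative, not the authors' code.

```python
import random
from collections import defaultdict

def dataflow_preserving_permutation(n_statements, dfg_edges, rng=random):
    """Sample a random statement ordering that keeps every data-flow
    edge (producer -> consumer) pointing forward.

    dfg_edges: iterable of (i, j) pairs meaning statement j reads a value
    defined by statement i. Any topological order of this graph is a
    dataflow-preserving permutation; we sample one with randomized
    tie-breaking (Kahn's algorithm). Illustrative only: the paper does
    not publish its permutation procedure.
    """
    succs = defaultdict(list)
    indegree = [0] * n_statements
    for i, j in dfg_edges:
        succs[i].append(j)
        indegree[j] += 1

    ready = [s for s in range(n_statements) if indegree[s] == 0]
    order = []
    while ready:
        s = ready.pop(rng.randrange(len(ready)))  # random tie-break
        order.append(s)
        for t in succs[s]:
            indegree[t] -= 1
            if indegree[t] == 0:
                ready.append(t)
    assert len(order) == n_statements, "DFG must be acyclic"
    return order

# Example: statement 2 reads statement 0's value; statement 3 reads 1's and 2's.
print(dataflow_preserving_permutation(4, [(0, 2), (1, 3), (2, 3)]))
```

A plausible training target is then to recover the original statement order (or the original program) from the permuted input, though the provided text does not say which.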
If this is right
- The model captures variable dependencies more accurately across statements.
- Generated slices contain fewer hallucinated tokens or statements.
- ExactMatch scores rise by up to 22 percent on standard Java and Python slicing benchmarks.
- Small language models become competitive for dependency-sensitive code analysis tasks.
Where Pith is reading between the lines
- The same pretraining pattern could be applied to other code tasks that depend on data-flow understanding, such as bug localization.
- Constrained decoding techniques may reduce errors in additional code-generation settings beyond slicing.
- Combining program-analysis graphs more tightly with language-model training may further improve reliability on complex codebases.
Load-bearing premise
That dataflow-preserving statement permutation and dataflow-aware span corruption will cause the model to internalize precise data-flow relations, and that lexical-syntactic constrained decoding will remove hallucinations without preventing generation of correct slices on unseen programs.
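Span corruption in the T5 family masks random spans with sentinel tokens; a dataflow-aware variant would presumably bias span selection toward tokens on def-use chains, so reconstruction requires tracking dependencies rather than local syntax. A minimal sketch under that assumption; `defuse_spans` and its DFG-derived alignment are hypothetical, and the real objective may differ.

```python
import random

SENTINELS = [f"<extra_id_{k}>" for k in range(100)]  # T5-style sentinels

def dataflow_aware_span_corruption(tokens, defuse_spans, n_spans=2, rng=random):
    """Corrupt spans aligned to def-use chains instead of random spans.

    tokens:       list of code tokens.
    defuse_spans: disjoint (start, end) token ranges, each covering a
                  definition or use of some variable (hypothetical
                  alignment derived from the DFG).
    Returns (corrupted_input, target) in the T5 span-corruption format.
    """
    chosen = sorted(rng.sample(defuse_spans, min(n_spans, len(defuse_spans))))
    corrupted, target, pos = [], [], 0
    for k, (start, end) in enumerate(chosen):
        corrupted += tokens[pos:start] + [SENTINELS[k]]
        target += [SENTINELS[k]] + tokens[start:end]
        pos = end
    corrupted += tokens[pos:]
    target += [SENTINELS[len(chosen)]]  # closing sentinel, as in T5
    return corrupted, target

# Hypothetical def/use spans: 'a' defined at 0 and used at 10, 'b' at 6 and 14.
toks = "a = f ( ) ; b = g ( a ) ; return b".split()
print(dataflow_aware_span_corruption(toks, [(0, 1), (10, 11), (6, 7), (14, 15)]))
```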
What would settle it
Running the trained model on the Java or Python slicing benchmarks and finding no gain, or even a drop, in ExactMatch compared with standard CodeT5+ baselines.
Original abstract
Static program slicing is a fundamental software engineering technique for isolating code relevant to specific variables. While recent learning-based approaches using language models (LMs) show promise in automating slice prediction, they suffer from inaccurate dependency modeling and unconstrained generation, where LMs fail to capture precise data flow relations and produce slices containing hallucinated tokens and statements. To address these challenges, we propose Sliceformer, a novel approach that reformulates static program slicing as a sequence-to-sequence task using small language models such as CodeT5+. Sliceformer introduces two key innovations that directly target the identified limitations. First, to improve dependency modeling, we design dataflow-aware pretraining objectives that leverage data flow graphs (DFG) to teach models data dependencies through dataflow-preserving statement permutation and dataflow-aware span corruption. Second, to eliminate hallucination, we develop a constrained decoding mechanism that enforces both lexical and syntactic constraints. We evaluate Sliceformer on Java and Python program slicing benchmarks, demonstrating consistent improvements over state-of-the-art baselines with up to 22% gain in ExactMatch.
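The abstract says decoding "enforces both lexical and syntactic constraints" without giving the mechanism. One standard way to impose the lexical half, restricting generation to tokens that occur in the input program, is the `prefix_allowed_tokens_fn` hook of HuggingFace `generate`; the sketch below illustrates that pattern and is not a reconstruction of Sliceformer's actual decoder.

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

# Illustrative model choice; the paper builds on CodeT5+, and any
# seq2seq checkpoint exposes the same generate() hook.
tok = AutoTokenizer.from_pretrained("Salesforce/codet5p-220m")
model = T5ForConditionalGeneration.from_pretrained("Salesforce/codet5p-220m")

source = "x = 1\ny = x + 2\nprint(y)"
inputs = tok(source, return_tensors="pt")

# Lexical constraint: the decoder may only emit token ids that appear
# in the source program (plus EOS), so no out-of-program identifiers
# can be hallucinated. A syntactic constraint would prune this set
# further per step; that part is not reconstructible from the text.
allowed = sorted(set(inputs.input_ids[0].tolist()) | {tok.eos_token_id})

def prefix_allowed_tokens_fn(batch_id, input_ids):
    return allowed

out = model.generate(
    **inputs,
    max_new_tokens=64,
    prefix_allowed_tokens_fn=prefix_allowed_tokens_fn,
)
print(tok.decode(out[0], skip_special_tokens=True))
```

A syntactic constraint would additionally track a grammar or parser state across decoding steps; nothing in the provided text pins down how Sliceformer implements that half.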
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes Sliceformer, a sequence-to-sequence reformulation of static program slicing using small LMs such as CodeT5+. It introduces dataflow-aware pretraining that leverages data-flow graphs (DFG) via dataflow-preserving statement permutation and dataflow-aware span corruption to improve dependency modeling, together with a constrained decoding step that enforces lexical and syntactic constraints to reduce hallucinations. The approach is evaluated on Java and Python slicing benchmarks and reports consistent gains, including up to 22% ExactMatch improvement over prior baselines.
Significance. If the reported gains prove robust and the pretraining objectives demonstrably instill precise data-flow relations rather than generic code patterns, the work would meaningfully advance automated program analysis by showing how structural DFG information can be injected into LMs for a core software-engineering task. The combination of DFG-guided pretraining and constrained decoding is a concrete, practical contribution that directly targets two well-known failure modes of LM-based slicing.
major comments (3)
- [§3.1 (Pretraining Objectives)] The dataflow-preserving statement permutation and dataflow-aware span corruption supply only indirect supervision on dependencies. Without an intermediate probe (e.g., dependency-prediction accuracy on held-out code after pretraining only), the ExactMatch gains on the downstream slicing task could arise from improved general code modeling rather than internalization of precise data-flow relations. This distinction is load-bearing for the central claim that the pretraining “directly target[s] … inaccurate dependency modeling.”
- [§3.2 (Constrained Decoding)] The constrained decoding mechanism enforces only lexical and syntactic constraints. Because slice correctness is ultimately a semantic property of data-flow preservation, the mechanism leaves open the possibility of data-flow hallucinations on unseen programs; the paper therefore does not yet establish that hallucination is eliminated.
- [§5 (Experiments)] The evaluation reports “consistent improvements” and “up to 22% gain in ExactMatch” but supplies no experimental protocol, baseline definitions, dataset sizes, statistical tests, ablation results, or variance across runs. These omissions prevent verification of the empirical claims that underwrite the paper’s contribution.
minor comments (1)
- [Abstract and §5] The abstract and experimental sections would benefit from an explicit statement of the ExactMatch metric definition and the precise baselines used, to aid immediate reproducibility.
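For orientation only: the provided text never defines ExactMatch, but a common convention (not confirmed by the paper) scores a predicted slice as correct only under exact equality after light normalization.

```python
def _normalize(s: str) -> str:
    # Collapse internal whitespace and drop blank lines; the paper's
    # actual normalization rules are not given in the provided text.
    return "\n".join(" ".join(line.split())
                     for line in s.strip().splitlines() if line.strip())

def exact_match(predicted: str, gold: str) -> bool:
    """A predicted slice counts only if it equals the gold slice exactly."""
    return _normalize(predicted) == _normalize(gold)

def exact_match_score(pairs) -> float:
    """Corpus-level ExactMatch: fraction of (predicted, gold) pairs matched."""
    return sum(exact_match(p, g) for p, g in pairs) / len(pairs)
```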
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive review. We address each major comment below with point-by-point responses and indicate planned revisions to strengthen the manuscript.
point-by-point responses
- Referee: [§3.1 (Pretraining Objectives)] The dataflow-preserving statement permutation and dataflow-aware span corruption supply only indirect supervision on dependencies. Without an intermediate probe (e.g., dependency-prediction accuracy on held-out code after pretraining only), the ExactMatch gains on the downstream slicing task could arise from improved general code modeling rather than internalization of precise data-flow relations. This distinction is load-bearing for the central claim that the pretraining "directly target[s] … inaccurate dependency modeling."
Authors: We designed the pretraining objectives specifically around DFG structures to force the model to respect data dependencies during permutation and span reconstruction. While the supervision is indirect, the downstream slicing task directly requires precise dependency modeling, and the observed gains align with this intent. We agree that an explicit probe would better isolate the effect. In revision we will add a dependency-prediction probe on held-out code after pretraining only, reporting accuracy before and after the dataflow-aware objectives (one possible probe design is sketched after these responses). Revision: yes.
- Referee: [§3.2 (Constrained Decoding)] The constrained decoding mechanism enforces only lexical and syntactic constraints. Because slice correctness is ultimately a semantic property of data-flow preservation, the mechanism leaves open the possibility of data-flow hallucinations on unseen programs; the paper therefore does not yet establish that hallucination is eliminated.
Authors: The constrained decoding targets lexical and syntactic validity to block obvious token- and statement-level hallucinations, while the dataflow-aware pretraining targets semantic dependency accuracy. We acknowledge that the mechanism does not enforce semantic data-flow constraints and that our original phrasing ("eliminate hallucination") overstates the guarantee. We will revise the text to state that it reduces hallucinations via structural constraints and will add a limitations paragraph noting that data-flow hallucinations remain possible on unseen programs. Revision: partial.
- Referee: [§5 (Experiments)] The evaluation reports "consistent improvements" and "up to 22% gain in ExactMatch" but supplies no experimental protocol, baseline definitions, dataset sizes, statistical tests, ablation results, or variance across runs. These omissions prevent verification of the empirical claims that underwrite the paper's contribution.
Authors: We agree that the experimental section requires substantially more detail for reproducibility and verification. In the revised manuscript we will expand §5 to include: the full training and evaluation protocol, precise baseline implementations and hyper-parameters, dataset sizes and train/validation/test splits, results of statistical significance tests, complete ablation tables, and mean ± standard deviation across at least three runs with different random seeds. Revision: yes.
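The rebuttal commits to a dependency-prediction probe without specifying its form. A minimal sketch of one plausible design, freezing the pretrained encoder and training only a linear classifier over pooled statement vectors; every name here is hypothetical rather than the authors' evaluation code.

```python
import torch
import torch.nn as nn

class DependencyProbe(nn.Module):
    """Linear probe over frozen encoder states: given pooled vectors for
    statements i and j, predict whether j data-depends on i. Training
    only the probe isolates what the pretrained encoder already encodes.
    Interface is illustrative, not the authors' evaluation code."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.classifier = nn.Linear(2 * hidden_size, 1)

    def forward(self, stmt_i: torch.Tensor, stmt_j: torch.Tensor):
        # Returns a dependency logit per (i, j) pair.
        return self.classifier(torch.cat([stmt_i, stmt_j], dim=-1)).squeeze(-1)

def pool_statements(hidden_states, stmt_token_spans):
    """Mean-pool encoder hidden states over each statement's token span.
    hidden_states: (seq_len, hidden); stmt_token_spans: [(start, end)]."""
    return torch.stack([hidden_states[s:e].mean(dim=0)
                        for s, e in stmt_token_spans])

# Training-loop sketch: freeze the encoder, optimize only the probe with
# BCEWithLogitsLoss on (i, j, label) pairs from held-out code, and compare
# probe accuracy before vs. after the dataflow-aware pretraining objectives.
```

Reporting probe accuracy with the encoder frozen in both conditions would isolate what the dataflow-aware pretraining itself contributed, which is exactly the distinction the referee flags as load-bearing.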
Circularity Check
No circularity: empirical gains measured on external benchmarks
full rationale
The paper reformulates slicing as seq2seq, introduces dataflow-aware pretraining (permutation and span corruption using DFG) plus constrained decoding, then reports ExactMatch gains on Java/Python benchmarks versus external baselines. No equations, fitted parameters, or self-citations appear in the provided text that reduce any claimed result to its own inputs by construction. The pretraining objectives are defined independently of the final slicing metric, and performance is assessed on held-out programs rather than being tautological. This is the standard non-circular empirical ML setup.
Axiom & Free-Parameter Ledger
free parameters (1)
- pretraining task hyperparameters
axioms (2)
- domain assumption: Dataflow-preserving statement permutation and dataflow-aware span corruption enable language models to learn precise data dependencies.
- domain assumption: Lexical and syntactic constraints can be enforced during decoding without eliminating valid slices.