Static Program Slicing Using Language Models With Dataflow-Aware Pretraining and Constrained Decoding
Pith reviewed 2026-05-12 02:10 UTC · model grok-4.3
The pith
Static program slicing becomes more accurate when language models receive dataflow-aware pretraining and constrained decoding.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Sliceformer reformulates static program slicing as a sequence-to-sequence task and improves dependency modeling and hallucination control through two innovations: dataflow-aware pretraining objectives that leverage data flow graphs via dataflow-preserving statement permutation and dataflow-aware span corruption, plus a constrained decoding mechanism that enforces lexical and syntactic constraints during generation.
What carries the argument
Dataflow-aware pretraining on data flow graphs combined with lexical-syntactic constrained decoding inside a small seq2seq language model.
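The provided text names the permutation objective but not its construction. A minimal sketch, assuming "dataflow-preserving" means any reordering in which every def-use edge still points forward, i.e., a random topological order of the statement-level DFG; the function name and interface below are illustrative, not the authors' code.

```python
import random
from collections import defaultdict

def dataflow_preserving_permutation(n_statements, dfg_edges, rng=random):
    """Sample a random statement ordering that keeps every data-flow
    edge (producer -> consumer) pointing forward.

    dfg_edges: iterable of (i, j) pairs meaning statement j reads a value
    defined by statement i. Any topological order of this graph is a
    dataflow-preserving permutation; we sample one with randomized
    tie-breaking (Kahn's algorithm). Illustrative only: the paper does
    not publish its permutation procedure.
    """
    succs = defaultdict(list)
    indegree = [0] * n_statements
    for i, j in dfg_edges:
        succs[i].append(j)
        indegree[j] += 1

    ready = [s for s in range(n_statements) if indegree[s] == 0]
    order = []
    while ready:
        s = ready.pop(rng.randrange(len(ready)))  # random tie-break
        order.append(s)
        for t in succs[s]:
            indegree[t] -= 1
            if indegree[t] == 0:
                ready.append(t)
    assert len(order) == n_statements, "DFG must be acyclic"
    return order

# Example: statement 2 reads statement 0's value; statement 3 reads 1's and 2's.
print(dataflow_preserving_permutation(4, [(0, 2), (1, 3), (2, 3)]))
```

A plausible training target is then to recover the original statement order (or the original program) from the permuted input, though the provided text does not say which.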
If this is right
- The model captures variable dependencies more accurately across statements.
- Generated slices contain fewer hallucinated tokens or statements.
- ExactMatch scores rise by up to 22 percent on standard Java and Python slicing benchmarks.
- Small language models become competitive for dependency-sensitive code analysis tasks.
Where Pith is reading between the lines
- The same pretraining pattern could be applied to other code tasks that depend on data-flow understanding, such as bug localization.
- Constrained decoding techniques may reduce errors in additional code-generation settings beyond slicing.
- Combining program-analysis graphs more tightly with language-model training may further improve reliability on complex codebases.
Load-bearing premise
That dataflow-preserving statement permutation and dataflow-aware span corruption will cause the model to internalize precise data-flow relations, and that lexical-syntactic constrained decoding will remove hallucinations without preventing generation of correct slices on unseen programs.
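Span corruption in the T5 family masks random spans with sentinel tokens; a dataflow-aware variant would presumably bias span selection toward tokens on def-use chains, so reconstruction requires tracking dependencies rather than local syntax. A minimal sketch under that assumption; `defuse_spans` and its DFG-derived alignment are hypothetical, and the real objective may differ.

```python
import random

SENTINELS = [f"<extra_id_{k}>" for k in range(100)]  # T5-style sentinels

def dataflow_aware_span_corruption(tokens, defuse_spans, n_spans=2, rng=random):
    """Corrupt spans aligned to def-use chains instead of random spans.

    tokens:       list of code tokens.
    defuse_spans: disjoint (start, end) token ranges, each covering a
                  definition or use of some variable (hypothetical
                  alignment derived from the DFG).
    Returns (corrupted_input, target) in the T5 span-corruption format.
    """
    chosen = sorted(rng.sample(defuse_spans, min(n_spans, len(defuse_spans))))
    corrupted, target, pos = [], [], 0
    for k, (start, end) in enumerate(chosen):
        corrupted += tokens[pos:start] + [SENTINELS[k]]
        target += [SENTINELS[k]] + tokens[start:end]
        pos = end
    corrupted += tokens[pos:]
    target += [SENTINELS[len(chosen)]]  # closing sentinel, as in T5
    return corrupted, target

# Hypothetical def/use spans: 'a' defined at 0 and used at 10, 'b' at 6 and 14.
toks = "a = f ( ) ; b = g ( a ) ; return b".split()
print(dataflow_aware_span_corruption(toks, [(0, 1), (10, 11), (6, 7), (14, 15)]))
```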
What would settle it
Running the trained model on the Java or Python slicing benchmarks and finding no gain, or even a drop, in ExactMatch compared with standard CodeT5+ baselines.
Original abstract
Static program slicing is a fundamental software engineering technique for isolating code relevant to specific variables. While recent learning-based approaches using language models (LMs) show promise in automating slice prediction, they suffer from inaccurate dependency modeling and unconstrained generation, where LMs fail to capture precise data flow relations and produce slices containing hallucinated tokens and statements. To address these challenges, we propose Sliceformer, a novel approach that reformulates static program slicing as a sequence-to-sequence task using small language models such as CodeT5+. Sliceformer introduces two key innovations that directly target the identified limitations. First, to improve dependency modeling, we design dataflow-aware pretraining objectives that leverage data flow graphs (DFG) to teach models data dependencies through dataflow-preserving statement permutation and dataflow-aware span corruption. Second, to eliminate hallucination, we develop a constrained decoding mechanism that enforces both lexical and syntactic constraints. We evaluate Sliceformer on Java and Python program slicing benchmarks, demonstrating consistent improvements over state-of-the-art baselines with up to 22% gain in ExactMatch.
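The abstract says decoding "enforces both lexical and syntactic constraints" without giving the mechanism. One standard way to impose the lexical half, restricting generation to tokens that occur in the input program, is the `prefix_allowed_tokens_fn` hook of HuggingFace `generate`; the sketch below illustrates that pattern and is not a reconstruction of Sliceformer's actual decoder.

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

# Illustrative model choice; the paper builds on CodeT5+, and any
# seq2seq checkpoint exposes the same generate() hook.
tok = AutoTokenizer.from_pretrained("Salesforce/codet5p-220m")
model = T5ForConditionalGeneration.from_pretrained("Salesforce/codet5p-220m")

source = "x = 1\ny = x + 2\nprint(y)"
inputs = tok(source, return_tensors="pt")

# Lexical constraint: the decoder may only emit token ids that appear
# in the source program (plus EOS), so no out-of-program identifiers
# can be hallucinated. A syntactic constraint would prune this set
# further per step; that part is not reconstructible from the text.
allowed = sorted(set(inputs.input_ids[0].tolist()) | {tok.eos_token_id})

def prefix_allowed_tokens_fn(batch_id, input_ids):
    return allowed

out = model.generate(
    **inputs,
    max_new_tokens=64,
    prefix_allowed_tokens_fn=prefix_allowed_tokens_fn,
)
print(tok.decode(out[0], skip_special_tokens=True))
```

A syntactic constraint would additionally track a grammar or parser state across decoding steps; nothing in the provided text pins down how Sliceformer implements that half.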
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes Sliceformer, a sequence-to-sequence reformulation of static program slicing using small LMs such as CodeT5+. It introduces dataflow-aware pretraining that leverages data-flow graphs (DFG) via dataflow-preserving statement permutation and dataflow-aware span corruption to improve dependency modeling, together with a constrained decoding step that enforces lexical and syntactic constraints to reduce hallucinations. The approach is evaluated on Java and Python slicing benchmarks and reports consistent gains, including up to 22% ExactMatch improvement over prior baselines.
Significance. If the reported gains prove robust and the pretraining objectives demonstrably instill precise data-flow relations rather than generic code patterns, the work would meaningfully advance automated program analysis by showing how structural DFG information can be injected into LMs for a core software-engineering task. The combination of DFG-guided pretraining and constrained decoding is a concrete, practical contribution that directly targets two well-known failure modes of LM-based slicing.
major comments (3)
- [§3.1 (Pretraining Objectives)] The dataflow-preserving statement permutation and dataflow-aware span corruption supply only indirect supervision on dependencies. Without an intermediate probe (e.g., dependency-prediction accuracy on held-out code after pretraining only), the ExactMatch gains on the downstream slicing task could arise from improved general code modeling rather than internalization of precise data-flow relations. This distinction is load-bearing for the central claim that the pretraining “directly target[s] … inaccurate dependency modeling.”
- [§3.2 (Constrained Decoding)] The constrained decoding mechanism enforces only lexical and syntactic constraints. Because slice correctness is ultimately a semantic property of data-flow preservation, the mechanism leaves open the possibility of data-flow hallucinations on unseen programs; the paper therefore does not yet establish that hallucination is eliminated.
- [§5 (Experiments)] The evaluation reports “consistent improvements” and “up to 22% gain in ExactMatch” but supplies no experimental protocol, baseline definitions, dataset sizes, statistical tests, ablation results, or variance across runs. These omissions prevent verification of the empirical claims that underwrite the paper’s contribution.
minor comments (1)
- [Abstract and §5] The abstract and experimental sections would benefit from an explicit statement of the ExactMatch metric definition and the precise baselines used, to aid immediate reproducibility.
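For orientation only: the provided text never defines ExactMatch, but a common convention (not confirmed by the paper) scores a predicted slice as correct only under exact equality after light normalization.

```python
def _normalize(s: str) -> str:
    # Collapse internal whitespace and drop blank lines; the paper's
    # actual normalization rules are not given in the provided text.
    return "\n".join(" ".join(line.split())
                     for line in s.strip().splitlines() if line.strip())

def exact_match(predicted: str, gold: str) -> bool:
    """A predicted slice counts only if it equals the gold slice exactly."""
    return _normalize(predicted) == _normalize(gold)

def exact_match_score(pairs) -> float:
    """Corpus-level ExactMatch: fraction of (predicted, gold) pairs matched."""
    return sum(exact_match(p, g) for p, g in pairs) / len(pairs)
```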
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive review. We address each major comment below with point-by-point responses and indicate planned revisions to strengthen the manuscript.
point-by-point responses
- Referee: [§3.1 (Pretraining Objectives)] The dataflow-preserving statement permutation and dataflow-aware span corruption supply only indirect supervision on dependencies. Without an intermediate probe (e.g., dependency-prediction accuracy on held-out code after pretraining only), the ExactMatch gains on the downstream slicing task could arise from improved general code modeling rather than internalization of precise data-flow relations. This distinction is load-bearing for the central claim that the pretraining "directly target[s] … inaccurate dependency modeling."
Authors: We designed the pretraining objectives specifically around DFG structures to force the model to respect data dependencies during permutation and span reconstruction. While the supervision is indirect, the downstream slicing task directly requires precise dependency modeling, and the observed gains align with this intent. We agree that an explicit probe would better isolate the effect. In revision we will add a dependency-prediction probe on held-out code after pretraining only, reporting accuracy before and after the dataflow-aware objectives (one possible probe design is sketched after these responses). Revision: yes.
- Referee: [§3.2 (Constrained Decoding)] The constrained decoding mechanism enforces only lexical and syntactic constraints. Because slice correctness is ultimately a semantic property of data-flow preservation, the mechanism leaves open the possibility of data-flow hallucinations on unseen programs; the paper therefore does not yet establish that hallucination is eliminated.
Authors: The constrained decoding targets lexical and syntactic validity to block obvious token- and statement-level hallucinations, while the dataflow-aware pretraining targets semantic dependency accuracy. We acknowledge that the mechanism does not enforce semantic data-flow constraints and that our original phrasing ("eliminate hallucination") overstates the guarantee. We will revise the text to state that it reduces hallucinations via structural constraints and will add a limitations paragraph noting that data-flow hallucinations remain possible on unseen programs. Revision: partial.
- Referee: [§5 (Experiments)] The evaluation reports "consistent improvements" and "up to 22% gain in ExactMatch" but supplies no experimental protocol, baseline definitions, dataset sizes, statistical tests, ablation results, or variance across runs. These omissions prevent verification of the empirical claims that underwrite the paper's contribution.
Authors: We agree that the experimental section requires substantially more detail for reproducibility and verification. In the revised manuscript we will expand §5 to include: the full training and evaluation protocol, precise baseline implementations and hyper-parameters, dataset sizes and train/validation/test splits, results of statistical significance tests, complete ablation tables, and mean ± standard deviation across at least three runs with different random seeds. Revision: yes.
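The rebuttal commits to a dependency-prediction probe without specifying its form. A minimal sketch of one plausible design, freezing the pretrained encoder and training only a linear classifier over pooled statement vectors; every name here is hypothetical rather than the authors' evaluation code.

```python
import torch
import torch.nn as nn

class DependencyProbe(nn.Module):
    """Linear probe over frozen encoder states: given pooled vectors for
    statements i and j, predict whether j data-depends on i. Training
    only the probe isolates what the pretrained encoder already encodes.
    Interface is illustrative, not the authors' evaluation code."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.classifier = nn.Linear(2 * hidden_size, 1)

    def forward(self, stmt_i: torch.Tensor, stmt_j: torch.Tensor):
        # Returns a dependency logit per (i, j) pair.
        return self.classifier(torch.cat([stmt_i, stmt_j], dim=-1)).squeeze(-1)

def pool_statements(hidden_states, stmt_token_spans):
    """Mean-pool encoder hidden states over each statement's token span.
    hidden_states: (seq_len, hidden); stmt_token_spans: [(start, end)]."""
    return torch.stack([hidden_states[s:e].mean(dim=0)
                        for s, e in stmt_token_spans])

# Training-loop sketch: freeze the encoder, optimize only the probe with
# BCEWithLogitsLoss on (i, j, label) pairs from held-out code, and compare
# probe accuracy before vs. after the dataflow-aware pretraining objectives.
```

Reporting probe accuracy with the encoder frozen in both conditions would isolate what the dataflow-aware pretraining itself contributed, which is exactly the distinction the referee flags as load-bearing.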
Circularity Check
No circularity: empirical gains measured on external benchmarks
full rationale
The paper reformulates slicing as seq2seq, introduces dataflow-aware pretraining (permutation and span corruption using DFG) plus constrained decoding, then reports ExactMatch gains on Java/Python benchmarks versus external baselines. No equations, fitted parameters, or self-citations appear in the provided text that reduce any claimed result to its own inputs by construction. The pretraining objectives are defined independently of the final slicing metric, and performance is assessed on held-out programs rather than being tautological. This is the standard non-circular empirical ML setup.
Axiom & Free-Parameter Ledger
free parameters (1)
- pretraining task hyperparameters
axioms (2)
- domain assumption: Dataflow-preserving statement permutation and dataflow-aware span corruption enable language models to learn precise data dependencies.
- domain assumption: Lexical and syntactic constraints can be enforced during decoding without eliminating valid slices.