
arxiv: 2603.05004 · v2 · submitted 2026-03-05 · 💻 cs.LG · cs.AI


Poisoning the Inner Prediction Logic of Graph Neural Networks for Clean-Label Backdoor Attacks


Pith reviewed 2026-05-15 16:12 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords graph neural networks · backdoor attacks · clean-label attacks · adversarial poisoning · prediction logic · trigger generation · node selection · GNN security

The pith

Coordinating a poisoned node selector with a logic-poisoning trigger generator lets graph neural networks learn to treat triggers as dominant predictors while training labels remain unchanged.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that standard graph backdoor attacks fail under clean-label constraints because triggers never become important to the model's decisions when labels cannot be flipped. It shows that the inner prediction logic of GNNs can be altered directly by first selecting specific nodes to poison and then generating triggers that shift how the model weighs features during message passing. The resulting BA-Logic method produces models that classify any test node carrying the trigger as the target class. A reader should care because clean-label attacks match real deployment settings where labels are fixed by external sources and cannot be edited by the attacker.
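
To make the clean-label constraint concrete: the attacker may perturb the features of a small budget of training nodes but may never touch a label. Below is a minimal sketch, assuming a fixed feature-pattern trigger and the heuristic of poisoning nodes that already carry the target label; BA-Logic instead learns its trigger with a generator, so everything here is illustrative rather than the paper's method.

```python
# Minimal sketch of clean-label trigger injection. Assumptions (not from
# the paper): the trigger is a fixed feature pattern, and poisoned nodes
# are drawn from nodes already labeled with the target class, so no
# label ever needs to change.
import numpy as np

rng = np.random.default_rng(0)
N, F, TARGET = 100, 16, 3                  # nodes, feature dim, target class
X = rng.normal(size=(N, F)).astype(np.float32)
y = rng.integers(0, 5, size=N)             # training labels: never modified

trigger = np.zeros(F, dtype=np.float32)
trigger[:4] = 3.0                          # hypothetical salient pattern

def poison_clean_label(X, y, budget=10):
    """Attach the trigger to `budget` nodes already labeled TARGET."""
    candidates = np.flatnonzero(y == TARGET)
    budget = min(budget, candidates.size)
    chosen = rng.choice(candidates, size=budget, replace=False)
    X_p = X.copy()
    X_p[chosen] += trigger                 # features change, labels do not
    return X_p, chosen

y_before = y.copy()
X_p, poisoned_nodes = poison_clean_label(X, y)
assert np.array_equal(y, y_before)         # clean-label: labels untouched
```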

Core claim

The central claim is that existing backdoor methods cannot succeed in clean-label settings because they leave the prediction logic unpoisoned, so triggers remain unimportant. Coordinating a poisoned node selector with a logic-poisoning trigger generator solves this by making triggers control the model's output on triggered test nodes, with no label modification during training.

What carries the argument

BA-Logic, which pairs a poisoned node selector that identifies suitable training nodes with a logic-poisoning trigger generator that produces triggers capable of dominating the GNN's internal decision process.
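
A compact sketch of how the two components could interact, assuming a frozen one-layer GCN surrogate and simplified stand-in losses; the paper's actual selector criterion and its logic-poisoning objectives (Eq. 7 and Eq. 10) are not reproduced here.

```python
# Sketch of selector/generator coordination against a frozen one-layer
# GCN surrogate. Both losses are simplified stand-ins, not the paper's
# objectives: `target_loss` pushes triggered nodes toward TARGET, and
# `importance_loss` demands the trigger alone (original features zeroed)
# still yields TARGET, a crude proxy for "the trigger dominates the logic".
import torch
import torch.nn.functional as F

torch.manual_seed(0)
N, D, C, TARGET = 100, 16, 5, 3
X = torch.randn(N, D)
A_hat = (torch.rand(N, N) < 0.05).float() + torch.eye(N)  # self-loops
A_norm = A_hat / A_hat.sum(1, keepdim=True)               # row-normalized
W = torch.randn(D, C)                                     # frozen surrogate weights

def surrogate(x):
    return A_norm @ x @ W                                 # one propagation step

# Poisoned node selector (heuristic stand-in): among nodes already labeled
# TARGET, pick those the surrogate already leans toward most strongly.
y = torch.randint(0, C, (N,))
scores = surrogate(X)[:, TARGET].masked_fill(y != TARGET, float("-inf"))
k = min(10, int((y == TARGET).sum()))
poisoned = scores.topk(k).indices

# Logic-poisoning trigger generator, reduced here to a single free
# feature vector optimized through the surrogate.
trigger = torch.zeros(D, requires_grad=True)
opt = torch.optim.Adam([trigger], lr=0.1)
tgt = torch.full((k,), TARGET, dtype=torch.long)
for _ in range(200):
    X_p = X.clone()
    X_p[poisoned] = X[poisoned] + trigger                 # attach trigger
    target_loss = F.cross_entropy(surrogate(X_p)[poisoned], tgt)
    X_only = torch.zeros_like(X)
    X_only[poisoned] = trigger                            # trigger alone
    importance_loss = F.cross_entropy(surrogate(X_only)[poisoned], tgt)
    loss = target_loss + importance_loss
    opt.zero_grad(); loss.backward(); opt.step()
```

Per the paper's own description, its hyperparameters T (Eq. 7) and β (Eq. 10) control the expected importance-score margin and the weight of the logic-poisoning loss; the crude `importance_loss` above only gestures at that machinery.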

If this is right

  • Attack success rates rise above those of prior graph backdoor methods when labels must stay fixed.
  • Triggers attached only at test time become the dominant factor in the model's output.
  • The poisoned logic persists through ordinary training without special detection steps.
  • The same coordination works across multiple real-world graph datasets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Defenses that rely solely on label-consistency checks would miss the attack entirely.
  • The same selector-generator pattern could be tested on other GNN tasks such as link prediction or graph classification.
  • Inspecting changes in node embedding importance or attention scores after training might reveal the poisoned logic before deployment; a saliency-based sketch of this check follows the list.
  • Extending the approach to dynamic or heterogeneous graphs would test whether the logic-poisoning effect generalizes beyond static homogeneous cases.
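
As promised in the third bullet, a sketch of the inspection idea: plain input-gradient saliency as a cheap stand-in for GNNExplainer. If attribution mass for many nodes concentrates on the same small feature subset, that shared pattern is a candidate planted trigger. All names and the 0.8 threshold below are illustrative assumptions.

```python
# Post-training audit sketch: input-gradient saliency as a stand-in for
# GNNExplainer. A shared, highly concentrated salient feature subset
# across many nodes is the footprint a planted trigger would leave.
import torch

torch.manual_seed(0)
N, D, C = 50, 16, 5
X = torch.randn(N, D)
A_hat = (torch.rand(N, N) < 0.1).float() + torch.eye(N)
A_norm = A_hat / A_hat.sum(1, keepdim=True)
W = torch.randn(D, C)                      # stand-in "trained" weights

def model(x):                              # stand-in one-layer GCN
    return A_norm @ x @ W

def feature_saliency(node):
    """Gradient magnitude of the winning logit w.r.t. the node's input
    features: a rough per-feature importance score."""
    x = X.clone().requires_grad_(True)
    logits = model(x)
    logits[node, logits[node].argmax()].backward()
    return x.grad[node].abs()

def concentration(sal, k=4):
    """Share of attribution mass carried by the k most salient features."""
    return (sal.topk(k).values.sum() / sal.sum()).item()

# Flag nodes whose decision rests almost entirely on a few features;
# the 0.8 threshold is arbitrary, chosen for illustration only.
flags = [n for n in range(N) if concentration(feature_saliency(n)) > 0.8]
```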

Load-bearing premise

The inner prediction logic of a GNN can be altered independently of label changes so that a specific trigger reliably overrides normal predictions at test time.

What would settle it

Training a GNN with the proposed selector and generator on a standard dataset and measuring whether triggered test nodes are classified as the target class at rates higher than baseline clean-label attacks.
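
A sketch of that measurement, assuming a trained classifier exposed as a `predict` function and a candidate trigger; the toy linear `predict` below is a hypothetical stand-in for a GNN's test-time inference, which in the real setting also depends on the graph around each node.

```python
# ASR measurement sketch. `predict` is a hypothetical stand-in for a
# trained GNN's inference; the paper's setting also conditions on the
# surrounding graph, which this sketch abstracts away.
import numpy as np

def attack_success_rate(predict, X_test, trigger, target):
    """Fraction of non-target test nodes that flip to `target` once the
    trigger is attached at test time only."""
    base = predict(X_test)
    mask = base != target                  # skip nodes already predicted target
    if not mask.any():
        return 0.0
    X_trig = X_test.copy()
    X_trig[mask] += trigger                # test-time trigger attachment
    return float((predict(X_trig)[mask] == target).mean())

# Toy usage with a linear stand-in classifier.
rng = np.random.default_rng(1)
X_test = rng.normal(size=(40, 16)).astype(np.float32)
W = rng.normal(size=(16, 5)).astype(np.float32)
predict = lambda X: (X @ W).argmax(axis=1)
trigger = np.zeros(16, dtype=np.float32)
trigger[:4] = 3.0
print(attack_success_rate(predict, X_test, trigger, target=3))
```

The claim is settled in BA-Logic's favor if this number, under a matched poisoning budget, sits clearly above the same measurement for baseline clean-label attacks.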

Figures

Figures reproduced from arXiv: 2603.05004 by Bin Ma, Enyan Dai, Yuxiang Zhang.

Figure 1. To create a backdoored graph, the adversary attaches a selected set of poisoned …
Figure 1. Illustration of graph backdoor attacks under both the …
Figure 2. GNNExplainer's visualization of important …
Figure 3. Framework of Ba-Logic.
Figure 4. Generalization ability of Ba-Logic. (a) ASR (%) with varying surrogate and target models on Arxiv. (b-c) ASR (%) on graph classification and edge prediction.
Figure 5. (a-b) Ablation studies on GCN and GIN. (c) Hyperparameter sensitivity analysis on Arxiv.
Figure 6. Performance of Ba-Logic on sampling-based GNNs.
Figure 7. Clean accuracy of backdoored models.
Figure 8. Hyperparameter sensitivity analysis of Ba-Logic.
Figure 9. Comparison of IRT distribution.
Figure 10. Impact of attack budget.
Figure 11. Performance of Ba-Logic on synthetic graphs with different feature-label correlations.
Figure 12. Performance of Ba-Logic under sparse, noisy, and imbalanced label settings.
Figure 13. Jaccard overlap of poisoned node selection under different label settings.
Figure 14. Performance of Ba-Logic under noisy and partially accessible feature settings.
Original abstract

Graph Neural Networks (GNNs) have achieved remarkable results in various tasks. Recent studies reveal that graph backdoor attacks can poison the GNN model to predict test nodes with triggers attached as the target class. However, apart from injecting triggers to training nodes, these graph backdoor attacks generally require altering the labels of trigger-attached training nodes into the target class, which is impractical in real-world scenarios. In this work, we focus on the clean-label graph backdoor attack, a realistic but understudied topic where training labels are not modifiable. According to our preliminary analysis, existing graph backdoor attacks generally fail under the clean-label setting. Our further analysis identifies that the core failure of existing methods lies in their inability to poison the prediction logic of GNN models, leading to the triggers being deemed unimportant for prediction. Therefore, we study a novel problem of effective clean-label graph backdoor attacks by poisoning the inner prediction logic of GNN models. We propose BA-Logic to solve the problem by coordinating a poisoned node selector and a logic-poisoning trigger generator. Extensive experiments on real-world datasets demonstrate that our method effectively enhances the attack success rate and surpasses state-of-the-art graph backdoor attack competitors under clean-label settings. Our code is available at https://anonymous.4open.science/r/BA-Logic

Editorial analysis

A structured set of objections, weighed in public.

A referee report, a simulated authors' rebuttal, a circularity audit, and an axiom ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper identifies that existing clean-label graph backdoor attacks on GNNs fail because they do not poison the models' inner prediction logic, leaving triggers unimportant to predictions. It proposes BA-Logic, which coordinates a poisoned node selector with a logic-poisoning trigger generator to alter this logic under clean labels, and reports that extensive experiments on real-world datasets yield higher attack success rates than prior methods.

Significance. If the central mechanism is mechanistically validated, the work would demonstrate a practical route to clean-label backdoor attacks that reliably make triggers dominate GNN predictions, exposing a previously under-appreciated vulnerability in graph models used for node classification and similar tasks.

major comments (3)
  1. [Abstract and §3] The assertion that existing methods fail specifically because they leave the prediction logic unpoisoned is presented without quantitative support such as trigger importance scores, attention weights, or gradient attributions comparing poisoned vs. clean models; this makes the diagnosis of the 'core failure' difficult to verify.
  2. [§4] The coordination of the poisoned node selector and logic-poisoning trigger generator is claimed to poison inner prediction logic independently of label changes, yet no before/after analysis (e.g., parameter inspection, feature attribution maps, or an ablation isolating the generator from the selector) is described to rule out the alternative that success arises from selector bias toward high-influence nodes rather than genuine logic alteration.
  3. [§5] While the abstract states that the method 'surpasses state-of-the-art' under clean-label settings, the description supplies no dataset sizes, model architectures, attack-success-rate tables, or ablation results that would allow assessment of whether the reported gains are robust or merely reflect particular graph structures.
minor comments (2)
  1. [Abstract] The anonymous code link should be replaced with a permanent repository or removed if the review process requires reproducibility checks.
  2. [§4] Notation for the trigger generator and selector should be introduced with explicit equations or pseudocode early in §4 to clarify their interaction.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback on our paper addressing clean-label backdoor attacks on GNNs via inner logic poisoning. We clarify the supporting analyses in Sections 3-5 and outline targeted revisions to strengthen the evidence for our claims.

Point-by-point responses
  1. Referee: [Abstract and §3] The assertion that existing methods fail specifically because they leave the prediction logic unpoisoned is presented without quantitative support such as trigger importance scores, attention weights, or gradient attributions comparing poisoned vs. clean models; this makes the diagnosis of the 'core failure' difficult to verify.

    Authors: Section 3 presents a preliminary analysis demonstrating that prior clean-label methods yield low attack success rates because triggers do not influence predictions, supported by comparative success rate drops when triggers are removed. To strengthen this, we will add explicit quantitative metrics including trigger importance scores via attention weights and gradient attributions on poisoned versus clean models in the revised Section 3. revision: yes

  2. Referee: [§4] The coordination of the poisoned node selector and logic-poisoning trigger generator is claimed to poison inner prediction logic independently of label changes, yet no before/after analysis (e.g., parameter inspection, feature attribution maps, or an ablation isolating the generator from the selector) is described to rule out the alternative that success arises from selector bias toward high-influence nodes rather than genuine logic alteration.

    Authors: BA-Logic's design ensures the trigger generator directly modifies prediction logic while the selector only identifies nodes; Section 5 already includes ablations isolating the generator's contribution from the selector. We will add before/after feature attribution maps and parameter inspections in the revised Section 4 to explicitly rule out selector bias as the sole source of success. revision: yes

  3. Referee: [§5] While the abstract states that the method 'surpasses state-of-the-art' under clean-label settings, the description supplies no dataset sizes, model architectures, attack-success-rate tables, or ablation results that would allow assessment of whether the reported gains are robust or merely reflect particular graph structures.

    Authors: The full Section 5 details dataset sizes (e.g., Cora: 2708 nodes, CiteSeer: 3327 nodes), architectures (GCN, GAT, GraphSAGE), comprehensive ASR tables versus baselines, and component ablations across multiple real-world graphs to confirm robustness. We will expand the abstract with a brief summary of key experimental settings for accessibility. revision: partial

Circularity Check

0 steps flagged

No significant circularity; empirical attack design without derivations or self-referential reductions

Full rationale

The paper presents BA-Logic as a coordinated selector-generator method for clean-label graph backdoor attacks, supported by preliminary analysis and extensive experiments on real-world datasets. No equations, mathematical derivations, or first-principles claims are present that could reduce to fitted inputs or self-definitions by construction. The core claim (that selector-generator coordination poisons inner prediction logic) is positioned as an empirical solution rather than a derived result; success is measured via attack success rate comparisons, not via any chain that loops back to the method's own parameters or prior self-citations as load-bearing. This is a standard empirical contribution with no circularity patterns matching the enumerated kinds.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the domain assumption that GNN prediction logic can be isolated and poisoned without label changes. The method introduces two new components whose effectiveness is asserted but not derived from first principles.

axioms (1)
  • domain assumption GNNs possess an identifiable inner prediction logic that can be altered independently of training labels
    Invoked to explain why prior attacks fail and to motivate the new selector-generator design.
invented entities (1)
  • logic-poisoning trigger generator no independent evidence
    purpose: Generates triggers that alter internal GNN decision paths under clean labels
    New component proposed to address the identified failure mode of prior methods.

pith-pipeline@v0.9.0 · 5543 in / 1254 out tokens · 59232 ms · 2026-05-15T16:12:01.904767+00:00 · methodology

