Identifying Backdoored Graphs in Graph Neural Network Training: An Explanation-Based Approach with Novel Metrics

Binghui Wang; Jane Downer; Ren Wang

arxiv: 2403.18136 · v3 · submitted 2024-03-26 · 💻 cs.LG · cs.AI

Pith reviewed 2026-05-24 02:45 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords backdoor detection · graph neural networks · GNN explanations · backdoor attacks · detection metrics · graph classification · model security · adaptive attack

The pith

Seven metrics derived from GNN explanation outputs can identify backdoored graphs with high detection performance.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to establish that backdoor attacks on graph neural networks can be detected by converting secondary outputs from standard explanation mechanisms into seven new metrics. Single-metric methods miss the range of backdoor behaviors, so this multi-metric extraction offers broader coverage. A sympathetic reader would care because backdoors undermine GNN reliability in classification tasks, and the method supplies a practical detection layer without requiring new explanation tools. The authors introduce an adaptive attack and evaluate on benchmark datasets against varied attack models to support the claim.

Core claim

By extracting and transforming secondary outputs from GNN explanation mechanisms, the authors create seven innovative metrics that enable effective detection of backdoor attacks on GNNs. Testing across multiple benchmark datasets and various attack models, including a newly developed adaptive attack, demonstrates high detection performance.

What carries the argument

Seven innovative metrics created by extracting and transforming secondary outputs from graph-level GNN explanation mechanisms.

If this is right

The detection method achieves high performance on multiple benchmark datasets.
It works effectively against various backdoor attack models.
The adaptive attack provides a rigorous evaluation tool for such detectors.
The approach advances safeguarding of GNNs against backdoor attacks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The metrics could be checked for transfer to anomaly detection in other graph tasks beyond backdoors.
Explanation outputs, normally used for interpretability, appear to carry security signals that might apply to additional model types.
The method could be inserted into training workflows to flag issues without new infrastructure.

Load-bearing premise

Secondary outputs from standard GNN explanation mechanisms contain sufficient distinguishable signals for backdoor detection across varied attack models and datasets, without the metrics being tuned post-hoc to the specific evaluation data.

What would settle it

A test showing that the seven metrics fail to separate clean and backdoored graphs on a held-out dataset with an unseen attack model would falsify the high-detection claim.
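One way to make this falsification test concrete is to score each metric by how well it ranks backdoored graphs above clean ones on held-out data. The sketch below is illustrative only: the AUROC criterion, the input layout, and the helper names `auroc` and `separation_report` are assumptions for this example, not part of the paper. An AUROC near 0.5 for every metric on a held-out dataset with an unseen attack would be the kind of failure described above.

```python
import numpy as np

def auroc(clean_scores, backdoor_scores):
    """Probability that a random backdoored graph scores above a random clean one."""
    clean = np.asarray(clean_scores, dtype=float)
    bad = np.asarray(backdoor_scores, dtype=float)
    # Compare every backdoored score with every clean score; ties count as half.
    greater = (bad[:, None] > clean[None, :]).sum()
    ties = (bad[:, None] == clean[None, :]).sum()
    return float((greater + 0.5 * ties) / (bad.size * clean.size))

def separation_report(metric_scores, is_backdoor):
    """metric_scores: dict mapping a (hypothetical) metric name to per-graph scores.

    is_backdoor: boolean labels for the same held-out graphs.
    Returns one AUROC per metric.
    """
    labels = np.asarray(is_backdoor, dtype=bool)
    report = {}
    for name, scores in metric_scores.items():
        scores = np.asarray(scores, dtype=float)
        report[name] = auroc(scores[~labels], scores[labels])
    return report
```

A metric whose scores are lower for backdoored graphs would show an AUROC near 0 rather than near 1; both ends indicate separation, and only values close to 0.5 indicate none.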

Figures

Figures reproduced from arXiv: 2403.18136 by Binghui Wang, Jane Downer, Ren Wang.

**Figure 1.** Samples of the types of triggers used in our analysis.

**Figure 2.** GNNExplainer results on a backdoored graph from each dataset. The performance of the explanation, particularly its ability to isolate the trigger, was inconsistent across examples. This phenomenon also appears across different explainers; see the supplementary material for examples.

**Figure 3.** Detection metric distributions from a single dataset (MUTAG), attack (PA trigger size 4 in 20% of training data), and model (GIN architecture [43]). The figure represents 72 graphs (28 clean validation, 30 clean training, 14 backdoor training), and exemplifies varied metric effectiveness both within a dataset and across metric types. This single example is not indicative of all instances – see supplementar…

**Figure 4.** The graph with the adaptive trigger is much more faithful to the original structure than its random trigger counterpart.

**Figure 5.** Results obtained using the composite metric, across datasets and trigger types. The y-axis shows the F1 score when applying each NPMR as the detection rule. An NPMR of 1 identifies many backdoor instances but at the cost of more false positives and fewer true negatives. Conversely, an NPMR of 7 fails to detect almost all backdoor instances. F1 peaks at an NPMR of around 2 or 3 in most cases. While adaptive triggers…

**Figure 6.** F1 scores under different trigger sizes and attack success rates using our composite metric. Each subplot further shows how these trends vary for NPMR ranging between 2 and 4. Both trigger size and attack success rate are positively correlated with the performance of our composite metric.

**Figure 7.** The rate at which each metric is included among exactly k positive metrics, measured across all datasets and attack types.
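The detection rule described in the Figure 5 caption can be sketched as a simple vote over per-metric flags. This is a minimal illustration under stated assumptions: the page does not expand the acronym NPMR, so the code assumes it is the minimum number of metrics that must flag a graph before the graph is labeled as backdoored. The function names and the boolean input layout are hypothetical, not taken from the paper.

```python
import numpy as np

def npmr_predict(metric_flags, k):
    """Label a graph as backdoored when at least k metrics flag it.

    metric_flags: array of shape (n_graphs, n_metrics), truthy where a metric is positive.
    k: assumed meaning of NPMR (minimum number of positive metrics required).
    """
    flags = np.asarray(metric_flags, dtype=bool)
    return flags.sum(axis=1) >= k

def f1_score(y_true, y_pred):
    """F1 for the positive (backdoored) class; returns 0.0 when undefined."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = int(np.sum(y_true & y_pred))
    fp = int(np.sum(~y_true & y_pred))
    fn = int(np.sum(y_true & ~y_pred))
    denom = 2 * tp + fp + fn
    return 0.0 if denom == 0 else 2 * tp / denom

def f1_by_npmr(metric_flags, y_true):
    """Sweep k from 1 to the number of metrics and report F1 for each setting."""
    n_metrics = np.asarray(metric_flags).shape[1]
    return {k: f1_score(y_true, npmr_predict(metric_flags, k))
            for k in range(1, n_metrics + 1)}
```

With seven metrics, a small k flags more graphs (more true positives, more false positives) and a large k flags almost none, which matches the trade-off the caption describes.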

Original abstract

Graph Neural Networks (GNNs) have gained popularity in numerous domains, yet they are vulnerable to backdoor attacks that can compromise their performance and ethical application. The detection of these attacks is crucial for maintaining the reliability and security of GNN classification tasks, but existing methods are often inflexible, relying on single metrics that fail to capture the full range of backdoor behaviors. Recognizing the challenge in detecting such intrusions, we devised a novel detection method that creatively leverages graph-level explanations. By extracting and transforming secondary outputs from GNN explanation mechanisms, we developed seven innovative metrics for effective detection of backdoor attacks on GNNs. Additionally, we develop an adaptive attack to rigorously evaluate our approach. We test our method on multiple benchmark datasets and examine its efficacy against various attack models. Our results show that our method can achieve high detection performance, marking a significant advancement in safeguarding GNNs against backdoor attacks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper gives seven metrics from GNN explanation outputs plus an adaptive attack for backdoor detection, but the evaluation setup leaves open whether those metrics were tuned on the test data.

The main advance is taking secondary outputs from standard explanation tools and turning them into seven separate metrics instead of relying on one. They also introduce an adaptive attack to make the test harder than fixed backdoors. That combination is a step past the single-metric baselines they cite, and running the detector on multiple benchmarks shows it can flag the attacked graphs in their experiments. The adaptive attack itself is useful because it checks whether the detector collapses when the attacker knows the defense exists. Those parts are concrete and worth looking at if you work on GNN robustness.

The soft spot is the one flagged in the stress-test note. The abstract describes developing the metrics and then reporting high performance on the same benchmark datasets, with no indication that the metric definitions, combinations, or thresholds were fixed before seeing the test results. If any selection happened after looking at the evaluation data, the numbers do not demonstrate that the signals generalize. The paper would need to show the metric formulas were locked in advance and include ablations that separate the contribution of each metric.

This is for researchers who need a practical starting point for backdoor detection in GNNs rather than a broad robustness survey. A reader who wants to re-implement the metrics and test them on new data could extract value from the idea, provided the methods section clarifies the pre-specification issue. It deserves a serious referee to examine the exact metric definitions and the experimental protocol for signs of post-hoc fitting.

Referee Report

2 major / 2 minor

Summary. The paper claims to detect backdoored graphs during GNN training by extracting seven novel metrics from secondary outputs of standard GNN explanation mechanisms (e.g., GNNExplainer). It introduces an adaptive attack for rigorous testing and reports high detection performance across multiple benchmark datasets and attack models, positioning the approach as a flexible advancement over single-metric detectors.

Significance. If the seven metrics are shown to be defined and thresholded independently of the evaluation data and to generalize across attack variants, the work could offer a practical, explanation-based defense for GNNs that leverages existing tools rather than requiring new architectures. The adaptive attack component would also strengthen evaluation standards in the subfield.

major comments (2)

[Abstract, §3] Abstract and §3 (method description): the seven metrics are described as 'developed' and 'innovative' without any statement that their definitions, transformations of explanation outputs, or decision thresholds were fixed on data disjoint from the benchmark datasets used to claim 'high detection performance.' If any selection, weighting, or thresholding occurred on the reported test cases, the central generalization claim is circular and the results do not demonstrate robustness.
[§4] §4 (evaluation) and adaptive attack description: no details are given on whether the adaptive attack was constructed with knowledge of the seven metrics or their combination rule; if the attack was adapted post-metric design, the 'rigorous evaluation' does not test against a truly unknown detector and weakens the advancement claim.

minor comments (2)

[§3] Notation for the seven metrics is introduced without an explicit table or equation listing their exact formulas; this makes reproducibility difficult even if the high-level idea is sound.
[Abstract] The abstract states results on 'multiple benchmark datasets' but does not name them or report per-dataset metrics; adding this would strengthen the significance paragraph.

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the constructive feedback. We address each major comment below and will revise the manuscript to provide the requested clarifications on metric development and adaptive attack construction.

Referee: [Abstract, §3] Abstract and §3 (method description): the seven metrics are described as 'developed' and 'innovative' without any statement that their definitions, transformations of explanation outputs, or decision thresholds were fixed on data disjoint from the benchmark datasets used to claim 'high detection performance.' If any selection, weighting, or thresholding occurred on the reported test cases, the central generalization claim is circular and the results do not demonstrate robustness.

Authors: We agree the manuscript does not explicitly address this. The seven metrics were derived from general properties of GNN explanation outputs (e.g., node importance distributions and subgraph patterns) observed across multiple graph types, with transformations and thresholds determined via cross-validation on held-out validation splits disjoint from the final benchmark test sets. In the revised manuscript we will add a subsection in §3 detailing this process and confirming the use of disjoint data to support the generalization claims. revision: yes
Referee: [§4] §4 (evaluation) and adaptive attack description: no details are given on whether the adaptive attack was constructed with knowledge of the seven metrics or their combination rule; if the attack was adapted post-metric design, the 'rigorous evaluation' does not test against a truly unknown detector and weakens the advancement claim.

Authors: The adaptive attack was designed concurrently based on general knowledge of GNN explanation mechanisms and backdoor insertion strategies, without reference to the specific seven metrics or their combination rule. We will expand §4 to include a description of the attack development timeline and its independence from the detector details, thereby strengthening the evaluation narrative. revision: yes
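The pre-specification at issue in the first exchange can be illustrated with a small sketch in which thresholds are fixed from clean validation graphs only and then applied unchanged to a disjoint test set. Everything here is an assumption made for illustration: the percentile rule, the one-sided "higher is more suspicious" convention, and the function names are hypothetical and are not the procedure described in the paper or the rebuttal.

```python
import numpy as np

def fit_thresholds(clean_val_scores, upper_pct=95.0):
    """Fix one threshold per metric using only clean validation graphs.

    clean_val_scores: array of shape (n_val_graphs, n_metrics).
    upper_pct: assumed percentile cut-off; chosen before any test data is seen.
    """
    scores = np.asarray(clean_val_scores, dtype=float)
    return np.percentile(scores, upper_pct, axis=0)

def flag_metrics(test_scores, thresholds):
    """Apply the frozen thresholds to test graphs; True marks a positive metric."""
    return np.asarray(test_scores, dtype=float) > thresholds
```

Because `fit_thresholds` never sees test graphs, any detection performance measured on the flags from `flag_metrics` cannot have been tuned on the evaluation data, which is the property the referee asks the authors to document.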

Circularity Check

0 steps flagged

No circularity; metrics defined from explanation outputs and evaluated on benchmarks

The abstract states that seven metrics are developed by extracting and transforming secondary outputs from standard GNN explanation mechanisms, then tested on multiple benchmark datasets against various attack models. No equations, self-citations, or parameter-fitting steps are shown that would reduce the metrics or detection thresholds to quantities fitted on the reported test data. The derivation chain is empirical and self-contained against external benchmarks; no load-bearing step matches any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities; all such elements would need to be extracted from the full manuscript.

Reference graph

Works this paper leans on

51 extracted references · 51 canonical work pages · 1 internal anchor

[1] Barabási, A.L., Albert, R.: Emergence of scaling in random networks. Science (1999)
[2] Borgwardt, K.M., Ong, C.S., Schönauer, S., Vishwanathan, S.V.N., Smola, A.J., Kriegel, H.P.: Protein function prediction via graph kernels. Bioinformatics 21 (06 2005). https://doi.org/10.1093/bioinformatics/bti1007
[3] Chen, K., Meng, Y., Sun, X., Guo, S., Zhang, T., Li, J., Fan, C.: Badpre: Task-agnostic backdoor attacks to pre-trained nlp foundation models. In: International Conference on Learning Representations (2022)
[4] Chen, X., Liu, C., Li, B., Lu, K., Song, D.: Targeted backdoor attacks on deep learning systems using data poisoning. arXiv (2017)
[5] Clements, J., Lao, Y.: Hardware trojan attacks on neural networks. arXiv preprint arXiv:1806.05768 (2018)
[6] Debnath, A.K., Lopez de Compadre, R.L., Debnath, G., Shusterman, A.J., Hansch, C.: Structure-activity relationship of mutagenic aromatic and heteroaromatic nitro compounds. Correlation with molecular orbital energies and hydrophobicity. Journal of medicinal chemistry (1991)
[7] Gan, L., Li, J., Zhang, T., Li, X., Meng, Y., Wu, F., Yang, Y., Guo, S., Fan, C.: Triggerless backdoor attack for nlp tasks with clean labels. In: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. pp. 2942–2952 (2022)
[8] Gao, Y., Xu, C., Wang, D., Chen, S., Ranasinghe, D.C., Nepal, S.: Strip: A defence against trojan attacks on deep neural networks. In: ACSAC (2019)
[9] Ge, Y., Wang, Q., Yu, J., Shen, C., Li, Q.: Data poisoning and backdoor attacks on audio intelligence systems. IEEE Communications Magazine (2023)
[10] Gilbert, E.N.: Random graphs. Ann. Math. Stat. (1959)
[11] Gu, T., Dolan-Gavitt, B., Garg, S.: Badnets: Identifying vulnerabilities in the machine learning model supply chain. In: Proc. of Machine Learning and Computer Security Workshop (2017)
[12] Guan, Z., Du, M., Liu, N.: Xgbd: Explanation-guided graph backdoor detection. arXiv preprint arXiv:2308.04406 (2023)
[13] Guo, H., Chen, X., Guo, J., Xiao, L., Yan, Q.: Masterkey: Practical backdoor attack against speaker verification systems. In: Proceedings of the 29th Annual International Conference on Mobile Computing and Networking. pp. 1–15 (2023)
[14] Guo, W., Wang, L., Xing, X., Du, M., Song, D.: Tabor: A highly accurate approach to inspecting and restoring trojan backdoors in ai systems. arXiv preprint arXiv:1908.01763 (2019)
[15] Hamilton, W., Ying, Z., Leskovec, J.: Inductive representation learning on large graphs. In: NeurIPS (2017)
[16] Hassen, M., Chan, P.K.: Scalable function call graph-based malware classification. In: CODASPY (2017)
[17] Jiang, B., Li, Z.: Defending against backdoor attack on graph neural network by explainability. arXiv preprint arXiv:2209.02902 (2022)
[18] Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. In: ICLR (2017)
[19] Kokhlikyan, N., Miglani, V., Martin, M., Wang, E., Alsallakh, B., Reynolds, J., Melnikov, A., Kliushkina, N., Araya, C., Yan, S., Reblitz-Richardson, O.: Captum: A unified and generic model interpretability library for pytorch (2020)
[20] Li, W., Yu, J., Ning, X., Wang, P., Wei, Q., Wang, Y., Yang, H.: Hu-fu: Hardware and software collaborative attack framework against neural networks. In: ISVLSI. IEEE (2018)
[21] Liu, K., Dolan-Gavitt, B., Garg, S.: Fine-pruning: Defending against backdooring attacks on deep neural networks. In: RAID (2018)
[22] Liu, Y., Lee, W.C., Tao, G., Ma, S., Aafer, Y., Zhang, X.: Abs: Scanning neural networks for back-doors by artificial brain stimulation. In: SIGSAC (2019)
[23] Liu, Y., Ma, S., Aafer, Y., Lee, W.C., Zhai, J., Wang, W., Zhang, X.: Trojaning attack on neural networks. In: NDSS (2018)
[24] Liu, Y., Xie, Y., Srivastava, A.: Neural trojans. In: 2017 IEEE International Conference on Computer Design (ICCD). IEEE (2017)
[25] Luo, D., Cheng, W., Xu, D., Yu, W., Zong, B., Chen, H., Zhang, X.: Parameterized explainer for graph neural network. Advances in neural information processing systems 33, 19620–19631 (2020)
[26] Pal, S., Wang, R., Yao, Y., Liu, S.: Towards understanding how self-training tolerates data backdoor poisoning. arXiv preprint arXiv:2301.08751 (2023)
[27] Pan, X., Zhang, M., Sheng, B., Zhu, J., Yang, M.: Hidden trigger backdoor attack on NLP models via linguistic style manipulation. In: 31st USENIX Security Symposium (USENIX Security 22). pp. 3611–3628 (2022)
[28] Pope, P.E., Kolouri, S., Rostami, M., Martin, C.E., Hoffmann, H.: Explainability methods for graph convolutional neural networks. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 10772–10781 (2019)
[29] Qi, F., Li, M., Chen, Y., Zhang, Z., Liu, Z., Wang, Y., Sun, M.: Hidden killer: Invisible textual backdoor attacks with syntactic trigger. In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). pp. 443–453 (2021)
[30] Riesen, K., Bunke, H.: Iam graph database repository for graph based pattern recognition and machine learning. In: Da Vitora Lobo, N. et al. (Eds.), SSPR/SPR 2008 pp. 287–297 (2008)
[31] Roy, N., Hassanieh, H., Roy Choudhury, R.: Backdoor: Making microphones hear inaudible sounds. In: Proceedings of the 15th Annual International Conference on Mobile Systems, Applications, and Services. pp. 2–14 (2017)
[32] Salem, A., Wen, R., Backes, M., Ma, S., Zhang, Y.: Dynamic backdoor attacks against machine learning models. In: EuroSP (2022)
[33] Shi, C., Zhang, T., Li, Z., Phan, H., Zhao, T., Wang, Y., Liu, J., Yuan, B., Chen, Y.: Audio-domain position-independent backdoor attack via unnoticeable triggers. In: Proceedings of the 28th Annual International Conference on Mobile Computing And Networking. pp. 583–595 (2022)
[34] Tang, J., Zhang, J., Yao, L., Li, J., Zhang, L., Su, Z.: Arnetminer: extraction and mining of academic social networks. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. p. 990–998. KDD ’08, Association for Computing Machinery, New York, NY, USA (2008). https://doi.org/10.1145/1401890.1402008, https...
[35] Tran, B., Li, J., Madry, A.: Spectral signatures in backdoor attacks. In: NeurIPS (2018)
[36] Veličković, P., Cucurull, G., Casanova, A., Romero, A., Lio, P., Bengio, Y.: Graph attention networks. In: ICLR (2018)
[37] Wang, B., Cao, X., Jia, J., Gong, N.Z.: On certifying robustness against backdoor attacks via randomized smoothing. In: CVPR Workshop (2020)
[38] Wang, B., Yao, Y., Shan, S., Li, H., Viswanath, B., Zheng, H., Zhao, B.Y.: Neural cleanse: Identifying and mitigating backdoor attacks in neural networks. In: IEEE S&P (2019)
[39] Wang, R., Zhang, G., Liu, S., Chen, P.Y., Xiong, J., Wang, M.: Practical detection of trojan neural networks: Data-limited and data-free cases. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXIII 16. pp. 222–238. Springer (2020)
[40] Watts, D.J., Strogatz, S.H.: Collective dynamics of ‘small-world’ networks. Nature (1998)
[41] Weber, M., Xu, X., Karlaš, B., Zhang, C., Li, B.: Rab: Provable robustness against backdoor attacks. In: 2023 IEEE Symposium on Security and Privacy (SP). pp. 1311–1328. IEEE (2023)
[42] Xi, Z., Pang, R., Ji, S., Wang, T.: Graph backdoor. In: 30th USENIX Security Symposium (USENIX Security 21). pp. 1523–1540 (2021)
[43] Xu, K., Hu, W., Leskovec, J., Jegelka, S.: How powerful are graph neural networks? In: International Conference on Learning Representations (2019)
[44] Yanardag, P., Vishwanathan, S.: Deep graph kernels. In: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. p. 1365–1374. KDD ’15, Association for Computing Machinery, New York, NY, USA (2015). https://doi.org/10.1145/2783258.2783417
[45] Yao, Y., Li, H., Zheng, H., Zhao, B.Y.: Latent backdoor attacks on deep neural networks. In: CCS (2019)
[46] Ying, R., Bourgeois, D., You, J., Zitnik, M., Leskovec, J.: Gnnexplainer: Generating explanations for graph neural networks. In: Advances in Neural Information Processing Systems 32 (2019)
[47] Zhang, X.M., Liang, L., Liu, L., Tang, M.J.: Graph neural networks and their current applications in bioinformatics. Frontiers in Genetics 12 (2021). https://doi.org/10.3389/fgene.2021.690049, https://www.frontiersin.org/articles/10.3389/fgene.2021.690049
[48] Zhang, Z., Jia, J., Wang, B., Gong, N.Z.: Backdoor attacks to graph neural networks. In: Proceedings of the 26th ACM Symposium on Access Control Models and Technologies. pp. 15–26 (2021)
[49] Zhao, S., Ma, X., Zheng, X., Bailey, J., Chen, J., Jiang, Y.G.: Clean-label backdoor attacks on video recognition models. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 14443–14452 (2020)
[50] Zhou, J., Cui, G., Hu, S., Zhang, Z., Yang, C., Liu, Z., Wang, L., Li, C., Sun, M.: Graph neural networks: A review of methods and applications. AI Open 1, 57–81 (2020). https://doi.org/10.1016/j.aiopen.2021.01.001, https://www.sciencedirect.com/science/article/pii/S2666651021000012
[51] In the first stage (line 4), they train fs 0 on clean data to obtain fs, and in each subsequent stage (line 22) they retrain fs 0 from scratch, on a dataset attacked with the most recent iteration of the trigger generator. Edge generator fgen trains iteratively, in multiple rounds, over a subset of clean graphs D designated for attack (denoted as DB) (line...

[1] [1]

science (1999) 3, 4

Barabási, A.L., Albert, R.: Emergence of scaling in random networks. science (1999) 3, 4

work page 1999

[2] [2]

Bioinformatics21 (06 2005)

Borgwardt, K.M., Ong, C.S., Schönauer, S., Vishwanathan, S.V.N., Smola, A.J., Kriegel, H.P.: Protein function prediction via graph kernels. Bioinformatics21 (06 2005). https://doi.org/10.1093/bioinformatics/bti1007, https://doi.org/ 10.1093/bioinformatics/bti1007 11, 7

work page doi:10.1093/bioinformatics/bti1007 2005

[3] [3]

In: International Conference on Learning Representations (2022) 2

Chen, K., Meng, Y., Sun, X., Guo, S., Zhang, T., Li, J., Fan, C.: Badpre: Task- agnostic backdoor attacks to pre-trained nlp foundation models. In: International Conference on Learning Representations (2022) 2

work page 2022

[4] [4]

arXiv (2017) 2

Chen, X., Liu, C., Li, B., Lu, K., Song, D.: Targeted backdoor attacks on deep learning systems using data poisoning. arXiv (2017) 2

work page 2017

[5] [5]

Hardware Trojan Attacks on Neural Networks

Clements, J., Lao, Y.: Hardware trojan attacks on neural networks. arXiv preprint arXiv:1806.05768 (2018) 2

work page internal anchor Pith review Pith/arXiv arXiv 2018

[6] [6]

correlation with molecular orbital energies and hydrophobicity

Debnath, A.K., Lopez de Compadre, R.L., Debnath, G., Shusterman, A.J., Han- sch, C.: Structure-activity relationship of mutagenic aromatic and heteroaromatic nitro compounds. correlation with molecular orbital energies and hydrophobicity. Journal of medicinal chemistry (1991) 11, 7

work page 1991

[7] [7]

In: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Gan, L., Li, J., Zhang, T., Li, X., Meng, Y., Wu, F., Yang, Y., Guo, S., Fan, C.: Triggerless backdoor attack for nlp tasks with clean labels. In: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. pp. 2942–2952 (2022) 2

work page 2022

[8] [8]

In: ACSAC (2019) 2

Gao, Y., Xu, C., Wang, D., Chen, S., Ranasinghe, D.C., Nepal, S.: Strip: A defence against trojan attacks on deep neural networks. In: ACSAC (2019) 2

work page 2019

[9] [9]

IEEE Communications Magazine (2023) 2

Ge, Y., Wang, Q., Yu, J., Shen, C., Li, Q.: Data poisoning and backdoor attacks on audio intelligence systems. IEEE Communications Magazine (2023) 2

work page 2023

[10] [10]

Gilbert, E.N.: Random graphs. Ann. Math. Stat. (1959) 3, 4

work page 1959

[11] [11]

In: Proc

Gu, T., Dolan-Gavitt, B., Garg, S.: Badnets: Identifying vulnerabilities in the ma- chine learning model supply chain. In: Proc. of Machine Learning and Computer Security Workshop (2017) 2

work page 2017

[12] [12]

arXiv preprint arXiv:2308.04406 (2023) 3

Guan, Z., Du, M., Liu, N.: Xgbd: Explanation-guided graph backdoor detection. arXiv preprint arXiv:2308.04406 (2023) 3

work page arXiv 2023

[13] [13]

In: Proceedings of the 29th Annual International Conference on Mobile Computing and Networking

Guo, H., Chen, X., Guo, J., Xiao, L., Yan, Q.: Masterkey: Practical backdoor attack against speaker verification systems. In: Proceedings of the 29th Annual International Conference on Mobile Computing and Networking. pp. 1–15 (2023) 2

work page 2023

[14] [14]

Guo, W., Wang, L., Xing, X., Du, M., Song, D.: Tabor: A highly accurate approach to inspecting and restoring trojan backdoors in AI systems. arXiv preprint arXiv:1908.01763 (2019)

[15] [15]

Hamilton, W., Ying, Z., Leskovec, J.: Inductive representation learning on large graphs. In: NeurIPS (2017)

[16] [16]

Hassen, M., Chan, P.K.: Scalable function call graph-based malware classification. In: CODASPY (2017)

[17] [17]

Jiang, B., Li, Z.: Defending against backdoor attack on graph neural network by explainability. arXiv preprint arXiv:2209.02902 (2022)

[18] [18]

Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. In: ICLR (2017)

[19] [19]

Kokhlikyan, N., Miglani, V., Martin, M., Wang, E., Alsallakh, B., Reynolds, J., Melnikov, A., Kliushkina, N., Araya, C., Yan, S., Reblitz-Richardson, O.: Captum: A unified and generic model interpretability library for PyTorch (2020)

[20] [20]

Li, W., Yu, J., Ning, X., Wang, P., Wei, Q., Wang, Y., Yang, H.: Hu-fu: Hardware and software collaborative attack framework against neural networks. In: ISVLSI. IEEE (2018)

[21] [21]

Liu, K., Dolan-Gavitt, B., Garg, S.: Fine-pruning: Defending against backdooring attacks on deep neural networks. In: RAID (2018)

[22] [22]

Liu, Y., Lee, W.C., Tao, G., Ma, S., Aafer, Y., Zhang, X.: Abs: Scanning neural networks for back-doors by artificial brain stimulation. In: SIGSAC (2019)

[23] [23]

Liu, Y., Ma, S., Aafer, Y., Lee, W.C., Zhai, J., Wang, W., Zhang, X.: Trojaning attack on neural networks. In: NDSS (2018)

[24] [24]

Liu, Y., Xie, Y., Srivastava, A.: Neural trojans. In: 2017 IEEE International Conference on Computer Design (ICCD). IEEE (2017)

[25] [25]

Luo, D., Cheng, W., Xu, D., Yu, W., Zong, B., Chen, H., Zhang, X.: Parameterized explainer for graph neural network. Advances in Neural Information Processing Systems 33, 19620–19631 (2020)

[26] [26]

Pal, S., Wang, R., Yao, Y., Liu, S.: Towards understanding how self-training tolerates data backdoor poisoning. arXiv preprint arXiv:2301.08751 (2023)

[27] [27]

Pan, X., Zhang, M., Sheng, B., Zhu, J., Yang, M.: Hidden trigger backdoor attack on NLP models via linguistic style manipulation. In: 31st USENIX Security Symposium (USENIX Security 22). pp. 3611–3628 (2022)

[28] [28]

Pope, P.E., Kolouri, S., Rostami, M., Martin, C.E., Hoffmann, H.: Explainability methods for graph convolutional neural networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 10772–10781 (2019)

[29] [29]

Qi, F., Li, M., Chen, Y., Zhang, Z., Liu, Z., Wang, Y., Sun, M.: Hidden killer: Invisible textual backdoor attacks with syntactic trigger. In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). pp. 443–453 (2021)

[30] [30]

Riesen, K., Bunke, H.: IAM graph database repository for graph based pattern recognition and machine learning. In: da Vitoria Lobo, N. et al. (eds.) SSPR/SPR 2008. pp. 287–297 (2008)

[31] [31]

Roy, N., Hassanieh, H., Roy Choudhury, R.: Backdoor: Making microphones hear inaudible sounds. In: Proceedings of the 15th Annual International Conference on Mobile Systems, Applications, and Services. pp. 2–14 (2017)

[32] [32]

Salem, A., Wen, R., Backes, M., Ma, S., Zhang, Y.: Dynamic backdoor attacks against machine learning models. In: EuroS&P (2022)

[33] [33]

Shi, C., Zhang, T., Li, Z., Phan, H., Zhao, T., Wang, Y., Liu, J., Yuan, B., Chen, Y.: Audio-domain position-independent backdoor attack via unnoticeable triggers. In: Proceedings of the 28th Annual International Conference on Mobile Computing And Networking. pp. 583–595 (2022)

[34] [34]

Tang, J., Zhang, J., Yao, L., Li, J., Zhang, L., Su, Z.: Arnetminer: extraction and mining of academic social networks. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. pp. 990–998. KDD ’08, Association for Computing Machinery, New York, NY, USA (2008). https://doi.org/10.1145/1401890.1402008, https...

[35] [35]

Tran, B., Li, J., Madry, A.: Spectral signatures in backdoor attacks. In: NeurIPS (2018)

[36] [36]

Veličković, P., Cucurull, G., Casanova, A., Romero, A., Lio, P., Bengio, Y.: Graph attention networks. In: ICLR (2018)

[37] [37]

Wang, B., Cao, X., Jia, J., Gong, N.Z.: On certifying robustness against backdoor attacks via randomized smoothing. In: CVPR Workshop (2020)

[38] [38]

Wang, B., Yao, Y., Shan, S., Li, H., Viswanath, B., Zheng, H., Zhao, B.Y.: Neural cleanse: Identifying and mitigating backdoor attacks in neural networks. In: IEEE S&P (2019)

[39] [39]

Wang, R., Zhang, G., Liu, S., Chen, P.Y., Xiong, J., Wang, M.: Practical detection of trojan neural networks: Data-limited and data-free cases. In: Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXIII 16. pp. 222–238. Springer (2020)

[40] [40]

Watts, D.J., Strogatz, S.H.: Collective dynamics of ‘small-world’ networks. Nature (1998)

[41] [41]

Weber, M., Xu, X., Karlaš, B., Zhang, C., Li, B.: Rab: Provable robustness against backdoor attacks. In: 2023 IEEE Symposium on Security and Privacy (SP). pp. 1311–1328. IEEE (2023)

[42] [42]

Xi, Z., Pang, R., Ji, S., Wang, T.: Graph backdoor. In: 30th USENIX Security Symposium (USENIX Security 21). pp. 1523–1540 (2021)

[43] [43]

Xu, K., Hu, W., Leskovec, J., Jegelka, S.: How powerful are graph neural networks? In: International Conference on Learning Representations (2019)

[44] [44]

Yanardag, P., Vishwanathan, S.: Deep graph kernels. In: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. pp. 1365–1374. KDD ’15, Association for Computing Machinery, New York, NY, USA (2015). https://doi.org/10.1145/2783258.2783417

[45] [45]

Yao, Y., Li, H., Zheng, H., Zhao, B.Y.: Latent backdoor attacks on deep neural networks. In: CCS (2019)

[46] [46]

Ying, R., Bourgeois, D., You, J., Zitnik, M., Leskovec, J.: Gnnexplainer: Generating explanations for graph neural networks. In: Advances in Neural Information Processing Systems 32 (2019)

[47] [47]

Zhang, X.M., Liang, L., Liu, L., Tang, M.J.: Graph neural networks and their current applications in bioinformatics. Frontiers in Genetics 12 (2021). https://doi.org/10.3389/fgene.2021.690049, https://www.frontiersin.org/articles/10.3389/fgene.2021.690049

[48] [48]

Zhang, Z., Jia, J., Wang, B., Gong, N.Z.: Backdoor attacks to graph neural networks. In: Proceedings of the 26th ACM Symposium on Access Control Models and Technologies. pp. 15–26 (2021)

[49] [49]

Zhao, S., Ma, X., Zheng, X., Bailey, J., Chen, J., Jiang, Y.G.: Clean-label backdoor attacks on video recognition models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 14443–14452 (2020)

[50] [50]

scores” corresponding to each non-existing edge. As the generator trains, this “score

Zhou, J., Cui, G., Hu, S., Zhang, Z., Yang, C., Liu, Z., Wang, L., Li, C., Sun, M.: Graph neural networks: A review of methods and applications. AI Open 1, 57–81 (2020). https://doi.org/10.1016/j.aiopen.2021.01.001, https://www.sciencedirect.com/science/article/pii/S2666651021000012

[51] [51]

In the first stage (line 4), they train fs 0 on clean data to obtain fs, and in each subsequent stage (line 22) they retrain fs 0 from scratch on a dataset attacked with the most recent iteration of the trigger generator. The edge generator fgen trains iteratively, in multiple rounds, over a subset of clean graphs D designated for attack (denoted as DB) (line 8-line 19)...