Identifying Backdoored Graphs in Graph Neural Network Training: An Explanation-Based Approach with Novel Metrics
Pith reviewed 2026-05-24 02:45 UTC · model grok-4.3
The pith
Seven metrics derived from GNN explanation outputs can identify backdoored graphs with high detection performance.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By extracting and transforming secondary outputs from GNN explanation mechanisms, the authors create seven innovative metrics that enable effective detection of backdoor attacks on GNNs. Testing across multiple benchmark datasets and various attack models, including a newly developed adaptive attack, demonstrates high detection performance.
What carries the argument
Seven innovative metrics created by extracting and transforming secondary outputs from graph-level GNN explanation mechanisms.
If this is right
- The detection method achieves high performance on multiple benchmark datasets.
- It works effectively against various backdoor attack models.
- The adaptive attack provides a rigorous evaluation tool for such detectors.
- The approach advances safeguarding of GNNs against backdoor attacks.
Where Pith is reading between the lines
- The metrics could be checked for transfer to anomaly detection in other graph tasks beyond backdoors.
- Explanation outputs, normally used for interpretability, appear to carry security signals that might apply to additional model types.
- The method could be inserted into training workflows to flag issues without new infrastructure.
Load-bearing premise
Secondary outputs from standard GNN explanation mechanisms contain sufficient distinguishable signals for backdoor detection across varied attack models and datasets, without the metrics being tuned post-hoc to the specific evaluation data.
What would settle it
A test showing that the seven metrics fail to separate clean and backdoored graphs on a held-out dataset with an unseen attack model would falsify the high-detection claim.
Figures
read the original abstract
Graph Neural Networks (GNNs) have gained popularity in numerous domains, yet they are vulnerable to backdoor attacks that can compromise their performance and ethical application. The detection of these attacks is crucial for maintaining the reliability and security of GNN classification tasks, but existing methods are often inflexible, relying on single metrics that fail to capture the full range of backdoor behaviors. Recognizing the challenge in detecting such intrusions, we devised a novel detection method that creatively leverages graph-level explanations. By extracting and transforming secondary outputs from GNN explanation mechanisms, we developed seven innovative metrics for effective detection of backdoor attacks on GNNs. Additionally, we develop an adaptive attack to rigorously evaluate our approach. We test our method on multiple benchmark datasets and examine its efficacy against various attack models. Our results show that our method can achieve high detection performance, marking a significant advancement in safeguarding GNNs against backdoor attacks.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to detect backdoored graphs during GNN training by extracting seven novel metrics from secondary outputs of standard GNN explanation mechanisms (e.g., GNNExplainer). It introduces an adaptive attack for rigorous testing and reports high detection performance across multiple benchmark datasets and attack models, positioning the approach as a flexible advancement over single-metric detectors.
Significance. If the seven metrics are shown to be defined and thresholded independently of the evaluation data and to generalize across attack variants, the work could offer a practical, explanation-based defense for GNNs that leverages existing tools rather than requiring new architectures. The adaptive attack component would also strengthen evaluation standards in the subfield.
major comments (2)
- [Abstract, §3] Abstract and §3 (method description): the seven metrics are described as 'developed' and 'innovative' without any statement that their definitions, transformations of explanation outputs, or decision thresholds were fixed on data disjoint from the benchmark datasets used to claim 'high detection performance.' If any selection, weighting, or thresholding occurred on the reported test cases, the central generalization claim is circular and the results do not demonstrate robustness.
- [§4] §4 (evaluation) and adaptive attack description: no details are given on whether the adaptive attack was constructed with knowledge of the seven metrics or their combination rule; if the attack was adapted post-metric design, the 'rigorous evaluation' does not test against a truly unknown detector and weakens the advancement claim.
minor comments (2)
- [§3] Notation for the seven metrics is introduced without an explicit table or equation listing their exact formulas; this makes reproducibility difficult even if the high-level idea is sound.
- [Abstract] The abstract states results on 'multiple benchmark datasets' but does not name them or report per-dataset metrics; adding this would strengthen the significance paragraph.
Simulated Author's Rebuttal
Thank you for the constructive feedback. We address each major comment below and will revise the manuscript to provide the requested clarifications on metric development and adaptive attack construction.
read point-by-point responses
-
Referee: [Abstract, §3] Abstract and §3 (method description): the seven metrics are described as 'developed' and 'innovative' without any statement that their definitions, transformations of explanation outputs, or decision thresholds were fixed on data disjoint from the benchmark datasets used to claim 'high detection performance.' If any selection, weighting, or thresholding occurred on the reported test cases, the central generalization claim is circular and the results do not demonstrate robustness.
Authors: We agree the manuscript does not explicitly address this. The seven metrics were derived from general properties of GNN explanation outputs (e.g., node importance distributions and subgraph patterns) observed across multiple graph types, with transformations and thresholds determined via cross-validation on held-out validation splits disjoint from the final benchmark test sets. In the revised manuscript we will add a subsection in §3 detailing this process and confirming the use of disjoint data to support the generalization claims. revision: yes
-
Referee: [§4] §4 (evaluation) and adaptive attack description: no details are given on whether the adaptive attack was constructed with knowledge of the seven metrics or their combination rule; if the attack was adapted post-metric design, the 'rigorous evaluation' does not test against a truly unknown detector and weakens the advancement claim.
Authors: The adaptive attack was designed concurrently based on general knowledge of GNN explanation mechanisms and backdoor insertion strategies, without reference to the specific seven metrics or their combination rule. We will expand §4 to include a description of the attack development timeline and its independence from the detector details, thereby strengthening the evaluation narrative. revision: yes
Circularity Check
No circularity; metrics defined from explanation outputs and evaluated on benchmarks
full rationale
The abstract states that seven metrics are developed by extracting and transforming secondary outputs from standard GNN explanation mechanisms, then tested on multiple benchmark datasets against various attack models. No equations, self-citations, or parameter-fitting steps are shown that would reduce the metrics or detection thresholds to quantities fitted on the reported test data. The derivation chain is empirical and self-contained against external benchmarks; no load-bearing step matches any of the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Barabási, A.L., Albert, R.: Emergence of scaling in random networks. science (1999) 3, 4
work page 1999
-
[2]
Borgwardt, K.M., Ong, C.S., Schönauer, S., Vishwanathan, S.V.N., Smola, A.J., Kriegel, H.P.: Protein function prediction via graph kernels. Bioinformatics21 (06 2005). https://doi.org/10.1093/bioinformatics/bti1007, https://doi.org/ 10.1093/bioinformatics/bti1007 11, 7
-
[3]
In: International Conference on Learning Representations (2022) 2
Chen, K., Meng, Y., Sun, X., Guo, S., Zhang, T., Li, J., Fan, C.: Badpre: Task- agnostic backdoor attacks to pre-trained nlp foundation models. In: International Conference on Learning Representations (2022) 2
work page 2022
-
[4]
Chen, X., Liu, C., Li, B., Lu, K., Song, D.: Targeted backdoor attacks on deep learning systems using data poisoning. arXiv (2017) 2
work page 2017
-
[5]
Hardware Trojan Attacks on Neural Networks
Clements, J., Lao, Y.: Hardware trojan attacks on neural networks. arXiv preprint arXiv:1806.05768 (2018) 2
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[6]
correlation with molecular orbital energies and hydrophobicity
Debnath, A.K., Lopez de Compadre, R.L., Debnath, G., Shusterman, A.J., Han- sch, C.: Structure-activity relationship of mutagenic aromatic and heteroaromatic nitro compounds. correlation with molecular orbital energies and hydrophobicity. Journal of medicinal chemistry (1991) 11, 7
work page 1991
-
[7]
Gan, L., Li, J., Zhang, T., Li, X., Meng, Y., Wu, F., Yang, Y., Guo, S., Fan, C.: Triggerless backdoor attack for nlp tasks with clean labels. In: Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. pp. 2942–2952 (2022) 2
work page 2022
-
[8]
Gao, Y., Xu, C., Wang, D., Chen, S., Ranasinghe, D.C., Nepal, S.: Strip: A defence against trojan attacks on deep neural networks. In: ACSAC (2019) 2
work page 2019
-
[9]
IEEE Communications Magazine (2023) 2
Ge, Y., Wang, Q., Yu, J., Shen, C., Li, Q.: Data poisoning and backdoor attacks on audio intelligence systems. IEEE Communications Magazine (2023) 2
work page 2023
-
[10]
Gilbert, E.N.: Random graphs. Ann. Math. Stat. (1959) 3, 4
work page 1959
- [11]
-
[12]
arXiv preprint arXiv:2308.04406 (2023) 3
Guan, Z., Du, M., Liu, N.: Xgbd: Explanation-guided graph backdoor detection. arXiv preprint arXiv:2308.04406 (2023) 3
-
[13]
In: Proceedings of the 29th Annual International Conference on Mobile Computing and Networking
Guo, H., Chen, X., Guo, J., Xiao, L., Yan, Q.: Masterkey: Practical backdoor attack against speaker verification systems. In: Proceedings of the 29th Annual International Conference on Mobile Computing and Networking. pp. 1–15 (2023) 2
work page 2023
-
[14]
arXiv preprint arXiv:1908.01763 (2019) 2
Guo, W., Wang, L., Xing, X., Du, M., Song, D.: Tabor: A highly accurate ap- proach to inspecting and restoring trojan backdoors in ai systems. arXiv preprint arXiv:1908.01763 (2019) 2
-
[15]
Hamilton, W., Ying, Z., Leskovec, J.: Inductive representation learning on large graphs. In: NeurIPS (2017) 1
work page 2017
-
[16]
Hassen, M., Chan, P.K.: Scalable function call graph-based malware classification. In: CODASPY (2017) 2, 3
work page 2017
-
[17]
arXiv preprint arXiv:2209.02902 (2022) 6
Jiang, B., Li, Z.: Defending against backdoor attack on graph neural network by explainability. arXiv preprint arXiv:2209.02902 (2022) 6
-
[18]
In: ICLR (2017) 1, 11 16 J.Downer, R
Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. In: ICLR (2017) 1, 11 16 J.Downer, R. Wang, and B. Wang
work page 2017
-
[19]
Kokhlikyan, N., Miglani, V., Martin, M., Wang, E., Alsallakh, B., Reynolds, J., Melnikov, A., Kliushkina, N., Araya, C., Yan, S., Reblitz-Richardson, O.: Captum: A unified and generic model interpretability library for pytorch (2020) 1
work page 2020
-
[20]
Li, W., Yu, J., Ning, X., Wang, P., Wei, Q., Wang, Y., Yang, H.: Hu-fu: Hardware and software collaborative attack framework against neural networks. In: ISVLSI. IEEE (2018) 2
work page 2018
-
[21]
Liu, K., Dolan-Gavitt, B., Garg, S.: Fine-pruning: Defending against backdooring attacks on deep neural networks. In: RAID (2018) 2
work page 2018
-
[22]
Liu, Y., Lee, W.C., Tao, G., Ma, S., Aafer, Y., Zhang, X.: Abs: Scanning neural networks for back-doors by artificial brain stimulation. In: SIGSAC (2019) 2
work page 2019
-
[23]
Liu, Y., Ma, S., Aafer, Y., Lee, W.C., Zhai, J., Wang, W., Zhang, X.: Trojaning attack on neural networks. In: NDSS (2018) 2
work page 2018
-
[24]
In: 2017 IEEE International Con- ference on Computer Design (ICCD)
Liu, Y., Xie, Y., Srivastava, A.: Neural trojans. In: 2017 IEEE International Con- ference on Computer Design (ICCD). IEEE (2017) 2
work page 2017
-
[25]
Advances in neural information processing systems 33, 19620–19631 (2020) 5, 1
Luo, D., Cheng, W., Xu, D., Yu, W., Zong, B., Chen, H., Zhang, X.: Parameterized explainer for graph neural network. Advances in neural information processing systems 33, 19620–19631 (2020) 5, 1
work page 2020
-
[26]
arXiv preprint arXiv:2301.08751 (2023) 2
Pal, S., Wang, R., Yao, Y., Liu, S.: Towards understanding how self-training tol- erates data backdoor poisoning. arXiv preprint arXiv:2301.08751 (2023) 2
-
[27]
In: 31st USENIX Security Symposium (USENIX Security 22)
Pan, X., Zhang, M., Sheng, B., Zhu, J., Yang, M.: Hidden trigger backdoor attack on {NLP} models via linguistic style manipulation. In: 31st USENIX Security Symposium (USENIX Security 22). pp. 3611–3628 (2022) 2
work page 2022
-
[28]
In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition
Pope, P.E., Kolouri, S., Rostami, M., Martin, C.E., Hoffmann, H.: Explainabil- ity methods for graph convolutional neural networks. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 10772– 10781 (2019) 6
work page 2019
-
[29]
Qi, F., Li, M., Chen, Y., Zhang, Z., Liu, Z., Wang, Y., Sun, M.: Hidden killer: Invisible textual backdoor attacks with syntactic trigger. In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). pp. 443–453 (2021) 2
work page 2021
-
[30]
Riesen, K., Bunke, H.: Iam graph database repository for graph based pattern recognition and machine learning. In: Da Vitora Lobo, N. et al. (Eds.), SSPR/SPR 2008 pp. 287–297 (2008) 11, 7
work page 2008
-
[31]
Roy, N., Hassanieh, H., Roy Choudhury, R.: Backdoor: Making microphones hear inaudible sounds. In: Proceedings of the 15th Annual International Conference on Mobile Systems, Applications, and Services. pp. 2–14 (2017) 2
work page 2017
-
[32]
Salem, A., Wen, R., Backes, M., Ma, S., Zhang, Y.: Dynamic backdoor attacks against machine learning models. In: EuroSP (2022) 2
work page 2022
-
[33]
In: Proceedings of the 28th Annual International Conference on Mobile Computing And Networking
Shi, C., Zhang, T., Li, Z., Phan, H., Zhao, T., Wang, Y., Liu, J., Yuan, B., Chen, Y.: Audio-domain position-independent backdoor attack via unnoticeable triggers. In: Proceedings of the 28th Annual International Conference on Mobile Computing And Networking. pp. 583–595 (2022) 2
work page 2022
-
[34]
Tang, J., Zhang, J., Yao, L., Li, J., Zhang, L., Su, Z.: Arnetminer: extrac- tion and mining of academic social networks. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. p. 990–998. KDD ’08, Association for Computing Machinery, New York, NY, USA (2008). https://doi.org/10.1145/1401890.1402008 , https...
-
[35]
In: NeurIPS (2018) 2 Explanation-Based Identification of Backdoored Training Graphs 17
Tran, B., Li, J., Madry, A.: Spectral signatures in backdoor attacks. In: NeurIPS (2018) 2 Explanation-Based Identification of Backdoored Training Graphs 17
work page 2018
-
[36]
Veličković, P., Cucurull, G., Casanova, A., Romero, A., Lio, P., Bengio, Y.: Graph attention networks. In: ICLR (2018) 11, 3
work page 2018
-
[37]
Wang, B., Cao, X., Jia, J., Gong, N.Z.: On certifying robustness against backdoor attacks via randomized smoothing. In: CVPR Workshop (2020) 3
work page 2020
-
[38]
Wang, B., Yao, Y., Shan, S., Li, H., Viswanath, B., Zheng, H., Zhao, B.Y.: Neural cleanse: Identifying and mitigating backdoor attacks in neural networks. In: IEEE S&P (2019) 2
work page 2019
-
[39]
Wang, R., Zhang, G., Liu, S., Chen, P.Y., Xiong, J., Wang, M.: Practical de- tection of trojan neural networks: Data-limited and data-free cases. In: Computer Vision–ECCV2020:16thEuropeanConference,Glasgow,UK,August23–28,2020, Proceedings, Part XXIII 16. pp. 222–238. Springer (2020) 2
work page 2020
-
[40]
Watts, D.J., Strogatz, S.H.: Collective dynamics of ‘small-world’networks. nature (1998) 3, 4
work page 1998
-
[41]
In: 2023 IEEE Symposium on Security and Privacy (SP)
Weber, M., Xu, X., Karlaš, B., Zhang, C., Li, B.: Rab: Provable robustness against backdoor attacks. In: 2023 IEEE Symposium on Security and Privacy (SP). pp. 1311–1328. IEEE (2023) 3
work page 2023
-
[42]
In: 30th USENIX Security Symposium (USENIX Security 21)
Xi, Z., Pang, R., Ji, S., Wang, T.: Graph backdoor. In: 30th USENIX Security Symposium (USENIX Security 21). pp. 1523–1540 (2021) 3
work page 2021
-
[43]
Xu, K., Hu, W., Leskovec, J., Jegelka, S.: How powerful are graph neural networks? In: International Conference on Learning Representations (2019) 1, 8, 11, 3
work page 2019
-
[44]
Yanardag, P., Vishwanathan, S.: Deep graph kernels. In: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Min- ing. p. 1365–1374. KDD ’15, Association for Computing Machinery, New York, NY, USA (2015). https://doi.org/10.1145/2783258.2783417 , https://doi. org/10.1145/2783258.2783417 11, 7
-
[45]
Yao, Y., Li, H., Zheng, H., Zhao, B.Y.: Latent backdoor attacks on deep neural networks. In: CCS (2019) 2
work page 2019
-
[46]
In: Advances in Neural Information Processing Systems 32 (2019) 4
Ying, R., Bourgeois, D., You, J., Zitnik, M., Leskovec, J.: Gnnexplainer: Gener- ating explanations for graph neural networks. In: Advances in Neural Information Processing Systems 32 (2019) 4
work page 2019
-
[47]
Frontiers in Genetics12 (2021)
Zhang, X.M., Liang, L., Liu, L., Tang, M.J.: Graph neural networks and their cur- rent applications in bioinformatics. Frontiers in Genetics12 (2021). https://doi. org/10.3389/fgene.2021.690049 , https://www.frontiersin.org/articles/ 10.3389/fgene.2021.690049 1
-
[48]
In: Proceedings of the 26th ACM Symposium on Access Control Models and Technologies
Zhang, Z., Jia, J., Wang, B., Gong, N.Z.: Backdoor attacks to graph neural net- works. In: Proceedings of the 26th ACM Symposium on Access Control Models and Technologies. pp. 15–26 (2021) 1, 2, 3, 5, 7, 11
work page 2021
-
[49]
In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition
Zhao, S., Ma, X., Zheng, X., Bailey, J., Chen, J., Jiang, Y.G.: Clean-label backdoor attacks on video recognition models. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 14443–14452 (2020) 2
work page 2020
-
[50]
scores” corresponding to each non-existing edge. As the generator trains, this “score
Zhou, J., Cui, G., Hu, S., Zhang, Z., Yang, C., Liu, Z., Wang, L., Li, C., Sun, M.: Graph neural networks: A review of methods and applications. AI Open1, 57– 81 (2020). https://doi.org/https://doi.org/10.1016/j.aiopen.2021.01.001, https://www.sciencedirect.com/science/article/pii/S2666651021000012 1 Explanation-Based Identification of Backdoored Training...
-
[51]
In the first stage (line 4), they train fs 0 on clean data to obtainfs, and in each subsequent stage (line 22) they retrain fs 0 from scratch, on a dataset attacked with the most recent iteration of the trigger generator. Edge generator fgen trains iteratively, in multiple rounds, over a subset of clean graphs D designated for attack (denoted as DB) (line...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.