A method for the systematic generation of graph XAI benchmarks via Weisfeiler-Leman coloring

Alessio Micheli; Domenico Tortorella; Marco Podda; Michele Fontanesi

arxiv: 2505.12437 · v2 · submitted 2025-05-18 · 💻 cs.LG · cs.AI

A method for the systematic generation of graph XAI benchmarks via Weisfeiler-Leman coloring

Michele Fontanesi , Alessio Micheli , Marco Podda , Domenico Tortorella This is my paper

Pith reviewed 2026-05-22 13:45 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords graph neural networksexplainable AIWeisfeiler-Lemanbenchmark generationsubgraph matchingmolecular classificationgraph classificationXAI evaluation

0 comments

The pith

Weisfeiler-Leman color refinement mines class-discriminating motifs from any graph classification dataset to serve as proxy ground-truth explanations for GNNs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a method that automates the creation of benchmarks for testing graph explainers. It applies the Weisfeiler-Leman algorithm to identify motifs that distinguish between classes in existing datasets and treats these motifs as reliable stand-ins for the explanations a GNN should produce. Because the motifs are chosen to match the distinguishing power of WL procedures, they remain within the capabilities of standard GNNs. The approach yields fifteen ready-to-use molecular datasets plus open code that can produce thousands more benchmarks from generic inputs.

Core claim

The central claim is that Weisfeiler-Leman color refinement performs efficient approximate subgraph matching to extract class-discriminating motifs, which then function as proxy ground-truth class explanations; these motifs are learnable by GNNs precisely because their discriminating power is aligned with the expressiveness limit of the WL algorithm.

What carries the argument

The Weisfeiler-Leman color refinement algorithm applied to approximate subgraph matching for extraction of class-discriminating motifs.

If this is right

Any existing graph classification dataset can be turned into a ready XAI benchmark without requiring domain-expert curation of explanations.
The resulting collection supplies a large, reproducible testbed for measuring how well graph explainers recover the motifs that actually drive predictions.
Because the motifs respect WL expressiveness, the benchmarks remain appropriate for evaluating standard message-passing GNNs rather than more powerful architectures.
The public release of fifteen molecular datasets and code for over two thousand additional ones removes the previous scarcity of systematic evaluation resources.
Experimental comparisons of explainers gain statistical power when performed across many independently generated benchmarks instead of a handful of hand-crafted tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same motif-mining procedure could be adapted to regression or link-prediction tasks by replacing class labels with continuous targets or edge properties.
Benchmarks produced this way may expose weaknesses in current explainers that remain hidden on the small synthetic sets used today.
Developers of new GNN architectures could use the WL alignment test as a quick filter to decide whether a proposed model can even represent the explanations in these benchmarks.
Extending the method to heterogeneous or temporal graphs would require only a suitable generalization of the color-refinement step.

Load-bearing premise

The motifs identified by Weisfeiler-Leman color refinement accurately reflect the actual decision rules that GNNs learn when solving the same classification task.

What would settle it

Train a GNN to high accuracy on one of the generated benchmarks and then show that the highest-ranked explanations returned by multiple popular graph explainers consistently fail to recover the WL-mined motif.

read the original abstract

Graph neural networks have become the de facto model for learning from structured data. However, the decision-making process of GNNs remains opaque to the end user, which undermines their use in safety-critical applications. Several explainable AI techniques for graphs have been developed to address this major issue. Focusing on graph classification, these explainers identify subgraph motifs that explain predictions. Therefore, a robust benchmarking of graph explainers is required to ensure that the produced explanations are of high quality, i.e., aligned with the GNN's decision process. However, current graph-XAI benchmarks are limited to simplistic synthetic datasets or a few real-world tasks curated by domain experts, hindering rigorous and reproducible evaluation, and consequently stalling progress in the field. To overcome these limitations, we propose a method to automate the construction of graph XAI benchmarks from generic graph classification datasets. Our approach leverages the Weisfeiler-Leman color refinement algorithm to efficiently perform approximate subgraph matching and mine class-discriminating motifs, which serve as proxy ground-truth class explanations. At the same time, we ensure that these motifs can be learned by GNNs because their discriminating power aligns with WL expressiveness. This work also introduces the OpenGraphXAI benchmark suite, which consists of 15 ready-made graph-XAI datasets derived by applying our method to real-world molecular classification datasets. The suite is available to the public along with a codebase to generate over 2,000 additional graph-XAI benchmarks. Finally, we present a use case that illustrates how the suite can be used to assess the effectiveness of a selection of popular graph explainers, demonstrating the critical role of a sufficiently large benchmark collection for improving the significance of experimental results.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper gives a practical WL-based method to turn ordinary graph classification datasets into XAI benchmarks with proxy motifs, but the claim that these motifs match what trained GNNs actually use rests on an untested assumption.

read the letter

The main thing to know is that the authors describe a method to mine class-discriminating subgraphs from any graph classification dataset using Weisfeiler-Leman color refinement for approximate matching, then package those subgraphs as proxy ground-truth explanations. They apply it to molecular datasets to produce the OpenGraphXAI suite of 15 benchmarks plus code that can generate over 2000 more, and they include a use case running several standard explainers on the collection.

Referee Report

2 major / 2 minor

Summary. The paper proposes a method to systematically generate graph XAI benchmarks from generic graph classification datasets by leveraging the Weisfeiler-Leman color refinement algorithm for approximate subgraph matching and mining class-discriminating motifs. These motifs are positioned as proxy ground-truth explanations for GNN predictions, with the claim that their alignment to WL expressiveness ensures they are learnable by GNNs. The method is used to construct the OpenGraphXAI suite of 15 datasets derived from real-world molecular classification tasks, accompanied by public code for generating over 2000 additional benchmarks, and illustrated via a use case evaluating several popular graph explainers.

Significance. If the proxy motifs reliably reflect GNN decision processes, the work would provide a scalable, automated alternative to simplistic synthetic or expert-curated benchmarks, enabling more rigorous, reproducible, and statistically powered evaluations of graph XAI methods. The public release of the benchmark suite and generation codebase is a clear strength for community adoption and reproducibility.

major comments (2)

[Method and Use Case sections] The central claim that WL-derived motifs serve as valid proxy ground-truth explanations because 'their discriminating power aligns with WL expressiveness' (abstract and method overview) does not address whether a trained GNN (e.g., GIN or GCN) on the resulting dataset actually bases its predictions on the mined motifs rather than other label-correlated substructures that are also WL-expressible. This assumption is load-bearing for the proxy validity but receives only illustrative treatment in the use case.
[Use Case / Experiments] No quantitative verification is described (e.g., via motif ablation, feature importance alignment, or comparison against GNN attention/gradients) to confirm that the WL-mined motifs are the features actually used by the GNN rather than co-occurring alternatives. This weakens the claim that the benchmarks test explainer fidelity to the model's decision process.

minor comments (2)

[Method] Clarify the exact WL iteration depth and subgraph matching threshold used in the mining procedure, as these choices directly affect motif size and fidelity.
[Benchmark Suite] The abstract states the suite contains 15 datasets; provide a table listing the source molecular datasets, number of graphs, and motif statistics for each.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment point by point below, providing clarifications on the proxy validity of the WL-derived motifs and indicating where revisions have been made to strengthen the presentation.

read point-by-point responses

Referee: [Method and Use Case sections] The central claim that WL-derived motifs serve as valid proxy ground-truth explanations because 'their discriminating power aligns with WL expressiveness' (abstract and method overview) does not address whether a trained GNN (e.g., GIN or GCN) on the resulting dataset actually bases its predictions on the mined motifs rather than other label-correlated substructures that are also WL-expressible. This assumption is load-bearing for the proxy validity but receives only illustrative treatment in the use case.

Authors: We agree that the proxy validity rests on an assumption about GNN behavior. The method identifies class-discriminating motifs through WL color refinement, which guarantees both their discriminative power and their expressibility within the WL framework that bounds standard GNNs such as GIN. While a trained GNN could theoretically rely on other co-occurring WL-expressible features, the mined motifs are the systematically extracted primary patterns that distinguish classes in the original data. In the revised manuscript we have added an explicit discussion paragraph in the Method section clarifying this assumption, noting that the benchmarks evaluate explainer fidelity relative to these salient WL-discriminating motifs under the standard premise that high-accuracy GNNs learn such features. We have also expanded the use-case discussion to better connect the mined motifs to the observed explainer outputs. revision: yes
Referee: [Use Case / Experiments] No quantitative verification is described (e.g., via motif ablation, feature importance alignment, or comparison against GNN attention/gradients) to confirm that the WL-mined motifs are the features actually used by the GNN rather than co-occurring alternatives. This weakens the claim that the benchmarks test explainer fidelity to the model's decision process.

Authors: We acknowledge that the original use case was primarily illustrative. To address the request for quantitative verification, the revised manuscript now includes a new subsection in the Experiments section reporting motif-ablation results on three representative datasets from the suite. Removing the mined motifs leads to measurable drops in GNN accuracy, while ablation of randomly selected WL-expressible substructures does not, supporting that the motifs are among the features driving predictions. We have also added a comparison of explainer rankings against gradient-based feature importance scores on the same datasets. These additions provide the requested quantitative grounding without altering the core method. revision: yes

Circularity Check

0 steps flagged

No significant circularity in the WL-based benchmark construction

full rationale

The paper presents an explicit construction method that applies the standard Weisfeiler-Leman color refinement algorithm to existing graph classification datasets in order to mine class-discriminating motifs. These motifs are positioned as proxy ground-truth explanations precisely because their discriminating power is defined by WL, and WL-expressiveness is an independently established property of certain GNN architectures. No step reduces a claimed prediction or uniqueness result to a fitted parameter, a self-referential definition, or a load-bearing self-citation chain; the derivation remains self-contained against external graph-theoretic benchmarks and does not rename or smuggle in prior author-specific ansatzes.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claim rests on standard properties of the Weisfeiler-Leman algorithm and the introduction of WL-derived motifs as proxy explanations without external validation of their fidelity to GNN decision processes.

axioms (1)

domain assumption Weisfeiler-Leman color refinement performs efficient approximate subgraph matching for class-discriminating motifs
Invoked to mine proxy ground-truth explanations from graph classification datasets.

invented entities (1)

WL-derived class-discriminating motifs as proxy ground-truth explanations no independent evidence
purpose: Serve as aligned, learnable explanations for evaluating graph explainers
Postulated as faithful proxies because of alignment with WL expressiveness; no independent evidence provided in abstract.

pith-pipeline@v0.9.0 · 5851 in / 1378 out tokens · 61644 ms · 2026-05-22T13:45:50.093388+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean bare_distinguishability_of_absolute_floor echoes

?

echoes
ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.

Our approach leverages the Weisfeiler-Leman color refinement algorithm to efficiently perform approximate subgraph matching and mine class-discriminating motifs, which serve as proxy ground-truth class explanations. At the same time, we ensure that these motifs can be learned by GNNs because their discriminating power aligns with WL expressiveness.
IndisputableMonolith/Foundation/AlexanderDuality.lean alexander_duality_circle_linking unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

the discriminating motifs are extracted from the set C = union over G in D of C(<=L)_G ... the explanation motif M_G is a subgraph whose nodes are those of the union graph comprising all rooted subgraphs in G corresponding to c-bar-colored nodes

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

47 extracted references · 47 canonical work pages · 3 internal anchors

[1]

Agarwal, C., Queen, O., Lakkaraju, H., Zitnik, M. (2023). Evaluating explainability for graph neural networks.Scientific Data,10(1), 144,

work page 2023
[2]

De Albuquerque, V.H.C

Rahouti, M., . . . De Albuquerque, V.H.C. (2024). Explainable artificial intel- ligence for drug discovery and development: A comprehensive survey.IEEE Access,12, 35796-35812, https://doi.org/10.1109/ACCESS.2024.3373195

work page doi:10.1109/access.2024.3373195 2024
[3]

Zhang, C

Amara, K., Ying, Z., Zhang, Z., Han, Z., Zhao, Y., Shan, Y., . . . Zhang, C. (2022, 09–12 Dec). GraphFramEx: Towards systematic evaluation of explainability methods for graph neural networks. B. Rieck & R. Pascanu (Eds.),Proceedings of the first learning on graphs conference(Vol. 198, pp. 44:1–44:23). PMLR

work page 2022
[4]

Bacciu, D., Errica, F., Gravina, A., Madeddu, L., Podda, M., Stilo, G. (2023). Deep graph networks for drug repurposing with multi-protein targets.IEEE Trans- actions on Emerging Topics in Computing, 1–14, https://doi.org/10.1109/ TETC.2023.3238963

work page arXiv 2023
[5]

Bacciu, D., Errica, F., Micheli, A., Podda, M. (2020). A gentle introduction to deep learning for graphs.Neural Networks,129, 203-221, https://doi.org/10.1016/ j.neunet.2020.06.006

work page 2020
[6]

Explainability Techniques for Graph Convolutional Networks

Baldassarre, F., & Azizpour, H. (2019). Explainability techniques for graph convolutional networks.Icml 2019 workshop “learning and reasoning with graph- structured representations”.Retrieved from https://arxiv.org/abs/1905.13686 Barcel´ o, P., Kostylev, E.V., Monet, M., P´ erez, J., Reutter, J., Silva, J.P. (2020). The Logical Expressiveness of Graph Neu...

work page internal anchor Pith review Pith/arXiv arXiv 2019
[7]

Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., Fei-Fei, L. (2009). Imagenet: A large- scale hierarchical image database.2009 ieee conference on computer vision and pattern recognition(p. 248-255). D’Inverno, G.A., Bianchini, M., Sampoli, M.L., Scarselli, F. (2024). On the approxima- tion capability of GNNs in node classification/regression tasks.Soft ...

work page doi:10.1007/s00500-024-09676-1 2009
[8]

Dipalma, A., Fontanesi, M., Micheli, A., Milazzo, P., Podda, M. (2025). Sensitivity analysis on protein-protein interaction networks through deep graph networks. BMC Bioinformatics,26(124), , https://doi.org/10.1186/s12859-025-06140-1

work page doi:10.1186/s12859-025-06140-1 2025
[9]

Moghaddam, A., Wattenhofer, R

Faber, L., K. Moghaddam, A., Wattenhofer, R. (2021). When comparing to ground truth is wrong: On evaluating gnn explanation methods.Proceedings of the 27th acm sigkdd conference on knowledge discovery & data mining(pp. 332–341)

work page 2021
[10]

Fey, M., & Lenssen, J.E. (2019). Fast graph representation learning with PyTorch Geometric.Iclr workshop on representation learning on graphs and manifolds. Retrieved from https://arxiv.org/abs/1903.02428

work page internal anchor Pith review Pith/arXiv arXiv 2019
[11]

Fontanesi, M., Micheli, A., Podda, M. (2024b). XAI and bias of deep graph networks. Proceedings of the 32nd european symposium on artificial neural networks, com- putational intelligence and machine learning intelligence (esann 2024)(pp. 41–46). Ciaco - i6doc.com

work page 2024
[12]

(2017, 06– 11 Aug)

Gilmer, J., Schoenholz, S.S., Riley, P.F., Vinyals, O., Dahl, G.E. (2017, 06– 11 Aug). Neural message passing for quantum chemistry. D. Precup & Y.W. Teh (Eds.),Proceedings of the 34th international conference on machine learning(Vol. 70, pp. 1263–1272). PMLR. Retrieved from https://proceedings.mlr.press/v70/gilmer17a.html

work page 2017
[13]

European Union regulations on algorithmic decision-making and a "right to explanation"

Goodman, B., & Flaxman, S. (2016). EU regulations on algorithmic decision-making and a “right to explanation”.Icml workshop on human interpretability in machine learning (whi 2016).New York, NY. Retrieved from http://arxiv. org/abs/1606.08813

work page internal anchor Pith review Pith/arXiv arXiv 2016
[14]

Graziani, C., Drucks, T., Jogl, F., Bianchini, M., Scarselli, F., G¨ artner, T. (2024). The Expressive Power of Path-Based Graph Neural Networks. R. Salakhutdinov et al. (Eds.),Proceedings of the 41st international con- ference on machine learning(Vol. 235, pp. 16226–16249). Retrieved from https://proceedings.mlr.press/v235/graziani24a.html

work page 2024
[15]

A survey on explain- ability of graph neural networks

Kakkad, J., Jannu, J., Sharma, K., Aggarwal, C., Medya, S. (2023).A survey on explainability of graph neural networks.Retrieved from https://arxiv.org/abs/2306.01958 24

work page arXiv 2023
[16]

Kazius, J., McGuire, R., Bursi, R. (2005). Derivation and validation of toxicophores for mutagenicity prediction.Journal of Medicinal Chemistry,48(1), 312-320, https://doi.org/10.1021/jm040835a

work page doi:10.1021/jm040835a 2005
[17]

Kiefer, S., & Mckay, B.D. (2020). The Iteration Number of Colour Refinement. Proceedings of the 47th international colloquium on automata, languages, and programming(pp. 1–19)

work page 2020
[18]

Kingma, D.P., & Ba, J. (2015). Adam: A method for stochastic optimization. Proceedings of the 3rd international conference on learning representations

work page 2015
[19]

Kriege, N.M. (2022). Weisfeiler and leman go walking: Random walk ker- nels revisited. A.H. Oh, A. Agarwal, D. Belgrave, & K. Cho (Eds.), Advances in neural information processing systems.Retrieved from https://openreview.net/forum?id=Inj9ed0mzQb

work page 2022
[20]

Longa, A., Azzolin, S., Santin, G., Cencetti, G., Li` o, P., Lepri, B., Passerini, A. (2025). Explaining the explainers in graph neural networks: a comparative study.ACM Computing Surveys,57(5), 1–37, https://doi.org/10.1145/3696444

work page doi:10.1145/3696444 2025
[21]

Luo, D., Cheng, W., Xu, D., Yu, W., Zong, B., Chen, H., Zhang, X. (2020). Parameter- ized explainer for graph neural network. H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, & H. Lin (Eds.),Advances in neural information processing systems (Vol. 33, pp. 19620–19631). Curran Associates, Inc

work page 2020
[22]

Micheli, A. (2009). Neural network for graphs: A contextual constructive approach. IEEE Transactions on Neural Networks,20(3), 498-511, https://doi.org/10 .1109/TNN.2008.2010350

work page arXiv 2009
[23]

Morris, C., Kriege, N.M., Bause, F., Kersting, K., Mutzel, P., Neumann, M. (2020). TUDataset: A collection of benchmark datasets for learning with graphs. Icml 2020 workshop on graph representation learning and beyond (grl+ 2020). Retrieved from www.graphlearning.io

work page 2020
[24]

Borgwardt, K

Morris, C., Lipman, Y., Maron, H., Rieck, B., Kriege, N.M., Grohe, M., . . . Borgwardt, K. (2023). Weisfeiler and Leman go Machine Learning: The Story so far.Journal of Machine Learning Research,24(333), 1–59,

work page 2023
[25]

Morris, C., Ritzert, M., Fey, M., Hamilton, W.L., Lenssen, J.E., Rattan, G., Grohe, M. (2019). Weisfeiler and Leman Go Neural: Higher-Order Graph Neural Networks. Proceedings of the aaai conference on artificial intelligence(Vol. 33, pp. 4602– 4609). 25

work page 2019
[26]

others (2022)

Oneto, L., Navarin, N., Biggio, B., Errica, F., Micheli, A., Scarselli, F., . . . others (2022). Towards learning trustworthily, automatically, and with guarantees on graphs: An overview.Neurocomputing,493, 217–243,

work page 2022
[27]

Pope, P.E., Kolouri, S., Rostami, M., Martin, C.E., Hoffmann, H. (2019). Explainabil- ity methods for graph convolutional neural networks.2019 ieee/cvf conference on computer vision and pattern recognition (cvpr)(p. 10764-10773)

work page 2019
[28]

Bagel: A benchmark for assessing graph neural network explanations

Rathee, M., Funke, T., Anand, A., Khosla, M. (2022).BAGEL: A bench- mark for assessing graph neural network explanations.Retrieved from https://arxiv.org/abs/2206.13983

work page arXiv 2022
[29]

Riesen, K., & Bunke, H. (2008). IAM graph database repository for graph based pattern recognition and machine learning. N. da Vitoria Lobo et al. (Eds.), Structural, syntactic, and statistical pattern recognition(pp. 287–297). Berlin, Heidelberg: Springer Berlin Heidelberg

work page 2008
[30]

Wiltschko, A

Sanchez-Lengeling, B., Wei, J., Lee, B., Reif, E., Wang, P., Qian, W., . . . Wiltschko, A. (2020). Evaluating attribution for graph neural networks. H. Larochelle, M. Ran- zato, R. Hadsell, M. Balcan, & H. Lin (Eds.),Advances in neural information processing systems(Vol. 33, pp. 5898–5910). Curran Associates, Inc

work page 2020
[31]

Scarselli, F., Gori, M., Tsoi, A.C., Hagenbuchner, M., Monfardini, G. (2009a). Com- putational capabilities of graph neural networks.IEEE Transactions on Neural Networks,20(1), 81–102, https://doi.org/10.1109/TNN.2008.2005141

work page doi:10.1109/tnn.2008.2005141 2008
[32]

Scarselli, F., Gori, M., Tsoi, A.C., Hagenbuchner, M., Monfardini, G. (2009b). The graph neural network model.IEEE Transactions on Neural Networks,20(1), 61-80, https://doi.org/10.1109/TNN.2008.2005605

work page doi:10.1109/tnn.2008.2005605 2008
[33]

Simonyan, K., Vedaldi, A., Zisserman, A. (2014). Deep inside convolutional net- works: Visualising image classification models and saliency maps.Workshop at the 2nd international conference on learning representations.Retrieved from https://openreview.net/forum?id=cO4ycnpqxKcS9

work page 2014
[34]

Sterling, T., & Irwin, J.J. (2015). Zinc 15 – ligand discovery for everyone.Journal of Chemical Information and Modeling,55(11), 2324-2337, https://doi.org/ 10.1021/acs.jcim.5b00559

work page doi:10.1021/acs.jcim.5b00559 2015
[35]

(2017, 06–11 Aug)

Sundararajan, M., Taly, A., Yan, Q. (2017, 06–11 Aug). Axiomatic attribution for deep networks. D. Precup & Y.W. Teh (Eds.),Proceedings of the 34th international conference on machine learning(Vol. 70, pp. 3319–3328). PMLR. 26

work page 2017
[36]

Varbella, A., Amara, K., Gjorgiev, B., El-Assady, M., Sansavini, G. (2024). Power- graph: A power grid benchmark dataset for graph neural networks.Advances in Neural Information Processing Systems,37, 110784–110804,

work page 2024
[37]

(2018, Novem- ber)

Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S. (2018, Novem- ber). GLUE: A multi-task benchmark and analysis platform for natural language understanding. T. Linzen, G. Chrupa la, & A. Alishahi (Eds.),Proceedings of the 2018 EMNLP workshop BlackboxNLP: Analyzing and interpreting neural net- works for NLP(pp. 353–355). Brussels, Belgium: A...

work page 2018
[38]

(2019, July)

Wang, X., He, X., Wang, M., Feng, F., Chua, T.-S. (2019, July). Neural graph collaborative filtering.Proceedings of the 42nd international acm sigir confer- ence on research and development in information retrieval(p. 165–174). ACM. Retrieved from http://dx.doi.org/10.1145/3331184.3331267

work page doi:10.1145/3331184.3331267 2019
[39]

Weisfeiler, B.Y., & Leman, A.A. (1968). A reduction of a graph to a canonical form and an algebra arising during this reduction.Nauchno-Technicheskaya Informatsia, 2(9), 12–16,

work page 1968
[40]

Wu, Z., Pan, S., Chen, F., Long, G., Zhang, C., Yu, P.S. (2021). A Comprehensive Survey on Graph Neural Networks.IEEE Transactions on Neural Networks and Learning Systems,32(1), 4–24, https://doi.org/10.1109/TNNLS.2020.2978386

work page doi:10.1109/tnnls.2020.2978386 2021
[41]

Xu, K., Hu, W., Leskovec, J., Jegelka, S. (2019). How powerful are graph neu- ral networks?Proceedings of the 7th international conference on learning representations

work page 2019
[42]

Ying, Z., Bourgeois, D., You, J., Zitnik, M., Leskovec, J. (2019). GNNExplainer: Generating explanations for graph neural networks. H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alch´ e-Buc, E. Fox, & R. Garnett (Eds.),Advances in neural information processing systems(Vol. 32). Curran Associates, Inc

work page 2019
[43]

(2023, may)

Yuan, H., Yu, H., Gui, S., Ji, S. (2023, may). Explainability in graph neural networks: A taxonomic survey.IEEE Transactions on Pattern Analysis and Machine Intelligence,45(05), 5782-5799, https://doi.org/10.1109/TPAMI.2022 .3204236

work page doi:10.1109/tpami.2022 2023
[44]

Zhang, B., Fan, C., Liu, S., Huang, K., Zhao, X., Huang, J., Liu, Z. (2025). The Expressive Power of Graph Neural Networks: A Survey.IEEE Transactions on Knowledge and Data Engineering,37(3), 1455–1474, https://doi.org/10.1109/ TKDE.2024.3523700 27

work page arXiv 2025
[45]

Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., Torralba, A. (2016). Learning deep features for discriminative localization.Proceedings of the ieee conference on computer vision and pattern recognition(pp. 2921–2929)

work page 2016
[46]

Zhou, Y., Liu, S., Siow, J., Du, X., Liu, Y. (2019). Devign: Effective vulnerability identification by learning comprehensive program semantics via graph neural networks. H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alch´ e-Buc, E. Fox, & R. Garnett (Eds.),Advances in neural information processing systems(Vol. 32). Curran Associates, Inc

work page 2019
[47]

Zhu, W., Wen, T., Song, G., Wang, L., Zheng, B. (2023). On Structural Expressive Power of Graph Transformers.Proceedings of the 29th acm sigkdd conference on knowledge discovery and data mining(pp. 3628–3637). New York, NY, USA: ACM. 28

work page 2023

[1] [1]

Agarwal, C., Queen, O., Lakkaraju, H., Zitnik, M. (2023). Evaluating explainability for graph neural networks.Scientific Data,10(1), 144,

work page 2023

[2] [2]

De Albuquerque, V.H.C

Rahouti, M., . . . De Albuquerque, V.H.C. (2024). Explainable artificial intel- ligence for drug discovery and development: A comprehensive survey.IEEE Access,12, 35796-35812, https://doi.org/10.1109/ACCESS.2024.3373195

work page doi:10.1109/access.2024.3373195 2024

[3] [3]

Zhang, C

Amara, K., Ying, Z., Zhang, Z., Han, Z., Zhao, Y., Shan, Y., . . . Zhang, C. (2022, 09–12 Dec). GraphFramEx: Towards systematic evaluation of explainability methods for graph neural networks. B. Rieck & R. Pascanu (Eds.),Proceedings of the first learning on graphs conference(Vol. 198, pp. 44:1–44:23). PMLR

work page 2022

[4] [4]

Bacciu, D., Errica, F., Gravina, A., Madeddu, L., Podda, M., Stilo, G. (2023). Deep graph networks for drug repurposing with multi-protein targets.IEEE Trans- actions on Emerging Topics in Computing, 1–14, https://doi.org/10.1109/ TETC.2023.3238963

work page arXiv 2023

[5] [5]

Bacciu, D., Errica, F., Micheli, A., Podda, M. (2020). A gentle introduction to deep learning for graphs.Neural Networks,129, 203-221, https://doi.org/10.1016/ j.neunet.2020.06.006

work page 2020

[6] [6]

Explainability Techniques for Graph Convolutional Networks

Baldassarre, F., & Azizpour, H. (2019). Explainability techniques for graph convolutional networks.Icml 2019 workshop “learning and reasoning with graph- structured representations”.Retrieved from https://arxiv.org/abs/1905.13686 Barcel´ o, P., Kostylev, E.V., Monet, M., P´ erez, J., Reutter, J., Silva, J.P. (2020). The Logical Expressiveness of Graph Neu...

work page internal anchor Pith review Pith/arXiv arXiv 2019

[7] [7]

Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., Fei-Fei, L. (2009). Imagenet: A large- scale hierarchical image database.2009 ieee conference on computer vision and pattern recognition(p. 248-255). D’Inverno, G.A., Bianchini, M., Sampoli, M.L., Scarselli, F. (2024). On the approxima- tion capability of GNNs in node classification/regression tasks.Soft ...

work page doi:10.1007/s00500-024-09676-1 2009

[8] [8]

Dipalma, A., Fontanesi, M., Micheli, A., Milazzo, P., Podda, M. (2025). Sensitivity analysis on protein-protein interaction networks through deep graph networks. BMC Bioinformatics,26(124), , https://doi.org/10.1186/s12859-025-06140-1

work page doi:10.1186/s12859-025-06140-1 2025

[9] [9]

Moghaddam, A., Wattenhofer, R

Faber, L., K. Moghaddam, A., Wattenhofer, R. (2021). When comparing to ground truth is wrong: On evaluating gnn explanation methods.Proceedings of the 27th acm sigkdd conference on knowledge discovery & data mining(pp. 332–341)

work page 2021

[10] [10]

Fey, M., & Lenssen, J.E. (2019). Fast graph representation learning with PyTorch Geometric.Iclr workshop on representation learning on graphs and manifolds. Retrieved from https://arxiv.org/abs/1903.02428

work page internal anchor Pith review Pith/arXiv arXiv 2019

[11] [11]

Fontanesi, M., Micheli, A., Podda, M. (2024b). XAI and bias of deep graph networks. Proceedings of the 32nd european symposium on artificial neural networks, com- putational intelligence and machine learning intelligence (esann 2024)(pp. 41–46). Ciaco - i6doc.com

work page 2024

[12] [12]

(2017, 06– 11 Aug)

Gilmer, J., Schoenholz, S.S., Riley, P.F., Vinyals, O., Dahl, G.E. (2017, 06– 11 Aug). Neural message passing for quantum chemistry. D. Precup & Y.W. Teh (Eds.),Proceedings of the 34th international conference on machine learning(Vol. 70, pp. 1263–1272). PMLR. Retrieved from https://proceedings.mlr.press/v70/gilmer17a.html

work page 2017

[13] [13]

European Union regulations on algorithmic decision-making and a "right to explanation"

Goodman, B., & Flaxman, S. (2016). EU regulations on algorithmic decision-making and a “right to explanation”.Icml workshop on human interpretability in machine learning (whi 2016).New York, NY. Retrieved from http://arxiv. org/abs/1606.08813

work page internal anchor Pith review Pith/arXiv arXiv 2016

[14] [14]

Graziani, C., Drucks, T., Jogl, F., Bianchini, M., Scarselli, F., G¨ artner, T. (2024). The Expressive Power of Path-Based Graph Neural Networks. R. Salakhutdinov et al. (Eds.),Proceedings of the 41st international con- ference on machine learning(Vol. 235, pp. 16226–16249). Retrieved from https://proceedings.mlr.press/v235/graziani24a.html

work page 2024

[15] [15]

A survey on explain- ability of graph neural networks

Kakkad, J., Jannu, J., Sharma, K., Aggarwal, C., Medya, S. (2023).A survey on explainability of graph neural networks.Retrieved from https://arxiv.org/abs/2306.01958 24

work page arXiv 2023

[16] [16]

Kazius, J., McGuire, R., Bursi, R. (2005). Derivation and validation of toxicophores for mutagenicity prediction.Journal of Medicinal Chemistry,48(1), 312-320, https://doi.org/10.1021/jm040835a

work page doi:10.1021/jm040835a 2005

[17] [17]

Kiefer, S., & Mckay, B.D. (2020). The Iteration Number of Colour Refinement. Proceedings of the 47th international colloquium on automata, languages, and programming(pp. 1–19)

work page 2020

[18] [18]

Kingma, D.P., & Ba, J. (2015). Adam: A method for stochastic optimization. Proceedings of the 3rd international conference on learning representations

work page 2015

[19] [19]

Kriege, N.M. (2022). Weisfeiler and leman go walking: Random walk ker- nels revisited. A.H. Oh, A. Agarwal, D. Belgrave, & K. Cho (Eds.), Advances in neural information processing systems.Retrieved from https://openreview.net/forum?id=Inj9ed0mzQb

work page 2022

[20] [20]

Longa, A., Azzolin, S., Santin, G., Cencetti, G., Li` o, P., Lepri, B., Passerini, A. (2025). Explaining the explainers in graph neural networks: a comparative study.ACM Computing Surveys,57(5), 1–37, https://doi.org/10.1145/3696444

work page doi:10.1145/3696444 2025

[21] [21]

Luo, D., Cheng, W., Xu, D., Yu, W., Zong, B., Chen, H., Zhang, X. (2020). Parameter- ized explainer for graph neural network. H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, & H. Lin (Eds.),Advances in neural information processing systems (Vol. 33, pp. 19620–19631). Curran Associates, Inc

work page 2020

[22] [22]

Micheli, A. (2009). Neural network for graphs: A contextual constructive approach. IEEE Transactions on Neural Networks,20(3), 498-511, https://doi.org/10 .1109/TNN.2008.2010350

work page arXiv 2009

[23] [23]

Morris, C., Kriege, N.M., Bause, F., Kersting, K., Mutzel, P., Neumann, M. (2020). TUDataset: A collection of benchmark datasets for learning with graphs. Icml 2020 workshop on graph representation learning and beyond (grl+ 2020). Retrieved from www.graphlearning.io

work page 2020

[24] [24]

Borgwardt, K

Morris, C., Lipman, Y., Maron, H., Rieck, B., Kriege, N.M., Grohe, M., . . . Borgwardt, K. (2023). Weisfeiler and Leman go Machine Learning: The Story so far.Journal of Machine Learning Research,24(333), 1–59,

work page 2023

[25] [25]

Morris, C., Ritzert, M., Fey, M., Hamilton, W.L., Lenssen, J.E., Rattan, G., Grohe, M. (2019). Weisfeiler and Leman Go Neural: Higher-Order Graph Neural Networks. Proceedings of the aaai conference on artificial intelligence(Vol. 33, pp. 4602– 4609). 25

work page 2019

[26] [26]

others (2022)

Oneto, L., Navarin, N., Biggio, B., Errica, F., Micheli, A., Scarselli, F., . . . others (2022). Towards learning trustworthily, automatically, and with guarantees on graphs: An overview.Neurocomputing,493, 217–243,

work page 2022

[27] [27]

Pope, P.E., Kolouri, S., Rostami, M., Martin, C.E., Hoffmann, H. (2019). Explainabil- ity methods for graph convolutional neural networks.2019 ieee/cvf conference on computer vision and pattern recognition (cvpr)(p. 10764-10773)

work page 2019

[28] [28]

Bagel: A benchmark for assessing graph neural network explanations

Rathee, M., Funke, T., Anand, A., Khosla, M. (2022).BAGEL: A bench- mark for assessing graph neural network explanations.Retrieved from https://arxiv.org/abs/2206.13983

work page arXiv 2022

[29] [29]

Riesen, K., & Bunke, H. (2008). IAM graph database repository for graph based pattern recognition and machine learning. N. da Vitoria Lobo et al. (Eds.), Structural, syntactic, and statistical pattern recognition(pp. 287–297). Berlin, Heidelberg: Springer Berlin Heidelberg

work page 2008

[30] [30]

Wiltschko, A

Sanchez-Lengeling, B., Wei, J., Lee, B., Reif, E., Wang, P., Qian, W., . . . Wiltschko, A. (2020). Evaluating attribution for graph neural networks. H. Larochelle, M. Ran- zato, R. Hadsell, M. Balcan, & H. Lin (Eds.),Advances in neural information processing systems(Vol. 33, pp. 5898–5910). Curran Associates, Inc

work page 2020

[31] [31]

Scarselli, F., Gori, M., Tsoi, A.C., Hagenbuchner, M., Monfardini, G. (2009a). Com- putational capabilities of graph neural networks.IEEE Transactions on Neural Networks,20(1), 81–102, https://doi.org/10.1109/TNN.2008.2005141

work page doi:10.1109/tnn.2008.2005141 2008

[32] [32]

Scarselli, F., Gori, M., Tsoi, A.C., Hagenbuchner, M., Monfardini, G. (2009b). The graph neural network model.IEEE Transactions on Neural Networks,20(1), 61-80, https://doi.org/10.1109/TNN.2008.2005605

work page doi:10.1109/tnn.2008.2005605 2008

[33] [33]

Simonyan, K., Vedaldi, A., Zisserman, A. (2014). Deep inside convolutional net- works: Visualising image classification models and saliency maps.Workshop at the 2nd international conference on learning representations.Retrieved from https://openreview.net/forum?id=cO4ycnpqxKcS9

work page 2014

[34] [34]

Sterling, T., & Irwin, J.J. (2015). Zinc 15 – ligand discovery for everyone.Journal of Chemical Information and Modeling,55(11), 2324-2337, https://doi.org/ 10.1021/acs.jcim.5b00559

work page doi:10.1021/acs.jcim.5b00559 2015

[35] [35]

(2017, 06–11 Aug)

Sundararajan, M., Taly, A., Yan, Q. (2017, 06–11 Aug). Axiomatic attribution for deep networks. D. Precup & Y.W. Teh (Eds.),Proceedings of the 34th international conference on machine learning(Vol. 70, pp. 3319–3328). PMLR. 26

work page 2017

[36] [36]

Varbella, A., Amara, K., Gjorgiev, B., El-Assady, M., Sansavini, G. (2024). Power- graph: A power grid benchmark dataset for graph neural networks.Advances in Neural Information Processing Systems,37, 110784–110804,

work page 2024

[37] [37]

(2018, Novem- ber)

Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S. (2018, Novem- ber). GLUE: A multi-task benchmark and analysis platform for natural language understanding. T. Linzen, G. Chrupa la, & A. Alishahi (Eds.),Proceedings of the 2018 EMNLP workshop BlackboxNLP: Analyzing and interpreting neural net- works for NLP(pp. 353–355). Brussels, Belgium: A...

work page 2018

[38] [38]

(2019, July)

Wang, X., He, X., Wang, M., Feng, F., Chua, T.-S. (2019, July). Neural graph collaborative filtering.Proceedings of the 42nd international acm sigir confer- ence on research and development in information retrieval(p. 165–174). ACM. Retrieved from http://dx.doi.org/10.1145/3331184.3331267

work page doi:10.1145/3331184.3331267 2019

[39] [39]

Weisfeiler, B.Y., & Leman, A.A. (1968). A reduction of a graph to a canonical form and an algebra arising during this reduction.Nauchno-Technicheskaya Informatsia, 2(9), 12–16,

work page 1968

[40] [40]

Wu, Z., Pan, S., Chen, F., Long, G., Zhang, C., Yu, P.S. (2021). A Comprehensive Survey on Graph Neural Networks.IEEE Transactions on Neural Networks and Learning Systems,32(1), 4–24, https://doi.org/10.1109/TNNLS.2020.2978386

work page doi:10.1109/tnnls.2020.2978386 2021

[41] [41]

Xu, K., Hu, W., Leskovec, J., Jegelka, S. (2019). How powerful are graph neu- ral networks?Proceedings of the 7th international conference on learning representations

work page 2019

[42] [42]

Ying, Z., Bourgeois, D., You, J., Zitnik, M., Leskovec, J. (2019). GNNExplainer: Generating explanations for graph neural networks. H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alch´ e-Buc, E. Fox, & R. Garnett (Eds.),Advances in neural information processing systems(Vol. 32). Curran Associates, Inc

work page 2019

[43] [43]

(2023, may)

Yuan, H., Yu, H., Gui, S., Ji, S. (2023, may). Explainability in graph neural networks: A taxonomic survey.IEEE Transactions on Pattern Analysis and Machine Intelligence,45(05), 5782-5799, https://doi.org/10.1109/TPAMI.2022 .3204236

work page doi:10.1109/tpami.2022 2023

[44] [44]

Zhang, B., Fan, C., Liu, S., Huang, K., Zhao, X., Huang, J., Liu, Z. (2025). The Expressive Power of Graph Neural Networks: A Survey.IEEE Transactions on Knowledge and Data Engineering,37(3), 1455–1474, https://doi.org/10.1109/ TKDE.2024.3523700 27

work page arXiv 2025

[45] [45]

Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., Torralba, A. (2016). Learning deep features for discriminative localization.Proceedings of the ieee conference on computer vision and pattern recognition(pp. 2921–2929)

work page 2016

[46] [46]

Zhou, Y., Liu, S., Siow, J., Du, X., Liu, Y. (2019). Devign: Effective vulnerability identification by learning comprehensive program semantics via graph neural networks. H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alch´ e-Buc, E. Fox, & R. Garnett (Eds.),Advances in neural information processing systems(Vol. 32). Curran Associates, Inc

work page 2019

[47] [47]

Zhu, W., Wen, T., Song, G., Wang, L., Zheng, B. (2023). On Structural Expressive Power of Graph Transformers.Proceedings of the 29th acm sigkdd conference on knowledge discovery and data mining(pp. 3628–3637). New York, NY, USA: ACM. 28

work page 2023