pith. machine review for the scientific record.

arxiv: 2605.06814 · v1 · submitted 2026-05-07 · 💻 cs.LG

Recognition: 2 theorem links · Lean Theorem

From Model to Data (M2D): Shifting Complexity from GNNs to Graphs for Transparent Graph Learning

Authors on Pith: no claims yet

Pith reviewed 2026-05-11 01:04 UTC · model grok-4.3

classification 💻 cs.LG
keywords graph neural networks · model distillation · graph augmentation · model transparency · explainability · knowledge distillation · GNN architectures

The pith

M2D distills a complex GNN into an augmented graph so a simple student model matches the teacher's performance while exposing architectural mechanisms.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Model-to-Data distillation to move complexity out of opaque Graph Neural Networks and into the input graph itself. It enriches the graph's features and structure with information from a teacher model, after which a basic student model achieves equivalent accuracy. This shift lets people examine the data directly to see why certain architectures succeed, for example by revealing fairness objectives or attention patterns. The method preserves performance while making GNN design choices more open to human inspection.

Core claim

M2D distills the teacher model into an augmented graph with enriched features and structure, enabling a simple student to match the teacher's performance. By materializing model behavior in the data, the approach allows humans to inspect architectural advantages directly and reveals underlying mechanisms such as fairness objectives and attention-based aggregation in an interpretable way.

What carries the argument

M2D distillation, which transfers teacher model behavior into augmented graph features and edges so the data itself carries the complexity.
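The core move can be sketched in a toy numpy example. This is illustrative only: the paper's actual feature and structure learners (Figure 3) are not reproduced here, and all names are hypothetical. A fixed "teacher" aggregates over neighbors, the augmentation writes the teacher's embeddings into the graph as extra node features, and a plain linear student with no message passing then fits a teacher-derived target exactly, while the same student on the raw features cannot:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 4
X = rng.normal(size=(n, d))                  # original node features
A = (rng.random((n, n)) < 0.2).astype(float)
A = np.maximum(A, A.T)                       # undirected adjacency
np.fill_diagonal(A, 1.0)                     # self-loops
P = np.diag(1.0 / A.sum(axis=1)) @ A         # mean-aggregation operator

W = rng.normal(size=(d, d))                  # fixed "teacher" weights

def teacher_embed(X):
    # one message-passing step: neighbor mean, linear map, nonlinearity
    return np.tanh(P @ X @ W)

# M2D-style augmentation: materialize teacher behavior as extra features
X_aug = np.hstack([X, teacher_embed(X)])

# a target that depends on the teacher's nonlinear, structural computation
y = teacher_embed(X) @ rng.normal(size=d)

# "student": plain least-squares readout, no message passing at all
coef_aug, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
coef_raw, *_ = np.linalg.lstsq(X, y, rcond=None)
err_aug = np.abs(X_aug @ coef_aug - y).max()  # near zero
err_raw = np.abs(X @ coef_raw - y).max()      # substantial
```

Because the teacher's embeddings are literal columns of the augmented feature matrix, the student's success is attributable to visible data properties rather than hidden computation, which is the shift the paper argues for.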

If this is right

  • Simple non-GNN models can reach high performance on graph tasks once the graph carries the necessary enriched structure.
  • Performance differences between GNN architectures become attributable to visible data properties rather than hidden computations.
  • Architectural features such as attention aggregation or fairness constraints appear as explicit patterns in the augmented graph.
  • Transparency is achieved without sacrificing predictive accuracy.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same distillation idea could simplify models in other modalities by augmenting images or text with model-derived signals.
  • Inspecting the augmented graph might surface data-level biases that affect downstream fairness even when the original model is complex.
  • Graph design itself could become a tunable step that reduces the need for ever-deeper GNN layers.

Load-bearing premise

The augmented graph fully encodes the teacher model's behavior without distortion, so a simple student matches performance and direct data inspection reveals the true architectural mechanisms.

What would settle it

An experiment in which the student model trained on the M2D-augmented graph still underperforms the original teacher, or one in which inspecting the graph fails to surface the expected mechanisms, such as fairness or attention patterns.

Figures

Figures reproduced from arXiv: 2605.06814 by Arlei Silva, Debolina Halder Lina.

Figure 1. Increasing transparency of a fair Graph Convolutional Network (GCN) via M2D using a …
Figure 2. Distillation design space with data (x-axis) and model (y-axis) complexity. Knowledge (K) distillation reduces model complexity (vertical), and data (D) distillation reduces data complexity (horizontal). Model-to-Data (M2D) distillation transfers capacity from the model to the data (diagonal). By shifting complexity to the data (right), fairness is no longer hidden within opaque adversarial training or…
Figure 3. The feature and structure learners iteratively refine features and structure by optimizing …
Figure 4. Accuracy–fairness trade-offs for FairGNN, FairVGNN, and NIFTY and their distilled …
Figure 5. Learned features colored by class label and sensitive attribute for the German dataset.
Figure 6. Original and augmented structure for a sample of 40 nodes from the German dataset, …
Figure 7. Analysis of learned node features with respect to the attention of the teacher GAT. For …
Figure 8. Visualization of edge weights on Cora (homophilic) using row-normalized M2D learned …
Figure 9. Learned features visualization across five benchmark datasets (Cora, Citeseer, Photo, …
Figure 10. Learned features visualization across five benchmark datasets (Cora, Citeseer, Photo, …
Figure 11. Learned feature visualization on CORA with GAT as teacher and GCN as student. Black circles denote nodes misclassified by standard GCN distillation but correctly classified by GCN(Feat.). The learned features form clear class-wise clusters, and these nodes lie near cluster boundaries, where GCN(Feat.) successfully matches the teacher while standard GCN fails.
Figure 12. Visualization of edge importance on Cornell (heterophilic) using row-normalized M2D …
Figure 13. Analysis of learned features with respect to GT attention. We bin feature similarity …
read the original abstract

Graph Neural Networks (GNNs) achieve high performance but can be opaque to humans, making it difficult to understand and compare the many proposed architectures. While existing explainability methods attribute individual predictions to nodes, edges, or features, they do not provide architectural transparency or explain the fundamental performance gap between simple and more complex models. To address this limitation, we introduce Model-to-Data (M2D) distillation, a new framework that increases transparency by transferring model complexity into the data space. M2D distills the teacher model into an augmented graph with enriched features and structure, enabling a simple student to match the teacher's performance. By materializing model behavior in the data, our approach allows humans to inspect architectural advantages directly. We show that M2D reveals underlying mechanisms such as fairness objectives and attention-based aggregation in an interpretable way, enhancing GNN transparency while preserving performance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces the Model-to-Data (M2D) distillation framework for Graph Neural Networks (GNNs). It claims that distilling a complex teacher GNN into an augmented graph with enriched features and structure allows a simple student GNN to match the teacher's performance. By materializing model behavior directly in the data, M2D enables human inspection of architectural advantages such as fairness objectives and attention-based aggregation, addressing limitations in existing GNN explainability methods that focus only on individual predictions.

Significance. If validated, the approach could meaningfully advance transparent graph learning by providing a data-centric alternative to post-hoc explanations, potentially allowing direct comparison of architectural mechanisms across GNN variants while preserving accuracy. This has implications for interpretability in domains requiring fairness or mechanistic understanding.

major comments (2)
  1. Abstract: The central claim that M2D augmentation fully materializes the teacher's internal mechanisms (attention aggregation, fairness objectives) such that a basic student matches performance and direct inspection reveals those mechanisms without loss or distortion lacks any formal definition, algorithm, or equation for the augmentation mapping. Without this, performance transfer alone does not establish inspectability, as the mapping could embed computations opaquely.
  2. Abstract: No experimental results, datasets, baselines, quantitative metrics (e.g., accuracy deltas, inspection examples), or error analysis are provided to support the claims that the student matches the teacher and that mechanisms are revealed interpretably. This is load-bearing for the framework's asserted benefits.
minor comments (1)
  1. The abstract would be strengthened by briefly outlining the high-level steps of the M2D process or naming example teacher/student architectures to make the framework more concrete.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address the two major comments on the abstract point by point below, agreeing that the abstract can be strengthened to better convey the formal and empirical support present in the full paper.

read point-by-point responses
  1. Referee: Abstract: The central claim that M2D augmentation fully materializes the teacher's internal mechanisms (attention aggregation, fairness objectives) such that a basic student matches performance and direct inspection reveals those mechanisms without loss or distortion lacks any formal definition, algorithm, or equation for the augmentation mapping. Without this, performance transfer alone does not establish inspectability, as the mapping could embed computations opaquely.

    Authors: We agree that the abstract is high-level and omits the formal details. Section 3 of the manuscript provides the precise definition of the M2D augmentation mapping: it is a deterministic function that injects teacher-derived quantities (attention coefficients as new edge attributes, fairness-regularized embeddings as node features, and structure modifications) directly into the input graph. Because these quantities are explicit and human-readable, the student GNN operates on an interpretable augmented graph rather than opaque internal states. We will revise the abstract to include a concise reference to this mapping (e.g., “via an augmentation mapping that materializes attention and fairness terms as explicit graph elements”) so the inspectability claim is formally grounded. revision: yes

  2. Referee: Abstract: No experimental results, datasets, baselines, quantitative metrics (e.g., accuracy deltas, inspection examples), or error analysis are provided to support the claims that the student matches the teacher and that mechanisms are revealed interpretably. This is load-bearing for the framework's asserted benefits.

    Authors: Abstracts conventionally omit specific numbers to stay within length limits. The full manuscript reports experiments on multiple standard graph datasets (citation networks, social graphs) using common GNN baselines. Results show the student GNN recovers teacher accuracy within small deltas while the augmented graph directly exposes attention weights and fairness adjustments for inspection. We will add one sentence to the abstract summarizing these outcomes at a high level (e.g., “Experiments demonstrate that the student matches teacher performance while the augmented structures make attention and fairness mechanisms directly inspectable”) to make the empirical support explicit. revision: yes
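The augmentation mapping the simulated rebuttal paraphrases can be illustrated with a small sketch. All names here are hypothetical and the paper's Section 3 definition is not reproduced; the point is only that teacher-derived attention coefficients become explicit, human-readable edge attributes rather than opaque internal state:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 6, 3
X = rng.normal(size=(n, d))                        # node features
edges = [(0, 2), (1, 2), (5, 2), (2, 3), (3, 4), (4, 0)]

def teacher_score(xi, xj):
    # stand-in for a GAT-style attention logit: endpoint similarity
    return float(xi @ xj)

logits = {(i, j): teacher_score(X[i], X[j]) for i, j in edges}

def norm_weight(i, j):
    # softmax-normalize over each target node's incoming edges
    denom = sum(np.exp(l) for (a, b), l in logits.items() if b == j)
    return float(np.exp(logits[(i, j)]) / denom)

# the augmented graph materializes teacher quantities as explicit attributes
augmented_graph = {
    "node_features": {v: X[v].round(3).tolist() for v in range(n)},
    "edge_attention": {e: round(norm_weight(*e), 3) for e in edges},
}
```

Reading `edge_attention` directly shows which neighbors the teacher weighted for each target node (the weights into any node sum to one), which is the kind of inspectability the rebuttal claims for the augmented graph.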

Circularity Check

0 steps flagged

No significant circularity; the conceptual framework is presented without self-referential derivations or fitted predictions.

full rationale

The paper introduces M2D distillation as a method to augment graphs with enriched features and structure derived from a teacher GNN, allowing a simple student to match performance while enabling inspection of mechanisms like fairness and attention. No equations, derivations, or parameter-fitting steps appear in the abstract or described content. The central claim is a proposed empirical outcome of the augmentation process rather than a tautological reduction (e.g., no self-definition of 'architectural advantage' via the augmentation itself, no fitted inputs renamed as predictions, and no load-bearing self-citations or uniqueness theorems). The derivation chain is self-contained as a high-level framework with implied validation, not forced by construction from its inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no information on free parameters, axioms, or invented entities; all lists are therefore empty.

pith-pipeline@v0.9.0 · 5455 in / 1140 out tokens · 47684 ms · 2026-05-11T01:04:32.146384+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

71 extracted references · 10 canonical work pages · 2 internal anchors

  1. [1]

    Towards a unified framework for fair and stable graph representation learning

    Chirag Agarwal, Himabindu Lakkaraju, and Marinka Zitnik. Towards a unified framework for fair and stable graph representation learning. In Uncertainty in Artificial Intelligence, pages 2114–2124. PMLR, 2021

  2. [2]

    Graph Rewiring in GNNs to Mitigate Over-Squashing and Over-Smoothing: A Survey

    Hugo Attali, Davide Buscaldi, and Nathalie Pernelle. Rewiring techniques to mitigate over-squashing and oversmoothing in GNNs: A survey. arXiv preprint arXiv:2411.17429, 2024

  3. [3]

    Explainability techniques for graph convolutional networks

    Federico Baldassarre and Hossein Azizpour. Explainability techniques for graph convolutional networks. arXiv preprint arXiv:1905.13686, 2019

  4. [4]

    Iterative deep graph learning for graph neural networks: Better and robust node embeddings. Advances in Neural Information Processing Systems, 33:19314–19326, 2020

    Yu Chen, Lingfei Wu, and Mohammed Zaki. Iterative deep graph learning for graph neural networks: Better and robust node embeddings. Advances in Neural Information Processing Systems, 33:19314–19326, 2020

  5. [5]

    On self-distilling graph neural network. arXiv preprint arXiv:2011.02255, 2020

    Yuzhao Chen, Yatao Bian, Xi Xiao, Yu Rong, Tingyang Xu, and Junzhou Huang. On self-distilling graph neural network. arXiv preprint arXiv:2011.02255, 2020

  6. [6]

    Say no to the discrimination: Learning fair graph neural networks with limited sensitive attribute information

    Enyan Dai and Suhang Wang. Say no to the discrimination: Learning fair graph neural networks with limited sensitive attribute information. In Proceedings of the 14th ACM International Conference on Web Search and Data Mining, pages 680–688, 2021

  7. [7]

    Graph-free knowledge distillation for graph neural networks

    Xiang Deng and Zhongfei Zhang. Graph-free knowledge distillation for graph neural networks. arXiv preprint arXiv:2105.07519, 2021

  8. [8]

    Exgc: Bridging efficiency and explainability in graph condensation

    Junfeng Fang, Xinglin Li, Yongduo Sui, Yuan Gao, Guibin Zhang, Kun Wang, Xiang Wang, and Xiangnan He. Exgc: Bridging efficiency and explainability in graph condensation. In Proceedings of the ACM Web Conference 2024, WWW ’24, pages 721–732, New York, NY, USA, 2024. Association for Computing Machinery

  9. [9]

    Freekd: Free-direction knowledge distillation for graph neural networks

    Kaituo Feng, Changsheng Li, Ye Yuan, and Guoren Wang. Freekd: Free-direction knowledge distillation for graph neural networks. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 357–366, 2022

  10. [10]

    Fair graph distillation. Advances in Neural Information Processing Systems, 36:80644–80660, 2023

    Qizhang Feng, Zhimeng Stephen Jiang, Ruiquan Li, Yicheng Wang, Na Zou, Jiang Bian, and Xia Hu. Fair graph distillation. Advances in Neural Information Processing Systems, 36:80644–80660, 2023

  11. [11]

    Graph condensation for inductive node representation learning

    Xinyi Gao, Tong Chen, Yilong Zang, Wentao Zhang, Quoc Viet Hung Nguyen, Kai Zheng, and Hongzhi Yin. Graph condensation for inductive node representation learning. In 2024 IEEE 40th International Conference on Data Engineering (ICDE), pages 3056–3069, 2024

  12. [12]

    Graph condensation for open-world graph learning

    Xinyi Gao, Tong Chen, Wentao Zhang, Yayong Li, Xiangguo Sun, and Hongzhi Yin. Graph condensation for open-world graph learning. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 851–862, 2024

  13. [13]

    Robgc: Towards robust graph condensation. IEEE Transactions on Knowledge and Data Engineering, 37(8):4791–4804, 2025

    Xinyi Gao, Hongzhi Yin, Tong Chen, Guanhua Ye, Wentao Zhang, and Bin Cui. Robgc: Towards robust graph condensation. IEEE Transactions on Knowledge and Data Engineering, 37(8):4791–4804, 2025

  14. [14]

    Using fuzzy logic to leverage html markup for web page representation. IEEE Transactions on Fuzzy Systems, 25(4):919–933, 2016

    Alberto P García-Plaza, Víctor Fresno, Raquel Martínez Unanue, and Arkaitz Zubiaga. Using fuzzy logic to leverage html markup for web page representation. IEEE Transactions on Fuzzy Systems, 25(4):919–933, 2016

  15. [15]

    Neural message passing for quantum chemistry

    Justin Gilmer, Samuel S Schoenholz, Patrick F Riley, Oriol Vinyals, and George E Dahl. Neural message passing for quantum chemistry. In International Conference on Machine Learning, pages 1263–1272. PMLR, 2017

  16. [16]

    Knowledge distillation: A survey. International Journal of Computer Vision, 129(6):1789–1819, 2021

    Jianping Gou, Baosheng Yu, Stephen J Maybank, and Dacheng Tao. Knowledge distillation: A survey. International Journal of Computer Vision, 129(6):1789–1819, 2021

  17. [17]

    Implicit regularization in matrix factorization. Advances in Neural Information Processing Systems, 30, 2017

    Suriya Gunasekar, Blake E Woodworth, Srinadh Bhojanapalli, Behnam Neyshabur, and Nati Srebro. Implicit regularization in matrix factorization. Advances in Neural Information Processing Systems, 30, 2017

  18. [18]

    Alignahead: Online cross-layer knowledge extraction on graph neural networks

    Jiongyu Guo, Defang Chen, and Can Wang. Alignahead: Online cross-layer knowledge extraction on graph neural networks. In 2022 International Joint Conference on Neural Networks (IJCNN), pages 1–8. IEEE, 2022

  19. [19]

    Inductive representation learning on large graphs

    Will Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large graphs. In NeurIPS, 2017

  20. [20]

    A comprehensive survey on graph reduction: Sparsification, coarsening, and condensation

    Mohammad Hashemi, Shengbo Gong, Juntong Ni, Wenqi Fan, B Aditya Prakash, and Wei Jin. A comprehensive survey on graph reduction: Sparsification, coarsening, and condensation. arXiv preprint arXiv:2402.03358, 2024

  21. [21]

    Compressing deep graph neural networks via adversarial knowledge distillation

    Huarui He, Jie Wang, Zhanqiu Zhang, and Feng Wu. Compressing deep graph neural networks via adversarial knowledge distillation. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 534–544, 2022

  22. [22]

    Distilling the Knowledge in a Neural Network

    Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531, 2015

  23. [23]

    Statlog (German Credit Data)

    Hans Hofmann. Statlog (German Credit Data). UCI Machine Learning Repository, 1994

  24. [24]

    Graphlime: Local interpretable model explanations for graph neural networks. IEEE Transactions on Knowledge and Data Engineering, 35(7):6968–6972, 2022

    Qiang Huang, Makoto Yamada, Yuan Tian, Dinesh Singh, and Yi Chang. Graphlime: Local interpretable model explanations for graph neural networks. IEEE Transactions on Knowledge and Data Engineering, 35(7):6968–6972, 2022

  25. [25]

    T2-gnn: Graph neural networks for graphs with incomplete features and structure via teacher-student distillation

    Cuiying Huo, Di Jin, Yawen Li, Dongxiao He, Yu-Bin Yang, and Lingfei Wu. T2-gnn: Graph neural networks for graphs with incomplete features and structure via teacher-student distillation. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, pages 4339–4346, 2023

  26. [26]

    Condensing graphs via one-step gradient matching

    Wei Jin, Xianfeng Tang, Haoming Jiang, Zheng Li, Danqing Zhang, Jiliang Tang, and Bing Yin. Condensing graphs via one-step gradient matching. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD ’22, pages 720–730, New York, NY, USA, 2022. Association for Computing Machinery

  27. [27]

    On representation knowledge distillation for graph neural networks. IEEE Transactions on Neural Networks and Learning Systems, 35(4):4656–4667, 2022

    Chaitanya K Joshi, Fayao Liu, Xu Xun, Jie Lin, and Chuan Sheng Foo. On representation knowledge distillation for graph neural networks. IEEE Transactions on Neural Networks and Learning Systems, 35(4):4656–4667, 2022

  28. [28]

    How to learn a graph from smooth signals

    Vassilis Kalofolias. How to learn a graph from smooth signals. In Artificial Intelligence and Statistics, pages 920–929. PMLR, 2016

  29. [29]

    Ganexplainer: Explainability method for graph neural network with generative adversarial nets

    Xinrui Kang, Dong Liang, and Qinfeng Li. Ganexplainer: Explainability method for graph neural network with generative adversarial nets. In International Conference on Computing and Pattern Recognition, pages 297–302, 2022

  30. [30]

    Semi-supervised classification with graph convolutional networks

    Thomas N Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. In ICLR, 2017

  31. [31]

    A comprehensive survey of dataset distillation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(1):17–32, 2023

    Shiye Lei and Dacheng Tao. A comprehensive survey of dataset distillation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(1):17–32, 2023

  32. [32]

    The mythos of model interpretability: In machine learning, the concept of interpretability is both important and slippery. Queue, 16(3):31–57, 2018

    Zachary C Lipton. The mythos of model interpretability: In machine learning, the concept of interpretability is both important and slippery. Queue, 16(3):31–57, 2018

  33. [33]

    Learning model-level explanations of graph neural networks via subgraph order embedding space. Neural Networks, 191:107815, 2025

    Li Liu, Pengyu Wan, Feiyan Zhang, Youmin Zhang, Qun Liu, and Guoyin Wang. Learning model-level explanations of graph neural networks via subgraph order embedding space. Neural Networks, 191:107815, 2025

  34. [34]

    Graph distillation with eigenbasis matching

    Yang Liu, Deyu Bo, and Chuan Shi. Graph distillation with eigenbasis matching. arXiv preprint arXiv:2310.09202, 2023

  35. [35]

    Cat: Balanced continual graph learning with graph condensation

    Yilun Liu, Ruihong Qiu, and Zi Huang. Cat: Balanced continual graph learning with graph condensation. In 2023 IEEE International Conference on Data Mining (ICDM), pages 1157–

  36. [36]

    Graph data condensation via self-expressive graph structure reconstruction

    Zhanyu Liu, Chaolv Zeng, and Guanjie Zheng. Graph data condensation via self-expressive graph structure reconstruction. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 1992–2002, 2024

  37. [37]

    Parameterized explainer for graph neural network. Advances in Neural Information Processing Systems, 33:19620–19631, 2020

    Dongsheng Luo, Wei Cheng, Dongkuan Xu, Wenchao Yu, Bo Zong, Haifeng Chen, and Xiang Zhang. Parameterized explainer for graph neural network. Advances in Neural Information Processing Systems, 33:19620–19631, 2020

  38. [38]

    Gcare: Mitigating subgroup unfairness in graph condensation through adversarial regularization. Applied Sciences, 2023

    Runze Mao, Wenqi Fan, and Qing Li. Gcare: Mitigating subgroup unfairness in graph condensation through adversarial regularization. Applied Sciences, 2023

  39. [39]

    Distill n’explain: explaining graph neural networks using simple surrogates

    Tamara Pereira, Erik Nascimento, Lucas E Resck, Diego Mesquita, and Amauri Souza. Distill n’explain: explaining graph neural networks using simple surrogates. In International Conference on Artificial Intelligence and Statistics, pages 6199–6214. PMLR, 2023

  40. [40]

    Explainability methods for graph convolutional neural networks

    Phillip E Pope, Soheil Kolouri, Mohammad Rostami, Charles E Martin, and Heiko Hoffmann. Explainability methods for graph convolutional neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10772–10781, 2019

  41. [41]

    Interpreting graph neural networks for NLP with differentiable edge masking

    Michael Sejr Schlichtkrull, Nicola De Cao, and Ivan Titov. Interpreting graph neural networks for NLP with differentiable edge masking. In International Conference on Learning Representations, 2020

  42. [42]

    Collective classification in network data. AI Magazine, 29(3):93–93, 2008

    Prithviraj Sen, Galileo Namata, Mustafa Bilgic, Lise Getoor, Brian Galligher, and Tina Eliassi-Rad. Collective classification in network data. AI Magazine, 29(3):93–93, 2008

  43. [43]

    Pitfalls of Graph Neural Network Evaluation

    Oleksandr Shchur, Maximilian Mumme, Aleksandar Bojchevski, and Stephan Günnemann. Pitfalls of graph neural network evaluation. arXiv preprint arXiv:1811.05868, 2018

  44. [44]

    Page: Prototype-based model-level explanations for graph neural networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(10):6559–6576, 2024

    Yong-Min Shin, Sun-Woo Kim, and Won-Yong Shin. Page: Prototype-based model-level explanations for graph neural networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(10):6559–6576, 2024

  45. [45]

    Learning MLPs on graphs: A unified view of effectiveness, robustness, and efficiency

    Yijun Tian, Chuxu Zhang, Zhichun Guo, Xiangliang Zhang, and Nitesh Chawla. Learning MLPs on graphs: A unified view of effectiveness, robustness, and efficiency. In International Conference on Learning Representations, 2023

  46. [46]

    [re] gnninterpreter: A probabilistic generative model-level explanation for graph neural networks. Transactions on Machine Learning Research, 2024

    Ana Vasilcoiu, THF Stessen, Thies Kersten, and Batu Helvacioglu. [re] gnninterpreter: A probabilistic generative model-level explanation for graph neural networks. Transactions on Machine Learning Research, 2024

  47. [47]

    Attention is all you need. Advances in Neural Information Processing Systems, 30, 2017

    Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in Neural Information Processing Systems, 30, 2017

  48. [48]

    Graph attention networks

    Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, Yoshua Bengio, et al. Graph attention networks. In International Conference on Learning Representations, 2018

  49. [49]

    Pgm-explainer: Probabilistic graphical model explanations for graph neural networks. Advances in Neural Information Processing Systems, 33:12225–12235, 2020

    Minh Vu and My T Thai. Pgm-explainer: Probabilistic graphical model explanations for graph neural networks. Advances in Neural Information Processing Systems, 33:12225–12235, 2020

  50. [50]

    Collaborative knowledge distillation for heterogeneous information network embedding

    Can Wang, Sheng Zhou, Kang Yu, Defang Chen, Bolang Li, Yan Feng, and Chun Chen. Collaborative knowledge distillation for heterogeneous information network embedding. In Proceedings of the ACM Web Conference, pages 1631–1639, 2022

  51. [51]

    Improving fairness in graph neural networks via mitigating sensitive attribute leakage

    Yu Wang, Yuying Zhao, Yushun Dong, Huiyuan Chen, Jundong Li, and Tyler Derr. Improving fairness in graph neural networks via mitigating sensitive attribute leakage. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 1938–1948, 2022

  52. [52]

    Knowledge distillation improves graph structure augmentation for graph neural networks. Advances in Neural Information Processing Systems, 35:11815–11827, 2022

    Lirong Wu, Haitao Lin, Yufei Huang, and Stan Z Li. Knowledge distillation improves graph structure augmentation for graph neural networks. Advances in Neural Information Processing Systems, 35:11815–11827, 2022

  53. [53]

    Kernel ridge regression-based graph dataset distillation

    Zhe Xu, Yuzhong Chen, Menghai Pan, Huiyuan Chen, Mahashweta Das, Hao Yang, and Hanghang Tong. Kernel ridge regression-based graph dataset distillation. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 2850–2861, 2023

  54. [54]

    Global concept-based interpretability for graph neural networks via neuron analysis

    Han Xuanyuan, Pietro Barbiero, Dobrik Georgiev, Lucie Charlotte Magister, and Pietro Liò. Global concept-based interpretability for graph neural networks via neuron analysis. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 37, pages 10675–10683, 2023

  55. [55]

    Tinygnn: Learning efficient graph neural networks

    Bencheng Yan, Chaokun Wang, Gaoyang Guo, and Yunkai Lou. Tinygnn: Learning efficient graph neural networks. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 1848–1856, 2020

  56. [56]

    Extract the knowledge of graph neural networks and go beyond it: An effective knowledge distillation framework

    Cheng Yang, Jiawei Liu, and Chuan Shi. Extract the knowledge of graph neural networks and go beyond it: An effective knowledge distillation framework. In Proceedings of the Web Conference, pages 1227–1237, 2021

  57. [57]

    Geometric knowledge distillation: Topology compression for graph neural networks. Advances in Neural Information Processing Systems, 35:29761–29775, 2022

    Chenxiao Yang, Qitian Wu, and Junchi Yan. Geometric knowledge distillation: Topology compression for graph neural networks. Advances in Neural Information Processing Systems, 35:29761–29775, 2022

  58. [58]

    Distilling knowledge from graph convolutional networks

    Yiding Yang, Jiayan Qiu, Mingli Song, Dacheng Tao, and Xinchao Wang. Distilling knowledge from graph convolutional networks. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7074–7083, 2020

  59. [59]

    Do transformers really perform badly for graph representation?Advances in Neural Information Processing Systems, 34:28877–28888, 2021

    Chengxuan Ying, Tianle Cai, Shengjie Luo, Shuxin Zheng, Guolin Ke, Di He, Yanming Shen, and Tie-Yan Liu. Do transformers really perform badly for graph representation?Advances in Neural Information Processing Systems, 34:28877–28888, 2021

  60. [60]

    Gnnexplainer: Generating explanations for graph neural networks.Advances in Neural Information Processing Systems, 32, 2019

    Zhitao Ying, Dylan Bourgeois, Jiaxuan You, Marinka Zitnik, and Jure Leskovec. Gnnexplainer: Generating explanations for graph neural networks.Advances in Neural Information Processing Systems, 32, 2019

  61. [61]

    Sail: Self-augmented graph contrastive learning

    Lu Yu, Shichao Pei, Lizhong Ding, Jun Zhou, Longfei Li, Chuxu Zhang, and Xiangliang Zhang. Sail: Self-augmented graph contrastive learning. InProceedings of the AAAI Conference on Artificial Intelligence, volume 36, pages 8927–8935, 2022

  62. [62]

    On explainability of graph neural networks via subgraph explorations

    Hao Yuan, Haiyang Yu, Jie Wang, Kang Li, and Shuiwang Ji. On explainability of graph neural networks via subgraph explorations. InInternational Conference on Machine Learning, pages 12241–12252. PMLR, 2021

  63. [63]

    Multi-scale distillation from multiple graph neural networks

    Chunhai Zhang, Jie Liu, Kai Dang, and Wenzheng Zhang. Multi-scale distillation from multiple graph neural networks. InProceedings of the AAAI Conference on Artificial Intelligence, volume 36, pages 4337–4344, 2022

  64. [64]

    Graph-less neural networks: Teaching old mlps new tricks via distillation

    Shichang Zhang, Yozen Liu, Yizhou Sun, and Neil Shah. Graph-less neural networks: Teaching old mlps new tricks via distillation. InInternational Conference on Learning Representations, 2022

  65. [65]

    Two trades is not baffled: Condensing graph via crafting rational gradient matching.arXiv preprint arXiv:2402.04924, 2024

    Tianle Zhang, Yuchen Zhang, Kun Wang, Kai Wang, Beining Yang, Kaipeng Zhang, Wenqi Shao, Ping Liu, Joey Tianyi Zhou, and Yang You. Two trades is not baffled: Condensing graph via crafting rational gradient matching.arXiv preprint arXiv:2402.04924, 2024

  66. [66]

    Rod: reception-aware online distillation for sparse graphs

    Wentao Zhang, Yuezihan Jiang, Yang Li, Zeang Sheng, Yu Shen, Xupeng Miao, Liang Wang, Zhi Yang, and Bin Cui. Rod: reception-aware online distillation for sparse graphs. InProceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pages 2232–2242, 2021

  67. [67]

    Reliable data distillation on graph convolutional network

    Wentao Zhang, Xupeng Miao, Yingxia Shao, Jiawei Jiang, Lei Chen, Olivier Ruas, and Bin Cui. Reliable data distillation on graph convolutional network. InProceedings of the 2020 ACM SIGMOD International Conference on Management of Data, pages 1399–1414, 2020. 13

  68. [68]

    Relex: A model-agnostic relational model explainer

    Yue Zhang, David Defazio, and Arti Ramesh. Relex: A model-agnostic relational model explainer. InProceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society, pages 1042–1049, 2021

  69. [69]

    Cold brew: Distilling graph node representations with incomplete or missing neighborhoods

    Wenqing Zheng, Edward W Huang, Nikhil Rao, Sumeet Katariya, Zhangyang Wang, and Karthik Subbian. Cold brew: Distilling graph node representations with incomplete or missing neighborhoods. InInternational Conference on Learning Representations, 2022

  70. [70]

    Structure-free graph condensation: From large-scale graphs to condensed graph-free data

    Xin Zheng, Miao Zhang, Chunyang Chen, Quoc Viet Hung Nguyen, Xingquan Zhu, and Shirui Pan. Structure-free graph condensation: From large-scale graphs to condensed graph-free data. Advances in Neural Information Processing Systems, 36:6026–6047, 2023

  71. [71]

    Data-free adversarial knowledge distillation for graph neural networks.arXiv preprint arXiv:2205.03811, 2022

    Yuanxin Zhuang, Lingjuan Lyu, Chuan Shi, Carl Yang, and Lichao Sun. Data-free adversarial knowledge distillation for graph neural networks.arXiv preprint arXiv:2205.03811, 2022. 14 A Technical appendices and supplementary material Graph Neural Networks We adopt aGraph Convolutional Network (GCN)as the student and aGraph Attention Network (GAT)as the teach...
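To make the teacher/student pairing concrete, the following is a minimal, stdlib-only sketch (an illustration, not the paper's implementation) of the two aggregation schemes involved: a GCN layer averages neighbor features under symmetric degree normalization, while a GAT layer weights neighbors by softmax-normalized attention scores. The scalar features and the `attn_weight` parameter are hypothetical simplifications of the usual vector-valued formulation.

```python
import math

def gcn_aggregate(features, adj):
    """One GCN propagation step on scalar features:
    h_i = sum_j x_j / sqrt(d_i * d_j), over the graph with self-loops added."""
    n = len(features)
    # add self-loops so each node also aggregates its own feature
    adj = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in adj]
    out = []
    for i in range(n):
        h = 0.0
        for j in range(n):
            if adj[i][j]:
                h += features[j] / math.sqrt(deg[i] * deg[j])
        out.append(h)
    return out

def gat_aggregate(features, adj, attn_weight):
    """One GAT-style step: attention score e_ij = attn_weight * (x_i + x_j),
    softmax-normalized over the neighborhood (self-loop included), then a
    convex combination of neighbor features."""
    n = len(features)
    out = []
    for i in range(n):
        nbrs = [j for j in range(n) if adj[i][j] or i == j]
        scores = [attn_weight * (features[i] + features[j]) for j in nbrs]
        m = max(scores)                       # shift for numerical stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        out.append(sum(e / z * features[j] for e, j in zip(exps, nbrs)))
    return out

# Tiny 3-node path graph (0 - 1 - 2) with scalar node features.
x = [1.0, 2.0, 3.0]
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
print(gcn_aggregate(x, A))
print(gat_aggregate(x, A, attn_weight=0.5))
```

The contrast is the point of the M2D setup: the GAT teacher's output depends on learned, input-dependent attention weights, whereas the GCN student's aggregation is fixed by graph structure alone, so any teacher advantage must be materialized in the augmented graph for the student to match it.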