Hierarchical Attention-based Graph Neural Network with Relevance-driven Pruning

Seungwoo Kum

arxiv: 2605.09308 · v1 · submitted 2026-05-10 · 💻 cs.LG · cs.AI

Pith reviewed 2026-05-12 02:42 UTC · model grok-4.3

keywords Graph Neural Networks · Hierarchical Attention · Relevance Pruning · Heterogeneous Graphs · GNN Explainer · Node Classification · Graph Efficiency

The pith

A two-tier attention GNN generates relevance scores that let pruning remove 27% of edges while raising accuracy 2.4-6.1%.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents HA-HeteroGNN, a model that processes graphs with 16 node types and 18 edge types through separate sensor-level and context-level attention layers. An explainer computes a relevance score for each node directly from the attention weights, with no gradient backpropagation. Nodes that score as consistently uninformative are then removed, producing a smaller graph on which the model classifies more accurately in experiments on a synthetic dataset of 50,000 records. The result challenges the usual assumption that graph reduction must cost predictive power. The paper also reports large drops in training time and explanations that stay stable across different strategies.

Core claim

HA-HeteroGNN uses a hierarchical attention structure to separate computation across node and edge types, yielding per-node relevance scores that serve as a pruning criterion. Removing nodes flagged as consistently uninformative reduces the number of edges by 27% while lifting classification accuracy between 2.4% and 6.1% on all tested variants of the model.
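
The pruning step in this claim can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function name `prune_by_relevance`, the single fixed `threshold`, and the toy graph are all assumptions, and the paper's actual rule for flagging nodes as consistently uninformative is not given in this summary.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def prune_by_relevance(
    edges: List[Edge],
    relevance: Dict[str, float],
    threshold: float,
) -> Tuple[List[Edge], float]:
    """Drop nodes scoring below `threshold`, plus every edge touching them.

    Returns the kept edges and the fraction of edges removed.
    Nodes with no relevance score are kept (an assumption of this sketch).
    """
    dropped = {node for node, score in relevance.items() if score < threshold}
    kept = [(u, v) for u, v in edges if u not in dropped and v not in dropped]
    removed_fraction = 1.0 - len(kept) / len(edges) if edges else 0.0
    return kept, removed_fraction


# Hypothetical toy graph: one report node, two sensors, one context node.
edges = [("report", "s1"), ("report", "s2"), ("report", "ctx"), ("s2", "ctx")]
relevance = {"report": 0.9, "s1": 0.6, "s2": 0.05, "ctx": 0.4}

kept, removed_fraction = prune_by_relevance(edges, relevance, threshold=0.1)
print(kept, removed_fraction)
```

Removing a node removes all of its incident edges, which is why a node-level criterion is reported as an edge-level reduction.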

What carries the argument

Two-tier attention mechanism that separates sensor-level and context-level computation to produce per-node relevance scores for pruning without gradient backpropagation.
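
One way to picture gradient-free relevance from a two-tier attention structure is sketched below. It is a simplified assumption, not the paper's architecture: the raw attention logits are taken as given inputs, the multi-head and report-type-conditioned query shown in Figure 3 is omitted, and the names `two_tier_relevance`, `node_logits`, and `group_logits` are hypothetical. Each node's relevance is the product of its group's weight (second tier) and its weight within the group (first tier), so only forward-pass quantities are used.

```python
import math
from typing import Dict, List


def softmax(values: List[float]) -> List[float]:
    """Numerically stable softmax over a non-empty list."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def two_tier_relevance(
    node_logits: Dict[str, Dict[str, float]],
    group_logits: Dict[str, float],
) -> Dict[str, float]:
    """Combine within-group and across-group attention into node relevance.

    node_logits:  group name -> {node name: raw attention logit}
    group_logits: group name -> raw attention logit for the whole group
    """
    groups = list(node_logits)
    group_weight = dict(zip(groups, softmax([group_logits[g] for g in groups])))
    relevance: Dict[str, float] = {}
    for group in groups:
        nodes = list(node_logits[group])
        within = softmax([node_logits[group][n] for n in nodes])
        for node, weight in zip(nodes, within):
            relevance[node] = group_weight[group] * weight
    return relevance


# Hypothetical logits for two groups standing in for the sensor and context tiers.
rel = two_tier_relevance(
    node_logits={"sensor": {"temp": 2.0, "noise": 0.0}, "context": {"weather": 1.0}},
    group_logits={"sensor": 0.0, "context": 0.0},
)
print(rel)
```

Because both tiers are softmax-normalised, the scores sum to 1 over all nodes, which makes them comparable across groups of different sizes.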

If this is right

Graph edges drop 27% after removal of low-relevance nodes.
Classification accuracy rises 2.4-6.1% on every model variant tested.
Training time falls by as much as 43.9%.
Real-time inference runs at 58-60 ms per sample.
Explanation stability reaches 97.5% across different pruning strategies.
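
The stability figure in the last item depends on a metric that this summary does not define. As a hedged illustration only, the sketch below measures stability as the mean pairwise Jaccard overlap between the top-k nodes selected under each strategy. The choice of Jaccard overlap, the function names, and the example scores are assumptions, not the paper's definition.

```python
from itertools import combinations
from typing import Dict, List, Set


def top_k(scores: Dict[str, float], k: int) -> Set[str]:
    """Return the k highest-scoring node names (ties keep insertion order)."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])


def explanation_stability(strategies: List[Dict[str, float]], k: int) -> float:
    """Mean pairwise Jaccard overlap of top-k node sets across strategies."""
    if len(strategies) < 2 or k < 1:
        raise ValueError("need at least two strategies and k >= 1")
    tops = [top_k(scores, k) for scores in strategies]
    pairs = list(combinations(tops, 2))
    return sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)


# Hypothetical relevance scores from three explanation strategies.
strategy_a = {"s1": 0.9, "s2": 0.5, "s3": 0.1}
strategy_b = {"s1": 0.8, "s2": 0.6, "s3": 0.2}
strategy_c = {"s1": 0.7, "s3": 0.6, "s2": 0.1}

stability = explanation_stability([strategy_a, strategy_b, strategy_c], k=2)
print(round(stability, 4))
```

A value of 1.0 would mean every strategy selects the same top-k nodes; lower values mean the strategies disagree about which nodes matter.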

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same relevance-driven pruning could be tested on real sensor networks or knowledge graphs to check whether the accuracy gains survive outside synthetic data.
Attention weights might serve as a lightweight substitute for gradient-based explainers in other GNN tasks where backpropagation cost is high.
If low-relevance nodes are reliably noise, the method offers a way to clean large relational datasets before any downstream task.

Load-bearing premise

The attention-derived relevance scores correctly identify nodes whose removal preserves or improves classification performance on the given data.

What would settle it

Run the pruned model on a real heterogeneous graph dataset drawn from actual sensor or report data and measure whether accuracy still rises or instead falls relative to the unpruned baseline.

Figures

Figures reproduced from arXiv: 2605.09308 by Seungwoo Kum.

**Figure 2.** Overview of the HA-HeteroGNN framework. Solid arrows indicate the primary data flow; …

**Figure 3.** Hierarchical multihead attention architecture. The report type conditioned query …

**Figure 4.** GNN Explainer architecture. Attention weights from the hierarchical attention module …

**Figure 5.** Heterogeneous graph schema. Each report node connects to sensor, alert, and context …

**Figure 6.** Three model architectures: (a) base inductive without attention; (b) single cross …

read the original abstract

Graph Neural Networks (GNNs) excel at relational reasoning but face two persistent challenges: the lack of interpretable attribution for heterogeneous node types, and the computational overhead of message passing over large, noisy graphs. We propose the Hierarchical Attention-based Heterogeneous GNN (HA-HeteroGNN), a framework that addresses both issues through a unified explainability-to-pruning pipeline. A two-tier attention mechanism separates sensor-level and context-level computation across 16 node types and 18 edge types, producing per-node relevance scores via an attention-based GNN Explainer without requiring gradient backpropagation. These relevance scores then serve as a principled pruning criterion: removing nodes identified as consistently uninformative yields a 27% reduction in graph edges while simultaneously improving classification accuracy by 2.4-6.1% across all model variants, challenging the conventional assumption that pruning necessarily trades accuracy for efficiency. Experiments on a 50,000-record synthetic dataset spanning 11 report categories demonstrate 97.5% cross-strategy explanation stability and domain-consistent sensor attribution, with training-time reductions of up to 43.9% and real-time inference latency of approximately 58-60 ms per sample.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes HA-HeteroGNN, a hierarchical attention-based heterogeneous GNN with a two-tier (sensor-level and context-level) attention mechanism over 16 node types and 18 edge types. It generates per-node relevance scores via a gradient-free attention-based GNN Explainer and uses these scores as a pruning criterion to remove consistently uninformative nodes. On a 50,000-record synthetic dataset with 11 report categories, this yields a 27% reduction in graph edges, 2.4-6.1% accuracy gains across variants, 97.5% cross-strategy stability, up to 43.9% training-time reduction, and ~58-60 ms inference latency per sample.

Significance. If the results hold beyond synthetic data, the work would be significant by integrating gradient-free explainability directly into a pruning pipeline for heterogeneous GNNs, demonstrating that targeted removal of low-relevance nodes can simultaneously improve efficiency and accuracy. This challenges the standard pruning trade-off and offers a practical approach for noisy relational data with multiple node/edge types.

major comments (2)

[Experiments] The central empirical claim (27% edge reduction with 2.4-6.1% accuracy improvement) rests exclusively on a synthetic dataset; no experiments on real heterogeneous graphs are reported, leaving open whether the attention-derived relevance scores identify genuinely uninformative nodes or exploit synthetic artifacts (e.g., removable noise by construction).
[Method and Abstract] Relevance scores are produced by the model's own two-tier attention mechanism and then applied to prune the identical graph on which the model was trained, creating a circular dependence; the manuscript must show that the pruning criterion is not merely removing nodes the fitted attention weights have already down-weighted.

minor comments (2)

[Abstract] Abstract contains multiple typographical errors: '2.46.1 percent' (should be 2.4-6.1%), 'unied', 'eciency', 'classication', 'identied', and 'supp'.
[Abstract] The abstract states numerical outcomes without error bars, detailed baseline tables, or the precise definition of the pruning consistency threshold (listed as a free parameter).

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below, clarifying our approach where possible and outlining revisions to strengthen the manuscript.

read point-by-point responses

Referee: [Experiments] The central empirical claim (27% edge reduction with 2.4-6.1% accuracy improvement) rests exclusively on a synthetic dataset; no experiments on real heterogeneous graphs are reported, leaving open whether the attention-derived relevance scores identify genuinely uninformative nodes or exploit synthetic artifacts (e.g., removable noise by construction).

Authors: We acknowledge the limitation of using only synthetic data for the reported results. The 50,000-record synthetic dataset was specifically engineered with 16 node types, 18 edge types, and controlled noise across 11 categories to enable precise measurement of explanation stability (97.5%) and attribution consistency, which require ground-truth relevance labels unavailable in most real datasets. This design supports rigorous ablation of the pruning mechanism. We agree that real-world validation is essential and will add a new Limitations and Future Work subsection discussing the synthetic setting and proposing extensions to real heterogeneous graphs (e.g., from knowledge bases or sensor networks). We cannot incorporate new real-data experiments in this revision due to data-access and computational constraints.
revision: partial

Referee: [Method and Abstract] Relevance scores are produced by the model's own two-tier attention mechanism and then applied to prune the identical graph on which the model was trained, creating a circular dependence; the manuscript must show that the pruning criterion is not merely removing nodes the fitted attention weights have already down-weighted.

Authors: The relevance scores are generated by a dedicated gradient-free attention-based GNN Explainer that propagates attention independently of the downstream classification loss. Pruning is applied using aggregated scores from multiple independent training runs and cross-validation folds to identify nodes that remain low-relevance across realizations. This process is not equivalent to simply thresholding the raw two-tier attention weights. To make this distinction explicit, we will revise the Method section with additional pseudocode and add an ablation experiment in the revised manuscript comparing our explainer-based pruning against direct attention-weight thresholding and random pruning baselines. These results will demonstrate the incremental benefit of the explainer-derived scores.
revision: yes
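
The aggregation described in this response, flagging nodes that stay low-relevance across runs and folds, might look roughly like the sketch below. The aggregation rule is not stated in the rebuttal, so everything here is an assumption: the name `consistently_low`, the per-run `score_threshold`, the `min_fraction` of runs required, and the choice to count a node missing from a run as low-relevance.

```python
from typing import Dict, List, Set


def consistently_low(
    run_scores: List[Dict[str, float]],
    score_threshold: float,
    min_fraction: float,
) -> Set[str]:
    """Flag nodes scoring below `score_threshold` in at least
    `min_fraction` of the runs (a missing score counts as 0.0)."""
    nodes: Set[str] = set().union(*run_scores)
    flagged: Set[str] = set()
    for node in nodes:
        low_runs = sum(
            1 for scores in run_scores if scores.get(node, 0.0) < score_threshold
        )
        if low_runs / len(run_scores) >= min_fraction:
            flagged.add(node)
    return flagged


# Hypothetical relevance scores from three independent training runs.
runs = [
    {"s1": 0.60, "s2": 0.05, "ctx": 0.02},
    {"s1": 0.70, "s2": 0.04, "ctx": 0.30},
    {"s1": 0.50, "s2": 0.08, "ctx": 0.05},
]

strict = consistently_low(runs, score_threshold=0.1, min_fraction=1.0)
lenient = consistently_low(runs, score_threshold=0.1, min_fraction=0.6)
print(strict, lenient)
```

In this toy case the strict setting flags only `s2`, while the lenient setting also flags `ctx`, which shows how much the pruned set can depend on a threshold choice.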

standing simulated objections not resolved

Experiments on real heterogeneous graphs (we lack access to suitable large-scale labeled real-world datasets with comparable heterogeneity for the current revision cycle)

Circularity Check

0 steps flagged

No significant circularity detected in the proposed explainability-to-pruning pipeline

full rationale

The paper introduces HA-HeteroGNN with a two-tier attention mechanism that produces relevance scores via an attention-based GNN Explainer; these scores are then used as a pruning criterion on the same synthetic dataset. The reported edge reduction and accuracy gains are presented as post-pruning empirical measurements rather than quantities derived by construction from the fitted attention weights. No equations, uniqueness theorems, or self-citations are shown to reduce the central claims to definitional equivalence or fitted inputs. The pipeline is a standard train-then-explain-then-prune workflow whose outcomes remain falsifiable on held-out or real data, making the derivation self-contained.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The central claim rests on the domain assumption that attention weights constitute reliable relevance scores and on the unstated choice of a consistency threshold for pruning; no explicit free parameters or new physical entities are named in the abstract.

free parameters (1)

pruning consistency threshold
The criterion for nodes being 'consistently uninformative' implies at least one tunable threshold or aggregation rule whose value is not reported.

axioms (1)

domain assumption · Attention weights in a heterogeneous GNN can be interpreted directly as node relevance without additional calibration
Invoked when the two-tier attention is said to produce per-node relevance scores usable for pruning.

invented entities (1)

HA-HeteroGNN two-tier attention mechanism · no independent evidence
purpose: To separate sensor-level and context-level computation while generating relevance scores
New architectural component introduced by the paper.

Reference graph

Works this paper leans on

37 extracted references · 37 canonical work pages

[1] Korea Safety Agency. Safety e-Report Annual Report 2020. Ministry of the Interior and Safety, Republic of Korea, 2020.
[2] T. Nam and T. A. Pardo. The changing face of a city government: A case study of Philly311. Gov. Information Quarterly, 31:S1-S9, 2014.
[3] S. L. Minkoff. NYC 311: A tract-level analysis of citizen-government contacting in New York City. Urban Affairs Review, 52(2):211-246, 2016.
[4] mySociety. FixMyStreet: Report, view, or discuss local problems. https://www.fixmystreet.com/, 2023.
[5] Snap Send Solve Pty Ltd. Snap Send Solve. https://www.snapsendsolve.com/, 2022.
[6] T. N. Kipf and M. Welling. Semi-supervised classification with graph convolutional networks. In Proc. ICLR, 2017.
[7] W. L. Hamilton, R. Ying, and J. Leskovec. Inductive representation learning on large graphs. In Proc. NeurIPS, pp. 1024-1034, 2017.
[8] P. Veličković, G. Cucurull, A. Casanova, A. Romero, P. Liò, and Y. Bengio. Graph attention networks. In Proc. ICLR, 2018.
[9] X. Wang, H. Ji, C. Shi, B. Wang, Y. Ye, P. Cui, and P. S. Yu. Heterogeneous graph attention network. In Proc. WWW, pp. 2022-2032, 2019.
[10] Z. Hu, Y. Dong, K. Wang, and Y. Sun. Heterogeneous graph transformer. In Proc. WWW, pp. 2704-2710, 2020.
[11] B. Yu, H. Yin, and Z. Zhu. Spatio-temporal graph convolutional networks: A deep learning framework for traffic forecasting. In Proc. IJCAI, pp. 3634-3640, 2018.
[12] C. Zhang, D. Song, C. Huang, A. Swami, and N. V. Chawla. Heterogeneous graph neural network. In Proc. KDD, pp. 793-803, 2019.
[13] C. Fan and A. Mostafavi. A graph-based method for social sensing of infrastructure disruptions in disasters. Comput.-Aided Civ. and Infrastruct. Eng., 34(12):1055-1070, 2019.
[14] J. Chen, T. Ma, and C. Xiao. FastGCN: Fast learning with graph convolutional networks via importance sampling. In Proc. ICLR, 2018.
[15] W. Chiang, X. Liu, S. Si, Y. Li, S. Bengio, and C. Hsieh. Cluster-GCN: An efficient algorithm for training deep and large graph convolutional networks. In Proc. KDD, pp. 257-266, 2019.
[16] H. Zeng, H. Zhou, A. Srivastava, R. Kanber, and V. Prasanna. GraphSAINT: Graph sampling based inductive learning method. In Proc. ICLR, 2020.
[17] F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, and G. Monfardini. The graph neural network model. IEEE Trans. Neural Networks, 20(1):61-80, 2009.
[18] J. Bruna, W. Zaremba, A. Szlam, and Y. LeCun. Spectral networks and locally connected networks on graphs. In Proc. ICLR, 2014.
[19] M. Defferrard, X. Bresson, and P. Vandergheynst. Convolutional neural networks on graphs with fast localized spectral filtering. In Proc. NeurIPS, pp. 3844-3852, 2016.
[20] J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, and G. E. Dahl. Neural message passing for quantum chemistry. In Proc. ICML, pp. 1263-1272, 2017.
[21] M. Schlichtkrull, T. N. Kipf, P. Bloem, R. van den Berg, I. Titov, and M. Welling. Modeling relational data with graph convolutional networks. In Proc. ESWC, pp. 593-607, 2018.
[22] Q. Lv et al. Are we really making much progress? Revisiting, benchmarking and refining heterogeneous graph neural networks. In Proc. KDD, pp. 1150-1160, 2021.
[23] X. Geng et al. Spatiotemporal multigraph convolution network for ride-hailing demand forecasting. In Proc. AAAI, pp. 3656-3663, 2019.
[24] C. Zheng, X. Fan, C. Wang, and J. Qi. GMAN: A graph multi-attention network for traffic prediction. In Proc. AAAI, pp. 1234-1241, 2020.
[25] R. Ying, D. Bourgeois, J. You, M. Zitnik, and J. Leskovec. GNNExplainer: Generating explanations for graph neural networks. In Proc. NeurIPS, pp. 9244-9255, 2019.
[26] D. Luo et al. Parameterized explainer for graph neural network. In Proc. NeurIPS, pp. 19620-19631, 2020.
[27] H. Yuan, H. Yu, J. Wang, K. Li, and S. Ji. On explainability of graph neural networks via subgraph explorations. In Proc. ICML, pp. 12241-12252, 2021.
[28] M. Sundararajan, A. Taly, and Q. Yan. Axiomatic attribution for deep networks. In Proc. ICML, pp. 3319-3328, 2017.
[29] P. E. Pope, S. Kolouri, M. Rostami, C. E. Martin, and H. Hoffmann. Explainability methods for graph convolutional neural networks. In Proc. CVPR, pp. 10772-10781, 2019.
[30] S. Brody, U. Alon, and E. Yahav. How attentive are graph attention networks? In Proc. ICLR, 2022.
[31] C. Zheng et al. Robust graph representation learning via neural sparsification. In Proc. ICML, pp. 11458-11468, 2020.
[32] T. Chen et al. A unified lottery ticket hypothesis for graph neural networks. In Proc. ICML, pp. 1695-1706, 2021.
[33] Y. Rong, W. Huang, T. Xu, and J. Huang. DropEdge: Towards deep graph convolutional networks on node classification. In Proc. ICLR, 2019.
[34] Y. Yang et al. Distilling knowledge from graph convolutional networks. In Proc. CVPR, pp. 7074-7083, 2020.
[35] Z. Huang et al. Scaling up graph neural networks via graph coarsening. In Proc. KDD, pp. 675-685, 2021.
[36] R. Ying et al. Graph convolutional neural networks for web-scale recommender systems. In Proc. KDD, pp. 974-983, 2018.
[37] B. Fatemi, L. El Asri, and S. M. Kazemi. SLAPS: Self-supervision improves structure learning for graph neural networks. In Proc. NeurIPS, pp. 22667-22681, 2021.
