pith. machine review for the scientific record.

arxiv: 2105.14491 · v3 · submitted 2021-05-30 · 💻 cs.LG

Recognition: 2 theorem links · Lean Theorem

How Attentive are Graph Attention Networks?

Authors on Pith: no claims yet

Pith reviewed 2026-05-17 02:26 UTC · model grok-4.3

classification 💻 cs.LG
keywords: graph attention networks · GAT · GATv2 · static attention · dynamic attention · graph neural networks · expressiveness · benchmarks

The pith

Graph Attention Networks use a static attention mechanism that cannot express some simple graph problems; reordering its internal operations yields GATv2, a dynamic-attention variant that can.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Graph Attention Networks attend to neighbors using a mechanism in which the ranking of attention scores stays the same no matter which node is querying. This static property prevents the model from learning certain functions on graphs, as shown when GAT fails to fit the training data on a controlled synthetic task. By changing the order of the linear transformation and the attention score computation, the authors obtain GATv2, which conditions attention rankings on the query node. The change matches GAT's parametric cost yet produces strictly higher expressiveness. A reader would care because GAT remains a standard building block for graph tasks, and the fix improves results on real benchmarks without added cost.
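
The difference is easiest to see in the two scoring functions themselves. Below is a minimal single-head sketch in PyTorch; the dimensions, variable names, and random parameters are illustrative assumptions, not the authors' code.

    import torch
    import torch.nn.functional as F

    d, d_out = 8, 16
    W = torch.randn(d_out, d)          # shared linear transformation (GAT)
    a_gat = torch.randn(2 * d_out)     # GAT attention vector over [W h_i || W h_j]
    W2 = torch.randn(d_out, 2 * d)     # GATv2 weight applied to the raw concatenation
    a_gatv2 = torch.randn(d_out)       # GATv2 attention vector applied after the nonlinearity

    def gat_score(h_i, h_j):
        # GAT: LeakyReLU(a^T [W h_i || W h_j]); a acts before the nonlinearity,
        # so the score splits into a query-only term plus a key-only term.
        z = torch.cat([W @ h_i, W @ h_j])
        return F.leaky_relu(a_gat @ z, negative_slope=0.2)

    def gatv2_score(h_i, h_j):
        # GATv2: a^T LeakyReLU(W [h_i || h_j]); the nonlinearity sits between W and a,
        # so the score no longer separates and the neighbor ranking can depend on h_i.
        z = F.leaky_relu(W2 @ torch.cat([h_i, h_j]), negative_slope=0.2)
        return a_gatv2 @ z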

Core claim

Graph Attention Networks compute attention scores such that the relative ordering among a node's neighbors is independent of the node's own representation, which the authors term static attention. This restriction means GAT cannot represent functions that require the ranking of neighbors to change with the query. The paper exhibits the limitation on a simple synthetic graph problem where the model cannot fit the training data, then shows that moving the linear transformation inside the attention function produces GATv2, which implements dynamic attention and is provably more expressive than the original GAT.
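
Stated in the paper's notation (writing the attention vector as a = [a_1 || a_2]), the restriction follows from how the GAT score decomposes; a compact LaTeX restatement of the two scoring functions:

    \[
      e(h_i, h_j)
        = \mathrm{LeakyReLU}\!\left(a^{\top} [\, W h_i \,\Vert\, W h_j \,]\right)
        = \mathrm{LeakyReLU}\!\left(a_1^{\top} W h_i + a_2^{\top} W h_j\right),
      \qquad
      e_{\mathrm{v2}}(h_i, h_j)
        = a^{\top}\, \mathrm{LeakyReLU}\!\left(W [\, h_i \,\Vert\, h_j \,]\right).
    \]

In GAT the key-side term a_2^T W h_j alone determines the relative ordering of a node's neighbors, while in GATv2 the nonlinearity between W and a couples query and key, allowing the ordering to vary with the query representation.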

What carries the argument

Static attention in GAT, defined as neighbor ranking that does not depend on the query node representation.

If this is right

  • GAT cannot solve graph problems that require attention rankings to depend on the querying node.
  • GATv2 is strictly more expressive than GAT while matching its parametric cost.
  • GATv2 outperforms the original GAT on 11 graph benchmarks from OGB and other sources.
  • The modification is simple enough to integrate into existing GNN libraries.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Other attention-based graph models that fix neighbor rankings independently of the query may carry the same limitation.
  • Replacing GAT layers with GATv2 in existing pipelines offers a low-cost way to test whether dynamic attention helps on a given dataset; a drop-in sketch follows this list.
  • Expressiveness checks using small controlled graphs could become a routine step when designing new graph attention variants.
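
As an illustration of the drop-in replacement mentioned in the list above, here is a sketch for a generic two-layer node-classification model in PyTorch Geometric; GATConv and GATv2Conv are the library's layer names, while the surrounding module and hyperparameters are illustrative assumptions.

    import torch
    import torch.nn.functional as F
    from torch_geometric.nn import GATConv, GATv2Conv

    class AttentionGNN(torch.nn.Module):
        # Swapping the layer class is the only change needed to move from
        # static (GAT) to dynamic (GATv2) attention in this sketch.
        def __init__(self, in_dim, hidden_dim, out_dim, heads=8, use_v2=True):
            super().__init__()
            Layer = GATv2Conv if use_v2 else GATConv
            self.conv1 = Layer(in_dim, hidden_dim, heads=heads)
            self.conv2 = Layer(hidden_dim * heads, out_dim, heads=1)

        def forward(self, x, edge_index):
            x = F.elu(self.conv1(x, edge_index))
            return self.conv2(x, edge_index)

Per the abstract, equivalent GATv2 layers also ship with the Deep Graph Library and TensorFlow GNN, so the same one-line swap applies in those frameworks.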

Load-bearing premise

That the controlled synthetic problem captures the expressiveness limits that matter for real graph tasks and that reordering the operations fully converts static attention to dynamic attention without side effects.

What would settle it

A graph task in which each node must rank its neighbors differently according to its own features; GAT should fail to fit the training data while GATv2 should succeed.
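
A data-generation sketch in the spirit of such a task; the bipartite "lookup" construction below is an illustrative assumption about what a controlled problem of this kind could look like, not the paper's exact setup.

    import torch

    def lookup_task(k, seed=0):
        # k "query" nodes are each connected to the same k "key" nodes; every query
        # must attend to the single key whose id matches its own, so the correct
        # neighbor ranking depends on the query node's own feature.
        g = torch.Generator().manual_seed(seed)
        values = torch.randperm(k, generator=g)                    # value stored at each key
        query_x = torch.stack([torch.arange(k), torch.full((k,), -1)], dim=1)
        key_x = torch.stack([torch.arange(k), values], dim=1)
        x = torch.cat([query_x, key_x]).float()                    # node features
        src = torch.arange(k, 2 * k).repeat_interleave(k)          # key -> query edges
        dst = torch.arange(k).repeat(k)
        edge_index = torch.stack([src, dst])
        y = values                                                 # each query predicts its key's value
        return x, edge_index, y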

read the original abstract

Graph Attention Networks (GATs) are one of the most popular GNN architectures and are considered as the state-of-the-art architecture for representation learning with graphs. In GAT, every node attends to its neighbors given its own representation as the query. However, in this paper we show that GAT computes a very limited kind of attention: the ranking of the attention scores is unconditioned on the query node. We formally define this restricted kind of attention as static attention and distinguish it from a strictly more expressive dynamic attention. Because GATs use a static attention mechanism, there are simple graph problems that GAT cannot express: in a controlled problem, we show that static attention hinders GAT from even fitting the training data. To remove this limitation, we introduce a simple fix by modifying the order of operations and propose GATv2: a dynamic graph attention variant that is strictly more expressive than GAT. We perform an extensive evaluation and show that GATv2 outperforms GAT across 11 OGB and other benchmarks while we match their parametric costs. Our code is available at https://github.com/tech-srl/how_attentive_are_gats . GATv2 is available as part of the PyTorch Geometric library, the Deep Graph Library, and the TensorFlow GNN library.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that GAT computes only static attention (attention score rankings independent of the query node) because of the additive form LeakyReLU(a^T [W h_i || W h_j]) = LeakyReLU(c_i + d_j). It introduces a controlled synthetic graph problem in which GAT fails to fit the training data, proposes GATv2 via reordering of operations to obtain dynamic attention, proves GATv2 is strictly more expressive, and reports that GATv2 outperforms GAT on 11 OGB and other benchmarks at matched parameter cost.
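
A quick numerical check of that additive-form claim (a sketch with random parameters, not the authors' code): once the score takes the form LeakyReLU(c_i + d_j), every query node produces the same neighbor ordering.

    import torch
    import torch.nn.functional as F

    torch.manual_seed(0)
    n, d, d_out = 6, 8, 16
    H = torch.randn(n, d)                        # node features
    W = torch.randn(d_out, d)
    a1, a2 = torch.randn(d_out), torch.randn(d_out)

    c = H @ W.T @ a1                             # query-side terms c_i
    dk = H @ W.T @ a2                            # key-side terms d_j
    scores = F.leaky_relu(c[:, None] + dk[None, :], negative_slope=0.2)

    # LeakyReLU is strictly increasing, so sorting over j ignores c_i entirely:
    rankings = scores.argsort(dim=1, descending=True)
    assert (rankings == rankings[0]).all()       # every query ranks the keys identically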

Significance. If the central claims hold, the work is significant: it isolates a concrete, previously under-appreciated restriction in a widely adopted GNN, supplies a minimal architectural change that restores dynamic attention, and demonstrates consistent empirical gains. The public code release and integration into PyTorch Geometric, DGL, and TensorFlow GNN are concrete strengths that aid reproducibility.

major comments (2)
  1. [Controlled synthetic problem / expressivity argument] Synthetic-task section: the claim that GAT cannot express the target function rests on observed non-convergence rather than a proof that no parameter setting exists. Because reordering to GATv2 also changes the computation graph and gradient flow, the training-set failure could be an optimization artifact rather than a pure expressivity limit; a parameter-existence argument or explicit construction showing the required attention ranking is impossible under the static form would be needed to make the inference load-bearing.
  2. [GATv2 proposal] GATv2 definition and expressivity claim: while the reordering is presented as converting static to dynamic attention, the manuscript should explicitly verify that the new ordering preserves the original parameter count and does not introduce new degrees of freedom that could explain the performance difference independently of the static/dynamic distinction.
minor comments (2)
  1. [Abstract] Abstract: the phrase '11 OGB and other benchmarks' should either list the datasets or point to the specific table/figure that enumerates them.
  2. [Evaluation] Experimental section: confirm that all reported improvements include standard deviations over multiple runs and appropriate statistical tests; the current description leaves the strength of the outperformance claim unclear.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and indicate the changes we will make in the revised manuscript.

read point-by-point responses
  1. Referee: [Controlled synthetic problem / expressivity argument] Synthetic-task section: the claim that GAT cannot express the target function rests on observed non-convergence rather than a proof that no parameter setting exists. Because reordering to GATv2 also changes the computation graph and gradient flow, the training-set failure could be an optimization artifact rather than a pure expressivity limit; a parameter-existence argument or explicit construction showing the required attention ranking is impossible under the static form would be needed to make the inference load-bearing.

    Authors: We agree that the original presentation relies on empirical non-convergence and that a formal argument would make the expressivity claim more robust. In the revised manuscript we will add an explicit construction: we exhibit a small graph and target attention ranking that cannot be realized by any choice of parameters under the static (additive) form LeakyReLU(c_i + d_j), because the ranking is forced to be independent of the query node. This construction is independent of optimization or gradient flow and directly shows that the limitation is architectural rather than an artifact of training dynamics. We retain the synthetic experiment as supporting evidence but now anchor the claim with the parameter-existence argument; a compact sketch of that argument appears after these point-by-point responses. revision: yes

  2. Referee: [GATv2 proposal] GATv2 definition and expressivity claim: while the reordering is presented as converting static to dynamic attention, the manuscript should explicitly verify that the new ordering preserves the original parameter count and does not introduce new degrees of freedom that could explain the performance difference independently of the static/dynamic distinction.

    Authors: We confirm that GATv2 uses exactly the same number and dimensionality of parameters as GAT. The only change is the order of the linear transformation, concatenation, and nonlinearity; no additional weight matrices or biases are introduced. In the revised manuscript we will insert a short paragraph (with an accompanying table) that explicitly counts the parameters for both models on the standard OGB setups, showing they are identical. We will also note that any performance difference must therefore stem from the change in attention expressivity rather than from increased model capacity. revision: yes
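
A compact version of the parameter-existence argument promised in response 1 (an editorial restatement of the monotonicity reasoning, not text from the revised manuscript):

    \[
      e(h_i, h_j) = \mathrm{LeakyReLU}\!\left(a_1^{\top} W h_i + a_2^{\top} W h_j\right),
      \qquad
      j^{*} \in \arg\max_{j}\, a_2^{\top} W h_j .
    \]

Because LeakyReLU is strictly increasing, e(h_i, h_{j*}) >= e(h_i, h_j) for every query i and every neighbor j, so after softmax normalization each query assigns its largest attention weight to the same key j*. Any target that requires two queries to attend most strongly to two different keys is therefore unrealizable under the static form for any parameter setting, independently of optimization.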

Circularity Check

0 steps flagged

Derivation is self-contained with no circular reductions

full rationale

The paper derives the static attention property directly from the additive decomposition of GAT's scoring function (LeakyReLU(a^T [W h_i || W h_j]) reducing to a query-independent ranking), introduces an independent formal definition of static vs. dynamic attention, and demonstrates the limitation via a new synthetic task where GAT fails to fit while the reordered GATv2 succeeds. GATv2 is obtained by a straightforward reordering of linear and nonlinearity operations that makes the score query-dependent by construction. No step reduces a claimed result to a prior fitted quantity, self-citation chain, or renamed input; the expressivity argument rests on the new definitions and the controlled experiment rather than any self-referential loop.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on standard graph-neural-network assumptions about node features and neighborhood aggregation plus the newly introduced GATv2 construction; no ad-hoc fitted constants or new physical entities are invoked.

axioms (1)
  • domain assumption Standard assumptions on graph structure, node features, and message passing in GNNs
    Implicit background for all GAT-style models.
invented entities (1)
  • GATv2 no independent evidence
    purpose: Dynamic-attention variant of GAT
    Proposed architecture whose expressiveness is claimed to exceed GAT.

pith-pipeline@v0.9.0 · 5518 in / 1348 out tokens · 54909 ms · 2026-05-17T02:26:02.690351+00:00 · methodology

discussion (0)


Forward citations

Cited by 18 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Frequency-Space Mechanics: A Sequence and Coordinate-Free Representation for Protein Function Prediction

    q-bio.BM 2026-05 unverdicted novelty 7.0

    Vibrational mode graphs from molecular dynamics enable sequence-free protein function prediction via graph neural networks, with entrainment improving signals for collective dynamics.

  2. Graphlets as Building Blocks for Structural Vocabulary in Knowledge Graph Foundation Models

    cs.AI 2026-05 unverdicted novelty 7.0

    Graphlets mined as structural tokens improve zero-shot inductive and transductive link prediction in knowledge graph foundation models across 51 diverse graphs.

  3. Graph Transformers and Stabilized Reinforcement Learning for Large-Scale Dynamic Routing Modulation and Spectrum Allocation in Elastic Optical Networks

    cs.NI 2026-05 conditional novelty 7.0

    Graph transformer RL for dynamic RMSA supports up to 13% more traffic than benchmarks on networks up to 143 nodes and 362 links.

  4. Evaluating LLMs on Large-Scale Graph Property Estimation via Random Walks

    cs.LG 2026-05 unverdicted novelty 7.0

    EstGraph benchmark evaluates LLMs on estimating properties of very large graphs from random-walk samples that fit in context limits.

  5. Concept Graph Convolutions: Message Passing in the Concept Space

    cs.LG 2026-04 unverdicted novelty 7.0

    Concept Graph Convolutions perform message passing on node concepts to increase interpretability of graph neural networks without losing task performance.

  6. Region-Grounded Report Generation for 3D Medical Imaging: A Fine-Grained Dataset and Graph-Enhanced Framework

    cs.CV 2026-04 unverdicted novelty 7.0

    Introduces VietPET-RoI dataset with fine-grained RoI annotations for Vietnamese 3D PET/CT and HiRRA graph framework that improves report generation by modeling region dependencies, claiming large gains over prior models.

  7. Beyond Nodes vs. Edges: A Multi-View Fusion Framework for Provenance-Based Intrusion Detection

    cs.CR 2026-04 unverdicted novelty 7.0

    PROVFUSION fuses three complementary views of provenance data with lightweight schemes and voting to achieve higher detection accuracy and lower false positives than node- or edge-only baselines on nine benchmarks.

  8. CapBench: A Multi-PDK Dataset for Machine-Learning-Based Post-Layout Capacitance Extraction

    cs.AR 2026-04 accept novelty 7.0

    CapBench is a new multi-PDK dataset of post-layout 3D windows with high-fidelity capacitance labels and multiple ML-ready representations, plus baseline results showing CNN accuracy versus GNN speed trade-offs.

  9. SCOT: Multi-Source Cross-City Transfer with Optimal-Transport Soft-Correspondence Objective

    cs.LG 2026-04 unverdicted novelty 7.0

    SCOT learns explicit soft region correspondences via entropic optimal transport and a shared prototype hub to improve multi-source cross-city transfer accuracy and robustness.

  10. SCOT: Multi-Source Cross-City Transfer with Optimal-Transport Soft-Correspondence Objective

    cs.LG 2026-04 unverdicted novelty 7.0

    SCOT uses Sinkhorn entropic optimal transport to learn explicit soft correspondences between unequal region sets for multi-source cross-city transfer, adding contrastive sharpening and cycle reconstruction for stabili...

  11. ID-PaS+ : Identity-Aware Predict-and-Search for General Mixed-Integer Linear Programs

    cs.AI 2025-12 unverdicted novelty 7.0

    ID-PaS+ introduces an identity-aware predict-and-search framework for general parametric MIPs that outperforms Gurobi and prior PAS methods on real-world large-scale instances.

  12. Random-Set Graph Neural Networks

    cs.AI 2026-05 unverdicted novelty 6.0

    RS-GNNs predict random sets over classes using belief functions to jointly produce class probabilities and epistemic uncertainty estimates for graph nodes.

  13. GEM: Graph-Enhanced Mixture-of-Experts with ReAct Agents for Dialogue State Tracking

    cs.CL 2026-05 unverdicted novelty 6.0

    GEM achieves 65.19% joint goal accuracy on MultiWOZ 2.2 by routing between a graph neural network expert for dialogue structure and a T5 expert for sequences, plus ReAct agents for value generation, outperforming prio...

  14. SOAR: Real-Time Joint Optimization of Order Allocation and Robot Scheduling in Robotic Mobile Fulfillment Systems

    cs.AI 2026-05 unverdicted novelty 6.0

    SOAR is a unified DRL method using soft allocations, event-driven MDP, and heterogeneous graph transformers that cuts global makespan by 7.5% and average order completion time by 15.4% at sub-100ms latency in RMFS.

  15. Qubit-Scalable CVRP via Lagrangian Knapsack Decomposition and Noise-Aware Quantum Execution

    quant-ph 2026-04 unverdicted novelty 6.0

    A hybrid quantum framework decomposes CVRP into bounded-width knapsack subproblems, trains a reinforcement learning controller for Lagrangian multipliers, and uses a contextual bandit to adapt quantum hardware executi...

  16. A Structure-Preserving Graph Neural Solver for Parametric Hyperbolic Conservation Laws

    physics.comp-ph 2026-04 unverdicted novelty 6.0

    A structure-preserving GNN solver for parametric hyperbolic conservation laws achieves superior long-horizon stability and orders-of-magnitude speedups over high-resolution simulations on supersonic flow benchmarks.

  17. Learning Ad Hoc Network Dynamics via Graph-Structured World Models

    cs.LG 2026-04 unverdicted novelty 6.0

    G-RSSM learns per-node dynamics in wireless ad hoc networks via graph attention and trains clustering policies through imagined rollouts, generalizing from N=50 training to larger networks.

  18. A Texture-Generalizable Deep Material Network via Orientation-Aware Interaction Learning for Polycrystal Modeling and Texture Evolution

    cs.CE 2025-12 unverdicted novelty 6.0

    TACS-GNN-ODMN infers micromechanical parameters from arbitrary polycrystal textures to build generalizable ODMN surrogates that predict nonlinear responses and texture evolution without retraining.

Reference graph

Works this paper leans on

70 extracted references · 70 canonical work pages · cited by 17 Pith papers · 5 internal anchors

  1. [1]

    Learning to represent programs with graphs

    Miltiadis Allamanis, Marc Brockschmidt, and Mahmoud Khademi. Learning to represent programs with graphs. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=BJOFETxR-

  2. [2]

    On the bottleneck of graph neural networks and its practical implications

    Uri Alon and Eran Yahav. On the bottleneck of graph neural networks and its practical implications. In International Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=i80OPhOCVH2

  3. [3]

    Diffusion-convolutional neural networks

    James Atwood and Don Towsley. Diffusion-convolutional neural networks. In Advances in neural information processing systems, pages 1993--2001, 2016

  4. [4]

    Neural Machine Translation by Jointly Learning to Align and Translate

    Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. CoRR, abs/1409.0473, 2014. URL http://arxiv.org/abs/1409.0473

  5. [5]

    Interaction networks for learning about objects, relations and physics

    Peter Battaglia, Razvan Pascanu, Matthew Lai, Danilo Jimenez Rezende, and Koray kavukcuoglu. Interaction networks for learning about objects, relations and physics. In Proceedings of the 30th International Conference on Neural Information Processing Systems, pages 4509--4517, 2016

  6. [6]

    Gnn-film: Graph neural networks with feature-wise linear modulation

    Marc Brockschmidt. Gnn-film: Graph neural networks with feature-wise linear modulation. Proceedings of the 36th International Conference on Machine Learning, ICML, 2020. URL https://github.com/microsoft/tf-gnn-samples

  7. [7]

    Geometric deep learning: going beyond euclidean data

    Michael M Bronstein, Joan Bruna, Yann LeCun, Arthur Szlam, and Pierre Vandergheynst. Geometric deep learning: going beyond euclidean data. IEEE Signal Processing Magazine, 34(4): 18--42, 2017

  8. [8]

    Geometric deep learning: Grids, groups, graphs, geodesics, and gauges

    Michael M. Bronstein, Joan Bruna, Taco Cohen, and Petar Veličković. Geometric deep learning: Grids, groups, graphs, geodesics, and gauges, 2021

  9. [9]

    Relational Graph Attention Networks

    Dan Busbridge, Dane Sherburn, Pietro Cavallo, and Nils Y Hammerla. Relational graph attention networks. arXiv preprint arXiv:1904.05811, 2019

  10. [10]

    Approximation by superpositions of a sigmoidal function

    George Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems, 2(4): 303--314, 1989

  11. [11]

    Programmable Agents

    Misha Denil, Sergio Gómez Colmenarejo, Serkan Cabi, David Saxton, and Nando de Freitas. Programmable agents. arXiv preprint arXiv:1706.06383, 2017

  12. [12]

    One-shot imitation learning

    Yan Duan, Marcin Andrychowicz, Bradly Stadie, Jonathan Ho, Jonas Schneider, Ilya Sutskever, Pieter Abbeel, and Wojciech Zaremba. One-shot imitation learning. In Proceedings of the 31st International Conference on Neural Information Processing Systems, pages 1087--1098, 2017

  13. [13]

    Convolutional networks on graphs for learning molecular fingerprints

    David K Duvenaud, Dougal Maclaurin, Jorge Iparraguirre, Rafael Bombarell, Timothy Hirzel, Alán Aspuru-Guzik, and Ryan P Adams. Convolutional networks on graphs for learning molecular fingerprints. In Advances in neural information processing systems, pages 2224--2232, 2015

  14. [14]

    A generalization of transformer networks to graphs

    Vijay Prakash Dwivedi and Xavier Bresson. A generalization of transformer networks to graphs. arXiv preprint arXiv:2012.09699, 2020

  15. [15]

    Benchmarking graph neural networks

    Vijay Prakash Dwivedi, Chaitanya K Joshi, Thomas Laurent, Yoshua Bengio, and Xavier Bresson. Benchmarking graph neural networks. arXiv preprint arXiv:2003.00982, 2020

  16. [16]

    Fast graph representation learning with PyTorch Geometric

    Matthias Fey and Jan E. Lenssen. Fast graph representation learning with PyTorch Geometric. In ICLR Workshop on Representation Learning on Graphs and Manifolds, 2019

  17. [17]

    On the approximate realization of continuous mappings by neural networks

    Ken-Ichi Funahashi. On the approximate realization of continuous mappings by neural networks. Neural Networks, 2(3): 183--192, 1989

  18. [18]

    Graph representation learning via hard and channel-wise attention networks

    Hongyang Gao and Shuiwang Ji. Graph representation learning via hard and channel-wise attention networks. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 741--749, 2019

  19. [19]

    Neural message passing for quantum chemistry

    Justin Gilmer, Samuel S Schoenholz, Patrick F Riley, Oriol Vinyals, and George E Dahl. Neural message passing for quantum chemistry. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pages 1263--1272. JMLR. org, 2017

  20. [20]

    pytorch-gat

    Aleksa Gordić. pytorch-gat. https://github.com/gordicaleksa/pytorch-GAT, 2020

  21. [21]

    A new model for learning in graph domains

    Marco Gori, Gabriele Monfardini, and Franco Scarselli. A new model for learning in graph domains. In Proceedings. 2005 IEEE International Joint Conference on Neural Networks, 2005., volume 2, pages 729--734. IEEE, 2005

  22. [22]

    Inductive representation learning on large graphs

    Will Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large graphs. In Advances in neural information processing systems, pages 1024--1034, 2017

  23. [23]

    Approximation capabilities of multilayer feedforward networks

    Kurt Hornik. Approximation capabilities of multilayer feedforward networks. Neural Networks, 4(2): 251--257, 1991

  24. [24]

    Multilayer feedforward networks are universal approximators

    Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5): 359--366, 1989

  25. [25]

    Vain: attentional multi-agent predictive modeling

    Yedid Hoshen. Vain: attentional multi-agent predictive modeling. In Proceedings of the 31st International Conference on Neural Information Processing Systems, pages 2698--2708, 2017

  26. [26]

    Open graph benchmark: Datasets for machine learning on graphs

    Weihua Hu, Matthias Fey, Marinka Zitnik, Yuxiao Dong, Hongyu Ren, Bowen Liu, Michele Catasta, and Jure Leskovec. Open graph benchmark: Datasets for machine learning on graphs. arXiv preprint arXiv:2005.00687, 2020

  27. [27]

    Syntax-aware aspect level sentiment classification with graph attention networks

    Binxuan Huang and Kathleen M Carley. Syntax-aware aspect level sentiment classification with graph attention networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 5472--5480, 2019

  28. [28]

    Combining label propagation and simple models out-performs graph neural networks

    Qian Huang, Horace He, Abhay Singh, Ser-Nam Lim, and Austin Benson. Combining label propagation and simple models out-performs graph neural networks. In International Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=8E1-f3VhX1o

  29. [29]

    Transformers are graph neural networks

    Chaitanya Joshi. Transformers are graph neural networks. The Gradient, 2020

  30. [30]

    How to find your friendly neighborhood: Graph attention design with self-supervision

    Dongkwan Kim and Alice Oh. How to find your friendly neighborhood: Graph attention design with self-supervision. In International Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=Wi5KUNlqWty

  31. [31]

    Semi-supervised classification with graph convolutional networks

    Thomas N Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. In ICLR, 2017

  32. [32]

    Social-bigat: Multimodal trajectory forecasting using bicycle-gan and graph attention networks

    Vineet Kosaraju, Amir Sadeghian, Roberto Martín-Martín, Ian Reid, Hamid Rezatofighi, and Silvio Savarese. Social-bigat: Multimodal trajectory forecasting using bicycle-gan and graph attention networks. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems, v...

  33. [33]

    Attention Models in Graphs: A Survey

    John Boaz Lee, Ryan A Rossi, Sungchul Kim, Nesreen K Ahmed, and Eunyee Koh. Attention models in graphs: A survey. arXiv preprint arXiv:1807.07984, 2018

  34. [34]

    Multilayer feedforward networks with a nonpolynomial activation function can approximate any function

    Moshe Leshno, Vladimir Ya Lin, Allan Pinkus, and Shimon Schocken. Multilayer feedforward networks with a nonpolynomial activation function can approximate any function. Neural Networks, 6(6): 861--867, 1993

  35. [35]

    Deepgcns: Can gcns go as deep as cnns?

    Guohao Li, Matthias Muller, Ali Thabet, and Bernard Ghanem. Deepgcns: Can gcns go as deep as cnns? In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 9267--9276, 2019

  36. [36]

    Deeper insights into graph convolutional networks for semi-supervised learning

    Qimai Li, Zhichao Han, and Xiao-Ming Wu. Deeper insights into graph convolutional networks for semi-supervised learning. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018

  37. [37]

    Gated graph sequence neural networks

    Yujia Li, Daniel Tarlow, Marc Brockschmidt, and Richard Zemel. Gated graph sequence neural networks. In International Conference on Learning Representations, 2016

  38. [38]

    Gated relational graph attention networks, 2021

    Denis Lukovnikov and Asja Fischer. Gated relational graph attention networks, 2021. URL https://openreview.net/forum?id=v-9E8egy_i

  39. [39]

    Effective approaches to attention-based neural machine translation

    Thang Luong, Hieu Pham, and Christopher D. Manning. Effective approaches to attention-based neural machine translation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, Lisbon, Portugal, September 17-21, 2015, pages 1412--1421, 2015. URL http://aclweb.org/anthology/D/D15/D15-1166.pdf

  40. [40]

    Entity-aware dependency-based deep graph attention network for comparative preference classification

    Nianzu Ma, Sahisnu Mazumder, Hao Wang, and Bing Liu. Entity-aware dependency-based deep graph attention network for comparative preference classification. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 5782--5788, 2020

  41. [41]

    Geometric deep learning on graphs and manifolds using mixture model cnns

    Federico Monti, Davide Boscaini, Jonathan Masci, Emanuele Rodola, Jan Svoboda, and Michael M Bronstein. Geometric deep learning on graphs and manifolds using mixture model cnns. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 5115--5124, 2017

  42. [42]

    Learning attention-based embeddings for relation prediction in knowledge graphs

    Deepak Nathani, Jatin Chauhan, Charu Sharma, and Manohar Kaul. Learning attention-based embeddings for relation prediction in knowledge graphs. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4710--4723, 2019

  43. [43]

    Minimum width for universal approximation

    Sejun Park, Chulhee Yun, Jaeho Lee, and Jinwoo Shin. Minimum width for universal approximation. In International Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=O-XJwyoIF-k

  44. [44]

    Approximation theory of the mlp model

    Allan Pinkus. Approximation theory of the mlp model. Acta Numerica 1999: Volume 8, 8: 143--195, 1999

  45. [45]

    Deepinf: Social influence prediction with deep learning

    Jiezhong Qiu, Jian Tang, Hao Ma, Yuxiao Dong, Kuansan Wang, and Jie Tang. Deepinf: Social influence prediction with deep learning. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’18), 2018

  46. [46]

    Quantum chemistry structures and properties of 134 kilo molecules

    Raghunathan Ramakrishnan, Pavlo O Dral, Matthias Rupp, and O Anatole Von Lilienfeld. Quantum chemistry structures and properties of 134 kilo molecules. Scientific Data, 1: 140022, 2014

  47. [47]

    Self-supervised graph transformer on large-scale molecular data

    Yu Rong, Yatao Bian, Tingyang Xu, Weiyang Xie, Ying Wei, Wenbing Huang, and Junzhou Huang. Self-supervised graph transformer on large-scale molecular data. Advances in Neural Information Processing Systems, 33, 2020a

  48. [48]

    Dropedge: Towards deep graph convolutional networks on node classification

    Yu Rong, Wenbing Huang, Tingyang Xu, and Junzhou Huang. Dropedge: Towards deep graph convolutional networks on node classification. In International Conference on Learning Representations, 2020b. URL https://openreview.net/forum?id=Hkx1qkrKPr

  49. [49]

    A simple neural network module for relational reasoning

    Adam Santoro, David Raposo, David GT Barrett, Mateusz Malinowski, Razvan Pascanu, Peter Battaglia, and Timothy Lillicrap. A simple neural network module for relational reasoning. In Proceedings of the 31st International Conference on Neural Information Processing Systems, pages 4974--4983, 2017

  50. [50]

    The graph neural network model

    Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfardini. The graph neural network model. IEEE Transactions on Neural Networks, 20(1): 61--80, 2008

  51. [51]

    Collective classification in network data

    Prithviraj Sen, Galileo Namata, Mustafa Bilgic, Lise Getoor, Brian Galligher, and Tina Eliassi-Rad. Collective classification in network data. AI Magazine, 29(3): 93--93, 2008

  52. [52]

    Masked label prediction: Unified message passing model for semi-supervised classification

    Yunsheng Shi, Zhengjie Huang, Shikun Feng, and Yu Sun. Masked label prediction: Unified message passing model for semi-supervised classification. arXiv preprint arXiv:2009.03509, 2020

  53. [53]

    Attention-based Graph Neural Network for Semi-supervised Learning

    Kiran K Thekumparampil, Chong Wang, Sewoong Oh, and Li-Jia Li. Attention-based graph neural network for semi-supervised learning. arXiv preprint arXiv:1803.03735, 2018

  54. [54]

    Attention is all you need

    Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems, pages 6000--6010, 2017

  55. [55]

    Graph attention networks

    Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. Graph attention networks. In International Conference on Learning Representations, 2018

  56. [56]

    Pointer graph networks

    Petar Veličković, Lars Buesing, Matthew Overlan, Razvan Pascanu, Oriol Vinyals, and Charles Blundell. Pointer graph networks. Advances in Neural Information Processing Systems, 33, 2020

  57. [57]

    Graph attention networks

    Petar Veličković et al. Graph attention networks. 2018

  58. [58]

    Improving graph attention networks with large margin-based constraints

    Guangtao Wang, Rex Ying, Jing Huang, and Jure Leskovec. Improving graph attention networks with large margin-based constraints. arXiv preprint arXiv:1910.11945, 2019a

  59. [59]

    Deep graph library: A graph-centric, highly-performant package for graph neural networks

    Minjie Wang, Da Zheng, Zihao Ye, Quan Gan, Mufei Li, Xiang Song, Jinjing Zhou, Chao Ma, Lingfan Yu, Yu Gai, Tianjun Xiao, Tong He, George Karypis, Jinyang Li, and Zheng Zhang. Deep graph library: A graph-centric, highly-performant package for graph neural networks. arXiv preprint arXiv:1909.01315, 2019b

  60. [60]

    Heterogeneous graph attention network

    Xiao Wang, Houye Ji, Chuan Shi, Bai Wang, Yanfang Ye, Peng Cui, and Philip S Yu. Heterogeneous graph attention network. In The World Wide Web Conference, pages 2022--2032, 2019c

  61. [61]

    Bag of tricks of semi-supervised classification with graph neural networks

    Yangkun Wang. Bag of tricks of semi-supervised classification with graph neural networks. arXiv preprint arXiv:2103.13355, 2021

  62. [62]

    On the practical computational power of finite precision rnns for language recognition

    Gail Weiss, Yoav Goldberg, and Eran Yahav. On the practical computational power of finite precision rnns for language recognition. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 740--745, 2018

  63. [63]

    Simplifying graph convolutional networks

    Felix Wu, Amauri Souza, Tianyi Zhang, Christopher Fifty, Tao Yu, and Kilian Weinberger. Simplifying graph convolutional networks. In International conference on machine learning, pages 6861--6871. PMLR, 2019

  64. [64]

    A comprehensive survey on graph neural networks

    Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and S Yu Philip. A comprehensive survey on graph neural networks. IEEE Transactions on Neural Networks and Learning Systems, 2020

  65. [65]

    How powerful are graph neural networks?

    Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. How powerful are graph neural networks? In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=ryGs6iA5Km

  66. [66]

    Distilling knowledge from graph convolutional networks

    Yiding Yang, Jiayan Qiu, Mingli Song, Dacheng Tao, and Xinchao Wang. Distilling knowledge from graph convolutional networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2020

  67. [67]

    Graphsaint: Graph sampling based inductive learning method

    Hanqing Zeng, Hongkuan Zhou, Ajitesh Srivastava, Rajgopal Kannan, and Viktor Prasanna. Graphsaint: Graph sampling based inductive learning method. arXiv preprint arXiv:1907.04931, 2019

  68. [68]

    Gaan: Gated attention networks for learning on large and spatiotemporal graphs

    Jiani Zhang, Xingjian Shi, Junyuan Xie, Hao Ma, Irwin King, and Dit-Yan Yeung. Gaan: Gated attention networks for learning on large and spatiotemporal graphs. In Proceedings of the Thirty-Fourth Conference on Uncertainty in Artificial Intelligence, pages 339--349, 2018

  69. [69]

    Adaptive structural fingerprints for graph attention networks

    Kai Zhang, Yaokang Zhu, Jun Wang, and Jie Zhang. Adaptive structural fingerprints for graph attention networks. In International Conference on Learning Representations, 2020. URL https://openreview.net/forum?id=BJxWx0NYPr

  70. [70]

    Pairnorm: Tackling oversmoothing in gnns

    Lingxiao Zhao and Leman Akoglu. Pairnorm: Tackling oversmoothing in gnns. In International Conference on Learning Representations, 2020. URL https://openreview.net/forum?id=rkecl1rtwB