Recognition: 2 theorem links · Lean Theorem
How Attentive are Graph Attention Networks?
Pith reviewed 2026-05-17 02:26 UTC · model grok-4.3
The pith
Graph Attention Networks use static attention that cannot express simple graph problems; reordering their internal operations yields GATv2, a dynamic-attention variant.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Graph Attention Networks compute attention scores such that the relative ordering among a node's neighbors is independent of the node's own representation, which the authors term static attention. This restriction means GAT cannot represent functions that require the ranking of neighbors to change with the query. The paper exhibits the limitation on a simple synthetic graph problem where the model cannot fit the training data, then shows that moving the linear transformation inside the attention function produces GATv2, which implements dynamic attention and is provably more expressive than the original GAT.
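The static-attention claim can be illustrated with a toy numeric sketch (illustrative numbers, not from the paper): when the score decomposes additively into a query term c_i and a neighbor term d_j, the monotone LeakyReLU preserves the order induced by d_j alone, so every query node produces the same neighbor ranking.

```python
def leaky_relu(x, slope=0.2):
    return x if x >= 0 else slope * x

# GAT-style static score: e(i, j) = LeakyReLU(c[i] + d[j]), where c[i]
# depends only on the query node i and d[j] only on the neighbor j.
# Hypothetical numbers chosen for illustration.
c = {0: 1.0, 1: -2.0}                 # query-node terms
d = {"a": 0.3, "b": 1.5, "c": -0.7}   # neighbor terms

def ranking(i):
    """Neighbors of query i, sorted by attention score (highest first)."""
    return sorted(d, key=lambda j: leaky_relu(c[i] + d[j]), reverse=True)

print(ranking(0))  # ['b', 'a', 'c']
print(ranking(1))  # ['b', 'a', 'c'] -- same order for every query
```

Whatever the query term c_i is, the ranking collapses to the ordering of d_j, which is exactly the "static attention" restriction the core claim describes.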
What carries the argument
Static attention in GAT, defined as neighbor ranking that does not depend on the query node representation.
If this is right
- GAT cannot solve graph problems that require attention rankings to depend on the querying node.
- GATv2 is strictly more expressive than GAT while using the same number of parameters.
- GATv2 outperforms the original GAT on 11 graph benchmarks from OGB and other sources.
- The modification is simple enough to integrate into existing GNN libraries.
Where Pith is reading between the lines
- Other attention-based graph models that fix neighbor rankings independently of the query may carry the same limitation.
- Replacing GAT layers with GATv2 in existing pipelines offers a low-cost way to test whether dynamic attention helps on a given dataset.
- Expressiveness checks using small controlled graphs could become a routine step when designing new graph attention variants.
Load-bearing premise
That the controlled synthetic problem captures the expressiveness limits that matter for real graph tasks and that reordering the operations fully converts static attention to dynamic attention without side effects.
What would settle it
A graph task in which each node must rank its neighbors differently according to its own features; GAT will fail to fit the data while GATv2 will succeed.
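A minimal data generator for such a task, modeled loosely on the paper's controlled problem (the exact layout here is an assumption for illustration): "key" nodes carry (attribute, value) pairs, "query" nodes carry an attribute only, and each query's label is the value stored at the matching key, forcing every query to rank its neighbors differently.

```python
import random

def make_dictionary_lookup(n):
    """One instance of a dictionary-lookup-style graph. n key nodes each
    store (attr, value); n query nodes each carry an attribute and are
    connected to every key node (complete bipartite). The label of query
    node q is the value held by the key whose attribute matches q's."""
    values = list(range(n))
    random.shuffle(values)
    keys = [{"attr": a, "value": v} for a, v in enumerate(values)]
    queries = [{"attr": a, "label": values[a]} for a in range(n)]
    edges = [(q, k) for q in range(n) for k in range(n)]
    return keys, queries, edges
```

Because two queries with different attributes must attend to different key nodes, a model whose neighbor ranking is query-independent cannot fit all labels at once.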
read the original abstract
Graph Attention Networks (GATs) are one of the most popular GNN architectures and are considered as the state-of-the-art architecture for representation learning with graphs. In GAT, every node attends to its neighbors given its own representation as the query. However, in this paper we show that GAT computes a very limited kind of attention: the ranking of the attention scores is unconditioned on the query node. We formally define this restricted kind of attention as static attention and distinguish it from a strictly more expressive dynamic attention. Because GATs use a static attention mechanism, there are simple graph problems that GAT cannot express: in a controlled problem, we show that static attention hinders GAT from even fitting the training data. To remove this limitation, we introduce a simple fix by modifying the order of operations and propose GATv2: a dynamic graph attention variant that is strictly more expressive than GAT. We perform an extensive evaluation and show that GATv2 outperforms GAT across 11 OGB and other benchmarks while we match their parametric costs. Our code is available at https://github.com/tech-srl/how_attentive_are_gats . GATv2 is available as part of the PyTorch Geometric library, the Deep Graph Library, and the TensorFlow GNN library.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that GAT computes only static attention (attention score rankings independent of the query node) because of the additive form LeakyReLU(a^T [W h_i || W h_j]) = LeakyReLU(c_i + d_j). It introduces a controlled synthetic graph problem in which GAT fails to fit the training data, proposes GATv2 via reordering of operations to obtain dynamic attention, proves GATv2 is strictly more expressive, and reports that GATv2 outperforms GAT on 11 OGB and other benchmarks at matched parameter cost.
Significance. If the central claims hold, the work is significant: it isolates a concrete, previously under-appreciated restriction in a widely adopted GNN, supplies a minimal architectural change that restores dynamic attention, and demonstrates consistent empirical gains. The public code release and integration into PyTorch Geometric, DGL, and TensorFlow GNN are concrete strengths that aid reproducibility.
major comments (2)
- [Controlled synthetic problem / expressivity argument] Synthetic-task section: the claim that GAT cannot express the target function rests on observed non-convergence rather than a proof that no parameter setting exists. Because reordering to GATv2 also changes the computation graph and gradient flow, the training-set failure could be an optimization artifact rather than a pure expressivity limit; a parameter-existence argument or explicit construction showing the required attention ranking is impossible under the static form would be needed to make the inference load-bearing.
- [GATv2 proposal] GATv2 definition and expressivity claim: while the reordering is presented as converting static to dynamic attention, the manuscript should explicitly verify that the new ordering preserves the original parameter count and does not introduce new degrees of freedom that could explain the performance difference independently of the static/dynamic distinction.
minor comments (2)
- [Abstract] Abstract: the phrase '11 OGB and other benchmarks' should either list the datasets or point to the specific table/figure that enumerates them.
- [Evaluation] Experimental section: confirm that all reported improvements include standard deviations over multiple runs and appropriate statistical tests; the current description leaves the strength of the outperformance claim unclear.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment below and indicate the changes we will make in the revised manuscript.
read point-by-point responses
-
Referee: [Controlled synthetic problem / expressivity argument] Synthetic-task section: the claim that GAT cannot express the target function rests on observed non-convergence rather than a proof that no parameter setting exists. Because reordering to GATv2 also changes the computation graph and gradient flow, the training-set failure could be an optimization artifact rather than a pure expressivity limit; a parameter-existence argument or explicit construction showing the required attention ranking is impossible under the static form would be needed to make the inference load-bearing.
Authors: We agree that the original presentation relies on empirical non-convergence and that a formal argument would make the expressivity claim more robust. In the revised manuscript we will add an explicit construction: we exhibit a small graph and target attention ranking that cannot be realized by any choice of parameters under the static (additive) form LeakyReLU(c_i + d_j), because the ranking is forced to be independent of the query node. This construction is independent of optimization or gradient flow and directly shows that the limitation is architectural rather than an artifact of training dynamics. We retain the synthetic experiment as supporting evidence but now anchor the claim with the parameter-existence argument. revision: yes
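The promised parameter-existence argument can be sketched as follows (a reconstruction under the paper's notation, not its verbatim proof), writing the attention vector as a = [a_1 ‖ a_2]:

```latex
% Static form: the score decomposes into a query term and a neighbor term.
e(h_i, h_j) = \mathrm{LeakyReLU}\!\big(a_1^{\top} W h_i + a_2^{\top} W h_j\big)
            = \mathrm{LeakyReLU}(c_i + d_j),
\qquad c_i := a_1^{\top} W h_i, \quad d_j := a_2^{\top} W h_j .
% LeakyReLU is strictly increasing, so for every query node i:
\operatorname*{arg\,max}_{j \in \mathcal{N}(i)} e(h_i, h_j)
  = \operatorname*{arg\,max}_{j \in \mathcal{N}(i)} d_j ,
% which is independent of i. Hence any target in which two query nodes
% must attend most strongly to different neighbors is unrealizable for
% any choice of W and a, independently of optimization dynamics.
```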
-
Referee: [GATv2 proposal] GATv2 definition and expressivity claim: while the reordering is presented as converting static to dynamic attention, the manuscript should explicitly verify that the new ordering preserves the original parameter count and does not introduce new degrees of freedom that could explain the performance difference independently of the static/dynamic distinction.
Authors: We confirm that GATv2 uses exactly the same number and dimensionality of parameters as GAT. The only change is the order of the linear transformation, concatenation, and nonlinearity; no additional weight matrices or biases are introduced. In the revised manuscript we will insert a short paragraph (with an accompanying table) that explicitly counts the parameters for both models on the standard OGB setups, showing they are identical. We will also note that any performance difference must therefore stem from the change in attention expressivity rather than from increased model capacity. revision: yes
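As a sanity check on the matched-cost claim, here is a rough tally of the two scoring layers' parameters under one assumed layout (hypothetical helper names; tying the two halves of GATv2's weight matrix is one way to keep costs comparable, though the paper's exact matched-cost recipe may differ):

```python
def gat_params(d_in, d_out):
    # GAT: W is (d_out x d_in); attention vector a has length 2*d_out.
    return d_out * d_in + 2 * d_out

def gatv2_params(d_in, d_out, share_w=True):
    # GATv2 applies W to the concatenation [h_i || h_j], so W is
    # (d_out x 2*d_in) and a has length d_out. With share_w the two
    # halves of W are tied (W = [W' || W']), an assumed way to keep the
    # parametric cost close to GAT's.
    w = d_out * d_in if share_w else d_out * 2 * d_in
    return w + d_out

print(gat_params(64, 8), gatv2_params(64, 8))  # 528 520
```

Without weight sharing, GATv2's W on the concatenation roughly doubles, which is why an explicit parameter table in the revision would be informative.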
Circularity Check
Derivation is self-contained with no circular reductions
full rationale
The paper derives the static attention property directly from the additive decomposition of GAT's scoring function (LeakyReLU(a^T [W h_i || W h_j]) reducing to a query-independent ranking), introduces an independent formal definition of static vs. dynamic attention, and demonstrates the limitation via a new synthetic task where GAT fails to fit while the reordered GATv2 succeeds. GATv2 is obtained by a straightforward reordering of linear and nonlinearity operations that makes the score query-dependent by construction. No step reduces a claimed result to a prior fitted quantity, self-citation chain, or renamed input; the expressivity argument rests on the new definitions and the controlled experiment rather than any self-referential loop.
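By contrast, applying the nonlinearity before the attention vector lets the ranking depend on the query. A hand-constructed toy with scalar node features (weights chosen for illustration, not trained):

```python
def leaky(x, slope=0.2):
    return x if x >= 0 else slope * x

# GATv2-style score a^T LeakyReLU(W [h_q || h_k]) with hand-picked
# W = [[1, -1], [-1, 1]] and a = [-1, -1]: it rewards neighbors whose
# scalar feature matches the query's.
def score(hq, hk):
    z1, z2 = hq - hk, hk - hq          # the two rows of W @ [hq, hk]
    return -leaky(z1) - leaky(z2)      # dot with a = [-1, -1]

neighbors = {"a": 0.0, "b": 1.0}

def ranking(hq):
    return sorted(neighbors, key=lambda k: score(hq, neighbors[k]),
                  reverse=True)

print(ranking(0.0))  # ['a', 'b']
print(ranking(1.0))  # ['b', 'a'] -- the top neighbor now depends on the query
```

No static (additive) score can produce these two rankings simultaneously, which is the sense in which the reordering makes the score query-dependent by construction.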
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Standard assumptions on graph structure, node features, and message passing in GNNs
invented entities (1)
-
GATv2
no independent evidence
Forward citations
Cited by 18 Pith papers
-
Frequency-Space Mechanics: A Sequence and Coordinate-Free Representation for Protein Function Prediction
Vibrational mode graphs from molecular dynamics enable sequence-free protein function prediction via graph neural networks, with entrainment improving signals for collective dynamics.
-
Graphlets as Building Blocks for Structural Vocabulary in Knowledge Graph Foundation Models
Graphlets mined as structural tokens improve zero-shot inductive and transductive link prediction in knowledge graph foundation models across 51 diverse graphs.
-
Graph Transformers and Stabilized Reinforcement Learning for Large-Scale Dynamic Routing Modulation and Spectrum Allocation in Elastic Optical Networks
Graph transformer RL for dynamic RMSA supports up to 13% more traffic than benchmarks on networks up to 143 nodes and 362 links.
-
Evaluating LLMs on Large-Scale Graph Property Estimation via Random Walks
EstGraph benchmark evaluates LLMs on estimating properties of very large graphs from random-walk samples that fit in context limits.
-
Concept Graph Convolutions: Message Passing in the Concept Space
Concept Graph Convolutions perform message passing on node concepts to increase interpretability of graph neural networks without losing task performance.
-
Region-Grounded Report Generation for 3D Medical Imaging: A Fine-Grained Dataset and Graph-Enhanced Framework
Introduces VietPET-RoI dataset with fine-grained RoI annotations for Vietnamese 3D PET/CT and HiRRA graph framework that improves report generation by modeling region dependencies, claiming large gains over prior models.
-
Beyond Nodes vs. Edges: A Multi-View Fusion Framework for Provenance-Based Intrusion Detection
PROVFUSION fuses three complementary views of provenance data with lightweight schemes and voting to achieve higher detection accuracy and lower false positives than node- or edge-only baselines on nine benchmarks.
-
CapBench: A Multi-PDK Dataset for Machine-Learning-Based Post-Layout Capacitance Extraction
CapBench is a new multi-PDK dataset of post-layout 3D windows with high-fidelity capacitance labels and multiple ML-ready representations, plus baseline results showing CNN accuracy versus GNN speed trade-offs.
-
SCOT: Multi-Source Cross-City Transfer with Optimal-Transport Soft-Correspondence Objective
SCOT learns explicit soft region correspondences via entropic optimal transport and a shared prototype hub to improve multi-source cross-city transfer accuracy and robustness.
-
SCOT: Multi-Source Cross-City Transfer with Optimal-Transport Soft-Correspondence Objective
SCOT uses Sinkhorn entropic optimal transport to learn explicit soft correspondences between unequal region sets for multi-source cross-city transfer, adding contrastive sharpening and cycle reconstruction for stabili...
-
ID-PaS+ : Identity-Aware Predict-and-Search for General Mixed-Integer Linear Programs
ID-PaS+ introduces an identity-aware predict-and-search framework for general parametric MIPs that outperforms Gurobi and prior PAS methods on real-world large-scale instances.
-
Random-Set Graph Neural Networks
RS-GNNs predict random sets over classes using belief functions to jointly produce class probabilities and epistemic uncertainty estimates for graph nodes.
-
GEM: Graph-Enhanced Mixture-of-Experts with ReAct Agents for Dialogue State Tracking
GEM achieves 65.19% joint goal accuracy on MultiWOZ 2.2 by routing between a graph neural network expert for dialogue structure and a T5 expert for sequences, plus ReAct agents for value generation, outperforming prio...
-
SOAR: Real-Time Joint Optimization of Order Allocation and Robot Scheduling in Robotic Mobile Fulfillment Systems
SOAR is a unified DRL method using soft allocations, event-driven MDP, and heterogeneous graph transformers that cuts global makespan by 7.5% and average order completion time by 15.4% at sub-100ms latency in RMFS.
-
Qubit-Scalable CVRP via Lagrangian Knapsack Decomposition and Noise-Aware Quantum Execution
A hybrid quantum framework decomposes CVRP into bounded-width knapsack subproblems, trains a reinforcement learning controller for Lagrangian multipliers, and uses a contextual bandit to adapt quantum hardware executi...
-
A Structure-Preserving Graph Neural Solver for Parametric Hyperbolic Conservation Laws
A structure-preserving GNN solver for parametric hyperbolic conservation laws achieves superior long-horizon stability and orders-of-magnitude speedups over high-resolution simulations on supersonic flow benchmarks.
-
Learning Ad Hoc Network Dynamics via Graph-Structured World Models
G-RSSM learns per-node dynamics in wireless ad hoc networks via graph attention and trains clustering policies through imagined rollouts, generalizing from N=50 training to larger networks.
-
A Texture-Generalizable Deep Material Network via Orientation-Aware Interaction Learning for Polycrystal Modeling and Texture Evolution
TACS-GNN-ODMN infers micromechanical parameters from arbitrary polycrystal textures to build generalizable ODMN surrogates that predict nonlinear responses and texture evolution without retraining.
Reference graph
Works this paper leans on
-
[1]
Learning to represent programs with graphs
Miltiadis Allamanis, Marc Brockschmidt, and Mahmoud Khademi. Learning to represent programs with graphs. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=BJOFETxR-
work page 2018
-
[2]
On the bottleneck of graph neural networks and its practical implications
Uri Alon and Eran Yahav. On the bottleneck of graph neural networks and its practical implications. In International Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=i80OPhOCVH2
work page 2021
-
[3]
Diffusion-convolutional neural networks
James Atwood and Don Towsley. Diffusion-convolutional neural networks. In Advances in neural information processing systems, pages 1993--2001, 2016
work page 2016
-
[4]
Neural Machine Translation by Jointly Learning to Align and Translate
Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. CoRR, abs/1409.0473, 2014. URL http://arxiv.org/abs/1409.0473
work page 2014
-
[5]
Interaction networks for learning about objects, relations and physics
Peter Battaglia, Razvan Pascanu, Matthew Lai, Danilo Jimenez Rezende, and Koray Kavukcuoglu. Interaction networks for learning about objects, relations and physics. In Proceedings of the 30th International Conference on Neural Information Processing Systems, pages 4509--4517, 2016
work page 2016
-
[6]
Gnn-film: Graph neural networks with feature-wise linear modulation
Marc Brockschmidt. Gnn-film: Graph neural networks with feature-wise linear modulation. In Proceedings of the 37th International Conference on Machine Learning, ICML, 2020. URL https://github.com/microsoft/tf-gnn-samples
work page 2020
-
[7]
Geometric deep learning: going beyond euclidean data
Michael M Bronstein, Joan Bruna, Yann LeCun, Arthur Szlam, and Pierre Vandergheynst. Geometric deep learning: going beyond euclidean data. IEEE Signal Processing Magazine, 34(4):18--42, 2017
work page 2017
-
[8]
Geometric deep learning: Grids, groups, graphs, geodesics, and gauges
Michael M. Bronstein, Joan Bruna, Taco Cohen, and Petar Veličković. Geometric deep learning: Grids, groups, graphs, geodesics, and gauges, 2021
work page 2021
-
[9]
Relational Graph Attention Networks
Dan Busbridge, Dane Sherburn, Pietro Cavallo, and Nils Y Hammerla. Relational graph attention networks. arXiv preprint arXiv:1904.05811, 2019
work page 2019
-
[10]
Approximation by superpositions of a sigmoidal function
George Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics of control, signals and systems, 2(4):303--314, 1989
work page 1989
-
[11]
Misha Denil, Sergio Gómez Colmenarejo, Serkan Cabi, David Saxton, and Nando de Freitas. Programmable agents. arXiv preprint arXiv:1706.06383, 2017
work page 2017
-
[12]
Yan Duan, Marcin Andrychowicz, Bradly Stadie, Jonathan Ho, Jonas Schneider, Ilya Sutskever, Pieter Abbeel, and Wojciech Zaremba. One-shot imitation learning. In Proceedings of the 31st International Conference on Neural Information Processing Systems, pages 1087--1098, 2017
work page 2017
-
[13]
Convolutional networks on graphs for learning molecular fingerprints
David K Duvenaud, Dougal Maclaurin, Jorge Iparraguirre, Rafael Bombarell, Timothy Hirzel, Alán Aspuru-Guzik, and Ryan P Adams. Convolutional networks on graphs for learning molecular fingerprints. In Advances in neural information processing systems, pages 2224--2232, 2015
work page 2015
-
[14]
A generalization of transformer networks to graphs
Vijay Prakash Dwivedi and Xavier Bresson. A generalization of transformer networks to graphs. arXiv preprint arXiv:2012.09699, 2020
-
[15]
Benchmarking graph neural networks
Vijay Prakash Dwivedi, Chaitanya K Joshi, Thomas Laurent, Yoshua Bengio, and Xavier Bresson. Benchmarking graph neural networks. arXiv preprint arXiv:2003.00982, 2020
-
[16]
Matthias Fey and Jan E. Lenssen. Fast graph representation learning with PyTorch Geometric. In ICLR Workshop on Representation Learning on Graphs and Manifolds, 2019
work page 2019
-
[17]
On the approximate realization of continuous mappings by neural networks
Ken-Ichi Funahashi. On the approximate realization of continuous mappings by neural networks. Neural networks, 2(3):183--192, 1989
work page 1989
-
[18]
Graph representation learning via hard and channel-wise attention networks
Hongyang Gao and Shuiwang Ji. Graph representation learning via hard and channel-wise attention networks. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 741--749, 2019
work page 2019
-
[19]
Neural message passing for quantum chemistry
Justin Gilmer, Samuel S Schoenholz, Patrick F Riley, Oriol Vinyals, and George E Dahl. Neural message passing for quantum chemistry. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pages 1263--1272. JMLR. org, 2017
work page 2017
-
[20]
Aleksa Gordić. pytorch-gat. https://github.com/gordicaleksa/pytorch-GAT, 2020
work page 2020
-
[21]
A new model for learning in graph domains
Marco Gori, Gabriele Monfardini, and Franco Scarselli. A new model for learning in graph domains. In Proceedings. 2005 IEEE International Joint Conference on Neural Networks, 2005., volume 2, pages 729--734. IEEE, 2005
work page 2005
-
[22]
Inductive representation learning on large graphs
Will Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large graphs. In Advances in neural information processing systems, pages 1024--1034, 2017
work page 2017
-
[23]
Approximation capabilities of multilayer feedforward networks
Kurt Hornik. Approximation capabilities of multilayer feedforward networks. Neural networks, 4(2):251--257, 1991
work page 1991
-
[24]
Multilayer feedforward networks are universal approximators
Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural networks, 2(5):359--366, 1989
work page 1989
-
[25]
Vain: attentional multi-agent predictive modeling
Yedid Hoshen. Vain: attentional multi-agent predictive modeling. In Proceedings of the 31st International Conference on Neural Information Processing Systems, pages 2698--2708, 2017
work page 2017
-
[26]
Open graph benchmark: Datasets for machine learning on graphs
Weihua Hu, Matthias Fey, Marinka Zitnik, Yuxiao Dong, Hongyu Ren, Bowen Liu, Michele Catasta, and Jure Leskovec. Open graph benchmark: Datasets for machine learning on graphs. arXiv preprint arXiv:2005.00687, 2020
-
[27]
Syntax-aware aspect level sentiment classification with graph attention networks
Binxuan Huang and Kathleen M Carley. Syntax-aware aspect level sentiment classification with graph attention networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 5472--5480, 2019
work page 2019
-
[28]
Combining label propagation and simple models out-performs graph neural networks
Qian Huang, Horace He, Abhay Singh, Ser-Nam Lim, and Austin Benson. Combining label propagation and simple models out-performs graph neural networks. In International Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=8E1-f3VhX1o
work page 2021
-
[29]
Transformers are graph neural networks
Chaitanya Joshi. Transformers are graph neural networks. The Gradient, 2020
work page 2020
-
[30]
How to find your friendly neighborhood: Graph attention design with self-supervision
Dongkwan Kim and Alice Oh. How to find your friendly neighborhood: Graph attention design with self-supervision. In International Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=Wi5KUNlqWty
work page 2021
-
[31]
Semi-supervised classification with graph convolutional networks
Thomas N Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. In ICLR, 2017
work page 2017
-
[32]
Social-bigat: Multimodal trajectory forecasting using bicycle-gan and graph attention networks
Vineet Kosaraju, Amir Sadeghian, Roberto Martín-Martín, Ian Reid, Hamid Rezatofighi, and Silvio Savarese. Social-bigat: Multimodal trajectory forecasting using bicycle-gan and graph attention networks. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems, v...
work page 2019
-
[33]
Attention Models in Graphs: A Survey
John Boaz Lee, Ryan A Rossi, Sungchul Kim, Nesreen K Ahmed, and Eunyee Koh. Attention models in graphs: A survey. arXiv preprint arXiv:1807.07984, 2018
work page 2018
-
[34]
Moshe Leshno, Vladimir Ya Lin, Allan Pinkus, and Shimon Schocken. Multilayer feedforward networks with a nonpolynomial activation function can approximate any function. Neural networks, 6(6):861--867, 1993
work page 1993
-
[35]
Guohao Li, Matthias Muller, Ali Thabet, and Bernard Ghanem. Deepgcns: Can gcns go as deep as cnns? In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 9267--9276, 2019
work page 2019
-
[36]
Deeper insights into graph convolutional networks for semi-supervised learning
Qimai Li, Zhichao Han, and Xiao-Ming Wu. Deeper insights into graph convolutional networks for semi-supervised learning. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018
work page 2018
-
[37]
Gated graph sequence neural networks
Yujia Li, Daniel Tarlow, Marc Brockschmidt, and Richard Zemel. Gated graph sequence neural networks. In International Conference on Learning Representations, 2016
work page 2016
-
[38]
Gated relational graph attention networks, 2021
Denis Lukovnikov and Asja Fischer. Gated relational graph attention networks, 2021. URL https://openreview.net/forum?id=v-9E8egy_i
work page 2021
-
[39]
Thang Luong, Hieu Pham, and Christopher D. Manning. Effective approaches to attention-based neural machine translation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, Lisbon, Portugal, September 17-21, 2015, pages 1412--1421, 2015. URL http://aclweb.org/anthology/D/D15/D15-1166.pdf
work page 2015
-
[40]
Entity-aware dependency-based deep graph attention network for comparative preference classification
Nianzu Ma, Sahisnu Mazumder, Hao Wang, and Bing Liu. Entity-aware dependency-based deep graph attention network for comparative preference classification. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 5782--5788, 2020
work page 2020
-
[41]
Geometric deep learning on graphs and manifolds using mixture model cnns
Federico Monti, Davide Boscaini, Jonathan Masci, Emanuele Rodola, Jan Svoboda, and Michael M Bronstein. Geometric deep learning on graphs and manifolds using mixture model cnns. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 5115--5124, 2017
work page 2017
-
[42]
Learning attention-based embeddings for relation prediction in knowledge graphs
Deepak Nathani, Jatin Chauhan, Charu Sharma, and Manohar Kaul. Learning attention-based embeddings for relation prediction in knowledge graphs. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4710--4723, 2019
work page 2019
-
[43]
Minimum width for universal approximation
Sejun Park, Chulhee Yun, Jaeho Lee, and Jinwoo Shin. Minimum width for universal approximation. In International Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=O-XJwyoIF-k
work page 2021
-
[44]
Approximation theory of the mlp model
Allan Pinkus. Approximation theory of the mlp model. Acta Numerica, 8:143--195, 1999
work page 1999
-
[45]
Deepinf: Social influence prediction with deep learning
Jiezhong Qiu, Jian Tang, Hao Ma, Yuxiao Dong, Kuansan Wang, and Jie Tang. Deepinf: Social influence prediction with deep learning. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’18), 2018
work page 2018
-
[46]
Quantum chemistry structures and properties of 134 kilo molecules
Raghunathan Ramakrishnan, Pavlo O Dral, Matthias Rupp, and O Anatole Von Lilienfeld. Quantum chemistry structures and properties of 134 kilo molecules. Scientific data, 1:140022, 2014
work page 2014
-
[47]
Self-supervised graph transformer on large-scale molecular data
Yu Rong, Yatao Bian, Tingyang Xu, Weiyang Xie, Ying Wei, Wenbing Huang, and Junzhou Huang. Self-supervised graph transformer on large-scale molecular data. Advances in Neural Information Processing Systems, 33, 2020a
work page 2020
-
[48]
Dropedge: Towards deep graph convolutional networks on node classification
Yu Rong, Wenbing Huang, Tingyang Xu, and Junzhou Huang. Dropedge: Towards deep graph convolutional networks on node classification. In International Conference on Learning Representations, 2020b. URL https://openreview.net/forum?id=Hkx1qkrKPr
work page 2020
-
[49]
A simple neural network module for relational reasoning
Adam Santoro, David Raposo, David GT Barrett, Mateusz Malinowski, Razvan Pascanu, Peter Battaglia, and Timothy Lillicrap. A simple neural network module for relational reasoning. In Proceedings of the 31st International Conference on Neural Information Processing Systems, pages 4974--4983, 2017
work page 2017
-
[50]
The graph neural network model
Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfardini. The graph neural network model. IEEE Transactions on Neural Networks, 20(1):61--80, 2008
work page 2008
-
[51]
Collective classification in network data
Prithviraj Sen, Galileo Namata, Mustafa Bilgic, Lise Getoor, Brian Galligher, and Tina Eliassi-Rad. Collective classification in network data. AI magazine, 29(3):93--93, 2008
work page 2008
-
[52]
Masked label prediction: Unified message passing model for semi-supervised classification
Yunsheng Shi, Zhengjie Huang, Shikun Feng, and Yu Sun. Masked label prediction: Unified message passing model for semi-supervised classification. arXiv preprint arXiv:2009.03509, 2020
-
[53]
Attention-based Graph Neural Network for Semi-supervised Learning
Kiran K Thekumparampil, Chong Wang, Sewoong Oh, and Li-Jia Li. Attention-based graph neural network for semi-supervised learning. arXiv preprint arXiv:1803.03735, 2018
work page 2018
-
[54]
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in Neural Information Processing Systems, pages 6000--6010, 2017
work page 2017
-
[55]
Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. Graph attention networks. In International Conference on Learning Representations, 2018
work page 2018
-
[56]
Petar Veličković, Lars Buesing, Matthew Overlan, Razvan Pascanu, Oriol Vinyals, and Charles Blundell. Pointer graph networks. Advances in Neural Information Processing Systems, 33, 2020
work page 2020
- [57]
-
[58]
Improving graph attention networks with large margin-based constraints
Guangtao Wang, Rex Ying, Jing Huang, and Jure Leskovec. Improving graph attention networks with large margin-based constraints. arXiv preprint arXiv:1910.11945, 2019a
-
[59]
Deep graph library: A graph-centric, highly-performant package for graph neural networks
Minjie Wang, Da Zheng, Zihao Ye, Quan Gan, Mufei Li, Xiang Song, Jinjing Zhou, Chao Ma, Lingfan Yu, Yu Gai, Tianjun Xiao, Tong He, George Karypis, Jinyang Li, and Zheng Zhang. Deep graph library: A graph-centric, highly-performant package for graph neural networks. arXiv preprint arXiv:1909.01315, 2019b
-
[60]
Heterogeneous graph attention network
Xiao Wang, Houye Ji, Chuan Shi, Bai Wang, Yanfang Ye, Peng Cui, and Philip S Yu. Heterogeneous graph attention network. In The World Wide Web Conference, pages 2022--2032, 2019c
work page 2019
-
[61]
Bag of tricks of semi-supervised classification with graph neural networks
Yangkun Wang. Bag of tricks of semi-supervised classification with graph neural networks. arXiv preprint arXiv:2103.13355, 2021
-
[62]
On the practical computational power of finite precision rnns for language recognition
Gail Weiss, Yoav Goldberg, and Eran Yahav. On the practical computational power of finite precision rnns for language recognition. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 740--745, 2018
work page 2018
-
[63]
Simplifying graph convolutional networks
Felix Wu, Amauri Souza, Tianyi Zhang, Christopher Fifty, Tao Yu, and Kilian Weinberger. Simplifying graph convolutional networks. In International conference on machine learning, pages 6861--6871. PMLR, 2019
work page 2019
-
[64]
A comprehensive survey on graph neural networks
Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and S Yu Philip. A comprehensive survey on graph neural networks. IEEE Transactions on Neural Networks and Learning Systems, 2020
work page 2020
-
[65]
Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. How powerful are graph neural networks? In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=ryGs6iA5Km
work page 2019
-
[66]
Distilling knowledge from graph convolutional networks
Yiding Yang, Jiayan Qiu, Mingli Song, Dacheng Tao, and Xinchao Wang. Distilling knowledge from graph convolutional networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2020
work page 2020
-
[67]
Graphsaint: Graph sampling based inductive learning method
Hanqing Zeng, Hongkuan Zhou, Ajitesh Srivastava, Rajgopal Kannan, and Viktor Prasanna. Graphsaint: Graph sampling based inductive learning method. arXiv preprint arXiv:1907.04931, 2019
-
[68]
Gaan: Gated attention networks for learning on large and spatiotemporal graphs
Jiani Zhang, Xingjian Shi, Junyuan Xie, Hao Ma, Irwin King, and Dit-Yan Yeung. Gaan: Gated attention networks for learning on large and spatiotemporal graphs. In Proceedings of the Thirty-Fourth Conference on Uncertainty in Artificial Intelligence, pages 339--349, 2018
work page 2018
-
[69]
Adaptive structural fingerprints for graph attention networks
Kai Zhang, Yaokang Zhu, Jun Wang, and Jie Zhang. Adaptive structural fingerprints for graph attention networks. In International Conference on Learning Representations, 2020. URL https://openreview.net/forum?id=BJxWx0NYPr
work page 2020
-
[70]
Pairnorm: Tackling oversmoothing in gnns
Lingxiao Zhao and Leman Akoglu. Pairnorm: Tackling oversmoothing in gnns. In International Conference on Learning Representations, 2020. URL https://openreview.net/forum?id=rkecl1rtwB
work page 2020
discussion (0)