Graph Attention Networks

Adriana Romero; Arantxa Casanova; Guillem Cucurull; Petar Veličković; Pietro Liò; Yoshua Bengio

arXiv: 1710.10903 · v3 · submitted 2017-10-30 · 📊 stat.ML · cs.AI · cs.LG · cs.SI


Pith reviewed 2026-05-10 17:59 UTC · model grok-4.3

classification 📊 stat.ML · cs.AI · cs.LG · cs.SI

keywords: graph attention networks, GAT, self-attention, graph neural networks, node classification, transductive learning, inductive learning


The pith

Graph attention networks allow nodes to assign different importance weights to their neighbors using masked self-attention layers.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces graph attention networks (GATs) as a neural network architecture for graph-structured data. It leverages masked self-attentional layers so nodes can attend to their neighbors and implicitly assign different weights to them. This avoids the need for costly matrix operations or prior knowledge of the graph structure, addressing limitations of graph convolution methods. The approach supports both transductive tasks on fixed graphs and inductive tasks on unseen graphs. Models based on this achieve or match state-of-the-art results on citation network benchmarks and protein-protein interaction data.

Core claim

By stacking layers in which nodes are able to attend over their neighborhoods' features, we enable (implicitly) specifying different weights to different nodes in a neighborhood, without requiring any kind of costly matrix operation (such as inversion) or depending on knowing the graph structure upfront. In this way, we address several key challenges of spectral-based graph neural networks simultaneously, and make our model readily applicable to inductive as well as transductive problems.

What carries the argument

Masked self-attentional layers that allow nodes to attend over their neighborhoods' features and assign varying weights without costly operations or upfront graph knowledge.
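A masked self-attentional layer of this kind can be sketched in plain Python. This is a minimal, unoptimized sketch assuming the standard GAT formulation (a shared weight matrix W, an attention vector a, LeakyReLU with slope 0.2, and a softmax over each node's neighbors plus the node itself). The function names, toy graphs and parameter values are hypothetical. The second call reuses the same W and a on a different, larger graph, which is the property the inductive claim relies on.

```python
import math


def leaky_relu(x, slope=0.2):
    # LeakyReLU with negative-input slope 0.2 (assumed, as commonly used for GAT scoring)
    return x if x > 0 else slope * x


def dot(u, v):
    # Plain dot product of two equal-length lists
    return sum(p * q for p, q in zip(u, v))


def gat_layer(H, neighbors, W, a):
    """Single-head graph attentional layer (output nonlinearity omitted).

    H:         list of N node feature vectors, each of length F
    neighbors: dict mapping node id -> iterable of neighbor ids (the mask)
    W:         shared F' x F weight matrix, given as a list of F' rows
    a:         attention vector of length 2F'
    Returns (new node features, per-node dict of attention coefficients).
    """
    Z = [[dot(row, h) for row in W] for h in H]  # z_i = W h_i for every node
    f_out = len(W)
    a_src, a_dst = a[:f_out], a[f_out:]  # a^T [z_i || z_j] = a_src . z_i + a_dst . z_j
    H_new, alphas = [], []
    for i in range(len(H)):
        hood = sorted(set(neighbors.get(i, ())) | {i})  # mask: neighbors of i plus i itself
        scores = [leaky_relu(dot(a_src, Z[i]) + dot(a_dst, Z[j])) for j in hood]
        m = max(scores)  # shift by the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alpha = {j: e / total for j, e in zip(hood, exps)}
        alphas.append(alpha)
        # weighted sum of the transformed neighbor features
        H_new.append([sum(alpha[j] * Z[j][k] for j in hood) for k in range(f_out)])
    return H_new, alphas


# Hypothetical parameters: F = 2 input features, F' = 2 output features.
W = [[1.0, 0.5], [-0.5, 1.0]]
a = [0.3, -0.2, 0.1, 0.4]

# A 3-node path graph 0 - 1 - 2 ...
H_train = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out_train, alpha_train = gat_layer(H_train, {0: [1], 1: [0, 2], 2: [1]}, W, a)

# ... and an unseen 4-node star graph, reusing the same W and a.
H_unseen = [[0.5, 0.5], [1.0, -1.0], [0.0, 2.0], [2.0, 0.0]]
out_unseen, alpha_unseen = gat_layer(H_unseen, {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}, W, a)
```

Because each node only reads its own neighbor list, nothing in the layer depends on the total number of nodes or on a precomputed graph-wide operator.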

If this is right

GAT models apply to inductive problems where test graphs are not seen during training.
The architecture achieves or matches state-of-the-art on Cora, Citeseer, and Pubmed citation networks.
It performs well on protein-protein interaction datasets with unseen test graphs.
Stacking attentional layers enables learning different node importances in neighborhoods.
The method overcomes shortcomings of prior spectral graph convolution approaches, such as costly matrix operations and dependence on knowing the graph structure upfront.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This attention mechanism could improve interpretability by showing which connections matter most in a graph.
It may generalize to other graph-based tasks like link prediction or graph classification.
Combining GAT with other neural network components might yield further gains on complex datasets.

Load-bearing premise

That masked self-attentional layers can implicitly specify different weights to different nodes in a neighborhood without any costly matrix operation or knowing the graph structure upfront.

What would settle it

Running the GAT model on the Cora dataset and finding that its accuracy does not reach or exceed that of previous methods, or that its gains vanish when the learned attention coefficients are replaced with fixed weights or when the graph structure is not available upfront.

Original abstract

We present graph attention networks (GATs), novel neural network architectures that operate on graph-structured data, leveraging masked self-attentional layers to address the shortcomings of prior methods based on graph convolutions or their approximations. By stacking layers in which nodes are able to attend over their neighborhoods' features, we enable (implicitly) specifying different weights to different nodes in a neighborhood, without requiring any kind of costly matrix operation (such as inversion) or depending on knowing the graph structure upfront. In this way, we address several key challenges of spectral-based graph neural networks simultaneously, and make our model readily applicable to inductive as well as transductive problems. Our GAT models have achieved or matched state-of-the-art results across four established transductive and inductive graph benchmarks: the Cora, Citeseer and Pubmed citation network datasets, as well as a protein-protein interaction dataset (wherein test graphs remain unseen during training).

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note private letter to a colleague

GATs give a clean local attention mechanism over graph neighbors that supports induction and hits the reported benchmarks without spectral overhead.


Colleague, the main point is that this paper shows how to replace fixed graph-convolution weights with masked self-attention computed only over immediate neighbors. Each node learns to weight its neighbors differently: a shared linear map transforms every node's features, a shared attention vector scores each concatenated pair of transformed features, and a LeakyReLU followed by a softmax restricted to the node's neighborhood turns those scores into weights. That construction needs no global matrix operations or precomputed eigenbases, which directly enables the inductive setting where test graphs are never seen at training time. They report matching or exceeding prior results on Cora, Citeseer, Pubmed, and the PPI dataset under standard splits.

The math is simple and local, the experiments use public benchmarks, and the citations of GCN-style work position the difference appropriately. What the paper does well is demonstrate that this attention layer stacks cleanly and delivers usable performance on both transductive and inductive tasks without extra machinery. The data side looks reproducible from the described setup, and there is no sign of internal circularity or parameter fitting that would undermine the claims.

Soft spots are limited. The abstract and stress-test note give no theoretical analysis, and no ablations isolate attention from the multi-head design or the choice of activation function, so the exact source of the gains remains partly opaque. That is a minor gap rather than a flaw, since the empirical numbers hold and the construction itself has no load-bearing problems.

This work is for anyone building or extending graph neural networks who wants an inductive alternative to spectral methods. A reader focused on practical GNN architectures will extract the core idea and the benchmark numbers quickly. I would send it to peer review; the empirical grounding and the inductive capability are solid enough to merit referee attention, even if revisions add some theory or controls.

Referee Report

2 major / 3 minor

Summary. The paper introduces Graph Attention Networks (GATs), a neural architecture for graph-structured data that stacks masked self-attentional layers. Nodes attend over neighborhood features via a shared linear transformation followed by LeakyReLU and softmax normalization restricted to the adjacency neighborhood, enabling implicit per-neighbor weighting without matrix inversion or global graph knowledge. The approach is positioned as addressing limitations of spectral GNNs while supporting both transductive and inductive settings. Empirical evaluation shows the models match or exceed prior state-of-the-art on the Cora, Citeseer, and Pubmed citation networks (transductive) and a protein-protein interaction dataset (inductive, with unseen test graphs).
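The coefficient computation summarized above can be written out as follows. This is a restatement in the notation commonly used for GAT (W the shared weight matrix, a the attention vector, N_i the neighborhood of node i including i itself, ‖ concatenation, σ an elementwise nonlinearity), offered as a reference sketch rather than a quotation of the manuscript's numbered equations.

```latex
\begin{aligned}
e_{ij} &= \mathrm{LeakyReLU}\!\left(\mathbf{a}^{\top}\left[\mathbf{W}\mathbf{h}_i \,\Vert\, \mathbf{W}\mathbf{h}_j\right]\right), \qquad j \in \mathcal{N}_i,\\
\alpha_{ij} &= \frac{\exp(e_{ij})}{\sum_{k \in \mathcal{N}_i} \exp(e_{ik})},\\
\mathbf{h}_i' &= \sigma\!\left(\sum_{j \in \mathcal{N}_i} \alpha_{ij}\, \mathbf{W}\mathbf{h}_j\right).
\end{aligned}
```

The softmax runs only over N_i, which is the masking step: a node never receives weight from a node outside its neighborhood.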

Significance. If the reported results hold, GATs provide a practical, inductive-capable alternative to spectral methods by replacing fixed convolution weights with learned attention coefficients computed locally from node features. The architecture requires only the provided adjacency at each forward pass and avoids precomputed bases or costly operations, directly enabling the inductive claim on PPI. Credit is due for the clear experimental protocols, use of public benchmarks, and the multi-head attention formulation that stabilizes training.
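The multi-head formulation combines K independent attention heads. In the usual GAT setup, hidden layers apply the nonlinearity (ELU) per head and concatenate the results, while the final layer averages the heads before the output nonlinearity. A minimal sketch of just this combination step, assuming the per-head outputs are already computed; the function names are hypothetical.

```python
import math


def elu(x):
    # Exponential linear unit with alpha = 1
    return x if x > 0 else math.exp(x) - 1.0


def combine_heads(head_outputs, mode):
    """Combine K per-head outputs, each a list of N pre-activation vectors.

    mode="concat":  hidden layer; apply ELU per head, then concatenate (N x K*F').
    mode="average": output layer; average the K heads (N x F'); the final
                    softmax or logistic nonlinearity is applied afterwards.
    """
    k = len(head_outputs)
    n = len(head_outputs[0])
    if mode == "concat":
        return [[elu(x) for head in head_outputs for x in head[i]] for i in range(n)]
    if mode == "average":
        f = len(head_outputs[0][0])
        return [[sum(head[i][j] for head in head_outputs) / k for j in range(f)]
                for i in range(n)]
    raise ValueError(f"unknown mode: {mode}")
```

Averaging in the last layer keeps the output width equal to the number of classes, whereas concatenation in hidden layers multiplies the width by K.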

major comments (2)

§3.2, Eq. (3)–(5): the masked self-attention coefficient computation is defined using a shared weight vector a and LeakyReLU, but the manuscript provides no ablation that isolates the contribution of the attention mechanism (e.g., replacing it with uniform or degree-based weights) from other design choices such as the two-layer architecture or the specific activation. This leaves open whether the performance gains on Cora/Citeseer/Pubmed are attributable to attention or to overall model capacity.
§4.3, Table 2: the inductive PPI results report micro-F1 scores, but the paper does not report variance across multiple random seeds or graph splits for the unseen test graphs, making it difficult to assess whether the claim of matching the state of the art is statistically robust in the inductive setting.

minor comments (3)

§2: the related-work discussion of spectral methods could more explicitly contrast the per-layer cost of GAT attention, which is linear in the numbers of nodes and edges, with the eigendecomposition requirements of earlier approaches.
Notation: the symbol W is reused for both the linear transformation in the attention mechanism and the output projection; a distinct symbol would improve readability.
Figure 1: the diagram of the attentional layer would benefit from an explicit arrow or label indicating the masking step that restricts attention to the neighborhood.
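A rough per-head count makes the cost contrast in the first minor comment concrete, for a graph with node set V, edge set E, F input features and F' output features (an order-of-magnitude sketch, not a measured result):

```latex
\underbrace{O\big(|V|\,F F'\big)}_{\mathbf{W}\mathbf{h}_i \text{ for every node}}
\;+\;
\underbrace{O\big(|E|\,F'\big)}_{\text{scores and weighted sums over edges}}
\;=\; O\big(|V|\,F F' + |E|\,F'\big)
\quad \text{per head, vs. } O\big(|V|^{3}\big) \text{ for a dense eigendecomposition of the graph Laplacian.}
```

Because the score a^T[Wh_i ‖ Wh_j] splits into a term that depends only on i and one that depends only on j, both halves can be computed once per node, so the per-edge work is dominated by the weighted aggregation.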

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive assessment, the recommendation for minor revision, and the constructive comments. We address each major comment below.


Referee: §3.2, Eq. (3)–(5): the masked self-attention coefficient computation is defined using a shared weight vector a and LeakyReLU, but the manuscript provides no ablation that isolates the contribution of the attention mechanism (e.g., replacing it with uniform or degree-based weights) from other design choices such as the two-layer architecture or the specific activation. This leaves open whether the performance gains on Cora/Citeseer/Pubmed are attributable to attention or to overall model capacity.

Authors: We appreciate the referee's point. While the manuscript demonstrates that GAT outperforms strong non-attentional baselines such as GCN (which uses fixed, degree-normalized weights) on the citation networks, we acknowledge that a direct ablation replacing the learned attention coefficients with uniform weights would more cleanly isolate the mechanism's contribution. We will add this ablation (GAT with uniform aggregation) to the revised manuscript. revision: yes
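The ablation discussed in this exchange amounts to swapping the learned coefficients α_ij for fixed ones while keeping the rest of the model unchanged. A minimal sketch of two such fixed schemes, uniform weights over each neighborhood and GCN-style symmetric degree normalization, both counting self-loops; the function names are hypothetical and the input is assumed to be an undirected graph given as a symmetric adjacency dict.

```python
import math


def with_self_loops(neighbors):
    # Neighborhoods include the node itself, as in GAT
    return {i: sorted(set(nbrs) | {i}) for i, nbrs in neighbors.items()}


def uniform_coefficients(neighbors):
    """Constant attention: alpha_ij = 1 / |N(i)|, i.e. mean aggregation."""
    hood = with_self_loops(neighbors)
    return {i: {j: 1.0 / len(js) for j in js} for i, js in hood.items()}


def gcn_coefficients(neighbors):
    """Degree-based weights 1 / sqrt(d_i * d_j), with degrees counted including self-loops."""
    hood = with_self_loops(neighbors)
    deg = {i: len(js) for i, js in hood.items()}
    return {i: {j: 1.0 / math.sqrt(deg[i] * deg[j]) for j in js} for i, js in hood.items()}
```

Either dictionary can stand in for the softmax output in the aggregation step, so any accuracy difference against the learned version is attributable to attention rather than to model capacity.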
Referee: §4.3, Table 2: the inductive PPI results report micro-F1 scores, but the paper does not report variance across multiple random seeds or graph splits for the unseen test graphs, making it difficult to assess whether the claim of matching the state of the art is statistically robust in the inductive setting.

Authors: We agree that variance estimates would strengthen the inductive results. The reported numbers follow the single-run protocol used by the PPI dataset creators and prior inductive GNN papers. In the revision we will rerun the PPI experiments across multiple random seeds, reporting mean micro-F1 and standard deviation in the updated Table 2. revision: yes
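Reporting mean micro-F1 with a standard deviation across seeds, as promised here, takes only a few lines. A sketch assuming multi-label predictions stored as 0/1 vectors per node; the helper names and toy inputs are hypothetical and are not results from the paper.

```python
import statistics


def micro_f1(y_true, y_pred):
    """Micro-averaged F1 over all (node, label) pairs of a multi-label task."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            tp += int(t == 1 and p == 1)
            fp += int(t == 0 and p == 1)
            fn += int(t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def summarize_runs(scores):
    """Mean and sample standard deviation of per-seed scores (needs at least 2 runs)."""
    return statistics.mean(scores), statistics.stdev(scores)


# Toy example: two nodes, three labels (tp = 2, fp = 1, fn = 1).
y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 1]]
score = micro_f1(y_true, y_pred)
```

In practice each seed would produce one micro-F1 on the unseen test graphs, and summarize_runs would be applied to that list of per-seed scores.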

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper introduces GAT as a novel architecture whose core masked self-attention mechanism is explicitly defined via new equations (e.g., the attention coefficient computation using concatenated transformed features, LeakyReLU, and neighborhood softmax). This definition does not reduce to any prior fitted parameter, self-citation, or input by construction. Performance claims are purely empirical evaluations against external public benchmarks (Cora, Citeseer, Pubmed, PPI) rather than internal predictions. Prior work is cited only for context and shortcomings; the central construction stands independently and does not rest on self-referential steps.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The model rests on standard neural-network differentiability and back-propagation assumptions plus the domain assumption that local neighborhood attention is sufficient to capture graph structure. No new physical entities are postulated; the attention layer is an architectural invention whose independent evidence is the reported benchmark gains.

free parameters (1)

attention weight matrices
Learned parameters of the attention mechanism that are fitted during training on the target task.
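To make this entry concrete: each attention head carries a weight matrix W (F' x F) and an attention vector a (length 2F'). The sketch below counts these, ignoring biases, for a two-layer model. The configuration in the test (1433 input features, 8 heads of 8 hidden units, a single-head 7-class output layer) is the Cora setup described in the original GAT paper rather than something stated on this page, and the helper names are hypothetical.

```python
def gat_layer_params(f_in, f_out, heads):
    # Per head: W has f_out * f_in entries, the attention vector a has 2 * f_out; biases excluded
    return heads * (f_in * f_out + 2 * f_out)


def gat_params(f_in, hidden, heads, n_classes, out_heads=1):
    """Weights of a two-layer GAT whose hidden heads are concatenated before layer 2."""
    layer1 = gat_layer_params(f_in, hidden, heads)
    layer2 = gat_layer_params(hidden * heads, n_classes, out_heads)
    return layer1 + layer2
```

Almost all of the count sits in the first layer's W matrices, since the input feature dimension dominates.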

axioms (1)

domain assumption Graph neighborhoods can be processed by differentiable attention functions without requiring global graph knowledge
Invoked when the paper states the model is applicable to inductive problems where test graphs are unseen.

invented entities (1)

masked self-attentional layer (no independent evidence)
purpose: To compute normalized attention coefficients over only the immediate neighbors of each node
New architectural component introduced to replace fixed convolution weights.

pith-pipeline@v0.9.0 · 5477 in / 1273 out tokens · 52291 ms · 2026-05-10T17:59:21.789067+00:00 · methodology


Forward citations

Cited by 60 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

GraphIP-Bench: How Hard Is It to Steal a Graph Neural Network, and Can We Stop It?
cs.CR 2026-05 accept novelty 8.0

GraphIP-Bench shows stealing GNNs is easy at moderate query budgets, most defenses fail to block or reliably trace extraction, and watermarks lose verification power on surrogates while heterophilic graphs are harder ...
A document is worth a structured record: Principled inductive bias design for document recognition
cs.CV 2025-07 unverdicted novelty 8.0

Introduces a method to design structure-specific relational inductive biases for a base transformer architecture, enabling end-to-end transcription of documents with intrinsic structures, demonstrated on sheet music, ...
Contrast to Detect: Dynamic Graph Contrastive Regularization for Unsupervised Anomaly Detection in Multivariate Time Series
cs.LG 2026-05 unverdicted novelty 7.0

ContrastAD achieves highest mean F1 on all five MTS benchmarks and highest AUC on three by building DTW-based sparse graph snapshots and contrasting divergent pairs with a stable anchor instead of enforcing invariance.
Gaussian Sheaf Neural Networks
cs.LG 2026-05 unverdicted novelty 7.0

Gaussian Sheaf Neural Networks derive a sheaf Laplacian for Gaussian node features on graphs to preserve their geometric structure during message passing.
NeighborDiv: Training-free Zero-shot Generalist Graph Anomaly Detection via Neighbor Diversity
cs.LG 2026-05 unverdicted novelty 7.0

NeighborDiv detects graph anomalies via variance of inter-neighbor feature similarities under a new Neighbor-to-Neighbor Diversity Paradigm, achieving SOTA results with zero volatility in zero-shot cross-domain settings.
Learning over Positive and Negative Edges with Contrastive Message Passing
cs.LG 2026-05 unverdicted novelty 7.0

Contrastive Message Passing lets GNNs apply similarity-preserving transforms to positive edges and dissimilarity-inducing transforms to negative edges via soft positive semidefinite constraints on weights, yielding ga...
Cross-attention-based bipartite graph neural network for coupled nodal and elemental field prediction in large-deformation sheet material forming
cs.CE 2026-05 unverdicted novelty 7.0

A cross-attention-based bipartite GNN predicts coupled nodal displacement increments and elemental thinning directly on their native mesh domains for sheet material forming.
Weather-Robust Cross-View Geo-Localization via Prototype-Based Semantic Part Discovery
cs.CV 2026-05 unverdicted novelty 7.0

SkyPart uses learnable prototypes for patch grouping, altitude modulation only in training, graph-attention readout, and Kendall-weighted loss to set new state-of-the-art single-pass performance on SUES-200, Universit...
TopoU-Net: a U-Net architecture for topological domains
cs.LG 2026-05 unverdicted novelty 7.0

TopoU-Net is a rank-path U-Net for combinatorial complexes that encodes by lifting cochains upward along incidences, decodes by transporting downward, and merges via skip connections at matched ranks.
CTQWformer: A CTQW-based Transformer for Graph Classification
cs.LG 2026-05 unverdicted novelty 7.0

CTQWformer fuses continuous-time quantum walks into a graph transformer and recurrent module to outperform standard GNNs and graph kernels on classification benchmarks.
Structural Interpretations of Protein Language Model Representations via Differentiable Graph Partitioning
cs.LG 2026-05 unverdicted novelty 7.0

SoftBlobGIN combines ESM-2 representations with protein contact graphs via a lightweight GNN and differentiable substructure pooling to achieve 92.8% accuracy on enzyme classification, raise binding-site AUROC to 0.98...
SGC-RML: A reliable and interpretable longitudinal assessment for PD in real-world DNS
cs.LG 2026-05 unverdicted novelty 7.0

SGC-RML creates an 8D symptom atlas from multimodal PD data and integrates conformal calibration to deliver reliable, rejectable longitudinal assessments.
Graphlets as Building Blocks for Structural Vocabulary in Knowledge Graph Foundation Models
cs.AI 2026-05 unverdicted novelty 7.0

Graphlets mined as structural tokens improve zero-shot inductive and transductive link prediction in knowledge graph foundation models across 51 diverse graphs.
Robustness of Graph Self-Supervised Learning to Real-World Noise: A Case Study on Text-Driven Biomedical Graphs
cs.LG 2026-05 unverdicted novelty 7.0

Feature reconstruction in GSSL is robust to noise in text-driven biomedical graphs while relation reconstruction is sensitive, with bidirectional GNN architectures performing better on noisy data and yielding up to 7%...
LUMINA: A Grid Foundation Model for Benchmarking AC Optimal Power Flow Surrogate Learning
cs.LG 2026-05 unverdicted novelty 7.0

LUMINA-Bench is a standardized evaluation framework for ACOPF surrogate models that tests generalization across multiple grid topologies using accuracy and physics-constraint metrics.
Graph Transformers and Stabilized Reinforcement Learning for Large-Scale Dynamic Routing Modulation and Spectrum Allocation in Elastic Optical Networks
cs.NI 2026-05 unverdicted novelty 7.0

A graph transformer with RL stabilizations is the first to exceed benchmarks for dynamic RMSA, supporting up to 13% more traffic load on networks up to 143 nodes.
Graph Transformers and Stabilized Reinforcement Learning for Large-Scale Dynamic Routing Modulation and Spectrum Allocation in Elastic Optical Networks
cs.NI 2026-05 conditional novelty 7.0

Graph transformer RL for dynamic RMSA supports up to 13% more traffic than benchmarks on networks up to 143 nodes and 362 links.
Empowering Heterogeneous Graph Foundation Models via Decoupled Relation Alignment
cs.SI 2026-05 unverdicted novelty 7.0

DRSA provides a plug-and-play alignment framework that decouples features and relations to prevent type collapse and relation confusion in heterogeneous graph foundation models.
Advancing Edge Classification through High-Dimensional Causal Modeling of Node-Edge Interplay
cs.LG 2026-05 unverdicted novelty 7.0

CECF is a new causal framework for edge classification that balances high-dimensional edge features against node influences via GNN embeddings and cross-attention to achieve better performance than standard methods.
PiGGO: Physics-Guided Learnable Graph Kalman Filters for Virtual Sensing of Nonlinear Dynamic Structures under Uncertainty
cs.LG 2026-04 unverdicted novelty 7.0

PiGGO integrates a learned graph neural ODE as the continuous-time dynamics model within an extended Kalman filter to enable online virtual sensing and uncertainty-aware state estimation for nonlinear dynamic systems ...
Hamiltonian Graph Inference Networks: Joint structure discovery and dynamics prediction for lattice Hamiltonian systems from trajectory data
cs.LG 2026-04 unverdicted novelty 7.0

HGIN jointly recovers interaction graphs and predicts trajectories for lattice Hamiltonian systems from data, achieving six to thirteen orders of magnitude lower long-time errors than baselines on Klein-Gordon and dis...
Continual Learning for fMRI-Based Brain Disorder Diagnosis via Functional Connectivity Matrices Generative Replay
q-bio.TO 2026-04 conditional novelty 7.0

A structure-aware VAE generates realistic FC matrices for replay, combined with multi-level knowledge distillation and hierarchical contextual bandit sampling, to enable continual fMRI-based brain disorder diagnosis a...
CapBench: A Multi-PDK Dataset for Machine-Learning-Based Post-Layout Capacitance Extraction
cs.AR 2026-04 accept novelty 7.0

CapBench is a new multi-PDK dataset of post-layout 3D windows with high-fidelity capacitance labels and multiple ML-ready representations, plus baseline results showing CNN accuracy versus GNN speed trade-offs.
Graph-RHO: Critical-path-aware Heterogeneous Graph Network for Long-Horizon Flexible Job-Shop Scheduling
cs.LG 2026-04 unverdicted novelty 7.0

Graph-RHO is a critical-path-aware heterogeneous graph network for rolling horizon optimization in flexible job-shop scheduling that achieves state-of-the-art solution quality and over 30% faster solve times on large ...
SCOT: Multi-Source Cross-City Transfer with Optimal-Transport Soft-Correspondence Objective
cs.LG 2026-04 unverdicted novelty 7.0

SCOT uses Sinkhorn entropic optimal transport to learn explicit soft correspondences between unequal region sets for multi-source cross-city transfer, adding contrastive sharpening and cycle reconstruction for stabili...
SCOT: Multi-Source Cross-City Transfer with Optimal-Transport Soft-Correspondence Objective
cs.LG 2026-04 unverdicted novelty 7.0

SCOT learns explicit soft region correspondences via entropic optimal transport and a shared prototype hub to improve multi-source cross-city transfer accuracy and robustness.
Graph Topology Information Enhanced Heterogeneous Graph Representation Learning
cs.LG 2026-04 unverdicted novelty 7.0

ToGRL learns high-quality graph structures from raw heterogeneous graphs via a two-stage topology extraction process and prompt tuning, outperforming prior methods on five datasets.
Hierarchical Mesh Transformers with Topology-Guided Pretraining for Morphometric Analysis of Brain Structures
cs.CV 2026-04 unverdicted novelty 7.0

A hierarchical mesh transformer using topology-guided pretraining on simplicial complexes achieves state-of-the-art results on Alzheimer's classification, amyloid prediction, and focal cortical dysplasia detection fro...
EndoVGGT: GNN-Enhanced Depth Estimation for Surgical 3D Reconstruction
cs.CV 2026-03 unverdicted novelty 7.0

EndoVGGT uses a dynamic DeGAT graph attention module to improve depth estimation and non-rigid 3D reconstruction in surgery, reporting 24.6% PSNR and 9.1% SSIM gains on SCARED with zero-shot generalization to new domains.
GraphScout: Empowering Large Language Models with Intrinsic Exploration Ability for Agentic Graph Reasoning
cs.AI 2026-03 unverdicted novelty 7.0

GraphScout trains LLMs to autonomously synthesize structured training data from knowledge graphs via flexible exploration tools, enabling a 4B model to outperform larger LLMs by 16.7% on average with fewer inference t...
Learning to Reconstruct: A Differentiable Approach to Muon Tracking at the LHC
hep-ex 2025-12 conditional novelty 7.0

A differentiable end-to-end model combining graph attention networks with clustering and fitting improves muon track reconstruction and pT estimation at the LHC compared to factorized approaches.
Cross-Paradigm Graph Backdoor Attacks with Promptable Subgraph Triggers
cs.CR 2025-10 unverdicted novelty 7.0

CP-GBA distills a queryable repository of promptable subgraph triggers via graph prompt learning to achieve transferable backdoor attacks on GNNs with state-of-the-art success rates across paradigms and defenses.
When LLM Agents Meet Graph Optimization: An Automated Data Quality Improvement Approach
cs.LG 2025-10 unverdicted novelty 7.0

LAGA is a unified multi-agent LLM framework that automates comprehensive quality optimization for text-attributed graphs by running detection, planning, action, and evaluation agents in a closed loop.
Effective Capacitance Modeling Using Graph Neural Networks
cs.LG 2025-07 unverdicted novelty 7.0

GNN-Ceff is the first graph neural network model for post-layout effective capacitance prediction in VLSI circuits, delivering up to 929x speedup over serial state-of-the-art methods with improved accuracy on real benchmarks.
Doloris: Dual Conditional Diffusion Implicit Bridges with Sparsity Masking Strategy for Unpaired Single-Cell Perturbation Estimation
cs.LG 2025-06 unverdicted novelty 7.0

Doloris introduces dual conditional diffusion implicit bridges plus a sparsity masking strategy to model unpaired single-cell perturbation responses and reports state-of-the-art results on public datasets.
Mamba-Based Graph Convolutional Networks: Tackling Over-smoothing with Selective State Space
cs.LG 2025-01 unverdicted novelty 7.0

MbaGCN combines message aggregation, selective state space transitions, and node state prediction to create a more scalable deep graph convolutional network.
Direction for Detection: A Survey of Automated Vulnerability Detection and all of its Pain Points
cs.SE 2024-12 conditional novelty 7.0

ML4AVD research remains locked into binary function-level classification of C/C++ vulnerabilities because twelve pain points in the pipeline reinforce each other through feedback loops.
Convex Compositional Reasoning Models
cs.LG 2026-05 unverdicted novelty 6.0

CCEM parameterizes compositional factors with input-convex neural networks and optimizes the summed energy over a convex relaxation, allowing models trained on small instances to transfer to larger ones.
Skinned Motion Retargeting with Spatially Adaptive Interaction Guidance
cs.GR 2026-05 unverdicted novelty 6.0

A geometry-aware retargeting method uses Transformer-refined adaptive anchors and a graph autoencoder to preserve interaction semantics like self-contact across characters with exaggerated proportions.
Deep Neural Sheaf Diffusion
cs.LG 2026-05 unverdicted novelty 6.0

DNSD replaces the sheaf Laplacian with a sheaf adjacency operator to maintain informative signals in deep layers, outperforming GNN and NSD baselines on long-range synthetic and real graph tasks.
ArtifactLinker: Linking Scientific Artifacts for Automatic State-of-the-Art Discovery
cs.LG 2026-05 unverdicted novelty 6.0

ArtifactLinker frames SOTA discovery as missing-link prediction on an artifact graph of models and datasets, with a two-stage ranking-plus-verification pipeline and a new benchmark of 14k artifacts.
Virtual Nodes Guided Dynamic Graph Neural Network for Brain Tumor Segmentation with Missing Modalities
cs.AI 2026-05 unverdicted novelty 6.0

A one-stage graph framework with modality-specific virtual nodes and dynamic adjacency adjustment for robust brain tumor segmentation under arbitrary missing MRI modalities, outperforming SOTA on BRATS-2018 and BRATS-...
CO-MAP: A Reinforcement Learning Approach to the Qubit Allocation Problem
quant-ph 2026-05 unverdicted novelty 6.0

Reinforcement learning policy for qubit mapping reduces SWAP overhead by 65-85% versus standard quantum compilers on MQTBench and Queko benchmark circuits.
Weather-Robust Cross-View Geo-Localization via Prototype-Based Semantic Part Discovery
cs.CV 2026-05 unverdicted novelty 6.0

SkyPart achieves state-of-the-art single-pass cross-view geo-localization on SUES-200, University-1652, and DenseUAV by using prototype-based part discovery, altitude-conditioned modulation, and Kendall-weighted loss,...
GRASP -- Graph-Based Anomaly Detection Through Self-Supervised Classification
cs.CR 2026-05 unverdicted novelty 6.0

GRASP detects anomalies in system provenance graphs via self-supervised executable prediction from two-hop neighborhoods, outperforming prior PIDS on DARPA datasets by identifying all documented attacks where behavior...
Mid-Circuit Measurements for Clifford Noise Reduction in Hamiltonian Simulations
quant-ph 2026-05 conditional novelty 6.0

Mid-circuit stabilizer verification in six-qubit GSE-encoded Clifford Trotter steps reduces logical error rates by up to 54% on Barium ion hardware, with the gain vanishing if checks are deferred to circuit end.
GCCM: Enhancing Generative Graph Prediction via Contrastive Consistency Model
cs.AI 2026-05 unverdicted novelty 6.0

GCCM prevents shortcut collapse in consistency models for graph prediction by using contrastive negative pairs and input feature perturbation, leading to better performance than deterministic baselines.
A Unified Benchmark for Evaluating Knowledge Graph Construction Methods and Graph Neural Networks
cs.LG 2026-05 unverdicted novelty 6.0

A dual-purpose benchmark supplies two text-derived knowledge graphs and one expert reference graph on the same biomedical corpus to jointly measure construction method quality and GNN robustness via semi-supervised no...
GEM: Graph-Enhanced Mixture-of-Experts with ReAct Agents for Dialogue State Tracking
cs.CL 2026-05 unverdicted novelty 6.0

GEM achieves 65.19% joint goal accuracy on MultiWOZ 2.2 by routing between a graph neural network expert for dialogue structure and a T5 expert for sequences, plus ReAct agents for value generation, outperforming prio...
Exploring Sparse Matrix Multiplication Kernels on the Cerebras CS-3
cs.DC 2026-04 unverdicted novelty 6.0

Cerebras CS-3 achieves up to 100x speedup over CPU for SpMM and 20x for SDDMM at 90% sparsity, with performance improving for larger matrices, but becomes slower than CPU beyond 99% sparsity.
Qubit-Scalable CVRP via Lagrangian Knapsack Decomposition and Noise-Aware Quantum Execution
quant-ph 2026-04 unverdicted novelty 6.0

A hybrid quantum framework decomposes CVRP into bounded-width knapsack subproblems, trains a reinforcement learning controller for Lagrangian multipliers, and uses a contextual bandit to adapt quantum hardware executi...
Robustness of Spatio-temporal Graph Neural Networks for Fault Location in Partially Observable Distribution Grids
cs.LG 2026-04 unverdicted novelty 6.0

Measured-only graph topologies enable STGNNs to achieve up to 11-point F1 gains and 6x faster training versus full-topology GNNs and RNN baselines for fault location in partially observable distribution grids.
ACT: Anti-Crosstalk Learning for Cross-Sectional Stock Ranking via Temporal Disentanglement and Structural Purification
cs.LG 2026-04 unverdicted novelty 6.0

ACT disentangles temporal scales in stock sequences and purifies structural relations in graphs to achieve state-of-the-art cross-sectional stock ranking on CSI300 and CSI500 with up to 74.25% improvement.
TACENR: Task-Agnostic Contrastive Explanations for Node Representations
cs.LG 2026-04 unverdicted novelty 6.0

TACENR introduces a contrastive-learning method that identifies the most influential attribute, proximity, and structural features in node representations in a task-agnostic manner.
LoReC: Rethinking Large Language Models for Graph Data Analysis
cs.LG 2026-04 unverdicted novelty 6.0

LoReC enhances LLMs for graph tasks via attention redistribution, graph re-injection into FFN, and logit rectification, yielding improvements over GraphLLM and GNN baselines on diverse datasets.
Program Structure-aware Language Models: Targeted Software Testing beyond Textual Semantics
cs.SE 2026-04 unverdicted novelty 6.0

GLMTest integrates code property graphs and GNNs with LLMs to steer test case generation toward targeted branches, raising branch accuracy from 27.4% to 50.2% on the TestGenEval benchmark.
TransXion: A High-Fidelity Graph Benchmark for Realistic Anti-Money Laundering
cs.LG 2026-04 unverdicted novelty 6.0

TransXion supplies a 3-million-transaction graph benchmark with profile-aware normal activity and stochastic illicit subgraphs that produces lower detection scores than prior AML datasets.
DuConTE: Dual-Granularity Text Encoder with Topology-Constrained Attention for Text-attributed Graphs
cs.CL 2026-04 unverdicted novelty 6.0

DuConTE is a dual-granularity text encoder that incorporates graph topology into language model attention for improved node representations in text-attributed graphs.
Region-Affinity Attention for Whole-Slide Breast Cancer Classification in Deep Ultraviolet Imaging
cs.CV 2026-04 unverdicted novelty 6.0

A novel Region-Affinity Attention mechanism classifies breast cancer on whole deep ultraviolet slides, achieving 92.67% accuracy and 95.97% AUC on 136 samples while outperforming standard attention methods.
Graph self-supervised learning based on frequency corruption
cs.LG 2026-04 unverdicted novelty 6.0

FC-GSSL improves graph SSL by generating high-frequency biased corrupted graphs via low-frequency contribution-based corruption, reconstructing low-frequency features in an autoencoder, and aligning multi-view represe...

Reference graph

Works this paper leans on

17 extracted references · 17 canonical work pages · cited by 141 Pith papers · 3 internal anchors

[1]

Software available from tensorflow.org

URL https://www.tensorflow.org/. Software available from tensorflow.org. James Atwood and Don Towsley. Diffusion-convolutional neural networks. In Advances in Neural Information Processing Systems, pp. 1993–2001.

[2]

Long Short-Term Memory-Networks for Machine Reading

Jianpeng Cheng, Li Dong, and Mirella Lapata. Long short-term memory-networks for machine reading. arXiv preprint arXiv:1601.06733.

[3]

Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation

Kyunghyun Cho, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078.

[4]

Programmable Agents

Misha Denil, Sergio Gómez Colmenarejo, Serkan Cabi, David Saxton, and Nando de Freitas. Programmable agents. arXiv preprint arXiv:1706.06383.

[5]

One-Shot Imitation Learning

Yan Duan, Marcin Andrychowicz, Bradly Stadie, Jonathan Ho, Jonas Schneider, Ilya Sutskever, Pieter Abbeel, and Wojciech Zaremba. One-shot imitation learning. arXiv preprint arXiv:1703.07326.

[6]

A general framework for adaptive processing of data structures

Published as a conference paper at ICLR 2018.
Paolo Frasconi, Marco Gori, and Alessandro Sperduti. A general framework for adaptive processing of data structures. IEEE Transactions on Neural Networks, 9(5):768–786.

[8]

A Convolutional Encoder Model for Neural Machine Translation

URL http://arxiv.org/abs/1611.02344. Xavier Glorot and Yoshua Bengio. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pp. 249–256.

[9]

Deep Convolutional Networks on Graph-Structured Data

Mikael Henaff, Joan Bruna, and Yann LeCun. Deep convolutional networks on graph-structured data. arXiv preprint arXiv:1506.05163.

[10]

Adam: A Method for Stochastic Optimization

Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.

[11]

A Structured Self-attentive Sentence Embedding

Zhouhan Lin, Minwei Feng, Cicero Nogueira dos Santos, Mo Yu, Bing Xiang, Bowen Zhou, and Yoshua Bengio. A structured self-attentive sentence embedding. arXiv preprint arXiv:1703.03130.

[12]

Geometric deep learning on graphs and manifolds using mixture model CNNs

Federico Monti, Davide Boscaini, Jonathan Masci, Emanuele Rodolà, Jan Svoboda, and Michael M Bronstein. Geometric deep learning on graphs and manifolds using mixture model CNNs. arXiv preprint arXiv:1611.08402.

[13]

Learning convolutional neural networks for graphs

Mathias Niepert, Mohamed Ahmed, and Konstantin Kutzkov. Learning convolutional neural networks for graphs. In Proceedings of The 33rd International Conference on Machine Learning, volume 48, pp. 2014–2023.

[14]

DeepWalk: Online learning of social representations

Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. DeepWalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 701–710. ACM.

[15]

A simple neural network module for relational reasoning

Adam Santoro, David Raposo, David GT Barrett, Mateusz Malinowski, Razvan Pascanu, Peter Battaglia, and Timothy Lillicrap. A simple neural network module for relational reasoning. arXiv preprint arXiv:1706.01427.

[16]

doi: 10.1109/72.572108

ISSN 1045-9227. doi: 10.1109/72.572108. Nitish Srivastava, Geoffrey E Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(1):1929–1958.

[17]

Attention Is All You Need

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. arXiv preprint arXiv:1706.03762.

[19]

Memory Networks

URL http://arxiv.org/abs/1410.3916. Zhilin Yang, William Cohen, and Ruslan Salakhutdinov. Revisiting semi-supervised learning with graph embeddings. In International Conference on Machine Learning, pp. 40–48.
