Exphormer: Sparse Transformers for Graphs
3 Pith papers cite this work. Polarity classification is still indexing.
Citing papers by year: 2026 (3)
Verdicts: unverdicted (3)
citing papers explorer
- SigGate-GT: Taming Over-Smoothing in Graph Transformers via Sigmoid-Gated Attention
  SigGate-GT applies sigmoid gates to attention outputs in graph transformers to reduce over-smoothing, matching prior best on ZINC and setting new SOTA on ogbg-molhiv with gains over GraphGPS.
- Scalable and Adaptive Parallel Training of Graph Transformer on Large Graphs
  A new distributed framework for graph transformer training auto-selects parallel strategies and optimizes sparse operations to deliver up to 6x speedup on 8 GPUs and 78% memory reduction.
- Graph Transductive Sharpening: Leveraging Unlabeled Predictions in Node Classification
  Transductive Sharpening adds an entropy-minimization term on unlabeled-node predictions to the training objective for graph node classification.