A new distributed framework for graph transformer training auto-selects parallel strategies and optimizes sparse operations to deliver up to 6x speedup on 8 GPUs and 78% memory reduction.
In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (Washington DC, USA) (KDD ’22)
1 Pith paper cites this work. Polarity classification is still being indexed.
Fields: cs.DC (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
- Scalable and Adaptive Parallel Training of Graph Transformer on Large Graphs