Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.DC (2)
Verdicts: UNVERDICTED (2)
Citing papers:
- Scalable and Adaptive Parallel Training of Graph Transformer on Large Graphs
  A new distributed framework for graph transformer training auto-selects parallel strategies and optimizes sparse operations to deliver up to 6x speedup on 8 GPUs and 78% memory reduction.
- Automatic Differentiation for Adjoint Stencil Loops
  A method for adjoint differentiation of stencil loops that preserves their structure and parallelizability via combined AD and loop transformations, released as the PerforAD tool with seismic and CFD test cases.