SparseBalance dynamically adjusts sparsity and batches workloads to load-balance sparse attention training, delivering up to 1.33x speedup and 0.46% better long-context performance on LongBench.
Gpipe: Efficient training of giant neural networks using pipeline parallelism
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.LG 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
SparseBalance: Load-Balanced Long Context Training with Dynamic Sparse Attention
SparseBalance dynamically adjusts sparsity and batches workloads to load-balance sparse attention training, delivering up to 1.33x speedup and 0.46% better long-context performance on LongBench.