Gpipe: Efficient training of giant neural networks using pipeline parallelism

· 2019 · arXiv 4287.345429

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

representative citing papers

SparseBalance: Load-Balanced Long Context Training with Dynamic Sparse Attention

cs.LG · 2026-04-15 · unverdicted · novelty 5.0

SparseBalance dynamically adjusts sparsity and batches workloads to load-balance sparse attention training, delivering up to 1.33x speedup and 0.46% better long-context performance on LongBench.

citing papers explorer

Showing 1 of 1 citing paper.

SparseBalance: Load-Balanced Long Context Training with Dynamic Sparse Attention cs.LG · 2026-04-15 · unverdicted · none · ref 10
SparseBalance dynamically adjusts sparsity and batches workloads to load-balance sparse attention training, delivering up to 1.33x speedup and 0.46% better long-context performance on LongBench.

Gpipe: Efficient training of giant neural networks using pipeline parallelism

fields

years

verdicts

representative citing papers

citing papers explorer