GPipe: Efficient training of giant neural networks using pipeline parallelism
2 Pith papers cite this work.
Citing papers
- A Tabular Schedule Abstraction for Communication-Aware Evaluation of Pipeline-Parallel LLM Training
  A new tabular abstraction for pipeline schedules shows that accounting for communication can reverse the rankings obtained from bubble analysis alone. GPipe and 1F1B are runtime-equivalent, but 1F1B uses less activation memory.
- SparseBalance: Load-Balanced Long Context Training with Dynamic Sparse Attention
  SparseBalance dynamically adjusts sparsity and batches workloads to load-balance sparse attention training, delivering up to 1.33x speedup and 0.46% better long-context performance on LongBench.