GPipe: Efficient training of giant neural networks using pipeline parallelism

· 2019 · arXiv 1811.06965

2 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

A Tabular Schedule Abstraction for Communication-Aware Evaluation of Pipeline-Parallel LLM Training

cs.DC · 2026-05-19 · unverdicted · novelty 6.0

A new tabular abstraction for pipeline schedules shows that accounting for communication can reverse rankings derived from bubble analysis alone; GPipe and 1F1B are runtime-equivalent, but 1F1B uses less activation memory.
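
The GPipe versus 1F1B comparison in this summary can be illustrated with the usual idealized pipeline formulas. The sketch below is not the cited paper's tabular abstraction; it assumes equal-cost stages, no communication cost, and a non-interleaved schedule, and the function names are hypothetical. Under those assumptions both schedules share the same bubble fraction, while 1F1B caps the number of microbatches whose activations a stage must hold.

```python
def bubble_fraction(stages: int, microbatches: int) -> float:
    """Idle fraction of an idealized synchronous pipeline.

    Assumes equal-cost stages and zero communication time, so the
    result is the same for GPipe and non-interleaved 1F1B.
    """
    return (stages - 1) / (microbatches + stages - 1)


def peak_inflight_microbatches(
    schedule: str, stages: int, microbatches: int, stage: int = 0
) -> int:
    """Microbatches whose activations are held at once on a given stage.

    `stage` is 0-indexed. GPipe runs every forward pass before any
    backward pass, so each stage holds all microbatches. 1F1B starts
    backward passes early, so stage i holds at most (stages - i).
    """
    if schedule == "gpipe":
        return microbatches
    if schedule == "1f1b":
        return min(microbatches, stages - stage)
    raise ValueError(f"unknown schedule: {schedule!r}")


if __name__ == "__main__":
    p, m = 4, 8  # illustrative values, not taken from the paper
    print(f"bubble fraction: {bubble_fraction(p, m):.3f}")
    for name in ("gpipe", "1f1b"):
        print(name, peak_inflight_microbatches(name, p, m))
```

With these illustrative values the two schedules differ only in the memory column, which is the pattern the summary describes; the paper's point is that adding communication costs to the model can change the runtime ranking that bubble analysis alone predicts.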

SparseBalance: Load-Balanced Long Context Training with Dynamic Sparse Attention

cs.LG · 2026-04-15 · unverdicted · novelty 5.0

SparseBalance dynamically adjusts sparsity and batches workloads to load-balance sparse attention training, delivering up to 1.33x speedup and 0.46% better long-context performance on LongBench.

citing papers explorer


A Tabular Schedule Abstraction for Communication-Aware Evaluation of Pipeline-Parallel LLM Training

cs.DC · 2026-05-19 · unverdicted · none · ref 6

SparseBalance: Load-Balanced Long Context Training with Dynamic Sparse Attention

cs.LG · 2026-04-15 · unverdicted · none · ref 10
