Gpipe: Efficient training of giant neural networks using pipeline parallelism

Yanping Huang, Youlong Cheng, Ankur Bapna, Orhan Firat, Dehao Chen, Mia Chen, HyoukJoong Lee, Jiquan Ngiam, Quoc V Le, Yonghui Wu, et al · 2019

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

browse 2 citing papers

representative citing papers

Deep Optimizer States: Towards Scalable Training of Transformer Models Using Interleaved Offloading

cs.LG · 2024-10-26 · unverdicted · novelty 6.0

Deep Optimizer States splits LLMs into subgroups and uses a performance model to schedule optimizer updates on CPU or GPU, achieving 2.5x faster iterations than prior offloading methods when integrated with DeepSpeed.

Advancing Intelligent Sequence Modeling: Evolution, Trade-offs, and Applications of State- Space Architectures from S4 to Mamba

cs.LG · 2025-03-22 · unverdicted · novelty 0.0

A survey tracing the evolution of state-space models like S4 and Mamba, their efficiency trade-offs, and applications in NLP, vision, and other domains.

citing papers explorer

Showing 2 of 2 citing papers.

Deep Optimizer States: Towards Scalable Training of Transformer Models Using Interleaved Offloading cs.LG · 2024-10-26 · unverdicted · none · ref 11
Deep Optimizer States splits LLMs into subgroups and uses a performance model to schedule optimizer updates on CPU or GPU, achieving 2.5x faster iterations than prior offloading methods when integrated with DeepSpeed.
Advancing Intelligent Sequence Modeling: Evolution, Trade-offs, and Applications of State- Space Architectures from S4 to Mamba cs.LG · 2025-03-22 · unverdicted · none · ref 52
A survey tracing the evolution of state-space models like S4 and Mamba, their efficiency trade-offs, and applications in NLP, vision, and other domains.

Gpipe: Efficient training of giant neural networks using pipeline parallelism

fields

years

verdicts

representative citing papers

citing papers explorer