GPipe: Efficient training of giant neural networks using pipeline parallelism

Yanping Huang, Youlong Cheng, Ankur Bapna, Orhan Firat, Dehao Chen, Mia Chen, HyoukJoong Lee, Jiquan Ngiam, Quoc V Le, Yonghui Wu, et al · 2019

2 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

unclear 1

representative citing papers

Ring Attention with Blockwise Transformers for Near-Infinite Context

cs.CL · 2023-10-03 · unverdicted · novelty 7.0

Ring Attention uses blockwise computation and ring communication to let Transformers process sequences up to device-count times longer than prior memory-efficient methods.

On the Convergence Theory of Pipeline Gradient-based Analog In-memory Training

cs.LG · 2024-10-19 · unverdicted · novelty 6.0

Analog-SGD-AP converges with iteration complexity O(ε^{-2} + ε^{-1}) for multi-layer DNNs on AIMC hardware despite analog weight-update imperfections and asynchronous stale gradients.

citing papers explorer


Ring Attention with Blockwise Transformers for Near-Infinite Context

cs.CL · 2023-10-03 · unverdicted · none · ref 15

On the Convergence Theory of Pipeline Gradient-based Analog In-memory Training

cs.LG · 2024-10-19 · unverdicted · none · ref 14
