DeepSpeed: System Optimizations Enable Training Deep Learning Models with Over 100 Billion Parameters

Jeff Rasley, Samyam Rajbhandari, Olatunji Ruwase, Yuxiong He · 2020

1 Pith paper cites this work. Polarity classification is still being indexed.



representative citing papers

SFT-then-RL Outperforms Mixed-Policy Methods for LLM Reasoning
cs.LG · 2026-04-26 · conditional · novelty 6.0 · ref 23

Correcting bugs in the DeepSpeed optimizer and the OpenRLHF loss reveals that SFT-then-RL outperforms mixed-policy methods by 3.8 to 22.2 points on math benchmarks.