MegaTrain enables reliable full-precision training of up to 120B parameter LLMs on one H200 GPU with 1.5TB host memory via host-memory streaming, pipelined double-buffered execution, and stateless layer templates, achieving 1.84x throughput over DeepSpeed ZeRO-3 for 14B models.
1 Pith paper cites this work.
Fields: cs.CL (1)
Years: 2026 (1)
Verdicts: CONDITIONAL (1)
Representative citing papers:
MegaTrain: Full Precision Training of 100B+ Parameter Large Language Models on a Single GPU