Yanping Huang, Youlong Cheng, Ankur Bapna, Orhan Firat, Mia Xu Chen, Dehao Chen, HyoukJoong Lee, Jiquan Ngiam, Quoc V. Le, Yonghui Wu, and Zhifeng Chen · 2019

2 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

ResiHP: Taming LLM Training Failures with Dynamic Hybrid Parallelism

cs.DC · 2026-05-07 · unverdicted · novelty 4.0

ResiHP introduces a workload-aware failure detector and dynamic scheduler for hybrid-parallel LLM training that achieves 1.04-4.39x higher throughput than prior resilient systems under failures on a 256-GPU cluster.

DFLOP: A Data-driven Framework for Multimodal LLM Training Pipeline Optimization

cs.DC · 2026-03-26 · unverdicted · novelty 4.0

DFLOP is a data-driven framework that profiles data-induced computation variance and uses predictive scheduling to balance workloads in multimodal LLM training pipelines, claiming up to 3.6x faster training than existing frameworks.

citing papers explorer

Showing 2 of 2 citing papers.

ResiHP: Taming LLM Training Failures with Dynamic Hybrid Parallelism
cs.DC · 2026-05-07 · unverdicted · none · ref 19

DFLOP: A Data-driven Framework for Multimodal LLM Training Pipeline Optimization
cs.DC · 2026-03-26 · unverdicted · none · ref 30
