Title resolution pending

Yanping Huang, Youlong Cheng, Ankur Bapna, Orhan Firat, Dehao Chen, Mia Chen, HyoukJoong Lee, Jiquan Ngiam, Quoc V. Le, Yonghui Wu, et al. · 2019

4 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Exploiting Multicast for Accelerating Collective Communication

cs.DC · 2026-05-21 · unverdicted · novelty 6.0

MultiWrite is a new many-to-many transmission semantic that uses multicast principles to eliminate redundant packets in collective operations, delivering up to 33% lower latency for AllGather and AlltoAll on Ascend NPUs.

HARP: Orchestrating Automated Parallel Training on Heterogeneous GPU Clusters

cs.DC · 2025-09-29 · unverdicted · novelty 6.0

HARP provides a fine-grained inter-operator parallel planner and a heterogeneity-aware 1F1B scheduler that together improve training throughput by 1.3x-1.6x on mixed GPU clusters compared with current homogeneous-oriented frameworks.

The Carrier Pigeon Internet Protocol: An Algorithmic (and Lighthearted) Perspective

cs.NI · 2026-05-10 · unverdicted · novelty 4.0

Minimizing pigeons for 2-hop and multihop demands is NP-hard, but a polynomial-time demand-aggregation algorithm achieves a 2-approximation.

CCL-D: A High-Precision Diagnostic System for Slow and Hang Anomalies in Large-Scale Model Training

cs.DC · 2026-05-06 · unverdicted · novelty 4.0

CCL-D detects slow/hang anomalies in CCL for distributed training via lightweight tracing probes and an intelligent analyzer, achieving near-complete coverage and 6-minute rank localization on a 4000-GPU cluster over one year.

citing papers explorer

Showing 4 of 4 citing papers.

Exploiting Multicast for Accelerating Collective Communication

cs.DC · 2026-05-21 · unverdicted · none · ref 13

HARP: Orchestrating Automated Parallel Training on Heterogeneous GPU Clusters

cs.DC · 2025-09-29 · unverdicted · none · ref 7

The Carrier Pigeon Internet Protocol: An Algorithmic (and Lighthearted) Perspective

cs.NI · 2026-05-10 · unverdicted · none · ref 25

CCL-D: A High-Precision Diagnostic System for Slow and Hang Anomalies in Large-Scale Model Training

cs.DC · 2026-05-06 · unverdicted · none · ref 17
