4 Pith papers cite this work.
citation-role summary: background (2)
citation-polarity summary: background (2)
citing papers explorer
- Bridge: Optimizing Collective Communication Schedules in Reconfigurable Networks with Reusable Subrings
  Bridge reuses optical subrings across multiple steps in reconfigurable networks, typically reducing All-to-All completion time by 3x to 10x and improving AllReduce by up to 6.6x over Ring.
- UCCL-Zip: Lossless Compression Supercharged GPU Communication
  UCCL-Zip adds lossless compression to GPU communication to reduce LLM communication bottlenecks while preserving exact numerical correctness.
- LatencyScope: A System-Level Mathematical Framework for 5G RAN Latency
  LatencyScope models 5G RAN latency sources across the protocol stack and provides a configuration analyzer that identifies settings meeting latency-reliability targets. It is validated on open-source testbeds and commercial network measurements, where it outperforms prior models and simulators.
- DBLP: Phase-Aware Bounded-Loss Transport for Burst-Resilient Distributed ML Training
  DBLP is a training-phase-aware bounded-loss transport protocol that reduces end-to-end distributed ML training time by 24.4% on average (up to 33.9%) and achieves up to 5.88x communication speedup during microbursts while maintaining comparable test accuracy.