In 2024 IEEE Symposium on High-Performance Interconnects (HOTI)
3 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.DC 3
verdicts
UNVERDICTED 3
representative citing papers
citing papers explorer
- Diagnosing Overhead in Dispatch Operations: Cross-architecture Observatory
  DODOCO measurements show MoE routing imbalance is intrinsic to architecture and real text, not correctable by EP scaling or represented by mock tokens, forming two persistent Gini bands.
- PICO: Performance Insights for Collective Operations
  PICO is a benchmarking framework for collective operations that decouples portable setup from platform execution, supplies reference MPI implementations, and shows default choices can be up to 5x slower with up to 44% end-to-end training time reductions in simulator replays.
- Flint: Compiler Enabled Cluster-Free Design Space Exploration for Distributed ML
  Flint generates compiler-derived workload graphs that support cluster-free design space exploration for distributed machine learning systems.