Towards a standardized representation for deep learning collective algorithms
3 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.DC (3)
verdicts: UNVERDICTED (3)
citing papers explorer
- Diagnosing Overhead in Dispatch Operations: Cross-architecture Observatory
  DODOCO measurements show MoE routing imbalance is intrinsic to architecture and real text, not correctable by EP scaling or represented by mock tokens, forming two persistent Gini bands.
- PICO: Performance Insights for Collective Operations
  PICO is a benchmarking framework for collective operations that decouples portable setup from platform execution, supplies reference MPI implementations, and shows default choices can be up to 5x slower with up to 44% end-to-end training time reductions in simulator replays.
- Flint: Compiler Enabled Cluster-Free Design Space Exploration for Distributed ML
  Flint generates compiler-derived workload graphs that support cluster-free design space exploration for distributed machine learning systems.