Towards a standardized representation for deep learning collective algorithms

2024 · arXiv 3208.2024

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Diagnosing Overhead in Dispatch Operations: Cross-architecture Observatory

cs.DC · 2026-05-20 · unverdicted · novelty 6.0

DODOCO measurements show that MoE routing imbalance is intrinsic to the architecture and to real text: it is not corrected by EP scaling, is not reproduced by mock tokens, and forms two persistent Gini bands.

PICO: Performance Insights for Collective Operations

cs.DC · 2025-08-22 · unverdicted · novelty 6.0

PICO is a benchmarking framework for collective operations that decouples portable setup from platform execution and supplies reference MPI implementations. It shows that default choices can be up to 5x slower, and reports end-to-end training time reductions of up to 44% in simulator replays.

Flint: Compiler Enabled Cluster-Free Design Space Exploration for Distributed ML

cs.DC · 2026-04-19 · unverdicted · novelty 5.0

Flint generates compiler-derived workload graphs that support cluster-free design space exploration for distributed machine learning systems.

citing papers explorer

Showing 3 of 3 citing papers after filters.

Diagnosing Overhead in Dispatch Operations: Cross-architecture Observatory
cs.DC · 2026-05-20 · unverdicted · none · ref 37

PICO: Performance Insights for Collective Operations
cs.DC · 2025-08-22 · unverdicted · none · ref 52

Flint: Compiler Enabled Cluster-Free Design Space Exploration for Distributed ML
cs.DC · 2026-04-19 · unverdicted · none · ref 37
