DeepSpeed: System optimizations enable training deep learning models with over 100 billion parameters

11 · Under review as a conference paper at ICLR 2026

1 Pith paper cites this work. Polarity classification is still being indexed.


representative citing papers

GRACE-MoE: Grouping and Replication with Locality-Aware Routing for Efficient Distributed MoE Inference

cs.DC · 2025-09-29 · unverdicted · novelty 6.0

GRACE-MoE integrates expert grouping, dynamic replication, and locality-aware routing with hierarchical sparse communication to reduce end-to-end latency in distributed sparse mixture-of-experts (SMoE) inference.
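The summary above names locality-aware routing over replicated experts but does not describe the mechanism. Below is a minimal, hypothetical Python sketch of the general idea, not GRACE-MoE's actual algorithm. Each token goes to whichever replica of its target expert sits closest in the hardware hierarchy: the same GPU first, then the same node, then a remote node. Current load breaks ties, and an optional per-replica capacity prevents local replicas from being overloaded. The names `Device`, `locality_tier`, and `route_token`, the three-tier cost model, and the capacity rule are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


# Hypothetical model of where an expert replica lives (not from GRACE-MoE).
@dataclass(frozen=True)
class Device:
    node: int  # server index
    gpu: int   # GPU index within that server


def locality_tier(src: Device, dst: Device) -> int:
    """Assumed cost tiers: 0 = same GPU, 1 = same node, 2 = remote node."""
    if dst == src:
        return 0
    if dst.node == src.node:
        return 1
    return 2


def route_token(
    src: Device,
    replicas: List[Device],
    load: Dict[Device, int],
    capacity: Optional[int] = None,
) -> Device:
    """Pick a replica of the token's expert, preferring locality, then low load.

    `load` counts tokens already assigned to each replica and is updated in place.
    """
    if not replicas:
        raise ValueError("expert has no replicas")
    # Skip replicas that have reached the (optional) per-replica capacity.
    candidates = [d for d in replicas if capacity is None or load.get(d, 0) < capacity]
    if not candidates:
        candidates = list(replicas)  # every replica is full: ignore capacity
    best = min(candidates, key=lambda d: (locality_tier(src, d), load.get(d, 0)))
    load[best] = load.get(best, 0) + 1
    return best


# Usage: a token on node 0, GPU 1; the expert is replicated on three GPUs.
load = {Device(0, 3): 5, Device(0, 2): 1}
choice = route_token(Device(0, 1), [Device(1, 0), Device(0, 3), Device(0, 2)], load)
print(choice)  # the less-loaded same-node replica, Device(node=0, gpu=2)
```

Sorting by a `(tier, load)` tuple makes locality strictly dominate load balance. The capacity filter is one simple way to let traffic spill over to remote replicas when local ones saturate; a real system would likely weigh link bandwidth and replica placement more carefully.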

citing papers explorer

Showing 1 of 1 citing paper.

GRACE-MoE: Grouping and Replication with Locality-Aware Routing for Efficient Distributed MoE Inference

cs.DC · 2025-09-29 · unverdicted · none · ref 11
