Optimizing mixture-of-experts inference time combining model deployment and communication scheduling

2024 · arXiv 2410.17043

2 Pith papers cite this work.

representative citing papers

SpaceMoE: Realizing Distributed Mixture-of-Experts Inference over Space Networks

cs.DC · 2026-05-01 · unverdicted · novelty 6.0 · 2 refs

SpaceMoE partitions MoE layers across orbiting satellite subnets arranged in a ring and optimizes expert placement by activation probability and path latency, yielding at least a 3x reduction in inference latency over random baselines in thousand-satellite simulations.

Patterns behind Chaos: Forecasting Data Movement for Efficient Large-Scale MoE LLM Inference

cs.DC · 2025-10-07 · conditional · novelty 6.0

Comprehensive profiling of expert selection in frontier MoE models reveals temporal and spatial patterns that, through targeted optimizations, enable a 6.6x speedup on wafer-scale GPUs and a 1.25x speedup on existing systems.

citing papers explorer

SpaceMoE: Realizing Distributed Mixture-of-Experts Inference over Space Networks cs.DC · 2026-05-01 · unverdicted · none · ref 15 · 2 links
Patterns behind Chaos: Forecasting Data Movement for Efficient Large-Scale MoE LLM Inference cs.DC · 2025-10-07 · conditional · none · ref 33