ELDR reduces median TPOT by 5.9-13.9% relative to load-balancing baselines in PD-disaggregated MoE serving by routing decode requests using prefill-derived expert signatures and K-means locality partitioning.
Opportunistic expert activation: Batch-aware expert routing for faster decode without retraining. arXiv preprint arXiv:2511.02237, 2025
2 Pith papers cite this work. Polarity classification is still indexing.
years: 2026 (2)
verdicts: UNVERDICTED (2)
representative citing papers
dMoE aggregates token expert distributions to block level in dLLMs, cutting unique experts from 69.5 to 14.6, memory by 76-80%, and latency by 1.14-1.66x while retaining 99.11% performance.
citing papers explorer
- dMoE: dLLMs with Learnable Block Experts