LAER-MoE: Load-adaptive expert re-layout for efficient mixture-of-experts training

· 2026 · arXiv 9212.379018

2 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

NanoCP: Request-Level Dynamic Context Parallelism for Data-Expert Parallel Decoding

cs.DC · 2026-05-20 · unverdicted · novelty 6.0

NanoCP introduces request-level dynamic context parallelism to decouple MoE communication from KV cache placement in hybrid data-expert parallel serving, reporting up to 3.27x higher request rates and 2.12x lower P99 latency under TPOT SLOs.

Diagnosing Overhead in Dispatch Operations: Cross-architecture Observatory

cs.DC · 2026-05-20 · unverdicted · novelty 6.0

DODOCO measurements show MoE routing imbalance is intrinsic to architecture and real text, not correctable by EP scaling or represented by mock tokens, forming two persistent Gini bands.

citing papers explorer


NanoCP: Request-Level Dynamic Context Parallelism for Data-Expert Parallel Decoding · cs.DC · 2026-05-20 · unverdicted · none · ref 25
Diagnosing Overhead in Dispatch Operations: Cross-architecture Observatory · cs.DC · 2026-05-20 · unverdicted · none · ref 3
