Expert threshold routing for autoregressive language modeling with dynamic computation allocation and load balancing. arXiv preprint arXiv:2603.11535

Hanchi Sun, Yixin Liu, Yonghui Wu, Lichao Sun · arXiv 2603.11535

1 Pith paper cites this work. Polarity classification is still being indexed.

representative citing papers

Post-Trained MoE Can Skip Half Experts via Self-Distillation

cs.LG · 2026-05-18 · unverdicted · novelty 6.0

ZEDA injects zero-output experts and uses two-stage self-distillation to adapt post-trained MoE models into dynamic ones that skip over half the experts, yielding 1.2x inference speedup with small accuracy drops.

citing papers explorer

Showing 1 of 1 citing paper.

Post-Trained MoE Can Skip Half Experts via Self-Distillation cs.LG · 2026-05-18 · unverdicted · none · ref 31
