arXiv preprint arXiv:2510.23027

Towards Stable, Effective Reinforcement Learning for Mixture-of-Experts · arXiv 2510.23027

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

PADD: Path-Aligned Decompression Distillation for Non-Router Teacher to Guide MoE Student Learning

cs.CL · 2026-06-09 · unverdicted · novelty 5.0

PADD distills from dense teachers to MoE students via neuron clustering, expert warmup, online adaptive distillation, path-refined policy optimization, and reward-augmented load balancing, yielding gains on math reasoning benchmarks.

citing papers explorer

PADD: Path-Aligned Decompression Distillation for Non-Router Teacher to Guide MoE Student Learning

cs.CL · 2026-06-09 · unverdicted · none · ref 17
