Math-beyond: A benchmark for RL to expand beyond the base model, 2025. https://arxiv.org/abs/2510.11653
2 Pith papers cite this work. Polarity classification is still indexing.
Citing papers by year: 2026 (2). Verdicts: UNVERDICTED (2).
Citing papers
- Harnessing Routing Foresight for Micro-step-level MoE load balancing in RL Post-training
  ForeMoE uses routing foresight from the rollout stage to enable micro-step load balancing in MoE RL post-training via a hierarchical planner and transfer engine, claiming up to 1.45x speedup on 64 GPUs.
- LC-ERD: Mining Latent Logic for Self-Evolving Reasoning via Consistency-Regulated Reward Decomposition
  LC-ERD frames LLM self-alignment as latent structure mining via a Variational Logic Potential and Multi-Agent Value Decomposition to provide granular, logic-consistent supervision.