Math-beyond: A benchmark for rl to expand beyond the base model, 2025.https://arxiv.org/abs/2510.11653

Prasanna Mayilvahanan, Ricardo Dominguez-Olmedo, Thaddäus Wiedemer, Wieland Brendel · 2025 · arXiv 2510.11653

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

representative citing papers

Harnessing Routing Foresight for Micro-step-level MoE load balancing in RL Post-training

cs.DC · 2026-06-10 · unverdicted · novelty 7.0

ForeMoE uses routing foresight from the rollout stage to enable micro-step load balancing in MoE RL post-training via a hierarchical planner and transfer engine, claiming up to 1.45x speedup on 64 GPUs.

LC-ERD: Mining Latent Logic for Self-Evolving Reasoning via Consistency-Regulated Reward Decomposition

cs.AI · 2026-05-19 · unverdicted · novelty 6.0

LC-ERD frames LLM self-alignment as latent structure mining via a Variational Logic Potential and Multi-Agent Value Decomposition to provide granular, logic-consistent supervision.

citing papers explorer

Showing 2 of 2 citing papers after filters.

Harnessing Routing Foresight for Micro-step-level MoE load balancing in RL Post-training cs.DC · 2026-06-10 · unverdicted · none · ref 33
ForeMoE uses routing foresight from the rollout stage to enable micro-step load balancing in MoE RL post-training via a hierarchical planner and transfer engine, claiming up to 1.45x speedup on 64 GPUs.
LC-ERD: Mining Latent Logic for Self-Evolving Reasoning via Consistency-Regulated Reward Decomposition cs.AI · 2026-05-19 · unverdicted · none · ref 25
LC-ERD frames LLM self-alignment as latent structure mining via a Variational Logic Potential and Multi-Agent Value Decomposition to provide granular, logic-consistent supervision.

Math-beyond: A benchmark for rl to expand beyond the base model, 2025.https://arxiv.org/abs/2510.11653

fields

years

verdicts

representative citing papers

citing papers explorer