LoopMoE is a looped MoE language model that outperforms matched vanilla MoE on 8 of 9 downstream benchmarks at 3B scale and continues to outperform at 9B scale under strictly controlled budgets.
arXiv preprint arXiv:2408.13359 , year=
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
years
2026 2verdicts
UNVERDICTED 2representative citing papers
Optimal hyperparameters for LLM continued pre-training follow predictable scaling laws derived from proxy models, enabling a two-stage framework that predicts settings from compute budget and checkpoint state to reduce search overhead by 90%.
citing papers explorer
-
Predictable Scaling Laws of Optimal Hyperparameters for LLM Continued Pre-training
Optimal hyperparameters for LLM continued pre-training follow predictable scaling laws derived from proxy models, enabling a two-stage framework that predicts settings from compute budget and checkpoint state to reduce search overhead by 90%.