All research ideas, methods, experiments, and analyses were fully developed and conducted by the authors

Preprint · A. Use of Large Language Models: Large Language Models (LLMs) were used only to polish the writing (e · 2024

1 Pith paper cites this work. Polarity classification is still indexing.


representative citing papers

Beyond Sunk Costs: Boosting LLM Pre-training Efficiency via Orthogonal Growth of Mixture-of-Experts

cs.LG · 2025-10-09 · unverdicted · novelty 5.0

Orthogonal growth recycles pre-trained MoE checkpoints via layer copying and noisy expert duplication, delivering 10.6% higher accuracy than training from scratch with equivalent extra compute.

citing papers explorer

Showing 1 of 1 citing paper.

Beyond Sunk Costs: Boosting LLM Pre-training Efficiency via Orthogonal Growth of Mixture-of-Experts

cs.LG · 2025-10-09 · unverdicted · none · ref 29

Orthogonal growth recycles pre-trained MoE checkpoints via layer copying and noisy expert duplication, delivering 10.6% higher accuracy than training from scratch with equivalent extra compute.

