A 30B model trained via reverse-perplexity SFT, two-stage RL, and test-time scaling achieves gold-medal-level results on IMO 2025 and IPhO 2024/2025.
The number of ways to reach each state for length $i+1$ is therefore:
$A_{i+1} = 1\cdot A_i + 0\cdot B_i + 0\cdot C_i = A_i$,
$B_{i+1} = 1\cdot A_i + 2\cdot B_i + 0\cdot C_i = A_i + 2B_i$,
$C_{i+1} = 8\cdot A_i + 8\cdot B_i + 10\cdot C_i$
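The recurrence in this excerpt can be iterated directly. The sketch below is only an illustration: the function names `step` and `iterate` and the starting counts `(1, 0, 0)` are assumptions, since the excerpt does not state the initial values or what the states A, B, and C represent. One property does follow from the coefficients alone: each column sums to 10 (1 + 1 + 8, 2 + 8, and 10), so the total A + B + C grows by a factor of 10 at every step.

```python
def step(a, b, c):
    """Apply one step of the recurrence quoted in the excerpt.

    A_{i+1} = A_i
    B_{i+1} = A_i + 2*B_i
    C_{i+1} = 8*A_i + 8*B_i + 10*C_i
    """
    return (a, a + 2 * b, 8 * a + 8 * b + 10 * c)


def iterate(start, steps):
    """Apply the recurrence `steps` times to the (A, B, C) tuple `start`."""
    state = start
    for _ in range(steps):
        state = step(*state)
    return state


# Hypothetical starting counts; the excerpt does not give the real ones.
start = (1, 0, 0)
after_two = iterate(start, 2)
print(after_two, sum(after_two))
```

With a different starting tuple the individual counts change, but the factor-of-10 growth of the total still holds because it depends only on the coefficients.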
1 Pith paper cites this work.
Fields: cs.AI (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling