A 30B model trained via reverse-perplexity SFT, two-stage RL, and test-time scaling achieves gold-medal-level results on IMO 2025 and IPhO 2024/2025.
From (26), OI= |a+ 2k| 2 +L |∆| =R+ L |∆| .(29) Thus if we can showL=±r|∆|, we will haveOI=R±r, which implies tangency
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.AI 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling
A 30B model trained via reverse-perplexity SFT, two-stage RL, and test-time scaling achieves gold-medal-level results on IMO 2025 and IPhO 2024/2025.