A 30B model trained via reverse-perplexity SFT, two-stage RL, and test-time scaling achieves gold-medal-level results on IMO 2025 and IPhO 2024/2025.
Title resolution pending
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.AI 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling
A 30B model trained via reverse-perplexity SFT, two-stage RL, and test-time scaling achieves gold-medal-level results on IMO 2025 and IPhO 2024/2025.