AceMath: Advancing frontier math reasoning with post-training and reward modeling
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CL (1)
Years: 2025 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Challenging the Boundaries of Reasoning: An Olympiad-Level Math Benchmark for Large Language Models
OlymMATH is a 350-problem Olympiad math benchmark combining bilingual natural-language evaluation with Lean 4 formal verification to test LLM reasoning.