Measuring mathematical problem solving with the MATH dataset
2 Pith papers cite this work, both published in 2025. Polarity classification is still indexing, so both citing papers are currently unverdicted.
Citing papers explorer
- MathArena: Evaluating LLMs on Uncontaminated Math Competitions
  MathArena evaluates over 50 LLMs on 162 fresh competition problems across seven contests, detects contamination in AIME 2024, and reports that top models score below 40 percent on IMO 2025 proof tasks.
- Unlearning Isn't Deletion: Investigating Reversibility of Machine Unlearning in LLMs
  Machine unlearning in LLMs is often reversible via fine-tuning, indicating suppression rather than deletion; a new representation-level framework identifies four forgetting regimes based on reversibility and catastrophicity.