Measuring Mathematical Problem Solving With the MATH Dataset

Dan Hendrycks, Collin Burns, Saurav Kadavath, Akul Arora, Steven Basart, Eric Tang, Dawn Song, Jacob Steinhardt · 2021

2 Pith papers cite this work. Polarity classification is still being indexed.

representative citing papers

MathArena: Evaluating LLMs on Uncontaminated Math Competitions

cs.AI · 2025-05-29 · unverdicted · novelty 7.0

MathArena evaluates over 50 LLMs on 162 fresh competition problems across seven contests, detects contamination in AIME 2024, and reports top models scoring below 40 percent on IMO 2025 proof tasks.

Unlearning Isn't Deletion: Investigating Reversibility of Machine Unlearning in LLMs

cs.CL · 2025-05-22 · unverdicted · novelty 6.0

Machine unlearning in LLMs is often reversible via fine-tuning, indicating suppression not deletion, and a new representation-level framework identifies four forgetting regimes based on reversibility and catastrophicity.

citing papers explorer

Showing 2 of 2 citing papers.

MathArena: Evaluating LLMs on Uncontaminated Math Competitions

cs.AI · 2025-05-29 · unverdicted · none · ref 17

Unlearning Isn't Deletion: Investigating Reversibility of Machine Unlearning in LLMs

cs.CL · 2025-05-22 · unverdicted · none · ref 9
