Numinamath: The largest public dataset in ai4maths with 860k pairs of competition math problems and solutions

Jia Li, Edward Beeching, Lewis Tunstall, Ben Lipkin, Roman Soletskyi, Shengyi Huang, Kashif Rasul, Longhui Yu, Albert Q Jiang, Ziju Shen, et al · 2024

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

browse 1 citing papers

representative citing papers

Process Reinforcement through Implicit Rewards

cs.LG · 2025-02-03 · conditional · novelty 6.0

PRIME enables online process reward model updates in LLM RL using implicit rewards from rollouts and outcome labels, yielding 15.1% average gains on reasoning benchmarks and surpassing a stronger instruct model with 10% of the data.

citing papers explorer

Showing 1 of 1 citing paper.

Process Reinforcement through Implicit Rewards cs.LG · 2025-02-03 · conditional · none · ref 26
PRIME enables online process reward model updates in LLM RL using implicit rewards from rollouts and outcome labels, yielding 15.1% average gains on reasoning benchmarks and surpassing a stronger instruct model with 10% of the data.

Numinamath: The largest public dataset in ai4maths with 860k pairs of competition math problems and solutions

fields

years

verdicts

representative citing papers

citing papers explorer