Pith Number

pith:7DMISRS7

pith:2025:7DMISRS7T2TMSVT23ZQWMYQBL7

Status: not attested · not anchored · not stored · refs resolved

DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning

Dian Yu, Dong Yu, Haitao Mi, Jiahao Xu, Linfeng Song, Qiuzhi Liu, Rui Wang, Tian Liang, Wenxuan Wang, Xingyu Chen, Yue Wang, Zhaopeng Tu, Zhenwen Liang, Zhiwei He, Zhuosheng Zhang

DeepMath-103K supplies about 103K challenging, decontaminated math problems with verifiable answers; models trained on it with reinforcement learning reach state-of-the-art reasoning performance.

arxiv:2504.11456 v2 · 2025-04-15 · cs.CL · cs.AI


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{7DMISRS7T2TMSVT23ZQWMYQBL7}

This prints a linked badge after your title, injects PDF metadata, and compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim: open

4 Citations: open

5 Replications: open

✓ Portable graph bundle: live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
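The page does not spell out the merge algorithm or the event schema. To illustrate why a deterministic merge lets any mirror recompute the same state, here is a minimal sketch that assumes a hypothetical event shape (`Event` with `event_id`, `timestamp`, `field`, `value`) and a last-writer-wins rule. It is not Pith's actual algorithm, and signature checks on the events are omitted.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


# Hypothetical event shape; Pith's real event schema is not given on this page.
@dataclass(frozen=True)
class Event:
    event_id: str   # assumed to identify exactly one event (e.g. a content hash)
    timestamp: str  # assumed ISO-8601 UTC in one fixed format, so string order == time order
    field: str      # which part of the merged state the event updates
    value: str


def merge(canonical: Dict[str, str], events: Iterable[Event]) -> Dict[str, str]:
    """Fold events into a copy of the canonical record, independent of arrival order."""
    # Drop duplicate deliveries of the same event.
    unique = {e.event_id: e for e in events}
    # Total order: timestamp first, event_id breaks ties, so every mirror sorts identically.
    ordered = sorted(unique.values(), key=lambda e: (e.timestamp, e.event_id))
    state = dict(canonical)  # never mutate the canonical record itself
    for e in ordered:
        state[e.field] = e.value  # last writer wins per field
    return state


# Usage: two mirrors receive the same (hypothetical) events in different orders.
base = {"citations": "open"}
e1 = Event("e1", "2026-05-18T10:00:00Z", "citations", "1 linked")
e2 = Event("e2", "2026-05-18T11:00:00Z", "citations", "2 linked")
e3 = Event("e3", "2026-05-18T10:30:00Z", "replications", "1 linked")
print(merge(base, [e1, e2, e3]) == merge(base, [e3, e2, e1, e1]))  # True
```

The key design choice is the total order on events: because the sort key is fully determined by event contents, arrival order and duplicate deliveries cannot change the result.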

Claims

C1 (strongest claim)

Models trained on DeepMath-103K achieve state-of-the-art results on challenging mathematical benchmarks and generalize beyond mathematics to domains such as biology, physics, and chemistry.

C2 (weakest assumption)

The decontamination process fully removes overlap with numerous benchmarks, and the selected problems remain challenging and verifiable enough to produce genuine gains in reasoning capability.

C3 (one-line summary)

DeepMath-103K is a new 103K-problem mathematical dataset with high difficulty, rigorous decontamination, and verifiable answers to support RL training of language-model reasoning.

References

22 extracted · 22 resolved · 11 Pith anchors

[1] Marthe Ballon, Brecht Verbeken, Vincent Ginis, and Andres Algaba

[2] SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model · 2025 · arXiv:2502.02737

[3] 2023 · doi:10.18653/v1/2023.emnlp-main.468

[4] Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs · arXiv:2412.21187

[5] Training Verifiers to Solve Math Word Problems · arXiv:2110.14168

Formal links

1 machine-checked theorem link

Cited by

26 papers in Pith

AIPO: Learning to Reason from Active Interaction

OpenVLThinker: Complex Vision-Language Reasoning via Iterative SFT-RL Cycles

A Survey of Reinforcement Learning for Large Reasoning Models

SPHINX: A Synthetic Environment for Visual Perception and Reasoning

SCALER: Synthetic Scalable Adaptive Learning Environment for Reasoning

Receipt and verification

First computed	2026-05-17T23:38:48.188069Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

f8d889465f9ea6c9567ade616662015fcf8899325d78cadaaa64e4f713039c33

Aliases

arxiv: 2504.11456 · arxiv_version: 2504.11456v2 · doi: 10.48550/arxiv.2504.11456 · pith_short_12: 7DMISRS7T2TM · pith_short_16: 7DMISRS7T2TMSVT2 · pith_short_8: 7DMISRS7
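A consistency check on the values shown above: for this record, the first 26 characters of the RFC 4648 base32 encoding of the raw canonical-hash bytes equal the Pith Number, and each `pith_short_N` alias is its N-character prefix. The sketch below reproduces that. It is an observation about this record, not a published Pith specification, and the function name `pith_number_from_hash` is hypothetical.

```python
import base64


def pith_number_from_hash(canonical_hash_hex: str, length: int = 26) -> str:
    """Base32-encode the raw SHA-256 digest and keep the first `length` characters.

    Observed to match this record; not taken from a published Pith specification.
    """
    digest = bytes.fromhex(canonical_hash_hex)
    return base64.b32encode(digest).decode("ascii")[:length]


CANONICAL_HASH = "f8d889465f9ea6c9567ade616662015fcf8899325d78cadaaa64e4f713039c33"

full = pith_number_from_hash(CANONICAL_HASH)
# The short aliases are plain prefixes of the full number.
aliases = {f"pith_short_{n}": full[:n] for n in (8, 12, 16)}
print(full)  # 7DMISRS7T2TMSVT23ZQWMYQBL7
print(aliases)
```

Since 26 base32 characters carry 130 bits, the Pith Number covers the first 16 bytes of the digest plus two bits of the seventeenth.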

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

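# Fetch the record as JSON-LD, extract canonical_record, re-serialize it with sorted keys and no whitespace, then SHA-256 it
curl -sH 'Accept: application/ld+json' https://pith.science/pith/7DMISRS7T2TMSVT23ZQWMYQBL7 \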
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: f8d889465f9ea6c9567ade616662015fcf8899325d78cadaaa64e4f713039c33

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "07c276a3630efe36049efdcd5a7b0393561f3daaeb3ad81157d140ec6a7b5b34",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2025-04-15T17:59:51Z",
    "title_canon_sha256": "53e33e5e58a1bb68b4dc91d0f24d06d1a094d9a28713b715d476eeb3c32a2cf6"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2504.11456",
    "kind": "arxiv",
    "version": 2
  }
}
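To check the hash offline, the canonicalization from the verification command above (sorted keys, compact separators, `ensure_ascii=False`, UTF-8) can be reproduced in plain Python on the record as printed. This sketch assumes the printed record is exactly the resolver's `canonical_record`. If it reports a mismatch, fetch the record with the curl command instead of trusting the copy on this page.

```python
import hashlib
import json

# The canonical record as printed above; assumed identical to the resolver's
# `canonical_record` field.
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "07c276a3630efe36049efdcd5a7b0393561f3daaeb3ad81157d140ec6a7b5b34",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2025-04-15T17:59:51Z",
        "title_canon_sha256": "53e33e5e58a1bb68b4dc91d0f24d06d1a094d9a28713b715d476eeb3c32a2cf6",
    },
    "schema_version": "1.0",
    "source": {"id": "2504.11456", "kind": "arxiv", "version": 2},
}

EXPECTED = "f8d889465f9ea6c9567ade616662015fcf8899325d78cadaaa64e4f713039c33"


def canonical_bytes(record: dict) -> bytes:
    """Serialize with the one-liner's settings: sorted keys, no whitespace, raw UTF-8."""
    return json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def canonical_hash(record: dict) -> str:
    """SHA-256 hex digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


digest = canonical_hash(RECORD)
print("match" if digest == EXPECTED else "mismatch: " + digest)
```

Sorting keys and removing whitespace makes the byte string independent of how the JSON happened to be formatted or ordered, which is what allows any mirror to recompute the same hash.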