Pith Number

pith:Q4F7RUBK

pith:2026:Q4F7RUBKK2IZHFPIAJTVEUKXGM

not attested · not anchored · not stored · refs resolved

GRACE: Gradient-aligned Reasoning Data Curation for Efficient Post-training

Jianghong Ma, Junjie Li, Ningxuan Ma, Xiaofeng Zhang, Ziao Wang

GRACE scores each reasoning step by its alignment with the answer gradient and by trajectory consistency, then selects data subsets that match or exceed full-data performance with 5-20% of the samples.

arxiv:2605.13130 v1 · 2026-05-13 · cs.AI

Add to your LaTeX paper

\usepackage{pith}
\pithnumber{Q4F7RUBKK2IZHFPIAJTVEUKXGM}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1: strongest claim

When post-training Qwen3-VL-2B-Instruct on MMathCoT-1M, GRACE reaches 108.8% of full-data performance with 20% of the data and retains 100.2% with only 5%; the selected subsets transfer effectively across model backbones.

C2: weakest assumption

That the representation-level gradient proxy accurately captures step-level alignment with the answer-oriented gradient and that the two signals (alignment and consistency) reliably identify valuable reasoning steps without external reward models or step annotations.

C3: one-line summary

GRACE scores reasoning steps via gradient alignment and trajectory consistency to select data subsets that match full performance with 5% of the data on Qwen3-VL-2B-Instruct.

References

37 extracted · 37 resolved · 2 Pith anchors

[1] Chain-of-thought prompting elicits reasoning in large language models 2022

[3] Unlocking multimodal mathematical reasoning via process reward model 2025

[4] LIMA: less is more for alignment 2023

[5] Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc V. Le, Ed H. Chi, Sharan Narang, Aakanksha Chowdhery, and Denny Zhou. Self-consistency improves chain of thought reasoning in language models. In The Elev 2023

[6] Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Mille 2022

Formal links

2 machine-checked theorem links

Receipt and verification

First computed	2026-05-18T03:08:57.800483Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

870bf8d02a56919395e802675251573321a41d43e4aff08d897bd86a8641ff65

Aliases

arxiv: 2605.13130 · arxiv_version: 2605.13130v1 · doi: 10.48550/arxiv.2605.13130 · pith_short_12: Q4F7RUBKK2IZ · pith_short_16: Q4F7RUBKK2IZHFPI · pith_short_8: Q4F7RUBK
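
The identifiers on this page are consistent with a simple derivation: the full Pith Number matches the unpadded Base32 encoding of the first 16 bytes of the canonical hash, and the short aliases are its 8-, 12-, and 16-character prefixes. This is an observation from the values shown here, not a documented specification; the function name `pith_from_hash` and the 16-byte cutoff are assumptions made for illustration.

```python
import base64


def pith_from_hash(hex_digest: str, n_bytes: int = 16) -> str:
    """Base32-encode the first n_bytes of a hex digest, without '=' padding.

    Assumption: this mirrors how the identifier on this page relates to the
    canonical hash; it is inferred from the values, not from a spec.
    """
    raw = bytes.fromhex(hex_digest)[:n_bytes]
    return base64.b32encode(raw).decode("ascii").rstrip("=")


canonical_hash = "870bf8d02a56919395e802675251573321a41d43e4aff08d897bd86a8641ff65"
full = pith_from_hash(canonical_hash)

print(full)       # Q4F7RUBKK2IZHFPIAJTVEUKXGM
print(full[:8])   # Q4F7RUBK          (pith_short_8)
print(full[:12])  # Q4F7RUBKK2IZ      (pith_short_12)
print(full[:16])  # Q4F7RUBKK2IZHFPI  (pith_short_16)
```

If the relationship holds in general, a client could sanity-check an alias against a canonical hash without contacting the resolver.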

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/Q4F7RUBKK2IZHFPIAJTVEUKXGM \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 870bf8d02a56919395e802675251573321a41d43e4aff08d897bd86a8641ff65

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "4227439fb8a5dcf0e7f10a0e15349f543aa9bcf7c65d3f3e5d9a89473a50b7bf",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.AI",
    "submitted_at": "2026-05-13T07:55:39Z",
    "title_canon_sha256": "8d985cead699e72f8e855647819a7df04204444661269631e7f85e2af3ba6cf1"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.13130",
    "kind": "arxiv",
    "version": 1
  }
}
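
For an offline check, the canonical record above can be hashed directly. The sketch below reuses the serialization from the curl command (sorted keys, compact separators, UTF-8 without ASCII escaping); the helper name `canonical_sha256` is hypothetical. Whether the digest equals the canonical hash listed earlier depends on the service using exactly this serialization, so the code prints the value rather than asserting it.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """SHA-256 of a JSON object serialized with sorted keys and no whitespace."""
    payload = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# The canonical record as shown on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "4227439fb8a5dcf0e7f10a0e15349f543aa9bcf7c65d3f3e5d9a89473a50b7bf",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.AI",
        "submitted_at": "2026-05-13T07:55:39Z",
        "title_canon_sha256": "8d985cead699e72f8e855647819a7df04204444661269631e7f85e2af3ba6cf1",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.13130", "kind": "arxiv", "version": 1},
}

digest = canonical_sha256(record)
print(digest)  # compare by eye with the canonical hash listed on this page
```

Because keys are sorted before hashing, the digest does not depend on the order in which the fields arrive, which is what lets a mirror recompute the same value from its own copy of the record.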