pith:CE5ZU6SW
Boosting LLM Reasoning via Human-Inspired Reward Shaping
T2T (thickening-to-thinning) dual-phase rewards improve LLM math reasoning by shifting from broad exploration to concise condensation.
arxiv:2602.04265 v3 · 2026-02-04 · cs.LG · cs.AI
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{CE5ZU6SWGIGY2PET4BES5OKHEF}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
T2T significantly outperforms standard GRPO and recent baselines on mathematical benchmarks (MATH-500, AIME, AMC) across 5 mainstream LLMs.
The approach assumes that the dual-phase thickening-to-thinning reward mechanism, motivated by human learning patterns, reliably translates into improved LLM reasoning without introducing training instabilities or overfitting to the specific benchmarks.
T2T reward framework boosts LLM math reasoning performance by shifting from exploration incentives on errors to length penalties on successes across multiple models and benchmarks.
Receipt and verification
| First computed | 2026-05-17T23:39:16.376509Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
113b9a7a56320d8d3c93e0492eb947215bbb33dc2317939fdece3e5c3e4780be
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/CE5ZU6SWGIGY2PET4BES5OKHEF \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 113b9a7a56320d8d3c93e0492eb947215bbb33dc2317939fdece3e5c3e4780be
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "ea21a5de64b75dabca5c7e6a2f4f5b3acb3e56537ed3ceb5d164e8b737812ba7",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-02-04T06:55:58Z",
    "title_canon_sha256": "c48d32d0cf21a054412cc2fd9d973c7771bf25b2909c0a018c5029eb86fbb10b"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2602.04265",
    "kind": "arxiv",
    "version": 3
  }
}
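The verification one-liner can also be run offline against the record shown on this page. The following is a minimal sketch, assuming the canonical form is exactly what that one-liner produces: keys sorted, compact separators, non-ASCII characters left unescaped, UTF-8 encoded. The names `canonical_sha256`, `RECORD`, and `EXPECTED` are illustrative and not part of any Pith API. Whether the digest matches depends on the record here being identical to the one the server returns.

```python
import hashlib
import json

# Record copied from the "Canonical record JSON" section of this page.
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "ea21a5de64b75dabca5c7e6a2f4f5b3acb3e56537ed3ceb5d164e8b737812ba7",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2026-02-04T06:55:58Z",
        "title_canon_sha256": "c48d32d0cf21a054412cc2fd9d973c7771bf25b2909c0a018c5029eb86fbb10b",
    },
    "schema_version": "1.0",
    "source": {"id": "2602.04265", "kind": "arxiv", "version": 3},
}

# Hash listed under "Canonical hash" on this page.
EXPECTED = "113b9a7a56320d8d3c93e0492eb947215bbb33dc2317939fdece3e5c3e4780be"


def canonical_sha256(record):
    """Hash a record using the same serialization as the one-liner above.

    Assumption: sorted keys, compact separators, ensure_ascii=False, UTF-8.
    """
    canonical = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    digest = canonical_sha256(RECORD)
    print(digest)
    print("match" if digest == EXPECTED else "mismatch")
```

Because the keys are sorted before hashing, the digest does not depend on the order in which the JSON fields arrive, which is what makes the check reproducible across clients.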