pith:EMHCOMD2
Transformation-Augmented GRPO for Enhancing Exploration in Reasoning of Large Language Models
Augmenting GRPO training with automatic rephrasings of each question improves pass rates on competition math and science benchmarks by enabling mixed rewards and diverse reasoning paths.
arxiv:2601.22478 v4 · 2026-01-30 · cs.LG
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{EMHCOMD2XZCIYW3CLHAGC32DRL}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
TA-GRPO consistently improves pass@k on competition-level benchmarks (AMC, OlympiadBench, AIME24, AIME25) and out-of-distribution benchmarks (Minerva, GPQA-Diamond). Notably, it improves the average pass@32 of Qwen3-1.7B and Qwen3-4B by 4.97 and 4.34 points, respectively, and matches the exploration quality of baselines trained on up to 2.5× more data.
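For readers unfamiliar with the metric, pass@k is commonly estimated from n sampled answers per problem, of which c are correct, as 1 - C(n-c, k) / C(n, k). The sketch below uses that standard estimator; the record does not say which estimator the paper uses, so treat this as an assumption. The names `pass_at_k` and `mean_pass_at_k` are illustrative.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n samples of which c are correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples: every size-k subset has a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(counts: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given (n, c) pairs per problem."""
    return sum(pass_at_k(n, c, k) for n, c in counts) / len(counts)


if __name__ == "__main__":
    # Hypothetical counts: 64 samples per problem, varying numbers correct.
    print(mean_pass_at_k([(64, 0), (64, 1), (64, 20)], k=32))
```

A benchmark-level figure such as an average pass@32 would then be the mean of this quantity over the benchmark's problems.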
The automatically generated rephrasings preserve semantic equivalence while meaningfully shifting the model's perceived difficulty, and aligning importance ratios to the original question while computing advantages over the pooled set produces stable and beneficial policy updates without introducing bias or instability.
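To see why pooling can matter, consider a minimal sketch of group-relative advantages, assuming the usual GRPO normalization (reward minus group mean, divided by group standard deviation). The function names, the `eps` term, and the example rewards are all hypothetical, and the sketch covers only the advantage computation, not the importance-ratio alignment.

```python
from statistics import mean, pstdev


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize rewards within one group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def pooled_advantages(groups: list[list[float]], eps: float = 1e-6) -> list[list[float]]:
    """Normalize over the union of all groups, then split back per group."""
    flat = [r for g in groups for r in g]
    adv = group_advantages(flat, eps)
    out, start = [], 0
    for g in groups:
        out.append(adv[start:start + len(g)])
        start += len(g)
    return out


if __name__ == "__main__":
    original = [0.0, 0.0, 0.0, 0.0]    # every rollout on the original question fails
    rephrased = [1.0, 0.0, 1.0, 0.0]   # a rephrasing yields mixed rewards
    print(group_advantages(original))              # all zeros: no learning signal
    print(pooled_advantages([original, rephrased]))
```

With per-question normalization, the all-failing group contributes zero advantage everywhere. Pooled with the rephrasing's rollouts, the same failures receive a negative advantage and the successes a positive one, which is one way to read the "mixed rewards" claim.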
TA-GRPO improves exploration in GRPO by rephrasing questions to mix rewards and reasoning paths, raising pass@32 scores by 4-5 points on math benchmarks while matching models trained on 2.5× more data.
Receipt and verification
| First computed | 2026-05-20T00:03:03.641485Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
230e27307abe448c5b6259c0616f438ace50ec874fdb19ca78e4f48447a49b66
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/EMHCOMD2XZCIYW3CLHAGC32DRL \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 230e27307abe448c5b6259c0616f438ace50ec874fdb19ca78e4f48447a49b66
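The canonicalization step in the one-liner above can also be written as a small standalone function, which may be easier to reuse or test. This sketch mirrors the same `json.dumps` arguments (sorted keys, compact separators, `ensure_ascii=False`) and SHA-256 over the UTF-8 bytes; the function name `canonical_hash` and the sample record are illustrative.

```python
import hashlib
import json


def canonical_hash(record: dict) -> str:
    """SHA-256 hex digest of a record serialized as compact, key-sorted JSON."""
    canonical = json.dumps(
        record,
        sort_keys=True,          # key order in the input does not matter
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters as UTF-8
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Hypothetical record; substitute the parsed `canonical_record` from the API.
    sample = {"source": {"kind": "arxiv", "version": 4}, "schema_version": "1.0"}
    print(canonical_hash(sample))
```

Because keys are sorted and whitespace is fixed before hashing, two copies of a record that differ only in key order or indentation produce the same digest.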
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "629044055f32005f5e10844de48f60721ced72c81ec6a8d238e98998d4435d5a",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-01-30T02:43:29Z",
    "title_canon_sha256": "97d11adf4cf025a2b6cffb03864adc93d7e01fb7365007886fd97144780b2129"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2601.22478",
    "kind": "arxiv",
    "version": 4
  }
}