Pith Number

pith:EMHCOMD2

pith:2026:EMHCOMD2XZCIYW3CLHAGC32DRL

Status: not attested · not anchored · not stored · refs pending

Transformation-Augmented GRPO for Enhancing Exploration in Reasoning of Large Language Models

Chi-Heng Lin, Khiem Le, Nitesh V. Chawla, Phuc Nguyen, Shangqian Gao, Ting Hua, Youssef Mroueh

Augmenting GRPO training with automatic rephrasings of each question improves pass rates on competition math and science benchmarks by enabling mixed rewards and diverse reasoning paths.

arxiv:2601.22478 v4 · 2026-01-30 · cs.LG

Add to your LaTeX paper

\usepackage{pith}
\pithnumber{EMHCOMD2XZCIYW3CLHAGC32DRL}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1. Bitcoin timestamp
2. Internet Archive
3. Author claim (open)
4. Citations (open)
5. Replications (open)

✓ Portable graph bundle: live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1: strongest claim

TA-GRPO consistently improves pass@k on competition-level benchmarks (AMC, OlympiadBench, AIME24, AIME25) and out-of-distribution benchmarks (Minerva, GPQA-Diamond). Notably, it improves the average pass@32 of Qwen3-1.7B and Qwen3-4B by 4.97 and 4.34 points, respectively, and matches the exploration quality of baselines trained on up to 2.5× more data.
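For readers unfamiliar with the metric: pass@k is commonly estimated from n sampled solutions per problem, of which c are correct, as 1 - C(n-c, k) / C(n, k). The sketch below implements that standard estimator. Whether the paper uses exactly this estimator is an assumption, and the sample counts in the usage line are hypothetical, not taken from this record.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples, c of which are correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k samples are incorrect,
    # so the estimate is exactly 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical counts for a single problem: 64 samples, 3 correct.
estimate = pass_at_k(n=64, c=3, k=32)
```

Averaging the per-problem estimates gives a benchmark-level score such as the average pass@32 quoted in the claim.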

C2: weakest assumption

The method assumes that the automatically generated rephrasings preserve semantic equivalence while meaningfully shifting the difficulty the model perceives. It also assumes that aligning importance ratios to the original question, while computing advantages over the pooled set, yields stable and beneficial policy updates without introducing bias or instability.
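To make "advantages over the pooled set" concrete, here is a minimal sketch that assumes the usual GRPO normalization (reward minus group mean, divided by group standard deviation). The paper's exact formula is not given in this record, so the function names, the `eps` term, and the reward values are illustrative assumptions. Importance-ratio alignment is not modeled.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: rewards standardized within a single group."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def pooled_advantages(groups, eps=1e-6):
    """Standardize over the union of all groups, then split back per group."""
    pooled = [r for g in groups for r in g]
    mu, sigma = mean(pooled), pstdev(pooled)
    return [[(r - mu) / (sigma + eps) for r in g] for g in groups]


# Hypothetical binary rewards: every rollout on the original question fails,
# while a rephrasing of the same question succeeds on half of its rollouts.
original = [0.0, 0.0, 0.0, 0.0]
rephrased = [1.0, 0.0, 1.0, 0.0]

per_group = group_advantages(original)             # all zeros: no signal
pooled = pooled_advantages([original, rephrased])  # nonzero in both groups
```

With uniform rewards, the per-group advantages vanish and the original question contributes no gradient. Pooling it with a rephrasing that has mixed rewards restores a nonzero signal, which is the mechanism the abstract summary calls "enabling mixed rewards".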

C3: one-line summary

TA-GRPO improves exploration in GRPO by rephrasing questions to obtain mixed rewards and more diverse reasoning paths, raising average pass@32 by 4.97 points (Qwen3-1.7B) and 4.34 points (Qwen3-4B) while matching baselines trained on up to 2.5× more data.

Receipt and verification

First computed: 2026-05-20T00:03:03.641485Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (`pith-v1-2026-05`)
Schema: pith-number/v1.0

Canonical hash

230e27307abe448c5b6259c0616f438ace50ec874fdb19ca78e4f48447a49b66

Aliases

arxiv: 2601.22478 · arxiv_version: 2601.22478v4 · doi: 10.48550/arxiv.2601.22478 · pith_short_12: EMHCOMD2XZCI · pith_short_16: EMHCOMD2XZCIYW3C · pith_short_8: EMHCOMD2
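The record does not say how the identifier is derived. On this page, though, the 26-character identifier coincides with the first 26 characters of the RFC 4648 base32 encoding of the canonical hash, and each `pith_short_*` alias is a prefix of it. The sketch below reproduces that observation. The derivation rule is inferred, not documented here, and `pith_id_from_hash` is a hypothetical name.

```python
import base64

# Canonical hash as printed on this page.
CANONICAL_HASH = "230e27307abe448c5b6259c0616f438ace50ec874fdb19ca78e4f48447a49b66"


def pith_id_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the digest bytes and keep the leading characters."""
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


full_id = pith_id_from_hash(CANONICAL_HASH)
# Short aliases are plain prefixes of the full identifier.
aliases = {f"pith_short_{n}": full_id[:n] for n in (8, 12, 16)}
```

Because the short forms are prefixes, a resolver can presumably treat them as abbreviations of the same record, with longer forms available when a short one collides.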

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/EMHCOMD2XZCIYW3CLHAGC32DRL \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 230e27307abe448c5b6259c0616f438ace50ec874fdb19ca78e4f48447a49b66

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "629044055f32005f5e10844de48f60721ced72c81ec6a8d238e98998d4435d5a",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-01-30T02:43:29Z",
    "title_canon_sha256": "97d11adf4cf025a2b6cffb03864adc93d7e01fb7365007886fd97144780b2129"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2601.22478",
    "kind": "arxiv",
    "version": 4
  }
}
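The same check can be run offline against the record printed above, without `curl` or `jq`. This sketch mirrors the serialization used in the shell one-liner (sorted keys, compact separators, UTF-8, no ASCII escaping). The helper name `canonical_hash` is hypothetical, and the digest is not asserted here to equal the published canonical hash.

```python
import hashlib
import json


def canonical_hash(record: dict) -> str:
    """SHA-256 over compact, key-sorted UTF-8 JSON."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# The canonical record exactly as shown on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "629044055f32005f5e10844de48f60721ced72c81ec6a8d238e98998d4435d5a",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2026-01-30T02:43:29Z",
        "title_canon_sha256": "97d11adf4cf025a2b6cffb03864adc93d7e01fb7365007886fd97144780b2129",
    },
    "schema_version": "1.0",
    "source": {"id": "2601.22478", "kind": "arxiv", "version": 4},
}

digest = canonical_hash(record)
print(digest)  # compare by eye with the canonical hash listed on this page
```

Sorting keys makes the digest independent of the order in which a mirror happens to store the fields, which is what lets any host recompute the same value.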