Pith Number

pith:DCUGBOVH

pith:2025:DCUGBOVH4X7AOLTNYRNNTYVDSM
not attested · not anchored · not stored · refs pending

TTRL: Test-Time Reinforcement Learning

Biqing Qi, Bowen Zhou, Ermo Hua, Ganqu Cui, Haozhan Li, Kaiyan Zhang, Lifan Yuan, Li Sheng, Ning Ding, Shang Qu, Xinwei Long, Xuekai Zhu, Youbang Sun, Yuchen Zhang, Yuxin Zuo, Zhiyuan Ma

TTRL lets LLMs improve reasoning on unlabeled test data by treating majority voting as an RL reward.

arxiv:2504.16084 v3 · 2025-04-22 · cs.CL · cs.LG

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{DCUGBOVH4X7AOLTNYRNNTYVDSM}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle: live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
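The page does not specify the merge algorithm or the event format. As a hedged illustration of why a deterministic merge lets independent mirrors agree, the sketch below replays events over the canonical record in a fixed total order (timestamp, then event id), so the result does not depend on the order in which a mirror received them. The event shape, the `merge_state` name, and the last-writer-wins rule are all hypothetical, and signature checking of each event is omitted.

```python
import copy
from typing import Any

# Hypothetical event shape (NOT Pith's documented schema):
#   {"id": str, "ts": ISO-8601 UTC string in one fixed format, "field": str, "value": Any}
Event = dict[str, Any]


def merge_state(record: dict[str, Any], events: list[Event]) -> dict[str, Any]:
    """Replay events over a copy of the canonical record in a fixed total order.

    Sorting by (timestamp, event id) gives every mirror the same replay order,
    so the same inputs always yield the same state. Signature verification of
    each event, which a real mirror would need, is deliberately left out.
    """
    state = copy.deepcopy(record)  # never mutate the canonical record itself
    for event in sorted(events, key=lambda e: (e["ts"], e["id"])):
        # Hypothetical last-writer-wins rule: later events overwrite a field.
        state[event["field"]] = event["value"]
    return state
```

Sorting on a tuple that ends in a unique id is what makes the order total: two events with the same timestamp still have exactly one permitted order.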

Claims

C1 · strongest claim

TTRL boosts the pass@1 performance of Qwen-2.5-Math-7B by approximately 211% on AIME 2024 using only unlabeled test data. Furthermore, although TTRL is supervised only by the maj@n metric, its performance consistently surpasses the upper limit set by the initial model's maj@n.
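For reference, the two metrics under their usual definitions (an assumption; the notation is ours, not the paper's): Q is the set of test problems, ŷ_{q,i} is the answer extracted from the i-th sample for problem q, y*_q is the ground-truth answer (used only for evaluation), and maj returns the most frequent answer among n samples. Reading "boosts by approximately 211%" as a relative increase over the base model's pass@1, the trained model scores roughly 3.1 times the starting value, not 2.1 times.

```latex
\begin{aligned}
\text{pass@1} &\approx \frac{1}{|Q|}\sum_{q \in Q} \frac{1}{k}\sum_{i=1}^{k} \mathbb{1}\!\left[\hat{y}_{q,i} = y_q^{\star}\right] \\
\text{maj@}n &= \frac{1}{|Q|}\sum_{q \in Q} \mathbb{1}\!\left[\operatorname{maj}\!\left(\hat{y}_{q,1},\dots,\hat{y}_{q,n}\right) = y_q^{\star}\right] \\
\frac{p_{\text{TTRL}} - p_{\text{base}}}{p_{\text{base}}} \approx 2.11 &\;\Longrightarrow\; p_{\text{TTRL}} \approx 3.11\, p_{\text{base}}
\end{aligned}
```

Because TTRL's training signal is the majority answer itself, the initial model's maj@n is the accuracy of the pseudo-labels it learns from; the claim is that the trained model's score ends up above that ceiling.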

C2 · weakest assumption

Common practices in Test-Time Scaling, such as majority voting, yield surprisingly effective rewards suitable for driving RL training on data without explicit labels.
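A minimal sketch of a majority-vote reward as summarized here: sample several answers for one unlabeled question, take the most frequent answer as a pseudo-label, and give each sample a binary reward for agreeing with it. It assumes answers have already been extracted into directly comparable strings and breaks ties by first occurrence (our choice); the paper's answer extraction, tie handling, and the RL algorithm that consumes these rewards may differ. The function names are illustrative.

```python
from collections import Counter


def majority_vote(answers: list[str]) -> str:
    """Return the most frequent answer; ties go to the answer seen first."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    return Counter(answers).most_common(1)[0][0]


def majority_vote_rewards(answers: list[str]) -> list[float]:
    """Binary rewards: 1.0 if a sample agrees with the majority answer, else 0.0.

    No ground-truth label is used; the majority answer acts as the pseudo-label.
    """
    pseudo_label = majority_vote(answers)
    return [1.0 if a == pseudo_label else 0.0 for a in answers]


if __name__ == "__main__":
    sampled = ["42", "42", "17", "42", "9"]  # answers extracted from 5 rollouts
    print(majority_vote_rewards(sampled))  # [1.0, 1.0, 0.0, 1.0, 0.0]
```

These per-sample rewards would then be passed to a policy-gradient RL update in place of rewards computed against a gold label.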

C3 · one-line summary

TTRL lets LLMs self-improve on reasoning tasks via RL driven by majority-voting rewards from unlabeled test data, yielding large gains such as a 211% boost in pass@1 on AIME 2024.

Formal links

2 machine-checked theorem links

Cited by

41 papers in Pith

Receipt and verification
First computed: 2026-05-17T23:39:19.658694Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

18a860baa7e5fe072e6dc45ad9e2a3933b005082d791b4c1d229328c30a609cf

Aliases

arxiv: 2504.16084 · arxiv_version: 2504.16084v3 · doi: 10.48550/arxiv.2504.16084 · pith_short_12: DCUGBOVH4X7A · pith_short_16: DCUGBOVH4X7AOLTN · pith_short_8: DCUGBOVH
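The short aliases are prefixes of the full Pith Number. The full number also appears to be derived from the canonical hash: the first 26 characters of the RFC 4648 base32 encoding of the SHA-256 digest reproduce DCUGBOVH4X7AOLTNYRNNTYVDSM exactly. That rule is inferred from the values on this page rather than taken from a Pith specification, so `pith_number_from_hash` (a hypothetical name) is a sketch of the apparent derivation.

```python
import base64

CANONICAL_HASH = "18a860baa7e5fe072e6dc45ad9e2a3933b005082d791b4c1d229328c30a609cf"


def pith_number_from_hash(canonical_hash_hex: str, length: int = 26) -> str:
    """Inferred (not specified) rule: base32-encode the digest and keep a prefix."""
    digest = bytes.fromhex(canonical_hash_hex)
    # b32encode uses the RFC 4648 alphabet A-Z2-7; padding '=' falls past the prefix.
    return base64.b32encode(digest).decode("ascii")[:length]


number = pith_number_from_hash(CANONICAL_HASH)
print(number)  # DCUGBOVH4X7AOLTNYRNNTYVDSM
# pith_short_8, pith_short_12, pith_short_16 are plain prefixes:
print(number[:8], number[:12], number[:16])  # DCUGBOVH DCUGBOVH4X7A DCUGBOVH4X7AOLTN
```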
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/DCUGBOVH4X7AOLTNYRNNTYVDSM \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 18a860baa7e5fe072e6dc45ad9e2a3933b005082d791b4c1d229328c30a609cf
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "a67ddc1e54252959d60b587e637eac83185a9cf4d0ef9e415b12b836eb2efb19",
    "cross_cats_sorted": [
      "cs.LG"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2025-04-22T17:59:56Z",
    "title_canon_sha256": "d4d589bc9a2a2b582c5f2586c59afb84a0b79a0a71a479fb110208a5754aaf83"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2504.16084",
    "kind": "arxiv",
    "version": 3
  }
}
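The same hash can be computed offline from the record above. This sketch mirrors the serialization in the `curl | jq | python3` check (sorted keys, no insignificant whitespace, UTF-8 without ASCII escaping). The printed value should equal the canonical hash only if the JSON shown here is exactly the record the endpoint serves, so the live endpoint remains the authoritative source. `canonical_sha256` is an illustrative name.

```python
import hashlib
import json
from typing import Any


def canonical_sha256(record: Any) -> str:
    """SHA-256 hex digest of the record in canonical JSON form."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


record = {
    "metadata": {
        "abstract_canon_sha256": "a67ddc1e54252959d60b587e637eac83185a9cf4d0ef9e415b12b836eb2efb19",
        "cross_cats_sorted": ["cs.LG"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2025-04-22T17:59:56Z",
        "title_canon_sha256": "d4d589bc9a2a2b582c5f2586c59afb84a0b79a0a71a479fb110208a5754aaf83",
    },
    "schema_version": "1.0",
    "source": {"id": "2504.16084", "kind": "arxiv", "version": 3},
}

# Compare this output with the canonical hash listed on the record page.
print(canonical_sha256(record))
```

Sorting keys is what makes the digest independent of how a server or mirror happens to order the fields.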