pith:DCUGBOVH
TTRL: Test-Time Reinforcement Learning
TTRL lets LLMs improve reasoning on unlabeled test data by treating majority voting as an RL reward.
arxiv:2504.16084 v3 · 2025-04-22 · cs.CL · cs.LG
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{DCUGBOVH4X7AOLTNYRNNTYVDSM}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
TTRL boosts the pass@1 performance of Qwen-2.5-Math-7B by approximately 211% on AIME 2024 using only unlabeled test data. Furthermore, although TTRL is supervised only by the maj@n metric, it consistently surpasses the upper limit set by the initial model's maj@n.
Common practices in Test-Time Scaling, such as majority voting, yield surprisingly effective rewards suitable for driving RL training on data without explicit labels.
TTRL lets LLMs self-improve on reasoning tasks via RL driven by majority-voting rewards from unlabeled test data, yielding large gains such as a 211% boost in pass@1 on AIME 2024.
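The majority-voting reward described in the claims above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `majority_vote_rewards`, the exact string match between answers, and the binary 1.0/0.0 reward are assumptions made here for clarity.

```python
from collections import Counter


def majority_vote_rewards(answers):
    """Turn N sampled final answers for one unlabeled prompt into rewards.

    Hypothetical helper: the most frequent answer is used as a pseudo-label,
    and each sample gets 1.0 if it matches that label, else 0.0 (assumed
    reward shape, not confirmed by the record above).
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    counts = Counter(answers)
    # most_common(1) returns the highest-count answer; ties go to the
    # answer that was seen first.
    pseudo_label, _ = counts.most_common(1)[0]
    rewards = [1.0 if a == pseudo_label else 0.0 for a in answers]
    return pseudo_label, rewards
```

For example, `majority_vote_rewards(["42", "41", "42", "7"])` picks `"42"` as the pseudo-label and rewards the first and third samples. These per-sample rewards would then feed whatever RL objective is used for training.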
Receipt and verification
| First computed | 2026-05-17T23:39:19.658694Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
18a860baa7e5fe072e6dc45ad9e2a3933b005082d791b4c1d229328c30a609cf
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/DCUGBOVH4X7AOLTNYRNNTYVDSM \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 18a860baa7e5fe072e6dc45ad9e2a3933b005082d791b4c1d229328c30a609cf
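The same check can be sketched as a small Python module. It mirrors the one-liner above: serialize the record with sorted keys and compact separators, encode as UTF-8, and take the SHA-256. The function names `canonical_bytes`, `canonical_sha256`, and `fetch_canonical_record` are hypothetical, and the fetch helper assumes the endpoint returns a JSON object with a `canonical_record` field, as the `jq` filter above implies.

```python
import hashlib
import json
import urllib.request


def canonical_bytes(record):
    """Serialize a record the same way as the shell one-liner above."""
    text = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return text.encode("utf-8")


def canonical_sha256(record):
    """Hex SHA-256 digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


def fetch_canonical_record(url):
    """Fetch the JSON-LD document and return its canonical_record field.

    Assumes the response body is JSON containing that key.
    """
    request = urllib.request.Request(
        url, headers={"Accept": "application/ld+json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["canonical_record"]
```

To verify, pass the URL from the curl command to `fetch_canonical_record`, hash the result with `canonical_sha256`, and compare the digest to the canonical hash listed in this record. Because keys are sorted before hashing, the digest does not depend on the key order in the fetched JSON.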
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "a67ddc1e54252959d60b587e637eac83185a9cf4d0ef9e415b12b836eb2efb19",
    "cross_cats_sorted": [
      "cs.LG"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2025-04-22T17:59:56Z",
    "title_canon_sha256": "d4d589bc9a2a2b582c5f2586c59afb84a0b79a0a71a479fb110208a5754aaf83"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2504.16084",
    "kind": "arxiv",
    "version": 3
  }
}