pith:OQOO75CB
ETS: Energy-Guided Test-Time Scaling for Training-Free RL Alignment
Energy-guided test-time scaling samples directly from the optimal RL policy without any training.
arxiv:2601.21484 v3 · 2026-01-29 · cs.LG
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{OQOO75CB2EXUFQZBNSUDGQQNXB}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
Our algorithm, Energy-Guided Test-Time Scaling (ETS), estimates the key energy term via online Monte Carlo, with a provable convergence rate. Moreover, to ensure practical efficiency, ETS leverages modern acceleration frameworks alongside tailored importance sampling estimators, substantially reducing inference latency while provably preserving sampling quality.
The energy term derived from the reference policy and optimal RL policy can be estimated accurately enough via online Monte Carlo to approximate the target distribution without introducing substantial bias or requiring post-hoc adjustments that affect the claimed convergence.
ETS enables direct sampling from the optimal RL policy for language models at inference time by estimating the energy term with online Monte Carlo and acceleration techniques.
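As an illustration of the Monte Carlo estimation these claims refer to, the sketch below works through a toy discrete example. It assumes the common KL-regularized form in which the target policy is proportional to pi_ref(y) * exp(r(y) / beta); the record itself does not state this form. The names `sample_ref`, `mc_energy_estimate`, `exact_energy`, `resample_from_target`, `REWARDS`, and `BETA` are hypothetical. This is plain Monte Carlo with self-normalized importance resampling, not the ETS algorithm from the paper.

```python
import math
import random

# Toy setup: every value here is an illustrative assumption, not from the paper.
REWARDS = {0: 0.0, 1: 1.0, 2: 2.0}  # reward r(y) for each candidate y
BETA = 1.0                          # temperature of the assumed KL-regularized objective


def sample_ref(rng: random.Random) -> int:
    """Reference policy pi_ref: uniform over the three candidates."""
    return rng.randrange(3)


def mc_energy_estimate(n: int, rng: random.Random) -> float:
    """Monte Carlo estimate of Z = E_{y ~ pi_ref}[exp(r(y) / beta)]."""
    total = 0.0
    for _ in range(n):
        total += math.exp(REWARDS[sample_ref(rng)] / BETA)
    return total / n


def exact_energy() -> float:
    """Closed-form Z for the toy problem, used only to check the estimate."""
    return sum(math.exp(r / BETA) for r in REWARDS.values()) / len(REWARDS)


def resample_from_target(n: int, rng: random.Random) -> int:
    """Draw n candidates from pi_ref, then pick one with probability
    proportional to exp(r(y) / beta) (self-normalized importance resampling)."""
    candidates = [sample_ref(rng) for _ in range(n)]
    weights = [math.exp(REWARDS[y] / BETA) for y in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]
```

In this toy setting the plain estimate tightens at the usual 1/sqrt(n) Monte Carlo rate. The convergence rate and the tailored importance sampling estimators claimed for ETS are results of the paper and are not reproduced here.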
Receipt and verification
| First computed | 2026-05-20T01:05:07.225942Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
741ceff441d12f42c3216ca833420db8509f0e7658d8c4506b84587f5585c154
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/OQOO75CB2EXUFQZBNSUDGQQNXB \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 741ceff441d12f42c3216ca833420db8509f0e7658d8c4506b84587f5585c154
Canonical record JSON
{
"metadata": {
"abstract_canon_sha256": "ec86568629ee249a0e9eb8b9c23ed7b5ad0b7ac0449eed7b9729614d306082f9",
"cross_cats_sorted": [],
"license": "http://creativecommons.org/licenses/by/4.0/",
"primary_cat": "cs.LG",
"submitted_at": "2026-01-29T10:06:52Z",
"title_canon_sha256": "4d007177e9e248bbe1695479fc361a14ae94f9d4e6133f3af6b2bf9dac369cff"
},
"schema_version": "1.0",
"source": {
"id": "2601.21484",
"kind": "arxiv",
"version": 3
}
}
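The verification step can also be run offline against a saved copy of the record. The sketch below mirrors the canonicalization used in the shell command above (sorted keys, compact separators, no ASCII escaping, UTF-8 bytes, SHA-256). The names `canonical_hash` and `RECORD` are hypothetical, and the digest will only match the published canonical hash if the record is copied exactly; that match is not asserted here.

```python
import hashlib
import json


def canonical_hash(record: dict) -> str:
    """SHA-256 hex digest of the record serialized as canonical JSON."""
    payload = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the digest
        separators=(",", ":"),   # compact form, no extra whitespace
        ensure_ascii=False,      # keep non-ASCII characters as UTF-8
    ).encode()                   # str.encode() defaults to UTF-8
    return hashlib.sha256(payload).hexdigest()


# Transcribed from the canonical record JSON shown above.
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "ec86568629ee249a0e9eb8b9c23ed7b5ad0b7ac0449eed7b9729614d306082f9",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2026-01-29T10:06:52Z",
        "title_canon_sha256": "4d007177e9e248bbe1695479fc361a14ae94f9d4e6133f3af6b2bf9dac369cff",
    },
    "schema_version": "1.0",
    "source": {"id": "2601.21484", "kind": "arxiv", "version": 3},
}

if __name__ == "__main__":
    # Compare this value by eye with the canonical hash listed on the record.
    print(canonical_hash(RECORD))
```

Sorting keys and fixing the separators is what makes the digest independent of how the JSON happens to be formatted or ordered on disk.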