Pith Number

pith:OQOO75CB

pith:2026:OQOO75CB2EXUFQZBNSUDGQQNXB
not attested · not anchored · not stored · refs pending

ETS: Energy-Guided Test-Time Scaling for Training-Free RL Alignment

Jinkai Zhang, Ju Fan, Longqiang Wang, Mingyang Yi, Xiuyu Li, Yue Wang, Yu Li

Energy-guided test-time scaling samples directly from the optimal RL policy without any training.

arxiv:2601.21484 v3 · 2026-01-29 · cs.LG

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{OQOO75CB2EXUFQZBNSUDGQQNXB}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1. Bitcoin timestamp
2. Internet Archive
3. Author claim: open
4. Citations: open
5. Replications: open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

Our algorithm, Energy-Guided Test-Time Scaling (ETS), estimates the key energy term via online Monte Carlo, with a provable convergence rate. Moreover, to ensure practical efficiency, ETS leverages modern acceleration frameworks alongside tailored importance sampling estimators, substantially reducing inference latency while provably preserving sampling quality.

C2 · weakest assumption

The energy term derived from the reference policy and optimal RL policy can be estimated accurately enough via online Monte Carlo to approximate the target distribution without introducing substantial bias or requiring post-hoc adjustments that affect the claimed convergence.

C3 · one-line summary

ETS enables direct sampling from the optimal RL policy for language models at inference time by estimating the energy term with online Monte Carlo and acceleration techniques.
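
The claims above describe estimating an energy term by online Monte Carlo, but this page does not give the formula. As a minimal sketch, assume the standard KL-regularized RL objective, under which the optimal policy is proportional to pi_ref(y) * exp(r(y) / beta); the quantity to estimate is then the expectation of exp(r(y) / beta) under the reference policy. The toy distribution, the rewards, and every function name below are hypothetical and are not taken from the ETS paper. The code shows plain Monte Carlo only, not the paper's importance sampling estimators or acceleration techniques.

```python
import math
import random

# Hypothetical toy reference policy over three candidate outputs.
REF_PROBS = {"a": 0.5, "b": 0.3, "c": 0.2}
# Hypothetical rewards for each output.
REWARDS = {"a": 0.0, "b": 1.0, "c": 2.0}


def sample_ref(rng):
    """Draw one output from the toy reference policy."""
    outputs = list(REF_PROBS)
    weights = [REF_PROBS[y] for y in outputs]
    return rng.choices(outputs, weights=weights, k=1)[0]


def mc_energy_estimate(beta, n_samples, rng):
    """Monte Carlo estimate of E_{y ~ pi_ref}[exp(r(y) / beta)]."""
    total = 0.0
    for _ in range(n_samples):
        y = sample_ref(rng)
        total += math.exp(REWARDS[y] / beta)
    return total / n_samples


def exact_energy(beta):
    """Closed-form value of the same expectation, available only in a toy setting."""
    return sum(p * math.exp(REWARDS[y] / beta) for y, p in REF_PROBS.items())


rng = random.Random(0)
estimate = mc_energy_estimate(beta=1.0, n_samples=20000, rng=rng)
exact = exact_energy(beta=1.0)
print("estimate:", estimate)
print("exact:   ", exact)
```

The estimator's error shrinks roughly as one over the square root of the number of samples, which is the generic Monte Carlo rate; the convergence rate proved for ETS may differ.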

Formal links

2 machine-checked theorem links

Cited by

1 paper in Pith

Receipt and verification
First computed: 2026-05-20T01:05:07.225942Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

741ceff441d12f42c3216ca833420db8509f0e7658d8c4506b84587f5585c154

Aliases

arxiv: 2601.21484 · arxiv_version: 2601.21484v3 · doi: 10.48550/arxiv.2601.21484 · pith_short_12: OQOO75CB2EXU · pith_short_16: OQOO75CB2EXUFQZB · pith_short_8: OQOO75CB
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/OQOO75CB2EXUFQZBNSUDGQQNXB \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 741ceff441d12f42c3216ca833420db8509f0e7658d8c4506b84587f5585c154
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "ec86568629ee249a0e9eb8b9c23ed7b5ad0b7ac0449eed7b9729614d306082f9",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-01-29T10:06:52Z",
    "title_canon_sha256": "4d007177e9e248bbe1695479fc361a14ae94f9d4e6133f3af6b2bf9dac369cff"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2601.21484",
    "kind": "arxiv",
    "version": 3
  }
}
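
The shell one-liner under "Verify this Pith Number yourself" can be unpacked into a small Python script, which makes the canonicalization step easier to read. The sketch below mirrors that one-liner: it serializes the record with sorted keys, compact separators, and non-ASCII characters left unescaped, then hashes the UTF-8 bytes with SHA-256. The function name `canonical_sha256` is a hypothetical helper, and the record is copied by hand from the JSON shown on this page rather than fetched from the API.

```python
import hashlib
import json


def canonical_sha256(record):
    """Hash a record the same way the page's verification one-liner does."""
    canonical = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters as-is
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Record copied from the "Canonical record JSON" block on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "ec86568629ee249a0e9eb8b9c23ed7b5ad0b7ac0449eed7b9729614d306082f9",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2026-01-29T10:06:52Z",
        "title_canon_sha256": "4d007177e9e248bbe1695479fc361a14ae94f9d4e6133f3af6b2bf9dac369cff",
    },
    "schema_version": "1.0",
    "source": {"id": "2601.21484", "kind": "arxiv", "version": 3},
}

digest = canonical_sha256(record)
print(digest)
```

Compare the printed digest with the canonical hash listed above. Because the keys are sorted before hashing, reordering the keys in `record` does not change the result, while changing any value does.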