pith:DURSVGYQ
Learning-Zone Energy: Online Data Selection for Efficient RL Post-Training
A closed-form Learning-Zone Energy score selects prompts aligned with large policy gradient updates for efficient LLM RL post-training.
arxiv:2605.17003 v1 · 2026-05-16 · cs.LG · cs.AI
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{DURSVGYQFNESH3UGIEJ3NBYVEX}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
The Learning-Zone Energy Score is a closed-form scalar that fuses an initial-difficulty anchor, a normalized outcome-uncertainty term, and a pass-rate momentum term, and is provably aligned with the expected magnitude of group-relative policy gradient updates.
The forward pruner with replay maintains training stability and introduces neither harmful distributional shift nor undetected forgetting, even though it permanently skips rollout generation for persistently solved prompts (as described in the framework portion of the abstract).
LZE is an online data selection method for RL post-training that fuses difficulty, uncertainty, and momentum signals into a closed-form score aligned with policy gradient magnitude. It retains 40% of the data and uses 36% fewer FLOPs while matching or exceeding full-data baselines on math tasks.
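The claims above tie prompt selection to the size of group-relative policy gradient updates. The sketch below shows why that link is plausible. It is not the paper's formula, which this record does not state. It assumes binary rewards and the mean/std group normalization common to GRPO-style methods; `group_relative_advantages` and `outcome_uncertainty` are hypothetical names, and the `4p(1-p)` rescaling is an illustrative stand-in for a "normalized outcome-uncertainty term".

```python
import math


def group_relative_advantages(rewards, eps=1e-8):
    """Mean-centre and std-normalise rewards within one prompt's rollout group.

    Assumption: GRPO-style normalisation; not taken from this record.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]


def outcome_uncertainty(pass_rate):
    """Bernoulli variance p(1-p) rescaled to [0, 1] (hypothetical stand-in)."""
    return 4.0 * pass_rate * (1.0 - pass_rate)


# A prompt that every rollout solves yields all-zero advantages,
# so it contributes no gradient signal under this normalisation.
solved = group_relative_advantages([1, 1, 1, 1])

# A prompt solved half the time yields advantages of magnitude close to 1.
mixed = group_relative_advantages([1, 0, 1, 0])

print(solved)
print(mixed)
print(outcome_uncertainty(0.5), outcome_uncertainty(1.0))
```

Under these assumptions a persistently solved prompt produces no update, which is consistent with the pruner skipping rollout generation for such prompts.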
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-20T00:03:35.504868Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
1d232a9b102b4923ee864113b6871525e22debc7d25b111a44127be0bfd54e28
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/DURSVGYQFNESH3UGIEJ3NBYVEX \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 1d232a9b102b4923ee864113b6871525e22debc7d25b111a44127be0bfd54e28
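The hashing step of the one-liner above can also be written as a standalone Python function, which makes the canonicalization rules easier to read: sorted keys, compact separators, no ASCII escaping, UTF-8 bytes, then SHA-256. The helper names `canonical_bytes` and `canonical_sha256` are hypothetical, and the sketch assumes the record has already been fetched and parsed into a dict.

```python
import hashlib
import json


def canonical_bytes(record):
    """Serialise a parsed JSON record the same way as the one-liner:
    sorted keys, no whitespace between tokens, non-ASCII kept verbatim."""
    text = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return text.encode("utf-8")


def canonical_sha256(record):
    """Hex SHA-256 digest of the canonical serialisation."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


# Key order in the input does not affect the digest.
digest_a = canonical_sha256({"b": 1, "a": 2})
digest_b = canonical_sha256({"a": 2, "b": 1})
print(digest_a == digest_b)
```

To check a real record, pass the parsed `canonical_record` object to `canonical_sha256` and compare the result with the canonical hash listed on this page.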
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "1911ce215d4d6d854cfa5b80d1b0ff7a29068dbc8f9081e7df3daf2d84478c7f",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-05-16T14:01:12Z",
    "title_canon_sha256": "c2e93a4e65f78ce28ae390ea3dd8dff3c9f8d6403b9e3cd714e3addd54e1401b"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.17003",
    "kind": "arxiv",
    "version": 1
  }
}