Pith Number

pith:DURSVGYQ

pith:2026:DURSVGYQFNESH3UGIEJ3NBYVEX
Status: not attested · not anchored · not stored · refs resolved

Learning-Zone Energy: Online Data Selection for Efficient RL Post-Training

Boyao Yang, Jun Zhu, Peng Cui

A closed-form Learning-Zone Energy score selects prompts aligned with large policy gradient updates for efficient LLM RL post-training.

arxiv:2605.17003 v1 · 2026-05-16 · cs.LG · cs.AI

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{DURSVGYQFNESH3UGIEJ3NBYVEX}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1. Bitcoin timestamp
2. Internet Archive
3. Author claim: open
4. Citations: open
5. Replications: open
Portable graph bundle: live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
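The record does not spell out the merge algorithm or the event schema, so the following is only a minimal sketch of what a deterministic merge can look like. It assumes a hypothetical `Event` type carrying an id, an ISO-8601 UTC timestamp in one fixed-width format, a field name, and a value. Deduplicating by id and applying events in `(timestamp, event_id)` order with last-writer-wins makes the resulting state independent of the order in which a mirror receives the events. Signature checks on each event are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    # Hypothetical event shape; the real Pith bundle schema is not shown here.
    event_id: str   # assumed unique per event (e.g. a hash of the signed payload)
    timestamp: str  # ISO-8601 UTC, same fixed-width format, so text order == time order
    field: str      # which part of the record state this event updates
    value: str


def merge(events: list[Event]) -> dict[str, str]:
    """Fold events into a state dict, independent of input order.

    Duplicate deliveries (same event_id) collapse to one event, events are
    applied in (timestamp, event_id) order, and the last write per field wins.
    """
    unique = {e.event_id: e for e in events}
    state: dict[str, str] = {}
    for e in sorted(unique.values(), key=lambda e: (e.timestamp, e.event_id)):
        state[e.field] = e.value
    return state
```

Any two mirrors holding the same set of events compute the same state here, because `(timestamp, event_id)` is a total order over distinct events.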

Claims

C1 · strongest claim

A closed-form Learning-Zone Energy Score that fuses an initial-difficulty anchor, a normalized outcome-uncertainty term, and a pass-rate momentum into a single scalar that is provably aligned with the expected magnitude of group-relative policy gradient updates.
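The link between outcome uncertainty and update size can be seen in a standard property of group-relative (GRPO-style) advantages with binary rewards; this illustrates the intuition behind the claim and is not the paper's proof or its score. If a group of G rollouts has pass rate p, the population standard deviation of the rewards is σ = √(p(1−p)), each success receives advantage (1−p)/σ and each failure −p/σ, so the mean absolute advantage is [p(1−p) + (1−p)p]/σ = 2√(p(1−p)). It vanishes at p = 0 and p = 1 (every rollout fails or every rollout succeeds) and peaks at p = 0.5. The sketch checks this numerically; the small `eps` in the denominator is a common implementation detail, assumed here.

```python
import math


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style group-relative advantages: (r - mean) / (std + eps).

    Uses the population standard deviation of the group.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]


def mean_abs_advantage(num_correct: int, group_size: int) -> float:
    """Mean |advantage| for a group with binary (0/1) rewards."""
    rewards = [1.0] * num_correct + [0.0] * (group_size - num_correct)
    return sum(abs(a) for a in group_advantages(rewards)) / group_size


G = 8
for k in range(G + 1):
    p = k / G
    closed_form = 2 * math.sqrt(p * (1 - p))
    print(f"p={p:.3f}  mean|A|={mean_abs_advantage(k, G):.3f}  2*sqrt(p(1-p))={closed_form:.3f}")
```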

C2 · weakest assumption

The forward pruner with replay maintains training stability and introduces neither harmful distributional shift nor undetected forgetting, even though it permanently skips rollout generation for persistently solved prompts (as described in the abstract's overview of the framework).
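C2 hinges on how the pruner behaves, which this record does not specify. The sketch below is one hypothetical reading: a prompt solved in every rollout for `patience` consecutive evaluations stops receiving new rollouts, its last rollout group is cached, and a `replay_frac` share of each batch is drawn from that cache. The class name, parameters, and thresholds are illustrative assumptions, not the paper's algorithm.

```python
import random
from collections import defaultdict


class ForwardPruner:
    """Hypothetical forward pruner with replay; names and rules are illustrative.

    - A prompt whose group pass rate is 1.0 for `patience` consecutive
      evaluations is pruned: it never receives new rollouts again.
    - Its most recent rollout group is cached, and a `replay_frac` share
      of each batch is drawn from that cache instead of fresh rollouts.
    """

    def __init__(self, patience: int = 3, replay_frac: float = 0.1, seed: int = 0):
        self.patience = patience
        self.replay_frac = replay_frac
        self.rng = random.Random(seed)
        self.streak: dict[str, int] = defaultdict(int)  # consecutive all-pass evals
        self.replay: dict[str, list] = {}  # pruned prompt id -> cached rollouts

    def observe(self, prompt_id: str, rollouts: list, pass_rate: float) -> None:
        """Record one evaluation of a still-active prompt."""
        self.streak[prompt_id] = self.streak[prompt_id] + 1 if pass_rate >= 1.0 else 0
        if self.streak[prompt_id] >= self.patience:
            self.replay[prompt_id] = rollouts  # prune: cache and stop generating

    def plan_batch(self, pool: list[str], size: int):
        """Split a batch into prompts needing fresh rollouts and replayed groups."""
        active = [p for p in pool if p not in self.replay]
        n_replay = min(len(self.replay), round(self.replay_frac * size))
        n_fresh = min(len(active), size - n_replay)
        fresh = self.rng.sample(active, n_fresh)
        replayed_ids = self.rng.sample(sorted(self.replay), n_replay)
        return fresh, [(pid, self.replay[pid]) for pid in replayed_ids]
```

Because pruned prompts never receive fresh rollouts in this sketch, a later regression on them would go unobserved, which is exactly the kind of undetected forgetting the C2 assumption has to rule out.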

C3 · one-line summary

LZE is an online data selection method for RL post-training that fuses difficulty, uncertainty, and momentum signals into a closed-form score aligned with policy-gradient magnitude. Retaining 40% of the data, it matches or exceeds full-data baselines on math tasks while using 36% fewer FLOPs.
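The exact closed form of the LZE score is not reproduced in this record. As a hypothetical illustration only, the sketch below combines the three ingredients named in C1 (an initial-difficulty anchor, a normalized outcome-uncertainty term, and pass-rate momentum) and keeps the top 40% of prompts, mirroring the retention ratio in C3. The functional form, weights, and function names are assumptions; the uncertainty term gates the product so that prompts with a pass rate of 0 or 1, whose group-relative advantages are all zero, score zero.

```python
import math


def lze_like_score(p0: float, p_now: float, p_prev: float,
                   w_anchor: float = 1.0, w_mom: float = 1.0) -> float:
    """Hypothetical LZE-style score; NOT the paper's closed form.

    p0:     pass rate when the prompt was first evaluated (difficulty anchor)
    p_now:  current pass-rate estimate
    p_prev: pass rate at the previous evaluation (for momentum)
    """
    anchor = 1.0 - p0                                      # initially harder -> larger
    uncertainty = 2.0 * math.sqrt(p_now * (1.0 - p_now))   # 0 at p in {0, 1}, 1 at p = 0.5
    momentum = abs(p_now - p_prev)                         # pass rate still moving
    # Gate by uncertainty: all-pass or all-fail groups have zero group-relative advantage.
    return uncertainty * (1.0 + w_anchor * anchor + w_mom * momentum)


def select_top_fraction(scores: dict[str, float], keep_frac: float = 0.4) -> list[str]:
    """Return the ids of the highest-scoring `keep_frac` of prompts (ties broken by id)."""
    k = max(1, round(keep_frac * len(scores)))
    ranked = sorted(scores, key=lambda pid: (-scores[pid], pid))
    return ranked[:k]
```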

References

50 extracted · 50 resolved · 1 Pith anchor

[1] SemDeDup: Data-efficient learning at web-scale through semantic deduplication 2023 · arXiv:2303.09540
[2] DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning 2025 · doi:10.1038/s41586-025-094
[3] NAACL-LONG.102 2024 · doi:10.18653/v1/2024
[4] In Proceedings of the 26th Annual International Conference on Machine Learning (Montreal, Quebec, Canada) (ICML ’09) 2009 · doi:10.1145/1553374.1553380
[5] Language models are few-shot learners 2020
Receipt and verification
First computed: 2026-05-20T00:03:35.504868Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

1d232a9b102b4923ee864113b6871525e22debc7d25b111a44127be0bfd54e28

Aliases

arxiv: 2605.17003 · arxiv_version: 2605.17003v1 · doi: 10.48550/arxiv.2605.17003 · pith_short_12: DURSVGYQFNES · pith_short_16: DURSVGYQFNESH3UG · pith_short_8: DURSVGYQ
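The record does not document how the Pith Number is derived from the canonical hash. Checking the values shown here, the 26-character Pith Number equals the first 26 characters of the RFC 4648 base32 encoding of the 32-byte canonical hash, and each `pith_short_N` alias is its first N characters. The sketch assumes this relationship holds generally; treat it as an inference from one record, not a specification. The function name is illustrative.

```python
import base64


def pith_number(canonical_hash_hex: str, length: int = 26) -> str:
    """Base32-encode (RFC 4648) the SHA-256 digest and keep the first `length` chars.

    Inferred from the values on this record, not from a published spec.
    """
    digest = bytes.fromhex(canonical_hash_hex)
    return base64.b32encode(digest).decode("ascii")[:length]


h = "1d232a9b102b4923ee864113b6871525e22debc7d25b111a44127be0bfd54e28"
full = pith_number(h)
print(full)  # → DURSVGYQFNESH3UGIEJ3NBYVEX

# The short aliases are plain prefixes of the full number.
aliases = {f"pith_short_{n}": full[:n] for n in (8, 12, 16)}
print(aliases)
```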
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/DURSVGYQFNESH3UGIEJ3NBYVEX \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 1d232a9b102b4923ee864113b6871525e22debc7d25b111a44127be0bfd54e28
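The same check can be done without `jq`, using only the Python standard library. The sketch assumes, as the curl command implies, that the endpoint returns a JSON document with a top-level `canonical_record` member; the function names are illustrative.

```python
import hashlib
import json
import urllib.request


def canonical_sha256(record: dict) -> str:
    """SHA-256 of the record serialized the same way as the shell pipeline above:
    sorted keys, no whitespace, non-ASCII kept as-is, UTF-8 bytes."""
    blob = json.dumps(record, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def fetch_canonical_record(pith: str) -> dict:
    """Fetch the JSON-LD document and return its `canonical_record` member."""
    req = urllib.request.Request(
        f"https://pith.science/pith/{pith}",
        headers={"Accept": "application/ld+json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["canonical_record"]
```

Under that assumption, `canonical_sha256(fetch_canonical_record("DURSVGYQFNESH3UGIEJ3NBYVEX"))` should return the expected hash printed above.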
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "1911ce215d4d6d854cfa5b80d1b0ff7a29068dbc8f9081e7df3daf2d84478c7f",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-05-16T14:01:12Z",
    "title_canon_sha256": "c2e93a4e65f78ce28ae390ea3dd8dff3c9f8d6403b9e3cd714e3addd54e1401b"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.17003",
    "kind": "arxiv",
    "version": 1
  }
}