Pith Number

pith:DAATHSCB

pith:2025:DAATHSCBPPTSQC6F3DVN7HDE2A
not attested · not anchored · not stored · refs pending

Reasoning with Exploration: An Entropy Perspective

Bo Dai, Daixuan Cheng, Furu Wei, Shaohan Huang, Wayne Xin Zhao, Xuekai Zhu, Zhenliang Zhang

Augmenting the RL advantage function with an entropy term improves LLM reasoning on Pass@K by encouraging longer exploratory chains.

arxiv:2506.14758 v4 · 2025-06-17 · cs.CL

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{DAATHSCBPPTSQC6F3DVN7HDE2A}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1. Bitcoin timestamp
2. Internet Archive
3. Author claim: open
4. Citations: open
5. Replications: open
Portable graph bundle: live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

our method achieves significant gains on the Pass@K metric -- an upper-bound estimator of LLM reasoning capabilities -- even when evaluated with extremely large K values, pushing the boundaries of LLM reasoning.
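For context, Pass@K is commonly estimated from n sampled solutions per problem, of which c are correct, using the unbiased estimator 1 - C(n-c, k) / C(n, k). The sketch below illustrates that standard estimator only; it is not taken from the paper's evaluation code, and the helper name `pass_at_k` is hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased Pass@K estimate from n samples with c correct ones.

    Hypothetical helper for illustration; not the paper's evaluation code.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples: any draw of k contains a correct one.
        return 1.0
    # Probability that a random size-k subset contains no correct sample,
    # subtracted from 1.
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # 2 correct out of 4 samples, evaluated at k = 2.
    print(pass_at_k(n=4, c=2, k=2))
```

Because the estimate can only rise as k grows, large-K results are usually read as an upper bound on what the model can solve with enough attempts, which matches how the claim above frames the metric.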

C2 · weakest assumption

The observed positive correlations between high-entropy regions and beneficial exploratory actions (pivotal tokens, reflection, rare behaviors) will translate into improved downstream reasoning performance when the entropy term is added to the advantage function.

C3 · one line summary

Augmenting the RL advantage with an entropy term promotes deeper LLM reasoning chains and raises Pass@K scores.
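One way to picture "augmenting the advantage with an entropy term" is a per-token bonus added to the advantage. The sketch below is a minimal illustration under stated assumptions: the coefficient `alpha`, the clipping factor `kappa`, and the clipping rule itself are hypothetical choices made here so the bonus cannot flip the sign of the advantage; this record does not confirm the paper's exact formulation.

```python
import math


def token_entropy(logits):
    """Shannon entropy (in nats) of the softmax distribution over logits."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def shaped_advantage(advantage, entropy, alpha=0.4, kappa=2.0):
    """Advantage plus a clipped entropy bonus.

    ASSUMPTION: alpha, kappa, and the clipping rule are illustrative only.
    With kappa > 1 the bonus stays below |advantage|, so the sign of the
    advantage is preserved. In a real training loop the entropy would
    typically be treated as a constant (no gradient flows through it).
    """
    bonus = min(alpha * entropy, abs(advantage) / kappa)
    return advantage + bonus


if __name__ == "__main__":
    h = token_entropy([0.0, 0.0, 0.0, 0.0])  # uniform over 4 tokens
    print(h, shaped_advantage(1.0, h), shaped_advantage(-1.0, h))
```

Under these assumptions, high-entropy tokens on positively rewarded trajectories receive a larger update, while the penalty on negatively rewarded trajectories is softened but never reversed.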

Formal links

2 machine-checked theorem links

Cited by

37 papers in Pith

Receipt and verification
First computed: 2026-05-17T23:38:48.849568Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

180133c8417be7280bc5d8eadf9c64d018a7830496441949e808ed3313acc502

Aliases

arxiv: 2506.14758 · arxiv_version: 2506.14758v4 · doi: 10.48550/arxiv.2506.14758 · pith_short_12: DAATHSCBPPTS · pith_short_16: DAATHSCBPPTSQC6F · pith_short_8: DAATHSCB
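In this record, the 26-character identifier appears to equal the unpadded Base32 encoding of the first 16 bytes of the canonical hash, and the `pith_short_*` aliases are prefixes of it. That relationship is an observation from the values shown on this page, not a documented rule of the schema; the helper name `pith_id_from_hash` is hypothetical.

```python
import base64

# Canonical hash as listed on this page.
CANONICAL_HASH = "180133c8417be7280bc5d8eadf9c64d018a7830496441949e808ed3313acc502"


def pith_id_from_hash(hex_digest: str) -> str:
    """Base32-encode the first 16 bytes of a SHA-256 hex digest, without padding.

    ASSUMPTION: this mapping is inferred from the values in this record only.
    """
    raw = bytes.fromhex(hex_digest)[:16]
    return base64.b32encode(raw).decode("ascii").rstrip("=")


pith_id = pith_id_from_hash(CANONICAL_HASH)

# Short aliases are plain prefixes of the full identifier.
aliases = {f"pith_short_{n}": pith_id[:n] for n in (8, 12, 16)}

if __name__ == "__main__":
    print(pith_id)
    print(aliases)
```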
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/DAATHSCBPPTSQC6F3DVN7HDE2A \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 180133c8417be7280bc5d8eadf9c64d018a7830496441949e808ed3313acc502
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "5bec794faeb56b17b0c7c956ea9d005937306fb3a32dab0f0ba478a155f70bf3",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2025-06-17T17:54:03Z",
    "title_canon_sha256": "034df4168332dcc08d8cea9f107cc98a5b3c3e9ff5e87576f78fdb2d99b5faf0"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2506.14758",
    "kind": "arxiv",
    "version": 4
  }
}
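The same canonicalization used in the shell pipeline above (sorted keys, compact separators, no ASCII escaping, UTF-8 bytes, SHA-256) can be run offline against the record as printed here. This is a sketch that assumes the JSON shown on this page is the complete `canonical_record`; the function name `canonical_sha256` is hypothetical, and the printed digest should be compared by hand with the canonical hash listed above.

```python
import hashlib
import json

# Canonical record copied from this page.
RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "5bec794faeb56b17b0c7c956ea9d005937306fb3a32dab0f0ba478a155f70bf3",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2025-06-17T17:54:03Z",
    "title_canon_sha256": "034df4168332dcc08d8cea9f107cc98a5b3c3e9ff5e87576f78fdb2d99b5faf0"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2506.14758",
    "kind": "arxiv",
    "version": 4
  }
}
"""


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then SHA-256 the UTF-8 bytes."""
    payload = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


record = json.loads(RECORD_JSON)
digest = canonical_sha256(record)

if __name__ == "__main__":
    print(digest)
```

Sorting keys before hashing makes the digest independent of the key order in whatever copy of the record a mirror happens to hold.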