Pith Number

pith:KL2TUAL7

pith:2026:KL2TUAL7MQORU3LTIQHPSU7GKN

Status: not attested · not anchored · not stored · refs pending

Entropy Aware Reward Guidance for Diffusion Language Model Alignment

Atula Tejaswi, Constantine Caramanis, Litu Rout, Sanjay Shakkottai, Sujay Sanghavi

EntRGi uses predictive entropy to interpolate between continuous relaxations and hard tokens, enabling reward guidance for discrete diffusion language models.

arxiv:2602.05000 v2 · 2026-02-04 · cs.LG · cs.AI · cs.CL


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{KL2TUAL7MQORU3LTIQHPSU7GKN}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1. Bitcoin timestamp
2. Internet Archive
3. Author claim: open
4. Citations: open
5. Replications: open

✓ Portable graph bundle: live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
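To illustrate why a deterministic merge lets any mirror reach the same state, here is a minimal sketch. It is not the Pith merge algorithm, which this page does not specify. The event fields `id`, `ts`, `kind` and `value`, the sample events, and the "latest event per kind wins" rule are all hypothetical. Signature checking is omitted.

```python
import json
import random


def merge(record, events):
    """Fold events into a state in one canonical order.

    ASSUMPTION: the fields `ts`, `id`, `kind`, `value` and the
    last-writer-wins rule are hypothetical, chosen only for illustration.
    """
    state = {"record": record, "facts": {}}
    # Sorting by (timestamp, id) removes any dependence on arrival order.
    for event in sorted(events, key=lambda e: (e["ts"], e["id"])):
        state["facts"][event["kind"]] = event["value"]
    return state


# Made-up events, not taken from any real bundle.
events = [
    {"id": "e1", "ts": 1, "kind": "author_claim", "value": "open"},
    {"id": "e2", "ts": 2, "kind": "citations", "value": "open"},
    {"id": "e3", "ts": 3, "kind": "author_claim", "value": "claimed"},
]

shuffled = events[:]
random.shuffle(shuffled)

# Two mirrors that received the events in different orders agree on the state.
same = json.dumps(merge({}, events), sort_keys=True) == json.dumps(
    merge({}, shuffled), sort_keys=True
)
```

The point of the sketch is only that fixing a total order over events before folding them makes the result reproducible, whatever the real merge rules are.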

Claims

C1: strongest claim

We introduce a novel mechanism called EntRGi (Entropy aware Reward Guidance) to address this issue. EntRGi dynamically interpolates between continuous token relaxations and sampled hard tokens, on a token-by-token basis, using the diffusion model's predictive entropy. We demonstrate that EntRGi maintains both reward model reliability and optimization accuracy, while existing approaches sacrifice one for the other.
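The per-token interpolation described in this claim can be sketched as follows. This is one plausible reading, not the authors' implementation. It assumes that the mixing weight is the predictive entropy normalized by the log of the vocabulary size, and that higher entropy moves the input toward the sampled hard token. The paper's exact rule may differ. The function names are hypothetical, and gradient handling is left out.

```python
import math


def normalized_entropy(probs):
    """Shannon entropy of `probs`, scaled to [0, 1] by log(vocabulary size).

    Requires a vocabulary of at least two tokens.
    """
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return entropy / math.log(len(probs))


def mix_token_input(probs, embeddings, hard_index):
    """Blend the soft (expected) embedding with a sampled hard-token embedding.

    ASSUMPTION: weight = normalized entropy, and a higher weight favors the
    hard token. This direction and form are guesses for illustration.
    """
    weight = normalized_entropy(probs)
    dim = len(embeddings[0])
    # Continuous relaxation: probability-weighted average of token embeddings.
    soft = [sum(p * e[d] for p, e in zip(probs, embeddings)) for d in range(dim)]
    hard = embeddings[hard_index]
    return [(1.0 - weight) * s + weight * h for s, h in zip(soft, hard)]


# Toy vocabulary of three tokens with 2-dimensional embeddings (made up).
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

confident = mix_token_input([1.0, 0.0, 0.0], toy_embeddings, hard_index=0)
uncertain = mix_token_input([1 / 3, 1 / 3, 1 / 3], toy_embeddings, hard_index=2)
```

Under these assumptions a confident position keeps its relaxed embedding, while a maximally uncertain position is fed the embedding of the sampled token.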

C2: weakest assumption

The method assumes that the entropy threshold and interpolation schedule can be chosen so that it simultaneously preserves reward model reliability and optimization accuracy without introducing new biases or instability in the discrete sampling process.

C3: one line summary

EntRGi uses predictive entropy to dynamically switch between relaxed and hard tokens for reward guidance in discrete diffusion LMs, yielding consistent gains over prior methods in adaptation and RGRL post-training.

Formal links

1 machine-checked theorem link

Receipt and verification

First computed	2026-05-18T03:09:23.885118Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`)
Schema	pith-number/v1.0

Canonical hash

52f53a017f641d1a6d73440ef953e65344d4522a3dfb2baf1bd39d4f26451846

Aliases

arxiv: 2602.05000 · arxiv_version: 2602.05000v2 · doi: 10.48550/arxiv.2602.05000 · pith_short_12: KL2TUAL7MQOR · pith_short_16: KL2TUAL7MQORU3LT · pith_short_8: KL2TUAL7
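The identifiers on this page are consistent with a simple relationship. The 26-character Pith Number matches the first 26 characters of the RFC 4648 Base32 encoding of the canonical hash bytes, and the short aliases match its 8-, 12- and 16-character prefixes. This is an observation from the values shown here, not a documented rule. The function name below is hypothetical.

```python
import base64

# Canonical hash as shown on this page.
CANONICAL_HASH = "52f53a017f641d1a6d73440ef953e65344d4522a3dfb2baf1bd39d4f26451846"


def pith_from_hash(hex_digest, length=26):
    """Base32-encode the digest bytes and keep the first `length` characters.

    ASSUMPTION: inferred from the identifiers on this page, not from a spec.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


full = pith_from_hash(CANONICAL_HASH)
short_aliases = {n: full[:n] for n in (8, 12, 16)}

print(full)           # KL2TUAL7MQORU3LTIQHPSU7GKN
print(short_aliases)  # {8: 'KL2TUAL7', 12: 'KL2TUAL7MQOR', 16: 'KL2TUAL7MQORU3LT'}
```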

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/KL2TUAL7MQORU3LTIQHPSU7GKN \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 52f53a017f641d1a6d73440ef953e65344d4522a3dfb2baf1bd39d4f26451846
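The canonicalization step of the command above can also be run offline. The sketch below repeats the same `json.dumps` settings (sorted keys, compact separators, no ASCII escaping, UTF-8 encoding) in a reusable function. The name `canonical_sha256` and the toy inputs are hypothetical. To check a real record, pass the parsed `canonical_record` object and compare the result with the canonical hash.

```python
import hashlib
import json


def canonical_sha256(record):
    """Hash a JSON-compatible object after canonical serialization."""
    payload = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters as UTF-8
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Toy objects (not the real record): same content, different key order.
first = {"b": 1, "a": {"d": 2, "c": 3}}
second = {"a": {"c": 3, "d": 2}, "b": 1}

digest_first = canonical_sha256(first)
digest_second = canonical_sha256(second)
```

Because `sort_keys=True` applies to nested objects as well, both toy inputs serialize to the same bytes and therefore produce the same digest.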

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "db6b7b6772cf9aef1b71b9f426c1ebd3c57562aeb90da44aacb6dc94a5dd5fe2",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CL"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-02-04T19:37:14Z",
    "title_canon_sha256": "5851463b43d9893bc27c49b78d4a6e2bba924a915535379ef76ace1f7ad43e3b"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2602.05000",
    "kind": "arxiv",
    "version": 2
  }
}