Pith Number

pith:KL2TUAL7

pith:2026:KL2TUAL7MQORU3LTIQHPSU7GKN
Status: not attested · not anchored · not stored · refs pending

Entropy Aware Reward Guidance for Diffusion Language Model Alignment

Atula Tejaswi, Constantine Caramanis, Litu Rout, Sanjay Shakkottai, Sujay Sanghavi

EntRGi uses predictive entropy to interpolate between continuous relaxations and hard tokens, enabling reward guidance for discrete diffusion language models.

arxiv:2602.05000 v2 · 2026-02-04 · cs.LG · cs.AI · cs.CL

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{KL2TUAL7MQORU3LTIQHPSU7GKN}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

We introduce a novel mechanism called EntRGi (Entropy aware Reward Guidance) to address this issue. EntRGi dynamically interpolates between continuous token relaxations and sampled hard tokens, on a token-by-token basis, using the diffusion model's predictive entropy. We demonstrate that EntRGi maintains both reward model reliability and optimization accuracy, while existing approaches sacrifice one for the other.

C2 weakest assumption

That the entropy threshold and interpolation schedule can be chosen so the method simultaneously preserves reward model reliability and optimization accuracy without introducing new biases or instability in the discrete sampling process.

C3 one line summary

EntRGi uses predictive entropy to dynamically switch between relaxed and hard tokens for reward guidance in discrete diffusion LMs, yielding consistent gains over prior methods in adaptation and RGRL post-training.
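
The per-token interpolation described in claim C1 can be illustrated with a small sketch. This is not the authors' implementation: the function names (`normalized_entropy`, `mix_token`), the use of normalized Shannon entropy as the mixing weight, and the direction of the weighting (higher entropy leans toward the sampled hard token) are all assumptions made for illustration. The paper's actual schedule may differ.

```python
import math
import random

def normalized_entropy(probs):
    """Shannon entropy of one token distribution, scaled to [0, 1].

    Assumes a vocabulary of at least two entries.
    """
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(len(probs))

def mix_token(probs, embeddings, rng):
    """Blend the soft (expected) embedding with a sampled hard-token embedding."""
    dim = len(embeddings[0])
    # Continuous relaxation: probability-weighted average of the embedding rows.
    soft = [sum(p * row[k] for p, row in zip(probs, embeddings)) for k in range(dim)]
    # Hard token: one index sampled from the same distribution.
    hard_id = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    hard = embeddings[hard_id]
    # ASSUMPTION: higher predictive entropy puts more weight on the hard token.
    w = normalized_entropy(probs)
    mixed = [(1.0 - w) * s + w * h for s, h in zip(soft, hard)]
    return mixed, hard_id, w

# Toy vocabulary of four tokens with 2-dimensional embeddings (made-up values).
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
rng = random.Random(0)
confident = mix_token([0.0, 1.0, 0.0, 0.0], toy_embeddings, rng)
uncertain = mix_token([0.25, 0.25, 0.25, 0.25], toy_embeddings, rng)
print(confident[2], uncertain[2])
```

In this toy setup a one-hot prediction gets weight 0, so the relaxed embedding is used unchanged, while a uniform prediction gets weight close to 1, so the input collapses to the sampled token's embedding. A gradient-based guidance loop would also need a differentiable path through the hard token, which this sketch does not attempt.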

Formal links

1 machine-checked theorem link

Receipt and verification
First computed: 2026-05-18T03:09:23.885118Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

52f53a017f641d1a6d73440ef953e65344d4522a3dfb2baf1bd39d4f26451846

Aliases

arxiv: 2602.05000 · arxiv_version: 2602.05000v2 · doi: 10.48550/arxiv.2602.05000 · pith_short_12: KL2TUAL7MQOR · pith_short_16: KL2TUAL7MQORU3LT · pith_short_8: KL2TUAL7
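
For this record, the identifier and its short aliases appear to follow directly from the canonical hash: the 26-character Pith Number matches the first 26 characters of the RFC 4648 base32 encoding of the SHA-256 digest, and each `pith_short_N` alias is the first N characters of that string. The rule is inferred from this one record and is not documented in the source. The helper names below (`pith_number_from_hash`, `short_aliases`) are hypothetical.

```python
import base64

# Canonical hash as listed on this record.
CANONICAL_HASH = "52f53a017f641d1a6d73440ef953e65344d4522a3dfb2baf1bd39d4f26451846"

def pith_number_from_hash(hex_digest, length=26):
    """Base32-encode the digest bytes and keep the leading characters.

    ASSUMPTION: the 26-character truncation is inferred from this record only.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]

def short_aliases(pith_number, lengths=(8, 12, 16)):
    """Short aliases are plain prefixes of the full identifier."""
    return {f"pith_short_{n}": pith_number[:n] for n in lengths}

number = pith_number_from_hash(CANONICAL_HASH)
print(number)  # KL2TUAL7MQORU3LTIQHPSU7GKN
print(short_aliases(number))
```

If the inference holds in general, a client could check that a short alias and a canonical hash belong together without a network call.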
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/KL2TUAL7MQORU3LTIQHPSU7GKN \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 52f53a017f641d1a6d73440ef953e65344d4522a3dfb2baf1bd39d4f26451846
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "db6b7b6772cf9aef1b71b9f426c1ebd3c57562aeb90da44aacb6dc94a5dd5fe2",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CL"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-02-04T19:37:14Z",
    "title_canon_sha256": "5851463b43d9893bc27c49b78d4a6e2bba924a915535379ef76ace1f7ad43e3b"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2602.05000",
    "kind": "arxiv",
    "version": 2
  }
}
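
The canonicalization step in the shell one-liner under "Verify this Pith Number yourself" can be written as a small reusable function. The sketch below mirrors the `json.dumps` arguments used there (sorted keys, compact separators, no ASCII escaping) and hashes the UTF-8 bytes with SHA-256. The function names `canonical_bytes` and `canonical_hash` are hypothetical, and fetching the record from the API is left out.

```python
import hashlib
import json

def canonical_bytes(record):
    """Serialize a record with sorted keys and no insignificant whitespace."""
    text = json.dumps(
        record,
        sort_keys=True,          # key order in the input must not matter
        separators=(",", ":"),   # compact form, no spaces after , or :
        ensure_ascii=False,      # keep non-ASCII characters as UTF-8
    )
    return text.encode("utf-8")

def canonical_hash(record):
    """SHA-256 hex digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()

# Two dicts with the same content but different key order hash identically.
first = {"source": {"kind": "arxiv", "id": "2602.05000", "version": 2}, "schema_version": "1.0"}
second = {"schema_version": "1.0", "source": {"version": 2, "id": "2602.05000", "kind": "arxiv"}}
print(canonical_hash(first) == canonical_hash(second))  # True
```

Sorting keys and fixing the separators is what lets a mirror recompute the same digest regardless of how the JSON was formatted in transit.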