pith:KL2TUAL7
Entropy Aware Reward Guidance for Diffusion Language Model Alignment
EntRGi uses predictive entropy to interpolate between continuous relaxations and hard tokens, enabling reward guidance for discrete diffusion language models.
arxiv:2602.05000 v2 · 2026-02-04 · cs.LG · cs.AI · cs.CL
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{KL2TUAL7MQORU3LTIQHPSU7GKN}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
We introduce a novel mechanism called EntRGi (Entropy aware Reward Guidance) to address this issue. EntRGi dynamically interpolates between continuous token relaxations and sampled hard tokens, on a token-by-token basis, using the diffusion model's predictive entropy. We demonstrate that EntRGi maintains both reward model reliability and optimization accuracy, while existing approaches sacrifice one for the other.
This rests on the premise that the entropy threshold and interpolation schedule can be chosen so that the method preserves reward model reliability and optimization accuracy at the same time, without introducing new biases or instability in the discrete sampling process.
EntRGi uses predictive entropy to dynamically switch between relaxed and hard tokens for reward guidance in discrete diffusion LMs, yielding consistent gains over prior methods in adaptation and RGRL post-training.
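The per-token interpolation described in the claims can be illustrated with a small forward-pass sketch. This is not the paper's implementation: the weighting rule (weight on the hard token equal to the entropy normalized by log vocabulary size) and the names `normalized_entropy` and `interpolate_token` are assumptions made for illustration, and gradient flow through the reward model is left out entirely.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def normalized_entropy(probs):
    """Shannon entropy divided by log(vocab size), so the value lies in [0, 1]."""
    if len(probs) < 2:
        return 0.0  # a one-token vocabulary carries no uncertainty
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(len(probs))


def interpolate_token(logits, embeddings, rng):
    """Mix the relaxed (expected) embedding with a sampled hard-token embedding.

    ASSUMPTION: the mixing weight is the normalized predictive entropy, so
    confident positions stay close to the continuous relaxation and uncertain
    positions move toward the sampled hard token. The actual schedule used by
    EntRGi is not specified on this page.
    """
    probs = softmax(logits)
    dim = len(embeddings[0])
    # Continuous relaxation: probability-weighted average of token embeddings.
    soft = [sum(p * e[d] for p, e in zip(probs, embeddings)) for d in range(dim)]
    # Hard token: embedding of one token sampled from the predictive distribution.
    idx = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    hard = embeddings[idx]
    w = normalized_entropy(probs)
    mixed = [(1.0 - w) * s + w * h for s, h in zip(soft, hard)]
    return mixed, w


# Toy vocabulary of three tokens with 2-dimensional embeddings (made-up values).
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
rng = random.Random(0)
confident_mix, confident_w = interpolate_token([50.0, 0.0, 0.0], toy_embeddings, rng)
uncertain_mix, uncertain_w = interpolate_token([0.0, 0.0, 0.0], toy_embeddings, rng)
```

In this toy setup the confident position gets a weight near 0 and keeps the relaxed embedding, while the uniform position gets a weight near 1 and collapses onto the sampled token. In a guidance loop, the mixed embedding would be what the reward model sees.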
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-18T03:09:23.885118Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
52f53a017f641d1a6d73440ef953e65344d4522a3dfb2baf1bd39d4f26451846
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/KL2TUAL7MQORU3LTIQHPSU7GKN \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 52f53a017f641d1a6d73440ef953e65344d4522a3dfb2baf1bd39d4f26451846
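The hashing step of the command above can also be run offline. The sketch below applies the same serialization (sorted keys, compact separators, `ensure_ascii=False`, UTF-8 bytes, SHA-256) to the canonical record as it is displayed on this page. The function name `canonical_sha256` is hypothetical, and it is an assumption that the displayed record is byte-for-byte what the API returns under `.canonical_record`, so the comparison is printed rather than taken for granted.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then hash the UTF-8 bytes."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode()  # str.encode() defaults to UTF-8
    return hashlib.sha256(payload).hexdigest()


# Canonical record copied from this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "db6b7b6772cf9aef1b71b9f426c1ebd3c57562aeb90da44aacb6dc94a5dd5fe2",
        "cross_cats_sorted": ["cs.AI", "cs.CL"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2026-02-04T19:37:14Z",
        "title_canon_sha256": "5851463b43d9893bc27c49b78d4a6e2bba924a915535379ef76ace1f7ad43e3b",
    },
    "schema_version": "1.0",
    "source": {"id": "2602.05000", "kind": "arxiv", "version": 2},
}

# Canonical hash as published on this page.
PUBLISHED_HASH = "52f53a017f641d1a6d73440ef953e65344d4522a3dfb2baf1bd39d4f26451846"

digest = canonical_sha256(record)
print(digest)
print("matches published hash:", digest == PUBLISHED_HASH)
```

Because keys are sorted before hashing, the order in which the record's fields are written does not affect the digest. List order does matter, which is presumably why the cross-listed categories are stored pre-sorted.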
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "db6b7b6772cf9aef1b71b9f426c1ebd3c57562aeb90da44aacb6dc94a5dd5fe2",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CL"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-02-04T19:37:14Z",
    "title_canon_sha256": "5851463b43d9893bc27c49b78d4a6e2bba924a915535379ef76ace1f7ad43e3b"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2602.05000",
    "kind": "arxiv",
    "version": 2
  }
}