pith:F4JIFFKU
Differences in Text Generated by Diffusion and Autoregressive Language Models
Diffusion language models generate text with higher semantic coherence and diversity than autoregressive models due to bidirectional context in training, while lower entropy stems from their decoding algorithms.
arxiv:2605.12522 v1 · 2026-04-04 · cs.CL · cs.AI
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{F4JIFFKUB327TU2XITLUJGKXLE}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
Results suggest that the DLM training objective contributes to the increases in semantic coherence and semantic diversity, but has a minor influence on entropy. These differences are primarily driven by the bidirectional context; the reduction in entropy stems from DLMs' decoding algorithms, particularly confidence-based remasking strategies.
The analysis assumes that the controlled experiments cleanly decouple training-objective effects from decoding-algorithm effects, with no confounding from implementation choices or data selection.
DLMs exhibit lower n-gram entropy, higher semantic coherence, and higher semantic diversity than ARMs, primarily due to bidirectional context and remasking decoding strategies.
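The n-gram entropy named in the claims can be illustrated with a minimal sketch. This record does not give the paper's exact definition, so the version below is an assumption: Shannon entropy, in bits, of the empirical n-gram distribution over a whitespace-tokenized sample. The function name `ngram_entropy` and the tokenization are hypothetical choices for illustration only.

```python
import math
from collections import Counter


def ngram_entropy(tokens, n=2):
    """Shannon entropy (bits) of the empirical n-gram distribution.

    Assumption: this is a generic definition, not necessarily the
    metric used in the paper.
    """
    # Slide a window of length n over the token sequence.
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    total = len(ngrams)
    # H = -sum_g p(g) * log2 p(g), with p(g) the relative frequency of g.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Repetitive text concentrates mass on few n-grams and scores lower.
repetitive = "the cat sat the cat sat the cat sat".split()
varied = "the cat sat on a warm mat by the door".split()
low = ngram_entropy(repetitive, n=2)
high = ngram_entropy(varied, n=2)
```

Under this definition, text that reuses the same n-grams scores lower than text that spreads its mass over many distinct n-grams. That is the sense in which a decoding strategy favoring high-confidence tokens could lower entropy.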
Receipt and verification
| First computed | 2026-05-18T03:10:02.854950Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
2f128295540ef5f9d35744d7449957592839c7d431bff461bb36a16f9aa02b56
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/F4JIFFKUB327TU2XITLUJGKXLE \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 2f128295540ef5f9d35744d7449957592839c7d431bff461bb36a16f9aa02b56
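The hashing step of the command above can also be written as a standalone Python sketch. It mirrors the inline `python3 -c` one-liner: serialize the record with sorted keys, compact separators, and `ensure_ascii=False`, then take the SHA-256 of the UTF-8 bytes. The function name `canonical_hash` and the sample record are hypothetical. Whether a given record reproduces the published hash depends on the exact JSON the service returns, which this sketch does not check.

```python
import hashlib
import json


def canonical_hash(record):
    """SHA-256 hex digest of a record's canonical JSON form.

    Canonical form (as in the one-liner above): keys sorted, no
    whitespace between tokens, non-ASCII characters kept as-is,
    encoded as UTF-8.
    """
    payload = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode()
    return hashlib.sha256(payload).hexdigest()


# Hypothetical sample: key order in the input does not affect the digest.
sample_a = {"schema_version": "1.0", "source": {"id": "2605.12522", "kind": "arxiv", "version": 1}}
sample_b = {"source": {"version": 1, "kind": "arxiv", "id": "2605.12522"}, "schema_version": "1.0"}
digest = canonical_hash(sample_a)
same = digest == canonical_hash(sample_b)
```

Sorting keys and fixing the separators is what makes the digest independent of how the JSON happens to be formatted or ordered on the wire.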
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "6aa56592c21a371401bacc3f41678d5bfb82eb89e857097159b95715d7521f4b",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by-nc-sa/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2026-04-04T17:30:35Z",
    "title_canon_sha256": "d214cd220beeda91852a4ea93e3a7195d797345c2a044b3aa19e0ff6c26949ac"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.12522",
    "kind": "arxiv",
    "version": 1
  }
}