Pith Number

pith:WMJWGC5E

pith:2026:WMJWGC5EG4WOZOKDBKSPTP2CSM
not attested · not anchored · not stored · refs resolved

Understanding and Accelerating the Training of Masked Diffusion Language Models

Chieh-Hsin Lai, Chunsan Hong, Jong Chul Ye, Sanghyun Lee, Satoshi Hayakawa, Seungryong Kim, Yuhta Takida, Yuki Mitsufuji

Bell-shaped time sampling accelerates masked diffusion language models to target performance up to four times faster.

arxiv:2605.13026 v1 · 2026-05-13 · cs.LG · cs.AI · cs.CL

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{WMJWGC5EG4WOZOKDBKSPTP2CSM}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

MDMs (masked diffusion models) trained with our training recipe reach the same validation negative log-likelihood (NLL) up to ∼4× faster than standard training on the One Billion Word Benchmark (LM1B). We also show faster improvements in generative perplexity, zero-shot perplexity, and downstream task performance on various benchmarks.
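One way to read "the same validation NLL up to ∼4× faster" is as a ratio of the training steps each run needs before it first reaches a target NLL. The sketch below assumes that step-based definition (the page does not say whether speed is counted in steps, tokens, or wall-clock time); `steps_to_target` and `speedup_to_target` are hypothetical helpers, not code from the paper.

```python
from typing import Optional, Sequence


def steps_to_target(steps: Sequence[int], nll: Sequence[float], target: float) -> Optional[int]:
    """Return the first logged step whose validation NLL is at or below `target`.

    Returns None if the run never reaches the target.
    """
    for step, value in zip(steps, nll):
        if value <= target:
            return step
    return None


def speedup_to_target(
    base_steps: Sequence[int], base_nll: Sequence[float],
    new_steps: Sequence[int], new_nll: Sequence[float],
    target: float,
) -> Optional[float]:
    """Ratio of baseline steps to new-recipe steps needed to reach `target`.

    Assumption: "faster" means fewer training steps at equal batch size.
    """
    base = steps_to_target(base_steps, base_nll, target)
    new = steps_to_target(new_steps, new_nll, target)
    if base is None or new is None or new == 0:
        return None
    return base / new
```

Under this reading, a value of 4.0 means the new recipe needed a quarter of the baseline's steps to reach the chosen NLL; the result depends on which target is picked, which is why the claim is phrased as "up to".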

C2 · weakest assumption

The locality bias of language is the dominant cause of slow MDM training, and bell-shaped time sampling directly mitigates it without introducing new optimization pathologies or degrading final performance.

C3 · one-line summary

Bell-shaped time sampling accelerates masked diffusion language model training by up to roughly 4× on LM1B by countering the locality bias of language data.
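The page names bell-shaped time sampling but not the exact density. As a minimal sketch, assuming the masking level t is drawn from a symmetric Beta(a, a) distribution (a > 1 gives a bell shape on (0, 1); a = 1 recovers the uniform sampling of standard MDM training), the snippet below contrasts the two samplers and applies the usual forward masking step. `sample_time_bell`, `concentration`, and `MASK_ID` are hypothetical names, not the paper's code.

```python
import random
from typing import List

MASK_ID = -1  # hypothetical id for the [MASK] token


def sample_time_uniform(rng: random.Random) -> float:
    # Standard MDM training draws the masking level t ~ U(0, 1).
    return rng.random()


def sample_time_bell(rng: random.Random, concentration: float = 3.0) -> float:
    # Assumed bell-shaped alternative: t ~ Beta(a, a) with a > 1, which
    # concentrates samples at intermediate masking levels around t = 0.5.
    return rng.betavariate(concentration, concentration)


def mask_tokens(tokens: List[int], t: float, rng: random.Random) -> List[int]:
    # Forward process: each token is independently replaced by MASK_ID
    # with probability t.
    return [MASK_ID if rng.random() < t else tok for tok in tokens]


def middle_fraction(samples: List[float], lo: float = 0.25, hi: float = 0.75) -> float:
    # Share of sampled t values that fall in [lo, hi].
    return sum(lo <= s <= hi for s in samples) / len(samples)


rng = random.Random(0)
uniform_ts = [sample_time_uniform(rng) for _ in range(20000)]
bell_ts = [sample_time_bell(rng) for _ in range(20000)]
print(f"uniform: {middle_fraction(uniform_ts):.2f}, bell: {middle_fraction(bell_ts):.2f}")
```

With a = 3, roughly 0.8 of the sampled t values land in [0.25, 0.75], versus about 0.5 under uniform sampling, so more training steps see moderately masked sequences. The page does not say which bell-shaped family the paper uses or how the loss is weighted under it.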

References

83 extracted · 83 resolved · 9 Pith anchors

[1] Chiu, Zhihan Yang, Zhixuan Qi, Jiaqi Han, Subham Sekhar Sahoo, and Volodymyr Kuleshov 2025
[2] Structured denoising diffusion models in discrete state-spaces 2021
[3] LLaDA2.0: Scaling Up Diffusion Language Models to 100B 2025 · arXiv:2512.15745
[4] PIQA: Reasoning about physical commonsense in natural language 2020
[5] One billion word benchmark for measuring progress in statistical language modeling 2014 · doi:10.21437/interspeech.2014-564
Receipt and verification
First computed 2026-05-18T03:08:59.893399Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

b313630ba4372cecb9430aa4f9bf4293345e7f91f5dd1af0efafc01bcda15d7a

Aliases

arxiv: 2605.13026 · arxiv_version: 2605.13026v1 · doi: 10.48550/arxiv.2605.13026 · pith_short_12: WMJWGC5EG4WO · pith_short_16: WMJWGC5EG4WOZOKD · pith_short_8: WMJWGC5E
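The identifiers on this page are consistent with a simple derivation: the Pith Number matches the unpadded RFC 4648 base32 encoding of the first 16 bytes (128 bits) of the canonical hash, and each pith_short_N alias is the first N characters of it. This was checked against this record's values only; it is an observation, not the documented Pith specification, and the function names below are hypothetical.

```python
import base64


def pith_number_from_hash(canonical_hash_hex: str) -> str:
    # Assumption (observed on this record, not documented): Pith Number =
    # base32 of the first 16 bytes of the SHA-256 canonical hash, '=' padding removed.
    digest = bytes.fromhex(canonical_hash_hex)
    return base64.b32encode(digest[:16]).decode("ascii").rstrip("=")


def pith_short(pith_number: str, n: int) -> str:
    # The pith_short_N aliases are prefixes of the full Pith Number.
    return pith_number[:n]


CANONICAL_HASH = "b313630ba4372cecb9430aa4f9bf4293345e7f91f5dd1af0efafc01bcda15d7a"
number = pith_number_from_hash(CANONICAL_HASH)
print(number)  # WMJWGC5EG4WOZOKDBKSPTP2CSM
print(pith_short(number, 8), pith_short(number, 12), pith_short(number, 16))
# WMJWGC5E WMJWGC5EG4WO WMJWGC5EG4WOZOKD
```

Sixteen bytes encode to 26 base32 characters plus six padding characters, which is why the full Pith Number is 26 characters long.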
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/WMJWGC5EG4WOZOKDBKSPTP2CSM \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: b313630ba4372cecb9430aa4f9bf4293345e7f91f5dd1af0efafc01bcda15d7a
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "4c06a8085fff6c80d14c8d29003b3d7560e6e52b0b270b4fda82145441072e5d",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CL"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-05-13T05:29:38Z",
    "title_canon_sha256": "3d0550aa8e6ce25dee32e15e7b0b08246d96347d234a7cb6a8f121b878032dd5"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.13026",
    "kind": "arxiv",
    "version": 1
  }
}
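The curl pipeline above fetches the record over the network. As an offline sketch, assuming the JSON shown here is exactly the `canonical_record` the API returns, the same canonicalization (sorted keys, compact separators, UTF-8 without ASCII escaping) can be applied directly in Python and compared with the published canonical hash. `canonical_sha256` is a hypothetical helper name.

```python
import hashlib
import json

EXPECTED = "b313630ba4372cecb9430aa4f9bf4293345e7f91f5dd1af0efafc01bcda15d7a"

RECORD = {
    "metadata": {
        "abstract_canon_sha256": "4c06a8085fff6c80d14c8d29003b3d7560e6e52b0b270b4fda82145441072e5d",
        "cross_cats_sorted": ["cs.AI", "cs.CL"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2026-05-13T05:29:38Z",
        "title_canon_sha256": "3d0550aa8e6ce25dee32e15e7b0b08246d96347d234a7cb6a8f121b878032dd5",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.13026", "kind": "arxiv", "version": 1},
}


def canonical_sha256(record: dict) -> str:
    # Same canonical form as the curl | jq | python3 pipeline: keys sorted,
    # no whitespace, non-ASCII characters kept as UTF-8.
    blob = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


digest = canonical_sha256(RECORD)
print(digest)
print("matches published hash:", digest == EXPECTED)
```

Because key order is normalized, re-serializing the record with a different key order gives the same digest; changing any value, even the integer `version` to the string "1", changes it.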