Pith Number

pith:WMJWGC5E

pith:2026:WMJWGC5EG4WOZOKDBKSPTP2CSM

not attested · not anchored · not stored · refs resolved

Understanding and Accelerating the Training of Masked Diffusion Language Models

Chieh-Hsin Lai, Chunsan Hong, Jong Chul Ye, Sanghyun Lee, Satoshi Hayakawa, Seungryong Kim, Yuhta Takida, Yuki Mitsufuji

Bell-shaped time sampling lets masked diffusion language models reach target performance up to four times faster.

arxiv:2605.13026 v1 · 2026-05-13 · cs.LG · cs.AI · cs.CL

Add to your LaTeX paper

\usepackage{pith}
\pithnumber{WMJWGC5EG4WOZOKDBKSPTP2CSM}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
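The property described above (any mirror recomputes the same state from the same bundle) can be illustrated with a minimal sketch. The record does not specify the actual merge algorithm, so everything here is an assumption for illustration: the event shape (`kind`, `payload`), the content-hash ordering, and the names `event_id` and `merge_state` are hypothetical, and signature checking is left out.

```python
import hashlib
import json


def event_id(event: dict) -> str:
    """Hypothetical content-derived id: SHA-256 of the event's sorted-key JSON."""
    blob = json.dumps(event, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(blob).hexdigest()


def merge_state(canonical_record: dict, events: list[dict]) -> dict:
    """Fold events into a state that does not depend on arrival order.

    Assumption: ordering events by their content hash gives every mirror
    the same total order, so the fold below is deterministic.
    """
    unique = {event_id(e): e for e in events}   # exact duplicates collapse
    state = {"record": canonical_record, "fields": {}}
    for eid in sorted(unique):                  # same order on every mirror
        event = unique[eid]
        state["fields"].setdefault(event["kind"], []).append(event["payload"])
    return state
```

Two mirrors that receive the same events in different orders, or with duplicates, would then compute identical states, which is the behavior the bundle description promises.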

Claims

C1 · strongest claim

MDMs trained with our training recipe reach the same validation negative log-likelihood (NLL) up to ∼4× faster than standard training on the One Billion Word Benchmark (LM1B). We also show faster improvements in generative perplexity, zero-shot perplexity, and downstream task performance on various benchmarks.

C2 · weakest assumption

The locality bias of language is the dominant cause of slow MDM training, and bell-shaped time sampling directly mitigates it without introducing new optimization pathologies or degrading final performance.

C3 · one line summary

Bell-shaped time sampling accelerates masked diffusion language model training by up to roughly 4× on LM1B by countering locality bias in language data.
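To make the idea of bell-shaped time sampling concrete, here is a minimal sketch contrasting it with uniform sampling of the diffusion time t in (0, 1). This record does not state which distribution the paper uses, so the symmetric Beta distribution below is only a stand-in for "bell-shaped", and the linear schedule (each token masked independently with probability t) is likewise an assumption. All function names are hypothetical.

```python
import random


def sample_time_uniform(rng: random.Random) -> float:
    """Baseline: draw the diffusion time t uniformly from [0, 1)."""
    return rng.random()


def sample_time_bell(rng: random.Random, concentration: float = 3.0) -> float:
    """Bell-shaped stand-in: symmetric Beta(c, c), peaked at t = 0.5 for c > 1.

    Assumption: the paper's actual sampling distribution may differ.
    """
    return rng.betavariate(concentration, concentration)


def mask_tokens(tokens, t, rng, mask_token="[MASK]"):
    """Mask each token independently with probability t (assumed linear schedule)."""
    return [mask_token if rng.random() < t else tok for tok in tokens]


if __name__ == "__main__":
    rng = random.Random(0)
    t = sample_time_bell(rng)
    print(round(t, 3), mask_tokens("the cat sat on the mat".split(), t, rng))
```

Compared with uniform sampling, the bell-shaped sampler spends more training steps at intermediate masking ratios and fewer at the extremes where almost nothing or almost everything is masked.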

References

83 extracted · 83 resolved · 9 Pith anchors

[1] Chiu, Zhihan Yang, Zhixuan Qi, Jiaqi Han, Subham Sekhar Sahoo, and Volodymyr Kuleshov 2025

[2] Structured denoising diffusion models in discrete state-spaces 2021

[3] LLaDA2.0: Scaling Up Diffusion Language Models to 100B 2025 · arXiv:2512.15745

[4] PIQA: Reasoning about physical commonsense in natural language 2020

[5] One billion word benchmark for measuring progress in statistical language modeling 2014 · doi:10.21437/interspeech.2014-564

Receipt and verification

First computed	2026-05-18T03:08:59.893399Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`)
Schema	pith-number/v1.0

Canonical hash

b313630ba4372cecb9430aa4f9bf4293345e7f91f5dd1af0efafc01bcda15d7a

Aliases

arxiv: 2605.13026 · arxiv_version: 2605.13026v1 · doi: 10.48550/arxiv.2605.13026 · pith_short_12: WMJWGC5EG4WO · pith_short_16: WMJWGC5EG4WOZOKD · pith_short_8: WMJWGC5E
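The short aliases above are prefixes of the full 26-character identifier. For this record, that identifier also coincides with the unpadded Base32 encoding of the first 16 bytes of the canonical hash. The sketch below reproduces that relationship; it is an observation about this one record rather than a documented rule of the schema, and the helper name `pith_from_hash` is hypothetical.

```python
import base64

CANONICAL_HASH = "b313630ba4372cecb9430aa4f9bf4293345e7f91f5dd1af0efafc01bcda15d7a"


def pith_from_hash(hex_digest: str, n_bytes: int = 16) -> str:
    """Base32-encode the first n_bytes of a hex digest and drop '=' padding.

    Assumption: this mirrors how the identifier relates to the hash in this
    record; the schema itself is not quoted here.
    """
    raw = bytes.fromhex(hex_digest)[:n_bytes]
    return base64.b32encode(raw).decode("ascii").rstrip("=")


pith = pith_from_hash(CANONICAL_HASH)
print(pith)                             # WMJWGC5EG4WOZOKDBKSPTP2CSM
print(pith[:8], pith[:12], pith[:16])   # WMJWGC5E WMJWGC5EG4WO WMJWGC5EG4WOZOKD
```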

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/WMJWGC5EG4WOZOKDBKSPTP2CSM \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: b313630ba4372cecb9430aa4f9bf4293345e7f91f5dd1af0efafc01bcda15d7a

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "4c06a8085fff6c80d14c8d29003b3d7560e6e52b0b270b4fda82145441072e5d",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CL"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-05-13T05:29:38Z",
    "title_canon_sha256": "3d0550aa8e6ce25dee32e15e7b0b08246d96347d234a7cb6a8f121b878032dd5"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.13026",
    "kind": "arxiv",
    "version": 1
  }
}
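The canonicalization used in the verification command (sorted keys, compact separators, UTF-8 without ASCII escaping) can also be run offline against the record printed above. This is a sketch under the assumption that the JSON shown is byte-for-byte the canonical record served by the resolver; the function names are hypothetical. If that assumption holds, the printed digest should equal the value under "Canonical hash".

```python
import hashlib
import json

# The canonical record, copied from the JSON shown above.
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "4c06a8085fff6c80d14c8d29003b3d7560e6e52b0b270b4fda82145441072e5d",
        "cross_cats_sorted": ["cs.AI", "cs.CL"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2026-05-13T05:29:38Z",
        "title_canon_sha256": "3d0550aa8e6ce25dee32e15e7b0b08246d96347d234a7cb6a8f121b878032dd5",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.13026", "kind": "arxiv", "version": 1},
}


def canonical_bytes(record: dict) -> bytes:
    """Serialize with the same json.dumps options as the shell one-liner."""
    return json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode()


def canonical_sha256(record: dict) -> str:
    """Hex SHA-256 of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


if __name__ == "__main__":
    print(canonical_sha256(RECORD))
```

Because keys are sorted before hashing, the digest does not depend on the key order in which a mirror happens to store the record.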