Pith Number

pith:GV7LXWOK

pith:2025:GV7LXWOKGBGVM7HK5NHM7HGVSB
not attested · not anchored · not stored · refs resolved

d-TreeRPO: Towards More Reliable Policy Optimization for Diffusion Language Models

Aiwei Liu, Bolin Ding, Leyi Pan, Liancheng Fang, Lijie Wen, Lingzhe Zhang, Minghua He, Shuchang Tao, Yunpeng Zhai, Zhaoyang Liu, Zheyu Fu

Tree-structured rollouts with verifiable rewards and scheduled self-distillation deliver reliable step-wise advantages for diffusion language models.

arxiv:2512.09675 v3 · 2025-12-10 · cs.CL

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{GV7LXWOKGBGVM7HK5NHM7HGVSB}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle: live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

Experiments demonstrate that d-TreeRPO outperforms existing baselines and achieves significant improvements across multiple reasoning benchmarks. Specifically, it achieves +86.2% on Sudoku, +51.6% on Countdown, +4.5% on GSM8K, and +5.3% on Math500 compared to the base model.

C2 · weakest assumption

The assumption that tree-structured rollouts based on verifiable outcome rewards can be computed efficiently while still providing unbiased fine-grained step-wise advantage estimates that generalize beyond the sampled trees.

C3 · one-line summary

d-TreeRPO uses tree rollouts for fine-grained verifiable rewards and time-scheduled self-distillation to reduce probability estimation gaps in diffusion LLMs, delivering substantial gains on Sudoku, Countdown, GSM8K, and Math500 benchmarks.

References

18 extracted · 18 resolved · 3 Pith anchors

[1] Training Verifiers to Solve Math Word Problems 2021 · arXiv:2110.14168
[2] Let's Verify Step by Step 2023 · arXiv:2305.20050
[3] Scaling up masked diffusion models on text 2025
[4] (title not extracted) 2025 · arXiv:2510.08554
[5] Dream 7B: Diffusion Large Language Models 2025 · arXiv:2508.15487

Cited by

7 papers in Pith

Receipt and verification
First computed: 2026-05-18T03:09:32.809039Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

357ebbd9ca304d567ceaeb4ecf9cd5904104ecc1e84d6cea35936d42def7cb39

Aliases

arxiv: 2512.09675 · arxiv_version: 2512.09675v3 · doi: 10.48550/arxiv.2512.09675 · pith_short_12: GV7LXWOKGBGV · pith_short_16: GV7LXWOKGBGVM7HK · pith_short_8: GV7LXWOK
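The values on this page suggest how the identifiers relate, although the page does not state the rule. The 26-character Pith Number matches the first 26 characters of the RFC 4648 base32 encoding of the canonical hash, and each `pith_short_N` alias is the first N characters of that number. The sketch below checks this for this record only. The helper name `pith_from_hash` is hypothetical, and the derivation is an observation from the data shown here, not a documented part of the `pith-number/v1.0` schema.

```python
import base64

# Values copied from this page.
CANONICAL_HASH = "357ebbd9ca304d567ceaeb4ecf9cd5904104ecc1e84d6cea35936d42def7cb39"
PITH_NUMBER = "GV7LXWOKGBGVM7HK5NHM7HGVSB"


def pith_from_hash(hex_digest: str, length: int = 26) -> str:
    """Hypothetical helper: base32-encode the digest bytes and keep a prefix.

    Assumption: the standard RFC 4648 alphabet (A-Z, 2-7), which is what
    base64.b32encode uses.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


derived = pith_from_hash(CANONICAL_HASH)

# The short aliases listed above are plain prefixes of the full number.
aliases = {f"pith_short_{n}": derived[:n] for n in (8, 12, 16)}

print(derived == PITH_NUMBER)
print(aliases)
```

If the relation holds in general, a short alias can be checked against a full number with a prefix comparison, with no network access needed.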
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/GV7LXWOKGBGVM7HK5NHM7HGVSB \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 357ebbd9ca304d567ceaeb4ecf9cd5904104ecc1e84d6cea35936d42def7cb39
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "c4bd192b984bf8a15d602a98877b8dd1a914d13181161bed1b9dd5c6aa836b7e",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2025-12-10T14:20:07Z",
    "title_canon_sha256": "5673ae3477bc9076392b812683e6a943c3cb2cabb2cd24fcbcff524a05dd645d"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2512.09675",
    "kind": "arxiv",
    "version": 3
  }
}
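The canonicalization used by the verification command can also be run offline against the record printed above. The following is a minimal sketch that reuses the same `json.dumps` settings as the one-liner (sorted keys, compact separators, `ensure_ascii=False`). The function name `canonical_sha256` is illustrative. Whether the digest equals the published hash depends on the JSON shown here being exactly the record the service returns, so the script reports the comparison instead of assuming it.

```python
import hashlib
import json

# Hash published on this page.
EXPECTED = "357ebbd9ca304d567ceaeb4ecf9cd5904104ecc1e84d6cea35936d42def7cb39"

# Canonical record as displayed on this page.
RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "c4bd192b984bf8a15d602a98877b8dd1a914d13181161bed1b9dd5c6aa836b7e",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2025-12-10T14:20:07Z",
    "title_canon_sha256": "5673ae3477bc9076392b812683e6a943c3cb2cabb2cd24fcbcff524a05dd645d"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2512.09675",
    "kind": "arxiv",
    "version": 3
  }
}
"""


def canonical_sha256(record: dict) -> str:
    """SHA-256 over a key-sorted, whitespace-free JSON serialization."""
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode()
    return hashlib.sha256(canonical).hexdigest()


record = json.loads(RECORD_JSON)
digest = canonical_sha256(record)

print(digest)
print("matches published hash:", digest == EXPECTED)
```

Sorting keys and fixing the separators makes the digest independent of key order and whitespace in the input, which is what lets a mirror recompute the same value from a reformatted copy of the record.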