Pith Number

pith:GV7LXWOK

pith:2025:GV7LXWOKGBGVM7HK5NHM7HGVSB

not attested · not anchored · not stored · refs resolved

d-TreeRPO: Towards More Reliable Policy Optimization for Diffusion Language Models

Aiwei Liu, Bolin Ding, Leyi Pan, Liancheng Fang, Lijie Wen, Lingzhe Zhang, Minghua He, Shuchang Tao, Yunpeng Zhai, Zhaoyang Liu, Zheyu Fu

Tree-structured rollouts with verifiable rewards and scheduled self-distillation deliver reliable step-wise advantages for diffusion language models.

arxiv:2512.09675 v3 · 2025-12-10 · cs.CL


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{GV7LXWOKGBGVM7HK5NHM7HGVSB}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
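The page calls the merge deterministic but does not describe the algorithm or the event format. As a minimal sketch of what that property requires, assuming a hypothetical event shape with `ts`, `id`, `field` and `value` keys (signature checks omitted), a mirror can sort events by a total order that ignores arrival order and fold them into a copy of the canonical record, so any two mirrors holding the same events reach the same state:

```python
from itertools import permutations
from typing import Any, Dict, List

# Hypothetical event shape; the real bundle schema is not shown on this page.
Event = Dict[str, Any]


def merge_state(canonical: Dict[str, Any], events: List[Event]) -> Dict[str, Any]:
    """Fold events into a copy of the canonical record in a fixed total order.

    Sorting by (ts, id) makes the result independent of the order in which a
    mirror received the events; for the same field, the event that comes later
    in that order wins (last-writer-wins).
    """
    state = dict(canonical)
    for event in sorted(events, key=lambda e: (e["ts"], e["id"])):
        state[event["field"]] = event["value"]
    return state


canonical = {"schema_version": "1.0"}
events = [
    {"ts": "2026-01-01T00:00:00Z", "id": "a", "field": "example_field", "value": 1},
    {"ts": "2026-01-02T00:00:00Z", "id": "b", "field": "example_field", "value": 2},
    {"ts": "2026-01-02T00:00:00Z", "id": "c", "field": "example_field", "value": 3},
]

# Every possible arrival order produces the same merged state.
results = {
    tuple(sorted(merge_state(canonical, list(order)).items()))
    for order in permutations(events)
}
print(len(results))  # 1
```

The tie between events `b` and `c` (same timestamp) is broken by `id`, which is what keeps the order total; without a tie-breaker, two mirrors could legitimately disagree.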

Claims

C1 · strongest claim

Experiments demonstrate that d-TreeRPO outperforms existing baselines and achieves significant improvements across multiple reasoning benchmarks. Specifically, it achieves +86.2% on Sudoku, +51.6% on Countdown, +4.5% on GSM8K, and +5.3% on Math500 compared to the base model.

C2 · weakest assumption

Tree-structured rollouts based on verifiable outcome rewards can be computed efficiently while still yielding unbiased, fine-grained step-wise advantage estimates that generalize beyond the sampled trees.

C3 · one-line summary

d-TreeRPO uses tree rollouts for fine-grained verifiable rewards and time-scheduled self-distillation to reduce probability estimation gaps in diffusion LLMs, delivering substantial gains on Sudoku, Countdown, GSM8K, and Math500 benchmarks.
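The summary does not spell out how a tree of rollouts turns one verifiable outcome reward per finished sample into step-wise advantages. One common construction, shown here as a hedged sketch rather than the paper's exact estimator, scores each node by the mean 0/1 reward of the finished rollouts (leaves) beneath it and assigns each step the difference between its child's value and its parent's value. The `Node` class and function names are hypothetical, and the time-scheduled self-distillation component is not modeled.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """One partial generation in a rollout tree; leaves are finished rollouts."""
    children: List["Node"] = field(default_factory=list)
    reward: Optional[float] = None  # verifiable outcome reward, set on leaves only
    value: float = 0.0              # mean leaf reward in this subtree
    advantage: float = 0.0          # value gain of this step over its parent


def _fill_values(node: Node) -> Tuple[float, int]:
    """Post-order pass: return (sum of leaf rewards, leaf count), set node.value."""
    if not node.children:
        if node.reward is None:
            raise ValueError("every leaf needs an outcome reward")
        node.value = float(node.reward)
        return node.value, 1
    total, count = 0.0, 0
    for child in node.children:
        child_total, child_count = _fill_values(child)
        total += child_total
        count += child_count
    node.value = total / count
    return total, count


def assign_step_advantages(root: Node) -> None:
    """Set advantage = child.value - parent.value on every non-root node."""
    _fill_values(root)
    stack = [root]
    while stack:
        parent = stack.pop()
        for child in parent.children:
            child.advantage = child.value - parent.value
            stack.append(child)


# Two branches from a shared prefix, each expanded into two finished rollouts.
branch_a = Node(children=[Node(reward=1), Node(reward=1)])
branch_b = Node(children=[Node(reward=1), Node(reward=0)])
root = Node(children=[branch_a, branch_b])
assign_step_advantages(root)
print(root.value, branch_a.advantage, branch_b.advantage)  # 0.75 0.25 -0.25
```

A flat sequence-level reward would give every successful rollout the same credit; the tree instead credits the earlier step into branch A, which made success more likely, and separates the two final steps under branch B (+0.5 and -0.5).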

References

18 extracted · 18 resolved · 3 Pith anchors

[1] Training Verifiers to Solve Math Word Problems 2021 · arXiv:2110.14168

[2] Let's Verify Step by Step 2023 · arXiv:2305.20050

[3] Scaling up Masked Diffusion Models on Text 2025

[4] arXiv preprint 2025 · arXiv:2510.08554

[5] Dream 7B: Diffusion Large Language Models 2025 · arXiv:2508.15487

Cited by

7 papers in Pith

Sketch Then Paint: Hierarchical Reinforcement Learning for Diffusion Multi-Modal Large Language Models

DMax: Aggressive Parallel Decoding for dLLMs

From Feedback Loops to Policy Updates: Reinforcement Fine-Tuning for LLM-Based Alpha Factor Discovery

Relative Score Policy Optimization for Diffusion Language Models

E2E-REME: Towards End-to-End Microservices Auto-Remediation via Experience-Simulation Reinforcement Fine-Tuning

Receipt and verification

First computed	2026-05-18T03:09:32.809039Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`)
Schema	pith-number/v1.0

Canonical hash

357ebbd9ca304d567ceaeb4ecf9cd5904104ecc1e84d6cea35936d42def7cb39

Aliases

arxiv: 2512.09675 · arxiv_version: 2512.09675v3 · doi: 10.48550/arxiv.2512.09675 · pith_short_12: GV7LXWOKGBGV · pith_short_16: GV7LXWOKGBGVM7HK · pith_short_8: GV7LXWOK
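The identifiers on this page are consistent with a simple derivation, inferred here from the values shown rather than from any published Pith specification: the 26-character Pith number matches the first 26 characters of the RFC 4648 base32 encoding of the canonical SHA-256 digest, and the `pith_short_*` aliases are its 8-, 12- and 16-character prefixes. A sketch that checks this, with hypothetical helper names:

```python
import base64

CANONICAL_HASH = "357ebbd9ca304d567ceaeb4ecf9cd5904104ecc1e84d6cea35936d42def7cb39"


def pith_from_digest(hex_digest: str, length: int = 26) -> str:
    """Hypothetical helper: base32-encode the raw digest bytes, keep a prefix."""
    return base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")[:length]


def short_aliases(pith: str) -> dict:
    """Hypothetical helper: the pith_short_N aliases as plain prefixes."""
    return {f"pith_short_{n}": pith[:n] for n in (8, 12, 16)}


pith = pith_from_digest(CANONICAL_HASH)
print(pith)  # GV7LXWOKGBGVM7HK5NHM7HGVSB
print(short_aliases(pith))
```

Truncating after decoding avoids the `=` padding that `b32encode` appends to a 32-byte input.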

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/GV7LXWOKGBGVM7HK5NHM7HGVSB \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 357ebbd9ca304d567ceaeb4ecf9cd5904104ecc1e84d6cea35936d42def7cb39

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "c4bd192b984bf8a15d602a98877b8dd1a914d13181161bed1b9dd5c6aa836b7e",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2025-12-10T14:20:07Z",
    "title_canon_sha256": "5673ae3477bc9076392b812683e6a943c3cb2cabb2cd24fcbcff524a05dd645d"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2512.09675",
    "kind": "arxiv",
    "version": 3
  }
}
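The `python3 -c` step of the verification command can also be run offline on the canonical record shown above. A minimal sketch, assuming the displayed record is exactly what the resolver returns under `.canonical_record`: it applies the same canonical form as the one-liner (sorted keys, compact separators, UTF-8 without ASCII escaping) and compares the SHA-256 digest with the published canonical hash. If the displayed copy differs from the served one in any way, the comparison prints `False`.

```python
import hashlib
import json

EXPECTED = "357ebbd9ca304d567ceaeb4ecf9cd5904104ecc1e84d6cea35936d42def7cb39"

# Canonical record as displayed on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "c4bd192b984bf8a15d602a98877b8dd1a914d13181161bed1b9dd5c6aa836b7e",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2025-12-10T14:20:07Z",
        "title_canon_sha256": "5673ae3477bc9076392b812683e6a943c3cb2cabb2cd24fcbcff524a05dd645d",
    },
    "schema_version": "1.0",
    "source": {"id": "2512.09675", "kind": "arxiv", "version": 3},
}


def canonical_bytes(obj) -> bytes:
    """Serialize the same way as the verification one-liner."""
    return json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def canonical_sha256(obj) -> str:
    """Hex SHA-256 of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(obj)).hexdigest()


digest = canonical_sha256(record)
print(digest)
print(digest == EXPECTED)
```

Because keys are sorted before hashing, the digest does not depend on the order in which a client happens to build or receive the record's fields.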