Pith Number

pith:2DDT7GQN

pith:2026:2DDT7GQNFROWPUW73U6X7Y5INL

not attested · not anchored · not stored · refs resolved

Beyond Mode-Seeking RL: Trajectory-Balance Post-Training for Diffusion Language Models

Prasanna Parthasarathi, Saba Ahmadi, Yufei Cui

A trajectory-balance objective stops diffusion language models from locking onto narrow denoising paths during post-training.

arxiv:2605.13935 v1 · 2026-05-13 · cs.LG · cs.CL


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{2DDT7GQNFROWPUW73U6X7Y5INL}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
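The merge algorithm itself is not specified on this page. As a minimal sketch of why a deterministic merge lets independent mirrors agree, the example below assumes each event carries a unique id and a UTC timestamp, and applies events in a canonical order that does not depend on arrival order. The `Event` fields, the `merge_state` function, and the sample values are all hypothetical, and signature checking is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    # All field names are illustrative assumptions, not the bundle schema.
    event_id: str   # unique identifier of the event
    timestamp: str  # ISO-8601 UTC timestamp in a fixed format (sorts lexicographically)
    field: str      # which part of the state the event updates
    value: str      # new value for that field


def merge_state(canonical: dict, events: list) -> dict:
    """Fold events into a copy of the canonical record in a fixed order."""
    state = dict(canonical)
    # Drop duplicate deliveries of the same event.
    unique = {e.event_id: e for e in events}
    # Sort by (timestamp, event_id) so every mirror applies the same sequence.
    for e in sorted(unique.values(), key=lambda e: (e.timestamp, e.event_id)):
        state[e.field] = e.value
    return state


base = {"status": "new"}
e1 = Event("e1", "2026-05-17T23:39:13Z", "status", "computed")
e2 = Event("e2", "2026-05-18T08:00:00Z", "status", "mirrored")

# Two mirrors receive the same events in different orders, one with a duplicate.
state_a = merge_state(base, [e1, e2])
state_b = merge_state(base, [e2, e1, e2])
print(state_a == state_b)
```

Sorting on a key derived only from event contents is what makes the result independent of how or when a mirror received the events.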

Claims

C1 strongest claim

TraFL is the only evaluated post-training method that improves over the base model in every benchmark-length setting, with gains that persist as the sampling budget increases.

C2 weakest assumption

The diffusion-compatible sequence-level surrogate and learned prompt-dependent normalization faithfully approximate the trajectory-balance objective without introducing new collapse modes or requiring task-specific tuning.

C3 one line summary

TraFL applies trajectory flow balancing to post-train diffusion language models, preventing mode collapse and delivering consistent gains on reasoning tasks that hold under increased sampling.

References

32 extracted · 32 resolved · 7 Pith anchors

[1] Natural gradient works efficiently in learning. Neural Computation, 10(2):251–276, 1998

[2] Program Synthesis with Large Language Models 2021 · arXiv:2108.07732

[3] Mirror descent and nonlinear projected subgradient methods for convex optimization. Operations Research Letters, 31(3):167–175, 2003

[4] Hu, Mo Tiwari, and Emmanuel Bengio 2023

[5] Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri, Gretchen Krueger, Mich 2021

Formal links

2 machine-checked theorem links

Receipt and verification

First computed	2026-05-17T23:39:13.930271Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

d0c73f9a0d2c5d67d2dfdd3d7fe3a86af9d4b1b9f5f46ece921496f831f4370f

Aliases

arxiv: 2605.13935 · arxiv_version: 2605.13935v1 · doi: 10.48550/arxiv.2605.13935 · pith_short_12: 2DDT7GQNFROW · pith_short_16: 2DDT7GQNFROWPUW7 · pith_short_8: 2DDT7GQN
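The short aliases are prefixes of the full 26-character identifier. In this record the identifier also matches the first 26 characters of the RFC 4648 Base32 encoding of the canonical hash. The sketch below checks both relationships using only values listed on this page. The derivation rule is an observation about this one record, not a documented specification, and the function names are illustrative.

```python
import base64

CANONICAL_HASH = "d0c73f9a0d2c5d67d2dfdd3d7fe3a86af9d4b1b9f5f46ece921496f831f4370f"
PITH_ID = "2DDT7GQNFROWPUW73U6X7Y5INL"


def pith_id_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the digest bytes and keep the leading characters.

    Assumption: inferred from this record only, not from a published spec.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


def short_aliases(pith_id: str) -> dict:
    """Build the pith_short_N aliases as prefixes of the full identifier."""
    return {f"pith_short_{n}": pith_id[:n] for n in (8, 12, 16)}


derived_id = pith_id_from_hash(CANONICAL_HASH)
aliases = short_aliases(derived_id)
print(derived_id == PITH_ID)  # True
print(aliases)
```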

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/2DDT7GQNFROWPUW73U6X7Y5INL \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: d0c73f9a0d2c5d67d2dfdd3d7fe3a86af9d4b1b9f5f46ece921496f831f4370f

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "62d9efd6af57211c0680900539c73ebdea02a5deb0ebf6f19c97d75a22120c32",
    "cross_cats_sorted": [
      "cs.CL"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.LG",
    "submitted_at": "2026-05-13T16:14:46Z",
    "title_canon_sha256": "0845ff7196936e93f23f3c82f8bbcc355bf9f08909a3858c46573b9b7ca8c249"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.13935",
    "kind": "arxiv",
    "version": 1
  }
}
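The same canonicalization used in the curl pipeline (sorted keys, compact separators, UTF-8 bytes, SHA-256) can be applied offline to the record printed above. This is a minimal sketch: it assumes the JSON shown here is byte-for-byte equivalent to what the resolver returns as `canonical_record`, which this page does not guarantee. The function name `canonical_sha256` is illustrative.

```python
import hashlib
import json

EXPECTED_HASH = "d0c73f9a0d2c5d67d2dfdd3d7fe3a86af9d4b1b9f5f46ece921496f831f4370f"

# Record copied from the "Canonical record JSON" section of this page.
CANONICAL_RECORD = {
    "metadata": {
        "abstract_canon_sha256": "62d9efd6af57211c0680900539c73ebdea02a5deb0ebf6f19c97d75a22120c32",
        "cross_cats_sorted": ["cs.CL"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.LG",
        "submitted_at": "2026-05-13T16:14:46Z",
        "title_canon_sha256": "0845ff7196936e93f23f3c82f8bbcc355bf9f08909a3858c46573b9b7ca8c249",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.13935", "kind": "arxiv", "version": 1},
}


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and no whitespace, then hash the UTF-8 bytes."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


digest = canonical_sha256(CANONICAL_RECORD)
print(digest)
print("matches expected:", digest == EXPECTED_HASH)
```

Because keys are sorted before hashing, the digest does not depend on the order in which the fields were written, only on their names and values.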