T3d: Few-step diffusion lan- guage models via trajectory self-distillation with direct discriminative optimization

Tunyu Zhang, Xinxi Zhang, Ligong Han, Haizhou Shi, Xiaoxiao He, Zhuowei Li, Hao Wang, Kai Xu, Akash Srivastava, Vladimir Pavlovic, et al · 2026 · arXiv 2602.12262

7 Pith papers cite this work. Polarity classification is still indexing.

7 Pith papers citing it

read on arXiv browse 7 citing papers

citation-role summary

background 2 baseline 1 dataset 1

citation-polarity summary

background 2 baseline 1 use dataset 1

representative citing papers

Self-Distilled Trajectory-Aware Boltzmann Modeling: Bridging the Training-Inference Discrepancy in Diffusion Language Models

cs.CL · 2026-05-12 · unverdicted · novelty 7.0 · 2 refs

TABOM is a trajectory-aligned Boltzmann modeling framework that turns self-distilled inference paths into a pairwise ranking loss to close the training-inference gap in diffusion language models and expand their effective capabilities.

TAD: Temporal-Aware Trajectory Self-Distillation for Fast and Accurate Diffusion LLM

cs.CL · 2026-05-10 · unverdicted · novelty 7.0

TAD improves the accuracy-parallelism trade-off in diffusion LLMs via temporal-aware self-distillation that applies hard labels to soon-to-be-decoded tokens and soft supervision to future tokens.

Trajectory as the Teacher: Few-Step Discrete Flow Matching via Energy-Navigated Distillation

cs.LG · 2026-05-08 · unverdicted · novelty 7.0

Energy-navigated trajectory shaping during training produces 8-step discrete flow matching students that achieve 32% lower perplexity than 1024-step teachers on 170M language models with unchanged inference cost.

DMax: Aggressive Parallel Decoding for dLLMs

cs.LG · 2026-04-09 · conditional · novelty 7.0 · 2 refs

DMax uses On-Policy Uniform Training and Soft Parallel Decoding to enable aggressive parallelism in dLLMs, raising TPF on GSM8K from 2.04 to 5.47 and on MBPP from 2.71 to 5.86 while preserving accuracy.

Coupling Models for One-Step Discrete Generation

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

Coupling Models enable single-step discrete sequence generation via learned couplings to Gaussian latents and outperform prior one-step baselines on text perplexity, biological FBD, and image FID metrics.

Drift Flow Matching

cs.LG · 2026-05-17 · unverdicted · novelty 5.0

Drift Flow Matching connects direct transport maps from Drift Models with flow-based iterative refinement to enable adaptive computation in generative modeling.

ECHO: Efficient Chest X-ray Report Generation with One-step Block Diffusion

cs.LG · 2026-04-10 · unverdicted · novelty 5.0 · 2 refs

ECHO introduces one-step block diffusion via Direct Conditional Distillation and Response-Asymmetric Diffusion to generate chest X-ray reports faster than autoregressive models while improving clinical metrics.

citing papers explorer

Showing 7 of 7 citing papers.

Self-Distilled Trajectory-Aware Boltzmann Modeling: Bridging the Training-Inference Discrepancy in Diffusion Language Models cs.CL · 2026-05-12 · unverdicted · none · ref 14 · 2 links
TABOM is a trajectory-aligned Boltzmann modeling framework that turns self-distilled inference paths into a pairwise ranking loss to close the training-inference gap in diffusion language models and expand their effective capabilities.
TAD: Temporal-Aware Trajectory Self-Distillation for Fast and Accurate Diffusion LLM cs.CL · 2026-05-10 · unverdicted · none · ref 14
TAD improves the accuracy-parallelism trade-off in diffusion LLMs via temporal-aware self-distillation that applies hard labels to soon-to-be-decoded tokens and soft supervision to future tokens.
Trajectory as the Teacher: Few-Step Discrete Flow Matching via Energy-Navigated Distillation cs.LG · 2026-05-08 · unverdicted · none · ref 86
Energy-navigated trajectory shaping during training produces 8-step discrete flow matching students that achieve 32% lower perplexity than 1024-step teachers on 170M language models with unchanged inference cost.
DMax: Aggressive Parallel Decoding for dLLMs cs.LG · 2026-04-09 · conditional · none · ref 101 · 2 links
DMax uses On-Policy Uniform Training and Soft Parallel Decoding to enable aggressive parallelism in dLLMs, raising TPF on GSM8K from 2.04 to 5.47 and on MBPP from 2.71 to 5.86 while preserving accuracy.
Coupling Models for One-Step Discrete Generation cs.LG · 2026-05-08 · unverdicted · none · ref 46
Coupling Models enable single-step discrete sequence generation via learned couplings to Gaussian latents and outperform prior one-step baselines on text perplexity, biological FBD, and image FID metrics.
Drift Flow Matching cs.LG · 2026-05-17 · unverdicted · none · ref 32
Drift Flow Matching connects direct transport maps from Drift Models with flow-based iterative refinement to enable adaptive computation in generative modeling.
ECHO: Efficient Chest X-ray Report Generation with One-step Block Diffusion cs.LG · 2026-04-10 · unverdicted · none · ref 60 · 2 links
ECHO introduces one-step block diffusion via Direct Conditional Distillation and Response-Asymmetric Diffusion to generate chest X-ray reports faster than autoregressive models while improving clinical metrics.

T3d: Few-step diffusion lan- guage models via trajectory self-distillation with direct discriminative optimization

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer