Fast-Decoding Diffusion Language Models via Progress-Aware Confidence Schedules

Amr Mohamed, Yang Zhang, Michalis Vazirgiannis, Guokan Shang · 2025 · arXiv 2512.02892

2 Pith papers cite this work. Polarity classification is still indexing.



citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

TAD: Temporal-Aware Trajectory Self-Distillation for Fast and Accurate Diffusion LLM

cs.CL · 2026-05-10 · unverdicted · novelty 7.0

TAD improves the accuracy-parallelism trade-off in diffusion LLMs via temporal-aware self-distillation that applies hard labels to soon-to-be-decoded tokens and soft supervision to future tokens.

Multi-Token Residual Prediction

cs.LG · 2026-05-12 · unverdicted · novelty 5.0 · 2 refs

MRP predicts logit residuals between adjacent denoising steps in DLMs from backbone hidden states to support efficient multi-token denoising, yielding up to 1.4x lossless speedup or 22.6-point accuracy gains on code and math tasks.

citing papers explorer


TAD: Temporal-Aware Trajectory Self-Distillation for Fast and Accurate Diffusion LLM
cs.CL · 2026-05-10 · unverdicted · none · ref 13
Multi-Token Residual Prediction
cs.LG · 2026-05-12 · unverdicted · none · ref 25 · 2 links
