Fast-Decoding Diffusion Language Models via Progress-Aware Confidence Schedules
2 Pith papers cite this work. Polarity classification is still indexing.
Citation summary (role and polarity)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Roles: background (1)
Polarities: background (1)
Representative citing papers
MRP predicts logit residuals between adjacent denoising steps in DLMs from backbone hidden states to support efficient multi-token denoising, yielding up to 1.4x lossless speedup or 22.6-point accuracy gains on code and math tasks.
Citing papers explorer
- TAD: Temporal-Aware Trajectory Self-Distillation for Fast and Accurate Diffusion LLM
  TAD improves the accuracy-parallelism trade-off in diffusion LLMs via temporal-aware self-distillation that applies hard labels to soon-to-be-decoded tokens and soft supervision to future tokens.
- Multi-Token Residual Prediction
  MRP predicts logit residuals between adjacent denoising steps in DLMs from backbone hidden states to support efficient multi-token denoising, yielding up to 1.4x lossless speedup or 22.6-point accuracy gains on code and math tasks.